CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection
Binary code similarity detection (BCSD) is a fundamental technique for various applications. Many BCSD solutions have been proposed recently, most of which are embedding-based, but they have shown limited accuracy and efficiency, especially when the volume of target binaries to search is large. To address this issue, we propose CEBin, a cost-effective BCSD framework that fuses embedding-based and comparison-based approaches to significantly improve accuracy while minimizing overhead. Specifically, CEBin uses a refined embedding-based approach to extract features of the target code, which efficiently narrows the pool of candidate similar code and boosts performance. It then applies a comparison-based approach that performs pairwise comparisons on the candidates to capture more nuanced and complex relationships, which greatly improves the accuracy of similarity detection. By bridging the gap between embedding-based and comparison-based approaches, CEBin provides an effective and efficient solution for detecting similar code (including vulnerable code) in large-scale software ecosystems. Experimental results on three well-known datasets demonstrate the superiority of CEBin over existing state-of-the-art (SOTA) baselines. To further evaluate the usefulness of BCSD in the real world, we construct a large-scale vulnerability benchmark, offering the first precise evaluation scheme for assessing BCSD methods on the 1-day vulnerability detection task. On this more practical but challenging task, CEBin can identify similar functions among millions of candidate functions in just a few seconds and achieves a recall of 85.46%, which is several orders of magnitude faster and 4.07× better than the best SOTA baseline.
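The two-stage design described above, a cheap embedding-based filter followed by an expensive pairwise comparison on the surviving candidates, is an instance of the general retrieve-then-rerank pattern. The Python sketch below illustrates only that pattern under stated assumptions; it is not CEBin's implementation. The embeddings are hand-written toy vectors, and `retrieve_then_rerank`, `toy_pairwise_score`, and the function names in the pool are all hypothetical. In particular, `toy_pairwise_score` (token-set Jaccard similarity) merely stands in for CEBin's comparison-based stage.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def toy_pairwise_score(f1: str, f2: str) -> float:
    """Hypothetical stand-in for a pairwise comparison model:
    Jaccard similarity of the two functions' instruction-mnemonic sets."""
    t1, t2 = set(f1.split()), set(f2.split())
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0

def retrieve_then_rerank(
    query_emb: Vector,
    query_func: str,
    pool: Dict[str, Tuple[Vector, str]],
    pairwise_score: Callable[[str, str], float],
    k: int = 100,
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical two-stage search: embedding retrieval over the whole pool,
    then pairwise re-scoring of only the k retrieved candidates."""
    # Stage 1 (embedding-based): one cheap vector comparison per pool entry;
    # keep the k functions whose embeddings are closest to the query.
    candidates = heapq.nlargest(
        k, pool, key=lambda name: cosine(query_emb, pool[name][0])
    )
    # Stage 2 (comparison-based): the expensive pairwise scorer runs only k times.
    rescored = [(name, pairwise_score(query_func, pool[name][1])) for name in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_n]

# Toy pool: name -> (embedding, instruction mnemonics). All values are made up.
pool = {
    "libA!parse": ([0.9, 0.1, 0.0], "push mov call ret"),
    "libB!parse_v2": ([0.8, 0.2, 0.1], "push mov cmp call ret"),
    "libC!hash": ([0.0, 0.1, 0.9], "xor shl add ret"),
}
best = retrieve_then_rerank(
    [1.0, 0.1, 0.0], "push mov call ret", pool, toy_pairwise_score, k=2, top_n=1
)
print(best)  # [('libA!parse', 1.0)]
```

The point of the split is cost: with a pool of N functions, stage 1 performs N cheap vector comparisons, while stage 2 invokes the expensive scorer only k times. The choice of k therefore trades first-stage recall against comparison cost.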
Wed 18 Sep (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
13:30 - 14:50 | Vulnerability Detection | Technical Papers at EI 3 Sahulka | Chair(s): Cuiyun Gao (Harbin Institute of Technology)
13:30 | 20m Talk | Automated Data Binding Vulnerability Detection for Java Web Frameworks via Nested Property Graph | Technical Papers | Xiaoyong Yan (Zhejiang University), Biao He (Ant Group), Wenbo Shen (Zhejiang University), Yu Ouyang (Ant Group), Kaihang Zhou (Zhejiang University), Xingjian Zhang (Zhejiang University), Xingyu Wang (Zhejiang University), Yukai Cao (Zhejiang University), Rui Chang (Zhejiang University)
13:50 | 20m Talk | SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection | Technical Papers | Xin-Cheng Wen (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Shuzheng Gao (Chinese University of Hong Kong), Yang Xiao (Chinese Academy of Sciences), Michael Lyu (Chinese University of Hong Kong)
14:10 | 20m Talk | CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection | Technical Papers | Hao Wang (Tsinghua University), Zeyu Gao (Tsinghua University), Chao Zhang (Tsinghua University), Mingyang Sun (University of Electronic Science and Technology of China), Yuchen Zhou (Beijing University of Technology), Han Qiu (Tsinghua University), Xi Xiao (Tsinghua University)
14:30 | 20m Talk | Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation | Technical Papers | Zhaoyang Chu (Huazhong University of Science and Technology), Yao Wan (Huazhong University of Science and Technology), Qian Li (Curtin University), Yang Wu (Huazhong University of Science and Technology), Hongyu Zhang (Chongqing University), Yulei Sui (UNSW), Guandong Xu (University of Technology), Hai Jin (Huazhong University of Science and Technology)