SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection (ISSTA 2024 - Technical Papers)

Who

Xin-Cheng Wen, Cuiyun Gao, Shuzheng Gao, Yang Xiao, Michael Lyu

Track

ISSTA 2024 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 18 Sep 2024 13:50 - 14:10 at EI 3 Sahulka - Vulnerability Detection Chair(s): Cuiyun Gao

Abstract

Recently, there has been a growing interest in automatic software vulnerability detection.

Pre-trained model-based approaches have demonstrated superior performance to other Deep Learning (DL)-based approaches in detecting vulnerabilities.

However, the existing pre-trained model-based approaches generally employ code sequences as input during prediction, and may ignore vulnerability-related structural information, as reflected in the following two aspects.

First, they tend to fail to infer the semantics of the code statements with complex logic such as those containing multiple operators and pointers.

Second, they struggle to comprehend the various code execution sequences, an understanding of which is essential for precise vulnerability detection.

To mitigate the challenges, we propose a Structured Natural Language Comment tree-based vulnerAbiLity dEtection framework based on the pre-trained models, named SCALE. The proposed Structured Natural Language Comment Tree (SCT) integrates the semantics of code statements with code execution sequences based on the Abstract Syntax Trees (ASTs). Specifically, SCALE comprises three main modules:

(1) Comment Tree Construction, which aims at enhancing the model's ability to infer the semantics of code statements by first incorporating Large Language Models (LLMs) for comment generation and then adding the comment nodes to ASTs.

(2) Structured Natural Language Comment Tree Construction, which aims at explicitly involving code execution sequences by combining the code syntax templates with the comment tree.

(3) SCT-Enhanced Representation, which finally incorporates the constructed SCTs to capture vulnerability patterns effectively.
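
As a rough illustration of the first two modules only, the toy sketch below attaches a comment node and a syntax-template node to statement nodes of an AST. It is not the authors' implementation: it uses Python's standard `ast` module on a Python snippet for convenience, the `COMMENTS` table is a hand-written stand-in for LLM-generated comments, and the `TEMPLATES` wording and the `build_tree` helper are hypothetical names invented for this example.

```python
import ast

# Toy input; any small function would do.
SOURCE = """\
def read(buf, i):
    if i < 0 or i >= len(buf):
        return None
    return buf[i]
"""

# ASSUMPTION: hand-written stand-in for LLM-generated comments, keyed by line number.
COMMENTS = {
    2: "check whether the index is outside the buffer",
    3: "signal an error to the caller",
    4: "read the element at the index",
}

# ASSUMPTION: hypothetical templates that describe execution order in plain language.
TEMPLATES = {
    ast.If: "if the condition holds, run the body; otherwise skip it",
    ast.Return: "stop executing the function and hand back a value",
}


def build_tree(node, comments, depth=0):
    """Return indented lines for `node` and its nested statements,
    adding a Comment line and a Template line where one is available."""
    indent = "  " * depth
    lines = [indent + type(node).__name__]
    lineno = getattr(node, "lineno", None)
    if isinstance(node, ast.stmt) and lineno in comments:
        lines.append(indent + "  Comment: " + comments[lineno])
    template = TEMPLATES.get(type(node))
    if template is not None:
        lines.append(indent + "  Template: " + template)
    # Only statements are expanded, to keep the printed tree short.
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            lines.extend(build_tree(child, comments, depth + 1))
    return lines


print("\n".join(build_tree(ast.parse(SOURCE), COMMENTS)))
```

The printed tree interleaves code structure with natural-language annotations; in this sketch a flattened form of such a tree would be the kind of text handed to a pre-trained model in the third module.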

Experimental results demonstrate that SCALE outperforms the best-performing baselines, including pre-trained models and LLMs, with improvements of 2.96%, 13.47%, and 3.75% in terms of F1 score on the FFMPeg+Qemu, Reveal, and SVulD datasets, respectively. Furthermore, SCALE can be applied to different pre-trained models, such as CodeBERT and UniXcoder, yielding F1 score improvements ranging from 1.37% to 10.87%.

DOI

https://doi.org/10.1145/3650212.3652124

Xin-Cheng Wen

Harbin Institute of Technology

China

Cuiyun Gao

Harbin Institute of Technology

China

Shuzheng Gao

Chinese University of Hong Kong

China

Yang Xiao

Chinese Academy of Sciences

China

Michael Lyu

Chinese University of Hong Kong

Hong Kong

Session Program

Wed 18 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:30 - 14:50 Vulnerability Detection (Technical Papers) at EI 3 Sahulka. Chair(s): Cuiyun Gao (Harbin Institute of Technology)

13:30 (20m, Talk) Automated Data Binding Vulnerability Detection for Java Web Frameworks via Nested Property Graph
Xiaoyong Yan (Zhejiang University), Biao He (Ant Group), Wenbo Shen (Zhejiang University), Yu Ouyang (Ant Group), Kaihang Zhou (Zhejiang University), Xingjian Zhang (Zhejiang University), Xingyu Wang (Zhejiang University), Yukai Cao (Zhejiang University), Rui Chang (Zhejiang University)

13:50 (20m, Talk) SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection
Xin-Cheng Wen (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Shuzheng Gao (Chinese University of Hong Kong), Yang Xiao (Chinese Academy of Sciences), Michael Lyu (Chinese University of Hong Kong)

14:10 (20m, Talk) CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection
Hao Wang (Tsinghua University), Zeyu Gao (Tsinghua University), Chao Zhang (Tsinghua University), Mingyang Sun (University of Electronic Science and Technology of China), Yuchen Zhou (Beijing University of Technology), Han Qiu (Tsinghua University), Xi Xiao (Tsinghua University)

14:30 (20m, Talk) Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation
Zhaoyang Chu (Huazhong University of Science and Technology), Yao Wan (Huazhong University of Science and Technology), Qian Li (Curtin University), Yang Wu (Huazhong University of Science and Technology), Hongyu Zhang (Chongqing University), Yulei Sui (UNSW), Guandong Xu (University of Technology), Hai Jin (Huazhong University of Science and Technology)

Information for Participants

Wed 18 Sep 2024 13:30 - 14:50 at EI 3 Sahulka - Vulnerability Detection Chair(s): Cuiyun Gao

Info for room EI 3 Sahulka:

Map: https://tuw-maps.tuwien.ac.at/?q=CF0205

Room tech: https://raumkatalog.tiss.tuwien.ac.at/room/15663