Uncovering and Mitigating the Impact of Code Obfuscation on Dataset Annotation with Antivirus Engines (ISSTA 2024 - Technical Papers)

Who

Gao Cuiying, Yueming Wu, Heng Li, Wei Yuan, Haoyu Jiang, Qidan He, Yang Liu

Track

ISSTA 2024 Technical Papers

When

Fri 20 Sep 2024 11:30 - 11:50 at EI 10 Fritz Paschke - Compilers and Decompilers Chair(s): Sang Kil Cha

Abstract

With the widespread application of machine learning-based Android malware detection methods, building a high-quality dataset has become increasingly important.

Existing large-scale datasets are mostly annotated with VirusTotal by aggregating the decisions of antivirus engines, and most of them indiscriminately accept the decisions of all engines. In reality, however, these engines differ in their ability to detect malware, especially malware that has been obfuscated.
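To illustrate what aggregating engine decisions can look like, here is a minimal sketch of threshold-based voting. The function name `annotate`, the report layout (a dict mapping an engine name to a boolean verdict), the engine names, and the threshold value are all assumptions made for illustration; this is neither the VirusTotal API nor the procedure used in the paper.

```python
from typing import Dict, Iterable, Optional


def annotate(verdicts: Dict[str, bool],
             threshold: int = 2,
             engines: Optional[Iterable[str]] = None) -> bool:
    """Return True (malware) if at least `threshold` engines flag the sample.

    `verdicts` maps a hypothetical engine name to its decision.
    `engines`, if given, restricts the vote to that subset of engines.
    """
    voters = set(verdicts) if engines is None else set(engines) & set(verdicts)
    positives = sum(1 for name in voters if verdicts[name])
    return positives >= threshold


# Hypothetical report for a single sample (engine names are placeholders).
report = {"EngineA": True, "EngineB": False, "EngineC": False, "EngineD": True}

label_all = annotate(report, threshold=2)  # all four engines vote
label_subset = annotate(report, threshold=2, engines=["EngineA", "EngineB"])
```

With this placeholder report, the same sample can receive different labels depending on which engines are allowed to vote, which is why the choice of engines matters for annotation.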

Previous research has revealed that code obfuscation degrades the detection performance of these engines to varying degrees. This leads us to believe that using all engines indiscriminately is unreasonable for dataset annotation.

Therefore, in this paper, we first conduct a data-driven evaluation to confirm the negative effects of code obfuscation on engine-based dataset annotation.

To gain a deeper understanding of the reasons behind this phenomenon, we evaluate the availability, effectiveness and robustness of every engine under various code obfuscation techniques.

Then we categorize the engines and select a set of obfuscation-robust engines. Finally, we conduct comprehensive experiments to verify the effectiveness of the selected engines for dataset annotation.
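One possible way to picture the selection of obfuscation-robust engines is to compare each engine's detection rate on original malware with its rate on obfuscated versions of the same samples, and keep engines whose rate drops only slightly. The sketch below does exactly that and nothing more. The function names, the input layout, the engine names, and the `max_drop` tolerance are assumptions for illustration; the abstract does not state the paper's actual metrics or selection criteria, and this sketch ignores availability and effectiveness.

```python
from typing import Dict, List


def detection_rate(flags: List[bool]) -> float:
    """Fraction of samples an engine flagged as malicious."""
    return sum(flags) / len(flags) if flags else 0.0


def select_robust_engines(original: Dict[str, List[bool]],
                          obfuscated: Dict[str, List[bool]],
                          max_drop: float = 0.1) -> List[str]:
    """Keep engines whose detection rate drops by at most `max_drop`
    when the same malware samples are obfuscated (hypothetical criterion)."""
    selected = []
    for engine, flags in original.items():
        if engine not in obfuscated:
            continue  # no results on obfuscated samples for this engine
        drop = detection_rate(flags) - detection_rate(obfuscated[engine])
        if drop <= max_drop:
            selected.append(engine)
    return sorted(selected)


# Placeholder verdicts on four malware samples, before and after obfuscation.
original = {"EngineA": [True, True, True, True],
            "EngineB": [True, True, True, False]}
obfuscated = {"EngineA": [True, True, True, False],
              "EngineB": [False, False, True, False]}

robust = select_robust_engines(original, obfuscated, max_drop=0.3)
```

Note that a criterion based only on the drop would also keep an engine that detects little in both settings, so a practical selection would need to consider overall effectiveness as well.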

Our experiments show that, when 50% obfuscated samples are mixed into the training set, using our selected engines improves the detection performance of the classic malware detectors Drebin and Malscan by 15.21% and 19.23%, respectively, compared to using all the engines.

DOI

https://doi.org/10.1145/3650212.3680302

Gao Cuiying

Huazhong University of Science and Technology; JD.com

China

Yueming Wu

Nanyang Technological University

Singapore

Heng Li

Huazhong University of Science and Technology

China

Wei Yuan

Huazhong University of Science and Technology

China

Haoyu Jiang

Huazhong University of Science and Technology

China

Qidan He

JD.com

China

Yang Liu

Nanyang Technological University

Singapore

Session Program

Fri 20 Sep
Time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna (GMT+02:00)

10:30 - 11:50	Compilers and Decompilers (Technical Papers) at EI 10 Fritz Paschke Chair(s): Sang Kil Cha (KAIST)

10:30 20m Talk		Inconsistencies in TeX-Produced Documents (Technical Papers). Jovyn Tan (National University of Singapore), Manuel Rigger (National University of Singapore)
10:50 20m Talk		Fuzzing MLIR Compiler Infrastructure via Operation Dependency Analysis (Technical Papers). Chenyao Suo (Tianjin University), Junjie Chen (Tianjin University), Shuang Liu (Renmin University of China), Jiajun Jiang (Tianjin University), Yingquan Zhao (Tianjin University), Jianrong Wang (Tianjin University)
11:10 20m Talk		Towards Understanding the Bugs in Solidity Compiler (Technical Papers). Haoyang Ma (Hong Kong University of Science and Technology), Wuqi Zhang (Hong Kong University of Science and Technology), Qingchao Shen (Tianjin University), Yongqiang Tian (Hong Kong University of Science and Technology), Junjie Chen (Tianjin University), Shing-Chi Cheung (Hong Kong University of Science and Technology)
11:30 20m Talk		Uncovering and Mitigating the Impact of Code Obfuscation on Dataset Annotation with Antivirus Engines (Technical Papers). Gao Cuiying (Huazhong University of Science and Technology; JD.com), Yueming Wu (Nanyang Technological University), Heng Li (Huazhong University of Science and Technology), Wei Yuan (Huazhong University of Science and Technology), Haoyu Jiang (Huazhong University of Science and Technology), Qidan He (JD.com), Yang Liu (Nanyang Technological University)

Information for Participants

Fri 20 Sep 2024 10:30 - 11:50 at EI 10 Fritz Paschke - Compilers and Decompilers Chair(s): Sang Kil Cha

Info for room EI 10 Fritz Paschke:

Map: https://tuw-maps.tuwien.ac.at/?q=CAEG31

Room tech: https://raumkatalog.tiss.tuwien.ac.at/room/13948