Uncovering and Mitigating the Impact of Code Obfuscation on Dataset Annotation with Antivirus Engines
This program is tentative and subject to change.
With the widespread application of machine learning-based Android malware detection methods, building high-quality datasets has become increasingly important. Existing large-scale datasets are mostly annotated with VirusTotal by aggregating the decisions of antivirus engines, and most of them accept the decisions of all engines indiscriminately. In reality, however, these engines differ in their ability to detect malware, especially malware that has been obfuscated. Previous research has revealed that code obfuscation degrades the detection performance of these engines to varying degrees, which suggests that using all engines indiscriminately is unreasonable for dataset annotation. Therefore, in this paper, we first conduct a data-driven evaluation to confirm the negative effects of code obfuscation on engine-based dataset annotation. To gain a deeper understanding of the reasons behind this phenomenon, we evaluate the availability, effectiveness, and robustness of every engine under various code obfuscation techniques. We then categorize the engines and select a set of obfuscation-robust engines. Finally, we conduct comprehensive experiments to verify the effectiveness of the selected engines for dataset annotation. Our experiments show that when 50% obfuscated samples are mixed into the training set, annotating with our selected engines instead of all engines improves the detection performance of the classic malware detectors Drebin and Malscan by 15.21% and 19.23%, respectively.
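To make the annotation idea concrete, the following is a minimal sketch of engine-based labeling, not the authors' implementation. It assumes a simple rule in which a sample is labeled malicious when at least a given fraction of the considered engines flag it; the abstract does not state the actual aggregation rule, and the engine names, verdicts, and the 0.5 cutoff below are all hypothetical.

```python
from typing import Dict, Iterable, Optional


def label_sample(
    verdicts: Dict[str, bool],
    min_fraction: float = 0.5,
    engines: Optional[Iterable[str]] = None,
) -> bool:
    """Label a sample as malicious (True) or benign (False).

    verdicts maps an engine name to whether that engine flagged the sample.
    engines optionally restricts the vote to a subset of engines; when it is
    None, every engine in verdicts is counted.
    """
    if engines is None:
        considered = list(verdicts)
    else:
        # Ignore selected engines that returned no verdict for this sample.
        considered = [name for name in engines if name in verdicts]
    if not considered:
        return False  # No usable verdicts: fall back to benign (assumption).
    flagged = sum(1 for name in considered if verdicts[name])
    return flagged / len(considered) >= min_fraction


# Hypothetical verdicts for one obfuscated malicious sample.
verdicts = {
    "engine_a": True,
    "engine_b": False,
    "engine_c": False,
    "engine_d": True,
    "engine_e": False,
}
# Hypothetical subset of engines judged robust to obfuscation.
robust_engines = ["engine_a", "engine_d"]

label_all = label_sample(verdicts)
label_robust = label_sample(verdicts, engines=robust_engines)
print(label_all, label_robust)
```

In this toy case only two of five engines flag the sample, so voting over all engines labels it benign, while voting over the two hypothetical robust engines labels it malicious. This is the kind of label disagreement that engine selection is meant to address.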
Thu 19 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
15:30 - 17:10
15:30 | 20m | Talk | One-to-One or One-to-Many? Suggesting Extract Class Refactoring Opportunities with Intra-class Dependency Hypergraph Neural Network | Technical Papers | Di Cui, Qiangqiang Wang (Xidian University), Yutong Zhao (University of Central Missouri, USA), Jiaqi Wang (Xidian University), Minjie Wei (Xidian University), Jingzhao Hu (Xidian University), Luqiao Wang (Xidian University), Qingshan Li (Xidian University)
15:50 | 20m | Talk | CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature | Technical Papers | Chenyan Liu (Shanghai Jiao Tong University; National University of Singapore), Cai Yufan (Shanghai Jiao Tong University; National University of Singapore), Yun Lin (Shanghai Jiao Tong University), Yuhuan Huang (Shanghai Jiao Tong University), Yunrui Pei (Shanghai Jiao Tong University), Bo Jiang (Bytedance Network Technology), Ping Yang (Bytedance Network Technology), Jin Song Dong (National University of Singapore), Hong Mei (Peking University)
16:10 | 20m | Talk | Arfa: an Agile Regime-based Floating-point Optimization Approach for Rounding Errors | Technical Papers | Jinchen Xu (Information Engineering University), Mengqi Cui (Information Engineering University), Fei Li (Information Engineering University), Zuoyan Zhang (Hunan University, Changsha, Hunan), Hongru Yang (Information Engineering University), Bei Zhou (Information Engineering University), Jie Zhao (Hunan University)
16:30 | 20m | Talk | Automated Deep Learning Optimization via DSL-Based Source Code Transformation | Technical Papers | Ruixin Wang (Purdue University), Minghai Lu (Purdue University), Cody Hao Yu (BosonAI), Yi-Hsiang Lai (Amazon Web Services), Tianyi Zhang (Purdue University)
16:50 | 20m | Talk | Uncovering and Mitigating the Impact of Code Obfuscation on Dataset Annotation with Antivirus Engines | Technical Papers | Cuiying Gao (Huazhong University of Science and Technology), Yueming Wu (Nanyang Technological University), Heng Li (Huazhong University of Science and Technology), Wei Yuan (Huazhong University of Science and Technology), Haoyu Jiang (Huazhong University of Science and Technology), Qidan He (Jingdong Group), Yang Liu (Nanyang Technological University)