Enhancing Robustness of Code Authorship Attribution through Expert Feature Knowledge (ISSTA 2024 - Technical Papers)

Who

Xiaowei Guo, Cai Fu, Juan Chen, Hongle Liu, Lansheng Han, Wenjin Li

Track

ISSTA 2024 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 20 Sep 2024 15:30 - 15:50 at EI 10 Fritz Paschke - Analysis of Code Origin Chair(s): Darko Marinov

Abstract

Code authorship attribution has been an interesting research problem for decades. Recent studies have revealed that existing methods for code authorship attribution suffer from weak robustness: small perturbations added by an attacker can greatly reduce a method's accuracy. As of now, no code authorship attribution method can handle such attacks effectively. In this paper, we attribute the weak robustness of code authorship attribution methods to dataset bias and argue that this bias can be mitigated by adjusting the feature learning strategy. We first propose a robust feature combination framework for code authorship attribution, which is composed only of simple, shallow neural network structures and introduces controllability into feature extraction by incorporating expert knowledge. Experiments show that the framework is significantly more robust than mainstream code authorship attribution methods, with an average drop of 23.4 percentage points (from 37.8% to 14.3%) in the success rate of targeted attacks and 25.9 percentage points (from 46.7% to 20.8%) in the success rate of untargeted attacks. At the same time, it achieves accuracy comparable to that of mainstream code authorship attribution methods.

DOI

https://doi.org/10.1145/3650212.3652121

Xiaowei Guo

Huazhong University of Science and Technology

China

Cai Fu

Huazhong University of Science and Technology

China

Juan Chen

Huazhong University of Science and Technology

China

Hongle Liu

Huazhong University of Science and Technology

China

Lansheng Han

Huazhong University of Science and Technology

China

Wenjin Li

NSFOCUS Technologies Group

China

Session Program

Fri 20 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:30 - 16:30	Analysis of Code Origin (Technical Papers) at EI 10 Fritz Paschke
Chair(s): Darko Marinov (University of Illinois at Urbana-Champaign)

15:30 20m Talk		Enhancing Robustness of Code Authorship Attribution through Expert Feature Knowledge (Technical Papers)
Xiaowei Guo (Huazhong University of Science and Technology), Cai Fu (Huazhong University of Science and Technology), Juan Chen (Huazhong University of Science and Technology), Hongle Liu (Huazhong University of Science and Technology), Lansheng Han (Huazhong University of Science and Technology), Wenjin Li (NSFOCUS Technologies Group)

15:50 20m Talk		Your “Notice” Is Missing: Detecting and Fixing Violations of Modification Terms in Open Source Licenses during Forking (Technical Papers)
Kaifeng Huang (Tongji University), Yingfeng Xia (Fudan University), Bihuan Chen (Fudan University), Siyang He (Fudan University), Huazheng Zeng (Fudan University), Zhuotong Zhou (Fudan University), Jin Guo (Fudan University), Xin Peng (Fudan University)

16:10 20m Talk		DeLink: Source File Information Recovery in Binaries (Technical Papers)
Zhe Lang (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Zhengzi Xu (Nanyang Technological University; Imperial Global Singapore), Xiaohui Chen (China Mobile Research Institute), Shichao Lv (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Zhanwei Song (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Zhiqiang Shi (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Limin Sun (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences)

Information for Participants

Fri 20 Sep 2024 15:30 - 16:30 at EI 10 Fritz Paschke - Analysis of Code Origin Chair(s): Darko Marinov

Info for room EI 10 Fritz Paschke:

Map: https://tuw-maps.tuwien.ac.at/?q=CAEG31

Room tech: https://raumkatalog.tiss.tuwien.ac.at/room/13948