ISSTA 2024
Mon 16 - Fri 20 September 2024 Vienna, Austria
co-located with ISSTA/ECOOP 2024

Code authorship attribution has been an interesting research problem for decades. Recent studies have revealed that existing methods for code authorship attribution suffer from weak robustness. Under the influence of small perturbations added by the attacker, the accuracy of the method will be greatly reduced. As of now, there is no code authorship attribution method capable of effectively handling such attacks. In this paper, we attribute the weak robustness of code authorship attribution methods to dataset bias and argue that this bias can be mitigated through adjustments to the feature learning strategy. We first propose a robust code authorship attribution feature combination framework, which is composed of only simple shallow neural network structures, and introduces controllability for the framework in the feature extraction by incorporating expert knowledge. Experiments show that the framework has significantly improved robustness over mainstream code authorship attribution methods, with an average drop of 23.4% (from 37.8% to 14.3%) in the success rate of targeted attacks and 25.9% (from 46.7% to 20.8%) in the success rate of untargeted attacks. At the same time, it can also achieve results comparable to mainstream code authorship attribution methods in terms of accuracy.