Benchmarking Automated Program Repair: An Extensive Study on Both Real-World and Artificial Bugs
As bugs are inevitable and prevalent in real-world programs, many Automated Program Repair (APR) techniques have been proposed to generate patches for them. However, due to the lack of a standard for evaluating APR techniques, prior works tend to use different settings and benchmarks in their evaluations, threatening the trustworthiness of the results. Additionally, they typically adopt only plausibility and genuineness as evaluation metrics, which may mask underlying issues in APR techniques. To overcome these problems, in this paper we conduct an extensive, multi-dimensional evaluation of nine learning-based and three traditional state-of-the-art APR techniques under the same environment and settings. We employ the widely studied Defects4J V2.0.0 benchmark together with MuBench, a newly constructed large-scale mutation-based benchmark derived from Defects4J that contains 1,700 artificial bugs generated by various mutators, to uncover potential limitations of these APR techniques. We also apply multi-dimensional metrics, including compilability, plausibility, and genuineness, as well as SYE (SYntactic Equivalence) and TCE (Trivial Compiler Equivalence), to thoroughly analyze the 1,814,652 generated patches. The evaluation yields several noteworthy findings. First, Large Language Model (LLM) based APR is less susceptible to overfitting on the Defects4J V1.2.0 dataset and fixes the most bugs. Second, traditional and learning-based APR techniques exhibit complementary strengths on different types of bugs, suggesting a promising future for combining them. Additionally, this work highlights the need to further improve the patch compilability of learning-based APR techniques, despite the various existing strategies that attempt to do so. The study also yields further guidelines for enhancing APR techniques, including handling compilation failures caused by unresolvable symbols and reducing duplicate/no-op patch generation. Finally, our study uncovers seven implementation issues in the studied techniques, five of which have been confirmed and fixed by the corresponding authors.
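Two of the patch-analysis metrics above are easy to illustrate in code. The sketch below is a minimal, hypothetical Java approximation, assuming SYE can be modeled as a comment-stripping, whitespace-normalizing string comparison and TCE as compiling both program variants with the same compiler and flags and comparing the resulting bytecode byte-for-byte. The class and method names (PatchEquivalence, syntacticallyEquivalent, trivialCompilerEquivalent) are illustrative, not the paper's actual implementation, which may well normalize at the AST level instead.

```java
// Hypothetical sketch of SYE/TCE-style patch-equivalence checks;
// names and normalization rules are illustrative, not the paper's artifact.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PatchEquivalence {

    // SYE: two patches are syntactically equivalent if their source text
    // matches after stripping comments and collapsing whitespace.
    static boolean syntacticallyEquivalent(String patchA, String patchB) {
        return normalize(patchA).equals(normalize(patchB));
    }

    static String normalize(String src) {
        return src.replaceAll("//[^\n]*", "")        // drop line comments
                  .replaceAll("(?s)/\\*.*?\\*/", "") // drop block comments
                  .replaceAll("\\s+", " ")           // collapse whitespace
                  .trim();
    }

    // TCE: two patches are trivially compiler-equivalent if, compiled with
    // the same javac version and flags, they yield byte-identical .class files.
    static boolean trivialCompilerEquivalent(Path classFileA, Path classFileB)
            throws java.io.IOException {
        return Arrays.equals(Files.readAllBytes(classFileA),
                             Files.readAllBytes(classFileB));
    }

    public static void main(String[] args) {
        // A comment-only "patch" is a no-op: SYE flags it as equivalent
        // to the original, so it can be filtered out cheaply.
        String original = "int mid = (lo + hi) / 2;";
        String patch    = "int mid = (lo + hi) / 2; // candidate fix";
        System.out.println(syntacticallyEquivalent(original, patch)); // true
    }
}
```

Filters like these matter because, as the abstract notes, duplicate and no-op patches inflate patch counts without fixing anything; an SYE/TCE pass lets an evaluation discard such candidates before the expensive test-suite validation step.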
Thu 19 Sep, 10:30 - 11:50 (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
10:30 (20m) Talk: Automating Zero-Shot Patch Porting for Hard Forks (Technical Papers). Shengyi Pan (Zhejiang University), You Wang (Zhejiang University), Zhongxin Liu (Zhejiang University), Xing Hu (Zhejiang University), Xin Xia (Huawei), Shanping Li (Zhejiang University). DOI, Pre-print
10:50 (20m) Talk: Benchmarking Automated Program Repair: An Extensive Study on Both Real-World and Artificial Bugs (Technical Papers). Yicheng Ouyang (University of Illinois at Urbana-Champaign), Jun Yang (University of Illinois at Urbana-Champaign), Lingming Zhang (University of Illinois at Urbana-Champaign). DOI
11:10 (20m) Talk: Neurosymbolic Repair of Test Flakiness (Technical Papers). Yang Chen (University of Illinois at Urbana-Champaign), Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign). DOI
11:30 (20m) Talk: AutoCodeRover: Autonomous Program Improvement (Technical Papers). Yuntong Zhang (National University of Singapore), Haifeng Ruan (National University of Singapore), Zhiyu Fan (National University of Singapore), Abhik Roychoudhury (National University of Singapore). DOI