Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT
Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily on handcrafted or mined bug fixing patterns and cannot easily generalize to other bug/fix types. To address this limitation, recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using the original buggy code and then querying the LLM to either fill-in (cloze-style APR) the correct code at the bug location or to produce a completely new code snippet as the patch. While the LLM-based APR tools are able to achieve state-of-the-art results, they still follow the classic Generate and Validate (G&V) repair paradigm of first generating lots of patches by sampling from the same initial prompt and then validating each one afterwards. This not only leads to many repeated patches that are incorrect, but also misses the crucial and yet previously ignored information in test failures as well as in plausible patches.
To address these aforementioned limitations, we propose ChatRepair, the first fully automated conversation-driven APR approach that interleaves patch generation with instant feedback to perform APR in a conversational style. ChatRepair first feeds the LLM with relevant test failure information to start with, and then learns from both failures and successes of earlier patching attempts of the same bug for more powerful APR. For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch. In this way, we can avoid making the same
mistakes. For earlier patches that passed all the tests (i.e., plausible patches), we further ask the LLM to generate alternative variations of the original plausible patches. In this way, we can further build on and learn from earlier successes to generate more plausible patches to increase the chance of having correct patches. While our approach is general, we implement ChatRepair using state-of-the-art dialogue-based LLM – ChatGPT. Our evaluation on the widely studied Defects4j dataset shows that ChatRepair is able to achieve the new state-of-the-art in repair performance, achieving 114 and 48 correct fixes on Defects4j 1.2 and 2.0 respectively. By calculating the cost
of accessing ChatGPT, we can fix 162 out of 337 bugs for $0.42 each!
Wed 18 SepDisplayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change
15:30 - 17:10 | |||
15:30 20mTalk | Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT Technical Papers Chunqiu Steven Xia University of Illinois at Urbana-Champaign, Lingming Zhang University of Illinois at Urbana-Champaign DOI | ||
15:50 20mTalk | ThinkRepair: Self-Directed Automated Program Repair Technical Papers Xin Yin Zhejiang University, Chao Ni Zhejiang University, Shaohua Wang Central University of Finance and Economics, Zhenhao Li York University, Limin Zeng Zhejiang University, Xiaohu Yang Zhejiang University DOI | ||
16:10 20mTalk | BRAFAR: Bidirectional Refactoring, Alignment, Fault Localization, and Repair for Programming Assignments Technical Papers Linna Xie Nanjing University, Chongmin Li Nanjing University, Yu Pei Hong Kong Polytechnic University, Tian Zhang Nanjing University, Minxue Pan Nanjing University DOI | ||
16:30 20mTalk | CREF: An LLM-Based Conversational Software Repair Framework for Programming Tutors Technical Papers Boyang Yang Yanshan University; Beijing JudaoYouda Network Technology, Haoye Tian University of Melbourne, Weiguo Pian University of Luxembourg, Haoran Yu Beijing JudaoYouda Network Technology, Haitao Wang Beijing JudaoYouda Network Technology, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg, Shunfu Jin Yanshan University DOI | ||
16:50 20mTalk | One Size Does Not Fit All: Multi-granularity Patch Generation for Better Automated Program RepairACM SIGSOFT Distinguished Paper Award Technical Papers Bo Lin National University of Defense Technology, Shangwen Wang National University of Defense Technology, Ming Wen Huazhong University of Science and Technology, Liqian Chen National University of Defense Technology, Xiaoguang Mao National University of Defense Technology DOI Pre-print |