CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors
This program is tentative and subject to change.
Program repair techniques can reduce the cost of debugging in both software development and programming education. Given the demonstrated effectiveness of large language models (LLMs) on code-related tasks, researchers have explored their potential for program repair. However, existing repair benchmarks may have been included in LLM training data, which risks data leakage. To evaluate LLMs’ realistic repair capabilities, ① we introduce an extensive, non-crawled benchmark, TutorCode, comprising 1,239 defective C++ programs and associated information such as tutor guidance, solution descriptions, failing test cases, and the corrected code. We assess the repair performance of 12 LLMs on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR). ② We then investigate which types of extra information help LLMs repair defects; of these, tutor guidance proves the most effective. To fully harness LLMs’ conversational capabilities and the benefits of augmented information, ③ we introduce CREF, a novel conversational semi-automatic repair framework that assists human programming tutors. CREF improves AVG-5 by 17.2% to 24.6% over the baseline and reaches an AVG-5 of 76.6% with GPT-4. These results highlight the potential for enhancing LLMs’ repair capabilities through interactions with tutors and through historical conversations that include incorrect responses. The successful application of CREF in a real-world educational setting demonstrates its effectiveness in reducing tutors’ workload and improving students’ learning experience, and suggests promise for other software engineering tasks, such as code review.
Wed 18 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
15:30 - 17:10
15:30 | 20m | Talk | Automated Program Repair via Conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT | Technical Papers | Chunqiu Steven Xia (University of Illinois at Urbana-Champaign), Lingming Zhang (University of Illinois Urbana-Champaign)
15:50 | 20m | Talk | ThinkRepair: Self-Directed Automated Program Repair | Technical Papers | Xin Yin (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Chao Ni (School of Software Technology, Zhejiang University), Shaohua Wang (Central University of Finance and Economics), Zhenhao Li (York University), Limin Zeng (School of Software Technology, Zhejiang University), Xiaohu Yang (Zhejiang University)
16:10 | 20m | Talk | BRAFAR: Bidirectional Refactoring, Alignment, Fault Localization, and Repair for Programming Assignments | Technical Papers | Linna Xie (Nanjing University), Chongmin Li (Nanjing University), Yu Pei (The Hong Kong Polytechnic University), Tian Zhang (Nanjing University), Minxue Pan (Nanjing University)
16:30 | 20m | Talk | CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors | Technical Papers | Boyang Yang (Yanshan University & Jisuanke Co. Ltd.), Haoye Tian (University of Melbourne), Weiguo PIAN (University of Luxembourg), Haoran Yu (Jisuanke Co. Ltd.), Haitao Wang (Jisuanke Co. Ltd.), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg), Shunfu Jin (Yanshan University)
16:50 | 20m | Talk | One Size Does Not Fit All: Multi-Granularity Patch Generation for Better Automated Program Repair | Technical Papers | Bo Lin (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Ming Wen (Huazhong University of Science and Technology), Liqian Chen (National University of Defense Technology, China), Xiaoguang Mao (National University of Defense Technology)