Oracle-Guided Program Selection from Large Language Models (ISSTA 2024 - Technical Papers)

Who

Zhiyu Fan, Haifeng Ruan, Sergey Mechtaev, Abhik Roychoudhury

Track

ISSTA 2024 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 20 Sep 2024 14:10 - 14:30 at EI 7 - LLMs for Code Chair(s): Jacques Klein

Abstract

While large language models (LLMs) have shown significant advancements in code generation, their susceptibility to producing incorrect code poses a significant challenge to the adoption of LLM-generated programs. This issue largely stems from the reliance on natural language descriptions as informal oracles in code generation. Current strategies to mitigate this involve selecting the best program from multiple LLM-generated alternatives, judged by criteria such as the consistency of their execution results on an LLM-generated test suite. However, this approach has crucial limitations: (1) LLMs often generate redundant tests or tests that cannot distinguish between correct and incorrect solutions, and (2) the consistency criteria used, such as majority vote, fail to foster developer trust because they offer no transparent rationale for the choices made. In this work, we propose a new perspective on increasing the quality of LLM-generated code via program selection using the LLM as a test oracle. Our method is based on our experimentally confirmed observation that LLMs serve more effectively as oracles when tasked with selecting the correct output from multiple choices. Leveraging this insight, we first generate distinguishing inputs that capture semantic discrepancies among programs sampled from an LLM, and record the outputs produced by the programs on these inputs. An LLM then selects the output most likely to be correct from these, guided by the natural language problem description. We implemented this idea in a tool, LLMCodeChoice, and evaluated its accuracy in generating and selecting standalone programs. Our experiments demonstrated its effectiveness in improving pass@1 by 3.6-7% on the HumanEval and MBPP benchmarks compared to the state-of-the-art CodeT. Most interestingly, the selected input-output specifications helped us to uncover incompleteness and ambiguities in task descriptions and also identify incorrect ground-truth implementations in the benchmarks.
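
The selection loop described in the abstract can be illustrated with a minimal Python sketch. This is not the LLMCodeChoice implementation: the names `select_program`, `run`, and `toy_oracle` are hypothetical, the candidate inputs are supplied by hand rather than generated, and the oracle is a hard-coded stand-in for the LLM call. The sketch also assumes that candidates disagreeing with the chosen output are discarded step by step, which is one plausible reading of the abstract rather than a confirmed detail.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Sequence

# Hypothetical types for this sketch: a candidate maps one input to a
# hashable output; an oracle picks one output given the task description.
Program = Callable[[Any], Hashable]
Oracle = Callable[[str, Any, Sequence[Hashable]], Hashable]


def run(program: Program, x: Any) -> Hashable:
    """Run a candidate, mapping any exception to a printable marker."""
    try:
        return program(x)
    except Exception as exc:
        return f"<error: {type(exc).__name__}>"


def select_program(task: str, candidates: Sequence[Program],
                   inputs: Iterable[Any], oracle: Oracle) -> Program:
    """Narrow the candidates using inputs on which their outputs differ."""
    if not candidates:
        raise ValueError("need at least one candidate program")
    alive = list(candidates)
    for x in inputs:
        # Group the surviving candidates by the output they produce on x.
        groups: dict[Hashable, list[Program]] = defaultdict(list)
        for program in alive:
            groups[run(program, x)].append(program)
        if len(groups) < 2:
            continue  # all candidates agree, so x is not distinguishing
        # Ask the oracle a multiple-choice question over distinct outputs.
        chosen = oracle(task, x, list(groups))
        if chosen in groups:
            alive = groups[chosen]
        if len(alive) == 1:
            break
    return alive[0]


def toy_oracle(task: str, x: Any, outputs: Sequence[Hashable]) -> Hashable:
    """Stand-in for the LLM: hard-codes the intended behaviour (abs)."""
    return abs(x)


task = "Return the absolute value of x."
candidates = [
    lambda x: x,                    # wrong for negative inputs
    lambda x: -x,                   # wrong for positive inputs
    lambda x: x if x >= 0 else -x,  # correct
]
best = select_program(task, candidates, inputs=[3, -2, 0], oracle=toy_oracle)
```

Here the input `3` separates the second candidate from the other two, and `-2` separates the first from the third, so only the correct candidate survives. Each oracle decision is a concrete input-output pair, which is the kind of rationale a majority vote does not expose.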

DOI

https://doi.org/10.1145/3650212.3680308

Zhiyu Fan

National University of Singapore

Singapore

Haifeng Ruan

National University of Singapore

Singapore

Sergey Mechtaev

Peking University

China

Abhik Roychoudhury

National University of Singapore

Singapore

Session Program

Fri 20 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:30 - 14:50	LLMs for Code (Technical Papers) at EI 7, Chair(s): Jacques Klein (University of Luxembourg)

13:30 20m Talk		Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code. Yujia Chen (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Zezhou Yang (Harbin Institute of Technology), Hongyu Zhang (Chongqing University), Qing Liao (Harbin Institute of Technology)
13:50 20m Talk		CoSec: On-the-Fly Security Hardening of Code LLMs via Supervised Co-decoding. Dong Li (Chongqing University), Meng Yan (Chongqing University), Yaosheng Zhang (Chongqing University), Zhongxin Liu (Zhejiang University), Chao Liu (Chongqing University), Xiaohong Zhang (Chongqing University), Ting Chen (University of Electronic Science and Technology of China), David Lo (Singapore Management University)
14:10 20m Talk		Oracle-Guided Program Selection from Large Language Models. Zhiyu Fan (National University of Singapore), Haifeng Ruan (National University of Singapore), Sergey Mechtaev (Peking University), Abhik Roychoudhury (National University of Singapore)
14:30 20m Talk		How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation. Cen Zhang (Nanyang Technological University), Yaowen Zheng (Nanyang Technological University), Mingqiang Bai (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Yeting Li (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Wei Ma (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Yuekang Li (UNSW), Limin Sun (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Yang Liu (Nanyang Technological University)

Information for Participants

Fri 20 Sep 2024 13:30 - 14:50 at EI 7 - LLMs for Code Chair(s): Jacques Klein

Info for room EI 7:

Map: https://tuw-maps.tuwien.ac.at/?q=CDEG13

Room tech: https://raumkatalog.tiss.tuwien.ac.at/room/15417