ISSTA 2024
Mon 16 - Fri 20 September 2024 Vienna, Austria
co-located with ISSTA/ECOOP 2024

Data-driven techniques are promising for automatically locating and fixing bugs, which can reduce enormous time and effort for developers. However, the effectiveness of these techniques heavily
relies on the quality and scale of bug datasets. Despite that emerging approaches to automatic bug dataset construction partially provide a solution for scalability, data quality remains a concern. Specifically, it remains a barrier for humans to isolate the minimal set of bug-inducing or bug-fixing changes, known as critical changes. Although delta debugging (DD) techniques are capable of extracting critical changes on benchmark datasets in academia, the efficiency and accuracy are still limited when dealing with real-world bugs, where code change dependencies could be overly complicated.
In this paper, we propose C2D2, a novel delta debugging approach for critical change extraction, which estimates the probabilities of dependencies between code change elements. C2D2 considers the probabilities of dependencies and introduces a matrix-based search mechanism to resolve compilation errors (CE) caused by missing dependencies. It also provides hybrid mechanisms for flexibly selecting code change elements during the DD process. Experiments on Defect4J and a real-world regression bug dataset reveal that C2D2 is significantly more efficient than the traditional DD algorithm ddmin with competitive effectiveness, and significantly more effective and more efficient than the state-of-the-art DD algorithm ProbDD. Furthermore, compared to human-isolated critical changes, C2D2 produces the same or better critical change results in 56% cases in Defects4J and 86% cases in the regression dataset, demonstrating its usefulness in automatically extracting
critical changes and saving human efforts in constructing large-scale bug datasets with real-world bugs.