Flaky tests hinder development by behaving unpredictably during regression testing. A flaky test may pass in some runs and fail in others on the same code version. This non-deterministic outcome frequently misleads developers into debugging non-existent faults in the code. To debug flaky tests effectively, developers need to reproduce their failures. The de facto industry practice for reproducing a flaky test is to rerun it many times. However, rerunning a flaky test many times is time- and resource-consuming.

This work presents a technique for rapidly and reliably reproducing timing-dependent GUI flaky tests, acknowledged as the most common type of flaky test in Android apps. Our insight is that flakiness in such tests often stems from event races on GUI data. Given the stack traces of a failure, our technique employs dynamic analysis to infer the event races likely to have caused the failure and reproduces it by selectively delaying only the relevant events involved in these races. Thus, our technique can reproduce a failure in very few test runs. Experiments on 80 timing-dependent flaky tests collected from 22 widely used Android apps show that our technique reproduces flaky test failures efficiently. Of the 80 flaky tests, it successfully reproduced 73 within an average of 1.71 test runs. Notably, it was highly reliable, consistently reproducing the failure across 20 runs.
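The idea of delaying only the relevant events can be illustrated with a toy simulation. The sketch below is not the authors' tool: the virtual-time event queue, the event names (load_data, click_first, log_analytics), the access records, and the 200 ms delay are all hypothetical and chosen only to show how delaying the single event that races with the failing one can turn a usually passing run into a deterministic failure.

```python
import heapq


def run_test(latencies, delays=None):
    """Simulate one test run on a toy event queue with virtual time.

    latencies: event name -> time (ms) at which the event would normally fire.
    delays:    event name -> extra delay (ms) injected for this run.
    Returns True if the simulated test passes, False if it fails.
    """
    delays = delays or {}
    queue = [(t + delays.get(name, 0), name) for name, t in latencies.items()]
    heapq.heapify(queue)
    gui = {"list_items": []}  # shared GUI data that two events race on
    passed = None
    while queue:
        _, name = heapq.heappop(queue)
        if name == "load_data":        # background callback writes GUI data
            gui["list_items"] = ["item-0"]
        elif name == "click_first":    # test action reads GUI data
            passed = len(gui["list_items"]) > 0
        # any other event (e.g. log_analytics) does not touch list_items
    return passed


def relevant_events(failing_event, accesses):
    """Return events that write a field the failing event reads.

    accesses: (event, "read" | "write", field) records, standing in for
    what a dynamic analysis might observe (a simplifying assumption).
    """
    reads = {f for e, kind, f in accesses
             if e == failing_event and kind == "read"}
    return sorted({e for e, kind, f in accesses
                   if e != failing_event and kind == "write" and f in reads})


# Hypothetical timings: the data usually arrives before the click.
latencies = {"log_analytics": 10, "load_data": 40, "click_first": 100}
accesses = [
    ("load_data", "write", "list_items"),
    ("click_first", "read", "list_items"),
    ("log_analytics", "write", "analytics_buffer"),
]

passed_normally = run_test(latencies)
relevant = relevant_events("click_first", accesses)
# Delay only the racing event; unrelated events keep their timing.
passed_with_delay = run_test(latencies, {e: 200 for e in relevant})
```

In this toy setting the undelayed run passes because load_data fires before click_first, while delaying only load_data reverses the order and triggers the failure. The unrelated log_analytics event is never delayed, which mirrors the abstract's point about selecting only the events involved in the inferred race.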

Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions
Antu Saha (William & Mary), Yang Song (William & Mary), Junayed Mahmud (University of Central Florida), Ying Zhou (George Mason University), Kevin Moran (University of Central Florida), Oscar Chaparro (William & Mary)
Reproducing Timing-Dependent GUI Flaky Tests in Android Apps via a Single Event Delay
Xiaobao Cai (Fudan University), Zhen Dong (Fudan University), Yongjiang Wang (Fudan University), Abhishek Tiwari (University of Passau), Xin Peng (Fudan University)
Semantic Constraint Inference for Web Form Test Generation
Parsa Alian (University of British Columbia), Noor Nashid (University of British Columbia), Mobina Shahbandeh (University of British Columbia), Ali Mesbah (University of British Columbia)
Guardian: A Runtime Framework for LLM-Based UI Exploration
Dezhi Ran (Peking University), Hao Wang (Peking University), Zihe Song (University of Texas at Dallas), Mengzhou Wu (Peking University), Yuan Cao (Peking University), Ying Zhang (Peking University), Wei Yang (University of Texas at Dallas), Tao Xie (Peking University)

