ISSTA 2024
Mon 16 - Fri 20 September 2024 Vienna, Austria
co-located with ISSTA/ECOOP 2024

This program is tentative and subject to change.

Wed 18 Sep 2024 11:30 - 11:50 at EI 3 Sahulka - UI-Level Testing

Feature-based UI tests have been indispensable for ensuring the high quality of mobile applications (apps for short). The high manual labor cost of creating such tests has led to strong interest in automated feature-based UI testing, where an approach, given only a high-level test objective description, automatically explores the app under test (AUT) to find correct sequences of UI events that achieve the target test objective. Because automated feature-based UI testing resembles conventional AI planning problems, large language models (LLMs), known for their effectiveness in AI planning, could be ideal for this task. However, our study reveals that LLMs struggle with following specific instructions for UI testing and with replanning based on new information. This limitation reduces the effectiveness of LLM-driven solutions for automated feature-based UI testing, despite the use of advanced prompting techniques. To address this limitation, we propose Guardian, a runtime framework that improves the effectiveness of LLM-based automated feature-based UI testing through two major strategies. First, Guardian refines the UI action space that the LLM can plan over, enforcing the LLM's instruction following by construction. Second, Guardian deliberately checks whether the gradually enriched information invalidates the LLM's previous planning; if so, Guardian removes the invalidated UI actions from the UI action space, restores the AUT to the state before the invalidated UI actions were executed, and prompts the LLM to re-plan over the new UI action space. We instantiate Guardian with ChatGPT and construct a benchmark named FestiVal with 58 tasks from 23 highly popular apps. Evaluation results on FestiVal show that Guardian achieves a 48.3% success rate and a 64.0% average completion proportion, outperforming state-of-the-art approaches with 154% and 132% relative improvement on the two metrics, respectively.
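The two strategies in the abstract amount to constraining the LLM to plan over an enumerated, currently valid action space and rolling back when later observations invalidate an earlier step. The Python sketch below illustrates one way such an exploration loop could look; it is not the authors' implementation, and the aut/llm interfaces and every method name in it are hypothetical placeholders.

# Illustrative sketch of a Guardian-style exploration/replanning loop.
# The aut (app under test) and llm objects and all their methods are
# hypothetical placeholders, not the paper's actual API.

from dataclasses import dataclass, field


@dataclass
class ExplorationState:
    """Bookkeeping for one exploration episode."""
    objective: str                                  # high-level test objective description
    history: list = field(default_factory=list)    # (ui_state, action) pairs executed so far
    banned: set = field(default_factory=set)       # actions invalidated by later information


def explore(aut, llm, objective, max_steps=20):
    state = ExplorationState(objective)
    for _ in range(max_steps):
        ui_state = aut.capture_ui()                 # widgets currently on screen
        # Strategy 1: let the LLM choose only from a concrete, refined action
        # space, so instruction following holds by construction.
        action_space = [a for a in ui_state.actions() if a not in state.banned]

        choice = llm.choose(objective=objective,
                            history=state.history,
                            actions=action_space)   # LLM returns an index into action_space
        action = action_space[choice]
        feedback = aut.execute(action)

        if feedback.objective_reached:
            return True, state.history + [(ui_state, action)]

        # Strategy 2: if new information invalidates an earlier step, ban that
        # action, restore the AUT to the state before it ran, and re-plan.
        if feedback.invalidates_previous_plan and state.history:
            bad_state, bad_action = state.history.pop()
            state.banned.add(bad_action)
            aut.restore(bad_state)
        else:
            state.history.append((ui_state, action))

    return False, state.history

In this sketch, banning the invalidated action and restoring the earlier app state before querying the LLM again is what prevents the planner from repeating a dead end, mirroring the replanning strategy described in the abstract.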

Wed 18 Sep

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 11:50
UI-Level Testing
Technical Papers at EI 3 Sahulka
10:30
20m
Talk
Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions
Technical Papers
Antu Saha William & Mary, Yang Song College of William and Mary, Junayed Mahmud University of Central Florida, Ying Zhou George Mason University, USA, Kevin Moran University of Central Florida, Oscar Chaparro William & Mary
10:50
20m
Talk
Reproducing Timing-dependent GUI Flaky Tests in Android Apps via A Single Event Delay
Technical Papers
Xiaobao Cai Fudan University, Zhen Dong Fudan University, China, Yongjiang Wang Fudan University, Abhishek Tiwari Software Institute - USI, Lugano, Switzerland, Xin Peng Fudan University
11:10
20m
Talk
Semantic Constraint Inference for Web Form Test Generation
Technical Papers
Parsa Alian University of British Columbia, Noor Nashid University of British Columbia, Mobina Shahbandeh University of British Columbia, Ali Mesbah The University of British Columbia
11:30
20m
Talk
Guardian: A Runtime Framework for LLM-based UI Exploration
Technical Papers
Dezhi Ran Peking University, Hao Wang Peking University, China, Zihe Song University of Texas at Dallas, Mengzhou Wu Peking University, Yuan Cao Peking University, Ying Zhang Peking University, Wei Yang University of Texas at Dallas, Tao Xie Peking University