Guardian: A Runtime Framework for LLM-Based UI Exploration
Feature-based UI tests have been indispensable for ensuring the quality of mobile applications (\textit{apps} for short).
The high manual labor cost of creating such tests has led to strong interest in \textit{automated feature-based UI testing}, where, given only a high-level \emph{test objective description}, an approach automatically explores the app under test (AUT) to find a correct sequence of UI events that achieves the target test objective.
Given that automated feature-based UI testing resembles conventional AI planning problems, large language models (LLMs), known for their effectiveness in AI planning, could be ideal for this task.
However, our study reveals that LLMs struggle to follow specific instructions for UI testing and to re-plan based on newly acquired information. This limitation reduces the effectiveness of LLM-driven solutions for automated feature-based UI testing, despite the use of advanced prompting techniques.
To address the preceding limitation, we propose Guardian, a runtime system framework that improves the effectiveness of automated feature-based UI testing by offloading computational tasks from LLMs via two major strategies.
First, Guardian refines the UI action space that the LLM can plan over, enforcing the LLM's instruction following by construction.
Second, Guardian deliberately checks whether the gradually enriched information invalidates the LLM's previous planning.
Guardian removes the invalidated UI actions from the UI action space that the LLM can plan over, restores the AUT to its state before the execution of the invalidated UI actions, and prompts the LLM to re-plan over the new UI action space.
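The runtime loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the class and function names (\textit{ToyAUT}, \textit{guardian\_loop}, \textit{refine\_action\_space}) and the toy app model are all assumptions made for illustration. The sketch shows the core idea: the LLM planner only ever sees a refined (validated) action space, and when an executed action is invalidated by new information, the framework rolls the AUT back to its prior state and re-plans.

```python
class ToyAUT:
    """Hypothetical stand-in for an app under test: screens are ints,
    actions are (src, dst) transitions; any transition into screen 9
    turns out to be a dead end discovered only at runtime."""
    def __init__(self, goal=2):
        self.screen, self.goal = 0, goal
    def state(self): return self.screen
    def snapshot(self): return self.screen          # capture restorable state
    def restore(self, snap): self.screen = snap     # roll back the AUT
    def objective_reached(self): return self.screen == self.goal
    def available_actions(self):
        return [(self.screen, 9), (self.screen, self.screen + 1)]
    def execute(self, action): self.screen = action[1]
    def invalidates(self, action):
        return action[1] == 9                       # new info: dead-end screen

def refine_action_space(actions, invalidated):
    """Remove invalidated actions so the LLM cannot plan over them."""
    return [a for a in actions if a not in invalidated]

def guardian_loop(aut, llm_plan, max_steps=20):
    invalidated, trace = set(), []
    for _ in range(max_steps):
        if aut.objective_reached():
            break
        actions = refine_action_space(aut.available_actions(), invalidated)
        if not actions:
            break                                   # nothing valid left to try
        action = llm_plan(aut.state(), actions)     # planner sees refined space
        snap = aut.snapshot()
        aut.execute(action)
        if aut.invalidates(action):
            invalidated.add(action)                 # shrink the action space
            aut.restore(snap)                       # restore pre-action state
        else:
            trace.append(action)                    # keep progress, re-plan next
    return trace

# A naive stand-in planner that always picks the first available action;
# in Guardian this role is played by the LLM.
aut = ToyAUT()
trace = guardian_loop(aut, lambda state, acts: acts[0])
```

Even with a planner that repeatedly picks dead-end actions, the loop converges: each invalidated action is pruned from the space and its effect undone, so the eventual trace contains only valid steps toward the objective.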
We instantiate Guardian with ChatGPT and construct a benchmark named \textit{FestiVal} with 58 tasks from 23 highly popular apps.
Evaluation results on FestiVal show that Guardian achieves a 48.3% success rate and a 64.0% average completion proportion, outperforming state-of-the-art approaches with 154% and 132% relative improvements on the two metrics, respectively.
Wed 18 Sep (time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
10:30 - 11:50

10:30, 20m Talk | Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions (Technical Papers). Antu Saha (William & Mary), Yang Song (William & Mary), Junayed Mahmud (University of Central Florida), Ying Zhou (George Mason University), Kevin Moran (University of Central Florida), Oscar Chaparro (William & Mary)
10:50, 20m Talk | Reproducing Timing-Dependent GUI Flaky Tests in Android Apps via a Single Event Delay (Technical Papers). Xiaobao Cai (Fudan University), Zhen Dong (Fudan University), Yongjiang Wang (Fudan University), Abhishek Tiwari (University of Passau), Xin Peng (Fudan University)
11:10, 20m Talk | Semantic Constraint Inference for Web Form Test Generation (Technical Papers). Parsa Alian (University of British Columbia), Noor Nashid (University of British Columbia), Mobina Shahbandeh (University of British Columbia), Ali Mesbah (University of British Columbia)
11:30, 20m Talk | Guardian: A Runtime Framework for LLM-Based UI Exploration (Technical Papers). Dezhi Ran (Peking University), Hao Wang (Peking University), Zihe Song (University of Texas at Dallas), Mengzhou Wu (Peking University), Yuan Cao (Peking University), Ying Zhang (Peking University), Wei Yang (University of Texas at Dallas), Tao Xie (Peking University)