A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? (ISSTA 2024 - Technical Papers)

Who

Zhihan Jiang, Jinyang Liu, Junjie Huang, Yichen LI, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Jieming Zhu, Michael Lyu

Track

ISSTA 2024 Technical Papers


When

Thu 19 Sep 2024 10:30 - 10:50 at EI 10 Fritz Paschke - Logging and Field Bugs Chair(s): Willem Visser

Abstract

Log data have facilitated various tasks of software development and maintenance, such as testing, debugging and diagnosing. Due to the unstructured nature of logs, log parsing is typically required to transform log messages into structured data for automated log analysis. Given the abundance of log parsers that employ various techniques, evaluating these tools to comprehend their characteristics and performance becomes imperative. Loghub serves as a commonly used dataset for benchmarking log parsers, but it suffers from limited scale and representativeness, posing significant challenges for studies to comprehensively evaluate existing log parsers or develop new methods. This limitation is particularly pronounced when assessing these log parsers for production use. To address these limitations, we provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. Loghub-2.0 comprises 14 datasets with an average of 3.6 million log lines in each dataset. Based on Loghub-2.0, we conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions. We are also the first to investigate the granular performance of log parsers on logs that represent rare system events, offering in-depth details for software diagnosis. Accurately parsing such logs is essential, yet it remains a challenge. We believe this work could shed light on the evaluation and design of log parsers in practical settings, thereby facilitating their deployment in production systems.
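To make the abstract's two technical ideas concrete, here is a minimal sketch in Python. It is not the Loghub-2.0 tooling, nor any of the 15 evaluated parsers, nor the metric the paper introduces (the abstract does not define it). The masking rules, the function names `parse`, `message_level_accuracy`, and `template_level_f1`, and the sample log lines are all hypothetical. The sketch shows (1) log parsing as turning raw messages into templates with `<*>` placeholders, and (2) why a score counted per message can hide errors on rare events while a score counted per template exposes them.

```python
import re
from collections import defaultdict

# Hypothetical masking rules; order matters (IP addresses before plain integers).
MASKS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b\d+\b"),                      # standalone integers
]

def parse(message: str) -> str:
    """Toy parser: replace variable-looking tokens with the placeholder <*>."""
    for pattern in MASKS:
        message = pattern.sub("<*>", message)
    return message

def groups(labels):
    """Return the set of message-index groups induced by the template labels."""
    buckets = defaultdict(set)
    for index, label in enumerate(labels):
        buckets[label].add(index)
    return {frozenset(members) for members in buckets.values()}

def message_level_accuracy(predicted, truth):
    """Fraction of messages whose predicted group exactly matches the true group."""
    correct = groups(predicted) & groups(truth)
    return sum(len(group) for group in correct) / len(truth)

def template_level_f1(predicted, truth):
    """F1 over groups, so each template counts once regardless of its frequency."""
    predicted_groups, true_groups = groups(predicted), groups(truth)
    correct = len(predicted_groups & true_groups)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_groups)
    recall = correct / len(true_groups)
    return 2 * precision * recall / (precision + recall)

# Imbalanced sample: 98 frequent messages and 2 messages from a rare event.
logs = [f"Connection from 10.0.0.{i} closed" for i in range(98)]
logs += ["Disk sda1 failure code 5", "Disk sdb2 failure code 7"]
truth = ["Connection from <*> closed"] * 98 + ["Disk <*> failure code <*>"] * 2

# The toy parser leaves "sda1" and "sdb2" unmasked, so it splits the rare event
# into two templates while handling the frequent event correctly.
predicted = [parse(line) for line in logs]

print(f"message-level accuracy: {message_level_accuracy(predicted, truth):.2f}")
print(f"template-level F1:      {template_level_f1(predicted, truth):.2f}")
```

On this sample the per-message score stays high because the frequent template dominates, while the per-template score drops sharply because one of the two true templates is wrong. This is the kind of imbalance sensitivity the abstract refers to, shown here only as an illustration.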

Link to Preprint

https://arxiv.org/abs/2308.10828

DOI

https://doi.org/10.1145/3650212.3652123

Zhihan Jiang

Chinese University of Hong Kong

China

Jinyang Liu

Chinese University of Hong Kong

China

Junjie Huang

Chinese University of Hong Kong

China

Yichen LI

Chinese University of Hong Kong

China

Yintong Huo

Chinese University of Hong Kong

China

Jiazhen Gu

Chinese University of Hong Kong

China

Zhuangbin Chen

Sun Yat-sen University

China

Jieming Zhu

Huawei Noah’s Ark Lab

China

Michael Lyu

Chinese University of Hong Kong

Hong Kong


Session Program

Thu 19 Sep
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 11:50	Logging and Field Bugs (Technical Papers) at EI 10 Fritz Paschke. Chair(s): Willem Visser (Amazon Web Services)

10:30 20m Research paper	A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? (Technical Papers). Zhihan Jiang (Chinese University of Hong Kong), Jinyang Liu (Chinese University of Hong Kong), Junjie Huang (Chinese University of Hong Kong), Yichen LI (Chinese University of Hong Kong), Yintong Huo (Chinese University of Hong Kong), Jiazhen Gu (Chinese University of Hong Kong), Zhuangbin Chen (Sun Yat-sen University), Jieming Zhu (Huawei Noah’s Ark Lab), Michael Lyu (Chinese University of Hong Kong)
10:50 20m Talk	FastLog: An End-to-End Method to Efficiently Generate and Insert Logging Statements (Technical Papers). Xiaoyuan Xie (Wuhan University), Zhipeng Cai (Wuhan University), Songqiang Chen (The Hong Kong University of Science and Technology), Jifeng Xuan (Wuhan University)
11:10 20m Talk	Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs (Technical Papers). Shiwen Shan (Sun Yat-sen University), Yintong Huo (Chinese University of Hong Kong), Yuxin Su (Sun Yat-sen University), Yichen LI (Chinese University of Hong Kong), Dan Li (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)
11:30 20m Talk	Foliage: Nourishing Evolving Software by Characterizing and Clustering Field Bugs (Technical Papers). Zhanyao Lei (Shanghai Jiao Tong University), Yixiong Chen (Shanghai Jiao Tong University), Mingyuan Xia (AppetizerIO), Zhengwei Qi (Shanghai Jiao Tong University)

Information for Participants

Thu 19 Sep 2024 10:30 - 11:50 at EI 10 Fritz Paschke - Logging and Field Bugs Chair(s): Willem Visser

Info for room EI 10 Fritz Paschke:

Map: https://tuw-maps.tuwien.ac.at/?q=CAEG31

Room tech: https://raumkatalog.tiss.tuwien.ac.at/room/13948