TeDA: A Testing Framework for Data Usage Auditing in Deep Learning Model Development
It is notoriously challenging to audit potential unauthorized data usage in the deep learning (DL) model development lifecycle, i.e., to \emph{judge whether certain private user data has been used to train or fine-tune a DL model without authorization}. Yet, such data usage auditing is crucial for meeting the urgent requirements of trustworthy Artificial Intelligence (AI), such as data transparency, which are promoted and enforced by recent AI regulations such as the General Data Protection Regulation (GDPR) and the EU AI Act.
In this work, we propose TeDA, a simple and flexible \emph{te}sting framework for auditing \emph{da}ta usage in the DL model development process.
Given a set of a user's private data to protect ($D_p$), the intuition of TeDA is to apply \emph{membership inference} (with benign intent) to judge whether the model under audit ($M_a$) is likely to have been trained on $D_p$. Notably, to make such usage more exposed under membership inference, TeDA applies imperceptible perturbations directed by boundary search to generate a carefully crafted test suite $D_t$ (which we call the `isotope') from $D_p$.
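As a rough illustration only (not TeDA's exact algorithm, which the abstract does not spell out), the sketch below crafts isotope samples by nudging each protected input toward the model's decision boundary under a small L-infinity budget. The function name `craft_isotope`, the margin-based boundary search, the PyTorch classifier `model`, and the `eps`/`steps` parameters are all illustrative assumptions.

```python
# Hypothetical sketch of isotope generation: perturb protected samples (D_p)
# toward the decision boundary of a classifier while keeping the change
# imperceptible (L-infinity budget `eps`). Assumes inputs are in [0, 1].
import torch
import torch.nn.functional as F

def craft_isotope(model, x, y, eps=4/255, steps=10, step_size=1/255):
    """Return perturbed copies of x (a batch from D_p) that sit closer to the
    decision boundary of `model`. All names and defaults are illustrative."""
    model.eval()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Margin between the true-class logit and the strongest competing logit.
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        mask = F.one_hot(y, logits.size(1)).bool()
        best_other = logits.masked_fill(mask, float("-inf")).max(dim=1).values
        margin = true_logit - best_other
        grad = torch.autograd.grad(margin.sum(), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - step_size * grad.sign()           # shrink the margin
            x_adv = x + (x_adv - x).clamp(-eps, eps)          # stay imperceptible
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```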
With this test suite, TeDA then combines membership inference with hypothesis testing to decide, with a statistical guarantee, whether the user's private data has been used to train $M_a$.
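The concrete membership score and statistical test used by TeDA are not given here; the sketch below shows one plausible instantiation, comparing the audited model's confidence on the isotope suite against a held-out calibration set with a one-sided Welch t-test at significance level `alpha`. The function `audit` and the synthetic scores are purely illustrative.

```python
# Hypothetical sketch of the auditing decision: reject the null hypothesis
# "D_p was NOT used to train M_a" if isotope samples receive significantly
# higher confidence than comparable never-used samples.
import numpy as np
from scipy import stats

def audit(member_scores, nonmember_scores, alpha=0.01):
    """member_scores   : M_a's confidence on the isotope suite D_t
       nonmember_scores: M_a's confidence on held-out, never-used samples
    Returns (decision, p_value); decision is True if usage is inferred."""
    t_stat, p_two_sided = stats.ttest_ind(member_scores, nonmember_scores,
                                          equal_var=False)
    # One-sided p-value: members are expected to score higher.
    p_value = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    return p_value < alpha, p_value

# Usage with synthetic scores (illustrative only):
rng = np.random.default_rng(0)
members = rng.normal(0.92, 0.05, size=200)
nonmembers = rng.normal(0.75, 0.10, size=200)
print(audit(members, nonmembers))
```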
We evaluated TeDA through extensive experiments across a range of data volumes and model architectures on data-sensitive face recognition and medical diagnosis tasks. TeDA demonstrates high feasibility, effectiveness, and robustness under various adaptive strategies (e.g., pruning and distillation).
Session: Thu 19 Sep, 10:30 - 11:50 (Technical Papers)

10:30 (20m, Talk): Distance-Aware Test Input Selection for Deep Neural Networks. Zhong Li, Zhengfeng Xu, Ruihua Ji, Minxue Pan, Tian Zhang, Linzhang Wang, Xuandong Li (Nanjing University)
10:50 (20m, Talk): Test Selection for Deep Neural Networks using Meta-Models with Uncertainty Metrics. Demet Demir, Aysu Betin Can, Elif Surer (Middle East Technical University)
11:10 (20m, Talk): Datactive: Data Fault Localization for Object Detection Systems. Yining Yin, Yang Feng, Shihao Weng, Yuan Yao, Jia Liu, Zhihong Zhao (Nanjing University)
11:30 (20m, Talk): TeDA: A Testing Framework for Data Usage Auditing in Deep Learning Model Development. Xiangshan Gao (Zhejiang University; Huawei Technology), Jialuo Chen (Zhejiang University), Jingyi Wang (Zhejiang University), Jie Shi (Huawei International), Peng Cheng (Zhejiang University), Jiming Chen (Zhejiang University; Hangzhou Dianzi University)