TeDA: A Testing Framework for Data Usage Auditing in Deep Learning Model Development
It is notoriously challenging to audit potential unauthorized data usage in the deep learning (DL) model development lifecycle, i.e., to \emph{judge whether certain private user data has been used to train or fine-tune a DL model without authorization}. Yet, such data usage auditing is crucial for meeting the urgent requirements of trustworthy Artificial Intelligence (AI), such as data transparency, which are promoted and enforced by recent AI regulations such as the General Data Protection Regulation (GDPR) and the EU AI Act.
In this work, we propose TeDA, a simple and flexible \emph{te}sting framework for auditing \emph{da}ta usage in the DL model development process.
Given a set of a user's private data to protect ($D_p$), the intuition of TeDA is to apply \emph{membership inference} (with benign intent) to judge whether the model under audit ($M_a$) is likely to have been trained on $D_p$. Notably, to make such usage more exposed under membership inference, TeDA applies imperceptible perturbations directed by boundary search to generate a carefully crafted test suite $D_t$ (which we call the `isotope') from $D_p$.
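As a rough illustration only (not TeDA's exact algorithm, which the abstract does not spell out), the sketch below crafts isotope samples by nudging each protected input toward the model's decision boundary under a small L-infinity budget. The function name `craft_isotope`, the margin-based boundary search, the PyTorch classifier `model`, and the `eps`/`steps` parameters are all illustrative assumptions.

```python
# Hypothetical sketch of isotope generation: perturb protected samples (D_p)
# toward the decision boundary of a classifier while keeping the change
# imperceptible (L-infinity budget `eps`). Assumes inputs are in [0, 1].
import torch
import torch.nn.functional as F

def craft_isotope(model, x, y, eps=4/255, steps=10, step_size=1/255):
    """Return perturbed copies of x (a batch from D_p) that sit closer to the
    decision boundary of `model`. All names and defaults are illustrative."""
    model.eval()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Margin between the true-class logit and the strongest competing logit.
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        mask = F.one_hot(y, logits.size(1)).bool()
        best_other = logits.masked_fill(mask, float("-inf")).max(dim=1).values
        margin = true_logit - best_other
        grad = torch.autograd.grad(margin.sum(), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - step_size * grad.sign()           # shrink the margin
            x_adv = x + (x_adv - x).clamp(-eps, eps)          # stay imperceptible
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```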
With this test suite, TeDA then combines membership inference with hypothesis testing to decide, with a statistical guarantee, whether the user's private data has been used to train $M_a$.
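The concrete membership score and statistical test used by TeDA are not given here; the sketch below shows one plausible instantiation, comparing the audited model's confidence on the isotope suite against a held-out calibration set with a one-sided Welch t-test at significance level `alpha`. The function `audit` and the synthetic scores are purely illustrative.

```python
# Hypothetical sketch of the auditing decision: reject the null hypothesis
# "D_p was NOT used to train M_a" if isotope samples receive significantly
# higher confidence than comparable never-used samples.
import numpy as np
from scipy import stats

def audit(member_scores, nonmember_scores, alpha=0.01):
    """member_scores   : M_a's confidence on the isotope suite D_t
       nonmember_scores: M_a's confidence on held-out, never-used samples
    Returns (decision, p_value); decision is True if usage is inferred."""
    t_stat, p_two_sided = stats.ttest_ind(member_scores, nonmember_scores,
                                          equal_var=False)
    # One-sided p-value: members are expected to score higher.
    p_value = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    return p_value < alpha, p_value

# Usage with synthetic scores (illustrative only):
rng = np.random.default_rng(0)
members = rng.normal(0.92, 0.05, size=200)
nonmembers = rng.normal(0.75, 0.10, size=200)
print(audit(members, nonmembers))
```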
We evaluated TeDA through extensive experiments across a range of data volumes and model architectures on data-sensitive face recognition and medical diagnosis tasks. TeDA demonstrates high feasibility, effectiveness, and robustness under various adaptive strategies (e.g., pruning and distillation).
Session: Thu 19 Sep, 10:30 - 11:50 (Technical Papers)

10:30 (20m, Talk): Distance-Aware Test Input Selection for Deep Neural Networks. Zhong Li, Zhengfeng Xu, Ruihua Ji, Minxue Pan, Tian Zhang, Linzhang Wang, Xuandong Li (Nanjing University)
10:50 (20m, Talk): Test Selection for Deep Neural Networks using Meta-Models with Uncertainty Metrics. Demet Demir, Aysu Betin Can, Elif Surer (Middle East Technical University)
11:10 (20m, Talk): Datactive: Data Fault Localization for Object Detection Systems. Yining Yin, Yang Feng, Shihao Weng, Yuan Yao, Jia Liu, Zhihong Zhao (Nanjing University)
11:30 (20m, Talk): TeDA: A Testing Framework for Data Usage Auditing in Deep Learning Model Development. Xiangshan Gao (Zhejiang University; Huawei Technology), Jialuo Chen (Zhejiang University), Jingyi Wang (Zhejiang University), Jie Shi (Huawei International), Peng Cheng (Zhejiang University), Jiming Chen (Zhejiang University; Hangzhou Dianzi University)