Membership Inference Attacks
LeakPro supports a wide range of attack scenarios targeting different privacy vulnerabilities in machine learning models. This section provides an in-depth look at each scenario, including supported attack types, data modalities, and references to specific implementations.
The figure above outlines the MIA workflow. The upper part of the figure (above the dashed line) shows the user-controlled inputs, while the lower part illustrates the inner workings of LeakPro. Evaluating MIAs with LeakPro involves the following steps:
Step 1: Ensure access to an auxiliary dataset, which may originate from the same distribution as the training dataset or a different one. The figure above illustrates the former case. Additionally, the user must split the dataset into a training set and a test set. The training set is used for model training, while the test set is used to assess the generalization gap. During evaluation, the attack will be tested on both training samples (in-members) and testing samples (out-members). The complete dataset (including the training, test, and auxiliary sets) will be provided to LeakPro and referred to as the population data. Importantly, the population dataset must be indexable.
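As a rough illustration of this step, the sketch below splits an indexable population dataset into training, test, and auxiliary index sets. It assumes a PyTorch-style dataset; the class name PopulationDataset, the placeholder data, and the split sizes are illustrative choices, not part of LeakPro's API.

```python
import numpy as np
from torch.utils.data import Dataset


class PopulationDataset(Dataset):
    """Indexable wrapper around features and labels; the population data
    handed to LeakPro must support integer indexing like this."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


# Placeholder data standing in for the user's real dataset.
features = np.random.randn(30_000, 32).astype(np.float32)
labels = np.random.randint(0, 10, size=30_000)
population = PopulationDataset(features, labels)

# Shuffle all population indices and carve out train / test / auxiliary splits.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(population))
train_indices = indices[:10_000]        # used to train the target model (in-members)
test_indices = indices[10_000:20_000]   # used to measure the generalization gap (out-members)
aux_indices = indices[20_000:]          # auxiliary data assumed available to the adversary
```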
Step 2: The user must provide a function to train a model on the training set. This function can be used to train the target model, or it can be bypassed if a pre-trained target model is provided. Either way, the training functionality must be supplied so that the adversary can train shadow models during the evaluation. Note that this training functionality can be designed to limit the adversary's knowledge, since the training process it implements may differ from the one actually used to train the target model. In addition to the target model, the user should provide the following metadata (a sketch of the training function and metadata follows the list below):
- Training and testing indices within the population data.
- The optimizer function used during training.
- The loss function applied during model evaluation.
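Continuing the sketch from Step 1, a user-supplied training function and the accompanying metadata might look as follows. The train signature, the toy linear model, and the metadata keys are assumptions for exposition and do not mirror LeakPro's exact interface.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Subset


def train(model: nn.Module, dataset, indices, epochs: int = 10) -> nn.Module:
    """Train `model` on the subset of `dataset` selected by `indices`.
    The same routine can later be reused to train shadow models."""
    loader = DataLoader(Subset(dataset, indices), batch_size=64, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model


# A toy linear model stands in for the user's real target model.
target_model = train(nn.Linear(32, 10), population, train_indices)

# Metadata handed over alongside the (pre-)trained target model.
target_metadata = {
    "train_indices": train_indices,   # indices into the population data
    "test_indices": test_indices,
    "optimizer": {"name": "SGD", "lr": 0.01, "momentum": 0.9},
    "loss": "CrossEntropyLoss",
}
```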
Step 3: The user-provided inputs (population data, target model, target model metadata, and the training function) are supplied to LeakPro when the LeakPro object is created. This information is stored within the Handler object, which acts as an interface between the user and the various attacks performed by LeakPro.
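A minimal sketch of bundling these inputs in one place, assuming a plain dataclass; the class and attribute names are illustrative and do not correspond to LeakPro's actual Handler class or constructor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditInputs:
    """Everything the audit needs from the user, collected in one place."""
    population: object      # indexable population data (train + test + auxiliary)
    target_model: object    # the (pre-)trained target model
    target_metadata: dict   # train/test indices, optimizer, loss function
    train_fn: Callable      # training routine, reused for shadow models


audit_inputs = AuditInputs(
    population=population,
    target_model=target_model,
    target_metadata=target_metadata,
    train_fn=train,
)
```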
Step 4: The relevant attacks are prepared within LeakPro, using different tools depending on the specific attacks being performed. For instance, some attacks rely on shadow models, while others leverage techniques such as model distillation or quantile regression. These tools are built using the auxiliary data, which is assumed to be accessible to the adversary, along with the training loop provided by the user. Additionally, certain attacks come in both online and offline versions (see the sampling sketch after the list):
- Offline attacks: The adversary can only sample from the provided auxiliary dataset, limiting access to other data sources.
- Online attacks: The adversary can also sample from the training and test datasets, though without knowing whether specific samples were used during the training of the target model.
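The difference between the two settings can be illustrated by how shadow-model training indices are drawn. The helper functions below are assumptions for exposition (building on the sketches above), not LeakPro internals.

```python
import numpy as np
from torch import nn

rng = np.random.default_rng(seed=1)


def sample_offline(aux_indices, size):
    """Offline adversary: shadow models are trained on auxiliary data only."""
    return rng.choice(aux_indices, size=size, replace=False)


def sample_online(aux_indices, train_indices, test_indices, size):
    """Online adversary: may also draw from the target's train/test pool,
    without knowing which of those samples were actually used in training."""
    pool = np.concatenate([aux_indices, train_indices, test_indices])
    return rng.choice(pool, size=size, replace=False)


# A shadow model can then be trained with the user-supplied training loop.
shadow_indices = sample_offline(aux_indices, size=5_000)
shadow_model = train(nn.Linear(32, 10), population, shadow_indices)
```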
Step 5: The attack tools are used during attack execution to generate membership inference signals. Each attack is evaluated on both the training and test datasets, and the adversary's objective is twofold (see the scoring sketch after the list):
- Correctly infer that the training data samples are in-members (part of the training set).
- Accurately detect that the test data samples are out-members (not part of the training set).
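As a deliberately simple example of such a signal, the sketch below scores each sample by its negated per-sample loss on the target model from the running sketch and checks both objectives at a single threshold; the attacks listed in the table further down compute far more refined signals.

```python
import numpy as np
import torch
from torch import nn


@torch.no_grad()
def per_sample_loss(model, dataset, indices):
    """Lower loss on a sample is (weak) evidence of training membership."""
    xs = torch.as_tensor(np.stack([dataset[i][0] for i in indices]))
    ys = torch.as_tensor(np.asarray([dataset[i][1] for i in indices]))
    return nn.CrossEntropyLoss(reduction="none")(model(xs), ys).numpy()


# Higher score = stronger membership evidence, so negate the loss.
in_scores = -per_sample_loss(target_model, population, train_indices)   # in-members
out_scores = -per_sample_loss(target_model, population, test_indices)   # out-members

# Evaluate both objectives at one score threshold.
threshold = np.median(np.concatenate([in_scores, out_scores]))
tpr = np.mean(in_scores >= threshold)    # training samples correctly flagged as in-members
fpr = np.mean(out_scores >= threshold)   # test samples wrongly flagged as in-members
```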
Step 6: The signals generated by each attack, along with the corresponding decisions, are passed to LeakPro's report module for summarization. The report module compiles the results into a comprehensive PDF report for easy sharing while also storing the individual data outputs produced by the attacks for further analysis. Once the results have been generated and stored, the auditing process is considered complete.
LeakPro covers two broad categories of membership inference attacks:
- Label-Only Attacks: Infer membership using only the model's predicted labels.
- Logit-Based Attacks: Use the model's confidence scores (logits) to increase inference accuracy.
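The practical difference between the two categories is the signal the adversary observes per query; a rough illustration, with function names chosen here purely for exposition:

```python
import torch


@torch.no_grad()
def label_only_signal(model, x):
    """Label-only attacks see just the predicted class for each query."""
    return model(x).argmax(dim=-1)


@torch.no_grad()
def logit_signal(model, x):
    """Logit-based attacks see the full confidence scores, which leak
    considerably more membership information."""
    return model(x)
```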
Supported data modalities include:
- Image Data: Vision models trained on datasets like CIFAR-10 and MNIST.
- Text Data: NLP models such as sentiment analyzers.
- Tabular Data: Structured datasets like financial or healthcare records.
- Graph Data: Graph neural networks used in social network analysis.
The table below lists all attacks implemented in LeakPro across these scenarios, linking directly to the code implementation and the original research paper for each attack.
| Attack | Adversary Access | Code Implementation | Original Paper |
|---|---|---|---|
| HSJ | Label-Based | LeakPro Code | Choquette-Choo et al., ICML (2021) |
| LiRA | Logit-Based | LeakPro Code | Carlini et al., IEEE S&P (2022) |
| TrajectoryMIA | Logit-Based | LeakPro Code | Liu et al., ACM CCS (2022) |
| P-Attack | Loss-Based | LeakPro Code | Ye et al., ACM CCS (2022) |
| QMIA | Logit-Based | LeakPro Code | Bertran et al., NeurIPS (2023) |
| RMIA | Logit-Based | LeakPro Code | Zarifzadeh et al., ICML (2024) |
| YOQO | Label-Based | LeakPro Code | Wu et al., ICLR (2024) |
- Click on the Code Implementation links to explore the corresponding LeakPro modules.
- Click on the Original Paper links to read the foundational research behind each attack.