This repository provides a proof-of-concept implementation of the black-box $f$-differential privacy ($f$-DP) estimator/auditor introduced in our paper *General-Purpose f-DP Estimation and Auditing in a Black-Box Setting*.

This project introduces two novel estimators for the $f$-DP curve:
- **Perturbed Likelihood Ratio (PTLR) Test-Based Estimator**: Leverages a perturbed likelihood ratio test approach (Algorithm 3 in our paper) to estimate the $f$-differential privacy curve.
- **Classifier-Based Estimator (Baybox Estimator)**: Uses a binary classification approach (e.g., k-Nearest Neighbors) to approximate privacy guarantees. This method, referred to as the Baybox estimator, is detailed in Algorithm 1 of our paper.
Both approaches provide an estimate of the $f$-differential privacy curve. On top of these estimators, we offer an auditor that merges the above techniques to statistically test a claimed $f$-DP guarantee.
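For reference, in the standard $f$-DP framework the object being estimated is a trade-off curve between hypothesis-testing errors. For the output distributions $P = M(D)$ and $Q = M(D')$ of a mechanism $M$ on neighboring databases $D$ and $D'$, the trade-off function is

$$
T(P, Q)(\alpha) \;=\; \inf_{\phi}\,\bigl\{\, 1 - \mathbb{E}_{Q}[\phi] \;:\; \mathbb{E}_{P}[\phi] \le \alpha \,\bigr\}, \qquad \alpha \in [0, 1],
$$

where the infimum is over rejection rules $\phi$, and $\mathbb{E}_{P}[\phi]$ and $1 - \mathbb{E}_{Q}[\phi]$ are the type I and type II errors of $\phi$. A mechanism is $f$-DP if $T(M(D), M(D')) \ge f$ pointwise for all neighboring $D, D'$. The estimators approximate this curve from samples, and the auditor tests whether a claimed $f$ is consistent with it.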
This repository demonstrates the following:
- **Black-box Estimation of $f$-DP**: Minimal prior knowledge of the algorithm under investigation.
- **Classifier-based Framework**: Flexibility to use different binary classification algorithms (kNN is included, but others can be integrated).
- **PTLR-based Estimator**: An alternative approach rooted in likelihood ratio testing.
- **Broad Applicability**: Evaluation of standard and complex DP mechanisms (e.g., Gaussian, Laplacian, DP-SGD) to expose subtle bugs or test privacy properties.
- **Auditor for $f$-DP Violations**: Harnesses the strengths of both estimators and employs hypothesis-testing and learning theory for robust auditing.
- **Comprehensive Demonstrations**: Jupyter notebooks showcasing end-to-end usage on diverse mechanisms.
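To make the classifier-based idea concrete, here is a small self-contained sketch using NumPy and scikit-learn. It is not this repository's API and not Algorithm 1 itself; the toy mechanism, classifier settings, and sample sizes are illustrative assumptions. It trains a kNN classifier to distinguish the outputs of a toy Gaussian mechanism on two neighboring inputs and reads an empirical trade-off curve off the classifier's scores.

```python
# Illustrative sketch only: NOT the repository's API or Algorithm 1 itself.
# It shows the core idea behind classifier-based trade-off-curve estimation:
# train a binary classifier to distinguish mechanism outputs on two neighboring
# inputs, then read an empirical trade-off (type I vs. type II error) curve
# off the classifier's scores.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 20_000          # samples per neighboring input (illustrative)
sigma = 1.0         # noise scale of the toy Gaussian mechanism

# Toy Gaussian mechanism outputs on neighboring inputs D (query value 0) and D' (query value 1)
x0 = rng.normal(0.0, sigma, size=n)      # outputs under D   (label 0)
x1 = rng.normal(1.0, sigma, size=n)      # outputs under D'  (label 1)

X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros(n), np.ones(n)])

# Split the pooled samples into training and evaluation halves
idx = rng.permutation(2 * n)
train, test = idx[:n], idx[n:]

clf = KNeighborsClassifier(n_neighbors=51).fit(X[train], y[train])
scores = clf.predict_proba(X[test])[:, 1]   # estimated P(label 1 | output)

# Sweep a threshold on the score to trace an empirical trade-off curve:
# alpha = type I error (reject "sample came from D" although it did),
# beta  = type II error (fail to reject although the sample came from D').
alphas, betas = [], []
for t in np.linspace(0.0, 1.0, 101):
    reject = scores >= t
    alphas.append(np.mean(reject[y[test] == 0]))
    betas.append(np.mean(~reject[y[test] == 1]))

# The kNN-induced test is one particular test, so each empirical (alpha, beta)
# point lies on or above the true trade-off curve (up to sampling error);
# points falling below a claimed f would indicate a privacy violation.
for a, b in list(zip(alphas, betas))[::25]:
    print(f"alpha = {a:.3f}  ->  estimated beta = {b:.3f}")
```

The repository's estimators and auditor refine this basic idea with the statistical guarantees developed in the paper.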
The experiments were tested on a Google Cloud virtual machine instance running Ubuntu 22.04.5 LTS.
First, ensure your system is up to date, install Git if needed, and clone the repository:

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install git   # Install Git if not already installed
git clone https://github.com/stoneboat/fdp-estimation.git
```
- Install R:

  ```bash
  sudo apt install r-base -y
  ```

  Note that our tested version is R 4.1.2.

- Install the required R package. Open R as a superuser by running:

  ```bash
  sudo R
  ```

  Then, install the package:

  ```r
  install.packages("fdrtool")
  ```
- Install Python virtual environment support:

  ```bash
  sudo apt install python3-venv   # Ensure the correct version of Python
  sudo apt install python3-pip    # Install pip if not already installed
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv fdp-env
  source fdp-env/bin/activate
  ```

- Navigate to the project directory and install dependencies:

  ```bash
  cd fdp-estimation
  pip install --upgrade -r requirements.txt
  ```

- Install a Jupyter kernel for running the Jupyter notebooks. To register the virtual environment as a Jupyter kernel, run:

  ```bash
  python -m ipykernel install --user --name=fdp-env --display-name "Python (fdp-env)"
  ```
To edit the code, we recommend using JupyterLab. Use the following command to start it:

```bash
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser
```

The added parameters allow external access, as Jupyter defaults to binding only to localhost. If you get an error that the `jupyter` command is not found, you may need to add the Jupyter binary directory to your `PATH`:

```bash
export PATH=$HOME/.local/bin:$PATH
source ~/.bashrc   # or source ~/.zshrc
```

To show the URL for accessing the running JupyterLab web service:

```bash
jupyter lab list
```

To stop the server:

```bash
jupyter lab stop
```
To learn how to use the API for the estimator and auditor, navigate to the `notebooks` folder. This directory contains three demonstration packages, each corresponding to a different component: the PTLR-based estimator, the classifier-based estimator, and the auditor. Within each folder, we provide example scripts illustrating how to call the API for estimation and inference tasks. Additionally, we demonstrate how to validate estimation and inference results, perform accuracy analysis, and compare outcomes against theoretical expectations.
This section describes how to run the Stochastic Gradient Descent (SGD) experiment. The experiment involves training multiple SGD models and using them to estimate/audit the privacy bounds of the SGD algorithm under test. Please be aware that this process generates thousands of model files, so ensure you have ample disk space available. Additionally, the experiment may require several hours to complete.
To begin, open a terminal and navigate to the project's root directory. Execute the following command to generate the DP-SGD models:

```bash
./scripts/sgd_experiment/run_generate_sgd_models.sh
```
To view the available parameters, open the script:

```bash
cat scripts/sgd_experiment/run_generate_sgd_models.sh
```
Next, run the following command to generate samples for auditing. This process uses the models created in the previous step and will take a few minutes to complete:

```bash
./scripts/sgd_experiment/run_generate_sgd_samples.sh
```
To view the available parameters, open the script:

```bash
cat scripts/sgd_experiment/run_generate_sgd_samples.sh
```
After completing the preprocessing steps, you can audit and estimate the privacy of SGD using the provided Jupyter notebooks:

- **PTLR-based Estimation**
  - Navigate to `notebook/ptlr estimation/`
  - Open `estimating_fdp_curve_full_sgd_cnn.ipynb` for $f$-DP curve estimation
  - Open `sgd_cnn_theoretical_upperbound_vs_estimated_lowerbound.ipynb` to compare theoretical and estimated bounds
- **Classifier-based Estimation**
  - Navigate to `notebook/classifier-based estimation/`
  - Open `estimating_fdp_curve_full_sgd_cnn.ipynb` to run the classifier-based estimation
- **Privacy Auditing**
  - Navigate to `notebook/auditor/`
  - Open `auditing_fdp_curve_full_sgd_cnn.ipynb` to perform privacy auditing
One of the key advantages of our estimation and auditing framework is its black-box nature, allowing users to experiment with different classifiers and mechanisms in a plug-and-play manner. Below, we discuss how to customize and extend the framework:
- **Adding New Mechanisms**:
  - To integrate a new mechanism, implement it in the `src/mech/` directory. Specifically, you need to define a mechanism sampler to generate independent samples of the mechanism's output, along with two mechanism-specific estimators (one based on the PTLR-based estimator and another based on the classifier-based estimator). Additionally, a mechanism-specific auditor should be provided.
  - The only part that users need to implement is the mechanism sampler itself; all other components can be instantiated using pre-defined abstract classes (see the sketch after this list).
  - For example, consider the Gaussian mechanism, implemented in `GaussianDist.py` under `src/mech/`. This file contains four key classes:
    - `GaussianDistSampler`: Generates independent samples of the Gaussian mechanism's output.
    - `GaussianDistEstimator`: Implements the classification-based estimator for the Gaussian mechanism.
    - `GaussianPTLREstimator`: Implements the PTLR-based estimator for the Gaussian mechanism.
    - `GaussianAuditor`: Implements the auditor for testing $f$-DP claims.
  - Users only need to define the `preprocess` function in `GaussianDistSampler`, which is responsible for generating $n$ independent samples. The remaining classes can be instantiated using the abstract classes `_GeneralNaiveEstimator`, `_PTLREstimator`, and `_GeneralNaiveAuditor`, requiring only the concrete sampler (e.g., `GaussianDistSampler`).
- **Integrating Alternative Classifiers**:
  - Our modular framework supports seamless integration of custom classifiers. To use the Baybox estimator with a different binary classification algorithm, simply implement a new classifier following the interface defined in `src/classifier/`. Ensure that the required methods are properly defined to maintain compatibility with the framework.
- **Parameter Tuning**:
  - Users can fine-tune various parameters, such as classifier configurations, sample sizes, and database settings, to optimize estimation quality. The Jupyter notebooks provide an interactive platform to experiment with these adjustments and observe their impact. Additionally, users can refer to the `generate_params` function within each mechanism file to identify available tunable parameters and explore potential configurations.
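As a rough illustration of the mechanism-sampler pattern described above, the sketch below defines a standalone sampler for a hypothetical Laplace mechanism. The class name, the `preprocess` signature, and the commented-out wiring to the abstract classes are assumptions made for illustration; consult `GaussianDist.py` in `src/mech/` for the exact interfaces expected by the framework.

```python
# Illustrative sketch only: the method names and constructor arguments are
# assumptions, not the repository's actual interface. See src/mech/GaussianDist.py
# for the real pattern to follow.
import numpy as np


class LaplaceDistSampler:
    """Hypothetical sampler for a Laplace mechanism applied to a counting query.

    The only mechanism-specific logic a new mechanism needs is a preprocess-style
    routine that returns n independent samples of the mechanism's output on a
    fixed input database.
    """

    def __init__(self, true_count, scale, seed=None):
        self.true_count = true_count          # query answer on the input database
        self.scale = scale                    # Laplace noise scale (sensitivity / epsilon)
        self.rng = np.random.default_rng(seed)

    def preprocess(self, n):
        """Generate n independent samples of the mechanism's output."""
        return self.true_count + self.rng.laplace(loc=0.0, scale=self.scale, size=n)


# Samplers for a pair of neighboring databases (counts differing by 1):
sampler_D = LaplaceDistSampler(true_count=100.0, scale=1.0, seed=0)
sampler_D_prime = LaplaceDistSampler(true_count=101.0, scale=1.0, seed=1)

print(sampler_D.preprocess(5))
print(sampler_D_prime.preprocess(5))

# In the repository, the remaining pieces would then be instantiated from the
# pre-defined abstract classes, along the lines of (class names from this README,
# constructor signatures assumed):
#
#   estimator = _GeneralNaiveEstimator(sampler=...)   # classifier-based estimator
#   ptlr      = _PTLREstimator(sampler=...)           # PTLR-based estimator
#   auditor   = _GeneralNaiveAuditor(sampler=...)     # auditor for f-DP claims
```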
We welcome contributions! If you have suggestions for improvements or new features, or if you find any issues, please open an issue or submit a pull request. For questions, feel free to reach out to martin.dunsche@rub.de or ywei368@gatech.edu.