A guide covering Differential Privacy, including the applications, libraries, and tools that will make you a better and more efficient developer when protecting users' data and privacy.
Note: You can easily convert this markdown file to a PDF in VS Code using the Markdown PDF extension.
Above is a simple diagram of how Differential Privacy-Preserving Data Sharing and Data Mining protects a user's data.
Differential Privacy is a system that lets researchers and analysts extract useful insights from datasets containing personal information while offering strong privacy protections to the individuals in those datasets. This is achieved by introducing "statistical noise".
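As background (the notation here is not part of the original guide), the guarantee is usually stated as ε-differential privacy. A randomized mechanism $M$ satisfies it if, for every pair of datasets $D$ and $D'$ that differ in one individual's record and every set of outputs $S$:

$$
\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]
$$

A smaller $\varepsilon$ means the output distribution changes less when any one person's data is added or removed, which corresponds to stronger privacy and, typically, more noise.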
Statistical Noise is the process of making small alterations to masked datasets. The noise hides identifiable characteristics of individuals, so their personal information stays protected, yet it is small enough not to materially affect the accuracy of the answers that analysts and researchers extract.
Laplacian Noise (the Laplace mechanism) adds noise drawn from a Laplace distribution to the output of a function, such as a count or a sum, before the result is released.
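The idea of adding Laplace-distributed noise to a function can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names `laplace_noise` and `private_count`, the sample `ages` data, and the choice of `epsilon` (the privacy parameter, where smaller means more noise) are all assumptions made for this example. It relies on the fact that a counting query changes by at most 1 when one person is added or removed, so the noise scale is `1 / epsilon`.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records that satisfy `predicate`.

    A count has sensitivity 1, so Laplace noise with scale 1 / epsilon
    is added to the true count (hypothetical helper for illustration).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Made-up example data: ages of eight people.
ages = [23, 35, 41, 29, 52, 67, 38, 44]
rng = random.Random(0)  # seeded only so the demo is repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

Sampling noise with ordinary floating-point arithmetic like this is fine for building intuition, but real deployments are better served by a vetted library such as the ones listed in this guide.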
Differential Privacy Blog Series by the National Institute of Standards and Technology (NIST)
Apple's Differential Privacy Overview
Learning with Privacy at Scale with Apple Machine Learning
Microsoft Research Differential Privacy Overview
Responsible Machine Learning with Microsoft Azure
Responsible AI Resources with Microsoft AI
Preserve data privacy by using differential privacy and the SmartNoise package
Open Differential Privacy (OpenDP) Initiative by Microsoft and Harvard
Google's Differential Privacy Library
Computing Private Statistics with Privacy on Beam from Google Codelabs
Introducing TensorFlow Privacy: Learning with Differential Privacy for Training Data
TensorFlow Federated: Machine Learning on Decentralized Data
Federated Analytics: Collaborative Data Science without Data Collection
Differentially-Private Stochastic Gradient Descent (DP-SGD)
Learning Differential Privacy from Harvard University Privacy Tools Project
Harvard University Privacy Tools Project Courses & Educational Materials
The Weaknesses of Differential Privacy course on Coursera
The Differential Privacy of Bayesian Inference
Simultaneous private learning of multiple concepts
The Complexity of Computing the Optimal Composition of Differential Privacy
Order revealing encryption and the hardness of private learning
SAP HANA data anonymization using SAP Software Solutions
SAP HANA Security using their In-Memory Database
DEFCON Differential Privacy Training Launch
Secure and Private AI course on Udacity
Differential Privacy - Security and Privacy for Big Data - Part 1 course on Coursera
Differential Privacy - Security and Privacy for Big Data - Part 2 course on Coursera
Certified Ethical Emerging Technologist Professional Certificate course on Coursera
PySyft is a Python library for secure and private Deep Learning. PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Encrypted Computation (like Multi-Party Computation (MPC) and Homomorphic Encryption (HE)) within the main Deep Learning frameworks like PyTorch and TensorFlow.
TensorFlow Privacy is a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided.
TensorFlow Federated (TFF) is an open-source framework for machine learning and other computations on decentralized data. TFF has been developed to facilitate open research and experimentation with Federated Learning (FL), an approach to machine learning where a shared global model is trained across many participating clients that keep their training data locally.
Privacy on Beam is an end-to-end differential privacy solution built on Apache Beam. It is intended to be usable by all developers, regardless of their differential privacy expertise.
PyDP is a Python wrapper for Google's Differential Privacy project.
PennyLane is a cross-platform Python library for differentiable programming of quantum computers. It lets you train a quantum computer the same way as a neural network.
BoTorch is a library for Bayesian Optimization built on PyTorch.
PyTorch Geometric (PyG) is a geometric deep learning extension library for PyTorch.
Skorch is a scikit-learn compatible neural network library that wraps PyTorch.
Diffprivlib is the IBM Differential Privacy Library for experimenting with, investigating and developing applications in, differential privacy.
Opacus is a library that enables training PyTorch models with differential privacy. It requires minimal code changes on the client, has little impact on training performance, and lets the client track the privacy budget expended at any given moment during training.
SmartNoise is a toolkit that uses state-of-the-art differential privacy (DP) techniques to inject noise into data, to prevent disclosure of sensitive information and manage exposure risk.
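Several of the libraries above train models with differential privacy, commonly via DP-SGD. The core update can be sketched in plain Python: clip each example's gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, then average and take a gradient step. This is a simplified illustration under stated assumptions, not the API of TensorFlow Privacy or Opacus: the function names `clip` and `dp_sgd_step`, the parameter names, and the toy gradients are hypothetical, and the privacy accounting that turns the noise level into a privacy budget is omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """Apply one DP-SGD style update (hypothetical helper for illustration).

    Each example's gradient is clipped, the clipped gradients are summed,
    Gaussian noise with standard deviation noise_multiplier * max_norm is
    added per coordinate, and the noisy sum is averaged over the batch.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    batch_size = len(clipped)
    new_weights = []
    for i, w in enumerate(weights):
        total = sum(g[i] for g in clipped)
        noise = rng.gauss(0.0, noise_multiplier * max_norm)
        new_weights.append(w - lr * (total + noise) / batch_size)
    return new_weights


# Toy per-example gradients for a two-parameter model (made-up numbers).
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
rng = random.Random(0)  # seeded only so the demo is repeatable
weights = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
print(weights)
```

Clipping bounds how much any single example can move the model, which is what makes the added noise meaningful. The libraries listed above handle per-example gradient computation and privacy accounting, which this sketch leaves out.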
- If you would like to contribute to this guide, simply make a Pull Request.
Distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) Public License.