diff --git a/docs/faq.md b/docs/faq.md
index b8c4c969..7b7c356a 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -13,12 +13,12 @@ Yes! Opacus is open-source for public use, and it is licensed under the [Apache

## How can I report a bug or ask a question?

-You can report bugs by submitting GitHub issues. To submit a Github issue, please [click here](https://github.com/pytorch/opacus/issues).
+You can report bugs by submitting GitHub issues. To submit a GitHub issue, please [click here](https://github.com/pytorch/opacus/issues).

You can ask questions in our dedicated PyTorch [Discussion Forum](https://discuss.pytorch.org/c/opacus/29). We actively monitor questions in the PyTorch forums with the category `Opacus`.

## I'd like to contribute to Opacus. How can I do that?

-Thank you for your interest in contributing to Opacus! Submit your contributions using Github pull requests [here](https://github.com/pytorch/opacus/pulls). Please take a look at [Opacus contribution guide](https://github.com/pytorch/opacus/blob/main/CONTRIBUTING.md).
+Thank you for your interest in contributing to Opacus! Submit your contributions using GitHub pull requests [here](https://github.com/pytorch/opacus/pulls). Please take a look at the [Opacus contribution guide](https://github.com/pytorch/opacus/blob/main/CONTRIBUTING.md).

## If I use Opacus in my paper, how can I cite it?

@@ -62,7 +62,7 @@ model, optimizer, data_loader = privacy_engine.make_private(

Not all pseudo random number generators (RNGs) are born equal. Most of them (including Python’s and PyTorch’s default generators, which are based on the Mersenne Twister) cannot support the quality of randomness required by cryptographic applications. The RNGs that do qualify are generally referred to as cryptographically secure RNGs, [CSPRNGs](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator). Opacus supports a CSPRNG provided by the [`torchcsprng`](https://github.com/pytorch/csprng) library. This option is controlled by setting `secure_rng` to `True`.

-However, using a CSPRNG comes with a large performance hit, so we normally recommend that you do your experimentation with `secure_rng` set to `False`. Once you identify a training regime that works for your application (i.e., the model’s architecture, the right hyperparameters, how long to train for, etc), then we recommend that you turn it on and train again from scratch, so that your final model can enjoy the security this brings.
+However, using a CSPRNG comes with a large performance hit, so we normally recommend that you do your experimentation with `secure_rng` set to `False`. Once you identify a training regime that works for your application (i.e., the model’s architecture, the right hyperparameters, how long to train for, etc.), then we recommend that you turn it on and train again from scratch, so that your final model can enjoy the security this brings.

## My model doesn’t converge with default privacy settings. What do I do?

@@ -70,17 +70,17 @@ Opacus has several settings that control the amount of noise, which affects conv

The next parameter to adjust would be the learning rate. Compared to the non-private training, Opacus-trained models converge with a smaller learning rate (each gradient update is noisier, thus we want to take smaller steps).

-Next one on the list is `max_grad_norm` . It sets the threshold above which Opacus clips the gradients, impairing convergence. Deeper models are less impacted by this threshold, while linear models can be badly hurt if its value is not set right.
+Next on the list is `max_grad_norm`. It sets the threshold above which Opacus clips the gradients, which can impair convergence. Deeper models are less impacted by this threshold, while linear models can be badly hurt if its value is not set right.

-If these interventions don’t help (or the models starts to converge but its privacy is wanting), it is time to take a hard look at the model architecture or its components. [[Papernot et al. 2019]](https://openreview.net/forum?id=rJg851rYwH) can be a good starting point.
+If these interventions don’t help (or the model starts to converge but its privacy is wanting), it is time to take a hard look at the model architecture or its components. [[Papernot et al. 2019]](https://openreview.net/forum?id=rJg851rYwH) can be a good starting point.
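+
+To make the knobs concrete, here is a minimal sketch of where each one lives. It follows the `make_private` call shown earlier; the toy model, data, and all numeric values are placeholders for illustration, not recommendations:
+
+```python
+import torch
+from opacus import PrivacyEngine
+
+model = torch.nn.Linear(16, 2)  # toy model standing in for yours
+# A smaller learning rate than the non-private baseline: updates are noisier, so take smaller steps.
+optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
+data_loader = torch.utils.data.DataLoader(
+    torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(2, (64,))),
+    batch_size=8,
+)
+
+privacy_engine = PrivacyEngine()
+model, optimizer, data_loader = privacy_engine.make_private(
+    module=model,
+    optimizer=optimizer,
+    data_loader=data_loader,
+    noise_multiplier=1.0,  # first knob: lower values ease convergence but weaken privacy
+    max_grad_norm=1.0,     # clipping threshold; linear models are especially sensitive to it
+)
+```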

-## How to deal with out of memory errors?
+## How to deal with out-of-memory errors?

Dealing with per-sample gradients will inevitably put more pressure on your memory: after all, if you want to train with batch size 64, you are looking to keep 64 copies of your parameter gradients. The first sanity check to do is to make sure that you don’t go out of memory with "standard" training (without DP). That should guarantee that you can train with a batch size of 1 at least. Then, you can check your memory usage with e.g. `nvidia-smi` as usual, gradually increasing the batch size until you find your sweet spot. Note that this may mean that you still train with a small batch size, which comes with its own training behavior (i.e. higher variance between batches). Training with larger batch sizes can be beneficial, and we built `virtual_step` to make this possible while still being memory-efficient (see *what is virtual batch size* in these FAQs).

## What does epsilon=1.1 really mean? How about delta?

-The (epsilon, delta) pair quantifies the privacy properties of the DP-SGD algorithm (see the [blog post](https://bit.ly/dp-sgd-algorithm-explained)). A model trained with (epsilon, delta)-differential privacy (DP) protects privacy of any one training example, no matter how strange, ill-fitting, or perfect this example is.
+The (epsilon, delta) pair quantifies the privacy properties of the DP-SGD algorithm (see the [blog post](https://bit.ly/dp-sgd-algorithm-explained)). A model trained with (epsilon, delta)-differential privacy (DP) protects the privacy of any training example, no matter how strange, ill-fitting, or perfect this example is.

Formally, the (epsilon, delta)-DP statement implies that the probabilities of outputting a model *W* trained on two datasets *D* and *D*′ that differ in a single example are close:

![epsilon-delta-dp](https://raw.githubusercontent.com/pytorch/opacus/main/docs/img/epsilon-delta-dp.png)

@@ -98,7 +98,7 @@ Assuming that batches are randomly selected, an increase in the batch size incre

## My model throws IncompatibleModuleException. What is going wrong?

-Your model most likely contains modules that are not compatible with Opacus. The most prominent example of these modules are batch-norm types. Luckily there is a good substitute for a `BatchNorm` layer and it is called `GroupNorm`. You can convert all your batch norm sub-modules using this utility function: `opacus.utils.module_modification.convert_batchnorm_modules.`
+Your model most likely contains modules that are not compatible with Opacus. The most prominent example is the family of batch-norm layers. Luckily, there is a good substitute for a `BatchNorm` layer, and it is called `GroupNorm`. You can convert all your batch-norm submodules using the utility function `opacus.utils.module_modification.convert_batchnorm_modules`, as sketched below.
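+
+A minimal sketch (the import path follows the function named above; the exact location can differ between Opacus releases, and newer versions offer `ModuleValidator.fix` for the same purpose):
+
+```python
+import torchvision.models as models
+from opacus.utils.module_modification import convert_batchnorm_modules
+
+model = models.resnet18()                 # contains BatchNorm layers
+model = convert_batchnorm_modules(model)  # returns the model with BatchNorm swapped for GroupNorm
+```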

## What is virtual batch size?

@@ -114,7 +114,7 @@ A call to `privacy_engine.get_epsilon(delta=delta)` returns a pair: an epsilon s
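+
+As a hedged illustration of querying the spent budget (reusing the `privacy_engine` from the earlier sketch; the return shape of `get_epsilon` varies across Opacus versions, so the value is printed rather than unpacked):
+
+```python
+# Hypothetical usage: check the privacy budget spent so far at a chosen delta.
+delta = 1e-5
+spent = privacy_engine.get_epsilon(delta=delta)
+print(f"Privacy spent so far: {spent} at delta={delta}")
+```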