diff --git a/docs/faq.md b/docs/faq.md
index b8c4c969..7b7c356a 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -13,12 +13,12 @@ Yes! Opacus is open-source for public use, and it is licensed under the [Apache

## How can I report a bug or ask a question?

-You can report bugs by submitting GitHub issues. To submit a Github issue, please [click here](https://github.com/pytorch/opacus/issues).
+You can report bugs by submitting GitHub issues. To submit a GitHub issue, please [click here](https://github.com/pytorch/opacus/issues).

You can ask questions in our dedicated PyTorch [Discussion Forum](https://discuss.pytorch.org/c/opacus/29). We actively monitor questions in the PyTorch forums with the category `Opacus`.

## I'd like to contribute to Opacus. How can I do that?

-Thank you for your interest in contributing to Opacus! Submit your contributions using Github pull requests [here](https://github.com/pytorch/opacus/pulls). Please take a look at [Opacus contribution guide](https://github.com/pytorch/opacus/blob/main/CONTRIBUTING.md).
+Thank you for your interest in contributing to Opacus! Submit your contributions using GitHub pull requests [here](https://github.com/pytorch/opacus/pulls). Please take a look at the [Opacus contribution guide](https://github.com/pytorch/opacus/blob/main/CONTRIBUTING.md).

## If I use Opacus in my paper, how can I cite it?

@@ -62,7 +62,7 @@ model, optimizer, data_loader = privacy_engine.make_private(

Not all pseudo random number generators (RNGs) are born equal. Most of them (including Python’s and PyTorch’s default generators, which are based on the Mersenne Twister) cannot support the quality of randomness required by cryptographic applications. The RNGs that do qualify are generally referred to as cryptographically secure RNGs, [CSPRNGs](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator). Opacus supports a CSPRNG provided by the [`torchcsprng`](https://github.com/pytorch/csprng) library. This option is controlled by setting `secure_rng` to `True`.

-However, using a CSPRNG comes with a large performance hit, so we normally recommend that you do your experimentation with `secure_rng` set to `False`. Once you identify a training regime that works for your application (i.e., the model’s architecture, the right hyperparameters, how long to train for, etc), then we recommend that you turn it on and train again from scratch, so that your final model can enjoy the security this brings.
+However, using a CSPRNG comes with a large performance hit, so we normally recommend that you do your experimentation with `secure_rng` set to `False`. Once you identify a training regime that works for your application (i.e., the model’s architecture, the right hyperparameters, how long to train for, etc.), then we recommend that you turn it on and train again from scratch, so that your final model can enjoy the security this brings.

## My model doesn’t converge with default privacy settings. What do I do?

@@ -70,17 +70,17 @@ Opacus has several settings that control the amount of noise, which affects conv

The next parameter to adjust would be the learning rate. Compared to the non-private training, Opacus-trained models converge with a smaller learning rate (each gradient update is noisier, thus we want to take smaller steps).

-Next one on the list is `max_grad_norm` . It sets the threshold above which Opacus clips the gradients, impairing convergence. Deeper models are less impacted by this threshold, while linear models can be badly hurt if its value is not set right.
+Next on the list is `max_grad_norm`. It sets the threshold above which Opacus clips the gradients, which can impair convergence. Deeper models are less impacted by this threshold, while linear models can be badly hurt if its value is not set right.

-If these interventions don’t help (or the models starts to converge but its privacy is wanting), it is time to take a hard look at the model architecture or its components. [[Papernot et al. 2019]](https://openreview.net/forum?id=rJg851rYwH) can be a good starting point.
+If these interventions don’t help (or the model starts to converge but its privacy is wanting), it is time to take a hard look at the model architecture or its components. [[Papernot et al. 2019]](https://openreview.net/forum?id=rJg851rYwH) can be a good starting point.
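+
+To make the knobs concrete, here is a minimal sketch of where each one lives. It follows the `make_private` call shown earlier; the toy model, data, and all numeric values are placeholders for illustration, not recommendations:
+
+```python
+import torch
+from opacus import PrivacyEngine
+
+model = torch.nn.Linear(16, 2)  # toy model standing in for yours
+# A smaller learning rate than the non-private baseline: updates are noisier, so take smaller steps.
+optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
+data_loader = torch.utils.data.DataLoader(
+    torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(2, (64,))),
+    batch_size=8,
+)
+
+privacy_engine = PrivacyEngine()
+model, optimizer, data_loader = privacy_engine.make_private(
+    module=model,
+    optimizer=optimizer,
+    data_loader=data_loader,
+    noise_multiplier=1.0,  # first knob: lower values ease convergence but weaken privacy
+    max_grad_norm=1.0,     # clipping threshold; linear models are especially sensitive to it
+)
+```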

-## How to deal with out of memory errors?
+## How to deal with out-of-memory errors?

Dealing with per-sample gradients will inevitably put more pressure on your memory: after all, if you want to train with batch size 64, you are looking to keep 64 copies of your parameter gradients. The first sanity check to do is to make sure that you don’t go out of memory with "standard" training (without DP). That should guarantee that you can train with a batch size of 1 at least. Then, you can check your memory usage with e.g. `nvidia-smi` as usual, gradually increasing the batch size until you find your sweet spot. Note that this may mean that you still train with a small batch size, which comes with its own training behavior (i.e. higher variance between batches). Training with larger batch sizes can be beneficial, and we built `virtual_step` to make this possible while still being memory-efficient (see *what is virtual batch size* in these FAQs).

## What does epsilon=1.1 really mean? How about delta?

-The (epsilon, delta) pair quantifies the privacy properties of the DP-SGD algorithm (see the [blog post](https://bit.ly/dp-sgd-algorithm-explained)). A model trained with (epsilon, delta)-differential privacy (DP) protects privacy of any one training example, no matter how strange, ill-fitting, or perfect this example is.
+The (epsilon, delta) pair quantifies the privacy properties of the DP-SGD algorithm (see the [blog post](https://bit.ly/dp-sgd-algorithm-explained)). A model trained with (epsilon, delta)-differential privacy (DP) protects the privacy of any training example, no matter how strange, ill-fitting, or perfect this example is.

Formally, the (epsilon, delta)-DP statement implies that the probabilities of outputting a model *W* trained on two datasets *D* and *D*′ that differ in a single example are close:

![epsilon-delta-dp](https://raw.githubusercontent.com/pytorch/opacus/main/docs/img/epsilon-delta-dp.png)

@@ -98,7 +98,7 @@ Assuming that batches are randomly selected, an increase in the batch size incre

## My model throws IncompatibleModuleException. What is going wrong?

-Your model most likely contains modules that are not compatible with Opacus. The most prominent example of these modules are batch-norm types. Luckily there is a good substitute for a `BatchNorm` layer and it is called `GroupNorm`. You can convert all your batch norm sub-modules using this utility function: `opacus.utils.module_modification.convert_batchnorm_modules.`
+Your model most likely contains modules that are not compatible with Opacus. The most prominent example is the family of batch-norm layers. Luckily, there is a good substitute for a `BatchNorm` layer, and it is called `GroupNorm`. You can convert all your batch-norm submodules using the utility function `opacus.utils.module_modification.convert_batchnorm_modules`, as sketched below.
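+
+A minimal sketch (the import path follows the function named above; the exact location can differ between Opacus releases, and newer versions offer `ModuleValidator.fix` for the same purpose):
+
+```python
+import torchvision.models as models
+from opacus.utils.module_modification import convert_batchnorm_modules
+
+model = models.resnet18()                 # contains BatchNorm layers
+model = convert_batchnorm_modules(model)  # returns the model with BatchNorm swapped for GroupNorm
+```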

## What is virtual batch size?

@@ -114,7 +114,7 @@ A call to `privacy_engine.get_epsilon(delta=delta)` returns a pair: an epsilon s
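+
+As a hedged illustration of querying the spent budget (reusing the `privacy_engine` from the earlier sketch; the return shape of `get_epsilon` varies across Opacus versions, so the value is printed rather than unpacked):
+
+```python
+# Hypothetical usage: check the privacy budget spent so far at a chosen delta.
+delta = 1e-5
+spent = privacy_engine.get_epsilon(delta=delta)
+print(f"Privacy spent so far: {spent} at delta={delta}")
+```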