
Commit 5f0556a

subramen and Svetlana Karslioglu authored
Add dist_tuto to resources (#2317)
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
1 parent 5de40c6 commit 5f0556a

1 file changed: +3 -0 lines changed

beginner_source/ddp_series_theory.rst

Lines changed: 3 additions & 0 deletions
@@ -37,6 +37,8 @@ ensures each device gets a non-overlapping input batch. The model is replicated
 each replica calculates gradients and simultaneously synchronizes with the others using the `ring all-reduce
 algorithm <https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/>`__.

+This `illustrative tutorial <https://pytorch.org/tutorials/intermediate/dist_tuto.html#>`__ provides a more in-depth Python view of the mechanics of DDP.
+
 Why you should prefer DDP over DataParallel (DP)
 -------------------------------------------------

@@ -66,3 +68,4 @@ Further Reading
 API <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
 - `DDP Internal
   Design <https://pytorch.org/docs/master/notes/ddp.html#internal-design>`__
+- `DDP Mechanics Tutorial <https://pytorch.org/tutorials/intermediate/dist_tuto.html#>`__
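
For context on what the linked dist_tuto tutorial covers, below is a minimal sketch (not part of the commit) of the DDP mechanics described in the diff: each process computes gradients on its own batch, and DistributedDataParallel all-reduces them across replicas during backward(). The gloo backend, localhost address, port 29500, two-process world size, and toy Linear model are all illustrative assumptions, not anything specified by this commit.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # Assumed single-machine rendezvous; address/port are arbitrary choices.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # "gloo" is used so this sketch runs on CPU-only machines.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(10, 1))  # model is replicated on each process
    inputs = torch.randn(4, 10)          # each rank sees its own (toy) batch
    loss = model(inputs).sum()
    loss.backward()                      # DDP all-reduces gradients here

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # hypothetical: two processes standing in for two devices
    mp.spawn(run, args=(world_size,), nprocs=world_size)

After backward() returns, every replica holds identical averaged gradients, which is the property the ring all-reduce algorithm mentioned in the diff provides.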
