Commit 4983397
Better documentation and warning (pytorch#13946)
Summary:
This is to address pytorch#12603
Pull Request resolved: pytorch#13946

Differential Revision: D13055254

Pulled By: teng-li

fbshipit-source-id: 20a206ebd3456eac9dc50584664c4bca3ee955d1
teng-li authored and facebook-github-bot committed Nov 14, 2018
1 parent 143ba72 commit 4983397
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions torch/nn/parallel/distributed.py
@@ -122,6 +122,17 @@ class DistributedDataParallel(Module):
won't be invoked anymore, unless the hooks are initialized in the
:meth:`forward` method.
.. warning::
You should never change your model's parameters after wrapping
the model with DistributedDataParallel. The DistributedDataParallel
constructor registers gradient reduction functions on all of the
model's parameters at construction time. If you add or replace
parameters after construction, this is not supported and can lead to
unexpected behavior, since the gradient reduction functions for those
parameters might never be called.
.. note::
Parameters are never broadcast between processes. The module performs
an all-reduce step on gradients and assumes that they will be modified
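For context, a minimal sketch of the pattern the new warning describes, assuming a single process per GPU with an already-initialized process group; `MyModel` and the layer sizes are hypothetical and only for illustration, not part of the patch:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Assumes each worker process has already called something like
# dist.init_process_group(backend="nccl", ...) and selected its GPU.

class MyModel(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

model = MyModel().cuda()

# Finish any parameter changes BEFORE wrapping: the DistributedDataParallel
# constructor registers gradient reduction functions on the parameters that
# exist at this moment.
ddp_model = DistributedDataParallel(
    model, device_ids=[torch.cuda.current_device()])

# Unsupported: adding or replacing parameters after wrapping. A parameter
# added here has no gradient reduction function registered, so its gradients
# would not be synchronized across processes.
# model.extra = nn.Linear(10, 10)  # do NOT do this after wrapping
```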