Commit 78bb8c7

documentation touchups (#4099)
* doc updates: the Getting Started page now uses a consistent run-id; re-order the Create New docs to have less back-and-forth between Unity and a text editor
* add a link explaining decisions where we tell the reader to modify its parameter
1 parent 8f0c8c7 · commit 78bb8c7

File tree

2 files changed (+31, -29 lines)


docs/Getting-Started.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -236,7 +236,7 @@ If you've quit the training early using `Ctrl+C` and want to resume training,
 run the same command again, appending the `--resume` flag:
 
 ```sh
-mlagents-learn config/ppo/3DBall.yaml --run-id=firstRun --resume
+mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume
 ```
 
 Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where
````

docs/Learning-Environment-Create-New.md

Lines changed: 30 additions & 28 deletions
````diff
@@ -269,7 +269,7 @@ component, `rBody`, using the `Rigidbody.AddForce` function:
 Vector3 controlSignal = Vector3.zero;
 controlSignal.x = action[0];
 controlSignal.z = action[1];
-rBody.AddForce(controlSignal * speed);
+rBody.AddForce(controlSignal * forceMultiplier);
 ```
 
 #### Rewards
````

````diff
@@ -313,14 +313,14 @@ With the action and reward logic outlined above, the final version of the
 `OnActionReceived()` function looks like:
 
 ```csharp
-public float speed = 10;
+public float forceMultiplier = 10;
 public override void OnActionReceived(float[] vectorAction)
 {
     // Actions, size = 2
     Vector3 controlSignal = Vector3.zero;
     controlSignal.x = vectorAction[0];
     controlSignal.z = vectorAction[1];
-    rBody.AddForce(controlSignal * speed);
+    rBody.AddForce(controlSignal * forceMultiplier);
 
     // Rewards
     float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);
````

````diff
@@ -340,33 +340,9 @@ public override void OnActionReceived(float[] vectorAction)
 }
 ```
 
-Note the `speed` class variable is defined before the function. Since `speed` is
+Note the `forceMultiplier` class variable is defined before the function. Since `forceMultiplier` is
 public, you can set the value from the Inspector window.
 
-## Final Editor Setup
-
-Now, that all the GameObjects and ML-Agent components are in place, it is time
-to connect everything together in the Unity Editor. This involves changing some
-of the Agent Component's properties so that they are compatible with our Agent
-code.
-
-1. Select the **RollerAgent** GameObject to show its properties in the Inspector
-   window.
-1. Add the `Decision Requester` script with the Add Component button from the
-   RollerAgent Inspector.
-1. Change **Decision Period** to `10`.
-1. Drag the Target GameObject from the Hierarchy window to the RollerAgent
-   Target field.
-1. Add the `Behavior Parameters` script with the Add Component button from the
-   RollerAgent Inspector.
-1. Modify the Behavior Parameters of the Agent :
-   - `Behavior Name` to _RollerBall_
-   - `Vector Observation` > `Space Size` = 8
-   - `Vector Action` > `Space Type` = **Continuous**
-   - `Vector Action` > `Space Size` = 2
-
-Now you are ready to test the environment before training.
-
 ## Testing the Environment
 
 It is always a good idea to first test your environment by controlling the Agent
````

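Aside, on the note kept in this hunk: in Unity, any public serializable field on a component (an `Agent` is ultimately a `MonoBehaviour`) shows up in the Inspector, which is why `forceMultiplier` can be tuned without touching code. A minimal sketch of that behavior, with a class name invented purely for illustration:

```csharp
// Minimal illustration (not part of this commit): public fields are serialized and
// shown in the Unity Inspector; [SerializeField] exposes a private field the same way.
using UnityEngine;

public class ForceMultiplierExample : MonoBehaviour
{
    public float forceMultiplier = 10f;              // visible and editable in the Inspector
    [SerializeField] float tunedInEditorOnly = 10f;  // private in code, still editable in the Inspector
}
```
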
````diff
@@ -392,6 +368,30 @@ the platform. Make sure that there are no errors displayed in the Unity Editor
 Console window and that the Agent resets when it reaches its target or falls
 from the platform.
 
+## Final Editor Setup
+
+Now, that all the GameObjects and ML-Agent components are in place, it is time
+to connect everything together in the Unity Editor. This involves changing some
+of the Agent Component's properties so that they are compatible with our Agent
+code.
+
+1. Select the **RollerAgent** GameObject to show its properties in the Inspector
+   window.
+1. Add the `Decision Requester` script with the Add Component button from the
+   RollerAgent Inspector.
+1. Change **Decision Period** to `10`. For more information on decisions, see [the Agent documentation](Learning-Environment-Design-Agents.md#decisions)
+1. Drag the Target GameObject from the Hierarchy window to the RollerAgent
+   Target field.
+1. Add the `Behavior Parameters` script with the Add Component button from the
+   RollerAgent Inspector.
+1. Modify the Behavior Parameters of the Agent :
+   - `Behavior Name` to _RollerBall_
+   - `Vector Observation` > `Space Size` = 8
+   - `Vector Action` > `Space Type` = **Continuous**
+   - `Vector Action` > `Space Size` = 2
+
+Now you are ready to test the environment before training.
+
 ## Training the Environment
 
 The process is the same as described in the
````

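The Behavior Parameters values in the relocated section above are meant to match the agent code written earlier in the tutorial. As a rough sketch (an assumption based on the RollerBall example; the observation code is defined elsewhere in Learning-Environment-Create-New.md, not in this diff), the 8-value observation space comes from the target position, the agent position, and the agent's x/z velocity, and the 2 continuous actions are the x/z force components shown in `OnActionReceived()`:

```csharp
// Sketch only: how the Inspector settings map to the agent code (ML-Agents Release ~3 era API).
// Reward, reset, and heuristic logic from the tutorial are omitted here.
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RollerAgent : Agent
{
    public Transform Target;
    public float forceMultiplier = 10;
    Rigidbody rBody;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();
    }

    // 3 (target position) + 3 (agent position) + 2 (x/z velocity) = 8
    // -> Behavior Parameters > Vector Observation > Space Size = 8
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);
        sensor.AddObservation(rBody.velocity.x);
        sensor.AddObservation(rBody.velocity.z);
    }

    // 2 continuous values -> Vector Action > Space Type = Continuous, Space Size = 2
    public override void OnActionReceived(float[] vectorAction)
    {
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = vectorAction[0];
        controlSignal.z = vectorAction[1];
        rBody.AddForce(controlSignal * forceMultiplier);
    }
}
```

If the Space Size values in the Inspector drift from what the code actually observes or consumes, ML-Agents will typically report a size mismatch at runtime, so the two are best kept in sync.
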
````diff
@@ -427,6 +427,8 @@ behaviors:
     summary_freq: 10000
 ```
 
+Hyperparameters are explained in [the training configuration file documentation](Training-Configuration-File.md)
+
 Since this example creates a very simple training environment with only a few
 inputs and outputs, using small batch and buffer sizes speeds up the training
 considerably. However, if you add more complexity to the environment or change
````
