Documented neural network
MikeS96 committed Jun 13, 2020
1 parent 0368f1d commit ad7e79e
Showing 1 changed file with 22 additions and 5 deletions.
27 changes: 22 additions & 5 deletions LunarLander.ipynb
@@ -22,8 +22,7 @@
"Image and Text taken from [Official documentaiton Lunar Lander](https://gym.openai.com/envs/LunarLander-v2/).\n",
"\n",
"\n",
"Neural Networks for function approximation\n",
"[Section 9.7 of Reinforment Learning an Introduction](http://www.incompleteideas.net/book/RLbook2018.pdf#page=246)\n",
"\n",
"\n",
"Episodis semi gradient control\n",
"\n",
@@ -33,9 +32,7 @@
"\n",
"$$w \\leftarrow w + \\alpha[R_{t+1} + \\gamma \\sum_{a'}\\pi(a' | S_{t+1}) \\hat{q}(S_{t+1}, a', w) - \\hat{q}(S_t, A_t, w)]\\nabla \\hat{q}(S_t, A_t, w)$$\n",
"\n",
"Fucntion with neural networks\n",
"\n",
"$$ q_\\pi(s) \\approx \\hat{q}(s, a, w) = NN(s,a,w) $$\n",
"\n",
"episodic semi gradient control\n",
"[Section 10.1 of Reinforment Learning an Introduction](http://www.incompleteideas.net/book/RLbook2018.pdf#page=265)\n",
@@ -182,6 +179,26 @@
"print(\"The Observations at a given timestep are {0}\\n\".format(env.observation_space.sample()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing action-values with neural networks\n",
"\n",
"To compute action-values, a feed-forward neural network is used. This apporach allows us to compute action-values using the weights of the neural network.\n",
"\n",
"$$ q_\\pi(s) \\approx \\hat{q}(s, a, w) = NN(s,a,w) $$\n",
"\n",
"Neural networks are used to solve the control problem in RL, particularly, this networl is going to be used with an Episodic Semi-gradient Expected Sarsa agent. The inputs of the network are the states, which in this case are eight, the number of hidden layers and hidden units can vary. Finally, the number of inputs is equals to the number of actions in the problem, therefore, four output nodes are needed in the final layer. Each output node corresponds to the action value of a particular action.\n",
"\n",
"<img src=\"./assets/nn.png\" width=\"380\" />\n",
"\n",
"\n",
"For further information about Neural Networks for function approximation see [Section 9.7 of Reinforment Learning an Introduction](http://www.incompleteideas.net/book/RLbook2018.pdf#page=246)\n",
"\n",
"Image taken from [Reinforcement learning specialization, C4L5S1](https://www.coursera.org/learn/complete-reinforcement-learning-system/lecture/CVH40/meeting-with-adam-getting-the-agent-details-right)"
]
},
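As a rough illustration of the architecture described in the cell above (not the notebook's actual implementation), here is a minimal numpy sketch of a feed-forward action-value network with 8 state inputs and 4 action-value outputs; the single hidden layer and its size of 256 units are assumptions:

```python
import numpy as np

state_dim, hidden_dim, num_actions = 8, 256, 4  # hidden_dim is an assumed value
rng = np.random.default_rng(0)

weights = {
    "W1": rng.normal(0, 0.1, (state_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "W2": rng.normal(0, 0.1, (hidden_dim, num_actions)),
    "b2": np.zeros(num_actions),
}

def action_values(state, w):
    """Forward pass: returns q_hat(s, a, w) for every action a."""
    h = np.maximum(0.0, state @ w["W1"] + w["b1"])  # ReLU hidden layer
    return h @ w["W2"] + w["b2"]                    # one output node per action

# Example: action values for a random 8-dimensional Lunar Lander state
q = action_values(rng.normal(size=state_dim), weights)
print(q.shape)  # (4,)
```

The greedy (or epsilon-greedy) action is then taken over these four outputs, which is what lets a single forward pass score every action at once.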
{
"cell_type": "code",
"execution_count": 3,
@@ -773,7 +790,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
" 16%|█▌ | 81/500 [09:31<1:16:33, 10.96s/it]"
" 93%|█████████▎| 463/500 [29:19<01:25, 2.32s/it] "
]
}
],
