Commit a70773d

author cer committed: "update"

1 parent ef04fe4 commit a70773d

File tree

1 file changed: +29 -17 lines


reinforcement_learning.ipynb

Lines changed: 29 additions & 17 deletions
@@ -22,16 +22,14 @@
 "- [7. n-step Bootstrapping](#7.-n-step-Bootstrapping)\n",
 "- [8. Planning and Learning with Tabular Methods](#8.-Planning-and-Learning-with-Tabular-Methods)\n",
 "- [9. On-policy Prediction with Approximation](#9.-On-policy-Prediction-with-Approximation)\n",
-"- [](#)\n",
-"- [](#)\n",
-"- [](#)\n",
+"- [10. On-policy Control with Approximation](#10.-On-policy-Control-with-Approximation)\n",
+"- [11. Off-policy Methods with Approximation](#11.-Off-policy-Methods-with-Approximation)\n",
+"- [12. Eligibility Traces](#12.-Eligibility-Traces)\n",
 "- [13. Policy Gradient Methods](#13.-Policy-Gradient-Methods)\n",
-"- [](#)\n",
-"- [](#)\n",
-"- [](#)\n",
-"- [](#)\n",
-"- [](#)\n",
-"\n"
+"- [14. Psychology](#14.-Psychology)\n",
+"- [15. Neuroscience](#15.-Neuroscience)\n",
+"- [16. Applications and Case Studies](#16.-Applications-and-Case-Studies)\n",
+"- [17. Frontiers](#17.-Frontiers)\n"
 ]
 },
 {
@@ -370,7 +368,7 @@
 "- Because existing estimates are used directly to update the estimate, this method is called **bootstrapping**.\n",
 "- ![](https://github.com/applenob/rl_learn/raw/master/res/td0_est.png)\n",
 "- **TD error**: $\\delta_t = R_{t+1}+\\gamma V(S_{t+1})-V(S_t)$\n",
-"- ![](https://github.com/applenob/rl_learn/raw/master/res/td0.png)\n",
+"- ![](https://github.com/applenob/rl_learn/raw/master/res/td_0.png)\n",
 "\n",
 "### Sarsa\n",
 "- An on-policy TD control method.\n",
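The TD(0) update and TD error in the context lines of this hunk can be sketched in a few lines of Python (a minimal illustration only; the function, state names, and step-size values are hypothetical and not part of the notebook):

```python
# Sketch of the tabular TD(0) update: delta_t = R_{t+1} + gamma*V(S_{t+1}) - V(S_t),
# then V(S_t) is moved toward the TD target by a step of size alpha.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: bootstrap from the current estimate V[s_next]."""
    delta = r + gamma * V[s_next] - V[s]  # TD error
    V[s] += alpha * delta                 # update the estimate for s
    return delta

V = {"A": 0.0, "B": 0.0}              # value table for two toy states
delta = td0_update(V, "A", 1.0, "B")  # observe reward 1.0 on the A -> B step
print(delta, V["A"])                  # 1.0 0.1
```

With all values initialized to zero, the TD error equals the observed reward, so `V["A"]` moves by `alpha * delta = 0.1`.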
@@ -420,17 +418,23 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 10. On-policy Control with Approximation"
+]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 11. Off-policy Methods with Approximation"
+]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 12. Eligibility Traces"
+]
 },
 {
 "cell_type": "markdown",
@@ -442,22 +446,30 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 14. Psychology"
+]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 15. Neuroscience"
+]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 16. Applications and Case Studies"
+]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": []
+"source": [
+"## 17. Frontiers"
+]
 },
 {
 "cell_type": "code",
