|
22 | 22 | "- [7. n-step Bootstrapping](#7.-n-step-Bootstrapping)\n",
|
23 | 23 | "- [8. Planning and Learning with Tabular Methods](#8.-Planning-and-Learning-with-Tabular-Methods)\n",
|
24 | 24 | "- [9. On-policy Prediction with Approximation](#9.-On-policy-Prediction-with-Approximation)\n",
|
25 |
| - "- [](#)\n", |
26 |
| - "- [](#)\n", |
27 |
| - "- [](#)\n", |
| 25 | + "- [10. On-policy Control with Approximation](#10.-On-policy-Control-with-Approximation)\n", |
| 26 | + "- [11. Off-policy Methods with Approximation](#11.-Off-policy-Methods-with-Approximation)\n", |
| 27 | + "- [12. Eligibility Traces](#12.-Eligibility-Traces)\n", |
28 | 28 | "- [13. Policy Gradient Methods](#13.-Policy-Gradient-Methods)\n",
|
29 |
| - "- [](#)\n", |
30 |
| - "- [](#)\n", |
31 |
| - "- [](#)\n", |
32 |
| - "- [](#)\n", |
33 |
| - "- [](#)\n", |
34 |
| - "\n" |
| 29 | + "- [14. Psychology](#14.-Psychology)\n", |
| 30 | + "- [15. Neuroscience](#15.-Neuroscience)\n", |
| 31 | + "- [16. Applications and Case Studies](#16.-Applications-and-Case-Studies)\n", |
| 32 | + "- [17. Frontiers](#17.-Frontiers)\n" |
35 | 33 | ]
|
36 | 34 | },
|
37 | 35 | {
|
|
370 | 368 | "- Because it updates an estimate directly from other existing estimates, this approach is called **bootstrapping**.\n",
|
371 | 369 | "- \n",
|
372 | 370 | "- **TD error**:$\\delta_t = R_{t+1}+\\gamma V(S_{t+1})-V(S_t)$\n",
|
373 |
| - "- \n", |
| 371 | + "- \n", |
374 | 372 | "\n",
|
375 | 373 | "### Sarsa\n",
|
376 | 374 | "- An on-policy TD control method.\n",
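The TD(0) update and TD error $\delta_t = R_{t+1}+\gamma V(S_{t+1})-V(S_t)$ from the notes above can be sketched as a single tabular update. This is a minimal illustration, assuming the notebook's code cells are Python. The function `td0_update`, the dict-based value table, and the states `"A"`/`"B"` are hypothetical and not part of the original notebook. A terminal next state is assumed to have value 0.

```python
from collections import defaultdict


def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """One tabular TD(0) update of V(s); returns the TD error delta_t.

    Hypothetical helper for illustration only.
    """
    # Bootstrapping: the target uses the current estimate V(S_{t+1})
    # instead of waiting for the full return. Terminal states are valued 0.
    v_next = 0.0 if terminal else V[s_next]
    delta = r + gamma * v_next - V[s]  # delta_t = R_{t+1} + gamma*V(S_{t+1}) - V(S_t)
    V[s] += alpha * delta              # V(S_t) <- V(S_t) + alpha * delta_t
    return delta


# Usage on a single hand-made transition A -> B with reward 1 (hypothetical states).
V = defaultdict(float)
V["B"] = 0.5
delta = td0_update(V, "A", 1.0, "B", alpha=0.5, gamma=0.9)
print(delta, V["A"])
```

Only one transition is needed per update, which is what makes TD methods fully incremental, unlike Monte Carlo methods that wait for the end of the episode.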
|
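Sarsa, the on-policy TD control method introduced above, applies the same TD error to action values $Q(S_t, A_t)$, using the next action $A_{t+1}$ chosen by the same ε-greedy policy that is being learned. The sketch below is a minimal illustration, assuming Python; the corridor environment (`step`, `GOAL`, `ACTIONS`) and all function names are hypothetical, invented only to make the example runnable.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical actions: move left / move right
GOAL = 4            # hypothetical corridor with states 0..4; state 4 is terminal


def step(s, a):
    """Hypothetical corridor: reward -1 per step, episode ends at GOAL."""
    s_next = min(max(s + a, 0), GOAL)
    return s_next, -1.0, s_next == GOAL


def epsilon_greedy(Q, s, epsilon, rng):
    """Behaviour policy; Sarsa evaluates and improves this same policy (on-policy)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def sarsa(episodes=500, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            if done:
                target = r  # Q of a terminal state is taken to be 0
            else:
                # A_{t+1} comes from the same epsilon-greedy policy
                a_next = epsilon_greedy(Q, s_next, epsilon, rng)
                target = r + gamma * Q[(s_next, a_next)]
            # TD error: R_{t+1} + gamma*Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if not done:
                s, a = s_next, a_next
    return Q


Q = sarsa()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)
```

Because rewards are negative and Q starts at 0, untried actions look optimistic, which pushes early exploration even before ε-greedy sampling kicks in.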
|
420 | 418 | {
|
421 | 419 | "cell_type": "markdown",
|
422 | 420 | "metadata": {},
|
423 |
| - "source": [] |
| 421 | + "source": [ |
| 422 | + "## 10. On-policy Control with Approximation" |
| 423 | + ] |
424 | 424 | },
|
425 | 425 | {
|
426 | 426 | "cell_type": "markdown",
|
427 | 427 | "metadata": {},
|
428 |
| - "source": [] |
| 428 | + "source": [ |
| 429 | + "## 11. Off-policy Methods with Approximation" |
| 430 | + ] |
429 | 431 | },
|
430 | 432 | {
|
431 | 433 | "cell_type": "markdown",
|
432 | 434 | "metadata": {},
|
433 |
| - "source": [] |
| 435 | + "source": [ |
| 436 | + "## 12. Eligibility Traces" |
| 437 | + ] |
434 | 438 | },
|
435 | 439 | {
|
436 | 440 | "cell_type": "markdown",
|
|
442 | 446 | {
|
443 | 447 | "cell_type": "markdown",
|
444 | 448 | "metadata": {},
|
445 |
| - "source": [] |
| 449 | + "source": [ |
| 450 | + "## 14. Psychology" |
| 451 | + ] |
446 | 452 | },
|
447 | 453 | {
|
448 | 454 | "cell_type": "markdown",
|
449 | 455 | "metadata": {},
|
450 |
| - "source": [] |
| 456 | + "source": [ |
| 457 | + "## 15. Neuroscience" |
| 458 | + ] |
451 | 459 | },
|
452 | 460 | {
|
453 | 461 | "cell_type": "markdown",
|
454 | 462 | "metadata": {},
|
455 |
| - "source": [] |
| 463 | + "source": [ |
| 464 | + "## 16. Applications and Case Studies" |
| 465 | + ] |
456 | 466 | },
|
457 | 467 | {
|
458 | 468 | "cell_type": "markdown",
|
459 | 469 | "metadata": {},
|
460 |
| - "source": [] |
| 470 | + "source": [ |
| 471 | + "## 17. Frontiers" |
| 472 | + ] |
461 | 473 | },
|
462 | 474 | {
|
463 | 475 | "cell_type": "code",
|
|