1 parent 7325933 commit e86b9b6
notes/intro_note_07.md
@@ -41,15 +41,15 @@
- Use the action-value estimates of all **leaf nodes** to update the action-value function.
- Return: $G_{t:t+n} = R_{t+1} + \gamma \sum_{a\neq A_{t+1}}\pi(a|S_{t+1})Q_{t+n-1}(S_{t+1}, a) + \gamma\pi(A_{t+1}|S_{t+1})G_{t+1:t+n}$
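
The recursive return above can be unrolled backwards from the last step. The following is a minimal sketch, not the notes' own implementation. It assumes a non-terminal segment of length $n$, tabular values stored as plain lists, and that the innermost one-step return is the full expectation $R_{t+n} + \gamma\sum_a \pi(a|S_{t+n})Q(S_{t+n}, a)$. The function name `tree_backup_return` and all of its parameters are hypothetical.

```python
def tree_backup_return(rewards, next_pi, next_q, next_actions, gamma):
    """n-step tree-backup return G_{t:t+n} (illustrative sketch).

    rewards[k]      : R_{t+k+1}, for k = 0..n-1
    next_pi[k][a]   : pi(a | S_{t+k+1})
    next_q[k][a]    : Q(S_{t+k+1}, a)
    next_actions[k] : A_{t+k+1}, needed only for k = 0..n-2
    Assumes no terminal state is reached inside the segment.
    """
    n = len(rewards)
    # Innermost step: every action at S_{t+n} is a leaf, so take the expectation.
    g = rewards[-1] + gamma * sum(p * q for p, q in zip(next_pi[-1], next_q[-1]))
    # Walk backwards, applying the recursion once per earlier step.
    for k in range(n - 2, -1, -1):
        taken = next_actions[k]
        pi_k, q_k = next_pi[k], next_q[k]
        # Leaf nodes: actions not taken contribute pi(a|s) * Q(s, a).
        leaves = sum(pi_k[a] * q_k[a] for a in range(len(pi_k)) if a != taken)
        # The taken action contributes pi(A|s) times the deeper return.
        g = rewards[k] + gamma * leaves + gamma * pi_k[taken] * g
    return g


if __name__ == "__main__":
    g = tree_backup_return(
        rewards=[1.0, 2.0],
        next_pi=[[0.5, 0.5], [1.0, 0.0]],
        next_q=[[2.0, 4.0], [10.0, 0.0]],
        next_actions=[0],
        gamma=0.5,
    )
    print(g)
```

No action is sampled at the leaves, so no importance-sampling ratio appears anywhere in the computation.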
-
+
-
+
## n-step $Q(\sigma)$
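- $\sigma$ indicates whether a full sample is used: $\sigma = 1$ samples the action, $\sigma = 0$ takes the expectation over actions without sampling.
- Return: $G_{t:h} = R_{t+1} + \gamma(\sigma_{t+1}\rho_{t+1}+(1-\sigma_{t+1})\pi(A_{t+1}|S_{t+1}))(G_{t+1:h}-Q_{h-1}(S_{t+1}, A_{t+1})) + \gamma \bar V_{h-1}(S_{t+1})$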
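
The $Q(\sigma)$ return can be computed with the same backward pass. The sketch below is illustrative only. It assumes the recursion bottoms out at $G_{h:h} = Q(S_h, A_h)$ with $h$ before termination, that $\bar V(s) = \sum_a \pi(a|s)Q(s, a)$, and that $\rho$ is the per-step importance-sampling ratio $\pi(A|S)/b(A|S)$ for a behaviour policy $b$. The function name `q_sigma_return` and its parameters are hypothetical.

```python
def q_sigma_return(rewards, next_pi, next_q, next_actions, sigmas, rhos, gamma):
    """n-step Q(sigma) return G_{t:h} with h = t + n (illustrative sketch).

    rewards[k]      : R_{t+k+1}, for k = 0..n-1
    next_pi[k][a]   : pi(a | S_{t+k+1})
    next_q[k][a]    : Q(S_{t+k+1}, a)
    next_actions[k] : A_{t+k+1}
    sigmas[k]       : sigma_{t+k+1}, degree of sampling in [0, 1]
    rhos[k]         : rho_{t+k+1}, importance-sampling ratio (assumed given)
    Assumes no terminal state is reached inside the segment.
    """
    n = len(rewards)
    # Base case of the recursion: G_{h:h} = Q(S_h, A_h).
    g = next_q[-1][next_actions[-1]]
    for k in range(n - 1, -1, -1):
        a = next_actions[k]
        pi_k, q_k = next_pi[k], next_q[k]
        # Expected value of the next state under the target policy.
        v_bar = sum(p * q for p, q in zip(pi_k, q_k))
        # Blend between sampling (sigma = 1) and expectation (sigma = 0).
        weight = sigmas[k] * rhos[k] + (1.0 - sigmas[k]) * pi_k[a]
        g = rewards[k] + gamma * weight * (g - q_k[a]) + gamma * v_bar
    return g


if __name__ == "__main__":
    common = dict(
        rewards=[1.0, 2.0],
        next_pi=[[0.5, 0.5], [1.0, 0.0]],
        next_q=[[2.0, 4.0], [10.0, 0.0]],
        next_actions=[0, 0],
        rhos=[1.0, 1.0],
        gamma=0.5,
    )
    print(q_sigma_return(sigmas=[0.0, 0.0], **common))  # pure expectation
    print(q_sigma_return(sigmas=[1.0, 1.0], **common))  # full sampling
```

Setting every $\sigma$ to 0 reduces the weight to $\pi(A_{t+1}|S_{t+1})$, so the result coincides with the tree-backup return on the same data. Setting every $\sigma$ to 1 leaves only the importance-sampling ratio $\rho$ as the weight.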
-
+
-
+