Update chapter4_questions&keywords.md
yyysjz1997 authored May 24, 2021
1 parent 14a00af commit beafd08
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions docs/chapter4/chapter4_questions&keywords.md
@@ -101,9 +101,9 @@
$$
Substituting this into the third expression, it simplifies to:
$$\begin{aligned}
-\nabla_{\theta}J(\theta) &=& E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)] \\
-&=& E_{\tau \sim p_{\theta}}[(\nabla_{\theta}log\pi_{\theta}(a_t|s_t))(\sum_{t=1}^Tr(s_t,a_t))] \\
-&=& \frac{1}{N}\sum_{i=1}^N[(\sum_{t=1}^T\nabla_{\theta}log \pi_{\theta}(a_{i,t}|s_{i,t}))(\sum_{t=1}^Nr(s_{i,t},a_{i,t}))]
+\nabla_{\theta}J(\theta) &= E_{\tau \sim p_{\theta}(\tau)}[{\nabla}_{\theta}logp_{\theta}(\tau)r(\tau)] \\
+&= E_{\tau \sim p_{\theta}}[(\nabla_{\theta}log\pi_{\theta}(a_t|s_t))(\sum_{t=1}^Tr(s_t,a_t))] \\
+&= \frac{1}{N}\sum_{i=1}^N[(\sum_{t=1}^T\nabla_{\theta}log \pi_{\theta}(a_{i,t}|s_{i,t}))(\sum_{t=1}^Nr(s_{i,t},a_{i,t}))]
\end{aligned}$$
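
The sampled estimator in the last line of the derivation above could be sketched in NumPy roughly as follows. This is only an illustration under stated assumptions, not code from the repository: it assumes a tabular softmax policy whose parameters `theta` are a table of logits, and it assumes both inner sums run over the T time steps of a trajectory. The names `softmax`, `policy_gradient_estimate`, and the toy `trajectories` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def policy_gradient_estimate(theta, trajectories):
    """Monte Carlo estimate of grad J(theta) from N sampled trajectories.

    theta:        (n_states, n_actions) logits of an assumed tabular softmax policy.
    trajectories: list of (states, actions, rewards) tuples, one per trajectory.
    """
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for states, actions, rewards in trajectories:
        # sum_t grad_theta log pi_theta(a_t | s_t); for softmax logits,
        # the gradient in row s is onehot(a) - pi(. | s).
        score = np.zeros_like(theta)
        for s, a in zip(states, actions):
            score[s] -= probs[s]
            score[s, a] += 1.0
        # Weight the summed score by the total reward of the trajectory.
        grad += score * np.sum(rewards)
    # Average over the N trajectories.
    return grad / len(trajectories)


# Toy data (made up for illustration): 2 states, 2 actions, N = 2, T = 2.
theta = np.zeros((2, 2))
trajectories = [
    ([0, 1], [1, 0], [1.0, 0.0]),
    ([0, 1], [0, 0], [0.0, 0.5]),
]
grad = policy_gradient_estimate(theta, trajectories)
print(grad)
```

Each row of the result sums to zero, because the softmax score `onehot(a) - pi(. | s)` sums to zero over actions. That gives a quick sanity check on any implementation of this estimator.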

- Aloof interviewer: Can you describe the tricks you know of for policy-gradient-based optimization?