-Note that the first equation of system {eq}`eq:impliedq` presumes that after the agent has accepted an offer, he will not have the objection to reject that same offer in the future.
+Note that the first equation of system {eq}`eq:impliedq` presumes that after the agent has accepted an offer, he will not have the option to reject that same offer in the future.

These equations are aligned with the Bellman equation for the worker's optimal value function that we studied in {doc}`this quantecon lecture <mccall_model>`.
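
To make that connection concrete, here is a minimal sketch (not the lecture's code) that iterates directly on the Q system implied by {eq}`eq:impliedq` on a discrete wage grid. The grid, offer distribution, unemployment compensation `c`, and discount factor `beta` below are illustrative placeholders, and the Q-table columns follow the reject/accept ordering used later in the lecture.

```python
# Minimal sketch: solve the exact Q system from {eq}`eq:impliedq` by fixed-point iteration.
# All primitives below (wage grid, offer probabilities, c, beta) are illustrative placeholders.
import numpy as np

n, c, beta = 51, 25.0, 0.99
w_grid = np.linspace(10.0, 60.0, n)        # possible wage offers
q = np.ones(n) / n                         # offer distribution F: uniform for illustration

Q = np.zeros((n, 2))                       # column 0: reject, column 1: accept
for _ in range(5_000):
    accept = w_grid + beta * Q[:, 1]                       # an accepted offer is kept forever
    reject = np.full(n, c + beta * q @ Q.max(axis=1))      # expectation over F(w')
    Q_new = np.column_stack([reject, accept])
    if np.max(np.abs(Q_new - Q)) < 1e-8:                   # fixed point reached
        Q = Q_new
        break
    Q = Q_new

v = Q.max(axis=1)                          # the worker's optimal value function, as in mccall_model
reservation_wage = w_grid[np.argmax(Q[:, 1] >= Q[:, 0])]
```

At the fixed point, $\max_a Q(w, a)$ coincides with the McCall worker's value function $v(w)$, which is the sense in which the two formulations are aligned.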
281
281
@@ -326,7 +326,7 @@ $$ (eq:probtosample1)
Notice the integral over $F(w')$ on the second line.
-Erasing the integral sign sets the stage for an illegitmate argument that can get us started thinking about Q-learning.
+Erasing the integral sign sets the stage for an illegitimate argument that can get us started thinking about Q-learning.

Thus, construct a difference equation system that keeps the first equation of {eq}`eq:probtosample1`
but replaces the second by removing integration over $F (w')$:
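
As a hedged illustration of what removing the integral buys us: once a single draw $w' \sim F$ is in hand, the right-hand sides of the resulting system can be evaluated without knowing $F$ at all. The helper below is a sketch under assumed names (a two-column Q-table indexed by a wage-grid state `s`), not the lecture's implementation; averaging such sampled targets over many draws recovers the integral, which is the statistical rationale for the temporal-difference updates introduced below.

```python
# Sketch: integral-free ("sampled") counterpart of the system built from {eq}`eq:probtosample1`.
# The expectation over F(w') in the reject equation is replaced by one sampled offer w'.
# Names and primitives are illustrative, not the lecture's.
import numpy as np

def sampled_targets(Q, s, s_prime, w_grid, c, beta):
    accept_target = w_grid[s] + beta * Q[s, 1]        # first equation kept unchanged
    reject_target = c + beta * Q[s_prime, :].max()    # single draw w' replaces the integral
    return reject_target, accept_target

# one sampled evaluation with placeholder primitives
rng = np.random.default_rng(0)
n, c, beta = 51, 25.0, 0.99
w_grid = np.linspace(10.0, 60.0, n)
Q = np.zeros((n, 2))                                  # column 0: reject, column 1: accept
s, s_prime = 3, int(rng.integers(n))                  # current offer and a fresh draw from F
print(sampled_targets(Q, s, s_prime, w_grid, c, beta))
```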
@@ -456,7 +456,7 @@ pseudo-code for our McCall worker to do Q-learning:
4. Update the state associated with the chosen action and compute $\widetilde{TD}$ according to {eq}`eq:old4` and update $\widetilde{Q}$ according to {eq}`eq:old3`.
-5. Either draw a new state $w'$ if required or else take existing wage if and update the Q-table again according to {eq}`eq:old3`.
+5. Either draw a new state $w'$ if required or else take the existing wage and update the Q-table again according to {eq}`eq:old3`.

6. Stop when the old and new Q-tables are close enough, i.e., $\lVert\tilde{Q}^{new}-\tilde{Q}^{old}\rVert_{\infty}\leq\delta$ for given $\delta$ or if the worker keeps accepting for $T$ periods for a prescribed $T$.
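
To fix ideas, here is a hedged sketch of a single Q-learning episode organized around steps 4–6 of the pseudo-code above. Steps 1–3 are not shown in this excerpt, so the ε-greedy action choice, the learning rate `alpha`, and the stopping thresholds are illustrative stand-ins rather than the lecture's exact choices; the accept-action target takes a max over both actions, i.e. it allows the quit option discussed later.

```python
# Hedged sketch of one Q-learning episode for the McCall worker (steps 4-6 above).
# alpha, eps, and the thresholds are illustrative; the TD targets mirror the sampled
# (integral-free) system, with the accept target allowing the quit option discussed later.
import numpy as np

def run_episode(Q, w_grid, q, c, beta, alpha=0.1, eps=0.1,
                max_len=10_000, delta=1e-5, T=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(w_grid)
    s = rng.choice(n, p=q)                       # initial offer drawn from F
    accept_run = 0
    for _ in range(max_len):
        Q_old = Q.copy()
        # epsilon-greedy choice between reject (column 0) and accept (column 1)
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s, :]))
        if a == 1:                               # accept: keep the current wage next period
            td = w_grid[s] + beta * Q[s, :].max() - Q[s, 1]
            Q[s, 1] += alpha * td                # update as in {eq}`eq:old3`
            accept_run += 1
        else:                                    # reject: draw a new offer w' from F
            s_prime = rng.choice(n, p=q)
            td = c + beta * Q[s_prime, :].max() - Q[s, 0]
            Q[s, 0] += alpha * td
            s = s_prime
            accept_run = 0
        # step 6: stop when the table stops moving or the worker keeps accepting
        if np.max(np.abs(Q - Q_old)) <= delta or accept_run >= T:
            break
    return Q
```

Because only one (state, action) entry changes per period in this sketch, the sup-norm test in step 6 reduces to checking the size of the latest scaled temporal difference.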
@@ -474,7 +474,7 @@ The Q-table is updated via temporal difference learning.
We iterate this until convergence of the Q-table or the maximum length of an episode is reached.
-Multiple episodes allow the agent to start afresh and visit states that she was less likely to visit from the terminal state of a previos episode.
+Multiple episodes allow the agent to start afresh and visit states that she was less likely to visit from the terminal state of a previous episode.

For example, an agent who has accepted a wage offer based on her Q-table will be less likely to draw a new offer from other parts of the wage distribution.
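
Sketching that idea on top of the episode routine above: each new episode redraws the worker's initial offer, so parts of the wage grid that she would rarely revisit after accepting in an earlier episode still get sampled and updated. The episode count and reseeding scheme below are illustrative choices.

```python
# Sketch of an outer loop over episodes, reusing run_episode from the sketch above.
# Each episode restarts from a freshly drawn offer; episode count and reseeding are illustrative.
import numpy as np

def train(n_episodes, w_grid, q, c, beta, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((len(w_grid), 2))               # column 0: reject, column 1: accept
    for _ in range(n_episodes):
        Q = run_episode(Q, w_grid, q, c, beta, seed=int(rng.integers(1_000_000)))
    return Q
```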
@@ -488,7 +488,7 @@ For simplicity and convenience, we let `s` represent the state index between $0$
The first column of the Q-table represents the value associated with rejecting the wage and the second represents accepting the wage.
-We use `numba` compilation to accelerate computations.
+We use JAX compilation to accelerate computations.

```{code-cell} ipython3
class QlearningMcCall(NamedTuple):
@@ -746,7 +746,7 @@ This is an option that the McCall worker described in {doc}`this quantecon lectu
See {cite}`Ljungqvist2012`, chapter 6 on search, for a proof.
But in the context of Q-learning, giving the worker the option to quit and get unemployment compensation while
-unemployed turns out to accelerate the learning process by promoting experimentation vis a vis premature
+unemployed turns out to accelerate the learning process by promoting experimentation versus premature
exploitation only.
To illustrate this, we'll amend our formulas for temporal differences to forbid an employed worker from quitting a job she had accepted earlier.
@@ -762,7 +762,7 @@ $$ (eq:temp-diff)
It turns out that formulas {eq}`eq:temp-diff` combined with our Q-learning recursion {eq}`eq:old3` can lead our agent to eventually learn the optimal value function as well as in the case where an option to redraw can be exercised.
-But learning is slower because an agent who ends up accepting a wage offer prematurally loses the option to explore new states in the same episode and to adjust the value associated with that state.
+But learning is slower because an agent who ends up accepting a wage offer prematurely loses the option to explore new states in the same episode and to adjust the value associated with that state.

This can lead to inferior outcomes when the number of epochs/episodes is low.
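
For concreteness, here is a rough sketch of how the no-quitting amendment could be coded. The authoritative formulas are {eq}`eq:temp-diff`, which this excerpt does not reproduce, so the targets below illustrate the idea rather than restate the lecture's definition: the only change relative to the earlier sketch is that an employed worker's continuation value is $\widetilde{Q}(w,\textrm{accept})$ itself rather than a max over both actions.

```python
# Rough sketch of the no-quit temporal differences described above (see {eq}`eq:temp-diff`
# for the lecture's exact formulas).  Once a wage is accepted, the accept target no longer
# maximizes over actions, so the worker cannot "quit" back into unemployment.
import numpy as np

def td_no_quit(Q, s, s_prime, w_grid, c, beta):
    td_accept = w_grid[s] + beta * Q[s, 1] - Q[s, 1]      # employed worker keeps the job
    td_reject = c + beta * Q[s_prime, :].max() - Q[s, 0]  # unemployed worker samples w' ~ F
    return td_reject, td_accept
```

Under this restriction an agent who accepts early keeps revisiting the same $(w, \textrm{accept})$ entry for the rest of the episode, which is exactly the slower-learning channel described above.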