
Commit 211832b

rectify grammatical errors in the lecture
1 parent c575e39 commit 211832b

File tree

1 file changed: +10 -10 lines changed

lectures/mccall_q.md

Lines changed: 10 additions & 10 deletions
@@ -25,7 +25,7 @@ The Q-learning algorithm combines ideas from

* a recursive version of least squares known as [temporal difference learning](https://en.wikipedia.org/wiki/Temporal_difference_learning).

-This lecture applies a Q-learning algorithm to the situation faced by a McCall worker.
+This lecture applies a Q-learning algorithm to the situation faced by a McCall worker.

This lecture also considers the case where a McCall worker is given an option to quit the current job.

@@ -242,7 +242,7 @@ print(valfunc_VFI)

## Implied quality function $Q$

-A **quality function** $Q$ map state-action pairs into optimal values.
+A **quality function** $Q$ maps state-action pairs into optimal values.

They are tightly linked to optimal value functions.

@@ -275,7 +275,7 @@ Q\left(w,\text{reject}\right) & =c+\beta\int\max_{\text{accept, reject}}\left\{

$$ (eq:impliedq)

-Note that the first equation of system {eq}`eq:impliedq` presumes that after the agent has accepted an offer, he will not have the objection to reject that same offer in the future.
+Note that the first equation of system {eq}`eq:impliedq` presumes that after the agent has accepted an offer, he will not have the option to reject that same offer in the future.

These equations are aligned with the Bellman equation for the worker's optimal value function that we studied in {doc}`this quantecon lecture <mccall_model>`.

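For context, the hunk above touches the lecture's implied quality function, which maps state-action pairs into optimal values. Below is a minimal numerical sketch of that system, not part of the commit: `w_grid`, `q`, `c`, and `beta` are stand-in names for the lecture's wage grid, offer probabilities, unemployment compensation, and discount factor, and the accept equation assumes (as the edited sentence says) that an accepted offer is never rejected later.

```python
# Illustrative sketch only (not the lecture's code): solve the implied
# Q system by fixed-point iteration on a discrete wage grid.
import numpy as np

def implied_Q(w_grid, q, c, beta, tol=1e-8):
    n = len(w_grid)
    Q = np.zeros((n, 2))                            # column 0: reject, column 1: accept
    while True:
        Q_new = np.empty_like(Q)
        Q_new[:, 1] = w_grid + beta * Q[:, 1]       # accept: keep the accepted wage forever
        Q_new[:, 0] = c + beta * q @ Q.max(axis=1)  # reject: expectation over next draw w' ~ F
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
```

Usage would be something like `Q = implied_Q(w_grid, q, c, beta)`, after which `Q.max(axis=1)` recovers an optimal value function on the grid.
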
@@ -326,7 +326,7 @@ $$ (eq:probtosample1)

Notice the integral over $F(w')$ on the second line.

-Erasing the integral sign sets the stage for an illegitmate argument that can get us started thinking about Q-learning.
+Erasing the integral sign sets the stage for an illegitimate argument that can get us started thinking about Q-learning.

Thus, construct a difference equation system that keeps the first equation of {eq}`eq:probtosample1`
but replaces the second by removing integration over $F (w')$:

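Spelling out the step that this hunk's surrounding text describes (a sketch only, since {eq}`eq:probtosample1` itself lies outside the diff): keeping the first equation and dropping the integral over $F(w')$ from the second yields a system in which a single draw $w'$ replaces the expectation, roughly

$$
\begin{aligned}
Q(w,\text{accept}) &= w + \beta \max_{\text{accept, reject}}\left\{Q(w,\text{accept}),\,Q(w,\text{reject})\right\} \\
Q(w,\text{reject}) &= c + \beta \max_{\text{accept, reject}}\left\{Q(w',\text{accept}),\,Q(w',\text{reject})\right\}, \qquad w' \sim F .
\end{aligned}
$$

The gap between this sampled system and the original one is what the temporal-difference updates described later average away over many draws.
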
@@ -456,7 +456,7 @@ pseudo-code for our McCall worker to do Q-learning:

4. Update the state associated with the chosen action and compute $\widetilde{TD}$ according to {eq}`eq:old4` and update $\widetilde{Q}$ according to {eq}`eq:old3`.

-5. Either draw a new state $w'$ if required or else take existing wage if and update the Q-table again according to {eq}`eq:old3`.
+5. Either draw a new state $w'$ if required or else take the existing wage and update the Q-table again according to {eq}`eq:old3`.

6. Stop when the old and new Q-tables are close enough, i.e., $\lVert\tilde{Q}^{new}-\tilde{Q}^{old}\rVert_{\infty}\leq\delta$ for given $\delta$ or if the worker keeps accepting for $T$ periods for a prescribed $T$.

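As a companion to steps 4 and 5 above, here is a stripped-down sketch of a single temporal-difference update of the Q-table. It is not the lecture's code: `alpha` is a learning rate introduced here for illustration, and the column convention (first column reject, second accept) follows the lecture.

```python
# Illustrative sketch of one Q-learning update, in the spirit of eq:old4 (the TD term)
# and eq:old3 (the Q-table update).
import numpy as np

def td_update(Q, s, a, reward, s_next, beta, alpha):
    """One temporal-difference update; s indexes the current wage, a is 0 (reject) or 1 (accept)."""
    td = reward + beta * np.max(Q[s_next, :]) - Q[s, a]   # sampled Bellman error
    Q[s, a] += alpha * td                                  # move Q toward the sampled target
    return Q
```
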
@@ -474,7 +474,7 @@ The Q-table is updated via temporal difference learning.

We iterate this until convergence of the Q-table or the maximum length of an episode is reached.

-Multiple episodes allow the agent to start afresh and visit states that she was less likely to visit from the terminal state of a previos episode.
+Multiple episodes allow the agent to start afresh and visit states that she was less likely to visit from the terminal state of a previous episode.

For example, an agent who has accepted a wage offer based on her Q-table will be less likely to draw a new offer from other parts of the wage distribution.

@@ -488,7 +488,7 @@ For simplicity and convenience, we let `s` represent the state index between $0$

The first column of the Q-table represents the value associated with rejecting the wage and the second represents accepting the wage.

-We use `numba` compilation to accelerate computations.
+We use JAX compilation to accelerate computations.

```{code-cell} ipython3
class QlearningMcCall(NamedTuple):

@@ -746,7 +746,7 @@ This is an option that the McCall worker described in {doc}`this quantecon lectu

See {cite}`Ljungqvist2012`, chapter 6 on search, for a proof.

But in the context of Q-learning, giving the worker the option to quit and get unemployment compensation while
-unemployed turns out to accelerate the learning process by promoting experimentation vis a vis premature
+unemployed turns out to accelerate the learning process by promoting experimentation versus premature
exploitation only.

To illustrate this, we'll amend our formulas for temporal differences to forbid an employed worker from quitting a job she had accepted earlier.

@@ -762,7 +762,7 @@ $$ (eq:temp-diff)

It turns out that formulas {eq}`eq:temp-diff` combined with our Q-learning recursion {eq}`eq:old3` can lead our agent to eventually learn the optimal value function as well as in the case where an option to redraw can be exercised.

-But learning is slower because an agent who ends up accepting a wage offer prematurally loses the option to explore new states in the same episode and to adjust the value associated with that state.
+But learning is slower because an agent who ends up accepting a wage offer prematurely loses the option to explore new states in the same episode and to adjust the value associated with that state.

This can lead to inferior outcomes when the number of epochs/episodes is low.

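One way to read the amended temporal differences (a guess at the spirit of {eq}`eq:temp-diff`, not its exact form, since the equations themselves are outside this hunk): once the worker is employed at wage $w$, quitting is forbidden, so the continuation value uses only the accept entry rather than a max over actions.

```python
# Sketch under the stated assumption: an employed worker cannot quit, so the
# TD term for the accept action keeps her in the same employed state next period.
def td_employed_no_quit(Q, s, w, beta):
    return w + beta * Q[s, 1] - Q[s, 1]   # column 1 is "accept"
```
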
@@ -777,7 +777,7 @@ plot_epochs(epochs_to_plot=[100, 1000, 10000, 100000, 200000], quit_allowed=0)

## Possible extensions

-To extend the algorthm to handle problems with continuous state spaces,
+To extend the algorithm to handle problems with continuous state spaces,
a typical approach is to restrict Q-functions and policy functions to take particular
functional forms.

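To make that last remark concrete (an illustration, not something proposed in the lecture): one common choice of "particular functional form" is a Q-function that is linear in a fixed feature vector of the wage, updated by semi-gradient temporal differences. The names `phi`, `theta`, and `alpha` below are hypothetical.

```python
# Illustrative sketch: linear-in-features approximation Q_theta(w, a) = theta[a] @ phi(w)
# for a continuous wage w, trained by a semi-gradient TD update.
import numpy as np

def phi(w):
    return np.array([1.0, w, w**2])          # simple polynomial features of the wage

def q_approx(theta, w, a):
    return theta[a] @ phi(w)                 # theta has one weight vector per action

def semi_gradient_td_update(theta, w, a, reward, w_next, beta, alpha):
    target = reward + beta * max(q_approx(theta, w_next, 0), q_approx(theta, w_next, 1))
    td = target - q_approx(theta, w, a)
    theta[a] = theta[a] + alpha * td * phi(w)   # gradient of Q wrt theta[a] is phi(w)
    return theta
```
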
