diff --git a/lessons/5-NLP/20-LangModels/README.md b/lessons/5-NLP/20-LangModels/README.md
index 36627216..5acb7959 100644
--- a/lessons/5-NLP/20-LangModels/README.md
+++ b/lessons/5-NLP/20-LangModels/README.md
@@ -10,7 +10,7 @@ The idea of a neural network being able to do general tasks without downstream t
 
 > Understanding and being able to produce text also entails knowing something about the world around us. People also learn by reading to the large extent, and GPT network is similar in this respect.
 
-Text generation networks wor;k by predicting probability of the next word $$P(w_N)$$ However, unconditional probability of the next word equals to the frequency of the this word in the text corpus. GPT is able to give us **conditional probability** of the next word, given the previous ones: $$P(w_N | w_{n-1}, ..., w_0)$$
+Text generation networks work by predicting the probability of the next word $$P(w_N)$$. However, the unconditional probability of the next word simply equals the frequency of that word in the text corpus. GPT is able to give us the **conditional probability** of the next word, given the previous ones: $$P(w_N | w_{N-1}, ..., w_0)$$
 
 > You can read more about probabilities in our [Data Science for Beginers Curriculum](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/1-Introduction/04-stats-and-probability)
 
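To make the reworded sentence more concrete, a small code sketch could show what "conditional probability of the next word" means in practice. The snippet below is only an illustration, assuming the Hugging Face `transformers` library and the public `gpt2` checkpoint, neither of which appears in this hunk; the lesson's own notebooks may use a different setup.

```python
# Minimal sketch: query GPT-2 for the conditional next-token probabilities
# P(w_N | w_{N-1}, ..., w_0) given a prompt.
# Assumes the Hugging Face `transformers` library and the public `gpt2`
# checkpoint; this is illustrative and not part of the lesson itself.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Neural networks learn by"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Softmax over the last position gives the distribution of the next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Print the five most likely continuations and their probabilities
top_probs, top_ids = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id)!r:>15}  p = {prob.item():.3f}")
```

Sampling from this distribution repeatedly, each time appending the chosen token to the prompt, is exactly the generation loop the paragraph describes.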