This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Commit

Make minor grammatical improvements (#864)
Summary:
I was following the tutorial on "Word representation" and thought the grammar could use a bit of polishing.
Pull Request resolved: #864

Reviewed By: EdouardGrave

Differential Revision: D17683908

Pulled By: Celebio

fbshipit-source-id: cc891079a3e089b1730b5c770525bd850412923c
adl1995 authored and facebook-github-bot committed Mar 25, 2020
1 parent 022c1a7 commit 2d453cd
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions docs/unsupervised-tutorials.md
@@ -22,7 +22,7 @@ $ wget -c http://mattmahoney.net/dc/enwik9.zip -P data
$ unzip data/enwik9.zip -d data
```

A raw Wikipedia dump contains a lot of HTML / XML data. We pre-process it with the wikifil.pl script bundled with fastText (this script was originally developed by Matt Mahoney, and can be found on his [website](http://mattmahoney.net/) )
A raw Wikipedia dump contains a lot of HTML / XML data. We pre-process it with the wikifil.pl script bundled with fastText (this script was originally developed by Matt Mahoney, and can be found on his [website](http://mattmahoney.net/)).

```bash
$ perl wikifil.pl data/enwik9 > data/fil9
@@ -147,7 +147,7 @@ $ ./fasttext skipgram -input data/fil9 -output result/fil9 -minn 2 -maxn 5 -dim
```
<!--END_DOCUSAURUS_CODE_TABS-->

Depending on the quantity of data you have, you may want to change the parameters of the training. The *epoch* parameter controls how many time will loop over your data. By default, we loop over the dataset 5 times. If you dataset is extremely massive, you may want to loop over it less often. Another important parameter is the learning rate -*lr*). The higher the learning rate is, the faster the model converge to a solution but at the risk of overfitting to the dataset. The default value is 0.05 which is a good compromise. If you want to play with it we suggest to stay in the range of [0.01, 1]:
Depending on the quantity of data you have, you may want to change the parameters of the training. The *epoch* parameter controls how many times the model will loop over your data. By default, we loop over the dataset 5 times. If your dataset is extremely massive, you may want to loop over it less often. Another important parameter is the learning rate (-*lr*). The higher the learning rate, the faster the model converges to a solution, but at the risk of overfitting to the dataset. The default value is 0.05, which is a good compromise. If you want to play with it, we suggest staying in the range of [0.01, 1]:
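
For instance, a minimal sketch of such a command (the `-epoch` and `-lr` values below are only illustrative, and the paths follow the ones used earlier in this tutorial):

```bash
# Illustrative values only: a single epoch with a higher learning rate
$ ./fasttext skipgram -input data/fil9 -output result/fil9 -epoch 1 -lr 0.5
```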

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
@@ -180,7 +180,7 @@ $ ./fasttext skipgram -input data/fil9 -output result/fil9 -thread 4

Searching and printing word vectors directly from the `fil9.vec` file is cumbersome. Fortunately, there is a `print-word-vectors` functionality in fastText.

For examples, we can print the word vectors of words *asparagus,* *pidgey* and *yellow* with the following command:
For example, we can print the word vectors of the words *asparagus*, *pidgey* and *yellow* with the following command:
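
A minimal sketch of that invocation (this assumes `print-word-vectors` reads its query words from standard input, and reuses the `result/fil9.bin` model trained above):

```bash
# Query vectors for three words by piping them to print-word-vectors
$ echo "asparagus pidgey yellow" | ./fasttext print-word-vectors result/fil9.bin
```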
<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```bash
@@ -226,7 +226,7 @@ $ echo "enviroment" | ./fasttext print-word-vectors result/fil9.bin
```
<!--END_DOCUSAURUS_CODE_TABS-->

You still get a word vector for it! But how good it is? Let s find out in the next sections!
You still get a word vector for it! But how good is it? Let's find out in the next sections!


## Nearest neighbor queries
@@ -322,7 +322,7 @@ In order to find nearest neighbors, we need to compute a similarity score betwee
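
As a rough sketch, such queries can be run interactively with the `nn` subcommand of the fastText CLI (the subcommand name and its interactive behaviour are assumptions based on standard fastText usage; they are not shown in the excerpt above):

```bash
# Interactive nearest-neighbor queries against the trained model
$ ./fasttext nn result/fil9.bin
# then type a query word at the prompt, e.g. asparagus
```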

## Word analogies

In a similar spirit, one can play around with word analogies. For example, we can see if our model can guess what is to France, what Berlin is to Germany.
In a similar spirit, one can play around with word analogies. For example, we can see if our model can guess what is to France what Berlin is to Germany.

This can be done with the *analogies* functionality. It takes a word triplet (like *Germany Berlin France*) and outputs the analogy:
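
A minimal sketch of such a query, assuming the *analogies* functionality is invoked interactively on the model trained earlier (the exact prompt behaviour may differ):

```bash
# Interactive analogy queries against the trained model
$ ./fasttext analogies result/fil9.bin
# at the prompt, type the triplet from the text above, e.g. germany berlin france
```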

@@ -350,7 +354,7 @@ pigneaux 0.736122
```
<!--END_DOCUSAURUS_CODE_TABS-->

The answer provides by our model is *Paris*, which is correct. Let's have a look at a less obvious example:
The answer provided by our model is *Paris*, which is correct. Let's have a look at a less obvious example:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
@@ -408,7 +412,7 @@ gearboxes 0.73986

Most of the retrieved words share substantial substrings but a few are actually quite different, like *cogwheel*. You can try other words like *sunbathe* or *grandnieces*.

Now that we have seen the interest of subword information for unknown words, let s check how it compares to a model that do not use subword information. To train a model without subwords, just run the following command:
Now that we have seen the interest of subword information for unknown words, let's check how it compares to a model that does not use subword information. To train a model without subwords, just run the following command:
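
A minimal sketch of what that command could look like, assuming `-maxn 0` is one way to disable subword n-grams, with the output name matching the `result/fil9-non` files mentioned just below:

```bash
# Train skipgram vectors without character n-grams (no subword information)
$ ./fasttext skipgram -input data/fil9 -output result/fil9-non -maxn 0
```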

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
@@ -423,7 +427,7 @@ The results are saved in result/fil9-non.vec and result/fil9-non.bin.
<!--END_DOCUSAURUS_CODE_TABS-->


To illustrate the difference, let us take an uncommon word in Wikipedia, like *accomodation* which is a misspelling of *accommodation*. Here is the nearest neighbors obtained without subwords:
To illustrate the difference, let us take an uncommon word in Wikipedia, like *accomodation*, which is a misspelling of *accommodation*. Here are the nearest neighbors obtained without subwords:
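
One way to reproduce this query, sketched under the assumption that the `nn` subcommand accepts query words piped on standard input:

```bash
# Nearest neighbors of the misspelled word, using the model trained without subwords
$ echo "accomodation" | ./fasttext nn result/fil9-non.bin
```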

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
@@ -476,4 +480,4 @@ The nearest neighbors capture different variation around the word *accommodation*

## Conclusion

In this tutorial, we show how to obtain word vectors from Wikipedia. This can be done for any language and you we provide [pre-trained models](https://fasttext.cc/docs/en/pretrained-vectors.html) with the default setting for 294 of them.
In this tutorial, we show how to obtain word vectors from Wikipedia. This can be done for any language, and we provide [pre-trained models](https://fasttext.cc/docs/en/pretrained-vectors.html) with the default settings for 294 of them.
