
Commit 38350a5

Celebio authored and facebook-github-bot committed

supervised tutorial and autotune documentation with python tabs

Summary: Docusaurus now allows multiple language tabs. This commit adds a Python tab for the supervised and autotune examples. It also includes a snippet that activates the same language tab for the whole page.

Reviewed By: EdouardGrave
Differential Revision: D17091834
fbshipit-source-id: 6e6f76aa9408baa08fcd6c0bfd011de2cb477dfb
1 parent 4aca28c commit 38350a5

File tree

4 files changed (+365, -10 lines changed)

docs/autotune.md

Lines changed: 65 additions & 3 deletions
@@ -13,28 +13,55 @@ In order to activate hyperparameter optimization, we must provide a validation f

For example, using the same data as our [tutorial example](/docs/en/supervised-tutorial.html#our-first-classifier), autotune can be used in the following way:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid
```
<!--Python-->
```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid')
```
<!--END_DOCUSAURUS_CODE_TABS-->

Then, fastText will search for the hyperparameters that give the best f1-score on the `cooking.valid` file:
```sh
Progress: 100.0% Trials: 27 Best score: 0.406763 ETA: 0h 0m 0s
```

Now we can test the obtained model with:
<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext test model_cooking.bin cooking.valid
N 3000
P@1 0.666
R@1 0.288
```
<!--Python-->
```py
>>> model.test("cooking.valid")
(3000L, 0.666, 0.288)
```
<!--END_DOCUSAURUS_CODE_TABS-->

By default, the search will take 5 minutes. You can set the timeout in seconds with the `-autotune-duration` argument. For example, if you want to set the limit to 10 minutes:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid -autotune-duration 600
```
<!--Python-->
```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotuneDuration=600)
```
<!--END_DOCUSAURUS_CODE_TABS-->

While autotuning, fastText displays the best f1-score found so far. If we decide to stop the tuning before the time limit, we can send one `SIGINT` signal (via `CTRL-C` for example). FastText will then finish the current training, and retrain with the best parameters found so far.
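As an aside, the same interrupt can also be delivered programmatically rather than from the keyboard. A minimal, illustrative sketch (not part of the fastText docs; a sleeping child process stands in for a long autotune run):

```python
import os
import signal
import subprocess
import sys
import time

# Sending SIGINT to a process is exactly what CTRL-C does in its terminal.
# We demonstrate on a placeholder child that just sleeps.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
time.sleep(0.5)                   # give the child time to start
os.kill(proc.pid, signal.SIGINT)  # equivalent to pressing CTRL-C
proc.wait(timeout=10)
print(proc.returncode != 0)       # True: the child was interrupted
```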

@@ -46,23 +73,42 @@ As you may know, fastText can compress the model with [quantization](/docs/en/ch

Fortunately, autotune can also find the hyperparameters for this compression task while targeting the desired model size. To this end, we can set the `-autotune-modelsize` argument:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid -autotune-modelsize 2M
```
This will produce a `.ftz` file with the best accuracy under the desired size:
```sh
>> ls -la model_cooking.ftz
-rw-r--r--. 1 celebio users 1990862 Aug 25 05:39 model_cooking.ftz
>> ./fasttext test model_cooking.ftz cooking.valid
N 3000
P@1 0.57
R@1 0.246
```
<!--Python-->
```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotuneModelSize="2M")
```
If you save the model, you will obtain a model file with the desired size:
```py
>>> model.save_model("model_cooking.ftz")
>>> import os
>>> os.stat("model_cooking.ftz").st_size
1990862
>>> model.test("cooking.valid")
(3000L, 0.57, 0.246)
```
<!--END_DOCUSAURUS_CODE_TABS-->


# How to set the optimization metric?

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
<br />
By default, autotune will test the validation file you provide, exactly the same way as `./fasttext test model_cooking.bin cooking.valid`, and try to optimize to get the highest [f1-score](https://en.wikipedia.org/wiki/F1_score).

But, if we want to optimize the score of a specific label, say `__label__baking`, we can set the `-autotune-metric` argument:
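The command-line block that follows here is unchanged context the diff does not display. Based on the `f1:__label__baking` metric string used in this commit's Python tab, the invocation would look like this (a sketch, not shown in the diff itself):

```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid -autotune-metric f1:__label__baking
```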
@@ -74,3 +120,19 @@ But, if we want to optimize the score of a specific label, say `__label__baking`

This is equivalent to manually optimizing the f1-score we get when we test with `./fasttext test-label model_cooking.bin cooking.valid | grep __label__baking` on the command line.

Sometimes, you may be interested in predicting more than one label. For example, if you were optimizing the hyperparameters manually to get the best score when predicting two labels, you would test with `./fasttext test model_cooking.bin cooking.valid 2`. You can also tell autotune to optimize the parameters by testing two labels, with the `-autotune-predictions` argument.
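A sketch of that invocation, assuming `-autotune-predictions` takes the number of labels to predict, mirroring the final argument of `./fasttext test`:

```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid -autotune-predictions 2
```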
<!--Python-->
<br />
By default, autotune will test the validation file you provide, exactly the same way as `model.test("cooking.valid")`, and try to optimize to get the highest [f1-score](https://en.wikipedia.org/wiki/F1_score).

But, if we want to optimize the score of a specific label, say `__label__baking`, we can set the `autotuneMetric` argument:

```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotuneMetric="f1:__label__baking")
```

This is equivalent to manually optimizing the f1-score we get when we test with `model.test_label('cooking.valid')['__label__baking']`.

Sometimes, you may be interested in predicting more than one label. For example, if you were optimizing the hyperparameters manually to get the best score when predicting two labels, you would test with `model.test("cooking.valid", k=2)`. You can also tell autotune to optimize the parameters by testing two labels, with the `autotunePredictions` argument.
<!--END_DOCUSAURUS_CODE_TABS-->
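The Python counterpart can be sketched as follows, assuming `autotunePredictions` takes the number of labels to predict, mirroring `k` in `model.test`:

```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotunePredictions=2)
```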
