You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Oct 24, 2024. It is now read-only.
Copy file name to clipboardExpand all lines: CHANGELOG.md
+86-21Lines changed: 86 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -4,7 +4,72 @@ Changes
4
4
Unreleased:
5
5
6
6
7
-
========
7
+
===========
8
+
9
+
2.0.0, 2017-04-10
10
+
11
+
Breaking changes:
12
+
13
+
Any direct calls to method train() of Word2Vec/Doc2Vec now require an explicit epochs parameter and explicit estimate of corpus size. The most usual way to call `train` is `vec_model.train(sentences, total_examples=self.corpus_count, epochs=self.iter)`
14
+
See the [method documentation](https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py#L766) for more information.
15
+
16
+
17
+
* Explicit epochs and corpus size in word2vec train(). (@gojomo, @robotcator, [#1139](https://github.com/RaRe-Technologies/gensim/pull/1139), [#1237](https://github.com/RaRe-Technologies/gensim/pull/1237))
18
+
19
+
New features:
20
+
21
+
* Add output word prediction in word2vec. Only for negative sampling scheme. See [ipynb](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb) (@chinmayapancholi13,[#1209](https://github.com/RaRe-Technologies/gensim/pull/1209))
22
+
* scikit_learn wrapper for LSI Model in Gensim (@chinmayapancholi13,[#1244](https://github.com/RaRe-Technologies/gensim/pull/1244))
23
+
* Add the 'keep_tokens' parameter to 'filter_extremes'. (@toliwa,[#1210](https://github.com/RaRe-Technologies/gensim/pull/1210))
24
+
* Load FastText models with specified encoding (@jayantj,[#1210](https://github.com/RaRe-Technologies/gensim/pull/1189))
25
+
26
+
27
+
Improvements:
28
+
* Fix loading large FastText models on Mac. (@jaksmid,[#1196](https://github.com/RaRe-Technologies/gensim/pull/1214))
29
+
* Sklearn LDA wrapper now works in sklearn pipeline (@kris-singh,[#1213](https://github.com/RaRe-Technologies/gensim/pull/1213))
* Word2vec error message when update called before train . Fix #1162 (@hemavakade,[#1205](https://github.com/RaRe-Technologies/gensim/pull/1205))
32
+
* Allow training if model is not modified by "_minimize_model". Add deprecation warning. (@chinmayapancholi13,[#1207](https://github.com/RaRe-Technologies/gensim/pull/1207))
33
+
* Update the warning text when building vocab on a trained w2v model (@prakhar2b,[#1190](https://github.com/RaRe-Technologies/gensim/pull/1190))
34
+
35
+
Bug fixes:
36
+
37
+
* Fix word2vec reset_from bug in v1.0.1 Fix #1230. (@Kreiswolke,[#1234](https://github.com/RaRe-Technologies/gensim/pull/1234))
38
+
39
+
* Distributed LDA: checking the length of docs instead of the boolean value, plus int index conversion (@saparina ,[#1191](https://github.com/RaRe-Technologies/gensim/pull/1191))
40
+
41
+
* syn0_lockf initialised with zero in intersect_word2vec_format() (@KiddoZhu,[#1267](https://github.com/RaRe-Technologies/gensim/pull/1267))
* Add docstrings for word2vec.py forwarding functions ( @shubhvachher, [#1251](https://github.com/RaRe-Technologies/gensim/pull/1251))
63
+
64
+
* updated description for worker_loop function used in score function ( @chinmayapancholi13 , [#1206](https://github.com/RaRe-Technologies/gensim/pull/1206))
65
+
66
+
1.0.1, 2017-03-03
67
+
68
+
* Rebuild cumulative table on load. Fix #1180. (@tmylk,[#1181](https://github.com/RaRe-Technologies/gensim/pull/893))
* Fix loading old word2vec models pre-1.0.0 (@jayantj, [#1179](https://github.com/RaRe-Technologies/gensim/pull/1179))
71
+
* Load utf-8 words in fasttext (@jayantj, [#1176](https://github.com/RaRe-Technologies/gensim/pull/1176))
72
+
8
73
9
74
1.0.0, 2017-02-24
10
75
@@ -28,35 +93,35 @@ Improvements:
28
93
* Phrases and Phraser allow a generator corpus (ELind77 [#1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
29
94
* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
30
95
* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103)
31
-
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
32
-
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
96
+
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
97
+
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
33
98
* Deacc option to keywords pre-processing (@bhargavvader,[#1076](https://github.com/RaRe-Technologies/gensim/pull/1076))
34
-
* Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, [#1165](https://github.com/RaRe-Technologies/gensim/pull/1165))
35
-
* Fix hdpmodel constructor docstring for print_topics (#1152) (@toliwa, [#1152](https://github.com/RaRe-Technologies/gensim/pull/1152))
36
-
* Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, [#1154](https://github.com/RaRe-Technologies/gensim/pull/1154))
99
+
* Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, [#1165](https://github.com/RaRe-Technologies/gensim/pull/1165))
100
+
* Fix hdpmodel constructor docstring for print_topics (#1152) (@toliwa, [#1152](https://github.com/RaRe-Technologies/gensim/pull/1152))
101
+
* Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, [#1154](https://github.com/RaRe-Technologies/gensim/pull/1154))
37
102
* Fix bound computation in Author Topic models. (@olavurmortensen, [#1156](https://github.com/RaRe-Technologies/gensim/pull/1156))
38
103
* Write UTF-8 byte strings in tensorboard conversion (@tmylk,[#1144](https://github.com/RaRe-Technologies/gensim/pull/1144))
39
104
* Make top_topics and sparse2full compatible with numpy 1.12 strictly int idexing (@tmylk,[#1146](https://github.com/RaRe-Technologies/gensim/pull/1146))
40
105
41
106
Tutorial and doc improvements:
42
107
43
-
* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
108
+
* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
44
109
* Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,[#1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
45
-
* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
110
+
* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
46
111
* Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,[#1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
47
112
* Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,[#1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
48
113
* Update Transformation and Topics link from quick start notebook (@mariana393,[#1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
49
114
* Quick Start Text clarification and typo correction (@luizcavalcanti,[#1114](https://github.com/RaRe-Technologies/gensim/pull/1114))
50
115
* Fix typos in Author-topic tutorial (@Fil,[#1102](https://github.com/RaRe-Technologies/gensim/pull/1102))
51
116
* Address benchmark inconsistencies in Annoy tutorial (@droudy,[#1113](https://github.com/RaRe-Technologies/gensim/pull/1113))
52
-
* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
53
-
* Fix dependencies description on doc2vec-IMDB notebook (@luizcavalcanti, [#1132](https://github.com/RaRe-Technologies/gensim/pull/1132))
54
-
* Add documentation for WikiCorpus metadata. (@kirit93, [#1163](https://github.com/RaRe-Technologies/gensim/pull/1163))
117
+
* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
118
+
* Fix dependencies description on doc2vec-IMDB notebook (@luizcavalcanti, [#1132](https://github.com/RaRe-Technologies/gensim/pull/1132))
119
+
* Add documentation for WikiCorpus metadata. (@kirit93, [#1163](https://github.com/RaRe-Technologies/gensim/pull/1163))
120
+
55
121
56
-
57
122
1.0.0RC2, 2017-02-16
58
123
59
-
* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
124
+
* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
60
125
* Remove direct access to properties moved to KeyedVectors (@tmylk,[#1147](https://github.com/RaRe-Technologies/gensim/pull/1147))
61
126
* Remove support for Python 2.6, 3.3 and 3.4 (@tmylk,[#1145](https://github.com/RaRe-Technologies/gensim/pull/1145))
62
127
* Write UTF-8 byte strings in tensorboard conversion (@tmylk,[#1144](https://github.com/RaRe-Technologies/gensim/pull/1144))
@@ -76,15 +141,15 @@ Improvements:
76
141
* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
77
142
* Move load and save word2vec_format out of word2vec class to KeyedVectors (@tmylk,[#1107](https://github.com/RaRe-Technologies/gensim/pull/1107))
78
143
* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103)
79
-
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
80
-
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
144
+
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
145
+
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
81
146
* Deacc option to keywords pre-processing (@bhargavvader,[#1076](https://github.com/RaRe-Technologies/gensim/pull/1076))
82
147
83
148
Tutorial and doc improvements:
84
149
85
-
* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
150
+
* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
86
151
* Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,[#1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
87
-
* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
152
+
* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
88
153
* Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,[#1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
89
154
* Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,[#1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
90
155
* Update Transformation and Topics link from quick start notebook (@mariana393,[#1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
@@ -96,9 +161,9 @@ Tutorial and doc improvements:
96
161
0.13.4.1, 2017-01-04
97
162
98
163
* Disable direct access warnings on save and load of Word2vec/Doc2vec (@tmylk, [#1072](https://github.com/RaRe-Technologies/gensim/pull/1072))
99
-
* Making Default hs error explicit (@accraze, [#1054](https://github.com/RaRe-Technologies/gensim/pull/1054))
164
+
* Making Default hs error explicit (@accraze, [#1054](https://github.com/RaRe-Technologies/gensim/pull/1054))
* Utils and Matutils changes (@bhargavvader, [#1062](https://github.com/RaRe-Technologies/gensim/pull/1062))
166
+
* Utils and Matutils changes (@bhargavvader, [#1062](https://github.com/RaRe-Technologies/gensim/pull/1062))
102
167
* Tests for the evaluate_word_pairs function (@akutuzov, [#1061](https://github.com/RaRe-Technologies/gensim/pull/1061))
103
168
104
169
0.13.4, 2016-12-22
@@ -120,8 +185,8 @@ Tutorial and doc improvements:
120
185
* Remove warning on gensim import "pattern not installed". Fix #1009 (@shashankg7, [#1018](https://github.com/RaRe-Technologies/gensim/pull/1018))
121
186
* Add delete_temporary_training_data() function to word2vec and doc2vec models. (@deepmipt-VladZhukov, [#987](https://github.com/RaRe-Technologies/gensim/pull/987))
Copy file name to clipboardExpand all lines: docs/notebooks/WordRank_wrapper_quickstart.ipynb
+1-1Lines changed: 1 addition & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -13,7 +13,7 @@
13
13
"\n",
14
14
"# Train model\n",
15
15
"\n",
16
-
"We'll use [Lee corpus](https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/test_data/lee_background.cor) for training which is already available in gensim. Now for Wordrank, two parameters `dump_period` and `iter` needs to be in sync as it dumps the embedding file with the start of next iteration. For example, if you want results after 10 iterations, you need to use `iter=11` and `dump_period` can be anything that gives mod 0 with resulting iteration, in this case 2 or 5.\n"
16
+
"We'll use [Lee corpus](https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/test_data/lee.cor) for training which is already available in gensim. Now for Wordrank, two parameters `dump_period` and `iter` needs to be in sync as it dumps the embedding file with the start of next iteration. For example, if you want results after 10 iterations, you need to use `iter=11` and `dump_period` can be anything that gives mod 0 with resulting iteration, in this case 2 or 5.\n"
0 commit comments