diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 8e3ad48871..41a608ef90 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -39,7 +39,8 @@ jobs:
#
- name: Update sbt
run: |
- echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
+ echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
+ echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update -y
sudo apt-get install -y sbt
diff --git a/CHANGELOG.md b/CHANGELOG.md
index cc8c9cdbe1..c042d6490f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,126 @@
Changes
=======
+## Unreleased
+
+## 4.1.0, 2021-08-15
+
+Gensim 4.1 brings two major new functionalities:
+
+* [Ensemble LDA](https://radimrehurek.com/gensim/auto_examples/tutorials/run_ensemblelda.html) for robust training, selection and comparison of LDA models.
+* [FastSS module](https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/similarities/fastss.pyx) for super fast Levenshtein "fuzzy search" queries. Used e.g. for ["soft term similarity"](https://github.com/RaRe-Technologies/gensim/pull/3146) calculations.
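+
+A minimal sketch of the new module in action, via its `editdist` helper (the function mentioned in the bug-fix list below); this assumes `editdist` simply takes two strings and returns their Levenshtein distance:
+
+```python
+from gensim.similarities.fastss import editdist
+
+# assumed signature: editdist(s1, s2) -> int (Levenshtein distance)
+# "levenstein" is one deletion away from "levenshtein", so this prints 1
+print(editdist('levenshtein', 'levenstein'))
+```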
+
+There are several minor changes that are **not** backwards compatible with previous versions of Gensim.
+The affected functionality is relatively rarely used, so it is unlikely to affect most users, and we have therefore opted not to require a major version bump.
+Nevertheless, we describe them below.
+
+### Improved parameter edge-case handling in KeyedVectors most_similar and most_similar_cosmul methods
+
+We now handle both ``positive`` and ``negative`` keyword parameters consistently.
+They may now be any of the following:
+
+1. A string, in which case the value is reinterpreted as a list of one element (the string value)
+2. A vector, in which case the value is reinterpreted as a list of one element (the vector)
+3. A list of strings
+4. A list of vectors
+
+So you can now simply do:
+
+```python
+model.most_similar(positive='war', negative='peace')
+```
+
+instead of the slightly more involved
+
+```python
+model.most_similar(positive=['war'], negative=['peace'])
+```
+
+Both invocations remain correct, so you can use whichever is most convenient.
+If you were somehow expecting gensim to interpret the strings as a list of characters, e.g.
+
+```python
+model.most_similar(positive=['w', 'a', 'r'], negative=['p', 'e', 'a', 'c', 'e'])
+```
+
+then you will need to specify the lists explicitly in gensim 4.1.
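+
+The new handling also covers raw vectors. A minimal sketch (assuming ``model`` is a ``KeyedVectors`` instance, as in the examples above):
+
+```python
+vec = model.get_vector('war')  # raw numpy vector for "war"
+# a bare vector is reinterpreted as a one-element list, just like a bare string
+model.most_similar(positive=vec, negative='peace')
+```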
+
+### Deprecated obsolete `steps` parameter from doc2vec
+
+With the newer version, do this:
+
+```python
+model.infer_vector(..., epochs=123)
+```
+
+instead of this:
+
+```python
+model.infer_vector(..., steps=123)
+```
+
+Plus a large number of smaller improvements and fixes, as usual.
+
+**⚠️ If migrating from old Gensim 3.x, read the [Migration guide](https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4) first.**
+
+### :+1: New features
+
+* [#3169](https://github.com/RaRe-Technologies/gensim/pull/3169): Implement `shrink_windows` argument for Word2Vec, by [@M-Demay](https://github.com/M-Demay)
+* [#3163](https://github.com/RaRe-Technologies/gensim/pull/3163): Optimize word mover distance (WMD) computation, by [@flowlight0](https://github.com/flowlight0)
+* [#3157](https://github.com/RaRe-Technologies/gensim/pull/3157): New KeyedVectors.vectors_for_all method for vectorizing all words in a dictionary, by [@Witiko](https://github.com/Witiko)
+* [#3153](https://github.com/RaRe-Technologies/gensim/pull/3153): Vectorize word2vec.predict_output_word for speed, by [@M-Demay](https://github.com/M-Demay)
+* [#3146](https://github.com/RaRe-Technologies/gensim/pull/3146): Use FastSS for fast kNN over Levenshtein distance, by [@Witiko](https://github.com/Witiko)
+* [#3128](https://github.com/RaRe-Technologies/gensim/pull/3128): Materialize and copy the corpus passed to SoftCosineSimilarity, by [@Witiko](https://github.com/Witiko)
+* [#3115](https://github.com/RaRe-Technologies/gensim/pull/3115): Make LSI dispatcher CLI param for number of jobs optional, by [@robguinness](https://github.com/robguinness)
+* [#3091](https://github.com/RaRe-Technologies/gensim/pull/3091): LsiModel: Only log top words that actually exist in the dictionary, by [@kmurphy4](https://github.com/kmurphy4)
+* [#2980](https://github.com/RaRe-Technologies/gensim/pull/2980): Added EnsembleLda for stable LDA topics, by [@sezanzeb](https://github.com/sezanzeb)
+* [#2978](https://github.com/RaRe-Technologies/gensim/pull/2978): Optimize performance of Author-Topic model, by [@horpto](https://github.com/horpto)
+* [#3000](https://github.com/RaRe-Technologies/gensim/pull/3000): Tidy up KeyedVectors.most_similar() API, by [@simonwiles](https://github.com/simonwiles)
+
+### :books: Tutorials and docs
+
+* [#3155](https://github.com/RaRe-Technologies/gensim/pull/3155): Correct parameter name in documentation of fasttext.py, by [@bizzyvinci](https://github.com/bizzyvinci)
+* [#3148](https://github.com/RaRe-Technologies/gensim/pull/3148): Fix broken link to mycorpus.txt in documentation, by [@rohit901](https://github.com/rohit901)
+* [#3142](https://github.com/RaRe-Technologies/gensim/pull/3142): Use more permanent pdf link and update code link, by [@dymil](https://github.com/dymil)
+* [#3141](https://github.com/RaRe-Technologies/gensim/pull/3141): Update link for online LDA paper, by [@dymil](https://github.com/dymil)
+* [#3133](https://github.com/RaRe-Technologies/gensim/pull/3133): Update link to Hoffman paper (online VB LDA), by [@jonaschn](https://github.com/jonaschn)
+* [#3129](https://github.com/RaRe-Technologies/gensim/pull/3129): [MRG] Add bronze sponsor: TechTarget, by [@piskvorky](https://github.com/piskvorky)
+* [#3126](https://github.com/RaRe-Technologies/gensim/pull/3126): Fix typos in make_wiki_online.py and make_wikicorpus.py, by [@nicolasassi](https://github.com/nicolasassi)
+* [#3125](https://github.com/RaRe-Technologies/gensim/pull/3125): Improve & unify docs for dirichlet priors, by [@jonaschn](https://github.com/jonaschn)
+* [#3123](https://github.com/RaRe-Technologies/gensim/pull/3123): Fix hyperlink for doc2vec tutorial, by [@AdityaSoni19031997](https://github.com/AdityaSoni19031997)
+* [#3121](https://github.com/RaRe-Technologies/gensim/pull/3121): [MRG] Add bronze sponsor: eaccidents.com, by [@piskvorky](https://github.com/piskvorky)
+* [#3120](https://github.com/RaRe-Technologies/gensim/pull/3120): Fix URL for ldamodel.py, by [@jonaschn](https://github.com/jonaschn)
+* [#3118](https://github.com/RaRe-Technologies/gensim/pull/3118): Fix URL in doc string, by [@jonaschn](https://github.com/jonaschn)
+* [#3107](https://github.com/RaRe-Technologies/gensim/pull/3107): Draw attention to sponsoring in README, by [@piskvorky](https://github.com/piskvorky)
+* [#3105](https://github.com/RaRe-Technologies/gensim/pull/3105): Fix documentation links: Travis to Github Actions, by [@piskvorky](https://github.com/piskvorky)
+* [#3057](https://github.com/RaRe-Technologies/gensim/pull/3057): Clarify doc comment in LdaModel.inference(), by [@yocen](https://github.com/yocen)
+* [#2964](https://github.com/RaRe-Technologies/gensim/pull/2964): Document that preprocessing.strip_punctuation is limited to ASCII, by [@sciatro](https://github.com/sciatro)
+
+
+### :red_circle: Bug fixes
+
+* [#3178](https://github.com/RaRe-Technologies/gensim/pull/3178): Fix Unicode string incompatibility in gensim.similarities.fastss.editdist, by [@Witiko](https://github.com/Witiko)
+* [#3174](https://github.com/RaRe-Technologies/gensim/pull/3174): Fix loading Phraser models stored in Gensim 3.x into Gensim 4.0, by [@emgucv](https://github.com/emgucv)
+* [#3136](https://github.com/RaRe-Technologies/gensim/pull/3136): Fix indexing error in word2vec_inner.pyx, by [@bluekura](https://github.com/bluekura)
+* [#3131](https://github.com/RaRe-Technologies/gensim/pull/3131): Add missing import to NMF docs and models/__init__.py, by [@properGrammar](https://github.com/properGrammar)
+* [#3116](https://github.com/RaRe-Technologies/gensim/pull/3116): Fix bug where saved Phrases model did not load its connector_words, by [@aloknayak29](https://github.com/aloknayak29)
+* [#2830](https://github.com/RaRe-Technologies/gensim/pull/2830): Fixed KeyError in coherence model, by [@pietrotrope](https://github.com/pietrotrope)
+
+
+### :warning: Removed functionality & deprecations
+
+* [#3176](https://github.com/RaRe-Technologies/gensim/pull/3176): Eliminate obsolete step parameter from doc2vec infer_vector and similarity_unseen_docs, by [@rock420](https://github.com/rock420)
+* [#2965](https://github.com/RaRe-Technologies/gensim/pull/2965): Remove strip_punctuation2 alias of strip_punctuation, by [@sciatro](https://github.com/sciatro)
+* [#3180](https://github.com/RaRe-Technologies/gensim/pull/3180): Move preprocessing functions from gensim.corpora.textcorpus and gensim.corpora.lowcorpus to gensim.parsing.preprocessing, by [@rock420](https://github.com/rock420)
+
+### 🔮 Testing, CI, housekeeping
+
+* [#3156](https://github.com/RaRe-Technologies/gensim/pull/3156): Update Numpy minimum version to 1.17.0, by [@PrimozGodec](https://github.com/PrimozGodec)
+* [#3143](https://github.com/RaRe-Technologies/gensim/pull/3143): replace _mul function with explicit casts, by [@mpenkov](https://github.com/mpenkov)
+* [#2952](https://github.com/RaRe-Technologies/gensim/pull/2952): Allow newer versions of the Morfessor module for the tests, by [@pabs3](https://github.com/pabs3)
+* [#2965](https://github.com/RaRe-Technologies/gensim/pull/2965): Remove strip_punctuation2 alias of strip_punctuation, by [@sciatro](https://github.com/sciatro)
+
+
+
## 4.0.1, 2021-04-01
Bugfix release to address issues with Wheels on Windows:
diff --git a/README.md b/README.md
index 9b7fdca6f3..f1cb9f3ddd 100644
--- a/README.md
+++ b/README.md
@@ -19,12 +19,8 @@ and *similarity retrieval* with large corpora. Target audience is the
*natural language processing* (NLP) and *information retrieval* (IR)
community.
-
Features
--------
@@ -57,10 +53,10 @@ scientific computing. You must have them installed prior to installing
gensim.
It is also recommended you install a fast BLAS library before installing
-NumPy. This is optional, but using an optimized BLAS such as [ATLAS] or
+NumPy. This is optional, but using an optimized BLAS such as MKL, [ATLAS] or
[OpenBLAS] is known to improve performance by as much as an order of
-magnitude. On OS X, NumPy picks up the BLAS that comes with it
-automatically, so you don’t need to do anything special.
+magnitude. On OSX, NumPy picks up its vecLib BLAS automatically,
+so you don’t need to do anything special.
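+
+To check which BLAS your NumPy build actually links against, you can run, for example:
+
+```python
+import numpy
+numpy.show_config()  # prints the BLAS/LAPACK libraries NumPy was built against
+```
+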
Install the latest version of gensim:
@@ -77,7 +73,8 @@ package:
For alternative modes of installation, see the [documentation].
-Gensim is being [continuously tested](https://travis-ci.org/RaRe-Technologies/gensim) under Python 3.6, 3.7 and 3.8.
+Gensim is being [continuously tested](http://radimrehurek.com/gensim/#testing) under all
+[supported Python versions](https://github.com/RaRe-Technologies/gensim/wiki/Gensim-And-Compatibility).
Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.
How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
@@ -110,9 +107,12 @@ Documentation
Support
-------
-Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).
+For commercial support, please see [Gensim sponsorship](https://github.com/sponsors/piskvorky).
+
+Ask open-ended questions on the public [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).
+
+Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but please **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to provide the requested details will be closed without inspection.
-Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.
---------
@@ -162,15 +162,12 @@ BibTeX entry:
[citing gensim in academic papers and theses]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C
- [Travis CI for automated testing]: https://travis-ci.org/RaRe-Technologies/gensim
[design goals]: http://radimrehurek.com/gensim/about.html
[RaRe Technologies]: http://rare-technologies.com/wp-content/uploads/2016/02/rare_image_only.png%20=10x20
[rare\_tech]: //rare-technologies.com
[Talentpair]: https://avatars3.githubusercontent.com/u/8418395?v=3&s=100
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC
-
-
[documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
[Vector Space Model]: http://en.wikipedia.org/wiki/Vector_space_model
[unsupervised document analysis]: http://en.wikipedia.org/wiki/Latent_semantic_indexing
diff --git a/docs/notebooks/ensemble_lda_with_opinosis.ipynb b/docs/notebooks/ensemble_lda_with_opinosis.ipynb
new file mode 100644
index 0000000000..354a2eeff8
--- /dev/null
+++ b/docs/notebooks/ensemble_lda_with_opinosis.ipynb
@@ -0,0 +1,178 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "import logging\n",
+ "from gensim.models import EnsembleLda, LdaMulticore\n",
+ "from gensim.models.ensemblelda import rank_masking\n",
+ "from gensim.corpora import OpinosisCorpus\n",
+ "import os"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Enable the ensemble logger to show what it is currently doing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "elda_logger = logging.getLogger(EnsembleLda.__module__)\n",
+ "elda_logger.setLevel(logging.INFO)\n",
+ "elda_logger.addHandler(logging.StreamHandler())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def pretty_print_topics():\n",
+ " # note that the words are stemmed so they appear chopped off\n",
+ " for t in elda.print_topics(num_words=7):\n",
+ " print('-', t[1].replace('*',' ').replace('\"','').replace(' +',','), '\\n')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Experiments on the Opinosis Dataset\n",
+ "\n",
+ "Opinosis [1] is a small (but redundant) corpus that contains 289 product reviews for 51 products. Since it's so small, the results are rather unstable.\n",
+ "\n",
+ "[1] Kavita Ganesan, ChengXiang Zhai, and Jiawei Han, _Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions [online],_ Proceedings of the 23rd International Conference on Computational Linguistics, Association for Computational Linguistics, 2010, pp. 340–348. Available from: https://kavita-ganesan.com/opinosis/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preparing the corpus\n",
+ "\n",
+    "First, download the Opinosis dataset. On Linux, for example, this can be done as follows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!mkdir ~/opinosis\n",
+ "!wget -P ~/opinosis https://github.com/kavgan/opinosis/raw/master/OpinosisDataset1.0_0.zip\n",
+ "!unzip ~/opinosis/OpinosisDataset1.0_0.zip -d ~/opinosis"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "path = os.path.expanduser('~/opinosis/')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "The corpus and id2word mapping can be created using the OpinosisCorpus class provided in the package.\n",
+    "It preprocesses the data using the PorterStemmer and stopwords from the nltk package.\n",
+    "\n",
+    "The constructor's parameter is the path to the folder into which the zip file was extracted above; that folder contains a 'summaries-gold' subfolder."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "opinosis = OpinosisCorpus(path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Training"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Parameters**\n",
+    "\n",
+    "**topic_model_class**: ldamulticore is highly recommended for EnsembleLda. **ensemble_workers** and **distance_workers** are used to reduce the time needed to train the models, as is the **masking_method** 'rank'. ldamulticore cannot fully utilize all cores on this small corpus, so **ensemble_workers** can be set to 3 to get 95-100% CPU usage on my i5 3470.\n",
+    "\n",
+    "Since the corpus is so small, a high number of **num_models** is needed to extract stable topics. The Opinosis corpus contains 51 categories, but some of them are quite similar: for example, there are three categories about the batteries of portable products and several about cars. So I chose 20 for **num_topics**, which is smaller than the number of categories."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "elda = EnsembleLda(\n",
+ " corpus=opinosis.corpus, id2word=opinosis.id2word, num_models=128, num_topics=20,\n",
+ " passes=20, iterations=100, ensemble_workers=3, distance_workers=4,\n",
+ " topic_model_class='ldamulticore', masking_method=rank_masking,\n",
+ ")\n",
+ "pretty_print_topics()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "The default for **min_samples** is 64 (half the number of models) and the default for **eps** is 0.1. You basically play around with these until you find a sweet spot that fits your needs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "elda.recluster(min_samples=55, eps=0.14)\n",
+ "pretty_print_topics()"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/src/_static/images/eaccidents-logo.png b/docs/src/_static/images/eaccidents-logo.png
new file mode 100644
index 0000000000..fbeb70cb19
Binary files /dev/null and b/docs/src/_static/images/eaccidents-logo.png differ
diff --git a/docs/src/_static/images/techtarget-logo.png b/docs/src/_static/images/techtarget-logo.png
new file mode 100644
index 0000000000..d7ae884076
Binary files /dev/null and b/docs/src/_static/images/techtarget-logo.png differ
diff --git a/docs/src/_templates/indexcontent.html b/docs/src/_templates/indexcontent.html
index 0d7c5a6252..0ca74b7210 100644
--- a/docs/src/_templates/indexcontent.html
+++ b/docs/src/_templates/indexcontent.html
@@ -108,7 +108,7 @@
Ready-to-use models and corpora
-
Installation
+ Installation
@@ -161,7 +161,7 @@
Code dependencies
-
Testing Gensim
+ Testing Gensim
@@ -174,10 +174,10 @@
Testing Gensim
Build status |
- Travis |
- Run tests on Linux and check code-style |
-
+ | Github Actions |
+ Run tests on Linux and Mac, plus check code-style |
+
|
diff --git a/docs/src/apiref.rst b/docs/src/apiref.rst
index eef4684ac8..583e4528a9 100644
--- a/docs/src/apiref.rst
+++ b/docs/src/apiref.rst
@@ -20,6 +20,7 @@ Modules:
corpora/lowcorpus
corpora/malletcorpus
corpora/mmcorpus
+ corpora/opinosiscorpus
corpora/sharded_corpus
corpora/svmlightcorpus
corpora/textcorpus
@@ -27,6 +28,7 @@ Modules:
corpora/wikicorpus
models/ldamodel
models/ldamulticore
+ models/ensemblelda
models/nmf
models/lsimodel
models/ldaseqmodel
diff --git a/docs/src/auto_examples/core/images/sphx_glr_run_corpora_and_vector_spaces_001.png b/docs/src/auto_examples/core/images/sphx_glr_run_corpora_and_vector_spaces_001.png
index 807a84c4de..5c86a24471 100644
Binary files a/docs/src/auto_examples/core/images/sphx_glr_run_corpora_and_vector_spaces_001.png and b/docs/src/auto_examples/core/images/sphx_glr_run_corpora_and_vector_spaces_001.png differ
diff --git a/docs/src/auto_examples/core/images/thumb/sphx_glr_run_corpora_and_vector_spaces_thumb.png b/docs/src/auto_examples/core/images/thumb/sphx_glr_run_corpora_and_vector_spaces_thumb.png
index 5a8e564326..bab9aec4a4 100644
Binary files a/docs/src/auto_examples/core/images/thumb/sphx_glr_run_corpora_and_vector_spaces_thumb.png and b/docs/src/auto_examples/core/images/thumb/sphx_glr_run_corpora_and_vector_spaces_thumb.png differ
diff --git a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.ipynb b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.ipynb
index 40a3324206..875db7b507 100644
--- a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.ipynb
+++ b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.ipynb
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "\nCorpora and Vector Spaces\n=========================\n\nDemonstrates transforming text into a vector space representation.\n\nAlso introduces corpus streaming and persistence to disk in various formats.\n\n"
+ "\n# Corpora and Vector Spaces\n\nDemonstrates transforming text into a vector space representation.\n\nAlso introduces corpus streaming and persistence to disk in various formats.\n"
]
},
{
@@ -33,7 +33,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "First, let\u2019s create a small corpus of nine short documents [1]_:\n\n\nFrom Strings to Vectors\n------------------------\n\nThis time, let's start from documents represented as strings:\n\n\n"
+ "First, let\u2019s create a small corpus of nine short documents [1]_:\n\n\n## From Strings to Vectors\n\nThis time, let's start from documents represented as strings:\n\n\n"
]
},
{
@@ -141,7 +141,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "By now it should be clear that the vector feature with ``id=10`` stands for the question \"How many\ntimes does the word `graph` appear in the document?\" and that the answer is \"zero\" for\nthe first six documents and \"one\" for the remaining three.\n\n\nCorpus Streaming -- One Document at a Time\n-------------------------------------------\n\nNote that `corpus` above resides fully in memory, as a plain Python list.\nIn this simple example, it doesn't matter much, but just to make things clear,\nlet's assume there are millions of documents in the corpus. Storing all of them in RAM won't do.\nInstead, let's assume the documents are stored in a file on disk, one document per line. Gensim\nonly requires that a corpus must be able to return one document vector at a time:\n\n\n"
+ "By now it should be clear that the vector feature with ``id=10`` stands for the question \"How many\ntimes does the word `graph` appear in the document?\" and that the answer is \"zero\" for\nthe first six documents and \"one\" for the remaining three.\n\n\n## Corpus Streaming -- One Document at a Time\n\nNote that `corpus` above resides fully in memory, as a plain Python list.\nIn this simple example, it doesn't matter much, but just to make things clear,\nlet's assume there are millions of documents in the corpus. Storing all of them in RAM won't do.\nInstead, let's assume the documents are stored in a file on disk, one document per line. Gensim\nonly requires that a corpus must be able to return one document vector at a time:\n\n\n"
]
},
{
@@ -152,7 +152,7 @@
},
"outputs": [],
"source": [
- "from smart_open import open # for transparently opening remote files\n\n\nclass MyCorpus:\n def __iter__(self):\n for line in open('https://radimrehurek.com/gensim/mycorpus.txt'):\n # assume there's one document per line, tokens separated by whitespace\n yield dictionary.doc2bow(line.lower().split())"
+ "from smart_open import open # for transparently opening remote files\n\n\nclass MyCorpus:\n def __iter__(self):\n for line in open('https://radimrehurek.com/mycorpus.txt'):\n # assume there's one document per line, tokens separated by whitespace\n yield dictionary.doc2bow(line.lower().split())"
]
},
{
@@ -177,7 +177,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Download the sample `mycorpus.txt file here <./mycorpus.txt>`_. The assumption that\neach document occupies one line in a single file is not important; you can mold\nthe `__iter__` function to fit your input format, whatever it is.\nWalking directories, parsing XML, accessing the network...\nJust parse your input to retrieve a clean list of tokens in each document,\nthen convert the tokens via a dictionary to their ids and yield the resulting sparse vector inside `__iter__`.\n\n"
+ "Download the sample `mycorpus.txt file here `_. The assumption that\neach document occupies one line in a single file is not important; you can mold\nthe `__iter__` function to fit your input format, whatever it is.\nWalking directories, parsing XML, accessing the network...\nJust parse your input to retrieve a clean list of tokens in each document,\nthen convert the tokens via a dictionary to their ids and yield the resulting sparse vector inside `__iter__`.\n\n"
]
},
{
@@ -224,14 +224,14 @@
},
"outputs": [],
"source": [
- "# collect statistics about all tokens\ndictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/gensim/mycorpus.txt'))\n# remove stop words and words that appear only once\nstop_ids = [\n dictionary.token2id[stopword]\n for stopword in stoplist\n if stopword in dictionary.token2id\n]\nonce_ids = [tokenid for tokenid, docfreq in dictionary.dfs.items() if docfreq == 1]\ndictionary.filter_tokens(stop_ids + once_ids) # remove stop words and words that appear only once\ndictionary.compactify() # remove gaps in id sequence after words that were removed\nprint(dictionary)"
+ "# collect statistics about all tokens\ndictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/mycorpus.txt'))\n# remove stop words and words that appear only once\nstop_ids = [\n dictionary.token2id[stopword]\n for stopword in stoplist\n if stopword in dictionary.token2id\n]\nonce_ids = [tokenid for tokenid, docfreq in dictionary.dfs.items() if docfreq == 1]\ndictionary.filter_tokens(stop_ids + once_ids) # remove stop words and words that appear only once\ndictionary.compactify() # remove gaps in id sequence after words that were removed\nprint(dictionary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "And that is all there is to it! At least as far as bag-of-words representation is concerned.\nOf course, what we do with such a corpus is another question; it is not at all clear\nhow counting the frequency of distinct words could be useful. As it turns out, it isn't, and\nwe will need to apply a transformation on this simple representation first, before\nwe can use it to compute any meaningful document vs. document similarities.\nTransformations are covered in the next tutorial\n(`sphx_glr_auto_examples_core_run_topics_and_transformations.py`),\nbut before that, let's briefly turn our attention to *corpus persistency*.\n\n\nCorpus Formats\n---------------\n\nThere exist several file formats for serializing a Vector Space corpus (~sequence of vectors) to disk.\n`Gensim` implements them via the *streaming corpus interface* mentioned earlier:\ndocuments are read from (resp. stored to) disk in a lazy fashion, one document at\na time, without the whole corpus being read into main memory at once.\n\nOne of the more notable file formats is the `Market Matrix format `_.\nTo save a corpus in the Matrix Market format:\n\ncreate a toy corpus of 2 documents, as a plain Python list\n\n"
+ "And that is all there is to it! At least as far as bag-of-words representation is concerned.\nOf course, what we do with such a corpus is another question; it is not at all clear\nhow counting the frequency of distinct words could be useful. As it turns out, it isn't, and\nwe will need to apply a transformation on this simple representation first, before\nwe can use it to compute any meaningful document vs. document similarities.\nTransformations are covered in the next tutorial\n(`sphx_glr_auto_examples_core_run_topics_and_transformations.py`),\nbut before that, let's briefly turn our attention to *corpus persistency*.\n\n\n## Corpus Formats\n\nThere exist several file formats for serializing a Vector Space corpus (~sequence of vectors) to disk.\n`Gensim` implements them via the *streaming corpus interface* mentioned earlier:\ndocuments are read from (resp. stored to) disk in a lazy fashion, one document at\na time, without the whole corpus being read into main memory at once.\n\nOne of the more notable file formats is the `Market Matrix format `_.\nTo save a corpus in the Matrix Market format:\n\ncreate a toy corpus of 2 documents, as a plain Python list\n\n"
]
},
{
@@ -357,7 +357,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "In this way, `gensim` can also be used as a memory-efficient **I/O format conversion tool**:\njust load a document stream using one format and immediately save it in another format.\nAdding new formats is dead easy, check out the `code for the SVMlight corpus\n`_ for an example.\n\nCompatibility with NumPy and SciPy\n----------------------------------\n\nGensim also contains `efficient utility functions `_\nto help converting from/to numpy matrices\n\n"
+ "In this way, `gensim` can also be used as a memory-efficient **I/O format conversion tool**:\njust load a document stream using one format and immediately save it in another format.\nAdding new formats is dead easy, check out the `code for the SVMlight corpus\n`_ for an example.\n\n## Compatibility with NumPy and SciPy\n\nGensim also contains `efficient utility functions `_\nto help converting from/to numpy matrices\n\n"
]
},
{
@@ -393,7 +393,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "What Next\n---------\n\nRead about `sphx_glr_auto_examples_core_run_topics_and_transformations.py`.\n\nReferences\n----------\n\nFor a complete reference (Want to prune the dictionary to a smaller size?\nOptimize converting between corpora and NumPy/SciPy arrays?), see the `apiref`.\n\n.. [1] This is the same corpus as used in\n `Deerwester et al. (1990): Indexing by Latent Semantic Analysis `_, Table 2.\n\n"
+ "## What Next\n\nRead about `sphx_glr_auto_examples_core_run_topics_and_transformations.py`.\n\n## References\n\nFor a complete reference (Want to prune the dictionary to a smaller size?\nOptimize converting between corpora and NumPy/SciPy arrays?), see the `apiref`.\n\n.. [1] This is the same corpus as used in\n `Deerwester et al. (1990): Indexing by Latent Semantic Analysis `_, Table 2.\n\n"
]
},
{
@@ -424,7 +424,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.5"
+ "version": "3.8.5"
}
},
"nbformat": 4,
diff --git a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py
index 0a49614123..983a9d1235 100644
--- a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py
+++ b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py
@@ -138,7 +138,7 @@
class MyCorpus:
def __iter__(self):
- for line in open('https://radimrehurek.com/gensim/mycorpus.txt'):
+ for line in open('https://radimrehurek.com/mycorpus.txt'):
# assume there's one document per line, tokens separated by whitespace
yield dictionary.doc2bow(line.lower().split())
@@ -154,7 +154,7 @@ def __iter__(self):
# in RAM at once. You can even create the documents on the fly!
###############################################################################
-# Download the sample `mycorpus.txt file here <./mycorpus.txt>`_. The assumption that
+# Download the sample `mycorpus.txt file here `_. The assumption that
# each document occupies one line in a single file is not important; you can mold
# the `__iter__` function to fit your input format, whatever it is.
# Walking directories, parsing XML, accessing the network...
@@ -180,7 +180,7 @@ def __iter__(self):
# Similarly, to construct the dictionary without loading all texts into memory:
# collect statistics about all tokens
-dictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/gensim/mycorpus.txt'))
+dictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/mycorpus.txt'))
# remove stop words and words that appear only once
stop_ids = [
dictionary.token2id[stopword]
diff --git a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py.md5 b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py.md5
index 935e0357af..174fe2a139 100644
--- a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py.md5
+++ b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.py.md5
@@ -1 +1 @@
-6b98413399bca9fd1ed8fe420da85692
\ No newline at end of file
+55a8a886f05e5005c5f66d57569ee79d
\ No newline at end of file
diff --git a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.rst b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.rst
index 7f8d25cfec..3cc549dd65 100644
--- a/docs/src/auto_examples/core/run_corpora_and_vector_spaces.rst
+++ b/docs/src/auto_examples/core/run_corpora_and_vector_spaces.rst
@@ -1,12 +1,21 @@
+
+.. DO NOT EDIT.
+.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
+.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
+.. "auto_examples/core/run_corpora_and_vector_spaces.py"
+.. LINE NUMBERS ARE GIVEN BELOW.
+
.. only:: html
.. note::
:class: sphx-glr-download-link-note
- Click :ref:`here ` to download the full example code
- .. rst-class:: sphx-glr-example-title
+ Click :ref:`here `
+ to download the full example code
+
+.. rst-class:: sphx-glr-example-title
- .. _sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py:
+.. _sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py:
Corpora and Vector Spaces
@@ -16,6 +25,7 @@ Demonstrates transforming text into a vector space representation.
Also introduces corpus streaming and persistence to disk in various formats.
+.. GENERATED FROM PYTHON SOURCE LINES 9-13
.. code-block:: default
@@ -30,6 +40,8 @@ Also introduces corpus streaming and persistence to disk in various formats.
+.. GENERATED FROM PYTHON SOURCE LINES 14-23
+
First, let’s create a small corpus of nine short documents [1]_:
.. _second example:
@@ -40,6 +52,7 @@ From Strings to Vectors
This time, let's start from documents represented as strings:
+.. GENERATED FROM PYTHON SOURCE LINES 23-35
.. code-block:: default
@@ -62,11 +75,14 @@ This time, let's start from documents represented as strings:
+.. GENERATED FROM PYTHON SOURCE LINES 36-40
+
This is a tiny corpus of nine documents, each consisting of only a single sentence.
First, let's tokenize the documents, remove common words (using a toy stoplist)
as well as words that only appear once in the corpus:
+.. GENERATED FROM PYTHON SOURCE LINES 40-64
.. code-block:: default
@@ -117,6 +133,8 @@ as well as words that only appear once in the corpus:
+.. GENERATED FROM PYTHON SOURCE LINES 65-87
+
Your way of processing the documents will likely vary; here, I only split on whitespace
to tokenize, followed by lowercasing each word. In fact, I use this particular
(simplistic and inefficient) setup to mimic the experiment done in Deerwester et al.'s
@@ -140,6 +158,7 @@ a question-answer pair, in the style of:
It is advantageous to represent the questions only by their (integer) ids. The mapping
between the questions and ids is called a dictionary:
+.. GENERATED FROM PYTHON SOURCE LINES 87-93
.. code-block:: default
@@ -159,21 +178,25 @@ between the questions and ids is called a dictionary:
.. code-block:: none
- 2020-10-28 00:52:02,550 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
- 2020-10-28 00:52:02,550 : INFO : built Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...) from 9 documents (total 29 corpus positions)
- 2020-10-28 00:52:02,550 : INFO : saving Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...) under /tmp/deerwester.dict, separately None
- 2020-10-28 00:52:02,552 : INFO : saved /tmp/deerwester.dict
+ 2021-06-01 10:34:56,824 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
+ 2021-06-01 10:34:56,824 : INFO : built Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...) from 9 documents (total 29 corpus positions)
+ 2021-06-01 10:34:56,834 : INFO : Dictionary lifecycle event {'msg': "built Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...) from 9 documents (total 29 corpus positions)", 'datetime': '2021-06-01T10:34:56.825003', 'gensim': '4.1.0.dev0', 'python': '3.8.5 (default, Jan 27 2021, 15:41:15) \n[GCC 9.3.0]', 'platform': 'Linux-5.4.0-73-generic-x86_64-with-glibc2.29', 'event': 'created'}
+ 2021-06-01 10:34:56,834 : INFO : Dictionary lifecycle event {'fname_or_handle': '/tmp/deerwester.dict', 'separately': 'None', 'sep_limit': 10485760, 'ignore': frozenset(), 'datetime': '2021-06-01T10:34:56.834300', 'gensim': '4.1.0.dev0', 'python': '3.8.5 (default, Jan 27 2021, 15:41:15) \n[GCC 9.3.0]', 'platform': 'Linux-5.4.0-73-generic-x86_64-with-glibc2.29', 'event': 'saving'}
+ 2021-06-01 10:34:56,834 : INFO : saved /tmp/deerwester.dict
Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...)
+.. GENERATED FROM PYTHON SOURCE LINES 94-99
+
Here we assigned a unique integer id to all words appearing in the corpus with the
:class:`gensim.corpora.dictionary.Dictionary` class. This sweeps across the texts, collecting word counts
and relevant statistics. In the end, we see there are twelve distinct words in the
processed corpus, which means each document will be represented by twelve numbers (ie., by a 12-D vector).
To see the mapping between words and their ids:
+.. GENERATED FROM PYTHON SOURCE LINES 99-102
.. code-block:: default
@@ -195,8 +218,11 @@ To see the mapping between words and their ids:
+.. GENERATED FROM PYTHON SOURCE LINES 103-104
+
To actually convert tokenized documents to vectors:
+.. GENERATED FROM PYTHON SOURCE LINES 104-109
.. code-block:: default
@@ -220,12 +246,15 @@ To actually convert tokenized documents to vectors:
+.. GENERATED FROM PYTHON SOURCE LINES 110-115
+
The function :func:`doc2bow` simply counts the number of occurrences of
each distinct word, converts the word to its integer word id
and returns the result as a sparse vector. The sparse vector ``[(0, 1), (1, 1)]``
therefore reads: in the document `"Human computer interaction"`, the words `computer`
(id 0) and `human` (id 1) appear once; the other ten dictionary words appear (implicitly) zero times.
+.. GENERATED FROM PYTHON SOURCE LINES 115-120
.. code-block:: default
@@ -244,16 +273,18 @@ therefore reads: in the document `"Human computer interaction"`, the words `comp
.. code-block:: none
- 2020-10-28 00:52:02,830 : INFO : storing corpus in Matrix Market format to /tmp/deerwester.mm
- 2020-10-28 00:52:02,832 : INFO : saving sparse matrix to /tmp/deerwester.mm
- 2020-10-28 00:52:02,832 : INFO : PROGRESS: saving document #0
- 2020-10-28 00:52:02,834 : INFO : saved 9x12 matrix, density=25.926% (28/108)
- 2020-10-28 00:52:02,834 : INFO : saving MmCorpus index to /tmp/deerwester.mm.index
+ 2021-06-01 10:34:57,074 : INFO : storing corpus in Matrix Market format to /tmp/deerwester.mm
+ 2021-06-01 10:34:57,075 : INFO : saving sparse matrix to /tmp/deerwester.mm
+ 2021-06-01 10:34:57,075 : INFO : PROGRESS: saving document #0
+ 2021-06-01 10:34:57,076 : INFO : saved 9x12 matrix, density=25.926% (28/108)
+ 2021-06-01 10:34:57,076 : INFO : saving MmCorpus index to /tmp/deerwester.mm.index
[[(0, 1), (1, 1), (2, 1)], [(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)], [(2, 1), (5, 1), (7, 1), (8, 1)], [(1, 1), (5, 2), (8, 1)], [(3, 1), (6, 1), (7, 1)], [(9, 1)], [(9, 1), (10, 1)], [(9, 1), (10, 1), (11, 1)], [(4, 1), (10, 1), (11, 1)]]
+.. GENERATED FROM PYTHON SOURCE LINES 121-136
+
By now it should be clear that the vector feature with ``id=10`` stands for the question "How many
times does the word `graph` appear in the document?" and that the answer is "zero" for
the first six documents and "one" for the remaining three.
@@ -270,6 +301,7 @@ Instead, let's assume the documents are stored in a file on disk, one document p
only requires that a corpus must be able to return one document vector at a time:
+.. GENERATED FROM PYTHON SOURCE LINES 136-145
.. code-block:: default
@@ -278,7 +310,7 @@ only requires that a corpus must be able to return one document vector at a time
class MyCorpus:
def __iter__(self):
- for line in open('https://radimrehurek.com/gensim/mycorpus.txt'):
+ for line in open('https://radimrehurek.com/mycorpus.txt'):
# assume there's one document per line, tokens separated by whitespace
yield dictionary.doc2bow(line.lower().split())
@@ -289,11 +321,14 @@ only requires that a corpus must be able to return one document vector at a time
+.. GENERATED FROM PYTHON SOURCE LINES 146-150
+
The full power of Gensim comes from the fact that a corpus doesn't have to be
a ``list``, or a ``NumPy`` array, or a ``Pandas`` dataframe, or whatever.
Gensim *accepts any object that, when iterated over, successively yields
documents*.
+.. GENERATED FROM PYTHON SOURCE LINES 150-156
.. code-block:: default
@@ -310,13 +345,16 @@ documents*.
-Download the sample `mycorpus.txt file here <./mycorpus.txt>`_. The assumption that
+.. GENERATED FROM PYTHON SOURCE LINES 157-163
+
+Download the sample `mycorpus.txt file here `_. The assumption that
each document occupies one line in a single file is not important; you can mold
the `__iter__` function to fit your input format, whatever it is.
Walking directories, parsing XML, accessing the network...
Just parse your input to retrieve a clean list of tokens in each document,
then convert the tokens via a dictionary to their ids and yield the resulting sparse vector inside `__iter__`.
+.. GENERATED FROM PYTHON SOURCE LINES 163-167
.. code-block:: default
@@ -334,15 +372,18 @@ then convert the tokens via a dictionary to their ids and yield the resulting sp
.. code-block:: none
- <__main__.MyCorpus object at 0x11e77bb38>
+ <__main__.MyCorpus object at 0x7f389b5f8520>
+.. GENERATED FROM PYTHON SOURCE LINES 168-171
+
Corpus is now an object. We didn't define any way to print it, so `print` just outputs address
of the object in memory. Not very useful. To see the constituent vectors, let's
iterate over the corpus and print each document vector (one at a time):
+.. GENERATED FROM PYTHON SOURCE LINES 171-175
.. code-block:: default
@@ -373,18 +414,21 @@ iterate over the corpus and print each document vector (one at a time):
+.. GENERATED FROM PYTHON SOURCE LINES 176-181
+
Although the output is the same as for the plain Python list, the corpus is now much
more memory friendly, because at most one vector resides in RAM at a time. Your
corpus can now be as large as you want.
Similarly, to construct the dictionary without loading all texts into memory:
+.. GENERATED FROM PYTHON SOURCE LINES 181-195
.. code-block:: default
# collect statistics about all tokens
- dictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/gensim/mycorpus.txt'))
+ dictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/mycorpus.txt'))
# remove stop words and words that appear only once
stop_ids = [
dictionary.token2id[stopword]
@@ -406,13 +450,16 @@ Similarly, to construct the dictionary without loading all texts into memory:
.. code-block:: none
- 2020-10-28 00:52:04,241 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
- 2020-10-28 00:52:04,243 : INFO : built Dictionary(42 unique tokens: ['abc', 'applications', 'computer', 'for', 'human']...) from 9 documents (total 69 corpus positions)
+ 2021-06-01 10:34:58,466 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
+ 2021-06-01 10:34:58,467 : INFO : built Dictionary(42 unique tokens: ['abc', 'applications', 'computer', 'for', 'human']...) from 9 documents (total 69 corpus positions)
+ 2021-06-01 10:34:58,467 : INFO : Dictionary lifecycle event {'msg': "built Dictionary(42 unique tokens: ['abc', 'applications', 'computer', 'for', 'human']...) from 9 documents (total 69 corpus positions)", 'datetime': '2021-06-01T10:34:58.467454', 'gensim': '4.1.0.dev0', 'python': '3.8.5 (default, Jan 27 2021, 15:41:15) \n[GCC 9.3.0]', 'platform': 'Linux-5.4.0-73-generic-x86_64-with-glibc2.29', 'event': 'created'}
Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...)
+.. GENERATED FROM PYTHON SOURCE LINES 196-219
+
And that is all there is to it! At least as far as bag-of-words representation is concerned.
Of course, what we do with such a corpus is another question; it is not at all clear
how counting the frequency of distinct words could be useful. As it turns out, it isn't, and
@@ -437,6 +484,7 @@ To save a corpus in the Matrix Market format:
create a toy corpus of 2 documents, as a plain Python list
+.. GENERATED FROM PYTHON SOURCE LINES 219-223
.. code-block:: default
@@ -454,19 +502,22 @@ create a toy corpus of 2 documents, as a plain Python list
.. code-block:: none
- 2020-10-28 00:52:04,368 : INFO : storing corpus in Matrix Market format to /tmp/corpus.mm
- 2020-10-28 00:52:04,370 : INFO : saving sparse matrix to /tmp/corpus.mm
- 2020-10-28 00:52:04,370 : INFO : PROGRESS: saving document #0
- 2020-10-28 00:52:04,370 : INFO : saved 2x2 matrix, density=25.000% (1/4)
- 2020-10-28 00:52:04,370 : INFO : saving MmCorpus index to /tmp/corpus.mm.index
+ 2021-06-01 10:34:58,603 : INFO : storing corpus in Matrix Market format to /tmp/corpus.mm
+ 2021-06-01 10:34:58,604 : INFO : saving sparse matrix to /tmp/corpus.mm
+ 2021-06-01 10:34:58,604 : INFO : PROGRESS: saving document #0
+ 2021-06-01 10:34:58,604 : INFO : saved 2x2 matrix, density=25.000% (1/4)
+ 2021-06-01 10:34:58,604 : INFO : saving MmCorpus index to /tmp/corpus.mm.index
+.. GENERATED FROM PYTHON SOURCE LINES 224-227
+
Other formats include `Joachim's SVMlight format `_,
`Blei's LDA-C format `_ and
`GibbsLDA++ format `_.
+.. GENERATED FROM PYTHON SOURCE LINES 227-233
.. code-block:: default
@@ -486,22 +537,25 @@ Other formats include `Joachim's SVMlight format
.. code-block:: none
- 2020-10-28 00:52:04,425 : INFO : converting corpus to SVMlight format: /tmp/corpus.svmlight
- 2020-10-28 00:52:04,426 : INFO : saving SvmLightCorpus index to /tmp/corpus.svmlight.index
- 2020-10-28 00:52:04,427 : INFO : no word id mapping provided; initializing from corpus
- 2020-10-28 00:52:04,427 : INFO : storing corpus in Blei's LDA-C format into /tmp/corpus.lda-c
- 2020-10-28 00:52:04,427 : INFO : saving vocabulary of 2 words to /tmp/corpus.lda-c.vocab
- 2020-10-28 00:52:04,427 : INFO : saving BleiCorpus index to /tmp/corpus.lda-c.index
- 2020-10-28 00:52:04,481 : INFO : no word id mapping provided; initializing from corpus
- 2020-10-28 00:52:04,481 : INFO : storing corpus in List-Of-Words format into /tmp/corpus.low
- 2020-10-28 00:52:04,482 : WARNING : List-of-words format can only save vectors with integer elements; 1 float entries were truncated to integer value
- 2020-10-28 00:52:04,482 : INFO : saving LowCorpus index to /tmp/corpus.low.index
+ 2021-06-01 10:34:58,653 : INFO : converting corpus to SVMlight format: /tmp/corpus.svmlight
+ 2021-06-01 10:34:58,654 : INFO : saving SvmLightCorpus index to /tmp/corpus.svmlight.index
+ 2021-06-01 10:34:58,654 : INFO : no word id mapping provided; initializing from corpus
+ 2021-06-01 10:34:58,654 : INFO : storing corpus in Blei's LDA-C format into /tmp/corpus.lda-c
+ 2021-06-01 10:34:58,654 : INFO : saving vocabulary of 2 words to /tmp/corpus.lda-c.vocab
+ 2021-06-01 10:34:58,654 : INFO : saving BleiCorpus index to /tmp/corpus.lda-c.index
+ 2021-06-01 10:34:58,707 : INFO : no word id mapping provided; initializing from corpus
+ 2021-06-01 10:34:58,708 : INFO : storing corpus in List-Of-Words format into /tmp/corpus.low
+ 2021-06-01 10:34:58,708 : WARNING : List-of-words format can only save vectors with integer elements; 1 float entries were truncated to integer value
+ 2021-06-01 10:34:58,708 : INFO : saving LowCorpus index to /tmp/corpus.low.index
+
+.. GENERATED FROM PYTHON SOURCE LINES 234-235
Conversely, to load a corpus iterator from a Matrix Market file:
+.. GENERATED FROM PYTHON SOURCE LINES 235-238
.. code-block:: default
@@ -518,15 +572,18 @@ Conversely, to load a corpus iterator from a Matrix Market file:
.. code-block:: none
- 2020-10-28 00:52:04,538 : INFO : loaded corpus index from /tmp/corpus.mm.index
- 2020-10-28 00:52:04,540 : INFO : initializing cython corpus reader from /tmp/corpus.mm
- 2020-10-28 00:52:04,540 : INFO : accepted corpus with 2 documents, 2 features, 1 non-zero entries
+ 2021-06-01 10:34:58,756 : INFO : loaded corpus index from /tmp/corpus.mm.index
+ 2021-06-01 10:34:58,757 : INFO : initializing cython corpus reader from /tmp/corpus.mm
+ 2021-06-01 10:34:58,757 : INFO : accepted corpus with 2 documents, 2 features, 1 non-zero entries
+
+.. GENERATED FROM PYTHON SOURCE LINES 239-240
Corpus objects are streams, so typically you won't be able to print them directly:
+.. GENERATED FROM PYTHON SOURCE LINES 240-243
.. code-block:: default
@@ -548,8 +605,11 @@ Corpus objects are streams, so typically you won't be able to print them directl
+.. GENERATED FROM PYTHON SOURCE LINES 244-245
+
Instead, to view the contents of a corpus:
+.. GENERATED FROM PYTHON SOURCE LINES 245-249
.. code-block:: default
@@ -572,8 +632,11 @@ Instead, to view the contents of a corpus:
+.. GENERATED FROM PYTHON SOURCE LINES 250-251
+
or
+.. GENERATED FROM PYTHON SOURCE LINES 251-256
.. code-block:: default
@@ -598,11 +661,14 @@ or
+.. GENERATED FROM PYTHON SOURCE LINES 257-261
+
The second way is obviously more memory-friendly, but for testing and development
purposes, nothing beats the simplicity of calling ``list(corpus)``.
To save the same Matrix Market document stream in Blei's LDA-C format,
+.. GENERATED FROM PYTHON SOURCE LINES 261-264
.. code-block:: default
@@ -619,14 +685,16 @@ To save the same Matrix Market document stream in Blei's LDA-C format,
.. code-block:: none
- 2020-10-28 00:52:04,921 : INFO : no word id mapping provided; initializing from corpus
- 2020-10-28 00:52:04,922 : INFO : storing corpus in Blei's LDA-C format into /tmp/corpus.lda-c
- 2020-10-28 00:52:04,923 : INFO : saving vocabulary of 2 words to /tmp/corpus.lda-c.vocab
- 2020-10-28 00:52:04,923 : INFO : saving BleiCorpus index to /tmp/corpus.lda-c.index
+ 2021-06-01 10:34:59,085 : INFO : no word id mapping provided; initializing from corpus
+ 2021-06-01 10:34:59,086 : INFO : storing corpus in Blei's LDA-C format into /tmp/corpus.lda-c
+ 2021-06-01 10:34:59,087 : INFO : saving vocabulary of 2 words to /tmp/corpus.lda-c.vocab
+ 2021-06-01 10:34:59,087 : INFO : saving BleiCorpus index to /tmp/corpus.lda-c.index
+.. GENERATED FROM PYTHON SOURCE LINES 265-275
+
In this way, `gensim` can also be used as a memory-efficient **I/O format conversion tool**:
just load a document stream using one format and immediately save it in another format.
Adding new formats is dead easy, check out the `code for the SVMlight corpus
@@ -638,6 +706,7 @@ Compatibility with NumPy and SciPy
Gensim also contains `efficient utility functions `_
to help converting from/to numpy matrices
+.. GENERATED FROM PYTHON SOURCE LINES 275-282
.. code-block:: default
@@ -655,8 +724,11 @@ to help converting from/to numpy matrices
+.. GENERATED FROM PYTHON SOURCE LINES 283-284
+
and from/to `scipy.sparse` matrices
+.. GENERATED FROM PYTHON SOURCE LINES 284-290
.. code-block:: default
@@ -673,6 +745,8 @@ and from/to `scipy.sparse` matrices
+.. GENERATED FROM PYTHON SOURCE LINES 291-304
+
What Next
---------
@@ -687,6 +761,7 @@ Optimize converting between corpora and NumPy/SciPy arrays?), see the :ref:`apir
.. [1] This is the same corpus as used in
`Deerwester et al. (1990): Indexing by Latent Semantic Analysis `_, Table 2.
+.. GENERATED FROM PYTHON SOURCE LINES 304-310
.. code-block:: default
@@ -710,9 +785,9 @@ Optimize converting between corpora and NumPy/SciPy arrays?), see the :ref:`apir
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 0 minutes 4.010 seconds)
+ **Total running time of the script:** ( 0 minutes 3.242 seconds)
-**Estimated memory usage:** 40 MB
+**Estimated memory usage:** 48 MB
.. _sphx_glr_download_auto_examples_core_run_corpora_and_vector_spaces.py:
diff --git a/docs/src/auto_examples/core/sg_execution_times.rst b/docs/src/auto_examples/core/sg_execution_times.rst
index 9e36b38b09..da5c34f485 100644
--- a/docs/src/auto_examples/core/sg_execution_times.rst
+++ b/docs/src/auto_examples/core/sg_execution_times.rst
@@ -5,10 +5,10 @@
Computation times
=================
-**00:04.010** total execution time for **auto_examples_core** files:
+**00:03.242** total execution time for **auto_examples_core** files:
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
-| :ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py` (``run_corpora_and_vector_spaces.py``) | 00:04.010 | 39.8 MB |
+| :ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py` (``run_corpora_and_vector_spaces.py``) | 00:03.242 | 48.2 MB |
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
| :ref:`sphx_glr_auto_examples_core_run_core_concepts.py` (``run_core_concepts.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
diff --git a/docs/src/auto_examples/howtos/run_doc.ipynb b/docs/src/auto_examples/howtos/run_doc.ipynb
index 7d3be48ac2..4fab716e66 100644
--- a/docs/src/auto_examples/howtos/run_doc.ipynb
+++ b/docs/src/auto_examples/howtos/run_doc.ipynb
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "\nHow to Author Gensim Documentation\n==================================\n\nHow to author documentation for Gensim.\n"
+ "\nHow to Author Gensim Documentation\n==================================\n\nHow to author documentation for Gensim.\n\n"
]
},
{
@@ -61,7 +61,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Correctness\n-----------\n\nIncorrect documentation can be worse than no documentation at all.\nTake the following steps to ensure correctness:\n\n- Run Python's doctest module on your docstrings\n- Run your documentation scripts from scratch, removing any temporary files/results\n\nUsing data in your documentation\n--------------------------------\n\nSome parts of the documentation require real-world data to be useful.\nFor example, you may need more than just a toy example to demonstrate the benefits of one model over another.\nThis subsection provides some tips for including data in your documentation.\n\nIf possible, use data available via Gensim's\n`downloader API `__.\nThis will reduce the risk of your documentation becoming obsolete because required data is no longer available.\n\nUse the smallest possible dataset: avoid making people unnecessarily load large datasets and models.\nThis will make your documentation faster to run and easier for people to use (they can modify your examples and re-run them quickly).\n\nFinalizing your contribution\n----------------------------\n\nFirst, get Sphinx Gallery to build your documentation::\n\n make -C docs/src html\n\nThis can take a while if your documentation uses a large dataset, or if you've changed many other tutorials or guides.\nOnce this completes successfully, open ``docs/auto_examples/index.html`` in your browser.\nYou should see your new tutorial or guide in the gallery.\n\nOnce your documentation script is working correctly, it's time to add it to the git repository::\n\n git add docs/src/gallery/tutorials/run_example.py\n git add docs/src/auto_examples/tutorials/run_example.{py,py.md5,rst,ipynb}\n git add docs/src/auto_examples/howtos/sg_execution_times.rst\n git commit -m \"enter a helpful commit message here\"\n git push origin branchname\n\n.. Note::\n You may be wondering what all those other files are.\n Sphinx Gallery puts a copy of your Python script in ``auto_examples/tutorials``.\n The .md5 contains MD5 hash of the script to enable easy detection of modifications.\n Gallery also generates .rst (RST for Sphinx) and .ipynb (Jupyter notebook) files from the script.\n Finally, ``sg_execution_times.rst`` contains the time taken to run each example.\n\nFinally, make a PR on `github `__.\nOne of our friendly maintainers will review it, make suggestions, and eventually merge it.\nYour documentation will then appear in the gallery alongside the rest of the example.\nAt that stage, give yourself a pat on the back: you're done!\n\n"
+ "Correctness\n-----------\n\nIncorrect documentation can be worse than no documentation at all.\nTake the following steps to ensure correctness:\n\n- Run Python's doctest module on your docstrings\n- Run your documentation scripts from scratch, removing any temporary files/results\n\nUsing data in your documentation\n--------------------------------\n\nSome parts of the documentation require real-world data to be useful.\nFor example, you may need more than just a toy example to demonstrate the benefits of one model over another.\nThis subsection provides some tips for including data in your documentation.\n\nIf possible, use data available via Gensim's\n`downloader API `__.\nThis will reduce the risk of your documentation becoming obsolete because required data is no longer available.\n\nUse the smallest possible dataset: avoid making people unnecessarily load large datasets and models.\nThis will make your documentation faster to run and easier for people to use (they can modify your examples and re-run them quickly).\n\nFinalizing your contribution\n----------------------------\n\nFirst, get Sphinx Gallery to build your documentation::\n\n make --directory docs/src html\n\nThis can take a while if your documentation uses a large dataset, or if you've changed many other tutorials or guides.\nOnce this completes successfully, open ``docs/auto_examples/index.html`` in your browser.\nYou should see your new tutorial or guide in the gallery.\n\nOnce your documentation script is working correctly, it's time to add it to the git repository::\n\n git add docs/src/gallery/tutorials/run_example.py\n git add docs/src/auto_examples/tutorials/run_example.{py,py.md5,rst,ipynb}\n git add docs/src/auto_examples/howtos/sg_execution_times.rst\n git commit -m \"enter a helpful commit message here\"\n git push origin branchname\n\n.. Note::\n You may be wondering what all those other files are.\n Sphinx Gallery puts a copy of your Python script in ``auto_examples/tutorials``.\n The .md5 contains MD5 hash of the script to enable easy detection of modifications.\n Gallery also generates .rst (RST for Sphinx) and .ipynb (Jupyter notebook) files from the script.\n Finally, ``sg_execution_times.rst`` contains the time taken to run each example.\n\nFinally, open a PR at `github `__.\nOne of our friendly maintainers will review it, make suggestions, and eventually merge it.\nYour documentation will then appear in the `gallery `__,\nalongside the rest of the examples. Thanks a lot!\n\n"
]
}
],
@@ -81,7 +81,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.3"
+ "version": "3.6.5"
}
},
"nbformat": 4,
diff --git a/docs/src/auto_examples/howtos/run_doc.py b/docs/src/auto_examples/howtos/run_doc.py
index 15e870f1be..dbcd6d91e3 100644
--- a/docs/src/auto_examples/howtos/run_doc.py
+++ b/docs/src/auto_examples/howtos/run_doc.py
@@ -155,7 +155,7 @@
#
# First, get Sphinx Gallery to build your documentation::
#
-# make -C docs/src html
+# make --directory docs/src html
#
# This can take a while if your documentation uses a large dataset, or if you've changed many other tutorials or guides.
# Once this completes successfully, open ``docs/auto_examples/index.html`` in your browser.
@@ -176,7 +176,7 @@
# Gallery also generates .rst (RST for Sphinx) and .ipynb (Jupyter notebook) files from the script.
# Finally, ``sg_execution_times.rst`` contains the time taken to run each example.
#
-# Finally, make a PR on `github `__.
+# Finally, open a PR at `github `__.
# One of our friendly maintainers will review it, make suggestions, and eventually merge it.
-# Your documentation will then appear in the gallery alongside the rest of the example.
-# At that stage, give yourself a pat on the back: you're done!
+# Your documentation will then appear in the `gallery `__,
+# alongside the rest of the examples. Thanks a lot!
diff --git a/docs/src/auto_examples/howtos/run_doc.py.md5 b/docs/src/auto_examples/howtos/run_doc.py.md5
index 979aa0eb5e..66fecaa7cf 100644
--- a/docs/src/auto_examples/howtos/run_doc.py.md5
+++ b/docs/src/auto_examples/howtos/run_doc.py.md5
@@ -1 +1 @@
-512a76ce743dd12482d21784a76b60fe
\ No newline at end of file
+96cefb1417d54ac8010e38cc739d5ff1
\ No newline at end of file
diff --git a/docs/src/auto_examples/howtos/run_doc.rst b/docs/src/auto_examples/howtos/run_doc.rst
index c763ca1de0..dd4e957315 100644
--- a/docs/src/auto_examples/howtos/run_doc.rst
+++ b/docs/src/auto_examples/howtos/run_doc.rst
@@ -1,10 +1,12 @@
-.. note::
- :class: sphx-glr-download-link-note
+.. only:: html
+
+ .. note::
+ :class: sphx-glr-download-link-note
- Click :ref:`here ` to download the full example code
-.. rst-class:: sphx-glr-example-title
+ Click :ref:`here ` to download the full example code
+ .. rst-class:: sphx-glr-example-title
-.. _sphx_glr_auto_examples_howtos_run_doc.py:
+ .. _sphx_glr_auto_examples_howtos_run_doc.py:
How to Author Gensim Documentation
@@ -78,6 +80,15 @@ At the very top, you need a docstring describing what your script does.
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+
+ '\nTitle\n=====\n\nBrief description.\n'
+
The title is what will show up in the gallery.
@@ -167,7 +178,7 @@ Finalizing your contribution
First, get Sphinx Gallery to build your documentation::
- make -C docs/src html
+ make --directory docs/src html
This can take a while if your documentation uses a large dataset, or if you've changed many other tutorials or guides.
Once this completes successfully, open ``docs/auto_examples/index.html`` in your browser.
@@ -188,17 +199,17 @@ Once your documentation script is working correctly, it's time to add it to the
Gallery also generates .rst (RST for Sphinx) and .ipynb (Jupyter notebook) files from the script.
Finally, ``sg_execution_times.rst`` contains the time taken to run each example.
-Finally, make a PR on `github `__.
+Finally, open a PR at `github `__.
One of our friendly maintainers will review it, make suggestions, and eventually merge it.
-Your documentation will then appear in the gallery alongside the rest of the example.
-At that stage, give yourself a pat on the back: you're done!
+Your documentation will then appear in the `gallery `__,
+alongside the rest of the examples. Thanks a lot!
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 0 minutes 1.226 seconds)
+ **Total running time of the script:** ( 0 minutes 0.171 seconds)
-**Estimated memory usage:** 9 MB
+**Estimated memory usage:** 6 MB
.. _sphx_glr_download_auto_examples_howtos_run_doc.py:
@@ -211,13 +222,13 @@ At that stage, give yourself a pat on the back: you're done!
- .. container:: sphx-glr-download
+ .. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: run_doc.py `
- .. container:: sphx-glr-download
+ .. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: run_doc.ipynb `
diff --git a/docs/src/auto_examples/howtos/sg_execution_times.rst b/docs/src/auto_examples/howtos/sg_execution_times.rst
index ec9ea90bd7..e13e4ff9dc 100644
--- a/docs/src/auto_examples/howtos/sg_execution_times.rst
+++ b/docs/src/auto_examples/howtos/sg_execution_times.rst
@@ -5,9 +5,14 @@
Computation times
=================
-**00:01.226** total execution time for **auto_examples_howtos** files:
+**00:00.171** total execution time for **auto_examples_howtos** files:
-- **00:01.226**: :ref:`sphx_glr_auto_examples_howtos_run_doc.py` (``run_doc.py``)
-- **00:00.000**: :ref:`sphx_glr_auto_examples_howtos_run_compare_lda.py` (``run_compare_lda.py``)
-- **00:00.000**: :ref:`sphx_glr_auto_examples_howtos_run_doc2vec_imdb.py` (``run_doc2vec_imdb.py``)
-- **00:00.000**: :ref:`sphx_glr_auto_examples_howtos_run_downloader_api.py` (``run_downloader_api.py``)
++----------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_auto_examples_howtos_run_doc.py` (``run_doc.py``) | 00:00.171 | 6.1 MB |
++----------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_auto_examples_howtos_run_compare_lda.py` (``run_compare_lda.py``) | 00:00.000 | 0.0 MB |
++----------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_auto_examples_howtos_run_doc2vec_imdb.py` (``run_doc2vec_imdb.py``) | 00:00.000 | 0.0 MB |
++----------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_auto_examples_howtos_run_downloader_api.py` (``run_downloader_api.py``) | 00:00.000 | 0.0 MB |
++----------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/src/auto_examples/index.rst b/docs/src/auto_examples/index.rst
index 1fa9eeca12..05643de00c 100644
--- a/docs/src/auto_examples/index.rst
+++ b/docs/src/auto_examples/index.rst
@@ -71,7 +71,7 @@ Understanding this functionality is vital for using gensim effectively.
.. raw:: html
-
+
.. only:: html
@@ -92,7 +92,7 @@ Understanding this functionality is vital for using gensim effectively.
.. raw:: html
-
+
.. only:: html
@@ -169,7 +169,7 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod
.. raw:: html
-
+
.. only:: html
@@ -190,7 +190,28 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod
.. raw:: html
-
+
+
+.. only:: html
+
+ .. figure:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
+ :alt: Ensemble LDA
+
+ :ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`
+
+.. raw:: html
+
+
+
+
+.. toctree::
+ :hidden:
+
+ /auto_examples/tutorials/run_ensemblelda
+
+.. raw:: html
+
+
.. only:: html
@@ -288,7 +309,7 @@ These **goal-oriented guides** demonstrate how to **solve a specific problem** u
.. raw:: html
-
+
.. only:: html
@@ -309,7 +330,7 @@ These **goal-oriented guides** demonstrate how to **solve a specific problem** u
.. raw:: html
-
+
.. only:: html
@@ -426,13 +447,13 @@ Blog posts, tutorial videos, hackathons and other useful Gensim resources, from
.. container:: sphx-glr-download sphx-glr-download-python
- :download:`Download all examples in Python source code: auto_examples_python.zip /Volumes/work/workspace/gensim/trunk/docs/src/auto_examples/auto_examples_python.zip>`
+ :download:`Download all examples in Python source code: auto_examples_python.zip `
.. container:: sphx-glr-download sphx-glr-download-jupyter
- :download:`Download all examples in Jupyter notebooks: auto_examples_jupyter.zip /Volumes/work/workspace/gensim/trunk/docs/src/auto_examples/auto_examples_jupyter.zip>`
+ :download:`Download all examples in Jupyter notebooks: auto_examples_jupyter.zip `
.. only:: html
diff --git a/docs/src/auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png b/docs/src/auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
new file mode 100644
index 0000000000..233f8e605e
Binary files /dev/null and b/docs/src/auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png differ
diff --git a/docs/src/auto_examples/tutorials/run_ensemblelda.ipynb b/docs/src/auto_examples/tutorials/run_ensemblelda.ipynb
new file mode 100644
index 0000000000..7cefa761aa
--- /dev/null
+++ b/docs/src/auto_examples/tutorials/run_ensemblelda.ipynb
@@ -0,0 +1,205 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\nEnsemble LDA\n============\n\nIntroduces Gensim's EnsembleLda model\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import logging\nlogging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This tutorial will explain how to use the EnsembleLDA model class.\n\nEnsembleLda is a method of finding and generating stable topics from the results of multiple topic models,\nit can be used to remove topics from your results that are noise and are not reproducible.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Corpus\n------\nWe will use the gensim downloader api to get a small corpus for training our ensemble.\n\nThe preprocessing is similar to `sphx_glr_auto_examples_tutorials_run_word2vec.py`,\nso it won't be explained again in detail.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import gensim.downloader as api\nfrom gensim.corpora import Dictionary\nfrom nltk.stem.wordnet import WordNetLemmatizer\n\nlemmatizer = WordNetLemmatizer()\ndocs = api.load('text8')\n\ndictionary = Dictionary()\nfor doc in docs:\n dictionary.add_documents([[lemmatizer.lemmatize(token) for token in doc]])\ndictionary.filter_extremes(no_below=20, no_above=0.5)\n\ncorpus = [dictionary.doc2bow(doc) for doc in docs]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Training\n--------\n\nTraining the ensemble works very similar to training a single model,\n\nYou can use any model that is based on LdaModel, such as LdaMulticore, to train the Ensemble.\nIn experiments, LdaMulticore showed better results.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "from gensim.models import LdaModel\ntopic_model_class = LdaModel"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Any arbitrary number of models can be used, but it should be a multiple of your workers so that the\nload can be distributed properly. In this example, 4 processes will train 8 models each.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "ensemble_workers = 4\nnum_models = 8"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "After training all the models, some distance computations are required which can take quite some\ntime as well. You can speed this up by using workers for that as well.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "distance_workers = 4"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "All other parameters that are unknown to EnsembleLda are forwarded to each LDA Model, such as\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "num_topics = 20\npasses = 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now start the training\n\nSince 20 topics were trained on each of the 8 models, we expect there to be 160 different topics.\nThe number of stable topics which are clustered from all those topics is smaller.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "from gensim.models import EnsembleLda\nensemble = EnsembleLda(\n corpus=corpus,\n id2word=dictionary,\n num_topics=num_topics,\n passes=passes,\n num_models=num_models,\n topic_model_class=LdaModel,\n ensemble_workers=ensemble_workers,\n distance_workers=distance_workers\n)\n\nprint(len(ensemble.ttda))\nprint(len(ensemble.get_topics()))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Tuning\n------\n\nDifferent from LdaModel, the number of resulting topics varies greatly depending on the clustering parameters.\n\nYou can provide those in the ``recluster()`` function or the ``EnsembleLda`` constructor.\n\nPlay around until you get as many topics as you desire, which however may reduce their quality.\nIf your ensemble doesn't have enough topics to begin with, you should make sure to make it large enough.\n\nHaving an epsilon that is smaller than the smallest distance doesn't make sense.\nMake sure to chose one that is within the range of values in ``asymmetric_distance_matrix``.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\nshape = ensemble.asymmetric_distance_matrix.shape\nwithout_diagonal = ensemble.asymmetric_distance_matrix[~np.eye(shape[0], dtype=bool)].reshape(shape[0], -1)\nprint(without_diagonal.min(), without_diagonal.mean(), without_diagonal.max())\n\nensemble.recluster(eps=0.09, min_samples=2, min_cores=2)\n\nprint(len(ensemble.get_topics()))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Increasing the Size\n-------------------\n\nIf you have some models lying around that were trained on a corpus based on the same dictionary,\nthey are compatible and you can add them to the ensemble.\n\nBy setting num_models of the EnsembleLda constructor to 0 you can also create an ensemble that is\nentirely made out of your existing topic models with the following method.\n\nAfterwards the number and quality of stable topics might be different depending on your added topics and parameters.\n\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "from gensim.models import LdaMulticore\n\nmodel1 = LdaMulticore(\n corpus=corpus,\n id2word=dictionary,\n num_topics=9,\n passes=4,\n)\n\nmodel2 = LdaModel(\n corpus=corpus,\n id2word=dictionary,\n num_topics=11,\n passes=2,\n)\n\n# add_model supports various types of input, check out its docstring\nensemble.add_model(model1)\nensemble.add_model(model2)\n\nensemble.recluster()\n\nprint(len(ensemble.ttda))\nprint(len(ensemble.get_topics()))"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/docs/src/auto_examples/tutorials/run_ensemblelda.py b/docs/src/auto_examples/tutorials/run_ensemblelda.py
new file mode 100644
index 0000000000..aa87d0ecd3
--- /dev/null
+++ b/docs/src/auto_examples/tutorials/run_ensemblelda.py
@@ -0,0 +1,158 @@
+r"""
+Ensemble LDA
+============
+
+Introduces Gensim's EnsembleLda model
+
+"""
+
+import logging
+logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
+
+###############################################################################
+# This tutorial will explain how to use the EnsembleLDA model class.
+#
+# EnsembleLda is a method of finding and generating stable topics from the results of multiple topic models.
+# It can be used to remove topics from your results that are noise and are not reproducible.
+#
+
+###############################################################################
+# Corpus
+# ------
+# We will use the gensim downloader API to get a small corpus for training our ensemble.
+#
+# The preprocessing is similar to :ref:`sphx_glr_auto_examples_tutorials_run_word2vec.py`,
+# so it won't be explained again in detail.
+#
+
+import gensim.downloader as api
+from gensim.corpora import Dictionary
+from nltk.stem.wordnet import WordNetLemmatizer
+
+lemmatizer = WordNetLemmatizer()
+docs = api.load('text8')
+
+dictionary = Dictionary()
+for doc in docs:
+ dictionary.add_documents([[lemmatizer.lemmatize(token) for token in doc]])
+dictionary.filter_extremes(no_below=20, no_above=0.5)
+
+corpus = [dictionary.doc2bow(doc) for doc in docs]
+
+###############################################################################
+# Training
+# --------
+#
+# Training the ensemble works very similarly to training a single model.
+#
+# You can use any model that is based on LdaModel, such as LdaMulticore, to train the ensemble.
+# In experiments, LdaMulticore showed better results.
+#
+
+from gensim.models import LdaModel
+topic_model_class = LdaModel
+
+###############################################################################
+# Any number of models can be used, but it should be a multiple of your workers so that the
+# load can be distributed evenly. In this example, 4 worker processes will train the 8 models, two per process.
+#
+
+ensemble_workers = 4
+num_models = 8
+
+###############################################################################
+# After training all the models, some distance computations are required, which can also take quite some
+# time. You can speed these up with worker processes as well.
+#
+
+distance_workers = 4
+
+###############################################################################
+# All other parameters that are unknown to EnsembleLda are forwarded to each underlying
+# LDA model, such as ``num_topics`` and ``passes`` below.
+#
+num_topics = 20
+passes = 2
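+
+# Illustrative assumption, not from the original tutorial: other LdaModel keyword
+# arguments, e.g. ``iterations`` or ``alpha``, would be forwarded to the underlying
+# models in the same way as ``num_topics`` and ``passes``.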
+
+###############################################################################
+# Now start the training.
+#
+# Since 20 topics were trained on each of the 8 models, we expect there to be 160 different topics in total.
+# The number of stable topics, which are clustered from all those topics, is smaller.
+#
+
+from gensim.models import EnsembleLda
+ensemble = EnsembleLda(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=num_topics,
+ passes=passes,
+ num_models=num_models,
+ topic_model_class=LdaModel,
+ ensemble_workers=ensemble_workers,
+ distance_workers=distance_workers
+)
+
+print(len(ensemble.ttda))
+print(len(ensemble.get_topics()))
+
+###############################################################################
+# Tuning
+# ------
+#
+# Different from LdaModel, the number of resulting topics varies greatly depending on the clustering parameters.
+#
+# You can provide those in the ``recluster()`` function or the ``EnsembleLda`` constructor.
+#
+# Play around until you get as many topics as you desire, though forcing a particular number may reduce their quality.
+# If your ensemble doesn't have enough topics to begin with, make it larger.
+#
+# An epsilon that is smaller than the smallest distance doesn't make sense.
+# Make sure to choose one that is within the range of values in ``asymmetric_distance_matrix``.
+#
+
+import numpy as np
+shape = ensemble.asymmetric_distance_matrix.shape
+without_diagonal = ensemble.asymmetric_distance_matrix[~np.eye(shape[0], dtype=bool)].reshape(shape[0], -1)
+print(without_diagonal.min(), without_diagonal.mean(), without_diagonal.max())
+
+ensemble.recluster(eps=0.09, min_samples=2, min_cores=2)
+
+print(len(ensemble.get_topics()))
+
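+###############################################################################
+# As an extra, illustrative sketch (not part of the original tutorial): you can
+# sweep ``eps`` over a few values taken from the distance range printed above to
+# see how the number of stable topics reacts. The values below are assumptions
+# chosen only for demonstration.
+#
+
+for eps in (0.05, 0.09, 0.15):
+    ensemble.recluster(eps=eps, min_samples=2, min_cores=2)
+    print(eps, len(ensemble.get_topics()))
+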
+###############################################################################
+# Increasing the Size
+# -------------------
+#
+# If you have some models lying around that were trained on a corpus based on the same dictionary,
+# they are compatible and you can add them to the ensemble.
+#
+# By setting ``num_models`` in the EnsembleLda constructor to 0 you can also create an ensemble that is
+# made entirely of your existing topic models, using the method shown below.
+#
+# Afterwards, the number and quality of stable topics might differ depending on the added topics and parameters.
+#
+
+from gensim.models import LdaMulticore
+
+model1 = LdaMulticore(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=9,
+ passes=4,
+)
+
+model2 = LdaModel(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=11,
+ passes=2,
+)
+
+# add_model supports various types of input, check out its docstring
+ensemble.add_model(model1)
+ensemble.add_model(model2)
+
+ensemble.recluster()
+
+print(len(ensemble.ttda))
+print(len(ensemble.get_topics()))
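+
+###############################################################################
+# One last optional sketch, not part of the original tutorial: assuming
+# EnsembleLda follows gensim's usual save/load interface, the trained ensemble
+# can be persisted and reloaded later. The file name is a hypothetical example.
+#
+
+ensemble.save('ensemble.model')
+ensemble = EnsembleLda.load('ensemble.model')
+print(len(ensemble.get_topics()))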
diff --git a/docs/src/auto_examples/tutorials/run_ensemblelda.py.md5 b/docs/src/auto_examples/tutorials/run_ensemblelda.py.md5
new file mode 100644
index 0000000000..f09f123fba
--- /dev/null
+++ b/docs/src/auto_examples/tutorials/run_ensemblelda.py.md5
@@ -0,0 +1 @@
+be0c32b18644ebb1a7826764b37ebc01
\ No newline at end of file
diff --git a/docs/src/auto_examples/tutorials/run_ensemblelda.rst b/docs/src/auto_examples/tutorials/run_ensemblelda.rst
new file mode 100644
index 0000000000..f554213238
--- /dev/null
+++ b/docs/src/auto_examples/tutorials/run_ensemblelda.rst
@@ -0,0 +1,3832 @@
+.. only:: html
+
+ .. note::
+ :class: sphx-glr-download-link-note
+
+ Click :ref:`here ` to download the full example code
+ .. rst-class:: sphx-glr-example-title
+
+ .. _sphx_glr_auto_examples_tutorials_run_ensemblelda.py:
+
+
+Ensemble LDA
+============
+
+Introduces Gensim's EnsembleLda model
+
+
+
+.. code-block:: default
+
+
+ import logging
+ logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
+
+
+
+
+
+
+
+
+This tutorial will explain how to use the EnsembleLDA model class.
+
+EnsembleLda is a method of finding and generating stable topics from the results of multiple topic models.
+It can be used to remove topics from your results that are noise and are not reproducible.
+
+
+Corpus
+------
+We will use the gensim downloader API to get a small corpus for training our ensemble.
+
+The preprocessing is similar to :ref:`sphx_glr_auto_examples_tutorials_run_word2vec.py`,
+so it won't be explained again in detail.
+
+
+
+.. code-block:: default
+
+
+ import gensim.downloader as api
+ from gensim.corpora import Dictionary
+ from nltk.stem.wordnet import WordNetLemmatizer
+
+ lemmatizer = WordNetLemmatizer()
+ docs = api.load('text8')
+
+ dictionary = Dictionary()
+ for doc in docs:
+ dictionary.add_documents([[lemmatizer.lemmatize(token) for token in doc]])
+ dictionary.filter_extremes(no_below=20, no_above=0.5)
+
+ corpus = [dictionary.doc2bow(doc) for doc in docs]
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+ /Volumes/work/workspace/gensim/trunk/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
+ warnings.warn(msg)
+ 2021-05-05 22:35:33,169 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
+ 2021-05-05 22:35:33,178 : INFO : built Dictionary(2312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1 documents (total 10000 corpus positions)
+ 2021-05-05 22:35:33,220 : INFO : adding document #0 to Dictionary(2312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,228 : INFO : built Dictionary(3906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 2 documents (total 20000 corpus positions)
+ 2021-05-05 22:35:33,270 : INFO : adding document #0 to Dictionary(3906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,278 : INFO : built Dictionary(5147 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 3 documents (total 30000 corpus positions)
+ 2021-05-05 22:35:33,320 : INFO : adding document #0 to Dictionary(5147 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,328 : INFO : built Dictionary(6182 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 4 documents (total 40000 corpus positions)
+ 2021-05-05 22:35:33,371 : INFO : adding document #0 to Dictionary(6182 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,378 : INFO : built Dictionary(7053 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 5 documents (total 50000 corpus positions)
+ 2021-05-05 22:35:33,420 : INFO : adding document #0 to Dictionary(7053 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,427 : INFO : built Dictionary(7993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 6 documents (total 60000 corpus positions)
+ 2021-05-05 22:35:33,470 : INFO : adding document #0 to Dictionary(7993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,476 : INFO : built Dictionary(8587 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 7 documents (total 70000 corpus positions)
+ 2021-05-05 22:35:33,519 : INFO : adding document #0 to Dictionary(8587 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,527 : INFO : built Dictionary(9306 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 8 documents (total 80000 corpus positions)
+ 2021-05-05 22:35:33,569 : INFO : adding document #0 to Dictionary(9306 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,576 : INFO : built Dictionary(10072 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 9 documents (total 90000 corpus positions)
+ 2021-05-05 22:35:33,619 : INFO : adding document #0 to Dictionary(10072 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,626 : INFO : built Dictionary(10770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 10 documents (total 100000 corpus positions)
+ 2021-05-05 22:35:33,669 : INFO : adding document #0 to Dictionary(10770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,677 : INFO : built Dictionary(11396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 11 documents (total 110000 corpus positions)
+ 2021-05-05 22:35:33,719 : INFO : adding document #0 to Dictionary(11396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,727 : INFO : built Dictionary(12149 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 12 documents (total 120000 corpus positions)
+ 2021-05-05 22:35:33,773 : INFO : adding document #0 to Dictionary(12149 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,781 : INFO : built Dictionary(12766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 13 documents (total 130000 corpus positions)
+ 2021-05-05 22:35:33,824 : INFO : adding document #0 to Dictionary(12766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,832 : INFO : built Dictionary(13310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 14 documents (total 140000 corpus positions)
+ 2021-05-05 22:35:33,875 : INFO : adding document #0 to Dictionary(13310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,883 : INFO : built Dictionary(13921 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 15 documents (total 150000 corpus positions)
+ 2021-05-05 22:35:33,926 : INFO : adding document #0 to Dictionary(13921 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,932 : INFO : built Dictionary(14485 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 16 documents (total 160000 corpus positions)
+ 2021-05-05 22:35:33,975 : INFO : adding document #0 to Dictionary(14485 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:33,983 : INFO : built Dictionary(15040 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 17 documents (total 170000 corpus positions)
+ 2021-05-05 22:35:34,025 : INFO : adding document #0 to Dictionary(15040 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,032 : INFO : built Dictionary(15482 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 18 documents (total 180000 corpus positions)
+ 2021-05-05 22:35:34,080 : INFO : adding document #0 to Dictionary(15482 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,088 : INFO : built Dictionary(16019 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 19 documents (total 190000 corpus positions)
+ 2021-05-05 22:35:34,131 : INFO : adding document #0 to Dictionary(16019 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,140 : INFO : built Dictionary(16997 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 20 documents (total 200000 corpus positions)
+ 2021-05-05 22:35:34,186 : INFO : adding document #0 to Dictionary(16997 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,194 : INFO : built Dictionary(17548 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 21 documents (total 210000 corpus positions)
+ 2021-05-05 22:35:34,237 : INFO : adding document #0 to Dictionary(17548 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,244 : INFO : built Dictionary(18074 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 22 documents (total 220000 corpus positions)
+ 2021-05-05 22:35:34,287 : INFO : adding document #0 to Dictionary(18074 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,294 : INFO : built Dictionary(18485 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 23 documents (total 230000 corpus positions)
+ 2021-05-05 22:35:34,337 : INFO : adding document #0 to Dictionary(18485 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,346 : INFO : built Dictionary(19411 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 24 documents (total 240000 corpus positions)
+ 2021-05-05 22:35:34,391 : INFO : adding document #0 to Dictionary(19411 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,398 : INFO : built Dictionary(19909 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 25 documents (total 250000 corpus positions)
+ 2021-05-05 22:35:34,441 : INFO : adding document #0 to Dictionary(19909 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,448 : INFO : built Dictionary(20332 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 26 documents (total 260000 corpus positions)
+ 2021-05-05 22:35:34,491 : INFO : adding document #0 to Dictionary(20332 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,499 : INFO : built Dictionary(20766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 27 documents (total 270000 corpus positions)
+ 2021-05-05 22:35:34,541 : INFO : adding document #0 to Dictionary(20766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,548 : INFO : built Dictionary(21174 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 28 documents (total 280000 corpus positions)
+ 2021-05-05 22:35:34,591 : INFO : adding document #0 to Dictionary(21174 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,598 : INFO : built Dictionary(21602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 29 documents (total 290000 corpus positions)
+ 2021-05-05 22:35:34,641 : INFO : adding document #0 to Dictionary(21602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,648 : INFO : built Dictionary(21878 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 30 documents (total 300000 corpus positions)
+ 2021-05-05 22:35:34,695 : INFO : adding document #0 to Dictionary(21878 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,702 : INFO : built Dictionary(22126 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 31 documents (total 310000 corpus positions)
+ 2021-05-05 22:35:34,746 : INFO : adding document #0 to Dictionary(22126 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,754 : INFO : built Dictionary(22522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 32 documents (total 320000 corpus positions)
+ 2021-05-05 22:35:34,798 : INFO : adding document #0 to Dictionary(22522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,806 : INFO : built Dictionary(23022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 33 documents (total 330000 corpus positions)
+ 2021-05-05 22:35:34,850 : INFO : adding document #0 to Dictionary(23022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,859 : INFO : built Dictionary(23512 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 34 documents (total 340000 corpus positions)
+ 2021-05-05 22:35:34,902 : INFO : adding document #0 to Dictionary(23512 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,911 : INFO : built Dictionary(24078 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 35 documents (total 350000 corpus positions)
+ 2021-05-05 22:35:34,954 : INFO : adding document #0 to Dictionary(24078 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:34,963 : INFO : built Dictionary(24518 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 36 documents (total 360000 corpus positions)
+ 2021-05-05 22:35:35,008 : INFO : adding document #0 to Dictionary(24518 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,018 : INFO : built Dictionary(25027 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 37 documents (total 370000 corpus positions)
+ 2021-05-05 22:35:35,067 : INFO : adding document #0 to Dictionary(25027 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,074 : INFO : built Dictionary(25452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 38 documents (total 380000 corpus positions)
+ 2021-05-05 22:35:35,117 : INFO : adding document #0 to Dictionary(25452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,125 : INFO : built Dictionary(26041 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 39 documents (total 390000 corpus positions)
+ 2021-05-05 22:35:35,168 : INFO : adding document #0 to Dictionary(26041 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,176 : INFO : built Dictionary(26389 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 40 documents (total 400000 corpus positions)
+ 2021-05-05 22:35:35,220 : INFO : adding document #0 to Dictionary(26389 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,228 : INFO : built Dictionary(26621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 41 documents (total 410000 corpus positions)
+ 2021-05-05 22:35:35,272 : INFO : adding document #0 to Dictionary(26621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,280 : INFO : built Dictionary(27013 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 42 documents (total 420000 corpus positions)
+ 2021-05-05 22:35:35,326 : INFO : adding document #0 to Dictionary(27013 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,335 : INFO : built Dictionary(27452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 43 documents (total 430000 corpus positions)
+ 2021-05-05 22:35:35,378 : INFO : adding document #0 to Dictionary(27452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,386 : INFO : built Dictionary(27868 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 44 documents (total 440000 corpus positions)
+ 2021-05-05 22:35:35,429 : INFO : adding document #0 to Dictionary(27868 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,437 : INFO : built Dictionary(28213 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 45 documents (total 450000 corpus positions)
+ 2021-05-05 22:35:35,480 : INFO : adding document #0 to Dictionary(28213 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,488 : INFO : built Dictionary(28596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 46 documents (total 460000 corpus positions)
+ 2021-05-05 22:35:35,535 : INFO : adding document #0 to Dictionary(28596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,541 : INFO : built Dictionary(28842 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 47 documents (total 470000 corpus positions)
+ 2021-05-05 22:35:35,584 : INFO : adding document #0 to Dictionary(28842 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,591 : INFO : built Dictionary(29183 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 48 documents (total 480000 corpus positions)
+ 2021-05-05 22:35:35,635 : INFO : adding document #0 to Dictionary(29183 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,643 : INFO : built Dictionary(29569 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 49 documents (total 490000 corpus positions)
+ 2021-05-05 22:35:35,689 : INFO : adding document #0 to Dictionary(29569 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,698 : INFO : built Dictionary(29905 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 50 documents (total 500000 corpus positions)
+ 2021-05-05 22:35:35,742 : INFO : adding document #0 to Dictionary(29905 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,754 : INFO : built Dictionary(30435 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 51 documents (total 510000 corpus positions)
+ 2021-05-05 22:35:35,799 : INFO : adding document #0 to Dictionary(30435 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,808 : INFO : built Dictionary(30852 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 52 documents (total 520000 corpus positions)
+ 2021-05-05 22:35:35,852 : INFO : adding document #0 to Dictionary(30852 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,860 : INFO : built Dictionary(31140 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 53 documents (total 530000 corpus positions)
+ 2021-05-05 22:35:35,904 : INFO : adding document #0 to Dictionary(31140 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,913 : INFO : built Dictionary(31611 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 54 documents (total 540000 corpus positions)
+ 2021-05-05 22:35:35,957 : INFO : adding document #0 to Dictionary(31611 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:35,965 : INFO : built Dictionary(32277 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 55 documents (total 550000 corpus positions)
+ 2021-05-05 22:35:36,009 : INFO : adding document #0 to Dictionary(32277 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,018 : INFO : built Dictionary(32761 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 56 documents (total 560000 corpus positions)
+ 2021-05-05 22:35:36,061 : INFO : adding document #0 to Dictionary(32761 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,069 : INFO : built Dictionary(33053 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 57 documents (total 570000 corpus positions)
+ 2021-05-05 22:35:36,113 : INFO : adding document #0 to Dictionary(33053 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,122 : INFO : built Dictionary(33393 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 58 documents (total 580000 corpus positions)
+ 2021-05-05 22:35:36,165 : INFO : adding document #0 to Dictionary(33393 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,173 : INFO : built Dictionary(33804 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 59 documents (total 590000 corpus positions)
+ 2021-05-05 22:35:36,216 : INFO : adding document #0 to Dictionary(33804 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,225 : INFO : built Dictionary(34233 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 60 documents (total 600000 corpus positions)
+ 2021-05-05 22:35:36,272 : INFO : adding document #0 to Dictionary(34233 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,281 : INFO : built Dictionary(34513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 61 documents (total 610000 corpus positions)
+ 2021-05-05 22:35:36,329 : INFO : adding document #0 to Dictionary(34513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,336 : INFO : built Dictionary(34814 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 62 documents (total 620000 corpus positions)
+ 2021-05-05 22:35:36,379 : INFO : adding document #0 to Dictionary(34814 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,386 : INFO : built Dictionary(35107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 63 documents (total 630000 corpus positions)
+ 2021-05-05 22:35:36,429 : INFO : adding document #0 to Dictionary(35107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,437 : INFO : built Dictionary(35446 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 64 documents (total 640000 corpus positions)
+ 2021-05-05 22:35:36,480 : INFO : adding document #0 to Dictionary(35446 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,487 : INFO : built Dictionary(35713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 65 documents (total 650000 corpus positions)
+ 2021-05-05 22:35:36,530 : INFO : adding document #0 to Dictionary(35713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,538 : INFO : built Dictionary(36124 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 66 documents (total 660000 corpus positions)
+ 2021-05-05 22:35:36,582 : INFO : adding document #0 to Dictionary(36124 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,590 : INFO : built Dictionary(36513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 67 documents (total 670000 corpus positions)
+ 2021-05-05 22:35:36,633 : INFO : adding document #0 to Dictionary(36513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,643 : INFO : built Dictionary(36825 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 68 documents (total 680000 corpus positions)
+ 2021-05-05 22:35:36,686 : INFO : adding document #0 to Dictionary(36825 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,694 : INFO : built Dictionary(37084 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 69 documents (total 690000 corpus positions)
+ 2021-05-05 22:35:36,738 : INFO : adding document #0 to Dictionary(37084 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,746 : INFO : built Dictionary(37333 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 70 documents (total 700000 corpus positions)
+ 2021-05-05 22:35:36,789 : INFO : adding document #0 to Dictionary(37333 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,797 : INFO : built Dictionary(37634 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 71 documents (total 710000 corpus positions)
+ 2021-05-05 22:35:36,840 : INFO : adding document #0 to Dictionary(37634 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,847 : INFO : built Dictionary(37919 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 72 documents (total 720000 corpus positions)
+ 2021-05-05 22:35:36,892 : INFO : adding document #0 to Dictionary(37919 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,900 : INFO : built Dictionary(38309 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 73 documents (total 730000 corpus positions)
+ 2021-05-05 22:35:36,944 : INFO : adding document #0 to Dictionary(38309 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:36,952 : INFO : built Dictionary(38690 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 74 documents (total 740000 corpus positions)
+ 2021-05-05 22:35:36,995 : INFO : adding document #0 to Dictionary(38690 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,003 : INFO : built Dictionary(39042 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 75 documents (total 750000 corpus positions)
+ 2021-05-05 22:35:37,050 : INFO : adding document #0 to Dictionary(39042 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,058 : INFO : built Dictionary(39344 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 76 documents (total 760000 corpus positions)
+ 2021-05-05 22:35:37,101 : INFO : adding document #0 to Dictionary(39344 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,108 : INFO : built Dictionary(39690 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 77 documents (total 770000 corpus positions)
+ 2021-05-05 22:35:37,153 : INFO : adding document #0 to Dictionary(39690 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,162 : INFO : built Dictionary(40100 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 78 documents (total 780000 corpus positions)
+ 2021-05-05 22:35:37,205 : INFO : adding document #0 to Dictionary(40100 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,212 : INFO : built Dictionary(40362 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 79 documents (total 790000 corpus positions)
+ 2021-05-05 22:35:37,255 : INFO : adding document #0 to Dictionary(40362 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,263 : INFO : built Dictionary(40614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 80 documents (total 800000 corpus positions)
+ 2021-05-05 22:35:37,308 : INFO : adding document #0 to Dictionary(40614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,317 : INFO : built Dictionary(41071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 81 documents (total 810000 corpus positions)
+ 2021-05-05 22:35:37,364 : INFO : adding document #0 to Dictionary(41071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,372 : INFO : built Dictionary(41378 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 82 documents (total 820000 corpus positions)
+ 2021-05-05 22:35:37,416 : INFO : adding document #0 to Dictionary(41378 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,425 : INFO : built Dictionary(41734 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 83 documents (total 830000 corpus positions)
+ 2021-05-05 22:35:37,469 : INFO : adding document #0 to Dictionary(41734 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,479 : INFO : built Dictionary(42113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 84 documents (total 840000 corpus positions)
+ 2021-05-05 22:35:37,524 : INFO : adding document #0 to Dictionary(42113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,532 : INFO : built Dictionary(42410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 85 documents (total 850000 corpus positions)
+ 2021-05-05 22:35:37,575 : INFO : adding document #0 to Dictionary(42410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,583 : INFO : built Dictionary(42735 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 86 documents (total 860000 corpus positions)
+ 2021-05-05 22:35:37,626 : INFO : adding document #0 to Dictionary(42735 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:37,634 : INFO : built Dictionary(43066 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 87 documents (total 870000 corpus positions)
+ [... repeated "adding document #0" / "built Dictionary" log pairs elided: the Dictionary grows incrementally from 87 to 289 documents (870000 to 2890000 corpus positions, 43066 to 86517 unique tokens) ...]
+ 2021-05-05 22:35:49,919 : INFO : built Dictionary(86517 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 289 documents (total 2890000 corpus positions)
+ 2021-05-05 22:35:49,970 : INFO : adding document #0 to Dictionary(86517 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:49,982 : INFO : built Dictionary(86920 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 290 documents (total 2900000 corpus positions)
+ 2021-05-05 22:35:50,033 : INFO : adding document #0 to Dictionary(86920 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,041 : INFO : built Dictionary(87071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 291 documents (total 2910000 corpus positions)
+ 2021-05-05 22:35:50,092 : INFO : adding document #0 to Dictionary(87071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,101 : INFO : built Dictionary(87224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 292 documents (total 2920000 corpus positions)
+ 2021-05-05 22:35:50,157 : INFO : adding document #0 to Dictionary(87224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,167 : INFO : built Dictionary(87289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 293 documents (total 2930000 corpus positions)
+ 2021-05-05 22:35:50,217 : INFO : adding document #0 to Dictionary(87289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,226 : INFO : built Dictionary(87368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 294 documents (total 2940000 corpus positions)
+ 2021-05-05 22:35:50,278 : INFO : adding document #0 to Dictionary(87368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,300 : INFO : built Dictionary(87452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 295 documents (total 2950000 corpus positions)
+ 2021-05-05 22:35:50,357 : INFO : adding document #0 to Dictionary(87452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,369 : INFO : built Dictionary(87614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 296 documents (total 2960000 corpus positions)
+ 2021-05-05 22:35:50,426 : INFO : adding document #0 to Dictionary(87614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,436 : INFO : built Dictionary(87766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 297 documents (total 2970000 corpus positions)
+ 2021-05-05 22:35:50,487 : INFO : adding document #0 to Dictionary(87766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,495 : INFO : built Dictionary(87901 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 298 documents (total 2980000 corpus positions)
+ 2021-05-05 22:35:50,541 : INFO : adding document #0 to Dictionary(87901 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,551 : INFO : built Dictionary(88063 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 299 documents (total 2990000 corpus positions)
+ 2021-05-05 22:35:50,601 : INFO : adding document #0 to Dictionary(88063 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,610 : INFO : built Dictionary(88333 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 300 documents (total 3000000 corpus positions)
+ 2021-05-05 22:35:50,655 : INFO : adding document #0 to Dictionary(88333 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,663 : INFO : built Dictionary(88408 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 301 documents (total 3010000 corpus positions)
+ 2021-05-05 22:35:50,710 : INFO : adding document #0 to Dictionary(88408 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,721 : INFO : built Dictionary(88589 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 302 documents (total 3020000 corpus positions)
+ 2021-05-05 22:35:50,768 : INFO : adding document #0 to Dictionary(88589 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,778 : INFO : built Dictionary(88800 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 303 documents (total 3030000 corpus positions)
+ 2021-05-05 22:35:50,827 : INFO : adding document #0 to Dictionary(88800 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,836 : INFO : built Dictionary(88944 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 304 documents (total 3040000 corpus positions)
+ 2021-05-05 22:35:50,882 : INFO : adding document #0 to Dictionary(88944 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,891 : INFO : built Dictionary(89095 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 305 documents (total 3050000 corpus positions)
+ 2021-05-05 22:35:50,937 : INFO : adding document #0 to Dictionary(89095 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,946 : INFO : built Dictionary(89272 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 306 documents (total 3060000 corpus positions)
+ 2021-05-05 22:35:50,991 : INFO : adding document #0 to Dictionary(89272 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:50,999 : INFO : built Dictionary(89396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 307 documents (total 3070000 corpus positions)
+ 2021-05-05 22:35:51,046 : INFO : adding document #0 to Dictionary(89396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,054 : INFO : built Dictionary(89505 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 308 documents (total 3080000 corpus positions)
+ 2021-05-05 22:35:51,104 : INFO : adding document #0 to Dictionary(89505 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,113 : INFO : built Dictionary(89656 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 309 documents (total 3090000 corpus positions)
+ 2021-05-05 22:35:51,162 : INFO : adding document #0 to Dictionary(89656 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,170 : INFO : built Dictionary(89740 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 310 documents (total 3100000 corpus positions)
+ 2021-05-05 22:35:51,217 : INFO : adding document #0 to Dictionary(89740 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,225 : INFO : built Dictionary(89822 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 311 documents (total 3110000 corpus positions)
+ 2021-05-05 22:35:51,269 : INFO : adding document #0 to Dictionary(89822 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,277 : INFO : built Dictionary(89940 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 312 documents (total 3120000 corpus positions)
+ 2021-05-05 22:35:51,321 : INFO : adding document #0 to Dictionary(89940 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,329 : INFO : built Dictionary(90078 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 313 documents (total 3130000 corpus positions)
+ 2021-05-05 22:35:51,375 : INFO : adding document #0 to Dictionary(90078 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,384 : INFO : built Dictionary(90168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 314 documents (total 3140000 corpus positions)
+ 2021-05-05 22:35:51,438 : INFO : adding document #0 to Dictionary(90168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,446 : INFO : built Dictionary(90275 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 315 documents (total 3150000 corpus positions)
+ 2021-05-05 22:35:51,490 : INFO : adding document #0 to Dictionary(90275 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,499 : INFO : built Dictionary(90457 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 316 documents (total 3160000 corpus positions)
+ 2021-05-05 22:35:51,552 : INFO : adding document #0 to Dictionary(90457 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,563 : INFO : built Dictionary(90610 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 317 documents (total 3170000 corpus positions)
+ 2021-05-05 22:35:51,609 : INFO : adding document #0 to Dictionary(90610 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,617 : INFO : built Dictionary(90730 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 318 documents (total 3180000 corpus positions)
+ 2021-05-05 22:35:51,668 : INFO : adding document #0 to Dictionary(90730 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,676 : INFO : built Dictionary(90831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 319 documents (total 3190000 corpus positions)
+ 2021-05-05 22:35:51,725 : INFO : adding document #0 to Dictionary(90831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,734 : INFO : built Dictionary(90966 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 320 documents (total 3200000 corpus positions)
+ 2021-05-05 22:35:51,780 : INFO : adding document #0 to Dictionary(90966 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,788 : INFO : built Dictionary(91088 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 321 documents (total 3210000 corpus positions)
+ 2021-05-05 22:35:51,838 : INFO : adding document #0 to Dictionary(91088 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,846 : INFO : built Dictionary(91214 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 322 documents (total 3220000 corpus positions)
+ 2021-05-05 22:35:51,891 : INFO : adding document #0 to Dictionary(91214 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,899 : INFO : built Dictionary(91323 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 323 documents (total 3230000 corpus positions)
+ 2021-05-05 22:35:51,944 : INFO : adding document #0 to Dictionary(91323 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:51,954 : INFO : built Dictionary(91523 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 324 documents (total 3240000 corpus positions)
+ 2021-05-05 22:35:51,999 : INFO : adding document #0 to Dictionary(91523 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,008 : INFO : built Dictionary(91717 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 325 documents (total 3250000 corpus positions)
+ 2021-05-05 22:35:52,053 : INFO : adding document #0 to Dictionary(91717 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,062 : INFO : built Dictionary(91857 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 326 documents (total 3260000 corpus positions)
+ 2021-05-05 22:35:52,114 : INFO : adding document #0 to Dictionary(91857 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,125 : INFO : built Dictionary(92032 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 327 documents (total 3270000 corpus positions)
+ 2021-05-05 22:35:52,169 : INFO : adding document #0 to Dictionary(92032 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,178 : INFO : built Dictionary(92163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 328 documents (total 3280000 corpus positions)
+ 2021-05-05 22:35:52,230 : INFO : adding document #0 to Dictionary(92163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,238 : INFO : built Dictionary(92311 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 329 documents (total 3290000 corpus positions)
+ 2021-05-05 22:35:52,284 : INFO : adding document #0 to Dictionary(92311 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,292 : INFO : built Dictionary(92409 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 330 documents (total 3300000 corpus positions)
+ 2021-05-05 22:35:52,337 : INFO : adding document #0 to Dictionary(92409 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,346 : INFO : built Dictionary(92533 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 331 documents (total 3310000 corpus positions)
+ 2021-05-05 22:35:52,393 : INFO : adding document #0 to Dictionary(92533 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,401 : INFO : built Dictionary(92621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 332 documents (total 3320000 corpus positions)
+ 2021-05-05 22:35:52,445 : INFO : adding document #0 to Dictionary(92621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,452 : INFO : built Dictionary(92730 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 333 documents (total 3330000 corpus positions)
+ 2021-05-05 22:35:52,500 : INFO : adding document #0 to Dictionary(92730 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,510 : INFO : built Dictionary(92999 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 334 documents (total 3340000 corpus positions)
+ 2021-05-05 22:35:52,555 : INFO : adding document #0 to Dictionary(92999 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,564 : INFO : built Dictionary(93146 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 335 documents (total 3350000 corpus positions)
+ 2021-05-05 22:35:52,610 : INFO : adding document #0 to Dictionary(93146 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,620 : INFO : built Dictionary(93326 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 336 documents (total 3360000 corpus positions)
+ 2021-05-05 22:35:52,669 : INFO : adding document #0 to Dictionary(93326 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,678 : INFO : built Dictionary(93485 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 337 documents (total 3370000 corpus positions)
+ 2021-05-05 22:35:52,723 : INFO : adding document #0 to Dictionary(93485 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,731 : INFO : built Dictionary(93621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 338 documents (total 3380000 corpus positions)
+ 2021-05-05 22:35:52,778 : INFO : adding document #0 to Dictionary(93621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,786 : INFO : built Dictionary(93823 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 339 documents (total 3390000 corpus positions)
+ 2021-05-05 22:35:52,836 : INFO : adding document #0 to Dictionary(93823 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,846 : INFO : built Dictionary(93970 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 340 documents (total 3400000 corpus positions)
+ 2021-05-05 22:35:52,898 : INFO : adding document #0 to Dictionary(93970 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,909 : INFO : built Dictionary(94159 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 341 documents (total 3410000 corpus positions)
+ 2021-05-05 22:35:52,956 : INFO : adding document #0 to Dictionary(94159 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:52,965 : INFO : built Dictionary(94291 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 342 documents (total 3420000 corpus positions)
+ 2021-05-05 22:35:53,022 : INFO : adding document #0 to Dictionary(94291 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,033 : INFO : built Dictionary(94472 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 343 documents (total 3430000 corpus positions)
+ 2021-05-05 22:35:53,083 : INFO : adding document #0 to Dictionary(94472 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,092 : INFO : built Dictionary(94589 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 344 documents (total 3440000 corpus positions)
+ 2021-05-05 22:35:53,149 : INFO : adding document #0 to Dictionary(94589 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,159 : INFO : built Dictionary(94701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 345 documents (total 3450000 corpus positions)
+ 2021-05-05 22:35:53,221 : INFO : adding document #0 to Dictionary(94701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,232 : INFO : built Dictionary(94809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 346 documents (total 3460000 corpus positions)
+ 2021-05-05 22:35:53,282 : INFO : adding document #0 to Dictionary(94809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,292 : INFO : built Dictionary(94935 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 347 documents (total 3470000 corpus positions)
+ 2021-05-05 22:35:53,338 : INFO : adding document #0 to Dictionary(94935 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,347 : INFO : built Dictionary(95081 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 348 documents (total 3480000 corpus positions)
+ 2021-05-05 22:35:53,394 : INFO : adding document #0 to Dictionary(95081 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,402 : INFO : built Dictionary(95212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 349 documents (total 3490000 corpus positions)
+ 2021-05-05 22:35:53,451 : INFO : adding document #0 to Dictionary(95212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,460 : INFO : built Dictionary(95447 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 350 documents (total 3500000 corpus positions)
+ 2021-05-05 22:35:53,505 : INFO : adding document #0 to Dictionary(95447 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,514 : INFO : built Dictionary(95600 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 351 documents (total 3510000 corpus positions)
+ 2021-05-05 22:35:53,558 : INFO : adding document #0 to Dictionary(95600 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,567 : INFO : built Dictionary(95741 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 352 documents (total 3520000 corpus positions)
+ 2021-05-05 22:35:53,613 : INFO : adding document #0 to Dictionary(95741 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,622 : INFO : built Dictionary(95962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 353 documents (total 3530000 corpus positions)
+ 2021-05-05 22:35:53,670 : INFO : adding document #0 to Dictionary(95962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,679 : INFO : built Dictionary(96154 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 354 documents (total 3540000 corpus positions)
+ 2021-05-05 22:35:53,725 : INFO : adding document #0 to Dictionary(96154 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,734 : INFO : built Dictionary(96193 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 355 documents (total 3550000 corpus positions)
+ 2021-05-05 22:35:53,780 : INFO : adding document #0 to Dictionary(96193 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,790 : INFO : built Dictionary(96315 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 356 documents (total 3560000 corpus positions)
+ 2021-05-05 22:35:53,838 : INFO : adding document #0 to Dictionary(96315 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,849 : INFO : built Dictionary(96498 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 357 documents (total 3570000 corpus positions)
+ 2021-05-05 22:35:53,899 : INFO : adding document #0 to Dictionary(96498 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,908 : INFO : built Dictionary(96628 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 358 documents (total 3580000 corpus positions)
+ 2021-05-05 22:35:53,955 : INFO : adding document #0 to Dictionary(96628 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:53,965 : INFO : built Dictionary(96810 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 359 documents (total 3590000 corpus positions)
+ 2021-05-05 22:35:54,018 : INFO : adding document #0 to Dictionary(96810 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,028 : INFO : built Dictionary(96971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 360 documents (total 3600000 corpus positions)
+ 2021-05-05 22:35:54,074 : INFO : adding document #0 to Dictionary(96971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,083 : INFO : built Dictionary(97100 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 361 documents (total 3610000 corpus positions)
+ 2021-05-05 22:35:54,128 : INFO : adding document #0 to Dictionary(97100 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,137 : INFO : built Dictionary(97235 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 362 documents (total 3620000 corpus positions)
+ 2021-05-05 22:35:54,184 : INFO : adding document #0 to Dictionary(97235 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,194 : INFO : built Dictionary(97327 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 363 documents (total 3630000 corpus positions)
+ 2021-05-05 22:35:54,241 : INFO : adding document #0 to Dictionary(97327 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,249 : INFO : built Dictionary(97474 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 364 documents (total 3640000 corpus positions)
+ 2021-05-05 22:35:54,295 : INFO : adding document #0 to Dictionary(97474 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,304 : INFO : built Dictionary(97562 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 365 documents (total 3650000 corpus positions)
+ 2021-05-05 22:35:54,351 : INFO : adding document #0 to Dictionary(97562 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,359 : INFO : built Dictionary(97701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 366 documents (total 3660000 corpus positions)
+ 2021-05-05 22:35:54,409 : INFO : adding document #0 to Dictionary(97701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,417 : INFO : built Dictionary(97828 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 367 documents (total 3670000 corpus positions)
+ 2021-05-05 22:35:54,462 : INFO : adding document #0 to Dictionary(97828 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,472 : INFO : built Dictionary(97993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 368 documents (total 3680000 corpus positions)
+ 2021-05-05 22:35:54,520 : INFO : adding document #0 to Dictionary(97993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,528 : INFO : built Dictionary(98111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 369 documents (total 3690000 corpus positions)
+ 2021-05-05 22:35:54,572 : INFO : adding document #0 to Dictionary(98111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,585 : INFO : built Dictionary(98325 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 370 documents (total 3700000 corpus positions)
+ 2021-05-05 22:35:54,635 : INFO : adding document #0 to Dictionary(98325 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,645 : INFO : built Dictionary(98510 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 371 documents (total 3710000 corpus positions)
+ 2021-05-05 22:35:54,689 : INFO : adding document #0 to Dictionary(98510 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,697 : INFO : built Dictionary(98608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 372 documents (total 3720000 corpus positions)
+ 2021-05-05 22:35:54,749 : INFO : adding document #0 to Dictionary(98608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,757 : INFO : built Dictionary(98746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 373 documents (total 3730000 corpus positions)
+ 2021-05-05 22:35:54,802 : INFO : adding document #0 to Dictionary(98746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,811 : INFO : built Dictionary(98886 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 374 documents (total 3740000 corpus positions)
+ 2021-05-05 22:35:54,857 : INFO : adding document #0 to Dictionary(98886 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,867 : INFO : built Dictionary(99025 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 375 documents (total 3750000 corpus positions)
+ 2021-05-05 22:35:54,911 : INFO : adding document #0 to Dictionary(99025 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,920 : INFO : built Dictionary(99177 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 376 documents (total 3760000 corpus positions)
+ 2021-05-05 22:35:54,965 : INFO : adding document #0 to Dictionary(99177 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:54,973 : INFO : built Dictionary(99310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 377 documents (total 3770000 corpus positions)
+ 2021-05-05 22:35:55,023 : INFO : adding document #0 to Dictionary(99310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,034 : INFO : built Dictionary(99510 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 378 documents (total 3780000 corpus positions)
+ 2021-05-05 22:35:55,079 : INFO : adding document #0 to Dictionary(99510 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,088 : INFO : built Dictionary(99687 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 379 documents (total 3790000 corpus positions)
+ 2021-05-05 22:35:55,135 : INFO : adding document #0 to Dictionary(99687 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,144 : INFO : built Dictionary(99739 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 380 documents (total 3800000 corpus positions)
+ 2021-05-05 22:35:55,189 : INFO : adding document #0 to Dictionary(99739 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,200 : INFO : built Dictionary(99880 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 381 documents (total 3810000 corpus positions)
+ 2021-05-05 22:35:55,250 : INFO : adding document #0 to Dictionary(99880 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,259 : INFO : built Dictionary(99993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 382 documents (total 3820000 corpus positions)
+ 2021-05-05 22:35:55,307 : INFO : adding document #0 to Dictionary(99993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,315 : INFO : built Dictionary(100051 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 383 documents (total 3830000 corpus positions)
+ 2021-05-05 22:35:55,367 : INFO : adding document #0 to Dictionary(100051 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,376 : INFO : built Dictionary(100093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 384 documents (total 3840000 corpus positions)
+ 2021-05-05 22:35:55,421 : INFO : adding document #0 to Dictionary(100093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,430 : INFO : built Dictionary(100188 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 385 documents (total 3850000 corpus positions)
+ 2021-05-05 22:35:55,477 : INFO : adding document #0 to Dictionary(100188 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,485 : INFO : built Dictionary(100335 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 386 documents (total 3860000 corpus positions)
+ 2021-05-05 22:35:55,535 : INFO : adding document #0 to Dictionary(100335 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,545 : INFO : built Dictionary(100596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 387 documents (total 3870000 corpus positions)
+ 2021-05-05 22:35:55,589 : INFO : adding document #0 to Dictionary(100596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,599 : INFO : built Dictionary(100749 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 388 documents (total 3880000 corpus positions)
+ 2021-05-05 22:35:55,645 : INFO : adding document #0 to Dictionary(100749 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,654 : INFO : built Dictionary(100812 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 389 documents (total 3890000 corpus positions)
+ 2021-05-05 22:35:55,712 : INFO : adding document #0 to Dictionary(100812 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,721 : INFO : built Dictionary(100953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 390 documents (total 3900000 corpus positions)
+ 2021-05-05 22:35:55,769 : INFO : adding document #0 to Dictionary(100953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,777 : INFO : built Dictionary(101035 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 391 documents (total 3910000 corpus positions)
+ 2021-05-05 22:35:55,825 : INFO : adding document #0 to Dictionary(101035 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,833 : INFO : built Dictionary(101185 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 392 documents (total 3920000 corpus positions)
+ 2021-05-05 22:35:55,877 : INFO : adding document #0 to Dictionary(101185 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,885 : INFO : built Dictionary(101261 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 393 documents (total 3930000 corpus positions)
+ 2021-05-05 22:35:55,932 : INFO : adding document #0 to Dictionary(101261 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:55,940 : INFO : built Dictionary(101358 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 394 documents (total 3940000 corpus positions)
+ 2021-05-05 22:35:55,987 : INFO : adding document #0 to Dictionary(101358 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,002 : INFO : built Dictionary(101512 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 395 documents (total 3950000 corpus positions)
+ 2021-05-05 22:35:56,059 : INFO : adding document #0 to Dictionary(101512 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,067 : INFO : built Dictionary(101667 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 396 documents (total 3960000 corpus positions)
+ 2021-05-05 22:35:56,117 : INFO : adding document #0 to Dictionary(101667 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,126 : INFO : built Dictionary(101752 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 397 documents (total 3970000 corpus positions)
+ 2021-05-05 22:35:56,171 : INFO : adding document #0 to Dictionary(101752 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,180 : INFO : built Dictionary(101874 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 398 documents (total 3980000 corpus positions)
+ 2021-05-05 22:35:56,225 : INFO : adding document #0 to Dictionary(101874 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,234 : INFO : built Dictionary(101971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 399 documents (total 3990000 corpus positions)
+ 2021-05-05 22:35:56,278 : INFO : adding document #0 to Dictionary(101971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,286 : INFO : built Dictionary(102121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 400 documents (total 4000000 corpus positions)
+ 2021-05-05 22:35:56,333 : INFO : adding document #0 to Dictionary(102121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,342 : INFO : built Dictionary(102210 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 401 documents (total 4010000 corpus positions)
+ 2021-05-05 22:35:56,387 : INFO : adding document #0 to Dictionary(102210 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,395 : INFO : built Dictionary(102325 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 402 documents (total 4020000 corpus positions)
+ 2021-05-05 22:35:56,450 : INFO : adding document #0 to Dictionary(102325 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,463 : INFO : built Dictionary(102396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 403 documents (total 4030000 corpus positions)
+ 2021-05-05 22:35:56,519 : INFO : adding document #0 to Dictionary(102396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,527 : INFO : built Dictionary(102479 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 404 documents (total 4040000 corpus positions)
+ 2021-05-05 22:35:56,574 : INFO : adding document #0 to Dictionary(102479 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,583 : INFO : built Dictionary(102659 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 405 documents (total 4050000 corpus positions)
+ 2021-05-05 22:35:56,626 : INFO : adding document #0 to Dictionary(102659 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,635 : INFO : built Dictionary(102804 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 406 documents (total 4060000 corpus positions)
+ 2021-05-05 22:35:56,681 : INFO : adding document #0 to Dictionary(102804 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,689 : INFO : built Dictionary(102870 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 407 documents (total 4070000 corpus positions)
+ 2021-05-05 22:35:56,735 : INFO : adding document #0 to Dictionary(102870 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,745 : INFO : built Dictionary(103057 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 408 documents (total 4080000 corpus positions)
+ 2021-05-05 22:35:56,790 : INFO : adding document #0 to Dictionary(103057 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,798 : INFO : built Dictionary(103200 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 409 documents (total 4090000 corpus positions)
+ 2021-05-05 22:35:56,843 : INFO : adding document #0 to Dictionary(103200 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,852 : INFO : built Dictionary(103354 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 410 documents (total 4100000 corpus positions)
+ 2021-05-05 22:35:56,895 : INFO : adding document #0 to Dictionary(103354 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,902 : INFO : built Dictionary(103486 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 411 documents (total 4110000 corpus positions)
+ 2021-05-05 22:35:56,946 : INFO : adding document #0 to Dictionary(103486 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:56,953 : INFO : built Dictionary(103592 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 412 documents (total 4120000 corpus positions)
+ 2021-05-05 22:35:57,001 : INFO : adding document #0 to Dictionary(103592 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,012 : INFO : built Dictionary(103681 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 413 documents (total 4130000 corpus positions)
+ 2021-05-05 22:35:57,058 : INFO : adding document #0 to Dictionary(103681 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,066 : INFO : built Dictionary(103785 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 414 documents (total 4140000 corpus positions)
+ 2021-05-05 22:35:57,112 : INFO : adding document #0 to Dictionary(103785 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,121 : INFO : built Dictionary(103982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 415 documents (total 4150000 corpus positions)
+ 2021-05-05 22:35:57,166 : INFO : adding document #0 to Dictionary(103982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,174 : INFO : built Dictionary(104123 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 416 documents (total 4160000 corpus positions)
+ 2021-05-05 22:35:57,222 : INFO : adding document #0 to Dictionary(104123 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,230 : INFO : built Dictionary(104199 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 417 documents (total 4170000 corpus positions)
+ 2021-05-05 22:35:57,273 : INFO : adding document #0 to Dictionary(104199 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,280 : INFO : built Dictionary(104405 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 418 documents (total 4180000 corpus positions)
+ 2021-05-05 22:35:57,323 : INFO : adding document #0 to Dictionary(104405 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,331 : INFO : built Dictionary(104538 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 419 documents (total 4190000 corpus positions)
+ 2021-05-05 22:35:57,376 : INFO : adding document #0 to Dictionary(104538 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,384 : INFO : built Dictionary(104731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 420 documents (total 4200000 corpus positions)
+ 2021-05-05 22:35:57,427 : INFO : adding document #0 to Dictionary(104731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,435 : INFO : built Dictionary(104809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 421 documents (total 4210000 corpus positions)
+ 2021-05-05 22:35:57,479 : INFO : adding document #0 to Dictionary(104809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,487 : INFO : built Dictionary(104946 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 422 documents (total 4220000 corpus positions)
+ 2021-05-05 22:35:57,531 : INFO : adding document #0 to Dictionary(104946 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,539 : INFO : built Dictionary(105011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 423 documents (total 4230000 corpus positions)
+ 2021-05-05 22:35:57,582 : INFO : adding document #0 to Dictionary(105011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,590 : INFO : built Dictionary(105224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 424 documents (total 4240000 corpus positions)
+ 2021-05-05 22:35:57,633 : INFO : adding document #0 to Dictionary(105224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,641 : INFO : built Dictionary(105377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 425 documents (total 4250000 corpus positions)
+ 2021-05-05 22:35:57,684 : INFO : adding document #0 to Dictionary(105377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,692 : INFO : built Dictionary(105511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 426 documents (total 4260000 corpus positions)
+ 2021-05-05 22:35:57,748 : INFO : adding document #0 to Dictionary(105511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,758 : INFO : built Dictionary(105677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 427 documents (total 4270000 corpus positions)
+ 2021-05-05 22:35:57,815 : INFO : adding document #0 to Dictionary(105677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,827 : INFO : built Dictionary(105814 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 428 documents (total 4280000 corpus positions)
+ 2021-05-05 22:35:57,876 : INFO : adding document #0 to Dictionary(105814 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,886 : INFO : built Dictionary(105952 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 429 documents (total 4290000 corpus positions)
+ 2021-05-05 22:35:57,938 : INFO : adding document #0 to Dictionary(105952 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:57,951 : INFO : built Dictionary(106160 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 430 documents (total 4300000 corpus positions)
+ 2021-05-05 22:35:58,002 : INFO : adding document #0 to Dictionary(106160 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,013 : INFO : built Dictionary(106311 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 431 documents (total 4310000 corpus positions)
+ 2021-05-05 22:35:58,061 : INFO : adding document #0 to Dictionary(106311 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,068 : INFO : built Dictionary(106382 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 432 documents (total 4320000 corpus positions)
+ 2021-05-05 22:35:58,113 : INFO : adding document #0 to Dictionary(106382 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,122 : INFO : built Dictionary(106493 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 433 documents (total 4330000 corpus positions)
+ 2021-05-05 22:35:58,166 : INFO : adding document #0 to Dictionary(106493 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,174 : INFO : built Dictionary(106597 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 434 documents (total 4340000 corpus positions)
+ 2021-05-05 22:35:58,225 : INFO : adding document #0 to Dictionary(106597 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,235 : INFO : built Dictionary(106714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 435 documents (total 4350000 corpus positions)
+ 2021-05-05 22:35:58,289 : INFO : adding document #0 to Dictionary(106714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,300 : INFO : built Dictionary(106906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 436 documents (total 4360000 corpus positions)
+ 2021-05-05 22:35:58,350 : INFO : adding document #0 to Dictionary(106906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,361 : INFO : built Dictionary(107095 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 437 documents (total 4370000 corpus positions)
+ 2021-05-05 22:35:58,415 : INFO : adding document #0 to Dictionary(107095 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,426 : INFO : built Dictionary(107144 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 438 documents (total 4380000 corpus positions)
+ 2021-05-05 22:35:58,475 : INFO : adding document #0 to Dictionary(107144 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,482 : INFO : built Dictionary(107262 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 439 documents (total 4390000 corpus positions)
+ 2021-05-05 22:35:58,528 : INFO : adding document #0 to Dictionary(107262 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,537 : INFO : built Dictionary(107413 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 440 documents (total 4400000 corpus positions)
+ 2021-05-05 22:35:58,588 : INFO : adding document #0 to Dictionary(107413 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,596 : INFO : built Dictionary(107543 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 441 documents (total 4410000 corpus positions)
+ 2021-05-05 22:35:58,646 : INFO : adding document #0 to Dictionary(107543 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,655 : INFO : built Dictionary(107657 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 442 documents (total 4420000 corpus positions)
+ 2021-05-05 22:35:58,708 : INFO : adding document #0 to Dictionary(107657 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,720 : INFO : built Dictionary(107778 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 443 documents (total 4430000 corpus positions)
+ 2021-05-05 22:35:58,765 : INFO : adding document #0 to Dictionary(107778 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,774 : INFO : built Dictionary(107918 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 444 documents (total 4440000 corpus positions)
+ 2021-05-05 22:35:58,816 : INFO : adding document #0 to Dictionary(107918 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,825 : INFO : built Dictionary(108205 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 445 documents (total 4450000 corpus positions)
+ 2021-05-05 22:35:58,873 : INFO : adding document #0 to Dictionary(108205 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,881 : INFO : built Dictionary(108398 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 446 documents (total 4460000 corpus positions)
+ 2021-05-05 22:35:58,925 : INFO : adding document #0 to Dictionary(108398 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:35:58,933 : INFO : built Dictionary(108505 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 447 documents (total 4470000 corpus positions)
+ 2021-05-05 22:35:58,984 : INFO : adding document #0 to Dictionary(108505 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ [... 400 similar INFO lines elided: alternating "built Dictionary" / "adding document #0" messages as the Dictionary grows from 448 documents (108,741 unique tokens, 4,480,000 corpus positions) to 647 documents (134,024 unique tokens, 6,470,000 corpus positions), 22:35:59 through 22:36:10 ...]
+ 2021-05-05 22:36:10,264 : INFO : built Dictionary(134137 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 648 documents (total 6480000 corpus positions)
+ 2021-05-05 22:36:10,310 : INFO : adding document #0 to Dictionary(134137 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,319 : INFO : built Dictionary(134234 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 649 documents (total 6490000 corpus positions)
+ 2021-05-05 22:36:10,363 : INFO : adding document #0 to Dictionary(134234 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,372 : INFO : built Dictionary(134461 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 650 documents (total 6500000 corpus positions)
+ 2021-05-05 22:36:10,416 : INFO : adding document #0 to Dictionary(134461 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,424 : INFO : built Dictionary(134632 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 651 documents (total 6510000 corpus positions)
+ 2021-05-05 22:36:10,466 : INFO : adding document #0 to Dictionary(134632 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,473 : INFO : built Dictionary(134680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 652 documents (total 6520000 corpus positions)
+ 2021-05-05 22:36:10,516 : INFO : adding document #0 to Dictionary(134680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,523 : INFO : built Dictionary(134767 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 653 documents (total 6530000 corpus positions)
+ 2021-05-05 22:36:10,566 : INFO : adding document #0 to Dictionary(134767 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,576 : INFO : built Dictionary(134891 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 654 documents (total 6540000 corpus positions)
+ 2021-05-05 22:36:10,625 : INFO : adding document #0 to Dictionary(134891 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,634 : INFO : built Dictionary(135012 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 655 documents (total 6550000 corpus positions)
+ 2021-05-05 22:36:10,679 : INFO : adding document #0 to Dictionary(135012 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,688 : INFO : built Dictionary(135134 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 656 documents (total 6560000 corpus positions)
+ 2021-05-05 22:36:10,734 : INFO : adding document #0 to Dictionary(135134 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,742 : INFO : built Dictionary(135244 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 657 documents (total 6570000 corpus positions)
+ 2021-05-05 22:36:10,790 : INFO : adding document #0 to Dictionary(135244 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,799 : INFO : built Dictionary(135383 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 658 documents (total 6580000 corpus positions)
+ 2021-05-05 22:36:10,843 : INFO : adding document #0 to Dictionary(135383 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,850 : INFO : built Dictionary(135585 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 659 documents (total 6590000 corpus positions)
+ 2021-05-05 22:36:10,894 : INFO : adding document #0 to Dictionary(135585 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,903 : INFO : built Dictionary(135727 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 660 documents (total 6600000 corpus positions)
+ 2021-05-05 22:36:10,947 : INFO : adding document #0 to Dictionary(135727 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:10,955 : INFO : built Dictionary(135813 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 661 documents (total 6610000 corpus positions)
+ 2021-05-05 22:36:10,999 : INFO : adding document #0 to Dictionary(135813 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,009 : INFO : built Dictionary(135902 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 662 documents (total 6620000 corpus positions)
+ 2021-05-05 22:36:11,056 : INFO : adding document #0 to Dictionary(135902 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,064 : INFO : built Dictionary(136011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 663 documents (total 6630000 corpus positions)
+ 2021-05-05 22:36:11,107 : INFO : adding document #0 to Dictionary(136011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,115 : INFO : built Dictionary(136172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 664 documents (total 6640000 corpus positions)
+ 2021-05-05 22:36:11,159 : INFO : adding document #0 to Dictionary(136172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,168 : INFO : built Dictionary(136293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 665 documents (total 6650000 corpus positions)
+ 2021-05-05 22:36:11,214 : INFO : adding document #0 to Dictionary(136293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,225 : INFO : built Dictionary(136353 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 666 documents (total 6660000 corpus positions)
+ 2021-05-05 22:36:11,272 : INFO : adding document #0 to Dictionary(136353 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,281 : INFO : built Dictionary(136461 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 667 documents (total 6670000 corpus positions)
+ 2021-05-05 22:36:11,324 : INFO : adding document #0 to Dictionary(136461 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,335 : INFO : built Dictionary(136556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 668 documents (total 6680000 corpus positions)
+ 2021-05-05 22:36:11,385 : INFO : adding document #0 to Dictionary(136556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,398 : INFO : built Dictionary(136620 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 669 documents (total 6690000 corpus positions)
+ 2021-05-05 22:36:11,453 : INFO : adding document #0 to Dictionary(136620 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,462 : INFO : built Dictionary(136724 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 670 documents (total 6700000 corpus positions)
+ 2021-05-05 22:36:11,508 : INFO : adding document #0 to Dictionary(136724 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,517 : INFO : built Dictionary(136828 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 671 documents (total 6710000 corpus positions)
+ 2021-05-05 22:36:11,560 : INFO : adding document #0 to Dictionary(136828 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,568 : INFO : built Dictionary(136888 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 672 documents (total 6720000 corpus positions)
+ 2021-05-05 22:36:11,617 : INFO : adding document #0 to Dictionary(136888 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,624 : INFO : built Dictionary(136982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 673 documents (total 6730000 corpus positions)
+ 2021-05-05 22:36:11,667 : INFO : adding document #0 to Dictionary(136982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,676 : INFO : built Dictionary(137102 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 674 documents (total 6740000 corpus positions)
+ 2021-05-05 22:36:11,720 : INFO : adding document #0 to Dictionary(137102 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,728 : INFO : built Dictionary(137187 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 675 documents (total 6750000 corpus positions)
+ 2021-05-05 22:36:11,774 : INFO : adding document #0 to Dictionary(137187 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,783 : INFO : built Dictionary(137496 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 676 documents (total 6760000 corpus positions)
+ 2021-05-05 22:36:11,831 : INFO : adding document #0 to Dictionary(137496 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,843 : INFO : built Dictionary(137623 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 677 documents (total 6770000 corpus positions)
+ 2021-05-05 22:36:11,887 : INFO : adding document #0 to Dictionary(137623 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,897 : INFO : built Dictionary(137744 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 678 documents (total 6780000 corpus positions)
+ 2021-05-05 22:36:11,945 : INFO : adding document #0 to Dictionary(137744 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:11,953 : INFO : built Dictionary(137860 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 679 documents (total 6790000 corpus positions)
+ 2021-05-05 22:36:11,996 : INFO : adding document #0 to Dictionary(137860 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,004 : INFO : built Dictionary(138028 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 680 documents (total 6800000 corpus positions)
+ 2021-05-05 22:36:12,047 : INFO : adding document #0 to Dictionary(138028 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,055 : INFO : built Dictionary(138152 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 681 documents (total 6810000 corpus positions)
+ 2021-05-05 22:36:12,098 : INFO : adding document #0 to Dictionary(138152 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,108 : INFO : built Dictionary(138239 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 682 documents (total 6820000 corpus positions)
+ 2021-05-05 22:36:12,153 : INFO : adding document #0 to Dictionary(138239 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,163 : INFO : built Dictionary(138341 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 683 documents (total 6830000 corpus positions)
+ 2021-05-05 22:36:12,206 : INFO : adding document #0 to Dictionary(138341 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,215 : INFO : built Dictionary(138458 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 684 documents (total 6840000 corpus positions)
+ 2021-05-05 22:36:12,259 : INFO : adding document #0 to Dictionary(138458 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,267 : INFO : built Dictionary(138601 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 685 documents (total 6850000 corpus positions)
+ 2021-05-05 22:36:12,311 : INFO : adding document #0 to Dictionary(138601 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,319 : INFO : built Dictionary(138704 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 686 documents (total 6860000 corpus positions)
+ 2021-05-05 22:36:12,363 : INFO : adding document #0 to Dictionary(138704 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,371 : INFO : built Dictionary(138831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 687 documents (total 6870000 corpus positions)
+ 2021-05-05 22:36:12,420 : INFO : adding document #0 to Dictionary(138831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,429 : INFO : built Dictionary(139055 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 688 documents (total 6880000 corpus positions)
+ 2021-05-05 22:36:12,472 : INFO : adding document #0 to Dictionary(139055 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,480 : INFO : built Dictionary(139148 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 689 documents (total 6890000 corpus positions)
+ 2021-05-05 22:36:12,523 : INFO : adding document #0 to Dictionary(139148 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,531 : INFO : built Dictionary(139208 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 690 documents (total 6900000 corpus positions)
+ 2021-05-05 22:36:12,575 : INFO : adding document #0 to Dictionary(139208 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,583 : INFO : built Dictionary(139310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 691 documents (total 6910000 corpus positions)
+ 2021-05-05 22:36:12,626 : INFO : adding document #0 to Dictionary(139310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,635 : INFO : built Dictionary(139430 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 692 documents (total 6920000 corpus positions)
+ 2021-05-05 22:36:12,678 : INFO : adding document #0 to Dictionary(139430 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,686 : INFO : built Dictionary(139521 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 693 documents (total 6930000 corpus positions)
+ 2021-05-05 22:36:12,730 : INFO : adding document #0 to Dictionary(139521 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,738 : INFO : built Dictionary(139608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 694 documents (total 6940000 corpus positions)
+ 2021-05-05 22:36:12,783 : INFO : adding document #0 to Dictionary(139608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,792 : INFO : built Dictionary(139720 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 695 documents (total 6950000 corpus positions)
+ 2021-05-05 22:36:12,841 : INFO : adding document #0 to Dictionary(139720 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,852 : INFO : built Dictionary(139797 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 696 documents (total 6960000 corpus positions)
+ 2021-05-05 22:36:12,901 : INFO : adding document #0 to Dictionary(139797 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,912 : INFO : built Dictionary(139969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 697 documents (total 6970000 corpus positions)
+ 2021-05-05 22:36:12,967 : INFO : adding document #0 to Dictionary(139969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:12,975 : INFO : built Dictionary(140104 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 698 documents (total 6980000 corpus positions)
+ 2021-05-05 22:36:13,019 : INFO : adding document #0 to Dictionary(140104 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,031 : INFO : built Dictionary(140231 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 699 documents (total 6990000 corpus positions)
+ 2021-05-05 22:36:13,075 : INFO : adding document #0 to Dictionary(140231 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,083 : INFO : built Dictionary(140320 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 700 documents (total 7000000 corpus positions)
+ 2021-05-05 22:36:13,127 : INFO : adding document #0 to Dictionary(140320 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,135 : INFO : built Dictionary(140429 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 701 documents (total 7010000 corpus positions)
+ 2021-05-05 22:36:13,179 : INFO : adding document #0 to Dictionary(140429 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,187 : INFO : built Dictionary(140572 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 702 documents (total 7020000 corpus positions)
+ 2021-05-05 22:36:13,231 : INFO : adding document #0 to Dictionary(140572 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,241 : INFO : built Dictionary(140720 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 703 documents (total 7030000 corpus positions)
+ 2021-05-05 22:36:13,284 : INFO : adding document #0 to Dictionary(140720 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,293 : INFO : built Dictionary(140801 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 704 documents (total 7040000 corpus positions)
+ 2021-05-05 22:36:13,339 : INFO : adding document #0 to Dictionary(140801 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,348 : INFO : built Dictionary(140937 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 705 documents (total 7050000 corpus positions)
+ 2021-05-05 22:36:13,393 : INFO : adding document #0 to Dictionary(140937 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,402 : INFO : built Dictionary(141029 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 706 documents (total 7060000 corpus positions)
+ 2021-05-05 22:36:13,450 : INFO : adding document #0 to Dictionary(141029 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,459 : INFO : built Dictionary(141145 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 707 documents (total 7070000 corpus positions)
+ 2021-05-05 22:36:13,502 : INFO : adding document #0 to Dictionary(141145 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,509 : INFO : built Dictionary(141230 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 708 documents (total 7080000 corpus positions)
+ 2021-05-05 22:36:13,553 : INFO : adding document #0 to Dictionary(141230 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,564 : INFO : built Dictionary(141660 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 709 documents (total 7090000 corpus positions)
+ 2021-05-05 22:36:13,607 : INFO : adding document #0 to Dictionary(141660 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,617 : INFO : built Dictionary(141799 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 710 documents (total 7100000 corpus positions)
+ 2021-05-05 22:36:13,666 : INFO : adding document #0 to Dictionary(141799 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,675 : INFO : built Dictionary(141871 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 711 documents (total 7110000 corpus positions)
+ 2021-05-05 22:36:13,719 : INFO : adding document #0 to Dictionary(141871 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,728 : INFO : built Dictionary(141937 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 712 documents (total 7120000 corpus positions)
+ 2021-05-05 22:36:13,777 : INFO : adding document #0 to Dictionary(141937 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,784 : INFO : built Dictionary(141983 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 713 documents (total 7130000 corpus positions)
+ 2021-05-05 22:36:13,828 : INFO : adding document #0 to Dictionary(141983 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,838 : INFO : built Dictionary(142083 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 714 documents (total 7140000 corpus positions)
+ 2021-05-05 22:36:13,881 : INFO : adding document #0 to Dictionary(142083 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,889 : INFO : built Dictionary(142196 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 715 documents (total 7150000 corpus positions)
+ 2021-05-05 22:36:13,936 : INFO : adding document #0 to Dictionary(142196 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,945 : INFO : built Dictionary(142299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 716 documents (total 7160000 corpus positions)
+ 2021-05-05 22:36:13,989 : INFO : adding document #0 to Dictionary(142299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:13,997 : INFO : built Dictionary(142356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 717 documents (total 7170000 corpus positions)
+ 2021-05-05 22:36:14,048 : INFO : adding document #0 to Dictionary(142356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,057 : INFO : built Dictionary(142453 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 718 documents (total 7180000 corpus positions)
+ 2021-05-05 22:36:14,101 : INFO : adding document #0 to Dictionary(142453 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,109 : INFO : built Dictionary(142562 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 719 documents (total 7190000 corpus positions)
+ 2021-05-05 22:36:14,153 : INFO : adding document #0 to Dictionary(142562 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,161 : INFO : built Dictionary(142689 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 720 documents (total 7200000 corpus positions)
+ 2021-05-05 22:36:14,205 : INFO : adding document #0 to Dictionary(142689 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,217 : INFO : built Dictionary(142789 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 721 documents (total 7210000 corpus positions)
+ 2021-05-05 22:36:14,263 : INFO : adding document #0 to Dictionary(142789 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,272 : INFO : built Dictionary(142887 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 722 documents (total 7220000 corpus positions)
+ 2021-05-05 22:36:14,317 : INFO : adding document #0 to Dictionary(142887 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,325 : INFO : built Dictionary(143013 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 723 documents (total 7230000 corpus positions)
+ 2021-05-05 22:36:14,369 : INFO : adding document #0 to Dictionary(143013 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,376 : INFO : built Dictionary(143123 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 724 documents (total 7240000 corpus positions)
+ 2021-05-05 22:36:14,420 : INFO : adding document #0 to Dictionary(143123 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,428 : INFO : built Dictionary(143195 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 725 documents (total 7250000 corpus positions)
+ 2021-05-05 22:36:14,472 : INFO : adding document #0 to Dictionary(143195 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,479 : INFO : built Dictionary(143322 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 726 documents (total 7260000 corpus positions)
+ 2021-05-05 22:36:14,527 : INFO : adding document #0 to Dictionary(143322 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,536 : INFO : built Dictionary(143419 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 727 documents (total 7270000 corpus positions)
+ 2021-05-05 22:36:14,580 : INFO : adding document #0 to Dictionary(143419 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,588 : INFO : built Dictionary(143471 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 728 documents (total 7280000 corpus positions)
+ 2021-05-05 22:36:14,638 : INFO : adding document #0 to Dictionary(143471 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,647 : INFO : built Dictionary(143543 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 729 documents (total 7290000 corpus positions)
+ 2021-05-05 22:36:14,690 : INFO : adding document #0 to Dictionary(143543 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,697 : INFO : built Dictionary(143675 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 730 documents (total 7300000 corpus positions)
+ 2021-05-05 22:36:14,747 : INFO : adding document #0 to Dictionary(143675 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,757 : INFO : built Dictionary(143808 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 731 documents (total 7310000 corpus positions)
+ 2021-05-05 22:36:14,801 : INFO : adding document #0 to Dictionary(143808 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,810 : INFO : built Dictionary(143907 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 732 documents (total 7320000 corpus positions)
+ 2021-05-05 22:36:14,855 : INFO : adding document #0 to Dictionary(143907 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,864 : INFO : built Dictionary(144045 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 733 documents (total 7330000 corpus positions)
+ 2021-05-05 22:36:14,909 : INFO : adding document #0 to Dictionary(144045 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,917 : INFO : built Dictionary(144153 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 734 documents (total 7340000 corpus positions)
+ 2021-05-05 22:36:14,964 : INFO : adding document #0 to Dictionary(144153 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:14,972 : INFO : built Dictionary(144226 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 735 documents (total 7350000 corpus positions)
+ 2021-05-05 22:36:15,016 : INFO : adding document #0 to Dictionary(144226 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,025 : INFO : built Dictionary(144325 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 736 documents (total 7360000 corpus positions)
+ 2021-05-05 22:36:15,074 : INFO : adding document #0 to Dictionary(144325 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,081 : INFO : built Dictionary(144419 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 737 documents (total 7370000 corpus positions)
+ 2021-05-05 22:36:15,131 : INFO : adding document #0 to Dictionary(144419 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,138 : INFO : built Dictionary(144518 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 738 documents (total 7380000 corpus positions)
+ 2021-05-05 22:36:15,181 : INFO : adding document #0 to Dictionary(144518 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,188 : INFO : built Dictionary(144615 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 739 documents (total 7390000 corpus positions)
+ 2021-05-05 22:36:15,235 : INFO : adding document #0 to Dictionary(144615 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,244 : INFO : built Dictionary(144823 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 740 documents (total 7400000 corpus positions)
+ 2021-05-05 22:36:15,287 : INFO : adding document #0 to Dictionary(144823 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,297 : INFO : built Dictionary(145022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 741 documents (total 7410000 corpus positions)
+ 2021-05-05 22:36:15,347 : INFO : adding document #0 to Dictionary(145022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,356 : INFO : built Dictionary(145076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 742 documents (total 7420000 corpus positions)
+ 2021-05-05 22:36:15,401 : INFO : adding document #0 to Dictionary(145076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,411 : INFO : built Dictionary(145168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 743 documents (total 7430000 corpus positions)
+ 2021-05-05 22:36:15,458 : INFO : adding document #0 to Dictionary(145168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,467 : INFO : built Dictionary(145555 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 744 documents (total 7440000 corpus positions)
+ 2021-05-05 22:36:15,510 : INFO : adding document #0 to Dictionary(145555 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,517 : INFO : built Dictionary(145596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 745 documents (total 7450000 corpus positions)
+ 2021-05-05 22:36:15,560 : INFO : adding document #0 to Dictionary(145596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,568 : INFO : built Dictionary(145704 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 746 documents (total 7460000 corpus positions)
+ 2021-05-05 22:36:15,613 : INFO : adding document #0 to Dictionary(145704 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,620 : INFO : built Dictionary(145814 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 747 documents (total 7470000 corpus positions)
+ 2021-05-05 22:36:15,663 : INFO : adding document #0 to Dictionary(145814 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,670 : INFO : built Dictionary(145943 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 748 documents (total 7480000 corpus positions)
+ 2021-05-05 22:36:15,714 : INFO : adding document #0 to Dictionary(145943 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,722 : INFO : built Dictionary(146043 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 749 documents (total 7490000 corpus positions)
+ 2021-05-05 22:36:15,766 : INFO : adding document #0 to Dictionary(146043 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,774 : INFO : built Dictionary(146150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 750 documents (total 7500000 corpus positions)
+ 2021-05-05 22:36:15,819 : INFO : adding document #0 to Dictionary(146150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,829 : INFO : built Dictionary(146253 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 751 documents (total 7510000 corpus positions)
+ 2021-05-05 22:36:15,873 : INFO : adding document #0 to Dictionary(146253 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,882 : INFO : built Dictionary(146334 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 752 documents (total 7520000 corpus positions)
+ 2021-05-05 22:36:15,929 : INFO : adding document #0 to Dictionary(146334 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,939 : INFO : built Dictionary(146706 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 753 documents (total 7530000 corpus positions)
+ 2021-05-05 22:36:15,982 : INFO : adding document #0 to Dictionary(146706 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:15,988 : INFO : built Dictionary(146753 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 754 documents (total 7540000 corpus positions)
+ 2021-05-05 22:36:16,041 : INFO : adding document #0 to Dictionary(146753 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,050 : INFO : built Dictionary(146833 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 755 documents (total 7550000 corpus positions)
+ 2021-05-05 22:36:16,095 : INFO : adding document #0 to Dictionary(146833 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,102 : INFO : built Dictionary(146974 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 756 documents (total 7560000 corpus positions)
+ 2021-05-05 22:36:16,146 : INFO : adding document #0 to Dictionary(146974 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,153 : INFO : built Dictionary(147051 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 757 documents (total 7570000 corpus positions)
+ 2021-05-05 22:36:16,196 : INFO : adding document #0 to Dictionary(147051 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,204 : INFO : built Dictionary(147152 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 758 documents (total 7580000 corpus positions)
+ 2021-05-05 22:36:16,248 : INFO : adding document #0 to Dictionary(147152 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,255 : INFO : built Dictionary(147232 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 759 documents (total 7590000 corpus positions)
+ 2021-05-05 22:36:16,298 : INFO : adding document #0 to Dictionary(147232 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,305 : INFO : built Dictionary(147305 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 760 documents (total 7600000 corpus positions)
+ 2021-05-05 22:36:16,355 : INFO : adding document #0 to Dictionary(147305 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,363 : INFO : built Dictionary(147363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 761 documents (total 7610000 corpus positions)
+ 2021-05-05 22:36:16,409 : INFO : adding document #0 to Dictionary(147363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,418 : INFO : built Dictionary(147473 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 762 documents (total 7620000 corpus positions)
+ 2021-05-05 22:36:16,462 : INFO : adding document #0 to Dictionary(147473 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,470 : INFO : built Dictionary(147576 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 763 documents (total 7630000 corpus positions)
+ 2021-05-05 22:36:16,518 : INFO : adding document #0 to Dictionary(147576 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,527 : INFO : built Dictionary(147682 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 764 documents (total 7640000 corpus positions)
+ 2021-05-05 22:36:16,571 : INFO : adding document #0 to Dictionary(147682 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,579 : INFO : built Dictionary(147779 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 765 documents (total 7650000 corpus positions)
+ 2021-05-05 22:36:16,623 : INFO : adding document #0 to Dictionary(147779 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,632 : INFO : built Dictionary(147964 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 766 documents (total 7660000 corpus positions)
+ 2021-05-05 22:36:16,675 : INFO : adding document #0 to Dictionary(147964 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,683 : INFO : built Dictionary(148040 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 767 documents (total 7670000 corpus positions)
+ 2021-05-05 22:36:16,727 : INFO : adding document #0 to Dictionary(148040 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,737 : INFO : built Dictionary(148431 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 768 documents (total 7680000 corpus positions)
+ 2021-05-05 22:36:16,781 : INFO : adding document #0 to Dictionary(148431 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,789 : INFO : built Dictionary(148526 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 769 documents (total 7690000 corpus positions)
+ 2021-05-05 22:36:16,833 : INFO : adding document #0 to Dictionary(148526 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,841 : INFO : built Dictionary(148588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 770 documents (total 7700000 corpus positions)
+ 2021-05-05 22:36:16,883 : INFO : adding document #0 to Dictionary(148588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,890 : INFO : built Dictionary(148663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 771 documents (total 7710000 corpus positions)
+ 2021-05-05 22:36:16,933 : INFO : adding document #0 to Dictionary(148663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,940 : INFO : built Dictionary(148735 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 772 documents (total 7720000 corpus positions)
+ 2021-05-05 22:36:16,983 : INFO : adding document #0 to Dictionary(148735 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:16,990 : INFO : built Dictionary(148835 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 773 documents (total 7730000 corpus positions)
+ 2021-05-05 22:36:17,038 : INFO : adding document #0 to Dictionary(148835 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,048 : INFO : built Dictionary(148970 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 774 documents (total 7740000 corpus positions)
+ 2021-05-05 22:36:17,092 : INFO : adding document #0 to Dictionary(148970 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,101 : INFO : built Dictionary(149064 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 775 documents (total 7750000 corpus positions)
+ 2021-05-05 22:36:17,149 : INFO : adding document #0 to Dictionary(149064 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,157 : INFO : built Dictionary(149141 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 776 documents (total 7760000 corpus positions)
+ 2021-05-05 22:36:17,201 : INFO : adding document #0 to Dictionary(149141 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,213 : INFO : built Dictionary(149304 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 777 documents (total 7770000 corpus positions)
+ 2021-05-05 22:36:17,262 : INFO : adding document #0 to Dictionary(149304 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,271 : INFO : built Dictionary(149522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 778 documents (total 7780000 corpus positions)
+ 2021-05-05 22:36:17,315 : INFO : adding document #0 to Dictionary(149522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,323 : INFO : built Dictionary(149586 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 779 documents (total 7790000 corpus positions)
+ 2021-05-05 22:36:17,368 : INFO : adding document #0 to Dictionary(149586 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,378 : INFO : built Dictionary(149693 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 780 documents (total 7800000 corpus positions)
+ 2021-05-05 22:36:17,422 : INFO : adding document #0 to Dictionary(149693 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,430 : INFO : built Dictionary(149781 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 781 documents (total 7810000 corpus positions)
+ 2021-05-05 22:36:17,474 : INFO : adding document #0 to Dictionary(149781 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,482 : INFO : built Dictionary(149947 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 782 documents (total 7820000 corpus positions)
+ 2021-05-05 22:36:17,525 : INFO : adding document #0 to Dictionary(149947 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,534 : INFO : built Dictionary(150037 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 783 documents (total 7830000 corpus positions)
+ 2021-05-05 22:36:17,578 : INFO : adding document #0 to Dictionary(150037 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,586 : INFO : built Dictionary(150111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 784 documents (total 7840000 corpus positions)
+ 2021-05-05 22:36:17,629 : INFO : adding document #0 to Dictionary(150111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,637 : INFO : built Dictionary(150267 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 785 documents (total 7850000 corpus positions)
+ 2021-05-05 22:36:17,680 : INFO : adding document #0 to Dictionary(150267 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,688 : INFO : built Dictionary(150338 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 786 documents (total 7860000 corpus positions)
+ 2021-05-05 22:36:17,732 : INFO : adding document #0 to Dictionary(150338 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,742 : INFO : built Dictionary(150683 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 787 documents (total 7870000 corpus positions)
+ 2021-05-05 22:36:17,788 : INFO : adding document #0 to Dictionary(150683 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,797 : INFO : built Dictionary(150838 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 788 documents (total 7880000 corpus positions)
+ 2021-05-05 22:36:17,846 : INFO : adding document #0 to Dictionary(150838 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,857 : INFO : built Dictionary(150934 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 789 documents (total 7890000 corpus positions)
+ 2021-05-05 22:36:17,902 : INFO : adding document #0 to Dictionary(150934 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,912 : INFO : built Dictionary(151047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 790 documents (total 7900000 corpus positions)
+ 2021-05-05 22:36:17,957 : INFO : adding document #0 to Dictionary(151047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:17,965 : INFO : built Dictionary(151134 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 791 documents (total 7910000 corpus positions)
+ 2021-05-05 22:36:18,014 : INFO : adding document #0 to Dictionary(151134 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,023 : INFO : built Dictionary(151214 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 792 documents (total 7920000 corpus positions)
+ 2021-05-05 22:36:18,073 : INFO : adding document #0 to Dictionary(151214 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,084 : INFO : built Dictionary(151349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 793 documents (total 7930000 corpus positions)
+ 2021-05-05 22:36:18,131 : INFO : adding document #0 to Dictionary(151349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,140 : INFO : built Dictionary(151467 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 794 documents (total 7940000 corpus positions)
+ 2021-05-05 22:36:18,183 : INFO : adding document #0 to Dictionary(151467 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,191 : INFO : built Dictionary(151541 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 795 documents (total 7950000 corpus positions)
+ 2021-05-05 22:36:18,239 : INFO : adding document #0 to Dictionary(151541 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,247 : INFO : built Dictionary(151637 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 796 documents (total 7960000 corpus positions)
+ 2021-05-05 22:36:18,291 : INFO : adding document #0 to Dictionary(151637 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,298 : INFO : built Dictionary(151731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 797 documents (total 7970000 corpus positions)
+ 2021-05-05 22:36:18,341 : INFO : adding document #0 to Dictionary(151731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,349 : INFO : built Dictionary(151812 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 798 documents (total 7980000 corpus positions)
+ 2021-05-05 22:36:18,391 : INFO : adding document #0 to Dictionary(151812 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,400 : INFO : built Dictionary(151891 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 799 documents (total 7990000 corpus positions)
+ 2021-05-05 22:36:18,444 : INFO : adding document #0 to Dictionary(151891 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,452 : INFO : built Dictionary(151968 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 800 documents (total 8000000 corpus positions)
+ 2021-05-05 22:36:18,534 : INFO : adding document #0 to Dictionary(151968 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,570 : INFO : built Dictionary(152084 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 801 documents (total 8010000 corpus positions)
+ 2021-05-05 22:36:18,640 : INFO : adding document #0 to Dictionary(152084 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,648 : INFO : built Dictionary(152180 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 802 documents (total 8020000 corpus positions)
+ 2021-05-05 22:36:18,702 : INFO : adding document #0 to Dictionary(152180 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,710 : INFO : built Dictionary(152274 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 803 documents (total 8030000 corpus positions)
+ 2021-05-05 22:36:18,762 : INFO : adding document #0 to Dictionary(152274 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,769 : INFO : built Dictionary(152349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 804 documents (total 8040000 corpus positions)
+ 2021-05-05 22:36:18,822 : INFO : adding document #0 to Dictionary(152349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:18,834 : INFO : built Dictionary(152414 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 805 documents (total 8050000 corpus positions)
+ 2021-05-05 22:36:18,881 : INFO : adding document #0 to Dictionary(152414 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ [... ~400 near-identical Dictionary progress log lines omitted: documents 806-1005, vocabulary growing from 153,168 to 176,635 unique tokens at 10,000 corpus positions per document ...]
+ 2021-05-05 22:36:29,684 : INFO : built Dictionary(176865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1006 documents (total 10060000 corpus positions)
+ 2021-05-05 22:36:29,731 : INFO : adding document #0 to Dictionary(176865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:29,740 : INFO : built Dictionary(176990 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1007 documents (total 10070000 corpus positions)
+ 2021-05-05 22:36:29,785 : INFO : adding document #0 to Dictionary(176990 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:29,792 : INFO : built Dictionary(177047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1008 documents (total 10080000 corpus positions)
+ 2021-05-05 22:36:29,842 : INFO : adding document #0 to Dictionary(177047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:29,850 : INFO : built Dictionary(177105 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1009 documents (total 10090000 corpus positions)
+ 2021-05-05 22:36:29,895 : INFO : adding document #0 to Dictionary(177105 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:29,904 : INFO : built Dictionary(177166 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1010 documents (total 10100000 corpus positions)
+ 2021-05-05 22:36:29,952 : INFO : adding document #0 to Dictionary(177166 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:29,960 : INFO : built Dictionary(177229 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1011 documents (total 10110000 corpus positions)
+ 2021-05-05 22:36:30,011 : INFO : adding document #0 to Dictionary(177229 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,021 : INFO : built Dictionary(177361 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1012 documents (total 10120000 corpus positions)
+ 2021-05-05 22:36:30,065 : INFO : adding document #0 to Dictionary(177361 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,073 : INFO : built Dictionary(177464 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1013 documents (total 10130000 corpus positions)
+ 2021-05-05 22:36:30,117 : INFO : adding document #0 to Dictionary(177464 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,124 : INFO : built Dictionary(177539 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1014 documents (total 10140000 corpus positions)
+ 2021-05-05 22:36:30,168 : INFO : adding document #0 to Dictionary(177539 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,177 : INFO : built Dictionary(177679 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1015 documents (total 10150000 corpus positions)
+ 2021-05-05 22:36:30,221 : INFO : adding document #0 to Dictionary(177679 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,229 : INFO : built Dictionary(177779 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1016 documents (total 10160000 corpus positions)
+ 2021-05-05 22:36:30,278 : INFO : adding document #0 to Dictionary(177779 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,287 : INFO : built Dictionary(177926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1017 documents (total 10170000 corpus positions)
+ 2021-05-05 22:36:30,332 : INFO : adding document #0 to Dictionary(177926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,340 : INFO : built Dictionary(178018 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1018 documents (total 10180000 corpus positions)
+ 2021-05-05 22:36:30,384 : INFO : adding document #0 to Dictionary(178018 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,392 : INFO : built Dictionary(178104 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1019 documents (total 10190000 corpus positions)
+ 2021-05-05 22:36:30,440 : INFO : adding document #0 to Dictionary(178104 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,448 : INFO : built Dictionary(178160 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1020 documents (total 10200000 corpus positions)
+ 2021-05-05 22:36:30,492 : INFO : adding document #0 to Dictionary(178160 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,500 : INFO : built Dictionary(178258 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1021 documents (total 10210000 corpus positions)
+ 2021-05-05 22:36:30,543 : INFO : adding document #0 to Dictionary(178258 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,551 : INFO : built Dictionary(178324 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1022 documents (total 10220000 corpus positions)
+ 2021-05-05 22:36:30,594 : INFO : adding document #0 to Dictionary(178324 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,601 : INFO : built Dictionary(178437 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1023 documents (total 10230000 corpus positions)
+ 2021-05-05 22:36:30,645 : INFO : adding document #0 to Dictionary(178437 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,652 : INFO : built Dictionary(178621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1024 documents (total 10240000 corpus positions)
+ 2021-05-05 22:36:30,695 : INFO : adding document #0 to Dictionary(178621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,703 : INFO : built Dictionary(178799 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1025 documents (total 10250000 corpus positions)
+ 2021-05-05 22:36:30,749 : INFO : adding document #0 to Dictionary(178799 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,758 : INFO : built Dictionary(178921 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1026 documents (total 10260000 corpus positions)
+ 2021-05-05 22:36:30,803 : INFO : adding document #0 to Dictionary(178921 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,812 : INFO : built Dictionary(178987 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1027 documents (total 10270000 corpus positions)
+ 2021-05-05 22:36:30,856 : INFO : adding document #0 to Dictionary(178987 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,864 : INFO : built Dictionary(179136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1028 documents (total 10280000 corpus positions)
+ 2021-05-05 22:36:30,910 : INFO : adding document #0 to Dictionary(179136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,919 : INFO : built Dictionary(179194 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1029 documents (total 10290000 corpus positions)
+ 2021-05-05 22:36:30,961 : INFO : adding document #0 to Dictionary(179194 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:30,970 : INFO : built Dictionary(179426 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1030 documents (total 10300000 corpus positions)
+ 2021-05-05 22:36:31,017 : INFO : adding document #0 to Dictionary(179426 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,025 : INFO : built Dictionary(179492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1031 documents (total 10310000 corpus positions)
+ 2021-05-05 22:36:31,072 : INFO : adding document #0 to Dictionary(179492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,080 : INFO : built Dictionary(179582 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1032 documents (total 10320000 corpus positions)
+ 2021-05-05 22:36:31,124 : INFO : adding document #0 to Dictionary(179582 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,132 : INFO : built Dictionary(179647 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1033 documents (total 10330000 corpus positions)
+ 2021-05-05 22:36:31,175 : INFO : adding document #0 to Dictionary(179647 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,183 : INFO : built Dictionary(179754 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1034 documents (total 10340000 corpus positions)
+ 2021-05-05 22:36:31,226 : INFO : adding document #0 to Dictionary(179754 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,234 : INFO : built Dictionary(179865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1035 documents (total 10350000 corpus positions)
+ 2021-05-05 22:36:31,277 : INFO : adding document #0 to Dictionary(179865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,285 : INFO : built Dictionary(179969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1036 documents (total 10360000 corpus positions)
+ 2021-05-05 22:36:31,329 : INFO : adding document #0 to Dictionary(179969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,337 : INFO : built Dictionary(180029 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1037 documents (total 10370000 corpus positions)
+ 2021-05-05 22:36:31,380 : INFO : adding document #0 to Dictionary(180029 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,389 : INFO : built Dictionary(180177 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1038 documents (total 10380000 corpus positions)
+ 2021-05-05 22:36:31,437 : INFO : adding document #0 to Dictionary(180177 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,445 : INFO : built Dictionary(180300 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1039 documents (total 10390000 corpus positions)
+ 2021-05-05 22:36:31,488 : INFO : adding document #0 to Dictionary(180300 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,496 : INFO : built Dictionary(180374 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1040 documents (total 10400000 corpus positions)
+ 2021-05-05 22:36:31,539 : INFO : adding document #0 to Dictionary(180374 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,547 : INFO : built Dictionary(180448 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1041 documents (total 10410000 corpus positions)
+ 2021-05-05 22:36:31,590 : INFO : adding document #0 to Dictionary(180448 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,597 : INFO : built Dictionary(180502 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1042 documents (total 10420000 corpus positions)
+ 2021-05-05 22:36:31,641 : INFO : adding document #0 to Dictionary(180502 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,648 : INFO : built Dictionary(180604 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1043 documents (total 10430000 corpus positions)
+ 2021-05-05 22:36:31,690 : INFO : adding document #0 to Dictionary(180604 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,697 : INFO : built Dictionary(180663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1044 documents (total 10440000 corpus positions)
+ 2021-05-05 22:36:31,741 : INFO : adding document #0 to Dictionary(180663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,749 : INFO : built Dictionary(180710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1045 documents (total 10450000 corpus positions)
+ 2021-05-05 22:36:31,793 : INFO : adding document #0 to Dictionary(180710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,800 : INFO : built Dictionary(180784 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1046 documents (total 10460000 corpus positions)
+ 2021-05-05 22:36:31,844 : INFO : adding document #0 to Dictionary(180784 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,853 : INFO : built Dictionary(180913 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1047 documents (total 10470000 corpus positions)
+ 2021-05-05 22:36:31,896 : INFO : adding document #0 to Dictionary(180913 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,904 : INFO : built Dictionary(181036 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1048 documents (total 10480000 corpus positions)
+ 2021-05-05 22:36:31,948 : INFO : adding document #0 to Dictionary(181036 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:31,955 : INFO : built Dictionary(181158 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1049 documents (total 10490000 corpus positions)
+ 2021-05-05 22:36:31,999 : INFO : adding document #0 to Dictionary(181158 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,007 : INFO : built Dictionary(181243 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1050 documents (total 10500000 corpus positions)
+ 2021-05-05 22:36:32,052 : INFO : adding document #0 to Dictionary(181243 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,060 : INFO : built Dictionary(181319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1051 documents (total 10510000 corpus positions)
+ 2021-05-05 22:36:32,104 : INFO : adding document #0 to Dictionary(181319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,111 : INFO : built Dictionary(181418 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1052 documents (total 10520000 corpus positions)
+ 2021-05-05 22:36:32,159 : INFO : adding document #0 to Dictionary(181418 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,167 : INFO : built Dictionary(181515 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1053 documents (total 10530000 corpus positions)
+ 2021-05-05 22:36:32,210 : INFO : adding document #0 to Dictionary(181515 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,219 : INFO : built Dictionary(181595 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1054 documents (total 10540000 corpus positions)
+ 2021-05-05 22:36:32,264 : INFO : adding document #0 to Dictionary(181595 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,272 : INFO : built Dictionary(181669 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1055 documents (total 10550000 corpus positions)
+ 2021-05-05 22:36:32,318 : INFO : adding document #0 to Dictionary(181669 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,326 : INFO : built Dictionary(181767 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1056 documents (total 10560000 corpus positions)
+ 2021-05-05 22:36:32,369 : INFO : adding document #0 to Dictionary(181767 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,377 : INFO : built Dictionary(181806 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1057 documents (total 10570000 corpus positions)
+ 2021-05-05 22:36:32,421 : INFO : adding document #0 to Dictionary(181806 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,428 : INFO : built Dictionary(181838 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1058 documents (total 10580000 corpus positions)
+ 2021-05-05 22:36:32,471 : INFO : adding document #0 to Dictionary(181838 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,481 : INFO : built Dictionary(181946 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1059 documents (total 10590000 corpus positions)
+ 2021-05-05 22:36:32,527 : INFO : adding document #0 to Dictionary(181946 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,535 : INFO : built Dictionary(182005 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1060 documents (total 10600000 corpus positions)
+ 2021-05-05 22:36:32,578 : INFO : adding document #0 to Dictionary(182005 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,586 : INFO : built Dictionary(182107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1061 documents (total 10610000 corpus positions)
+ 2021-05-05 22:36:32,635 : INFO : adding document #0 to Dictionary(182107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,645 : INFO : built Dictionary(182170 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1062 documents (total 10620000 corpus positions)
+ 2021-05-05 22:36:32,691 : INFO : adding document #0 to Dictionary(182170 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,699 : INFO : built Dictionary(182245 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1063 documents (total 10630000 corpus positions)
+ 2021-05-05 22:36:32,743 : INFO : adding document #0 to Dictionary(182245 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,753 : INFO : built Dictionary(182368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1064 documents (total 10640000 corpus positions)
+ 2021-05-05 22:36:32,798 : INFO : adding document #0 to Dictionary(182368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,806 : INFO : built Dictionary(182440 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1065 documents (total 10650000 corpus positions)
+ 2021-05-05 22:36:32,854 : INFO : adding document #0 to Dictionary(182440 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,862 : INFO : built Dictionary(182532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1066 documents (total 10660000 corpus positions)
+ 2021-05-05 22:36:32,905 : INFO : adding document #0 to Dictionary(182532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,913 : INFO : built Dictionary(182639 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1067 documents (total 10670000 corpus positions)
+ 2021-05-05 22:36:32,956 : INFO : adding document #0 to Dictionary(182639 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:32,963 : INFO : built Dictionary(182711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1068 documents (total 10680000 corpus positions)
+ 2021-05-05 22:36:33,007 : INFO : adding document #0 to Dictionary(182711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,016 : INFO : built Dictionary(182821 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1069 documents (total 10690000 corpus positions)
+ 2021-05-05 22:36:33,062 : INFO : adding document #0 to Dictionary(182821 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,070 : INFO : built Dictionary(182922 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1070 documents (total 10700000 corpus positions)
+ 2021-05-05 22:36:33,118 : INFO : adding document #0 to Dictionary(182922 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,126 : INFO : built Dictionary(183001 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1071 documents (total 10710000 corpus positions)
+ 2021-05-05 22:36:33,169 : INFO : adding document #0 to Dictionary(183001 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,177 : INFO : built Dictionary(183050 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1072 documents (total 10720000 corpus positions)
+ 2021-05-05 22:36:33,227 : INFO : adding document #0 to Dictionary(183050 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,240 : INFO : built Dictionary(183151 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1073 documents (total 10730000 corpus positions)
+ 2021-05-05 22:36:33,292 : INFO : adding document #0 to Dictionary(183151 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,308 : INFO : built Dictionary(183292 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1074 documents (total 10740000 corpus positions)
+ 2021-05-05 22:36:33,359 : INFO : adding document #0 to Dictionary(183292 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,371 : INFO : built Dictionary(183360 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1075 documents (total 10750000 corpus positions)
+ 2021-05-05 22:36:33,427 : INFO : adding document #0 to Dictionary(183360 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,435 : INFO : built Dictionary(183431 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1076 documents (total 10760000 corpus positions)
+ 2021-05-05 22:36:33,482 : INFO : adding document #0 to Dictionary(183431 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,490 : INFO : built Dictionary(183517 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1077 documents (total 10770000 corpus positions)
+ 2021-05-05 22:36:33,534 : INFO : adding document #0 to Dictionary(183517 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,542 : INFO : built Dictionary(183590 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1078 documents (total 10780000 corpus positions)
+ 2021-05-05 22:36:33,585 : INFO : adding document #0 to Dictionary(183590 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,592 : INFO : built Dictionary(183672 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1079 documents (total 10790000 corpus positions)
+ 2021-05-05 22:36:33,635 : INFO : adding document #0 to Dictionary(183672 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,643 : INFO : built Dictionary(183713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1080 documents (total 10800000 corpus positions)
+ 2021-05-05 22:36:33,686 : INFO : adding document #0 to Dictionary(183713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,693 : INFO : built Dictionary(183790 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1081 documents (total 10810000 corpus positions)
+ 2021-05-05 22:36:33,737 : INFO : adding document #0 to Dictionary(183790 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,745 : INFO : built Dictionary(183877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1082 documents (total 10820000 corpus positions)
+ 2021-05-05 22:36:33,788 : INFO : adding document #0 to Dictionary(183877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,795 : INFO : built Dictionary(183953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1083 documents (total 10830000 corpus positions)
+ 2021-05-05 22:36:33,839 : INFO : adding document #0 to Dictionary(183953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,846 : INFO : built Dictionary(183980 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1084 documents (total 10840000 corpus positions)
+ 2021-05-05 22:36:33,890 : INFO : adding document #0 to Dictionary(183980 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,898 : INFO : built Dictionary(184116 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1085 documents (total 10850000 corpus positions)
+ 2021-05-05 22:36:33,941 : INFO : adding document #0 to Dictionary(184116 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:33,948 : INFO : built Dictionary(184269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1086 documents (total 10860000 corpus positions)
+ 2021-05-05 22:36:34,000 : INFO : adding document #0 to Dictionary(184269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,007 : INFO : built Dictionary(184310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1087 documents (total 10870000 corpus positions)
+ 2021-05-05 22:36:34,050 : INFO : adding document #0 to Dictionary(184310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,057 : INFO : built Dictionary(184402 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1088 documents (total 10880000 corpus positions)
+ 2021-05-05 22:36:34,101 : INFO : adding document #0 to Dictionary(184402 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,108 : INFO : built Dictionary(184449 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1089 documents (total 10890000 corpus positions)
+ 2021-05-05 22:36:34,152 : INFO : adding document #0 to Dictionary(184449 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,159 : INFO : built Dictionary(184561 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1090 documents (total 10900000 corpus positions)
+ 2021-05-05 22:36:34,203 : INFO : adding document #0 to Dictionary(184561 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,210 : INFO : built Dictionary(184650 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1091 documents (total 10910000 corpus positions)
+ 2021-05-05 22:36:34,254 : INFO : adding document #0 to Dictionary(184650 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,262 : INFO : built Dictionary(184742 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1092 documents (total 10920000 corpus positions)
+ 2021-05-05 22:36:34,305 : INFO : adding document #0 to Dictionary(184742 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,313 : INFO : built Dictionary(184831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1093 documents (total 10930000 corpus positions)
+ 2021-05-05 22:36:34,356 : INFO : adding document #0 to Dictionary(184831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,363 : INFO : built Dictionary(184940 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1094 documents (total 10940000 corpus positions)
+ 2021-05-05 22:36:34,406 : INFO : adding document #0 to Dictionary(184940 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,414 : INFO : built Dictionary(184987 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1095 documents (total 10950000 corpus positions)
+ 2021-05-05 22:36:34,457 : INFO : adding document #0 to Dictionary(184987 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,465 : INFO : built Dictionary(185060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1096 documents (total 10960000 corpus positions)
+ 2021-05-05 22:36:34,508 : INFO : adding document #0 to Dictionary(185060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,515 : INFO : built Dictionary(185091 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1097 documents (total 10970000 corpus positions)
+ 2021-05-05 22:36:34,558 : INFO : adding document #0 to Dictionary(185091 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,564 : INFO : built Dictionary(185159 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1098 documents (total 10980000 corpus positions)
+ 2021-05-05 22:36:34,607 : INFO : adding document #0 to Dictionary(185159 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,615 : INFO : built Dictionary(185241 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1099 documents (total 10990000 corpus positions)
+ 2021-05-05 22:36:34,658 : INFO : adding document #0 to Dictionary(185241 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,665 : INFO : built Dictionary(185301 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1100 documents (total 11000000 corpus positions)
+ 2021-05-05 22:36:34,708 : INFO : adding document #0 to Dictionary(185301 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,715 : INFO : built Dictionary(185363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1101 documents (total 11010000 corpus positions)
+ 2021-05-05 22:36:34,758 : INFO : adding document #0 to Dictionary(185363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,766 : INFO : built Dictionary(185393 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1102 documents (total 11020000 corpus positions)
+ 2021-05-05 22:36:34,813 : INFO : adding document #0 to Dictionary(185393 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,820 : INFO : built Dictionary(185519 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1103 documents (total 11030000 corpus positions)
+ 2021-05-05 22:36:34,868 : INFO : adding document #0 to Dictionary(185519 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,875 : INFO : built Dictionary(185617 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1104 documents (total 11040000 corpus positions)
+ 2021-05-05 22:36:34,918 : INFO : adding document #0 to Dictionary(185617 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,924 : INFO : built Dictionary(185654 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1105 documents (total 11050000 corpus positions)
+ 2021-05-05 22:36:34,967 : INFO : adding document #0 to Dictionary(185654 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:34,974 : INFO : built Dictionary(185693 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1106 documents (total 11060000 corpus positions)
+ 2021-05-05 22:36:35,031 : INFO : adding document #0 to Dictionary(185693 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,040 : INFO : built Dictionary(185845 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1107 documents (total 11070000 corpus positions)
+ 2021-05-05 22:36:35,090 : INFO : adding document #0 to Dictionary(185845 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,100 : INFO : built Dictionary(185981 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1108 documents (total 11080000 corpus positions)
+ 2021-05-05 22:36:35,148 : INFO : adding document #0 to Dictionary(185981 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,157 : INFO : built Dictionary(186066 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1109 documents (total 11090000 corpus positions)
+ 2021-05-05 22:36:35,206 : INFO : adding document #0 to Dictionary(186066 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,215 : INFO : built Dictionary(186111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1110 documents (total 11100000 corpus positions)
+ 2021-05-05 22:36:35,260 : INFO : adding document #0 to Dictionary(186111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,267 : INFO : built Dictionary(186195 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1111 documents (total 11110000 corpus positions)
+ 2021-05-05 22:36:35,311 : INFO : adding document #0 to Dictionary(186195 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,318 : INFO : built Dictionary(186415 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1112 documents (total 11120000 corpus positions)
+ 2021-05-05 22:36:35,366 : INFO : adding document #0 to Dictionary(186415 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,374 : INFO : built Dictionary(186557 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1113 documents (total 11130000 corpus positions)
+ 2021-05-05 22:36:35,423 : INFO : adding document #0 to Dictionary(186557 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,430 : INFO : built Dictionary(186602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1114 documents (total 11140000 corpus positions)
+ 2021-05-05 22:36:35,473 : INFO : adding document #0 to Dictionary(186602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,481 : INFO : built Dictionary(186785 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1115 documents (total 11150000 corpus positions)
+ 2021-05-05 22:36:35,525 : INFO : adding document #0 to Dictionary(186785 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,533 : INFO : built Dictionary(186905 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1116 documents (total 11160000 corpus positions)
+ 2021-05-05 22:36:35,577 : INFO : adding document #0 to Dictionary(186905 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,585 : INFO : built Dictionary(186993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1117 documents (total 11170000 corpus positions)
+ 2021-05-05 22:36:35,629 : INFO : adding document #0 to Dictionary(186993 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,637 : INFO : built Dictionary(187070 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1118 documents (total 11180000 corpus positions)
+ 2021-05-05 22:36:35,679 : INFO : adding document #0 to Dictionary(187070 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,687 : INFO : built Dictionary(187175 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1119 documents (total 11190000 corpus positions)
+ 2021-05-05 22:36:35,731 : INFO : adding document #0 to Dictionary(187175 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,738 : INFO : built Dictionary(187230 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1120 documents (total 11200000 corpus positions)
+ 2021-05-05 22:36:35,781 : INFO : adding document #0 to Dictionary(187230 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,789 : INFO : built Dictionary(187314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1121 documents (total 11210000 corpus positions)
+ 2021-05-05 22:36:35,837 : INFO : adding document #0 to Dictionary(187314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,844 : INFO : built Dictionary(187354 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1122 documents (total 11220000 corpus positions)
+ 2021-05-05 22:36:35,891 : INFO : adding document #0 to Dictionary(187354 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,899 : INFO : built Dictionary(187475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1123 documents (total 11230000 corpus positions)
+ 2021-05-05 22:36:35,947 : INFO : adding document #0 to Dictionary(187475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:35,956 : INFO : built Dictionary(187596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1124 documents (total 11240000 corpus positions)
+ 2021-05-05 22:36:35,999 : INFO : adding document #0 to Dictionary(187596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,007 : INFO : built Dictionary(187654 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1125 documents (total 11250000 corpus positions)
+ 2021-05-05 22:36:36,052 : INFO : adding document #0 to Dictionary(187654 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,062 : INFO : built Dictionary(187746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1126 documents (total 11260000 corpus positions)
+ 2021-05-05 22:36:36,106 : INFO : adding document #0 to Dictionary(187746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,114 : INFO : built Dictionary(187810 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1127 documents (total 11270000 corpus positions)
+ 2021-05-05 22:36:36,157 : INFO : adding document #0 to Dictionary(187810 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,164 : INFO : built Dictionary(187869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1128 documents (total 11280000 corpus positions)
+ 2021-05-05 22:36:36,207 : INFO : adding document #0 to Dictionary(187869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,216 : INFO : built Dictionary(187984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1129 documents (total 11290000 corpus positions)
+ 2021-05-05 22:36:36,258 : INFO : adding document #0 to Dictionary(187984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,266 : INFO : built Dictionary(188030 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1130 documents (total 11300000 corpus positions)
+ 2021-05-05 22:36:36,309 : INFO : adding document #0 to Dictionary(188030 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,317 : INFO : built Dictionary(188116 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1131 documents (total 11310000 corpus positions)
+ 2021-05-05 22:36:36,360 : INFO : adding document #0 to Dictionary(188116 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,367 : INFO : built Dictionary(188203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1132 documents (total 11320000 corpus positions)
+ 2021-05-05 22:36:36,410 : INFO : adding document #0 to Dictionary(188203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,418 : INFO : built Dictionary(188330 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1133 documents (total 11330000 corpus positions)
+ 2021-05-05 22:36:36,461 : INFO : adding document #0 to Dictionary(188330 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,468 : INFO : built Dictionary(188377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1134 documents (total 11340000 corpus positions)
+ 2021-05-05 22:36:36,511 : INFO : adding document #0 to Dictionary(188377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,518 : INFO : built Dictionary(188429 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1135 documents (total 11350000 corpus positions)
+ 2021-05-05 22:36:36,561 : INFO : adding document #0 to Dictionary(188429 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,569 : INFO : built Dictionary(188516 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1136 documents (total 11360000 corpus positions)
+ 2021-05-05 22:36:36,612 : INFO : adding document #0 to Dictionary(188516 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,620 : INFO : built Dictionary(188549 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1137 documents (total 11370000 corpus positions)
+ 2021-05-05 22:36:36,663 : INFO : adding document #0 to Dictionary(188549 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,670 : INFO : built Dictionary(188580 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1138 documents (total 11380000 corpus positions)
+ 2021-05-05 22:36:36,714 : INFO : adding document #0 to Dictionary(188580 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,722 : INFO : built Dictionary(188680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1139 documents (total 11390000 corpus positions)
+ 2021-05-05 22:36:36,767 : INFO : adding document #0 to Dictionary(188680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,775 : INFO : built Dictionary(188798 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1140 documents (total 11400000 corpus positions)
+ 2021-05-05 22:36:36,823 : INFO : adding document #0 to Dictionary(188798 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,831 : INFO : built Dictionary(188916 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1141 documents (total 11410000 corpus positions)
+ 2021-05-05 22:36:36,873 : INFO : adding document #0 to Dictionary(188916 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,881 : INFO : built Dictionary(189008 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1142 documents (total 11420000 corpus positions)
+ 2021-05-05 22:36:36,923 : INFO : adding document #0 to Dictionary(189008 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,930 : INFO : built Dictionary(189092 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1143 documents (total 11430000 corpus positions)
+ 2021-05-05 22:36:36,972 : INFO : adding document #0 to Dictionary(189092 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:36,980 : INFO : built Dictionary(189217 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1144 documents (total 11440000 corpus positions)
+ 2021-05-05 22:36:37,025 : INFO : adding document #0 to Dictionary(189217 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,033 : INFO : built Dictionary(189321 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1145 documents (total 11450000 corpus positions)
+ 2021-05-05 22:36:37,076 : INFO : adding document #0 to Dictionary(189321 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,085 : INFO : built Dictionary(189381 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1146 documents (total 11460000 corpus positions)
+ 2021-05-05 22:36:37,128 : INFO : adding document #0 to Dictionary(189381 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,136 : INFO : built Dictionary(189503 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1147 documents (total 11470000 corpus positions)
+ 2021-05-05 22:36:37,181 : INFO : adding document #0 to Dictionary(189503 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,189 : INFO : built Dictionary(189617 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1148 documents (total 11480000 corpus positions)
+ 2021-05-05 22:36:37,232 : INFO : adding document #0 to Dictionary(189617 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,240 : INFO : built Dictionary(189710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1149 documents (total 11490000 corpus positions)
+ 2021-05-05 22:36:37,283 : INFO : adding document #0 to Dictionary(189710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,291 : INFO : built Dictionary(189799 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1150 documents (total 11500000 corpus positions)
+ 2021-05-05 22:36:37,333 : INFO : adding document #0 to Dictionary(189799 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,341 : INFO : built Dictionary(189887 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1151 documents (total 11510000 corpus positions)
+ 2021-05-05 22:36:37,384 : INFO : adding document #0 to Dictionary(189887 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,392 : INFO : built Dictionary(189961 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1152 documents (total 11520000 corpus positions)
+ 2021-05-05 22:36:37,435 : INFO : adding document #0 to Dictionary(189961 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,442 : INFO : built Dictionary(190016 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1153 documents (total 11530000 corpus positions)
+ 2021-05-05 22:36:37,485 : INFO : adding document #0 to Dictionary(190016 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,493 : INFO : built Dictionary(190079 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1154 documents (total 11540000 corpus positions)
+ 2021-05-05 22:36:37,536 : INFO : adding document #0 to Dictionary(190079 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,543 : INFO : built Dictionary(190131 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1155 documents (total 11550000 corpus positions)
+ 2021-05-05 22:36:37,586 : INFO : adding document #0 to Dictionary(190131 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,594 : INFO : built Dictionary(190252 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1156 documents (total 11560000 corpus positions)
+ 2021-05-05 22:36:37,636 : INFO : adding document #0 to Dictionary(190252 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,642 : INFO : built Dictionary(190331 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1157 documents (total 11570000 corpus positions)
+ 2021-05-05 22:36:37,685 : INFO : adding document #0 to Dictionary(190331 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,693 : INFO : built Dictionary(190451 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1158 documents (total 11580000 corpus positions)
+ 2021-05-05 22:36:37,741 : INFO : adding document #0 to Dictionary(190451 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,749 : INFO : built Dictionary(190547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1159 documents (total 11590000 corpus positions)
+ 2021-05-05 22:36:37,792 : INFO : adding document #0 to Dictionary(190547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,799 : INFO : built Dictionary(190634 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1160 documents (total 11600000 corpus positions)
+ 2021-05-05 22:36:37,847 : INFO : adding document #0 to Dictionary(190634 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,854 : INFO : built Dictionary(190677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1161 documents (total 11610000 corpus positions)
+ 2021-05-05 22:36:37,899 : INFO : adding document #0 to Dictionary(190677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,909 : INFO : built Dictionary(190774 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1162 documents (total 11620000 corpus positions)
+ 2021-05-05 22:36:37,958 : INFO : adding document #0 to Dictionary(190774 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:37,971 : INFO : built Dictionary(190877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1163 documents (total 11630000 corpus positions)
+ 2021-05-05 22:36:38,022 : INFO : adding document #0 to Dictionary(190877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,030 : INFO : built Dictionary(191000 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1164 documents (total 11640000 corpus positions)
+ 2021-05-05 22:36:38,076 : INFO : adding document #0 to Dictionary(191000 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,085 : INFO : built Dictionary(191107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1165 documents (total 11650000 corpus positions)
+ 2021-05-05 22:36:38,132 : INFO : adding document #0 to Dictionary(191107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,140 : INFO : built Dictionary(191212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1166 documents (total 11660000 corpus positions)
+ 2021-05-05 22:36:38,183 : INFO : adding document #0 to Dictionary(191212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,192 : INFO : built Dictionary(191312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1167 documents (total 11670000 corpus positions)
+ 2021-05-05 22:36:38,240 : INFO : adding document #0 to Dictionary(191312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,247 : INFO : built Dictionary(191401 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1168 documents (total 11680000 corpus positions)
+ 2021-05-05 22:36:38,291 : INFO : adding document #0 to Dictionary(191401 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,298 : INFO : built Dictionary(191563 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1169 documents (total 11690000 corpus positions)
+ 2021-05-05 22:36:38,347 : INFO : adding document #0 to Dictionary(191563 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,355 : INFO : built Dictionary(191661 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1170 documents (total 11700000 corpus positions)
+ 2021-05-05 22:36:38,401 : INFO : adding document #0 to Dictionary(191661 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,408 : INFO : built Dictionary(191711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1171 documents (total 11710000 corpus positions)
+ 2021-05-05 22:36:38,452 : INFO : adding document #0 to Dictionary(191711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,459 : INFO : built Dictionary(191753 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1172 documents (total 11720000 corpus positions)
+ 2021-05-05 22:36:38,507 : INFO : adding document #0 to Dictionary(191753 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,516 : INFO : built Dictionary(191872 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1173 documents (total 11730000 corpus positions)
+ 2021-05-05 22:36:38,560 : INFO : adding document #0 to Dictionary(191872 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,568 : INFO : built Dictionary(191957 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1174 documents (total 11740000 corpus positions)
+ 2021-05-05 22:36:38,612 : INFO : adding document #0 to Dictionary(191957 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,620 : INFO : built Dictionary(192034 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1175 documents (total 11750000 corpus positions)
+ 2021-05-05 22:36:38,665 : INFO : adding document #0 to Dictionary(192034 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,672 : INFO : built Dictionary(192144 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1176 documents (total 11760000 corpus positions)
+ 2021-05-05 22:36:38,720 : INFO : adding document #0 to Dictionary(192144 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,728 : INFO : built Dictionary(192256 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1177 documents (total 11770000 corpus positions)
+ 2021-05-05 22:36:38,775 : INFO : adding document #0 to Dictionary(192256 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,783 : INFO : built Dictionary(192355 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1178 documents (total 11780000 corpus positions)
+ 2021-05-05 22:36:38,831 : INFO : adding document #0 to Dictionary(192355 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,839 : INFO : built Dictionary(192448 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1179 documents (total 11790000 corpus positions)
+ 2021-05-05 22:36:38,882 : INFO : adding document #0 to Dictionary(192448 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,889 : INFO : built Dictionary(192549 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1180 documents (total 11800000 corpus positions)
+ 2021-05-05 22:36:38,933 : INFO : adding document #0 to Dictionary(192549 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,940 : INFO : built Dictionary(192642 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1181 documents (total 11810000 corpus positions)
+ 2021-05-05 22:36:38,989 : INFO : adding document #0 to Dictionary(192642 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:38,996 : INFO : built Dictionary(192728 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1182 documents (total 11820000 corpus positions)
+ 2021-05-05 22:36:39,048 : INFO : adding document #0 to Dictionary(192728 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,056 : INFO : built Dictionary(192761 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1183 documents (total 11830000 corpus positions)
+ 2021-05-05 22:36:39,100 : INFO : adding document #0 to Dictionary(192761 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,107 : INFO : built Dictionary(192811 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1184 documents (total 11840000 corpus positions)
+ 2021-05-05 22:36:39,151 : INFO : adding document #0 to Dictionary(192811 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,158 : INFO : built Dictionary(192877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1185 documents (total 11850000 corpus positions)
+ 2021-05-05 22:36:39,202 : INFO : adding document #0 to Dictionary(192877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,213 : INFO : built Dictionary(193056 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1186 documents (total 11860000 corpus positions)
+ 2021-05-05 22:36:39,256 : INFO : adding document #0 to Dictionary(193056 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,263 : INFO : built Dictionary(193172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1187 documents (total 11870000 corpus positions)
+ 2021-05-05 22:36:39,307 : INFO : adding document #0 to Dictionary(193172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,315 : INFO : built Dictionary(193221 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1188 documents (total 11880000 corpus positions)
+ 2021-05-05 22:36:39,357 : INFO : adding document #0 to Dictionary(193221 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,365 : INFO : built Dictionary(193349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1189 documents (total 11890000 corpus positions)
+ 2021-05-05 22:36:39,408 : INFO : adding document #0 to Dictionary(193349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,416 : INFO : built Dictionary(193410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1190 documents (total 11900000 corpus positions)
+ 2021-05-05 22:36:39,459 : INFO : adding document #0 to Dictionary(193410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,466 : INFO : built Dictionary(193460 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1191 documents (total 11910000 corpus positions)
+ 2021-05-05 22:36:39,509 : INFO : adding document #0 to Dictionary(193460 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,517 : INFO : built Dictionary(193556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1192 documents (total 11920000 corpus positions)
+ 2021-05-05 22:36:39,560 : INFO : adding document #0 to Dictionary(193556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,568 : INFO : built Dictionary(193612 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1193 documents (total 11930000 corpus positions)
+ 2021-05-05 22:36:39,611 : INFO : adding document #0 to Dictionary(193612 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,618 : INFO : built Dictionary(193684 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1194 documents (total 11940000 corpus positions)
+ 2021-05-05 22:36:39,661 : INFO : adding document #0 to Dictionary(193684 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,668 : INFO : built Dictionary(193736 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1195 documents (total 11950000 corpus positions)
+ 2021-05-05 22:36:39,712 : INFO : adding document #0 to Dictionary(193736 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,720 : INFO : built Dictionary(193796 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1196 documents (total 11960000 corpus positions)
+ 2021-05-05 22:36:39,767 : INFO : adding document #0 to Dictionary(193796 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,775 : INFO : built Dictionary(193860 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1197 documents (total 11970000 corpus positions)
+ 2021-05-05 22:36:39,818 : INFO : adding document #0 to Dictionary(193860 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,826 : INFO : built Dictionary(193933 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1198 documents (total 11980000 corpus positions)
+ 2021-05-05 22:36:39,869 : INFO : adding document #0 to Dictionary(193933 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,876 : INFO : built Dictionary(193972 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1199 documents (total 11990000 corpus positions)
+ 2021-05-05 22:36:39,920 : INFO : adding document #0 to Dictionary(193972 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,928 : INFO : built Dictionary(194050 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1200 documents (total 12000000 corpus positions)
+ 2021-05-05 22:36:39,971 : INFO : adding document #0 to Dictionary(194050 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:39,979 : INFO : built Dictionary(194113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1201 documents (total 12010000 corpus positions)
+ 2021-05-05 22:36:40,026 : INFO : adding document #0 to Dictionary(194113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,034 : INFO : built Dictionary(194190 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1202 documents (total 12020000 corpus positions)
+ 2021-05-05 22:36:40,083 : INFO : adding document #0 to Dictionary(194190 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,090 : INFO : built Dictionary(194289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1203 documents (total 12030000 corpus positions)
+ 2021-05-05 22:36:40,139 : INFO : adding document #0 to Dictionary(194289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,147 : INFO : built Dictionary(194368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1204 documents (total 12040000 corpus positions)
+ 2021-05-05 22:36:40,192 : INFO : adding document #0 to Dictionary(194368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,200 : INFO : built Dictionary(194437 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1205 documents (total 12050000 corpus positions)
+ 2021-05-05 22:36:40,244 : INFO : adding document #0 to Dictionary(194437 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,251 : INFO : built Dictionary(194487 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1206 documents (total 12060000 corpus positions)
+ 2021-05-05 22:36:40,295 : INFO : adding document #0 to Dictionary(194487 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,303 : INFO : built Dictionary(194537 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1207 documents (total 12070000 corpus positions)
+ 2021-05-05 22:36:40,350 : INFO : adding document #0 to Dictionary(194537 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,359 : INFO : built Dictionary(194916 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1208 documents (total 12080000 corpus positions)
+ 2021-05-05 22:36:40,402 : INFO : adding document #0 to Dictionary(194916 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,409 : INFO : built Dictionary(195034 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1209 documents (total 12090000 corpus positions)
+ 2021-05-05 22:36:40,454 : INFO : adding document #0 to Dictionary(195034 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,463 : INFO : built Dictionary(195218 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1210 documents (total 12100000 corpus positions)
+ 2021-05-05 22:36:40,513 : INFO : adding document #0 to Dictionary(195218 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,521 : INFO : built Dictionary(195294 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1211 documents (total 12110000 corpus positions)
+ 2021-05-05 22:36:40,565 : INFO : adding document #0 to Dictionary(195294 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,573 : INFO : built Dictionary(195352 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1212 documents (total 12120000 corpus positions)
+ 2021-05-05 22:36:40,622 : INFO : adding document #0 to Dictionary(195352 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,631 : INFO : built Dictionary(195434 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1213 documents (total 12130000 corpus positions)
+ 2021-05-05 22:36:40,681 : INFO : adding document #0 to Dictionary(195434 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,688 : INFO : built Dictionary(195496 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1214 documents (total 12140000 corpus positions)
+ 2021-05-05 22:36:40,734 : INFO : adding document #0 to Dictionary(195496 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,741 : INFO : built Dictionary(195522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1215 documents (total 12150000 corpus positions)
+ 2021-05-05 22:36:40,789 : INFO : adding document #0 to Dictionary(195522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,797 : INFO : built Dictionary(195714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1216 documents (total 12160000 corpus positions)
+ 2021-05-05 22:36:40,848 : INFO : adding document #0 to Dictionary(195714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,856 : INFO : built Dictionary(195786 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1217 documents (total 12170000 corpus positions)
+ 2021-05-05 22:36:40,901 : INFO : adding document #0 to Dictionary(195786 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,908 : INFO : built Dictionary(195836 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1218 documents (total 12180000 corpus positions)
+ 2021-05-05 22:36:40,954 : INFO : adding document #0 to Dictionary(195836 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:40,959 : INFO : built Dictionary(195880 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1219 documents (total 12190000 corpus positions)
+ 2021-05-05 22:36:41,005 : INFO : adding document #0 to Dictionary(195880 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,014 : INFO : built Dictionary(195945 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1220 documents (total 12200000 corpus positions)
+ 2021-05-05 22:36:41,060 : INFO : adding document #0 to Dictionary(195945 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,070 : INFO : built Dictionary(196013 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1221 documents (total 12210000 corpus positions)
+ 2021-05-05 22:36:41,120 : INFO : adding document #0 to Dictionary(196013 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,128 : INFO : built Dictionary(196112 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1222 documents (total 12220000 corpus positions)
+ 2021-05-05 22:36:41,173 : INFO : adding document #0 to Dictionary(196112 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,182 : INFO : built Dictionary(196280 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1223 documents (total 12230000 corpus positions)
+ 2021-05-05 22:36:41,227 : INFO : adding document #0 to Dictionary(196280 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,236 : INFO : built Dictionary(196440 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1224 documents (total 12240000 corpus positions)
+ 2021-05-05 22:36:41,285 : INFO : adding document #0 to Dictionary(196440 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,292 : INFO : built Dictionary(196500 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1225 documents (total 12250000 corpus positions)
+ 2021-05-05 22:36:41,337 : INFO : adding document #0 to Dictionary(196500 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,345 : INFO : built Dictionary(196588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1226 documents (total 12260000 corpus positions)
+ 2021-05-05 22:36:41,389 : INFO : adding document #0 to Dictionary(196588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,398 : INFO : built Dictionary(196728 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1227 documents (total 12270000 corpus positions)
+ 2021-05-05 22:36:41,445 : INFO : adding document #0 to Dictionary(196728 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,453 : INFO : built Dictionary(196797 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1228 documents (total 12280000 corpus positions)
+ 2021-05-05 22:36:41,497 : INFO : adding document #0 to Dictionary(196797 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,505 : INFO : built Dictionary(196887 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1229 documents (total 12290000 corpus positions)
+ 2021-05-05 22:36:41,549 : INFO : adding document #0 to Dictionary(196887 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,556 : INFO : built Dictionary(196931 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1230 documents (total 12300000 corpus positions)
+ 2021-05-05 22:36:41,600 : INFO : adding document #0 to Dictionary(196931 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,607 : INFO : built Dictionary(197038 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1231 documents (total 12310000 corpus positions)
+ 2021-05-05 22:36:41,654 : INFO : adding document #0 to Dictionary(197038 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,662 : INFO : built Dictionary(197084 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1232 documents (total 12320000 corpus positions)
+ 2021-05-05 22:36:41,707 : INFO : adding document #0 to Dictionary(197084 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,715 : INFO : built Dictionary(197205 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1233 documents (total 12330000 corpus positions)
+ 2021-05-05 22:36:41,762 : INFO : adding document #0 to Dictionary(197205 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,771 : INFO : built Dictionary(197364 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1234 documents (total 12340000 corpus positions)
+ 2021-05-05 22:36:41,817 : INFO : adding document #0 to Dictionary(197364 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,826 : INFO : built Dictionary(197642 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1235 documents (total 12350000 corpus positions)
+ 2021-05-05 22:36:41,868 : INFO : adding document #0 to Dictionary(197642 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,876 : INFO : built Dictionary(197698 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1236 documents (total 12360000 corpus positions)
+ 2021-05-05 22:36:41,919 : INFO : adding document #0 to Dictionary(197698 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,926 : INFO : built Dictionary(197722 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1237 documents (total 12370000 corpus positions)
+ 2021-05-05 22:36:41,969 : INFO : adding document #0 to Dictionary(197722 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:41,976 : INFO : built Dictionary(197820 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1238 documents (total 12380000 corpus positions)
+ 2021-05-05 22:36:42,020 : INFO : adding document #0 to Dictionary(197820 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,029 : INFO : built Dictionary(197877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1239 documents (total 12390000 corpus positions)
+ 2021-05-05 22:36:42,072 : INFO : adding document #0 to Dictionary(197877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,079 : INFO : built Dictionary(197969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1240 documents (total 12400000 corpus positions)
+ 2021-05-05 22:36:42,123 : INFO : adding document #0 to Dictionary(197969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,131 : INFO : built Dictionary(198043 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1241 documents (total 12410000 corpus positions)
+ 2021-05-05 22:36:42,174 : INFO : adding document #0 to Dictionary(198043 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,182 : INFO : built Dictionary(198121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1242 documents (total 12420000 corpus positions)
+ 2021-05-05 22:36:42,227 : INFO : adding document #0 to Dictionary(198121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,236 : INFO : built Dictionary(198266 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1243 documents (total 12430000 corpus positions)
+ 2021-05-05 22:36:42,279 : INFO : adding document #0 to Dictionary(198266 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,287 : INFO : built Dictionary(198312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1244 documents (total 12440000 corpus positions)
+ 2021-05-05 22:36:42,330 : INFO : adding document #0 to Dictionary(198312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,338 : INFO : built Dictionary(198389 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1245 documents (total 12450000 corpus positions)
+ 2021-05-05 22:36:42,381 : INFO : adding document #0 to Dictionary(198389 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,388 : INFO : built Dictionary(198421 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1246 documents (total 12460000 corpus positions)
+ 2021-05-05 22:36:42,432 : INFO : adding document #0 to Dictionary(198421 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,439 : INFO : built Dictionary(198483 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1247 documents (total 12470000 corpus positions)
+ 2021-05-05 22:36:42,482 : INFO : adding document #0 to Dictionary(198483 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,490 : INFO : built Dictionary(198535 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1248 documents (total 12480000 corpus positions)
+ 2021-05-05 22:36:42,533 : INFO : adding document #0 to Dictionary(198535 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,540 : INFO : built Dictionary(198594 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1249 documents (total 12490000 corpus positions)
+ 2021-05-05 22:36:42,584 : INFO : adding document #0 to Dictionary(198594 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,591 : INFO : built Dictionary(198668 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1250 documents (total 12500000 corpus positions)
+ 2021-05-05 22:36:42,635 : INFO : adding document #0 to Dictionary(198668 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,644 : INFO : built Dictionary(198748 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1251 documents (total 12510000 corpus positions)
+ 2021-05-05 22:36:42,687 : INFO : adding document #0 to Dictionary(198748 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,695 : INFO : built Dictionary(198815 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1252 documents (total 12520000 corpus positions)
+ 2021-05-05 22:36:42,738 : INFO : adding document #0 to Dictionary(198815 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,747 : INFO : built Dictionary(198895 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1253 documents (total 12530000 corpus positions)
+ 2021-05-05 22:36:42,790 : INFO : adding document #0 to Dictionary(198895 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,797 : INFO : built Dictionary(198955 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1254 documents (total 12540000 corpus positions)
+ 2021-05-05 22:36:42,842 : INFO : adding document #0 to Dictionary(198955 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,851 : INFO : built Dictionary(199074 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1255 documents (total 12550000 corpus positions)
+ 2021-05-05 22:36:42,895 : INFO : adding document #0 to Dictionary(199074 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,902 : INFO : built Dictionary(199106 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1256 documents (total 12560000 corpus positions)
+ 2021-05-05 22:36:42,945 : INFO : adding document #0 to Dictionary(199106 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:42,952 : INFO : built Dictionary(199200 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1257 documents (total 12570000 corpus positions)
+ 2021-05-05 22:36:42,996 : INFO : adding document #0 to Dictionary(199200 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,005 : INFO : built Dictionary(199294 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1258 documents (total 12580000 corpus positions)
+ 2021-05-05 22:36:43,050 : INFO : adding document #0 to Dictionary(199294 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,058 : INFO : built Dictionary(199340 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1259 documents (total 12590000 corpus positions)
+ 2021-05-05 22:36:43,102 : INFO : adding document #0 to Dictionary(199340 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,111 : INFO : built Dictionary(199424 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1260 documents (total 12600000 corpus positions)
+ 2021-05-05 22:36:43,155 : INFO : adding document #0 to Dictionary(199424 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,162 : INFO : built Dictionary(199487 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1261 documents (total 12610000 corpus positions)
+ 2021-05-05 22:36:43,207 : INFO : adding document #0 to Dictionary(199487 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,214 : INFO : built Dictionary(199534 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1262 documents (total 12620000 corpus positions)
+ 2021-05-05 22:36:43,259 : INFO : adding document #0 to Dictionary(199534 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,267 : INFO : built Dictionary(199615 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1263 documents (total 12630000 corpus positions)
+ 2021-05-05 22:36:43,311 : INFO : adding document #0 to Dictionary(199615 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,318 : INFO : built Dictionary(199663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1264 documents (total 12640000 corpus positions)
+ 2021-05-05 22:36:43,362 : INFO : adding document #0 to Dictionary(199663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,370 : INFO : built Dictionary(199729 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1265 documents (total 12650000 corpus positions)
+ 2021-05-05 22:36:43,414 : INFO : adding document #0 to Dictionary(199729 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,422 : INFO : built Dictionary(199810 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1266 documents (total 12660000 corpus positions)
+ 2021-05-05 22:36:43,465 : INFO : adding document #0 to Dictionary(199810 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,473 : INFO : built Dictionary(199855 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1267 documents (total 12670000 corpus positions)
+ 2021-05-05 22:36:43,516 : INFO : adding document #0 to Dictionary(199855 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,523 : INFO : built Dictionary(199926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1268 documents (total 12680000 corpus positions)
+ 2021-05-05 22:36:43,567 : INFO : adding document #0 to Dictionary(199926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,578 : INFO : built Dictionary(199984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1269 documents (total 12690000 corpus positions)
+ 2021-05-05 22:36:43,621 : INFO : adding document #0 to Dictionary(199984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,630 : INFO : built Dictionary(200060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1270 documents (total 12700000 corpus positions)
+ 2021-05-05 22:36:43,674 : INFO : adding document #0 to Dictionary(200060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,683 : INFO : built Dictionary(200177 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1271 documents (total 12710000 corpus positions)
+ 2021-05-05 22:36:43,726 : INFO : adding document #0 to Dictionary(200177 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,735 : INFO : built Dictionary(200262 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1272 documents (total 12720000 corpus positions)
+ 2021-05-05 22:36:43,779 : INFO : adding document #0 to Dictionary(200262 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,788 : INFO : built Dictionary(200366 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1273 documents (total 12730000 corpus positions)
+ 2021-05-05 22:36:43,837 : INFO : adding document #0 to Dictionary(200366 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,845 : INFO : built Dictionary(200500 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1274 documents (total 12740000 corpus positions)
+ 2021-05-05 22:36:43,889 : INFO : adding document #0 to Dictionary(200500 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,897 : INFO : built Dictionary(200622 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1275 documents (total 12750000 corpus positions)
+ 2021-05-05 22:36:43,941 : INFO : adding document #0 to Dictionary(200622 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:43,950 : INFO : built Dictionary(200763 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1276 documents (total 12760000 corpus positions)
+ 2021-05-05 22:36:43,995 : INFO : adding document #0 to Dictionary(200763 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,005 : INFO : built Dictionary(200881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1277 documents (total 12770000 corpus positions)
+ 2021-05-05 22:36:44,057 : INFO : adding document #0 to Dictionary(200881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,065 : INFO : built Dictionary(200976 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1278 documents (total 12780000 corpus positions)
+ 2021-05-05 22:36:44,112 : INFO : adding document #0 to Dictionary(200976 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,122 : INFO : built Dictionary(201104 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1279 documents (total 12790000 corpus positions)
+ 2021-05-05 22:36:44,173 : INFO : adding document #0 to Dictionary(201104 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,181 : INFO : built Dictionary(201192 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1280 documents (total 12800000 corpus positions)
+ 2021-05-05 22:36:44,226 : INFO : adding document #0 to Dictionary(201192 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,237 : INFO : built Dictionary(201319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1281 documents (total 12810000 corpus positions)
+ 2021-05-05 22:36:44,286 : INFO : adding document #0 to Dictionary(201319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,294 : INFO : built Dictionary(201438 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1282 documents (total 12820000 corpus positions)
+ 2021-05-05 22:36:44,340 : INFO : adding document #0 to Dictionary(201438 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,349 : INFO : built Dictionary(201515 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1283 documents (total 12830000 corpus positions)
+ 2021-05-05 22:36:44,397 : INFO : adding document #0 to Dictionary(201515 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,408 : INFO : built Dictionary(201603 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1284 documents (total 12840000 corpus positions)
+ 2021-05-05 22:36:44,457 : INFO : adding document #0 to Dictionary(201603 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,465 : INFO : built Dictionary(201703 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1285 documents (total 12850000 corpus positions)
+ 2021-05-05 22:36:44,509 : INFO : adding document #0 to Dictionary(201703 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,518 : INFO : built Dictionary(201805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1286 documents (total 12860000 corpus positions)
+ 2021-05-05 22:36:44,563 : INFO : adding document #0 to Dictionary(201805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,572 : INFO : built Dictionary(201915 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1287 documents (total 12870000 corpus positions)
+ 2021-05-05 22:36:44,621 : INFO : adding document #0 to Dictionary(201915 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,630 : INFO : built Dictionary(202010 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1288 documents (total 12880000 corpus positions)
+ 2021-05-05 22:36:44,675 : INFO : adding document #0 to Dictionary(202010 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,684 : INFO : built Dictionary(202058 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1289 documents (total 12890000 corpus positions)
+ 2021-05-05 22:36:44,732 : INFO : adding document #0 to Dictionary(202058 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,739 : INFO : built Dictionary(202244 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1290 documents (total 12900000 corpus positions)
+ 2021-05-05 22:36:44,785 : INFO : adding document #0 to Dictionary(202244 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,792 : INFO : built Dictionary(202351 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1291 documents (total 12910000 corpus positions)
+ 2021-05-05 22:36:44,840 : INFO : adding document #0 to Dictionary(202351 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,848 : INFO : built Dictionary(202399 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1292 documents (total 12920000 corpus positions)
+ 2021-05-05 22:36:44,893 : INFO : adding document #0 to Dictionary(202399 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,902 : INFO : built Dictionary(202476 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1293 documents (total 12930000 corpus positions)
+ 2021-05-05 22:36:44,945 : INFO : adding document #0 to Dictionary(202476 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:44,953 : INFO : built Dictionary(202532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1294 documents (total 12940000 corpus positions)
+ 2021-05-05 22:36:44,997 : INFO : adding document #0 to Dictionary(202532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,009 : INFO : built Dictionary(202593 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1295 documents (total 12950000 corpus positions)
+ 2021-05-05 22:36:45,057 : INFO : adding document #0 to Dictionary(202593 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,066 : INFO : built Dictionary(202681 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1296 documents (total 12960000 corpus positions)
+ 2021-05-05 22:36:45,120 : INFO : adding document #0 to Dictionary(202681 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,128 : INFO : built Dictionary(202745 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1297 documents (total 12970000 corpus positions)
+ 2021-05-05 22:36:45,177 : INFO : adding document #0 to Dictionary(202745 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,184 : INFO : built Dictionary(202802 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1298 documents (total 12980000 corpus positions)
+ 2021-05-05 22:36:45,228 : INFO : adding document #0 to Dictionary(202802 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,236 : INFO : built Dictionary(202869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1299 documents (total 12990000 corpus positions)
+ 2021-05-05 22:36:45,279 : INFO : adding document #0 to Dictionary(202869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,287 : INFO : built Dictionary(202980 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1300 documents (total 13000000 corpus positions)
+ 2021-05-05 22:36:45,331 : INFO : adding document #0 to Dictionary(202980 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,339 : INFO : built Dictionary(203065 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1301 documents (total 13010000 corpus positions)
+ 2021-05-05 22:36:45,383 : INFO : adding document #0 to Dictionary(203065 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,393 : INFO : built Dictionary(203317 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1302 documents (total 13020000 corpus positions)
+ 2021-05-05 22:36:45,441 : INFO : adding document #0 to Dictionary(203317 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,448 : INFO : built Dictionary(203351 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1303 documents (total 13030000 corpus positions)
+ 2021-05-05 22:36:45,496 : INFO : adding document #0 to Dictionary(203351 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,504 : INFO : built Dictionary(203470 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1304 documents (total 13040000 corpus positions)
+ 2021-05-05 22:36:45,550 : INFO : adding document #0 to Dictionary(203470 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,558 : INFO : built Dictionary(203508 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1305 documents (total 13050000 corpus positions)
+ 2021-05-05 22:36:45,607 : INFO : adding document #0 to Dictionary(203508 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,619 : INFO : built Dictionary(203629 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1306 documents (total 13060000 corpus positions)
+ 2021-05-05 22:36:45,665 : INFO : adding document #0 to Dictionary(203629 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,676 : INFO : built Dictionary(203742 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1307 documents (total 13070000 corpus positions)
+ 2021-05-05 22:36:45,723 : INFO : adding document #0 to Dictionary(203742 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,733 : INFO : built Dictionary(203908 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1308 documents (total 13080000 corpus positions)
+ 2021-05-05 22:36:45,778 : INFO : adding document #0 to Dictionary(203908 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,787 : INFO : built Dictionary(203971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1309 documents (total 13090000 corpus positions)
+ 2021-05-05 22:36:45,832 : INFO : adding document #0 to Dictionary(203971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,840 : INFO : built Dictionary(204023 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1310 documents (total 13100000 corpus positions)
+ 2021-05-05 22:36:45,885 : INFO : adding document #0 to Dictionary(204023 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,895 : INFO : built Dictionary(204117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1311 documents (total 13110000 corpus positions)
+ 2021-05-05 22:36:45,941 : INFO : adding document #0 to Dictionary(204117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:45,951 : INFO : built Dictionary(204160 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1312 documents (total 13120000 corpus positions)
+ 2021-05-05 22:36:45,997 : INFO : adding document #0 to Dictionary(204160 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,006 : INFO : built Dictionary(204282 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1313 documents (total 13130000 corpus positions)
+ 2021-05-05 22:36:46,053 : INFO : adding document #0 to Dictionary(204282 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,063 : INFO : built Dictionary(204399 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1314 documents (total 13140000 corpus positions)
+ 2021-05-05 22:36:46,113 : INFO : adding document #0 to Dictionary(204399 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,126 : INFO : built Dictionary(204443 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1315 documents (total 13150000 corpus positions)
+ 2021-05-05 22:36:46,178 : INFO : adding document #0 to Dictionary(204443 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,187 : INFO : built Dictionary(204495 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1316 documents (total 13160000 corpus positions)
+ 2021-05-05 22:36:46,235 : INFO : adding document #0 to Dictionary(204495 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,246 : INFO : built Dictionary(204581 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1317 documents (total 13170000 corpus positions)
+ 2021-05-05 22:36:46,291 : INFO : adding document #0 to Dictionary(204581 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,299 : INFO : built Dictionary(204660 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1318 documents (total 13180000 corpus positions)
+ 2021-05-05 22:36:46,347 : INFO : adding document #0 to Dictionary(204660 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,354 : INFO : built Dictionary(204710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1319 documents (total 13190000 corpus positions)
+ 2021-05-05 22:36:46,402 : INFO : adding document #0 to Dictionary(204710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,409 : INFO : built Dictionary(204770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1320 documents (total 13200000 corpus positions)
+ 2021-05-05 22:36:46,454 : INFO : adding document #0 to Dictionary(204770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,462 : INFO : built Dictionary(204881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1321 documents (total 13210000 corpus positions)
+ 2021-05-05 22:36:46,508 : INFO : adding document #0 to Dictionary(204881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,516 : INFO : built Dictionary(204982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1322 documents (total 13220000 corpus positions)
+ 2021-05-05 22:36:46,561 : INFO : adding document #0 to Dictionary(204982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,569 : INFO : built Dictionary(205053 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1323 documents (total 13230000 corpus positions)
+ 2021-05-05 22:36:46,618 : INFO : adding document #0 to Dictionary(205053 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,627 : INFO : built Dictionary(205131 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1324 documents (total 13240000 corpus positions)
+ 2021-05-05 22:36:46,669 : INFO : adding document #0 to Dictionary(205131 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,678 : INFO : built Dictionary(205218 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1325 documents (total 13250000 corpus positions)
+ 2021-05-05 22:36:46,725 : INFO : adding document #0 to Dictionary(205218 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,734 : INFO : built Dictionary(205292 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1326 documents (total 13260000 corpus positions)
+ 2021-05-05 22:36:46,780 : INFO : adding document #0 to Dictionary(205292 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,788 : INFO : built Dictionary(205387 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1327 documents (total 13270000 corpus positions)
+ 2021-05-05 22:36:46,831 : INFO : adding document #0 to Dictionary(205387 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,839 : INFO : built Dictionary(205444 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1328 documents (total 13280000 corpus positions)
+ 2021-05-05 22:36:46,882 : INFO : adding document #0 to Dictionary(205444 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,890 : INFO : built Dictionary(205489 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1329 documents (total 13290000 corpus positions)
+ 2021-05-05 22:36:46,937 : INFO : adding document #0 to Dictionary(205489 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,945 : INFO : built Dictionary(205546 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1330 documents (total 13300000 corpus positions)
+ 2021-05-05 22:36:46,990 : INFO : adding document #0 to Dictionary(205546 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:46,999 : INFO : built Dictionary(205648 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1331 documents (total 13310000 corpus positions)
+ 2021-05-05 22:36:47,047 : INFO : adding document #0 to Dictionary(205648 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,055 : INFO : built Dictionary(205723 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1332 documents (total 13320000 corpus positions)
+ 2021-05-05 22:36:47,098 : INFO : adding document #0 to Dictionary(205723 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,106 : INFO : built Dictionary(205796 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1333 documents (total 13330000 corpus positions)
+ 2021-05-05 22:36:47,152 : INFO : adding document #0 to Dictionary(205796 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,160 : INFO : built Dictionary(205842 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1334 documents (total 13340000 corpus positions)
+ 2021-05-05 22:36:47,205 : INFO : adding document #0 to Dictionary(205842 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,214 : INFO : built Dictionary(205894 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1335 documents (total 13350000 corpus positions)
+ 2021-05-05 22:36:47,258 : INFO : adding document #0 to Dictionary(205894 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,266 : INFO : built Dictionary(205959 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1336 documents (total 13360000 corpus positions)
+ 2021-05-05 22:36:47,311 : INFO : adding document #0 to Dictionary(205959 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,319 : INFO : built Dictionary(206012 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1337 documents (total 13370000 corpus positions)
+ 2021-05-05 22:36:47,366 : INFO : adding document #0 to Dictionary(206012 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,376 : INFO : built Dictionary(206094 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1338 documents (total 13380000 corpus positions)
+ 2021-05-05 22:36:47,432 : INFO : adding document #0 to Dictionary(206094 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:47,440 : INFO : built Dictionary(206271 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1339 documents (total 13390000 corpus positions)
+ 2021-05-05 22:36:47,484 : INFO : adding document #0 to Dictionary(206271 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ [... roughly 400 near-identical INFO log lines elided: alternating "adding document #0 to Dictionary(...)" and "built Dictionary(N unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from D documents (total P corpus positions)" entries, covering documents 1340 through 1540, with the vocabulary growing from 206,342 to 225,947 unique tokens and the corpus from 13,400,000 to 15,400,000 positions, logged 2021-05-05 22:36:47 to 22:36:58 ...]
+ 2021-05-05 22:36:58,203 : INFO : built Dictionary(225988 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1541 documents (total 15410000 corpus positions)
+ 2021-05-05 22:36:58,246 : INFO : adding document #0 to Dictionary(225988 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,252 : INFO : built Dictionary(226056 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1542 documents (total 15420000 corpus positions)
+ 2021-05-05 22:36:58,296 : INFO : adding document #0 to Dictionary(226056 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,304 : INFO : built Dictionary(226197 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1543 documents (total 15430000 corpus positions)
+ 2021-05-05 22:36:58,348 : INFO : adding document #0 to Dictionary(226197 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,355 : INFO : built Dictionary(226293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1544 documents (total 15440000 corpus positions)
+ 2021-05-05 22:36:58,400 : INFO : adding document #0 to Dictionary(226293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,407 : INFO : built Dictionary(226384 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1545 documents (total 15450000 corpus positions)
+ 2021-05-05 22:36:58,450 : INFO : adding document #0 to Dictionary(226384 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,459 : INFO : built Dictionary(226442 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1546 documents (total 15460000 corpus positions)
+ 2021-05-05 22:36:58,505 : INFO : adding document #0 to Dictionary(226442 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,512 : INFO : built Dictionary(226548 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1547 documents (total 15470000 corpus positions)
+ 2021-05-05 22:36:58,555 : INFO : adding document #0 to Dictionary(226548 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,562 : INFO : built Dictionary(226618 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1548 documents (total 15480000 corpus positions)
+ 2021-05-05 22:36:58,606 : INFO : adding document #0 to Dictionary(226618 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,613 : INFO : built Dictionary(226729 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1549 documents (total 15490000 corpus positions)
+ 2021-05-05 22:36:58,656 : INFO : adding document #0 to Dictionary(226729 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,663 : INFO : built Dictionary(226777 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1550 documents (total 15500000 corpus positions)
+ 2021-05-05 22:36:58,712 : INFO : adding document #0 to Dictionary(226777 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,719 : INFO : built Dictionary(226844 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1551 documents (total 15510000 corpus positions)
+ 2021-05-05 22:36:58,762 : INFO : adding document #0 to Dictionary(226844 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,769 : INFO : built Dictionary(226907 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1552 documents (total 15520000 corpus positions)
+ 2021-05-05 22:36:58,813 : INFO : adding document #0 to Dictionary(226907 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,820 : INFO : built Dictionary(226979 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1553 documents (total 15530000 corpus positions)
+ 2021-05-05 22:36:58,864 : INFO : adding document #0 to Dictionary(226979 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,872 : INFO : built Dictionary(227117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1554 documents (total 15540000 corpus positions)
+ 2021-05-05 22:36:58,920 : INFO : adding document #0 to Dictionary(227117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,926 : INFO : built Dictionary(227203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1555 documents (total 15550000 corpus positions)
+ 2021-05-05 22:36:58,970 : INFO : adding document #0 to Dictionary(227203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:58,978 : INFO : built Dictionary(227302 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1556 documents (total 15560000 corpus positions)
+ 2021-05-05 22:36:59,025 : INFO : adding document #0 to Dictionary(227302 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,034 : INFO : built Dictionary(227343 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1557 documents (total 15570000 corpus positions)
+ 2021-05-05 22:36:59,078 : INFO : adding document #0 to Dictionary(227343 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,087 : INFO : built Dictionary(227388 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1558 documents (total 15580000 corpus positions)
+ 2021-05-05 22:36:59,131 : INFO : adding document #0 to Dictionary(227388 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,139 : INFO : built Dictionary(227443 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1559 documents (total 15590000 corpus positions)
+ 2021-05-05 22:36:59,186 : INFO : adding document #0 to Dictionary(227443 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,193 : INFO : built Dictionary(227511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1560 documents (total 15600000 corpus positions)
+ 2021-05-05 22:36:59,241 : INFO : adding document #0 to Dictionary(227511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,250 : INFO : built Dictionary(227601 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1561 documents (total 15610000 corpus positions)
+ 2021-05-05 22:36:59,295 : INFO : adding document #0 to Dictionary(227601 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,303 : INFO : built Dictionary(227694 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1562 documents (total 15620000 corpus positions)
+ 2021-05-05 22:36:59,348 : INFO : adding document #0 to Dictionary(227694 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,356 : INFO : built Dictionary(227766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1563 documents (total 15630000 corpus positions)
+ 2021-05-05 22:36:59,399 : INFO : adding document #0 to Dictionary(227766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,408 : INFO : built Dictionary(227861 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1564 documents (total 15640000 corpus positions)
+ 2021-05-05 22:36:59,452 : INFO : adding document #0 to Dictionary(227861 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,460 : INFO : built Dictionary(227972 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1565 documents (total 15650000 corpus positions)
+ 2021-05-05 22:36:59,503 : INFO : adding document #0 to Dictionary(227972 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,511 : INFO : built Dictionary(228037 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1566 documents (total 15660000 corpus positions)
+ 2021-05-05 22:36:59,555 : INFO : adding document #0 to Dictionary(228037 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,563 : INFO : built Dictionary(228148 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1567 documents (total 15670000 corpus positions)
+ 2021-05-05 22:36:59,606 : INFO : adding document #0 to Dictionary(228148 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,616 : INFO : built Dictionary(228314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1568 documents (total 15680000 corpus positions)
+ 2021-05-05 22:36:59,658 : INFO : adding document #0 to Dictionary(228314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,665 : INFO : built Dictionary(228358 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1569 documents (total 15690000 corpus positions)
+ 2021-05-05 22:36:59,709 : INFO : adding document #0 to Dictionary(228358 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,717 : INFO : built Dictionary(228433 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1570 documents (total 15700000 corpus positions)
+ 2021-05-05 22:36:59,764 : INFO : adding document #0 to Dictionary(228433 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,772 : INFO : built Dictionary(228492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1571 documents (total 15710000 corpus positions)
+ 2021-05-05 22:36:59,816 : INFO : adding document #0 to Dictionary(228492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,824 : INFO : built Dictionary(228556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1572 documents (total 15720000 corpus positions)
+ 2021-05-05 22:36:59,867 : INFO : adding document #0 to Dictionary(228556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,875 : INFO : built Dictionary(228620 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1573 documents (total 15730000 corpus positions)
+ 2021-05-05 22:36:59,925 : INFO : adding document #0 to Dictionary(228620 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,933 : INFO : built Dictionary(228677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1574 documents (total 15740000 corpus positions)
+ 2021-05-05 22:36:59,981 : INFO : adding document #0 to Dictionary(228677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:36:59,989 : INFO : built Dictionary(228771 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1575 documents (total 15750000 corpus positions)
+ 2021-05-05 22:37:00,035 : INFO : adding document #0 to Dictionary(228771 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,043 : INFO : built Dictionary(228834 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1576 documents (total 15760000 corpus positions)
+ 2021-05-05 22:37:00,087 : INFO : adding document #0 to Dictionary(228834 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,096 : INFO : built Dictionary(228891 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1577 documents (total 15770000 corpus positions)
+ 2021-05-05 22:37:00,140 : INFO : adding document #0 to Dictionary(228891 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,148 : INFO : built Dictionary(228984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1578 documents (total 15780000 corpus positions)
+ 2021-05-05 22:37:00,193 : INFO : adding document #0 to Dictionary(228984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,201 : INFO : built Dictionary(229022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1579 documents (total 15790000 corpus positions)
+ 2021-05-05 22:37:00,249 : INFO : adding document #0 to Dictionary(229022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,257 : INFO : built Dictionary(229076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1580 documents (total 15800000 corpus positions)
+ 2021-05-05 22:37:00,301 : INFO : adding document #0 to Dictionary(229076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,309 : INFO : built Dictionary(229136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1581 documents (total 15810000 corpus positions)
+ 2021-05-05 22:37:00,355 : INFO : adding document #0 to Dictionary(229136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,363 : INFO : built Dictionary(229212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1582 documents (total 15820000 corpus positions)
+ 2021-05-05 22:37:00,405 : INFO : adding document #0 to Dictionary(229212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,413 : INFO : built Dictionary(229281 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1583 documents (total 15830000 corpus positions)
+ 2021-05-05 22:37:00,459 : INFO : adding document #0 to Dictionary(229281 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,469 : INFO : built Dictionary(229356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1584 documents (total 15840000 corpus positions)
+ 2021-05-05 22:37:00,515 : INFO : adding document #0 to Dictionary(229356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,524 : INFO : built Dictionary(229427 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1585 documents (total 15850000 corpus positions)
+ 2021-05-05 22:37:00,568 : INFO : adding document #0 to Dictionary(229427 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,575 : INFO : built Dictionary(229466 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1586 documents (total 15860000 corpus positions)
+ 2021-05-05 22:37:00,619 : INFO : adding document #0 to Dictionary(229466 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,627 : INFO : built Dictionary(229520 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1587 documents (total 15870000 corpus positions)
+ 2021-05-05 22:37:00,672 : INFO : adding document #0 to Dictionary(229520 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,681 : INFO : built Dictionary(229613 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1588 documents (total 15880000 corpus positions)
+ 2021-05-05 22:37:00,725 : INFO : adding document #0 to Dictionary(229613 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,733 : INFO : built Dictionary(229697 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1589 documents (total 15890000 corpus positions)
+ 2021-05-05 22:37:00,776 : INFO : adding document #0 to Dictionary(229697 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,784 : INFO : built Dictionary(229759 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1590 documents (total 15900000 corpus positions)
+ 2021-05-05 22:37:00,827 : INFO : adding document #0 to Dictionary(229759 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,839 : INFO : built Dictionary(229872 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1591 documents (total 15910000 corpus positions)
+ 2021-05-05 22:37:00,884 : INFO : adding document #0 to Dictionary(229872 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,892 : INFO : built Dictionary(229955 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1592 documents (total 15920000 corpus positions)
+ 2021-05-05 22:37:00,936 : INFO : adding document #0 to Dictionary(229955 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,944 : INFO : built Dictionary(230050 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1593 documents (total 15930000 corpus positions)
+ 2021-05-05 22:37:00,989 : INFO : adding document #0 to Dictionary(230050 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:00,996 : INFO : built Dictionary(230114 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1594 documents (total 15940000 corpus positions)
+ 2021-05-05 22:37:01,048 : INFO : adding document #0 to Dictionary(230114 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,058 : INFO : built Dictionary(230202 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1595 documents (total 15950000 corpus positions)
+ 2021-05-05 22:37:01,101 : INFO : adding document #0 to Dictionary(230202 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,109 : INFO : built Dictionary(230258 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1596 documents (total 15960000 corpus positions)
+ 2021-05-05 22:37:01,156 : INFO : adding document #0 to Dictionary(230258 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,164 : INFO : built Dictionary(230350 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1597 documents (total 15970000 corpus positions)
+ 2021-05-05 22:37:01,207 : INFO : adding document #0 to Dictionary(230350 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,216 : INFO : built Dictionary(230430 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1598 documents (total 15980000 corpus positions)
+ 2021-05-05 22:37:01,259 : INFO : adding document #0 to Dictionary(230430 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,267 : INFO : built Dictionary(230552 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1599 documents (total 15990000 corpus positions)
+ 2021-05-05 22:37:01,312 : INFO : adding document #0 to Dictionary(230552 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,320 : INFO : built Dictionary(230612 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1600 documents (total 16000000 corpus positions)
+ 2021-05-05 22:37:01,362 : INFO : adding document #0 to Dictionary(230612 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,370 : INFO : built Dictionary(230649 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1601 documents (total 16010000 corpus positions)
+ 2021-05-05 22:37:01,414 : INFO : adding document #0 to Dictionary(230649 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,422 : INFO : built Dictionary(230711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1602 documents (total 16020000 corpus positions)
+ 2021-05-05 22:37:01,465 : INFO : adding document #0 to Dictionary(230711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,476 : INFO : built Dictionary(230832 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1603 documents (total 16030000 corpus positions)
+ 2021-05-05 22:37:01,519 : INFO : adding document #0 to Dictionary(230832 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,526 : INFO : built Dictionary(230847 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1604 documents (total 16040000 corpus positions)
+ 2021-05-05 22:37:01,570 : INFO : adding document #0 to Dictionary(230847 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,579 : INFO : built Dictionary(230953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1605 documents (total 16050000 corpus positions)
+ 2021-05-05 22:37:01,622 : INFO : adding document #0 to Dictionary(230953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,630 : INFO : built Dictionary(231024 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1606 documents (total 16060000 corpus positions)
+ 2021-05-05 22:37:01,673 : INFO : adding document #0 to Dictionary(231024 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,683 : INFO : built Dictionary(231054 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1607 documents (total 16070000 corpus positions)
+ 2021-05-05 22:37:01,726 : INFO : adding document #0 to Dictionary(231054 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,734 : INFO : built Dictionary(231083 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1608 documents (total 16080000 corpus positions)
+ 2021-05-05 22:37:01,777 : INFO : adding document #0 to Dictionary(231083 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,785 : INFO : built Dictionary(231150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1609 documents (total 16090000 corpus positions)
+ 2021-05-05 22:37:01,830 : INFO : adding document #0 to Dictionary(231150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,839 : INFO : built Dictionary(231292 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1610 documents (total 16100000 corpus positions)
+ 2021-05-05 22:37:01,881 : INFO : adding document #0 to Dictionary(231292 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,890 : INFO : built Dictionary(231396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1611 documents (total 16110000 corpus positions)
+ 2021-05-05 22:37:01,933 : INFO : adding document #0 to Dictionary(231396 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,941 : INFO : built Dictionary(231475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1612 documents (total 16120000 corpus positions)
+ 2021-05-05 22:37:01,985 : INFO : adding document #0 to Dictionary(231475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:01,993 : INFO : built Dictionary(231527 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1613 documents (total 16130000 corpus positions)
+ 2021-05-05 22:37:02,037 : INFO : adding document #0 to Dictionary(231527 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,045 : INFO : built Dictionary(231578 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1614 documents (total 16140000 corpus positions)
+ 2021-05-05 22:37:02,089 : INFO : adding document #0 to Dictionary(231578 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,096 : INFO : built Dictionary(231621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1615 documents (total 16150000 corpus positions)
+ 2021-05-05 22:37:02,141 : INFO : adding document #0 to Dictionary(231621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,149 : INFO : built Dictionary(231677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1616 documents (total 16160000 corpus positions)
+ 2021-05-05 22:37:02,193 : INFO : adding document #0 to Dictionary(231677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,201 : INFO : built Dictionary(231794 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1617 documents (total 16170000 corpus positions)
+ 2021-05-05 22:37:02,245 : INFO : adding document #0 to Dictionary(231794 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,253 : INFO : built Dictionary(231892 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1618 documents (total 16180000 corpus positions)
+ 2021-05-05 22:37:02,296 : INFO : adding document #0 to Dictionary(231892 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,304 : INFO : built Dictionary(231983 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1619 documents (total 16190000 corpus positions)
+ 2021-05-05 22:37:02,351 : INFO : adding document #0 to Dictionary(231983 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,361 : INFO : built Dictionary(232099 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1620 documents (total 16200000 corpus positions)
+ 2021-05-05 22:37:02,404 : INFO : adding document #0 to Dictionary(232099 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,412 : INFO : built Dictionary(232182 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1621 documents (total 16210000 corpus positions)
+ 2021-05-05 22:37:02,456 : INFO : adding document #0 to Dictionary(232182 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,463 : INFO : built Dictionary(232252 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1622 documents (total 16220000 corpus positions)
+ 2021-05-05 22:37:02,507 : INFO : adding document #0 to Dictionary(232252 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,515 : INFO : built Dictionary(232329 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1623 documents (total 16230000 corpus positions)
+ 2021-05-05 22:37:02,558 : INFO : adding document #0 to Dictionary(232329 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,567 : INFO : built Dictionary(232478 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1624 documents (total 16240000 corpus positions)
+ 2021-05-05 22:37:02,615 : INFO : adding document #0 to Dictionary(232478 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,622 : INFO : built Dictionary(232516 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1625 documents (total 16250000 corpus positions)
+ 2021-05-05 22:37:02,665 : INFO : adding document #0 to Dictionary(232516 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,673 : INFO : built Dictionary(232568 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1626 documents (total 16260000 corpus positions)
+ 2021-05-05 22:37:02,717 : INFO : adding document #0 to Dictionary(232568 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,725 : INFO : built Dictionary(232632 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1627 documents (total 16270000 corpus positions)
+ 2021-05-05 22:37:02,769 : INFO : adding document #0 to Dictionary(232632 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,777 : INFO : built Dictionary(232672 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1628 documents (total 16280000 corpus positions)
+ 2021-05-05 22:37:02,821 : INFO : adding document #0 to Dictionary(232672 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,829 : INFO : built Dictionary(232713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1629 documents (total 16290000 corpus positions)
+ 2021-05-05 22:37:02,875 : INFO : adding document #0 to Dictionary(232713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,883 : INFO : built Dictionary(232844 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1630 documents (total 16300000 corpus positions)
+ 2021-05-05 22:37:02,931 : INFO : adding document #0 to Dictionary(232844 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,940 : INFO : built Dictionary(232908 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1631 documents (total 16310000 corpus positions)
+ 2021-05-05 22:37:02,983 : INFO : adding document #0 to Dictionary(232908 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:02,990 : INFO : built Dictionary(232962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1632 documents (total 16320000 corpus positions)
+ 2021-05-05 22:37:03,038 : INFO : adding document #0 to Dictionary(232962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,046 : INFO : built Dictionary(233022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1633 documents (total 16330000 corpus positions)
+ 2021-05-05 22:37:03,090 : INFO : adding document #0 to Dictionary(233022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,097 : INFO : built Dictionary(233076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1634 documents (total 16340000 corpus positions)
+ 2021-05-05 22:37:03,141 : INFO : adding document #0 to Dictionary(233076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,149 : INFO : built Dictionary(233211 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1635 documents (total 16350000 corpus positions)
+ 2021-05-05 22:37:03,193 : INFO : adding document #0 to Dictionary(233211 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,201 : INFO : built Dictionary(233276 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1636 documents (total 16360000 corpus positions)
+ 2021-05-05 22:37:03,246 : INFO : adding document #0 to Dictionary(233276 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,255 : INFO : built Dictionary(233413 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1637 documents (total 16370000 corpus positions)
+ 2021-05-05 22:37:03,301 : INFO : adding document #0 to Dictionary(233413 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,310 : INFO : built Dictionary(233476 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1638 documents (total 16380000 corpus positions)
+ 2021-05-05 22:37:03,354 : INFO : adding document #0 to Dictionary(233476 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,363 : INFO : built Dictionary(233582 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1639 documents (total 16390000 corpus positions)
+ 2021-05-05 22:37:03,410 : INFO : adding document #0 to Dictionary(233582 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,419 : INFO : built Dictionary(233673 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1640 documents (total 16400000 corpus positions)
+ 2021-05-05 22:37:03,463 : INFO : adding document #0 to Dictionary(233673 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,470 : INFO : built Dictionary(233772 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1641 documents (total 16410000 corpus positions)
+ 2021-05-05 22:37:03,515 : INFO : adding document #0 to Dictionary(233772 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,524 : INFO : built Dictionary(233858 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1642 documents (total 16420000 corpus positions)
+ 2021-05-05 22:37:03,573 : INFO : adding document #0 to Dictionary(233858 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,581 : INFO : built Dictionary(233930 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1643 documents (total 16430000 corpus positions)
+ 2021-05-05 22:37:03,625 : INFO : adding document #0 to Dictionary(233930 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,632 : INFO : built Dictionary(233985 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1644 documents (total 16440000 corpus positions)
+ 2021-05-05 22:37:03,677 : INFO : adding document #0 to Dictionary(233985 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,686 : INFO : built Dictionary(234073 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1645 documents (total 16450000 corpus positions)
+ 2021-05-05 22:37:03,735 : INFO : adding document #0 to Dictionary(234073 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,744 : INFO : built Dictionary(234141 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1646 documents (total 16460000 corpus positions)
+ 2021-05-05 22:37:03,791 : INFO : adding document #0 to Dictionary(234141 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,798 : INFO : built Dictionary(234193 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1647 documents (total 16470000 corpus positions)
+ 2021-05-05 22:37:03,848 : INFO : adding document #0 to Dictionary(234193 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,855 : INFO : built Dictionary(234268 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1648 documents (total 16480000 corpus positions)
+ 2021-05-05 22:37:03,899 : INFO : adding document #0 to Dictionary(234268 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,908 : INFO : built Dictionary(234367 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1649 documents (total 16490000 corpus positions)
+ 2021-05-05 22:37:03,952 : INFO : adding document #0 to Dictionary(234367 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:03,959 : INFO : built Dictionary(234435 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1650 documents (total 16500000 corpus positions)
+ 2021-05-05 22:37:04,003 : INFO : adding document #0 to Dictionary(234435 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,012 : INFO : built Dictionary(234530 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1651 documents (total 16510000 corpus positions)
+ 2021-05-05 22:37:04,058 : INFO : adding document #0 to Dictionary(234530 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,066 : INFO : built Dictionary(234598 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1652 documents (total 16520000 corpus positions)
+ 2021-05-05 22:37:04,110 : INFO : adding document #0 to Dictionary(234598 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,119 : INFO : built Dictionary(234672 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1653 documents (total 16530000 corpus positions)
+ 2021-05-05 22:37:04,163 : INFO : adding document #0 to Dictionary(234672 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,172 : INFO : built Dictionary(234770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1654 documents (total 16540000 corpus positions)
+ 2021-05-05 22:37:04,215 : INFO : adding document #0 to Dictionary(234770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,224 : INFO : built Dictionary(234855 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1655 documents (total 16550000 corpus positions)
+ 2021-05-05 22:37:04,266 : INFO : adding document #0 to Dictionary(234855 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,274 : INFO : built Dictionary(234937 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1656 documents (total 16560000 corpus positions)
+ 2021-05-05 22:37:04,317 : INFO : adding document #0 to Dictionary(234937 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,326 : INFO : built Dictionary(235036 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1657 documents (total 16570000 corpus positions)
+ 2021-05-05 22:37:04,369 : INFO : adding document #0 to Dictionary(235036 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,376 : INFO : built Dictionary(235093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1658 documents (total 16580000 corpus positions)
+ 2021-05-05 22:37:04,423 : INFO : adding document #0 to Dictionary(235093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,431 : INFO : built Dictionary(235150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1659 documents (total 16590000 corpus positions)
+ 2021-05-05 22:37:04,492 : INFO : adding document #0 to Dictionary(235150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,501 : INFO : built Dictionary(235224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1660 documents (total 16600000 corpus positions)
+ 2021-05-05 22:37:04,553 : INFO : adding document #0 to Dictionary(235224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,561 : INFO : built Dictionary(235268 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1661 documents (total 16610000 corpus positions)
+ 2021-05-05 22:37:04,607 : INFO : adding document #0 to Dictionary(235268 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,615 : INFO : built Dictionary(235318 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1662 documents (total 16620000 corpus positions)
+ 2021-05-05 22:37:04,663 : INFO : adding document #0 to Dictionary(235318 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,671 : INFO : built Dictionary(235392 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1663 documents (total 16630000 corpus positions)
+ 2021-05-05 22:37:04,718 : INFO : adding document #0 to Dictionary(235392 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,725 : INFO : built Dictionary(235471 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1664 documents (total 16640000 corpus positions)
+ 2021-05-05 22:37:04,770 : INFO : adding document #0 to Dictionary(235471 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,779 : INFO : built Dictionary(235517 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1665 documents (total 16650000 corpus positions)
+ 2021-05-05 22:37:04,828 : INFO : adding document #0 to Dictionary(235517 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,840 : INFO : built Dictionary(235654 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1666 documents (total 16660000 corpus positions)
+ 2021-05-05 22:37:04,914 : INFO : adding document #0 to Dictionary(235654 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:04,928 : INFO : built Dictionary(235778 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1667 documents (total 16670000 corpus positions)
+ 2021-05-05 22:37:04,998 : INFO : adding document #0 to Dictionary(235778 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,008 : INFO : built Dictionary(235926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1668 documents (total 16680000 corpus positions)
+ 2021-05-05 22:37:05,058 : INFO : adding document #0 to Dictionary(235926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,068 : INFO : built Dictionary(236046 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1669 documents (total 16690000 corpus positions)
+ 2021-05-05 22:37:05,128 : INFO : adding document #0 to Dictionary(236046 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,147 : INFO : built Dictionary(236267 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1670 documents (total 16700000 corpus positions)
+ 2021-05-05 22:37:05,218 : INFO : adding document #0 to Dictionary(236267 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,235 : INFO : built Dictionary(236350 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1671 documents (total 16710000 corpus positions)
+ 2021-05-05 22:37:05,349 : INFO : adding document #0 to Dictionary(236350 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,368 : INFO : built Dictionary(236474 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1672 documents (total 16720000 corpus positions)
+ 2021-05-05 22:37:05,436 : INFO : adding document #0 to Dictionary(236474 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,444 : INFO : built Dictionary(236554 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1673 documents (total 16730000 corpus positions)
+ 2021-05-05 22:37:05,494 : INFO : adding document #0 to Dictionary(236554 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,505 : INFO : built Dictionary(236726 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1674 documents (total 16740000 corpus positions)
+ 2021-05-05 22:37:05,554 : INFO : adding document #0 to Dictionary(236726 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,566 : INFO : built Dictionary(236813 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1675 documents (total 16750000 corpus positions)
+ 2021-05-05 22:37:05,618 : INFO : adding document #0 to Dictionary(236813 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,627 : INFO : built Dictionary(236859 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1676 documents (total 16760000 corpus positions)
+ 2021-05-05 22:37:05,677 : INFO : adding document #0 to Dictionary(236859 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,686 : INFO : built Dictionary(236938 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1677 documents (total 16770000 corpus positions)
+ 2021-05-05 22:37:05,731 : INFO : adding document #0 to Dictionary(236938 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,740 : INFO : built Dictionary(237039 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1678 documents (total 16780000 corpus positions)
+ 2021-05-05 22:37:05,790 : INFO : adding document #0 to Dictionary(237039 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,799 : INFO : built Dictionary(237124 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1679 documents (total 16790000 corpus positions)
+ 2021-05-05 22:37:05,851 : INFO : adding document #0 to Dictionary(237124 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,860 : INFO : built Dictionary(237163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1680 documents (total 16800000 corpus positions)
+ 2021-05-05 22:37:05,910 : INFO : adding document #0 to Dictionary(237163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,918 : INFO : built Dictionary(237228 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1681 documents (total 16810000 corpus positions)
+ 2021-05-05 22:37:05,966 : INFO : adding document #0 to Dictionary(237228 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:05,976 : INFO : built Dictionary(237319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1682 documents (total 16820000 corpus positions)
+ 2021-05-05 22:37:06,021 : INFO : adding document #0 to Dictionary(237319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,030 : INFO : built Dictionary(237365 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1683 documents (total 16830000 corpus positions)
+ 2021-05-05 22:37:06,073 : INFO : adding document #0 to Dictionary(237365 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,082 : INFO : built Dictionary(237475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1684 documents (total 16840000 corpus positions)
+ 2021-05-05 22:37:06,132 : INFO : adding document #0 to Dictionary(237475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,142 : INFO : built Dictionary(237547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1685 documents (total 16850000 corpus positions)
+ 2021-05-05 22:37:06,190 : INFO : adding document #0 to Dictionary(237547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,199 : INFO : built Dictionary(237604 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1686 documents (total 16860000 corpus positions)
+ 2021-05-05 22:37:06,244 : INFO : adding document #0 to Dictionary(237604 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,252 : INFO : built Dictionary(237692 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1687 documents (total 16870000 corpus positions)
+ 2021-05-05 22:37:06,300 : INFO : adding document #0 to Dictionary(237692 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,310 : INFO : built Dictionary(237745 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1688 documents (total 16880000 corpus positions)
+ 2021-05-05 22:37:06,360 : INFO : adding document #0 to Dictionary(237745 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,373 : INFO : built Dictionary(237824 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1689 documents (total 16890000 corpus positions)
+ 2021-05-05 22:37:06,435 : INFO : adding document #0 to Dictionary(237824 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,444 : INFO : built Dictionary(237889 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1690 documents (total 16900000 corpus positions)
+ 2021-05-05 22:37:06,499 : INFO : adding document #0 to Dictionary(237889 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,508 : INFO : built Dictionary(237942 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1691 documents (total 16910000 corpus positions)
+ 2021-05-05 22:37:06,558 : INFO : adding document #0 to Dictionary(237942 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,567 : INFO : built Dictionary(237985 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1692 documents (total 16920000 corpus positions)
+ 2021-05-05 22:37:06,613 : INFO : adding document #0 to Dictionary(237985 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,623 : INFO : built Dictionary(238081 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1693 documents (total 16930000 corpus positions)
+ 2021-05-05 22:37:06,672 : INFO : adding document #0 to Dictionary(238081 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,682 : INFO : built Dictionary(238181 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1694 documents (total 16940000 corpus positions)
+ 2021-05-05 22:37:06,731 : INFO : adding document #0 to Dictionary(238181 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,742 : INFO : built Dictionary(238244 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1695 documents (total 16950000 corpus positions)
+ 2021-05-05 22:37:06,806 : INFO : adding document #0 to Dictionary(238244 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,817 : INFO : built Dictionary(238315 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1696 documents (total 16960000 corpus positions)
+ 2021-05-05 22:37:06,876 : INFO : adding document #0 to Dictionary(238315 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,888 : INFO : built Dictionary(238375 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1697 documents (total 16970000 corpus positions)
+ 2021-05-05 22:37:06,948 : INFO : adding document #0 to Dictionary(238375 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:06,959 : INFO : built Dictionary(238425 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1698 documents (total 16980000 corpus positions)
+ 2021-05-05 22:37:07,023 : INFO : adding document #0 to Dictionary(238425 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:07,034 : INFO : built Dictionary(238478 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1699 documents (total 16990000 corpus positions)
+ 2021-05-05 22:37:07,088 : INFO : adding document #0 to Dictionary(238478 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:07,098 : INFO : built Dictionary(238524 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1700 documents (total 17000000 corpus positions)
+ 2021-05-05 22:37:07,125 : INFO : adding document #0 to Dictionary(238524 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
+ 2021-05-05 22:37:07,134 : INFO : built Dictionary(238542 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1701 documents (total 17005207 corpus positions)
+ 2021-05-05 22:37:07,538 : INFO : discarding 218466 tokens: [('a', 1701), ('ability', 934), ('able', 1202), ('about', 1687), ('above', 1327), ('abstention', 13), ('accepted', 945), ('according', 1468), ('account', 1113), ('act', 1312)]...
+ 2021-05-05 22:37:07,538 : INFO : keeping 20076 tokens which were in no less than 20 and no more than 850 (=50.0%) documents
+ 2021-05-05 22:37:07,649 : INFO : resulting dictionary: Dictionary(20076 unique tokens: ['abacus', 'abnormal', 'abolished', 'abolition', 'absence']...)
+
+
+
+
+Training
+--------
+
+Training the ensemble works very similarly to training a single model.
+
+You can use any model that is based on LdaModel, such as LdaMulticore, to train the ensemble.
+In experiments, LdaMulticore showed better results.
+
+
+
+.. code-block:: default
+
+
+ from gensim.models import LdaModel
+ topic_model_class = LdaModel
+
+
+
+
+
+
+
+
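+To use ``LdaMulticore`` instead, you only need to swap the class. A minimal sketch of that
+(not executed as part of this tutorial output):
+
+
+.. code-block:: default
+
+
+    # hand this class to EnsembleLda via its topic_model_class parameter
+    from gensim.models import LdaMulticore
+    topic_model_class = LdaMulticore
+
+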
+Any number of models can be used, but it should be a multiple of your workers so that the
+load can be distributed evenly. In this example, 4 worker processes train the 8 models, two models per worker.
+
+
+
+.. code-block:: default
+
+
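+    # with ensemble_workers=4 and num_models=8, each worker process trains 2 models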
+ ensemble_workers = 4
+ num_models = 8
+
+
+
+
+
+
+
+
+After training all the models, some distance computations are required, which can also take quite
+some time. You can speed this step up by using multiple workers as well.
+
+
+
+.. code-block:: default
+
+
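+    # number of processes used for the distance computations between the trained topics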
+ distance_workers = 4
+
+
+
+
+
+
+
+
+All other parameters that are unknown to EnsembleLda are forwarded to each LDA model, for example:
+
+
+
+.. code-block:: default
+
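+    # these are passed through to every underlying LDA model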
+ num_topics = 20
+ passes = 2
+
+
+
+
+
+
+
+
+Now start the training.
+
+Since 20 topics were trained on each of the 8 models, we expect there to be 160 different topics in total.
+The number of stable topics, which are clustered from all of those topics, is smaller.
+
+
+
+.. code-block:: default
+
+
+ from gensim.models import EnsembleLda
+ ensemble = EnsembleLda(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=num_topics,
+ passes=passes,
+ num_models=num_models,
+ topic_model_class=LdaModel,
+ ensemble_workers=ensemble_workers,
+ distance_workers=distance_workers
+ )
+
+ print(len(ensemble.ttda))
+ print(len(ensemble.get_topics()))
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+ 2021-05-05 22:37:17,535 : INFO : generating 8 topic models...
+ 2021-05-05 22:41:11,338 : INFO : generating a 160 x 160 asymmetric distance matrix...
+ 2021-05-05 22:41:13,465 : INFO : fitting the clustering model, using 4 for min_samples
+ 2021-05-05 22:41:13,529 : INFO : generating stable topics, using 3 for min_cores
+ 2021-05-05 22:41:13,530 : INFO : found 3 clusters
+ 2021-05-05 22:41:13,579 : INFO : found 1 stable topics
+ 2021-05-05 22:41:13,584 : INFO : generating classic gensim model representation based on results from the ensemble
+ 2021-05-05 22:41:14,020 : INFO : using symmetric alpha at 1.0
+ 2021-05-05 22:41:14,020 : INFO : using symmetric eta at 1.0
+ 2021-05-05 22:41:14,025 : INFO : using serial LDA version on this node
+ 2021-05-05 22:41:14,028 : INFO : running online (multi-pass) LDA training, 1 topics, 0 passes over the supplied corpus of 1701 documents, updating model once every 1701 documents, evaluating perplexity every 1701 documents, iterating 50x with a convergence threshold of 0.001000
+ 2021-05-05 22:41:14,028 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
+ 2021-05-05 22:41:14,039 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel(num_terms=20076, num_topics=1, decay=0.5, chunksize=2000) in 0.00s', 'datetime': '2021-05-05T22:41:14.029108', 'gensim': '4.1.0.dev0', 'python': '3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]', 'platform': 'Darwin-15.6.0-x86_64-i386-64bit', 'event': 'created'}
+ 160
+ 1
+
+
+
+
+Tuning
+------
+
+Unlike with LdaModel, the number of resulting topics varies greatly depending on the clustering parameters.
+
+You can provide those in the ``recluster()`` function or the ``EnsembleLda`` constructor.
+
+Play around until you get as many topics as you desire, though this may reduce their quality.
+If your ensemble doesn't have enough topics to begin with, make it larger first.
+
+An epsilon that is smaller than the smallest distance doesn't make sense.
+Make sure to choose one that lies within the range of values in ``asymmetric_distance_matrix``.
+
+
+
+.. code-block:: default
+
+
+ import numpy as np
+ shape = ensemble.asymmetric_distance_matrix.shape
+ without_diagonal = ensemble.asymmetric_distance_matrix[~np.eye(shape[0], dtype=bool)].reshape(shape[0], -1)
+ print(without_diagonal.min(), without_diagonal.mean(), without_diagonal.max())
+
+ ensemble.recluster(eps=0.09, min_samples=2, min_cores=2)
+
+ print(len(ensemble.get_topics()))
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+ 0.006134824668422967 0.03832085602841112 0.20069650782257142
+ 2021-05-05 22:41:14,160 : INFO : fitting the clustering model
+ 2021-05-05 22:41:14,220 : INFO : generating stable topics
+ 2021-05-05 22:41:14,221 : INFO : found 3 clusters
+ 2021-05-05 22:41:14,299 : INFO : found 1 stable topics
+ 2021-05-05 22:41:14,304 : INFO : generating classic gensim model representation based on results from the ensemble
+ 2021-05-05 22:41:14,309 : INFO : using symmetric alpha at 1.0
+ 2021-05-05 22:41:14,310 : INFO : using symmetric eta at 1.0
+ 2021-05-05 22:41:14,314 : INFO : using serial LDA version on this node
+ 2021-05-05 22:41:14,317 : INFO : running online (multi-pass) LDA training, 1 topics, 0 passes over the supplied corpus of 1701 documents, updating model once every 1701 documents, evaluating perplexity every 1701 documents, iterating 50x with a convergence threshold of 0.001000
+ 2021-05-05 22:41:14,317 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
+ 2021-05-05 22:41:14,318 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel(num_terms=20076, num_topics=1, decay=0.5, chunksize=2000) in 0.00s', 'datetime': '2021-05-05T22:41:14.318036', 'gensim': '4.1.0.dev0', 'python': '3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]', 'platform': 'Darwin-15.6.0-x86_64-i386-64bit', 'event': 'created'}
+ 1
+
+
+
+
+Increasing the Size
+-------------------
+
+If you have some models lying around that were trained on a corpus with the same dictionary,
+they are compatible and you can add them to the ensemble.
+
+By setting ``num_models`` of the EnsembleLda constructor to 0, you can even create an ensemble that is
+made up entirely of your existing topic models, using the following method.
+
+Afterwards, the number and quality of stable topics may change, depending on the added topics and the clustering parameters.
+
+
+
+.. code-block:: default
+
+
+ from gensim.models import LdaMulticore
+
+ model1 = LdaMulticore(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=9,
+ passes=4,
+ )
+
+ model2 = LdaModel(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=11,
+ passes=2,
+ )
+
+ # add_model supports various types of input, check out its docstring
+ ensemble.add_model(model1)
+ ensemble.add_model(model2)
+
+ ensemble.recluster()
+
+ print(len(ensemble.ttda))
+ print(len(ensemble.get_topics()))
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+ 2021-05-05 22:41:14,506 : INFO : using symmetric alpha at 0.1111111111111111
+ 2021-05-05 22:41:14,507 : INFO : using symmetric eta at 0.1111111111111111
+ 2021-05-05 22:41:14,512 : INFO : using serial LDA version on this node
+ 2021-05-05 22:41:14,538 : INFO : running online LDA training, 9 topics, 4 passes over the supplied corpus of 1701 documents, updating every 14000 documents, evaluating every ~1701 documents, iterating 50x with a convergence threshold of 0.001000
+ 2021-05-05 22:41:14,538 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
+ 2021-05-05 22:41:14,542 : INFO : training LDA model using 7 processes
+ 2021-05-05 22:41:14,574 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #1701/1701, outstanding queue size 1
+ 2021-05-05 22:41:26,440 : INFO : topic #5 (0.111): 0.033*"as" + 0.001*"km" + 0.001*"energy" + 0.001*"emperor" + 0.001*"actor" + 0.001*"economy" + 0.001*"china" + 0.001*"league" + 0.001*"bc" + 0.001*"soviet"
+ 2021-05-05 22:41:26,441 : INFO : topic #1 (0.111): 0.030*"as" + 0.001*"km" + 0.001*"est" + 0.001*"y" + 0.001*"minister" + 0.001*"actor" + 0.001*"female" + 0.001*"spanish" + 0.001*"bc" + 0.001*"league"
+ 2021-05-05 22:41:26,441 : INFO : topic #2 (0.111): 0.032*"as" + 0.001*"china" + 0.001*"israel" + 0.001*"india" + 0.001*"band" + 0.001*"software" + 0.001*"energy" + 0.001*"km" + 0.001*"league" + 0.001*"female"
+ 2021-05-05 22:41:26,442 : INFO : topic #3 (0.111): 0.033*"as" + 0.001*"league" + 0.001*"soviet" + 0.001*"software" + 0.001*"chinese" + 0.001*"km" + 0.001*"japanese" + 0.001*"energy" + 0.001*"spanish" + 0.001*"y"
+ 2021-05-05 22:41:26,443 : INFO : topic #7 (0.111): 0.029*"as" + 0.001*"league" + 0.001*"km" + 0.001*"actor" + 0.001*"soviet" + 0.001*"season" + 0.001*"band" + 0.001*"emperor" + 0.001*"jewish" + 0.001*"la"
+ 2021-05-05 22:41:26,443 : INFO : topic diff=1.026745, rho=1.000000
+ 2021-05-05 22:41:46,618 : INFO : -9.132 per-word bound, 561.0 perplexity estimate based on a held-out corpus of 1701 documents with 4222006 words
+ 2021-05-05 22:41:46,619 : INFO : PROGRESS: pass 1, dispatched chunk #0 = documents up to #1701/1701, outstanding queue size 1
+ 2021-05-05 22:41:58,023 : INFO : topic #0 (0.111): 0.038*"as" + 0.002*"jewish" + 0.001*"jesus" + 0.001*"israel" + 0.001*"australia" + 0.001*"y" + 0.001*"km" + 0.001*"software" + 0.001*"bible" + 0.001*"judaism"
+ 2021-05-05 22:41:58,024 : INFO : topic #2 (0.111): 0.032*"as" + 0.001*"israel" + 0.001*"china" + 0.001*"import" + 0.001*"kong" + 0.001*"hong" + 0.001*"india" + 0.001*"apollo" + 0.001*"band" + 0.001*"aircraft"
+ 2021-05-05 22:41:58,024 : INFO : topic #4 (0.111): 0.030*"as" + 0.001*"software" + 0.001*"philosophy" + 0.001*"india" + 0.001*"y" + 0.001*"indian" + 0.001*"lincoln" + 0.001*"dna" + 0.001*"scientific" + 0.001*"bc"
+ 2021-05-05 22:41:58,025 : INFO : topic #6 (0.111): 0.023*"as" + 0.002*"soviet" + 0.002*"km" + 0.002*"est" + 0.001*"economy" + 0.001*"russian" + 0.001*"chinese" + 0.001*"y" + 0.001*"africa" + 0.001*"minister"
+ 2021-05-05 22:41:58,025 : INFO : topic #7 (0.111): 0.028*"as" + 0.002*"league" + 0.002*"actor" + 0.002*"season" + 0.001*"baseball" + 0.001*"band" + 0.001*"singer" + 0.001*"football" + 0.001*"actress" + 0.001*"album"
+ 2021-05-05 22:41:58,026 : INFO : topic diff=0.163646, rho=0.592297
+ 2021-05-05 22:42:18,212 : INFO : -9.061 per-word bound, 534.0 perplexity estimate based on a held-out corpus of 1701 documents with 4222006 words
+ 2021-05-05 22:42:18,213 : INFO : PROGRESS: pass 2, dispatched chunk #0 = documents up to #1701/1701, outstanding queue size 1
+ 2021-05-05 22:42:29,923 : INFO : topic #6 (0.111): 0.024*"as" + 0.003*"soviet" + 0.002*"km" + 0.002*"est" + 0.002*"economy" + 0.001*"russian" + 0.001*"minister" + 0.001*"elected" + 0.001*"iraq" + 0.001*"election"
+ 2021-05-05 22:42:29,924 : INFO : topic #8 (0.111): 0.028*"as" + 0.003*"km" + 0.003*"est" + 0.002*"economy" + 0.002*"africa" + 0.002*"microsoft" + 0.002*"growth" + 0.001*"female" + 0.001*"soviet" + 0.001*"finland"
+ 2021-05-05 22:42:29,924 : INFO : topic #4 (0.111): 0.032*"as" + 0.001*"y" + 0.001*"philosophy" + 0.001*"software" + 0.001*"dna" + 0.001*"india" + 0.001*"energy" + 0.001*"scientific" + 0.001*"lincoln" + 0.001*"frac"
+ 2021-05-05 22:42:29,925 : INFO : topic #1 (0.111): 0.030*"as" + 0.001*"apple" + 0.001*"finalist" + 0.001*"mac" + 0.001*"km" + 0.001*"australian" + 0.001*"cuba" + 0.001*"os" + 0.001*"address" + 0.001*"software"
+ 2021-05-05 22:42:29,926 : INFO : topic #7 (0.111): 0.027*"as" + 0.003*"league" + 0.003*"actor" + 0.002*"season" + 0.002*"baseball" + 0.002*"football" + 0.002*"singer" + 0.002*"band" + 0.002*"actress" + 0.002*"album"
+ 2021-05-05 22:42:29,926 : INFO : topic diff=0.192558, rho=0.509614
+ 2021-05-05 22:42:50,260 : INFO : -9.000 per-word bound, 512.0 perplexity estimate based on a held-out corpus of 1701 documents with 4222006 words
+ 2021-05-05 22:42:50,261 : INFO : PROGRESS: pass 3, dispatched chunk #0 = documents up to #1701/1701, outstanding queue size 1
+ 2021-05-05 22:43:01,567 : INFO : topic #0 (0.111): 0.039*"as" + 0.004*"jewish" + 0.002*"jesus" + 0.002*"israel" + 0.002*"christ" + 0.002*"bible" + 0.002*"judaism" + 0.002*"hebrew" + 0.001*"orthodox" + 0.001*"holy"
+ 2021-05-05 22:43:01,567 : INFO : topic #4 (0.111): 0.032*"as" + 0.002*"y" + 0.002*"philosophy" + 0.001*"energy" + 0.001*"frac" + 0.001*"dna" + 0.001*"scientific" + 0.001*"cell" + 0.001*"software" + 0.001*"evolution"
+ 2021-05-05 22:43:01,568 : INFO : topic #2 (0.111): 0.031*"as" + 0.002*"apollo" + 0.002*"import" + 0.002*"kong" + 0.002*"hong" + 0.001*"aircraft" + 0.001*"china" + 0.001*"moon" + 0.001*"israel" + 0.001*"chinese"
+ 2021-05-05 22:43:01,569 : INFO : topic #6 (0.111): 0.024*"as" + 0.003*"soviet" + 0.003*"km" + 0.003*"est" + 0.002*"economy" + 0.002*"minister" + 0.002*"russian" + 0.002*"elected" + 0.002*"constitution" + 0.002*"election"
+ 2021-05-05 22:43:01,569 : INFO : topic #5 (0.111): 0.033*"as" + 0.003*"emperor" + 0.002*"energy" + 0.001*"engine" + 0.001*"bc" + 0.001*"japanese" + 0.001*"bass" + 0.001*"ford" + 0.001*"imperial" + 0.001*"instrument"
+ 2021-05-05 22:43:01,570 : INFO : topic diff=0.167185, rho=0.454053
+ 2021-05-05 22:43:21,983 : INFO : -8.965 per-word bound, 499.6 perplexity estimate based on a held-out corpus of 1701 documents with 4222006 words
+ 2021-05-05 22:43:22,030 : INFO : LdaMulticore lifecycle event {'msg': 'trained LdaModel(num_terms=20076, num_topics=9, decay=0.5, chunksize=2000) in 127.49s', 'datetime': '2021-05-05T22:43:22.030248', 'gensim': '4.1.0.dev0', 'python': '3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]', 'platform': 'Darwin-15.6.0-x86_64-i386-64bit', 'event': 'created'}
+ 2021-05-05 22:43:22,031 : INFO : using symmetric alpha at 0.09090909090909091
+ 2021-05-05 22:43:22,031 : INFO : using symmetric eta at 0.09090909090909091
+ 2021-05-05 22:43:22,035 : INFO : using serial LDA version on this node
+ 2021-05-05 22:43:22,063 : INFO : running online (multi-pass) LDA training, 11 topics, 2 passes over the supplied corpus of 1701 documents, updating model once every 1701 documents, evaluating perplexity every 1701 documents, iterating 50x with a convergence threshold of 0.001000
+ 2021-05-05 22:43:22,064 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
+ 2021-05-05 22:43:42,751 : INFO : -10.507 per-word bound, 1455.6 perplexity estimate based on a held-out corpus of 1701 documents with 4222006 words
+ 2021-05-05 22:43:42,751 : INFO : PROGRESS: pass 0, at document #1701/1701
+ 2021-05-05 22:43:52,019 : INFO : topic #1 (0.091): 0.026*"as" + 0.001*"league" + 0.001*"km" + 0.001*"india" + 0.001*"band" + 0.001*"software" + 0.001*"soviet" + 0.001*"israel" + 0.001*"jewish" + 0.001*"japanese"
+ 2021-05-05 22:43:52,020 : INFO : topic #7 (0.091): 0.030*"as" + 0.001*"japanese" + 0.001*"km" + 0.001*"soviet" + 0.001*"philosophy" + 0.001*"actor" + 0.001*"bc" + 0.001*"love" + 0.001*"russian" + 0.001*"energy"
+ 2021-05-05 22:43:52,021 : INFO : topic #3 (0.091): 0.023*"as" + 0.001*"km" + 0.001*"league" + 0.001*"bc" + 0.001*"season" + 0.001*"minister" + 0.001*"moon" + 0.001*"actor" + 0.001*"ball" + 0.001*"lincoln"
+ 2021-05-05 22:43:52,021 : INFO : topic #6 (0.091): 0.026*"as" + 0.001*"band" + 0.001*"energy" + 0.001*"software" + 0.001*"km" + 0.001*"minister" + 0.001*"est" + 0.001*"speed" + 0.001*"prime" + 0.001*"elected"
+ 2021-05-05 22:43:52,022 : INFO : topic #4 (0.091): 0.029*"as" + 0.001*"india" + 0.001*"km" + 0.001*"irish" + 0.001*"band" + 0.001*"africa" + 0.001*"jewish" + 0.001*"soviet" + 0.001*"minister" + 0.001*"italian"
+ 2021-05-05 22:43:52,022 : INFO : topic diff=1.054397, rho=1.000000
+ 2021-05-05 22:44:12,699 : INFO : -9.155 per-word bound, 569.9 perplexity estimate based on a held-out corpus of 1701 documents with 4222006 words
+ 2021-05-05 22:44:12,699 : INFO : PROGRESS: pass 1, at document #1701/1701
+ 2021-05-05 22:44:22,190 : INFO : topic #5 (0.091): 0.042*"as" + 0.001*"energy" + 0.001*"irish" + 0.001*"india" + 0.001*"australia" + 0.001*"soviet" + 0.001*"japanese" + 0.001*"y" + 0.001*"china" + 0.001*"australian"
+ 2021-05-05 22:44:22,190 : INFO : topic #10 (0.091): 0.030*"as" + 0.002*"soviet" + 0.002*"jewish" + 0.002*"km" + 0.001*"russian" + 0.001*"emperor" + 0.001*"israel" + 0.001*"actor" + 0.001*"minister" + 0.001*"league"
+ 2021-05-05 22:44:22,191 : INFO : topic #2 (0.091): 0.016*"as" + 0.003*"km" + 0.002*"est" + 0.002*"lebanon" + 0.002*"minister" + 0.002*"egypt" + 0.002*"israel" + 0.002*"prime" + 0.002*"energy" + 0.002*"marriage"
+ 2021-05-05 22:44:22,191 : INFO : topic #9 (0.091): 0.037*"as" + 0.001*"aircraft" + 0.001*"km" + 0.001*"software" + 0.001*"economy" + 0.001*"bridge" + 0.001*"israel" + 0.001*"minister" + 0.001*"spanish" + 0.001*"park"
+ 2021-05-05 22:44:22,192 : INFO : topic #4 (0.091): 0.030*"as" + 0.002*"india" + 0.001*"irish" + 0.001*"band" + 0.001*"ireland" + 0.001*"indian" + 0.001*"africa" + 0.001*"km" + 0.001*"love" + 0.001*"christmas"
+ 2021-05-05 22:44:22,192 : INFO : topic diff=0.195202, rho=0.577350
+ 2021-05-05 22:44:22,193 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel(num_terms=20076, num_topics=11, decay=0.5, chunksize=2000) in 60.13s', 'datetime': '2021-05-05T22:44:22.193045', 'gensim': '4.1.0.dev0', 'python': '3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]', 'platform': 'Darwin-15.6.0-x86_64-i386-64bit', 'event': 'created'}
+ 2021-05-05 22:44:22,193 : INFO : ensemble contains 9 models and 160 topics now
+ 2021-05-05 22:44:22,203 : INFO : ensemble contains 10 models and 169 topics now
+ 2021-05-05 22:44:22,231 : INFO : asymmetric distance matrix is outdated due to add_model
+ 2021-05-05 22:44:22,231 : INFO : generating a 180 x 180 asymmetric distance matrix...
+ 2021-05-05 22:44:24,696 : INFO : fitting the clustering model, using 5 for min_samples
+ 2021-05-05 22:44:24,780 : INFO : generating stable topics, using 3 for min_cores
+ 2021-05-05 22:44:24,781 : INFO : found 3 clusters
+ 2021-05-05 22:44:24,843 : INFO : found 1 stable topics
+ 2021-05-05 22:44:24,851 : INFO : generating classic gensim model representation based on results from the ensemble
+ 2021-05-05 22:44:24,854 : INFO : using symmetric alpha at 1.0
+ 2021-05-05 22:44:24,854 : INFO : using symmetric eta at 1.0
+ 2021-05-05 22:44:24,859 : INFO : using serial LDA version on this node
+ 2021-05-05 22:44:24,863 : INFO : running online (multi-pass) LDA training, 1 topics, 0 passes over the supplied corpus of 1701 documents, updating model once every 1701 documents, evaluating perplexity every 1701 documents, iterating 50x with a convergence threshold of 0.001000
+ 2021-05-05 22:44:24,863 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
+ 2021-05-05 22:44:24,863 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel(num_terms=20076, num_topics=1, decay=0.5, chunksize=2000) in 0.00s', 'datetime': '2021-05-05T22:44:24.863402', 'gensim': '4.1.0.dev0', 'python': '3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]', 'platform': 'Darwin-15.6.0-x86_64-i386-64bit', 'event': 'created'}
+ 180
+ 1
+
+
+
+
+
+.. rst-class:: sphx-glr-timing
+
+ **Total running time of the script:** ( 8 minutes 55.221 seconds)
+
+**Estimated memory usage:** 507 MB
+
+
+.. _sphx_glr_download_auto_examples_tutorials_run_ensemblelda.py:
+
+
+.. only :: html
+
+ .. container:: sphx-glr-footer
+ :class: sphx-glr-footer-example
+
+
+
+ .. container:: sphx-glr-download sphx-glr-download-python
+
+ :download:`Download Python source code: run_ensemblelda.py `
+
+
+
+ .. container:: sphx-glr-download sphx-glr-download-jupyter
+
+ :download:`Download Jupyter notebook: run_ensemblelda.ipynb `
+
+
+.. only:: html
+
+ .. rst-class:: sphx-glr-signature
+
+ `Gallery generated by Sphinx-Gallery `_
diff --git a/docs/src/auto_examples/tutorials/sg_execution_times.rst b/docs/src/auto_examples/tutorials/sg_execution_times.rst
index bb21c1ef5c..da986968c9 100644
--- a/docs/src/auto_examples/tutorials/sg_execution_times.rst
+++ b/docs/src/auto_examples/tutorials/sg_execution_times.rst
@@ -5,10 +5,10 @@
Computation times
=================
-**02:47.007** total execution time for **auto_examples_tutorials** files:
+**08:55.221** total execution time for **auto_examples_tutorials** files:
+-------------------------------------------------------------------------------------+-----------+----------+
-| :ref:`sphx_glr_auto_examples_tutorials_run_lda.py` (``run_lda.py``) | 02:47.007 | 657.5 MB |
+| :ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py` (``run_ensemblelda.py``) | 08:55.221 | 506.6 MB |
+-------------------------------------------------------------------------------------+-----------+----------+
| :ref:`sphx_glr_auto_examples_tutorials_run_annoy.py` (``run_annoy.py``) | 00:00.000 | 0.0 MB |
+-------------------------------------------------------------------------------------+-----------+----------+
@@ -16,6 +16,8 @@ Computation times
+-------------------------------------------------------------------------------------+-----------+----------+
| :ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py` (``run_fasttext.py``) | 00:00.000 | 0.0 MB |
+-------------------------------------------------------------------------------------+-----------+----------+
+| :ref:`sphx_glr_auto_examples_tutorials_run_lda.py` (``run_lda.py``) | 00:00.000 | 0.0 MB |
++-------------------------------------------------------------------------------------+-----------+----------+
| :ref:`sphx_glr_auto_examples_tutorials_run_scm.py` (``run_scm.py``) | 00:00.000 | 0.0 MB |
+-------------------------------------------------------------------------------------+-----------+----------+
| :ref:`sphx_glr_auto_examples_tutorials_run_wmd.py` (``run_wmd.py``) | 00:00.000 | 0.0 MB |
diff --git a/docs/src/conf.py b/docs/src/conf.py
index 34eb52c9de..5ea73d184d 100644
--- a/docs/src/conf.py
+++ b/docs/src/conf.py
@@ -61,9 +61,9 @@
# built documents.
#
# The short X.Y version.
-version = '4.0.0'
+version = '4.1'
# The full version, including alpha/beta/rc tags.
-release = '4.0.1'
+release = '4.1.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
diff --git a/docs/src/corpora/opinosiscorpus.rst b/docs/src/corpora/opinosiscorpus.rst
new file mode 100644
index 0000000000..3f62454677
--- /dev/null
+++ b/docs/src/corpora/opinosiscorpus.rst
@@ -0,0 +1,9 @@
+:mod:`corpora.opinosiscorpus` -- Topic related review sentences
+===============================================================
+
+.. automodule:: gensim.corpora.opinosiscorpus
+ :synopsis: Topic related review sentences
+ :members:
+ :inherited-members:
+ :undoc-members:
+ :show-inheritance:
diff --git a/docs/src/gallery/core/run_corpora_and_vector_spaces.py b/docs/src/gallery/core/run_corpora_and_vector_spaces.py
index 0a49614123..983a9d1235 100644
--- a/docs/src/gallery/core/run_corpora_and_vector_spaces.py
+++ b/docs/src/gallery/core/run_corpora_and_vector_spaces.py
@@ -138,7 +138,7 @@
class MyCorpus:
def __iter__(self):
- for line in open('https://radimrehurek.com/gensim/mycorpus.txt'):
+ for line in open('https://radimrehurek.com/mycorpus.txt'):
# assume there's one document per line, tokens separated by whitespace
yield dictionary.doc2bow(line.lower().split())
@@ -154,7 +154,7 @@ def __iter__(self):
# in RAM at once. You can even create the documents on the fly!
###############################################################################
-# Download the sample `mycorpus.txt file here <./mycorpus.txt>`_. The assumption that
+# Download the sample `mycorpus.txt file here `_. The assumption that
# each document occupies one line in a single file is not important; you can mold
# the `__iter__` function to fit your input format, whatever it is.
# Walking directories, parsing XML, accessing the network...
@@ -180,7 +180,7 @@ def __iter__(self):
# Similarly, to construct the dictionary without loading all texts into memory:
# collect statistics about all tokens
-dictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/gensim/mycorpus.txt'))
+dictionary = corpora.Dictionary(line.lower().split() for line in open('https://radimrehurek.com/mycorpus.txt'))
# remove stop words and words that appear only once
stop_ids = [
dictionary.token2id[stopword]
diff --git a/docs/src/gallery/howtos/run_doc.py b/docs/src/gallery/howtos/run_doc.py
index 15e870f1be..dbcd6d91e3 100644
--- a/docs/src/gallery/howtos/run_doc.py
+++ b/docs/src/gallery/howtos/run_doc.py
@@ -155,7 +155,7 @@
#
# First, get Sphinx Gallery to build your documentation::
#
-# make -C docs/src html
+# make --directory docs/src html
#
# This can take a while if your documentation uses a large dataset, or if you've changed many other tutorials or guides.
# Once this completes successfully, open ``docs/auto_examples/index.html`` in your browser.
@@ -176,7 +176,7 @@
# Gallery also generates .rst (RST for Sphinx) and .ipynb (Jupyter notebook) files from the script.
# Finally, ``sg_execution_times.rst`` contains the time taken to run each example.
#
-# Finally, make a PR on `github `__.
+# Finally, open a PR at `github `__.
# One of our friendly maintainers will review it, make suggestions, and eventually merge it.
-# Your documentation will then appear in the gallery alongside the rest of the example.
-# At that stage, give yourself a pat on the back: you're done!
+# Your documentation will then appear in the `gallery `__,
+# alongside the rest of the examples. Thanks a lot!
diff --git a/docs/src/gallery/tutorials/run_ensemblelda.py b/docs/src/gallery/tutorials/run_ensemblelda.py
new file mode 100644
index 0000000000..aa87d0ecd3
--- /dev/null
+++ b/docs/src/gallery/tutorials/run_ensemblelda.py
@@ -0,0 +1,158 @@
+r"""
+Ensemble LDA
+============
+
+Introduces Gensim's EnsembleLda model
+
+"""
+
+import logging
+logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
+
+###############################################################################
+# This tutorial will explain how to use the EnsembleLda model class.
+#
+# EnsembleLda is a method for finding and generating stable topics from the results of multiple topic models.
+# It can be used to filter out topics from your results that are noise and not reproducible.
+#
+
+###############################################################################
+# Corpus
+# ------
+# We will use the gensim downloader api to get a small corpus for training our ensemble.
+#
+# The preprocessing is similar to :ref:`sphx_glr_auto_examples_tutorials_run_word2vec.py`,
+# so it won't be explained again in detail.
+#
+
+import gensim.downloader as api
+from gensim.corpora import Dictionary
+from nltk.stem.wordnet import WordNetLemmatizer
+
+lemmatizer = WordNetLemmatizer()
+docs = api.load('text8')
+
+dictionary = Dictionary()
+for doc in docs:
+ dictionary.add_documents([[lemmatizer.lemmatize(token) for token in doc]])
+dictionary.filter_extremes(no_below=20, no_above=0.5)
+
+corpus = [dictionary.doc2bow(doc) for doc in docs]
+
+###############################################################################
+# Training
+# --------
+#
+# Training the ensemble works very similarly to training a single model.
+#
+# You can use any model that is based on LdaModel, such as LdaMulticore, to train the ensemble.
+# In experiments, LdaMulticore showed better results.
+#
+
+from gensim.models import LdaModel
+topic_model_class = LdaModel
+
+###############################################################################
+# Any number of models can be used, but it should be a multiple of your workers so that the
+# load can be distributed evenly. In this example, 4 worker processes share the training of the 8 models,
+# i.e. 2 models per worker.
+#
+
+ensemble_workers = 4
+num_models = 8
+
+###############################################################################
+# After training all the models, some distance computations are required, which can also take quite
+# some time. You can speed these up by using multiple workers for that step as well.
+#
+
+distance_workers = 4
+
+###############################################################################
+# All other parameters that are unknown to EnsembleLda are forwarded to each LDA model, for example:
+#
+num_topics = 20
+passes = 2
+
+###############################################################################
+# Now start the training.
+#
+# Since each of the 8 models is trained with 20 topics, we expect 160 different topics in total.
+# The number of stable topics, which are clustered from all of those topics, is smaller.
+#
+
+from gensim.models import EnsembleLda
+ensemble = EnsembleLda(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=num_topics,
+ passes=passes,
+ num_models=num_models,
+ topic_model_class=LdaModel,
+ ensemble_workers=ensemble_workers,
+ distance_workers=distance_workers
+)
+
+print(len(ensemble.ttda))
+print(len(ensemble.get_topics()))
+
+###############################################################################
+# Tuning
+# ------
+#
+# Unlike with LdaModel, the number of resulting topics varies greatly depending on the clustering parameters.
+#
+# You can provide those in the ``recluster()`` function or the ``EnsembleLda`` constructor.
+#
+# Play around until you get as many topics as you desire, though this may reduce their quality.
+# If your ensemble doesn't have enough topics to begin with, make it larger first.
+#
+# An epsilon that is smaller than the smallest distance doesn't make sense.
+# Make sure to choose one that lies within the range of values in ``asymmetric_distance_matrix``.
+#
+
+import numpy as np
+shape = ensemble.asymmetric_distance_matrix.shape
+without_diagonal = ensemble.asymmetric_distance_matrix[~np.eye(shape[0], dtype=bool)].reshape(shape[0], -1)
+print(without_diagonal.min(), without_diagonal.mean(), without_diagonal.max())
+
+ensemble.recluster(eps=0.09, min_samples=2, min_cores=2)
+
+print(len(ensemble.get_topics()))
+
+###############################################################################
+# Increasing the Size
+# -------------------
+#
+# If you have some models lying around that were trained on a corpus with the same dictionary,
+# they are compatible and you can add them to the ensemble.
+#
+# By setting ``num_models`` of the EnsembleLda constructor to 0, you can even create an ensemble that is
+# made up entirely of your existing topic models, using the following method (an illustrative sketch of the
+# ``num_models=0`` variant follows at the end of this tutorial).
+#
+# Afterwards, the number and quality of stable topics may change, depending on the added topics and the clustering parameters.
+#
+
+from gensim.models import LdaMulticore
+
+model1 = LdaMulticore(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=9,
+ passes=4,
+)
+
+model2 = LdaModel(
+ corpus=corpus,
+ id2word=dictionary,
+ num_topics=11,
+ passes=2,
+)
+
+# add_model supports various types of input, check out its docstring
+ensemble.add_model(model1)
+ensemble.add_model(model2)
+
+ensemble.recluster()
+
+print(len(ensemble.ttda))
+print(len(ensemble.get_topics()))
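+
+###############################################################################
+# Finally, an illustrative sketch of the ``num_models=0`` variant mentioned above:
+# create an ensemble that trains no models of its own and fill it entirely from
+# pre-trained models. This snippet is not executed here; it simply reuses the corpus,
+# dictionary and models from this tutorial, following the description above:
+#
+# .. sourcecode:: pycon
+#
+#     >>> empty_ensemble = EnsembleLda(corpus=corpus, id2word=dictionary, num_models=0)
+#     >>> empty_ensemble.add_model(model1)
+#     >>> empty_ensemble.add_model(model2)
+#     >>> print(len(empty_ensemble.ttda))  # 9 + 11 topics from the two added models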
diff --git a/docs/src/models/ensemblelda.rst b/docs/src/models/ensemblelda.rst
new file mode 100644
index 0000000000..42e63c94be
--- /dev/null
+++ b/docs/src/models/ensemblelda.rst
@@ -0,0 +1,9 @@
+:mod:`models.ensemblelda` -- Ensemble Latent Dirichlet Allocation
+==================================================================
+
+.. automodule:: gensim.models.ensemblelda
+ :synopsis: Ensemble Latent Dirichlet Allocation
+ :members:
+ :inherited-members:
+ :undoc-members:
+ :show-inheritance:
diff --git a/docs/src/people.rst b/docs/src/people.rst
index f75b652926..99dbd33d9e 100644
--- a/docs/src/people.rst
+++ b/docs/src/people.rst
@@ -48,4 +48,12 @@ Silver Sponsors
Bronze Sponsors
---------------
-`You? `_
+.. figure:: _static/images/eaccidents-logo.png
+ :target: https://eaccidents.com/
+ :width: 50%
+ :alt: EAccidents
+
+.. figure:: _static/images/techtarget-logo.png
+ :target: https://www.techtarget.com/
+ :width: 50%
+ :alt: TechTarget
diff --git a/docs/src/sphinx_rtd_theme/notification.html b/docs/src/sphinx_rtd_theme/notification.html
index e0c483e40a..4b0d7922e1 100644
--- a/docs/src/sphinx_rtd_theme/notification.html
+++ b/docs/src/sphinx_rtd_theme/notification.html
@@ -1,5 +1,5 @@
- You're viewing documentation for Gensim 4.0.0. For Gensim 3.8.3, please visit the old
Gensim 3.8.3 documentation.
+ You're viewing documentation for Gensim 4.0.0. For Gensim 3.8.3, please visit the old
Gensim 3.8.3 documentation and
Migration Guide.
diff --git a/docs/src/wiki.rst b/docs/src/wiki.rst
index 40e7c6343f..800e1b9c65 100644
--- a/docs/src/wiki.rst
+++ b/docs/src/wiki.rst
@@ -236,5 +236,5 @@ into LDA topic distributions:
By the way, improvements to the Wiki markup parsing code are welcome :-)
.. [3] Hoffman, Blei, Bach. 2010. Online learning for Latent Dirichlet Allocation
- [`pdf
`_] [`code `_]
+ [`pdf `_] [`code `_]
diff --git a/gensim/__init__.py b/gensim/__init__.py
index 84e9fc6463..899afc7591 100644
--- a/gensim/__init__.py
+++ b/gensim/__init__.py
@@ -4,7 +4,7 @@
"""
-__version__ = '4.0.1'
+__version__ = '4.1.0'
import logging
diff --git a/gensim/corpora/__init__.py b/gensim/corpora/__init__.py
index 0d51a9b903..7b7044e0b9 100644
--- a/gensim/corpora/__init__.py
+++ b/gensim/corpora/__init__.py
@@ -15,3 +15,4 @@
from .textcorpus import TextCorpus, TextDirectoryCorpus # noqa:F401
from .ucicorpus import UciCorpus # noqa:F401
from .malletcorpus import MalletCorpus # noqa:F401
+from .opinosiscorpus import OpinosisCorpus # noqa:F401
diff --git a/gensim/corpora/dictionary.py b/gensim/corpora/dictionary.py
index 3236dd081e..d954061caf 100644
--- a/gensim/corpora/dictionary.py
+++ b/gensim/corpora/dictionary.py
@@ -10,6 +10,7 @@
from collections.abc import Mapping
import logging
import itertools
+from typing import Optional, List, Tuple
from gensim import utils
@@ -25,9 +26,7 @@ class Dictionary(utils.SaveLoad, Mapping):
Attributes
----------
token2id : dict of (str, int)
- token -> tokenId.
- id2token : dict of (int, str)
- Reverse mapping for token2id, initialized in a lazy manner to save memory (not created until needed).
+ token -> token_id. I.e. the reverse mapping to `self[token_id]`.
cfs : dict of (int, int)
Collection frequencies: token_id -> how many instances of this token are contained in the documents.
dfs : dict of (int, int)
@@ -689,6 +688,30 @@ def load_from_text(fname):
result.dfs[wordid] = int(docfreq)
return result
+ def most_common(self, n: Optional[int] = None) -> List[Tuple[str, int]]:
+ """Return a list of the n most common words and their counts from the most common to the least.
+
+ Words with equal counts are ordered in the increasing order of their ids.
+
+ Parameters
+ ----------
+ n : int or None, optional
+ The number of most common words to be returned. If `None`, all words in the dictionary
+ will be returned. Default is `None`.
+
+ Returns
+ -------
+ most_common : list of (str, int)
+ The n most common words and their counts from the most common to the least.
+
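+        Examples
+        --------
+        A small illustrative example; the tie-breaking between equal counts follows the token
+        ids, which depend on the order in which the dictionary was built:
+
+        .. sourcecode:: pycon
+
+            >>> from gensim.corpora import Dictionary
+            >>> dct = Dictionary([["cat", "say", "meow"], ["dog", "say", "woof"]])
+            >>> dct.most_common(2)
+            [('say', 2), ('cat', 1)]
+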
+ """
+ most_common = [
+ (self[word], count)
+ for word, count
+ in sorted(self.cfs.items(), key=lambda x: (-x[1], x[0]))[:n]
+ ]
+ return most_common
+
@staticmethod
def from_corpus(corpus, id2word=None):
"""Create :class:`~gensim.corpora.dictionary.Dictionary` from an existing corpus.
diff --git a/gensim/corpora/lowcorpus.py b/gensim/corpora/lowcorpus.py
index 80dacf8ec0..01b1043a9c 100644
--- a/gensim/corpora/lowcorpus.py
+++ b/gensim/corpora/lowcorpus.py
@@ -11,28 +11,11 @@
from gensim import utils
from gensim.corpora import IndexedCorpus
-
+from gensim.parsing.preprocessing import split_on_space
logger = logging.getLogger(__name__)
-def split_on_space(s):
- """Split line by spaces, used in :class:`gensim.corpora.lowcorpus.LowCorpus`.
-
- Parameters
- ----------
- s : str
- Some line.
-
- Returns
- -------
- list of str
- List of tokens from `s`.
-
- """
- return [word for word in utils.to_unicode(s).strip().split(' ') if word]
-
-
class LowCorpus(IndexedCorpus):
"""Corpus handles input in `GibbsLda++ format `_.
@@ -86,7 +69,7 @@ def __init__(self, fname, id2word=None, line2words=split_on_space):
If not provided, the mapping is constructed directly from `fname`.
line2words : callable, optional
Function which converts lines(str) into tokens(list of str),
- using :func:`~gensim.corpora.lowcorpus.split_on_space` as default.
+ using :func:`~gensim.parsing.preprocessing.split_on_space` as default.
"""
IndexedCorpus.__init__(self, fname)
diff --git a/gensim/corpora/opinosiscorpus.py b/gensim/corpora/opinosiscorpus.py
new file mode 100644
index 0000000000..b4e25731ce
--- /dev/null
+++ b/gensim/corpora/opinosiscorpus.py
@@ -0,0 +1,77 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Author: Tobias B
+# Copyright (C) 2021 Radim Rehurek
+# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
+
+
+"""Creates a corpus and dictionary from the Opinosis dataset.
+
+References
+----------
+.. [1] Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei. Opinosis: a graph-based approach to abstractive
+ summarization of highly redundant opinions [online]. In : Proceedings of the 23rd International Conference on
+ Computational Linguistics. 2010. p. 340-348. Available from: https://kavita-ganesan.com/opinosis/
+"""
+
+import os
+import re
+from gensim.corpora import Dictionary
+from gensim.parsing.porter import PorterStemmer
+from gensim.parsing.preprocessing import STOPWORDS
+
+
+class OpinosisCorpus:
+ """Creates a corpus and dictionary from the Opinosis dataset.
+
+ http://kavita-ganesan.com/opinosis-opinion-dataset/
+
+ This data is organized in folders, each folder containing a few short docs.
+
+ Data can be obtained quickly using the following commands in bash:
+
+ mkdir opinosis && cd opinosis
+ wget https://github.com/kavgan/opinosis/raw/master/OpinosisDataset1.0_0.zip
+ unzip OpinosisDataset1.0_0.zip
+
+    The corpus and dictionary can be accessed via the ``corpus`` and ``id2word`` members.
+
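+    A hypothetical usage sketch, assuming the dataset was extracted into a local ``opinosis`` folder:
+
+    .. sourcecode:: pycon
+
+        >>> from gensim.corpora.opinosiscorpus import OpinosisCorpus
+        >>> opinosis = OpinosisCorpus("opinosis")
+        >>> corpus, id2word = opinosis.corpus, opinosis.id2word
+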
+ """
+
+ def __init__(self, path):
+ """Load the downloaded corpus.
+
+ Parameters
+ ----------
+ path : string
+            Path to the extracted zip file. If 'summaries-gold' is in a folder
+            called 'opinosis', then the `path` parameter would be 'opinosis',
+            either relative to your current working directory or absolute.
+ """
+ # citation
+ path = os.path.join(path, "summaries-gold")
+ dictionary = Dictionary()
+ corpus = []
+ stemmer = PorterStemmer()
+
+        for directory, _, filenames in os.walk(path):
+ # each subdirectory of path is one collection of reviews to a specific product
+ # now get the corpus/documents
+ for filename in filenames:
+ filepath = directory + os.sep + filename
+                # read the raw text of the document
+ with open(filepath) as file:
+ doc = file.read()
+
+ preprocessed_doc = [
+ stemmer.stem(token) for token in re.findall(r'\w+', doc.lower())
+ if token not in STOPWORDS
+ ]
+
+ dictionary.add_documents([preprocessed_doc])
+ corpus += [dictionary.doc2bow(preprocessed_doc)]
+
+ # and return the results the same way the other corpus generating functions do
+ self.corpus = corpus
+ self.id2word = dictionary
diff --git a/gensim/corpora/textcorpus.py b/gensim/corpora/textcorpus.py
index e5616fe9d7..c2b8b620bf 100644
--- a/gensim/corpora/textcorpus.py
+++ b/gensim/corpora/textcorpus.py
@@ -44,93 +44,15 @@
from gensim import interfaces, utils
from gensim.corpora.dictionary import Dictionary
-from gensim.parsing.preprocessing import STOPWORDS, RE_WHITESPACE
+from gensim.parsing.preprocessing import (
+ remove_stopword_tokens, remove_short_tokens,
+ lower_to_unicode, strip_multiple_whitespaces,
+)
from gensim.utils import deaccent, simple_tokenize
logger = logging.getLogger(__name__)
-def remove_stopwords(tokens, stopwords=STOPWORDS):
- """Remove stopwords using list from `gensim.parsing.preprocessing.STOPWORDS`.
-
- Parameters
- ----------
- tokens : iterable of str
- Sequence of tokens.
- stopwords : iterable of str, optional
- Sequence of stopwords
-
- Returns
- -------
- list of str
- List of tokens without `stopwords`.
-
- """
- return [token for token in tokens if token not in stopwords]
-
-
-def remove_short(tokens, minsize=3):
- """Remove tokens shorter than `minsize` chars.
-
- Parameters
- ----------
- tokens : iterable of str
- Sequence of tokens.
- minsize : int, optimal
- Minimal length of token (include).
-
- Returns
- -------
- list of str
- List of tokens without short tokens.
-
- """
- return [token for token in tokens if len(token) >= minsize]
-
-
-def lower_to_unicode(text, encoding='utf8', errors='strict'):
- """Lowercase `text` and convert to unicode, using :func:`gensim.utils.any2unicode`.
-
- Parameters
- ----------
- text : str
- Input text.
- encoding : str, optional
- Encoding that will be used for conversion.
- errors : str, optional
- Error handling behaviour, used as parameter for `unicode` function (python2 only).
-
- Returns
- -------
- str
- Unicode version of `text`.
-
- See Also
- --------
- :func:`gensim.utils.any2unicode`
- Convert any string to unicode-string.
-
- """
- return utils.to_unicode(text.lower(), encoding, errors)
-
-
-def strip_multiple_whitespaces(s):
- """Collapse multiple whitespace characters into a single space.
-
- Parameters
- ----------
- s : str
- Input string
-
- Returns
- -------
- str
- String with collapsed whitespaces.
-
- """
- return RE_WHITESPACE.sub(" ", s)
-
-
class TextCorpus(interfaces.CorpusABC):
"""Helper class to simplify the pipeline of getting BoW vectors from plain text.
@@ -177,12 +99,12 @@ class TextCorpus(interfaces.CorpusABC):
The default preprocessing consists of:
- #. :func:`~gensim.corpora.textcorpus.lower_to_unicode` - lowercase and convert to unicode (assumes utf8 encoding)
+ #. :func:`~gensim.parsing.preprocessing.lower_to_unicode` - lowercase and convert to unicode (assumes utf8 encoding)
#. :func:`~gensim.utils.deaccent`- deaccent (asciifolding)
- #. :func:`~gensim.corpora.textcorpus.strip_multiple_whitespaces` - collapse multiple whitespaces into a single one
+ #. :func:`~gensim.parsing.preprocessing.strip_multiple_whitespaces` - collapse multiple whitespaces into one
#. :func:`~gensim.utils.simple_tokenize` - tokenize by splitting on whitespace
- #. :func:`~gensim.corpora.textcorpus.remove_short` - remove words less than 3 characters long
- #. :func:`~gensim.corpora.textcorpus.remove_stopwords` - remove stopwords
+ #. :func:`~gensim.parsing.preprocessing.remove_short_tokens` - remove words less than 3 characters long
+ #. :func:`~gensim.parsing.preprocessing.remove_stopword_tokens` - remove stopwords
"""
@@ -204,15 +126,15 @@ def __init__(self, input=None, dictionary=None, metadata=False, character_filter
Each will be applied to the text of each document in order, and should return a single string with
the modified text. For Python 2, the original text will not be unicode, so it may be useful to
convert to unicode as the first character filter.
- If None - using :func:`~gensim.corpora.textcorpus.lower_to_unicode`,
- :func:`~gensim.utils.deaccent` and :func:`~gensim.corpora.textcorpus.strip_multiple_whitespaces`.
+ If None - using :func:`~gensim.parsing.preprocessing.lower_to_unicode`,
+ :func:`~gensim.utils.deaccent` and :func:`~gensim.parsing.preprocessing.strip_multiple_whitespaces`.
tokenizer : callable, optional
Tokenizer for document, if None - using :func:`~gensim.utils.simple_tokenize`.
token_filters : iterable of callable, optional
Each will be applied to the iterable of tokens in order, and should return another iterable of tokens.
These filters can add, remove, or replace tokens, or do nothing at all.
- If None - using :func:`~gensim.corpora.textcorpus.remove_short` and
- :func:`~gensim.corpora.textcorpus.remove_stopwords`.
+ If None - using :func:`~gensim.parsing.preprocessing.remove_short_tokens` and
+ :func:`~gensim.parsing.preprocessing.remove_stopword_tokens`.
Examples
--------
@@ -254,7 +176,7 @@ def __init__(self, input=None, dictionary=None, metadata=False, character_filter
self.token_filters = token_filters
if self.token_filters is None:
- self.token_filters = [remove_short, remove_stopwords]
+ self.token_filters = [remove_short_tokens, remove_stopword_tokens]
self.length = None
self.dictionary = None
diff --git a/gensim/models/__init__.py b/gensim/models/__init__.py
index 8aa19a0465..ac08d1fdb4 100644
--- a/gensim/models/__init__.py
+++ b/gensim/models/__init__.py
@@ -21,6 +21,8 @@
from .ldaseqmodel import LdaSeqModel # noqa:F401
from .fasttext import FastText # noqa:F401
from .translation_matrix import TranslationMatrix, BackMappingTranslationMatrix # noqa:F401
+from .ensemblelda import EnsembleLda # noqa:F401
+from .nmf import Nmf # noqa:F401
from gensim import interfaces, utils
diff --git a/gensim/models/atmodel.py b/gensim/models/atmodel.py
index 2fceb00307..838c7634e3 100755
--- a/gensim/models/atmodel.py
+++ b/gensim/models/atmodel.py
@@ -23,6 +23,9 @@
`_. The model correlates the authorship information with the topics to give a better
insight on the subject knowledge of an author.
+.. _'Online Learning for LDA' by Hoffman et al.: online-lda_
+.. _online-lda: https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf
+
Example
-------
.. sourcecode:: pycon
@@ -185,15 +188,30 @@ def __init__(self, corpus=None, num_topics=100, id2word=None, author2doc=None, d
iterations : int, optional
Maximum number of times the model loops over each document.
decay : float, optional
- Controls how old documents are forgotten.
+ A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
+ when each new document is examined. Corresponds to :math:`\\kappa` from
+ `'Online Learning for LDA' by Hoffman et al.`_
offset : float, optional
- Controls down-weighting of iterations.
- alpha : float, optional
- Hyperparameters for author-topic model.Supports special values of 'asymmetric'
- and 'auto': the former uses a fixed normalized asymmetric 1.0/topicno prior,
- the latter learns an asymmetric prior directly from your data.
- eta : float, optional
- Hyperparameters for author-topic model.
+            Hyper-parameter that controls how much we will slow down the first steps during the first few iterations.
+ Corresponds to :math:`\\tau_0` from `'Online Learning for LDA' by Hoffman et al.`_
+ alpha : {float, numpy.ndarray of float, list of float, str}, optional
+ A-priori belief on document-topic distribution, this can be:
+ * scalar for a symmetric prior over document-topic distribution,
+ * 1D array of length equal to num_topics to denote an asymmetric user defined prior for each topic.
+
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
+ * 'asymmetric': Uses a fixed normalized asymmetric prior of `1.0 / (topic_index + sqrt(num_topics))`,
+ * 'auto': Learns an asymmetric prior from the corpus (not available if `distributed==True`).
+ eta : {float, numpy.ndarray of float, list of float, str}, optional
+ A-priori belief on topic-word distribution, this can be:
+ * scalar for a symmetric prior over topic-word distribution,
+ * 1D array of length equal to num_words to denote an asymmetric user defined prior for each word,
+ * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination.
+
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
+ * 'auto': Learns an asymmetric prior from the corpus.
update_every : int, optional
Make updates in topic probability for latest mini-batch.
eval_every : int, optional
@@ -279,23 +297,17 @@ def __init__(self, corpus=None, num_topics=100, id2word=None, author2doc=None, d
self.init_empty_corpus()
self.alpha, self.optimize_alpha = self.init_dir_prior(alpha, 'alpha')
-
assert self.alpha.shape == (self.num_topics,), \
"Invalid alpha shape. Got shape %s, but expected (%d, )" % (str(self.alpha.shape), self.num_topics)
- if isinstance(eta, str):
- if eta == 'asymmetric':
- raise ValueError("The 'asymmetric' option cannot be used for eta")
-
self.eta, self.optimize_eta = self.init_dir_prior(eta, 'eta')
-
- self.random_state = utils.get_random_state(random_state)
-
assert (self.eta.shape == (self.num_terms,) or self.eta.shape == (self.num_topics, self.num_terms)), (
"Invalid eta shape. Got shape %s, but expected (%d, 1) or (%d, %d)" %
(str(self.eta.shape), self.num_terms, self.num_topics, self.num_terms)
)
+ self.random_state = utils.get_random_state(random_state)
+
# VB constants
self.iterations = iterations
self.gamma_threshold = gamma_threshold
@@ -486,9 +498,12 @@ def inference(self, chunk, author2doc, doc2author, rhot, collect_sstats=False, c
# Update gamma.
# phi is computed implicitly below,
+ dot = np.dot(cts / phinorm, expElogbetad.T)
for ai, a in enumerate(authors_d):
- tilde_gamma[ai, :] = self.alpha + len(self.author2doc[self.id2author[a]])\
- * expElogthetad[ai, :] * np.dot(cts / phinorm, expElogbetad.T)
+ tilde_gamma[ai, :] = (
+ self.alpha
+ + len(self.author2doc[self.id2author[a]]) * expElogthetad[ai, :] * dot
+ )
# Update gamma.
# Interpolation between document d's "local" gamma (tilde_gamma),
@@ -612,15 +627,14 @@ def update(self, corpus=None, author2doc=None, doc2author=None, chunksize=None,
Notes
-----
- This update also supports updating an already trained model (self)
- with new documents from `corpus`: the two models are then merged in proportion to the number of old vs. new
- documents. This feature is still experimental for non-stationary input streams.
+ This update also supports updating an already trained model (`self`) with new documents from `corpus`;
+ the two models are then merged in proportion to the number of old vs. new documents.
+ This feature is still experimental for non-stationary input streams.
- For stationary input (no topic drift in new documents), on the other hand, this equals the online update of
- `Hoffman et al. Stochastic Variational Inference
- `_ and is guaranteed to converge for any `decay`
- in (0.5, 1.0>. Additionally, for smaller `corpus` sizes, an increasing `offset` may be beneficial (see
- Table 1 in Hoffman et al.)
+ For stationary input (no topic drift in new documents), on the other hand, this equals the
+ online update of `'Online Learning for LDA' by Hoffman et al.`_
+ and is guaranteed to converge for any `decay` in (0.5, 1]. Additionally, for smaller corpus sizes, an
+ increasing `offset` may be beneficial (see Table 1 in the same paper).
If update is called with authors that already exist in the model, it will resume training on not only new
documents for that author, but also the previously seen documents. This is necessary for those authors' topic
@@ -647,9 +661,12 @@ def update(self, corpus=None, author2doc=None, doc2author=None, chunksize=None,
chunksize : int, optional
Controls the size of the mini-batches.
decay : float, optional
- Controls how old documents are forgotten.
+ A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
+ when each new document is examined. Corresponds to :math:`\\kappa` from
+ `'Online Learning for LDA' by Hoffman et al.`_
offset : float, optional
- Controls down-weighting of iterations.
+            Hyper-parameter that controls how much we will slow down the first steps during the first few iterations.
+ Corresponds to :math:`\\tau_0` from `'Online Learning for LDA' by Hoffman et al.`_
passes : int, optional
Number of times the model makes a pass over the entire training data.
update_every : int, optional
diff --git a/gensim/models/coherencemodel.py b/gensim/models/coherencemodel.py
index 70fea79804..b3c89640a7 100644
--- a/gensim/models/coherencemodel.py
+++ b/gensim/models/coherencemodel.py
@@ -444,11 +444,14 @@ def topics(self, topics):
self._topics = new_topics
def _ensure_elements_are_ids(self, topic):
- try:
- return np.array([self.dictionary.token2id[token] for token in topic])
- except KeyError: # might be a list of token ids already, but let's verify all in dict
- topic = (self.dictionary.id2token[_id] for _id in topic)
- return np.array([self.dictionary.token2id[token] for token in topic])
+ ids_from_tokens = [self.dictionary.token2id[t] for t in topic if t in self.dictionary.token2id]
+ ids_from_ids = [i for i in topic if i in self.dictionary]
+ if len(ids_from_tokens) > len(ids_from_ids):
+ return np.array(ids_from_tokens)
+ elif len(ids_from_ids) > len(ids_from_tokens):
+ return np.array(ids_from_ids)
+ else:
+ raise ValueError('unable to interpret topic as either a list of tokens or a list of ids')
def _update_accumulator(self, new_topics):
if self._relevant_ids_will_differ(new_topics):
diff --git a/gensim/models/doc2vec.py b/gensim/models/doc2vec.py
index 0dcad7165a..c4b28316b7 100644
--- a/gensim/models/doc2vec.py
+++ b/gensim/models/doc2vec.py
@@ -21,7 +21,7 @@
`_.
For a usage example, see the `Doc2vec tutorial
-`_.
+`_.
**Make sure you have a C compiler before installing Gensim, to use the optimized doc2vec routines** (70x speedup
compared to plain NumPy implementation, https://rare-technologies.com/parallelizing-word2vec-in-python/).
@@ -158,7 +158,7 @@ def count(self, new_val):
class Doc2Vec(Word2Vec):
def __init__(self, documents=None, corpus_file=None, vector_size=100, dm_mean=None, dm=1, dbow_words=0, dm_concat=0,
dm_tag_count=1, dv=None, dv_mapfile=None, comment=None, trim_rule=None, callbacks=(),
- window=5, epochs=10, **kwargs):
+ window=5, epochs=10, shrink_windows=True, **kwargs):
"""Class for training, using and evaluating neural networks described in
`Distributed Representations of Sentences and Documents `_.
@@ -248,6 +248,12 @@ def __init__(self, documents=None, corpus_file=None, vector_size=100, dm_mean=No
callbacks : :obj: `list` of :obj: `~gensim.models.callbacks.CallbackAny2Vec`, optional
List of callbacks that need to be executed/run at specific stages during training.
+ shrink_windows : bool, optional
+ New in 4.1. Experimental.
+ If True, the effective window size is uniformly sampled from [1, `window`]
+ for each target word during training, to match the original word2vec algorithm's
+ approximate weighting of context words by distance. Otherwise, the effective
+ window size is always fixed to `window` words to either side.
Some important internal attributes are the following:
@@ -265,6 +271,7 @@ def __init__(self, documents=None, corpus_file=None, vector_size=100, dm_mean=No
.. sourcecode:: pycon
>>> model.dv['doc003']
+
"""
corpus_iterable = documents
@@ -293,6 +300,7 @@ def __init__(self, documents=None, corpus_file=None, vector_size=100, dm_mean=No
callbacks=callbacks,
window=window,
epochs=epochs,
+ shrink_windows=shrink_windows,
**kwargs,
)
@@ -357,8 +365,10 @@ def reset_from(self, other_model):
self.dv.expandos = other_model.dv.expandos
self.init_weights()
- def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,
- total_examples=None, total_words=None, offsets=None, start_doctags=None, **kwargs):
+ def _do_train_epoch(
+ self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,
+ total_examples=None, total_words=None, offsets=None, start_doctags=None, **kwargs
+ ):
work, neu1 = thread_private_mem
doctag_vectors = self.dv.vectors
doctags_lockf = self.dv.vectors_lockf
@@ -425,10 +435,12 @@ def _do_train_job(self, job, alpha, inits):
)
return tally, self._raw_word_count(job)
- def train(self, corpus_iterable=None, corpus_file=None, total_examples=None, total_words=None,
- epochs=None, start_alpha=None, end_alpha=None,
- word_count=0, queue_factor=2, report_delay=1.0, callbacks=(),
- **kwargs):
+ def train(
+ self, corpus_iterable=None, corpus_file=None, total_examples=None, total_words=None,
+ epochs=None, start_alpha=None, end_alpha=None,
+ word_count=0, queue_factor=2, report_delay=1.0, callbacks=(),
+ **kwargs,
+ ):
"""Update the model's neural weights.
To support linear learning-rate decay from (initial) `alpha` to `min_alpha`, and accurate
@@ -576,13 +588,13 @@ def estimated_lookup_memory(self):
"""
return 60 * len(self.dv) + 140 * len(self.dv)
- def infer_vector(self, doc_words, alpha=None, min_alpha=None, epochs=None, steps=None):
+ def infer_vector(self, doc_words, alpha=None, min_alpha=None, epochs=None):
"""Infer a vector for given post-bulk training document.
Notes
-----
Subsequent calls to this function may infer different representations for the same document.
- For a more stable representation, increase the number of steps to assert a stricket convergence.
+ For a more stable representation, increase the number of epochs to assert a stricter convergence.
Parameters
----------
@@ -795,7 +807,7 @@ def load(cls, *args, **kwargs):
return super(Doc2Vec, cls).load(*args, rethrow=True, **kwargs)
except AttributeError as ae:
logger.error(
- "Model load error. Was model saved using code from an older Gensim Version? "
+ "Model load error. Was model saved using code from an older Gensim version? "
"Try loading older model using gensim-3.8.3, then re-saving, to restore "
"compatibility with current code.")
raise ae
@@ -1042,7 +1054,7 @@ def scan_vocab(self, corpus_iterable=None, corpus_file=None, progress_per=10000,
return total_words, corpus_count
- def similarity_unseen_docs(self, doc_words1, doc_words2, alpha=None, min_alpha=None, steps=None):
+ def similarity_unseen_docs(self, doc_words1, doc_words2, alpha=None, min_alpha=None, epochs=None):
"""Compute cosine similarity between two post-bulk out of training documents.
Parameters
@@ -1057,7 +1069,7 @@ def similarity_unseen_docs(self, doc_words1, doc_words2, alpha=None, min_alpha=N
The initial learning rate.
min_alpha : float, optional
Learning rate will linearly drop to `min_alpha` as training progresses.
- steps : int, optional
+ epochs : int, optional
Number of epoch to train the new document.
Returns
@@ -1066,8 +1078,8 @@ def similarity_unseen_docs(self, doc_words1, doc_words2, alpha=None, min_alpha=N
The cosine similarity between `doc_words1` and `doc_words2`.
"""
- d1 = self.infer_vector(doc_words=doc_words1, alpha=alpha, min_alpha=min_alpha, steps=steps)
- d2 = self.infer_vector(doc_words=doc_words2, alpha=alpha, min_alpha=min_alpha, steps=steps)
+ d1 = self.infer_vector(doc_words=doc_words1, alpha=alpha, min_alpha=min_alpha, epochs=epochs)
+ d2 = self.infer_vector(doc_words=doc_words2, alpha=alpha, min_alpha=min_alpha, epochs=epochs)
return np.dot(matutils.unitvec(d1), matutils.unitvec(d2))
diff --git a/gensim/models/doc2vec_corpusfile.pyx b/gensim/models/doc2vec_corpusfile.pyx
index 40bf20bdd3..9216d13bd4 100644
--- a/gensim/models/doc2vec_corpusfile.pyx
+++ b/gensim/models/doc2vec_corpusfile.pyx
@@ -59,7 +59,7 @@ cdef void prepare_c_structures_for_batch(
int *effective_words, unsigned long long *next_random, cvocab_t *vocab,
np.uint32_t *indexes, int *codelens, np.uint8_t **codes, np.uint32_t **points,
np.uint32_t *reduced_windows, int *document_len, int train_words,
- int docvecs_count, int doc_tag,
+ int docvecs_count, int doc_tag, int shrink_windows,
) nogil:
cdef VocabItem predict_word
cdef string token
@@ -87,8 +87,12 @@ cdef void prepare_c_structures_for_batch(
document_len[0] = i
if train_words and reduced_windows != NULL:
- for i in range(document_len[0]):
- reduced_windows[i] = random_int32(next_random) % window
+ if shrink_windows:
+ for i in range(document_len[0]):
+ reduced_windows[i] = random_int32(next_random) % window
+ else:
+ for i in range(document_len[0]):
+ reduced_windows[i] = 0
if doc_tag < docvecs_count:
effective_words[0] += 1
@@ -160,6 +164,7 @@ def d2v_train_epoch_dbow(
cdef long long total_documents = 0
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
+ cdef int shrink_windows = int(model.shrink_windows)
cdef vector[string] doc_words
cdef long long _doc_tag = start_doctag
@@ -183,7 +188,7 @@ def d2v_train_epoch_dbow(
prepare_c_structures_for_batch(
doc_words, c.sample, c.hs, c.window, &total_words, &effective_words,
&c.next_random, vocab.get_vocab_ptr(), c.indexes, c.codelens, c.codes, c.points,
- c.reduced_windows, &document_len, c.train_words, c.docvecs_count, _doc_tag)
+ c.reduced_windows, &document_len, c.train_words, c.docvecs_count, _doc_tag, shrink_windows)
for i in range(document_len):
if c.train_words: # simultaneous skip-gram wordvec-training
@@ -300,6 +305,7 @@ def d2v_train_epoch_dm(
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
cdef REAL_t count, inv_count = 1.0
+ cdef int shrink_windows = int(model.shrink_windows)
cdef vector[string] doc_words
cdef long long _doc_tag = start_doctag
@@ -323,7 +329,7 @@ def d2v_train_epoch_dm(
prepare_c_structures_for_batch(
doc_words, c.sample, c.hs, c.window, &total_words, &effective_words, &c.next_random,
vocab.get_vocab_ptr(), c.indexes, c.codelens, c.codes, c.points, c.reduced_windows,
- &document_len, c.train_words, c.docvecs_count, _doc_tag)
+ &document_len, c.train_words, c.docvecs_count, _doc_tag, shrink_windows)
for i in range(document_len):
j = i - c.window + c.reduced_windows[i]
@@ -453,6 +459,7 @@ def d2v_train_epoch_dm_concat(
cdef long long total_documents = 0
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
+ cdef int shrink_windows = int(model.shrink_windows)
cdef vector[string] doc_words
cdef long long _doc_tag = start_doctag
@@ -490,7 +497,8 @@ def d2v_train_epoch_dm_concat(
prepare_c_structures_for_batch(
doc_words, c.sample, c.hs, c.window, &total_words, &effective_words,
&c.next_random, vocab.get_vocab_ptr(), c.indexes, c.codelens, c.codes,
- c.points, NULL, &document_len, c.train_words, c.docvecs_count, _doc_tag)
+ c.points, NULL, &document_len, c.train_words, c.docvecs_count, _doc_tag,
+ shrink_windows)
for i in range(document_len):
j = i - c.window # negative OK: will pad with null word
diff --git a/gensim/models/doc2vec_inner.pyx b/gensim/models/doc2vec_inner.pyx
index 23ede53c90..1657c59787 100644
--- a/gensim/models/doc2vec_inner.pyx
+++ b/gensim/models/doc2vec_inner.pyx
@@ -365,8 +365,12 @@ def train_document_dbow(model, doc_words, doctag_indexes, alpha, work=None,
if c.train_words:
# single randint() call avoids a big thread-synchronization slowdown
- for i, item in enumerate(model.random.randint(0, c.window, c.document_len)):
- c.reduced_windows[i] = item
+ if model.shrink_windows:
+ for i, item in enumerate(model.random.randint(0, c.window, c.document_len)):
+ c.reduced_windows[i] = item
+ else:
+ for i in range(c.document_len):
+ c.reduced_windows[i] = 0
for i in range(c.doctag_len):
c.doctag_indexes[i] = doctag_indexes[i]
@@ -497,8 +501,12 @@ def train_document_dm(model, doc_words, doctag_indexes, alpha, work=None, neu1=N
c.document_len = i
# single randint() call avoids a big thread-sync slowdown
- for i, item in enumerate(model.random.randint(0, c.window, c.document_len)):
- c.reduced_windows[i] = item
+ if model.shrink_windows:
+ for i, item in enumerate(model.random.randint(0, c.window, c.document_len)):
+ c.reduced_windows[i] = item
+ else:
+ for i in range(c.document_len):
+ c.reduced_windows[i] = 0
for i in range(c.doctag_len):
c.doctag_indexes[i] = doctag_indexes[i]
diff --git a/gensim/models/ensemblelda.py b/gensim/models/ensemblelda.py
new file mode 100644
index 0000000000..39d7e06620
--- /dev/null
+++ b/gensim/models/ensemblelda.py
@@ -0,0 +1,1366 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Authors: Tobias Brigl , Alex Salles ,
+# Alex Loosley , Data Reply Munich
+# Copyright (C) 2021 Radim Rehurek
+# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
+
+
+"""Ensemble Latent Dirichlet Allocation (eLDA), an algorithm for extracting reliable topics.
+
+The aim of topic modelling is to find a set of topics that represent the global structure of a corpus of documents. One
+issue that occurs with topics extracted from an NMF or LDA model is reproducibility: if the topic model is trained
+repeatedly, allowing only the random seed to change, will the same (or a similar) topic representation be reliably
+learned? Unreliable topics are undesirable because they are not a good representation of the corpus.
+
+Ensemble LDA addresses the issue by training an ensemble of topic models and throwing out topics that do not reoccur
+across the ensemble. In this regard, the topics extracted are more reliable and there is the added benefit over many
+topic models that the user does not need to know the exact number of topics ahead of time.
+
+For more information, see the :ref:`citation section <Citation>` below, watch our `Machine Learning Prague 2019 talk
+`_,
+or view our `Machine Learning Summer School poster
+`_.
+
+Usage examples
+--------------
+
+Train an ensemble of LdaModels using a Gensim corpus:
+
+.. sourcecode:: pycon
+
+ >>> from gensim.test.utils import common_texts
+ >>> from gensim.corpora.dictionary import Dictionary
+ >>> from gensim.models import EnsembleLda
+ >>>
+ >>> # Create a corpus from a list of texts
+ >>> common_dictionary = Dictionary(common_texts)
+ >>> common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]
+ >>>
+ >>> # Train the model on the corpus. corpus has to be provided as a
+ >>> # keyword argument, as they are passed through to the children.
+ >>> elda = EnsembleLda(corpus=common_corpus, id2word=common_dictionary, num_topics=10, num_models=4)
+
+Save a model to disk, or reload a pre-trained model:
+
+.. sourcecode:: pycon
+
+ >>> from gensim.test.utils import datapath
+ >>>
+ >>> # Save model to disk.
+ >>> temp_file = datapath("model")
+ >>> elda.save(temp_file)
+ >>>
+ >>> # Load a potentially pretrained model from disk.
+ >>> elda = EnsembleLda.load(temp_file)
+
+Query the model using new, unseen documents:
+
+.. sourcecode:: pycon
+
+ >>> # Create a new corpus, made of previously unseen documents.
+ >>> other_texts = [
+ ... ['computer', 'time', 'graph'],
+ ... ['survey', 'response', 'eps'],
+ ... ['human', 'system', 'computer']
+ ... ]
+ >>> other_corpus = [common_dictionary.doc2bow(text) for text in other_texts]
+ >>>
+ >>> unseen_doc = other_corpus[0]
+ >>> vector = elda[unseen_doc] # get topic probability distribution for a document
+
+Increase the ensemble size by adding a new model. Make sure it uses the same dictionary:
+
+.. sourcecode:: pycon
+
+ >>> from gensim.models import LdaModel
+ >>> elda.add_model(LdaModel(common_corpus, id2word=common_dictionary, num_topics=10))
+ >>> elda.recluster()
+ >>> vector = elda[unseen_doc]
+
+To optimize the ensemble for your specific case, the children can be clustered again using
+different hyperparameters:
+
+.. sourcecode:: pycon
+
+ >>> elda.recluster(eps=0.2)
+
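+The stable topics found by the ensemble can also be inspected directly. A minimal sketch, reusing the ``elda``
+model from above (the number of stable topics depends on the data and the clustering parameters):
+
+.. sourcecode:: pycon
+
+    >>> stable_topics = elda.get_topics()  # 2D array: number of stable topics x number of terms
+    >>> elda.print_topics(num_words=3)  # same calls as on a classic LdaModel
+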
+.. _Citation:
+
+Citation
+--------
+BRIGL, Tobias, 2019, Extracting Reliable Topics using Ensemble Latent Dirichlet Allocation [Bachelor Thesis].
+Technische Hochschule Ingolstadt. Munich: Data Reply GmbH. Available from:
+https://www.sezanzeb.de/machine_learning/ensemble_LDA/
+
+"""
+import logging
+import os
+from multiprocessing import Process, Pipe, ProcessError
+import importlib
+from typing import Set, Optional, List
+
+import numpy as np
+from scipy.spatial.distance import cosine
+from dataclasses import dataclass
+
+from gensim import utils
+from gensim.models import ldamodel, ldamulticore, basemodel
+from gensim.utils import SaveLoad
+
+
+logger = logging.getLogger(__name__)
+
+# _COSINE_DISTANCE_CALCULATION_THRESHOLD is used so that cosine distance calculations can be sped up by skipping
+# distance calculations for highly masked topic-term distributions
+_COSINE_DISTANCE_CALCULATION_THRESHOLD = 0.05
+
+# numpy's max random state of 2**32 - 1 is too large for Windows
+_MAX_RANDOM_STATE = np.iinfo(np.int32).max
+
+
+@dataclass
+class Topic:
+ is_core: bool # if the topic has enough neighbors
+ neighboring_labels: Set[int] # which other clusters are close by
+ neighboring_topic_indices: Set[int] # which other topics are close by
+ label: Optional[int] # to which cluster this topic belongs
+ num_neighboring_labels: int # how many different labels a core has as parents
+ valid_neighboring_labels: Set[int] # A set of labels of close by clusters that are large enough
+
+
+@dataclass
+class Cluster:
+ max_num_neighboring_labels: int # the max number of parent labels among each topic of a given cluster
+ neighboring_labels: List[Set[int]] # a concatenated list of the neighboring_labels sets of each topic
+ label: int # the unique identifier of the cluster
+ num_cores: int # how many topics in the cluster are cores
+
+
+def _is_valid_core(topic):
+ """Check if the topic is a valid core, i.e. no neighboring valid cluster is overlapping with it.
+
+ Parameters
+ ----------
+ topic : :class:`Topic`
+ topic to validate
+
+ """
+ return topic.is_core and (topic.valid_neighboring_labels == {topic.label})
+
+
+def _remove_from_all_sets(label, clusters):
+ """Remove a label from every set in "neighboring_labels" for each core in ``clusters``."""
+ for cluster in clusters:
+ for neighboring_labels_set in cluster.neighboring_labels:
+ if label in neighboring_labels_set:
+ neighboring_labels_set.remove(label)
+
+
+def _contains_isolated_cores(label, cluster, min_cores):
+ """Check if the cluster has at least ``min_cores`` of cores that belong to no other cluster."""
+ return sum([neighboring_labels == {label} for neighboring_labels in cluster.neighboring_labels]) >= min_cores
+
+
+def _aggregate_topics(grouped_by_labels):
+ """Aggregate the labeled topics to a list of clusters.
+
+ Parameters
+ ----------
+ grouped_by_labels : dict of (int, list of :class:`Topic`)
+ The return value of _group_by_labels. A mapping of the label to a list of each topic which belongs to the
+ label.
+
+ Returns
+ -------
+ list of :class:`Cluster`
+ It is sorted by max_num_neighboring_labels in descending order. There is one single element for each cluster.
+
+ """
+ clusters = []
+
+ for label, topics in grouped_by_labels.items():
+ max_num_neighboring_labels = 0
+ neighboring_labels = [] # will be a list of sets
+
+ for topic in topics:
+ max_num_neighboring_labels = max(topic.num_neighboring_labels, max_num_neighboring_labels)
+ neighboring_labels.append(topic.neighboring_labels)
+
+ neighboring_labels = [x for x in neighboring_labels if len(x) > 0]
+
+ clusters.append(Cluster(
+ max_num_neighboring_labels=max_num_neighboring_labels,
+ neighboring_labels=neighboring_labels,
+ label=label,
+ num_cores=len([topic for topic in topics if topic.is_core]),
+ ))
+
+ logger.info("found %s clusters", len(clusters))
+
+ return clusters
+
+
+def _group_by_labels(cbdbscan_topics):
+ """Group all the learned cores by their label, which was assigned in the cluster_model.
+
+ Parameters
+ ----------
+ cbdbscan_topics : list of :class:`Topic`
+ A list of topic data resulting from fitting a :class:`~CBDBSCAN` object.
+        After calling .fit on a CBDBSCAN model, the results can be retrieved by accessing its .results
+        member, which can be used as the argument to this function. It is a list of information gathered
+        during the clustering step, where each element corresponds to a single topic.
+
+ Returns
+ -------
+ dict of (int, list of :class:`Topic`)
+ A mapping of the label to a list of topics that belong to that particular label. Also adds
+ a new member to each topic called num_neighboring_labels, which is the number of
+ neighboring_labels of that topic.
+
+ """
+ grouped_by_labels = {}
+
+ for topic in cbdbscan_topics:
+ if topic.is_core:
+ topic.num_neighboring_labels = len(topic.neighboring_labels)
+
+ label = topic.label
+ if label not in grouped_by_labels:
+ grouped_by_labels[label] = []
+ grouped_by_labels[label].append(topic)
+
+ return grouped_by_labels
+
+
+def _teardown(pipes, processes):
+ """Close pipes and terminate processes.
+
+ Parameters
+ ----------
+ pipes : {list of :class:`multiprocessing.Pipe`}
+ list of pipes that the processes use to communicate with the parent
+ processes : {list of :class:`multiprocessing.Process`}
+ list of worker processes
+ """
+ for parent_conn, child_conn in pipes:
+ child_conn.close()
+ parent_conn.close()
+
+ for process in processes:
+ if process.is_alive():
+ process.terminate()
+ del process
+
+
+def mass_masking(a, threshold=None):
+ """Original masking method. Returns a new binary mask."""
+ if threshold is None:
+ threshold = 0.95
+
+ sorted_a = np.sort(a)[::-1]
+ largest_mass = sorted_a.cumsum() < threshold
+ smallest_valid = sorted_a[largest_mass][-1]
+ return a >= smallest_valid
+
+
+def rank_masking(a, threshold=None):
+ """Faster masking method. Returns a new binary mask."""
+ if threshold is None:
+ threshold = 0.11
+
+ return a > np.sort(a)[::-1][int(len(a) * threshold)]
+
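+# For illustration only (not part of the public API): both maskers take a 1D topic-term distribution
+# and return a boolean mask over its terms, e.g. (values chosen for readability):
+#
+#   >>> import numpy as np
+#   >>> a = np.array([0.6, 0.2, 0.1, 0.06, 0.04])
+#   >>> mass_masking(a)       # keep top terms until 95% of the probability mass is covered
+#   array([ True,  True,  True, False, False])
+#   >>> rank_masking(a, 0.4)  # keep the top 40% of terms by weight
+#   array([ True,  True, False, False, False])
+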
+
+def _validate_clusters(clusters, min_cores):
+ """Check which clusters from the cbdbscan step are significant enough. is_valid is set accordingly."""
+ # Clusters with noisy invalid neighbors may have a harder time being marked as stable, so start with the
+ # easy ones and potentially already remove some noise by also sorting smaller clusters to the front.
+ # This clears up the clusters a bit before checking the ones with many neighbors.
+ def _cluster_sort_key(cluster):
+ return cluster.max_num_neighboring_labels, cluster.num_cores, cluster.label
+
+ sorted_clusters = sorted(clusters, key=_cluster_sort_key, reverse=False)
+
+ for cluster in sorted_clusters:
+ cluster.is_valid = None
+ if cluster.num_cores < min_cores:
+ cluster.is_valid = False
+ _remove_from_all_sets(cluster.label, sorted_clusters)
+
+ # now that invalid clusters are removed, check which clusters contain enough cores that don't belong to any
+ # other cluster.
+ for cluster in [cluster for cluster in sorted_clusters if cluster.is_valid is None]:
+ label = cluster.label
+ if _contains_isolated_cores(label, cluster, min_cores):
+ cluster.is_valid = True
+ else:
+ cluster.is_valid = False
+ _remove_from_all_sets(label, sorted_clusters)
+
+ return [cluster for cluster in sorted_clusters if cluster.is_valid]
+
+
+def _generate_topic_models_multiproc(ensemble, num_models, ensemble_workers):
+ """Generate the topic models to form the ensemble in a multiprocessed way.
+
+ Depending on the used topic model this can result in a speedup.
+
+ Parameters
+ ----------
+ ensemble: EnsembleLda
+ the ensemble
+ num_models : int
+ how many models to train in the ensemble
+ ensemble_workers : int
+        into how many processes to split the models; will be capped at min(ensemble_workers, num_models), to avoid
+        workers that are supposed to train 0 models.
+
+ to get maximum performance, set to the number of your cores, if non-parallelized models are being used in
+ the ensemble (LdaModel).
+
+ For LdaMulticore, the performance gain is small and gets larger for a significantly smaller corpus.
+ In that case, ensemble_workers=2 can be used.
+
+ """
+    # random_states has to be handled so that results do not change when multiprocessing is turned on,
+    # and so that the lda children do not all end up with the same seed. This is solved by generating
+    # a list of state seeds before multiprocessing is started.
+ random_states = [ensemble.random_state.randint(_MAX_RANDOM_STATE) for _ in range(num_models)]
+
+ # each worker has to work on at least one model.
+ # Don't spawn idle workers:
+ workers = min(ensemble_workers, num_models)
+
+ # create worker processes:
+    # this is basically forking with a jump to a target function in each child,
+    # so modifying the ensemble object in a child will not modify the one in the parent (no shared memory)
+ processes = []
+ pipes = []
+ num_models_unhandled = num_models # how many more models need to be trained by workers?
+
+ for i in range(workers):
+ parent_conn, child_conn = Pipe()
+ num_subprocess_models = 0
+        if i == workers - 1:  # i is an index, hence -1
+ # is this the last worker that needs to be created?
+ # then task that worker with all the remaining models
+ num_subprocess_models = num_models_unhandled
+ else:
+ num_subprocess_models = int(num_models_unhandled / (workers - i))
+
+ # get the chunk from the random states that is meant to be for those models
+ random_states_for_worker = random_states[-num_models_unhandled:][:num_subprocess_models]
+
+ args = (ensemble, num_subprocess_models, random_states_for_worker, child_conn)
+ try:
+ process = Process(target=_generate_topic_models_worker, args=args)
+ processes.append(process)
+ pipes.append((parent_conn, child_conn))
+ process.start()
+ num_models_unhandled -= num_subprocess_models
+ except ProcessError:
+ logger.error(f"could not start process {i}")
+ _teardown(pipes, processes)
+ raise
+
+ # aggregate results
+ # will also block until workers are finished
+ for parent_conn, _ in pipes:
+ answer = parent_conn.recv()
+ parent_conn.close()
+ # this does basically the same as the _generate_topic_models function (concatenate all the ttdas):
+ if not ensemble.memory_friendly_ttda:
+ ensemble.tms += answer
+ ttda = np.concatenate([m.get_topics() for m in answer])
+ else:
+ ttda = answer
+ ensemble.ttda = np.concatenate([ensemble.ttda, ttda])
+
+ for process in processes:
+ process.terminate()
+
+
+def _generate_topic_models(ensemble, num_models, random_states=None):
+ """Train the topic models that form the ensemble.
+
+ Parameters
+ ----------
+ ensemble: EnsembleLda
+ the ensemble
+ num_models : int
+ number of models to be generated
+ random_states : list
+        list of numbers or np.random.RandomState objects. Will be autogenerated based on the ensemble's
+ RandomState if None (default).
+ """
+ if random_states is None:
+ random_states = [ensemble.random_state.randint(_MAX_RANDOM_STATE) for _ in range(num_models)]
+
+ assert len(random_states) == num_models
+
+ kwargs = ensemble.gensim_kw_args.copy()
+
+ tm = None # remember one of the topic models from the following
+ # loop, in order to collect some properties from it afterwards.
+
+ for i in range(num_models):
+ kwargs["random_state"] = random_states[i]
+
+ tm = ensemble.get_topic_model_class()(**kwargs)
+
+ # adds the lambda (that is the unnormalized get_topics) to ttda, which is
+ # a list of all those lambdas
+ ensemble.ttda = np.concatenate([ensemble.ttda, tm.get_topics()])
+
+ # only saves the model if it is not "memory friendly"
+ if not ensemble.memory_friendly_ttda:
+ ensemble.tms += [tm]
+
+ # use one of the tms to get some info that will be needed later
+ ensemble.sstats_sum = tm.state.sstats.sum()
+ ensemble.eta = tm.eta
+
+
+def _generate_topic_models_worker(ensemble, num_models, random_states, pipe):
+ """Wrapper for _generate_topic_models to write the results into a pipe.
+
+ This is intended to be used inside a subprocess."""
+ #
+ # Same as _generate_topic_models, but runs in a separate subprocess, and
+    # sends the updated ensemble state to the parent process via a pipe.
+ #
+ logger.info(f"spawned worker to generate {num_models} topic models")
+
+ _generate_topic_models(ensemble=ensemble, num_models=num_models, random_states=random_states)
+
+ # send the ttda that is in the child/workers version of the memory into the pipe
+ # available, after _generate_topic_models has been called in the worker
+ if ensemble.memory_friendly_ttda:
+        # remember that this code runs inside the worker process's memory,
+        # so ensemble.ttda is the ttda of only a chunk of models
+ pipe.send(ensemble.ttda)
+ else:
+ pipe.send(ensemble.tms)
+
+ pipe.close()
+
+
+def _calculate_asymmetric_distance_matrix_chunk(
+ ttda1,
+ ttda2,
+ start_index,
+ masking_method,
+ masking_threshold,
+):
+ """Calculate an (asymmetric) distance from each topic in ``ttda1`` to each topic in ``ttda2``.
+
+ Parameters
+ ----------
+ ttda1 and ttda2: 2D arrays of floats
+ Two ttda matrices that are going to be used for distance calculation. Each row in ttda corresponds to one
+ topic. Each cell in the resulting matrix corresponds to the distance between a topic pair.
+ start_index : int
+ this function might be used in multiprocessing, so start_index has to be set as ttda1 is a chunk of the
+ complete ttda in that case. start_index would be 0 if ``ttda1 == self.ttda``. When self.ttda is split into
+        two pieces, each 100 ttdas long, then the start_index of the second piece should be 100. Default is 0.
+    masking_method : function
+        The function used to build a binary mask from a topic-term distribution,
+        e.g. :meth:`~gensim.models.ensemblelda.mass_masking` or :meth:`~gensim.models.ensemblelda.rank_masking`.
+    masking_threshold : float
+        The threshold that is passed through to ``masking_method``.
+
+ Returns
+ -------
+ 2D numpy.ndarray of floats
+ Asymmetric distance matrix of size ``len(ttda1)`` by ``len(ttda2)``.
+
+ """
+ # initialize the distance matrix. ndarray is faster than zeros
+ distances = np.ndarray((len(ttda1), len(ttda2)))
+
+ if ttda1.shape[0] > 0 and ttda2.shape[0] > 0:
+ # the worker might not have received a ttda because it was chunked up too much
+
+ # some help to find a better threshold by useful log messages
+ avg_mask_size = 0
+
+ # now iterate over each topic
+ for ttd1_idx, ttd1 in enumerate(ttda1):
+ # create mask from ttd1 that removes noise from a and keeps the largest terms
+ mask = masking_method(ttd1, masking_threshold)
+ ttd1_masked = ttd1[mask]
+
+ avg_mask_size += mask.sum()
+
+ # now look at every possible pair for topic a:
+ for ttd2_idx, ttd2 in enumerate(ttda2):
+ # distance to itself is 0
+ if ttd1_idx + start_index == ttd2_idx:
+ distances[ttd1_idx][ttd2_idx] = 0
+ continue
+
+ # now mask b based on a, which will force the shape of a onto b
+ ttd2_masked = ttd2[mask]
+
+ # Smart distance calculation avoids calculating cosine distance for highly masked topic-term
+ # distributions that will have distance values near 1.
+ if ttd2_masked.sum() <= _COSINE_DISTANCE_CALCULATION_THRESHOLD:
+ distance = 1
+ else:
+ distance = cosine(ttd1_masked, ttd2_masked)
+
+ distances[ttd1_idx][ttd2_idx] = distance
+
+ percent = round(100 * avg_mask_size / ttda1.shape[0] / ttda1.shape[1], 1)
+ logger.info(f'the given threshold of {masking_threshold} covered on average {percent}% of tokens')
+
+ return distances
+
+
+def _asymmetric_distance_matrix_worker(
+ worker_id,
+ entire_ttda,
+ ttdas_sent,
+ n_ttdas,
+ masking_method,
+ masking_threshold,
+ pipe,
+):
+ """Worker that computes the distance to all other nodes from a chunk of nodes."""
+ logger.info(f"spawned worker {worker_id} to generate {n_ttdas} rows of the asymmetric distance matrix")
+ # the chunk of ttda that's going to be calculated:
+ ttda1 = entire_ttda[ttdas_sent:ttdas_sent + n_ttdas]
+ distance_chunk = _calculate_asymmetric_distance_matrix_chunk(
+ ttda1=ttda1,
+ ttda2=entire_ttda,
+ start_index=ttdas_sent,
+ masking_method=masking_method,
+ masking_threshold=masking_threshold,
+ )
+    pipe.send((worker_id, distance_chunk))  # remember that this code runs inside the worker's memory
+ pipe.close()
+
+
+def _calculate_assymetric_distance_matrix_multiproc(
+ workers,
+ entire_ttda,
+ masking_method,
+ masking_threshold,
+):
+ processes = []
+ pipes = []
+ ttdas_sent = 0
+
+ for i in range(workers):
+ try:
+ parent_conn, child_conn = Pipe()
+
+ # Load Balancing, for example if there are 9 ttdas and 4 workers, the load will be balanced 2, 2, 2, 3.
+ n_ttdas = 0
+            if i == workers - 1:  # i is an index, hence -1
+ # is this the last worker that needs to be created?
+ # then task that worker with all the remaining models
+ n_ttdas = len(entire_ttda) - ttdas_sent
+ else:
+ n_ttdas = int((len(entire_ttda) - ttdas_sent) / (workers - i))
+
+ args = (i, entire_ttda, ttdas_sent, n_ttdas, masking_method, masking_threshold, child_conn)
+ process = Process(target=_asymmetric_distance_matrix_worker, args=args)
+ ttdas_sent += n_ttdas
+
+ processes.append(process)
+ pipes.append((parent_conn, child_conn))
+ process.start()
+ except ProcessError:
+ logger.error(f"could not start process {i}")
+ _teardown(pipes, processes)
+ raise
+
+ distances = []
+    # note that the following loop maintains the order in which the distance chunks are concatenated,
+    # which is very important: the ordering has to be the same as when using only one process
+ for parent_conn, _ in pipes:
+ worker_id, distance_chunk = parent_conn.recv()
+ parent_conn.close() # child conn will be closed from inside the worker
+        # collect each worker's distance chunk; they are concatenated below in the original order
+ distances.append(distance_chunk)
+
+ for process in processes:
+ process.terminate()
+
+ return np.concatenate(distances)
+
+
+class EnsembleLda(SaveLoad):
+ """Ensemble Latent Dirichlet Allocation (eLDA), a method of training a topic model ensemble.
+
+ Extracts stable topics that are consistently learned across multiple LDA models. eLDA has the added benefit that
+ the user does not need to know the exact number of topics the topic model should extract ahead of time.
+
+ """
+
+ def __init__(
+ self, topic_model_class="ldamulticore", num_models=3,
+ min_cores=None, # default value from _generate_stable_topics()
+ epsilon=0.1, ensemble_workers=1, memory_friendly_ttda=True,
+ min_samples=None, masking_method=mass_masking, masking_threshold=None,
+ distance_workers=1, random_state=None, **gensim_kw_args,
+ ):
+ """Create and train a new EnsembleLda model.
+
+        Will start training immediately, unless iterations, passes or num_models is 0, or the corpus is missing.
+
+ Parameters
+ ----------
+ topic_model_class : str, topic model, optional
+ Examples:
+ * 'ldamulticore' (default, recommended)
+ * 'lda'
+ * ldamodel.LdaModel
+ * ldamulticore.LdaMulticore
+ num_models : int, optional
+ How many LDA models to train in this ensemble.
+ Default: 3
+ min_cores : int, optional
+ Minimum cores a cluster of topics has to contain so that it is recognized as stable topic.
+ epsilon : float, optional
+ Defaults to 0.1. Epsilon for the CBDBSCAN clustering that generates the stable topics.
+ ensemble_workers : int, optional
+ Spawns that many processes and distributes the models from the ensemble to those as evenly as possible.
+ num_models should be a multiple of ensemble_workers.
+
+ Setting it to 0 or 1 will both use the nonmultiprocessing version. Default: 1
+ memory_friendly_ttda : boolean, optional
+ If True, the models in the ensemble are deleted after training and only a concatenation of each model's
+ topic term distribution (called ttda) is kept to save memory.
+
+            Defaults to True. When False, trained models are stored in a list in self.tms, and only models
+            of a gensim model type can be added to this ensemble using the add_model function.
+
+            If True, any topic term matrix can instead be supplied directly to add_model.
+ min_samples : int, optional
+            Required number of nearby topics for a topic to be considered a 'core' in the CBDBSCAN clustering.
+ masking_method : function, optional
+ Choose one of :meth:`~gensim.models.ensemblelda.mass_masking` (default) or
+ :meth:`~gensim.models.ensemblelda.rank_masking` (percentile, faster).
+
+ For clustering, distances between topic-term distributions are asymmetric. In particular, the distance
+            (technically a divergence) from distribution A to B is more a measure of whether A is contained in B. At a
+ high level, this involves using distribution A to mask distribution B and then calculating the cosine
+ distance between the two. The masking can be done in two ways:
+
+ 1. mass: forms mask by taking the top ranked terms until their cumulative mass reaches the
+ 'masking_threshold'
+
+ 2. rank: forms mask by taking the top ranked terms (by mass) until the 'masking_threshold' is reached.
+ For example, a ranking threshold of 0.11 means the top 0.11 terms by weight are used to form a mask.
+ masking_threshold : float, optional
+ Default: None, which uses ``0.95`` for "mass", and ``0.11`` for masking_method "rank". In general, too
+ small a mask threshold leads to inaccurate calculations (no signal) and too big a mask leads to noisy
+ distance calculations. Defaults are often a good sweet spot for this hyperparameter.
+ distance_workers : int, optional
+ When ``distance_workers`` is ``None``, it defaults to ``os.cpu_count()`` for maximum performance. Default is
+ 1, which is not multiprocessed. Set to ``> 1`` to enable multiprocessing.
+ **gensim_kw_args
+ Parameters for each gensim model (e.g. :py:class:`gensim.models.LdaModel`) in the ensemble.
+
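+        A minimal sketch of overriding the masking defaults, reusing ``common_corpus`` and ``common_dictionary``
+        from the module-level usage examples (the threshold value is illustrative):
+
+        .. sourcecode:: pycon
+
+            >>> from gensim.models.ensemblelda import EnsembleLda, rank_masking
+            >>>
+            >>> elda = EnsembleLda(
+            ...     corpus=common_corpus, id2word=common_dictionary, num_topics=10, num_models=4,
+            ...     masking_method=rank_masking, masking_threshold=0.11,
+            ... )
+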
+ """
+
+ if "id2word" not in gensim_kw_args:
+ gensim_kw_args["id2word"] = None
+ if "corpus" not in gensim_kw_args:
+ gensim_kw_args["corpus"] = None
+
+ if gensim_kw_args["id2word"] is None and not gensim_kw_args["corpus"] is None:
+ logger.warning("no word id mapping provided; initializing from corpus, assuming identity")
+ gensim_kw_args["id2word"] = utils.dict_from_corpus(gensim_kw_args["corpus"])
+ if gensim_kw_args["id2word"] is None and gensim_kw_args["corpus"] is None:
+ raise ValueError(
+ "at least one of corpus/id2word must be specified, to establish "
+ "input space dimensionality. Corpus should be provided using the "
+ "`corpus` keyword argument."
+ )
+
+ if type(topic_model_class) == type and issubclass(topic_model_class, ldamodel.LdaModel):
+ self.topic_model_class = topic_model_class
+ else:
+ kinds = {
+ "lda": ldamodel.LdaModel,
+ "ldamulticore": ldamulticore.LdaMulticore
+ }
+ if topic_model_class not in kinds:
+ raise ValueError(
+ "topic_model_class should be one of 'lda', 'ldamulticode' or a model "
+ "inheriting from LdaModel"
+ )
+ self.topic_model_class = kinds[topic_model_class]
+
+ self.num_models = num_models
+ self.gensim_kw_args = gensim_kw_args
+
+ self.memory_friendly_ttda = memory_friendly_ttda
+
+ self.distance_workers = distance_workers
+ self.masking_threshold = masking_threshold
+ self.masking_method = masking_method
+
+ # this will provide the gensim api to the ensemble basically
+ self.classic_model_representation = None
+
+        # the ensemble's state
+ self.random_state = utils.get_random_state(random_state)
+ self.sstats_sum = 0
+ self.eta = None
+ self.tms = []
+ # initialize empty 2D topic term distribution array (ttda) (number of topics x number of terms)
+ self.ttda = np.empty((0, len(gensim_kw_args["id2word"])))
+ self.asymmetric_distance_matrix_outdated = True
+
+ # in case the model will not train due to some
+ # parameters, stop here and don't train.
+ if num_models <= 0:
+ return
+ if gensim_kw_args.get("corpus") is None:
+ return
+ if "iterations" in gensim_kw_args and gensim_kw_args["iterations"] <= 0:
+ return
+ if "passes" in gensim_kw_args and gensim_kw_args["passes"] <= 0:
+ return
+
+ logger.info(f"generating {num_models} topic models using {ensemble_workers} workers")
+
+ if ensemble_workers > 1:
+ _generate_topic_models_multiproc(self, num_models, ensemble_workers)
+ else:
+ _generate_topic_models(self, num_models)
+
+ self._generate_asymmetric_distance_matrix()
+ self._generate_topic_clusters(epsilon, min_samples)
+ self._generate_stable_topics(min_cores)
+
+ # create model that can provide the usual gensim api to the stable topics from the ensemble
+ self.generate_gensim_representation()
+
+ def get_topic_model_class(self):
+ """Get the class that is used for :meth:`gensim.models.EnsembleLda.generate_gensim_representation`."""
+ if self.topic_model_class is None:
+ instruction = (
+ 'Try setting topic_model_class manually to what the individual models were based on, '
+ 'e.g. LdaMulticore.'
+ )
+ try:
+ module = importlib.import_module(self.topic_model_module_string)
+ self.topic_model_class = getattr(module, self.topic_model_class_string)
+ del self.topic_model_module_string
+ del self.topic_model_class_string
+ except ModuleNotFoundError:
+ logger.error(
+                    f'Could not import the "{self.topic_model_module_string}" module in order to provide the '
+ f'"{self.topic_model_class_string}" class as "topic_model_class" attribute. {instruction}'
+ )
+ except AttributeError:
+ logger.error(
+ f'Could not import the "{self.topic_model_class_string}" class from the '
+ f'"{self.topic_model_module_string}" module in order to set the "topic_model_class" attribute. '
+ f'{instruction}'
+ )
+ return self.topic_model_class
+
+ def save(self, *args, **kwargs):
+ if self.get_topic_model_class() is not None:
+ self.topic_model_module_string = self.topic_model_class.__module__
+ self.topic_model_class_string = self.topic_model_class.__name__
+ kwargs['ignore'] = frozenset(kwargs.get('ignore', ())).union(('topic_model_class', ))
+ super(EnsembleLda, self).save(*args, **kwargs)
+
+ save.__doc__ = SaveLoad.save.__doc__
+
+ def convert_to_memory_friendly(self):
+ """Remove the stored gensim models and only keep their ttdas.
+
+ This frees up memory, but you won't have access to the individual models anymore if you intended to use them
+ outside of the ensemble.
+ """
+ self.tms = []
+ self.memory_friendly_ttda = True
+
+ def generate_gensim_representation(self):
+ """Create a gensim model from the stable topics.
+
+        The returned representation is a Gensim LdaModel (:py:class:`gensim.models.LdaModel`) that has been
+        instantiated with an a-priori belief on word probability, eta, which represents the topic-term distributions of
+        any stable topics that were found by clustering over the ensemble of topic distributions.
+
+ When no stable topics have been detected, None is returned.
+
+ Returns
+ -------
+ :py:class:`gensim.models.LdaModel`
+ A Gensim LDA Model classic_model_representation for which:
+ ``classic_model_representation.get_topics() == self.get_topics()``
+
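+        A minimal sketch of that equivalence, assuming ``elda`` is a trained ensemble with at least one
+        stable topic:
+
+        .. sourcecode:: pycon
+
+            >>> import numpy as np
+            >>>
+            >>> classic = elda.generate_gensim_representation()
+            >>> assert np.allclose(classic.get_topics(), elda.get_topics())
+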
+ """
+ logger.info("generating classic gensim model representation based on results from the ensemble")
+
+ sstats_sum = self.sstats_sum
+        # sstats_sum is essentially the total number of words in the corpus.
+        # If it is missing (zero), recreate it from the corpus (takes a while):
+ if sstats_sum == 0 and "corpus" in self.gensim_kw_args and not self.gensim_kw_args["corpus"] is None:
+ for document in self.gensim_kw_args["corpus"]:
+ for token in document:
+ sstats_sum += token[1]
+ self.sstats_sum = sstats_sum
+
+ stable_topics = self.get_topics()
+
+ num_stable_topics = len(stable_topics)
+
+ if num_stable_topics == 0:
+ logger.error(
+ "the model did not detect any stable topic. You can try to adjust epsilon: "
+ "recluster(eps=...)"
+ )
+ self.classic_model_representation = None
+ return
+
+ # create a new gensim model
+ params = self.gensim_kw_args.copy()
+ params["eta"] = self.eta
+ params["num_topics"] = num_stable_topics
+ # adjust params in a way that no training happens
+ params["passes"] = 0 # no training
+ # iterations is needed for inference, pass it to the model
+
+ classic_model_representation = self.get_topic_model_class()(**params)
+
+ # when eta was None, use what gensim generates as default eta for the following tasks:
+ eta = classic_model_representation.eta
+ if sstats_sum == 0:
+ sstats_sum = classic_model_representation.state.sstats.sum()
+ self.sstats_sum = sstats_sum
+
+ # the following is important for the denormalization
+ # to generate the proper sstats for the new gensim model:
+ # transform to dimensionality of stable_topics. axis=1 is summed
+ eta_sum = 0
+ if isinstance(eta, (int, float)):
+ eta_sum = [eta * len(stable_topics[0])] * num_stable_topics
+ else:
+ if len(eta.shape) == 1: # [e1, e2, e3]
+ eta_sum = [[eta.sum()]] * num_stable_topics
+ if len(eta.shape) > 1: # [[e11, e12, ...], [e21, e22, ...], ...]
+ eta_sum = np.array(eta.sum(axis=1)[:, None])
+
+        # The normalization factor that get_topics() applies never changes, because the sum of eta and the
+        # sum of sstats are both constant. Therefore normalization_factor can be predicted, and the right
+        # sstats can be calculated so that get_topics() returns the stable topics no matter what eta is.
+        # (corpus is a mapping of id to occurrences.)
+
+ normalization_factor = np.array([[sstats_sum / num_stable_topics]] * num_stable_topics) + eta_sum
+
+ sstats = stable_topics * normalization_factor
+ sstats -= eta
+
+ classic_model_representation.state.sstats = sstats.astype(np.float32)
+ # fix expElogbeta.
+ classic_model_representation.sync_state()
+
+ self.classic_model_representation = classic_model_representation
+
+ return classic_model_representation
+
+ def add_model(self, target, num_new_models=None):
+ """Add the topic term distribution array (ttda) of another model to the ensemble.
+
+ This way, multiple topic models can be connected to an ensemble manually. Make sure that all the models use
+        the exact same dictionary / id2word mapping.
+
+        In order to generate new stable topics afterwards, call
+        ``self.``:meth:`~gensim.models.ensemblelda.EnsembleLda.recluster`.
+
+ The ttda of another ensemble can also be used, in that case set ``num_new_models`` to the ``num_models``
+ parameter of the ensemble, that means the number of classic models in the ensemble that generated the ttda.
+ This is important, because that information is used to estimate "min_samples" for _generate_topic_clusters.
+
+ If you trained this ensemble in the past with a certain Dictionary that you want to reuse for other
+ models, you can get it from: ``self.id2word``.
+
+ Parameters
+ ----------
+ target : {see description}
+ 1. A single EnsembleLda object
+ 2. List of EnsembleLda objects
+            3. A single Gensim topic model (e.g. :py:class:`gensim.models.LdaModel`)
+ 4. List of Gensim topic models
+
+ if memory_friendly_ttda is True, target can also be:
+ 5. topic-term-distribution-array
+
+ example: [[0.1, 0.1, 0.8], [...], ...]
+
+ [topic1, topic2, ...]
+ with topic being an array of probabilities:
+ [token1, token2, ...]
+
+ token probabilities in a single topic sum to one, therefore, all the words sum to len(ttda)
+
+ num_new_models : integer, optional
+ the model keeps track of how many models were used in this ensemble. Set higher if ttda contained topics
+ from more than one model. Default: None, which takes care of it automatically.
+
+ If target is a 2D-array of float values, it assumes 1.
+
+ If the ensemble has ``memory_friendly_ttda`` set to False, then it will always use the number of models in
+ the target parameter.
+
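+        A minimal sketch, assuming ``elda`` is a memory friendly ensemble and the second dimension of the
+        supplied array matches ``len(elda.id2word)`` (three terms in this toy example):
+
+        .. sourcecode:: pycon
+
+            >>> import numpy as np
+            >>>
+            >>> elda.add_model(np.array([[0.1, 0.1, 0.8]]))
+            >>> elda.recluster()
+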
+ """
+ # If the model has never seen a ttda before, initialize.
+ # If it has, append.
+
+ # Be flexible. Can be a single element or a list of elements
+ # make sure it is a numpy array
+ if not isinstance(target, (np.ndarray, list)):
+ target = np.array([target])
+ else:
+ target = np.array(target)
+ assert len(target) > 0
+
+ if self.memory_friendly_ttda:
+ # for memory friendly models/ttdas, append the ttdas to itself
+
+ detected_num_models = 0
+ ttda = []
+
+ # 1. ttda array, because that's the only accepted input that contains numbers
+ if isinstance(target.dtype.type(), (np.number, float)):
+ ttda = target
+ detected_num_models = 1
+
+ # 2. list of ensemblelda objects
+ elif isinstance(target[0], type(self)):
+ ttda = np.concatenate([ensemble.ttda for ensemble in target], axis=0)
+ detected_num_models = sum([ensemble.num_models for ensemble in target])
+
+ # 3. list of gensim models
+ elif isinstance(target[0], basemodel.BaseTopicModel):
+ ttda = np.concatenate([model.get_topics() for model in target], axis=0)
+ detected_num_models = len(target)
+
+ # unknown
+ else:
+ raise ValueError(f"target is of unknown type or a list of unknown types: {type(target[0])}")
+
+ # new models were added, increase num_models
+            # if the user didn't provide a custom number to use
+ if num_new_models is None:
+ self.num_models += detected_num_models
+ else:
+ self.num_models += num_new_models
+
+ else: # memory unfriendly ensembles
+ ttda = []
+
+ # 1. ttda array
+ if isinstance(target.dtype.type(), (np.number, float)):
+ raise ValueError(
+ 'ttda arrays cannot be added to ensembles, for which memory_friendly_ttda=False, '
+ 'you can call convert_to_memory_friendly, but it will discard the stored gensim '
+ 'models and only keep the relevant topic term distributions from them.'
+ )
+
+ # 2. list of ensembles
+ elif isinstance(target[0], type(self)):
+ for ensemble in target:
+ self.tms += ensemble.tms
+ ttda = np.concatenate([ensemble.ttda for ensemble in target], axis=0)
+
+ # 3. list of gensim models
+ elif isinstance(target[0], basemodel.BaseTopicModel):
+ self.tms += target.tolist()
+ ttda = np.concatenate([model.get_topics() for model in target], axis=0)
+
+ # unknown
+ else:
+ raise ValueError(f"target is of unknown type or a list of unknown types: {type(target[0])}")
+
+ # in this case, len(self.tms) should
+ # always match self.num_models
+ if num_new_models is not None and num_new_models + self.num_models != len(self.tms):
+ logger.info(
+ 'num_new_models will be ignored. num_models should match the number of '
+ 'stored models for a memory unfriendly ensemble'
+ )
+ self.num_models = len(self.tms)
+
+ logger.info(f"ensemble contains {self.num_models} models and {len(self.ttda)} topics now")
+
+ if self.ttda.shape[1] != ttda.shape[1]:
+ raise ValueError(
+ f"target ttda dimensions do not match. Topics must be {self.ttda.shape[-1]} but was {ttda.shape[-1]} "
+ f"elements large"
+ )
+
+ self.ttda = np.append(self.ttda, ttda, axis=0)
+
+ # tell recluster that the distance matrix needs to be regenerated
+ self.asymmetric_distance_matrix_outdated = True
+
+ def _generate_asymmetric_distance_matrix(self):
+ """Calculate the pairwise distance matrix for all the ttdas from the ensemble.
+
+ Returns the asymmetric pairwise distance matrix that is used in the DBSCAN clustering.
+
+ Afterwards, the model needs to be reclustered for this generated matrix to take effect.
+
+ """
+ workers = self.distance_workers
+
+ # matrix is up to date afterwards
+ self.asymmetric_distance_matrix_outdated = False
+
+ logger.info(f"generating a {len(self.ttda)} x {len(self.ttda)} asymmetric distance matrix...")
+
+ if workers is not None and workers <= 1:
+ self.asymmetric_distance_matrix = _calculate_asymmetric_distance_matrix_chunk(
+ ttda1=self.ttda,
+ ttda2=self.ttda,
+ start_index=0,
+ masking_method=self.masking_method,
+ masking_threshold=self.masking_threshold,
+ )
+ else:
+ # best performance on 2-core machine: 2 workers
+ if workers is None:
+ workers = os.cpu_count()
+
+ self.asymmetric_distance_matrix = _calculate_assymetric_distance_matrix_multiproc(
+ workers=workers,
+ entire_ttda=self.ttda,
+ masking_method=self.masking_method,
+ masking_threshold=self.masking_threshold,
+ )
+
+ def _generate_topic_clusters(self, eps=0.1, min_samples=None):
+ """Run the CBDBSCAN algorithm on all the detected topics and label them with label-indices.
+
+ The final approval and generation of stable topics is done in ``_generate_stable_topics()``.
+
+ Parameters
+ ----------
+ eps : float
+ dbscan distance scale
+ min_samples : int, optional
+ defaults to ``int(self.num_models / 2)``, dbscan min neighbours threshold required to consider
+ a topic to be a core. Should scale with the number of models, ``self.num_models``
+
+ """
+ if min_samples is None:
+ min_samples = int(self.num_models / 2)
+ logger.info("fitting the clustering model, using %s for min_samples", min_samples)
+ else:
+ logger.info("fitting the clustering model")
+
+ self.cluster_model = CBDBSCAN(eps=eps, min_samples=min_samples)
+ self.cluster_model.fit(self.asymmetric_distance_matrix)
+
+ def _generate_stable_topics(self, min_cores=None):
+ """Generate stable topics out of the clusters.
+
+ The function finds clusters of topics using a variant of DBScan. If a cluster has enough core topics
+ (c.f. parameter ``min_cores``), then this cluster represents a stable topic. The stable topic is specifically
+ calculated as the average over all topic-term distributions of the core topics in the cluster.
+
+        This function is the last step that has to be done in the ensemble. After this step is complete,
+        stable topics can be retrieved using the :meth:`~gensim.models.ensemblelda.EnsembleLda.get_topics`
+        method.
+
+ Parameters
+ ----------
+ min_cores : int
+ Minimum number of core topics needed to form a cluster that represents a stable topic.
+            Using ``None`` defaults to ``min_cores = min(3, max(1, int(self.num_models / 4 + 1)))``
+
+ """
+ # min_cores being 0 makes no sense. there has to be a core for a cluster
+ # or there is no cluster
+ if min_cores == 0:
+ min_cores = 1
+
+ if min_cores is None:
+ # min_cores is a number between 1 and 3, depending on the number of models
+ min_cores = min(3, max(1, int(self.num_models / 4 + 1)))
+ logger.info("generating stable topics, using %s for min_cores", min_cores)
+ else:
+ logger.info("generating stable topics")
+
+ cbdbscan_topics = self.cluster_model.results
+
+ grouped_by_labels = _group_by_labels(cbdbscan_topics)
+ clusters = _aggregate_topics(grouped_by_labels)
+ valid_clusters = _validate_clusters(clusters, min_cores)
+ valid_cluster_labels = {cluster.label for cluster in valid_clusters}
+
+ for topic in cbdbscan_topics:
+ topic.valid_neighboring_labels = {
+ label for label in topic.neighboring_labels
+ if label in valid_cluster_labels
+ }
+
+ # keeping only VALID cores
+ valid_core_mask = np.vectorize(_is_valid_core)(cbdbscan_topics)
+ valid_topics = self.ttda[valid_core_mask]
+ topic_labels = np.array([topic.label for topic in cbdbscan_topics])[valid_core_mask]
+ unique_labels = np.unique(topic_labels)
+
+ num_stable_topics = len(unique_labels)
+ stable_topics = np.empty((num_stable_topics, len(self.id2word)))
+
+ # for each cluster
+ for label_index, label in enumerate(unique_labels):
+ # mean of all the topics that are of that cluster
+ topics_of_cluster = np.array([topic for t, topic in enumerate(valid_topics) if topic_labels[t] == label])
+ stable_topics[label_index] = topics_of_cluster.mean(axis=0)
+
+ self.valid_clusters = valid_clusters
+ self.stable_topics = stable_topics
+
+ logger.info("found %s stable topics", len(stable_topics))
+
+ def recluster(self, eps=0.1, min_samples=None, min_cores=None):
+ """Reapply CBDBSCAN clustering and stable topic generation.
+
+ Stable topics can be retrieved using :meth:`~gensim.models.ensemblelda.EnsembleLda.get_topics`.
+
+ Parameters
+ ----------
+ eps : float
+ epsilon for the CBDBSCAN algorithm, having the same meaning as in classic DBSCAN clustering.
+ default: ``0.1``
+ min_samples : int
+ The minimum number of samples in the neighborhood of a topic to be considered a core in CBDBSCAN.
+ default: ``int(self.num_models / 2)``
+ min_cores : int
+            how many cores a cluster has to have to be treated as a stable topic, i.e. how many similar-looking
+            topics have to be present so that their average can be used as the stable topic.
+            default: ``min(3, max(1, int(self.num_models / 4 + 1)))``
+
+ """
+ # if new models were added to the ensemble, the distance matrix needs to be generated again
+ if self.asymmetric_distance_matrix_outdated:
+ logger.info("asymmetric distance matrix is outdated due to add_model")
+ self._generate_asymmetric_distance_matrix()
+
+ # Run CBDBSCAN to get topic clusters:
+ self._generate_topic_clusters(eps, min_samples)
+
+ # Interpret the results of CBDBSCAN to identify stable topics:
+ self._generate_stable_topics(min_cores)
+
+ # Create gensim LdaModel representation of topic model with stable topics (can be used for inference):
+ self.generate_gensim_representation()
+
+ # GENSIM API
+ # to make using the ensemble in place of a gensim model as easy as possible
+
+ def get_topics(self):
+ """Return only the stable topics from the ensemble.
+
+ Returns
+ -------
+        2D numpy.ndarray of floats
+ List of stable topic term distributions
+
+ """
+ return self.stable_topics
+
+ def _ensure_gensim_representation(self):
+ """Check if stable topics and the internal gensim representation exist. Raise an error if not."""
+ if self.classic_model_representation is None:
+ if len(self.stable_topics) == 0:
+ raise ValueError("no stable topic was detected")
+ else:
+ raise ValueError("use generate_gensim_representation() first")
+
+ def __getitem__(self, i):
+ """See :meth:`gensim.models.LdaModel.__getitem__`."""
+ self._ensure_gensim_representation()
+ return self.classic_model_representation[i]
+
+ def inference(self, *posargs, **kwargs):
+ """See :meth:`gensim.models.LdaModel.inference`."""
+ self._ensure_gensim_representation()
+ return self.classic_model_representation.inference(*posargs, **kwargs)
+
+ def log_perplexity(self, *posargs, **kwargs):
+ """See :meth:`gensim.models.LdaModel.log_perplexity`."""
+ self._ensure_gensim_representation()
+ return self.classic_model_representation.log_perplexity(*posargs, **kwargs)
+
+ def print_topics(self, *posargs, **kwargs):
+ """See :meth:`gensim.models.LdaModel.print_topics`."""
+ self._ensure_gensim_representation()
+ return self.classic_model_representation.print_topics(*posargs, **kwargs)
+
+ @property
+ def id2word(self):
+ """Return the :py:class:`gensim.corpora.dictionary.Dictionary` object used in the model."""
+ return self.gensim_kw_args["id2word"]
+
+
+class CBDBSCAN:
+ """A Variation of the DBSCAN algorithm called Checkback DBSCAN (CBDBSCAN).
+
+ The algorithm works based on DBSCAN-like parameters 'eps' and 'min_samples' that respectively define how far a
+ "nearby" point is, and the minimum number of nearby points needed to label a candidate datapoint a core of a
+ cluster. (See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html).
+
+ The algorithm works as follows:
+
+ 1. (A)symmetric distance matrix provided at fit-time (called 'amatrix').
+    For the sake of the example below, assume there are only five topics (amatrix contains distances with dim 5x5),
+ T_1, T_2, T_3, T_4, T_5:
+ 2. Start by scanning a candidate topic with respect to a parent topic
+ (e.g. T_1 with respect to parent None)
+ 3. Check which topics are nearby the candidate topic using 'self.eps' as a threshold and call them neighbours
+ (e.g. assume T_3, T_4, and T_5 are nearby and become neighbours)
+ 4. If there are more neighbours than 'self.min_samples', the candidate topic becomes a core candidate for a cluster
+ (e.g. if 'min_samples'=1, then T_1 becomes the first core of a cluster)
+ 5. If candidate is a core, CheckBack (CB) to find the fraction of neighbours that are either the parent or the
+ parent's neighbours. If this fraction is more than 75%, give the candidate the same label as its parent.
+ (e.g. in the trivial case there is no parent (or neighbours of that parent), a new incremental label is given)
+ 6. If candidate is a core, recursively scan the next nearby topic (e.g. scan T_3) labeling the previous topic as
+ the parent and the previous neighbours as the parent_neighbours - repeat steps 2-6:
+
+ 2. (e.g. Scan candidate T_3 with respect to parent T_1 that has parent_neighbours T_3, T_4, and T_5)
+        3. (e.g. T_5 is the only neighbour)
+ 4. (e.g. number of neighbours is 1, therefore candidate T_3 becomes a core)
+ 5. (e.g. CheckBack finds that two of the four parent and parent neighbours are neighbours of candidate T_3.
+ Therefore the candidate T_3 does NOT get the same label as its parent T_1)
+ 6. (e.g. Scan candidate T_5 with respect to parent T_3 that has parent_neighbours T_5)
+
+ The CB step has the effect that it enforces cluster compactness and allows the model to avoid creating clusters for
+ unstable topics made of a composition of multiple stable topics.
+
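+    CBDBSCAN is an internal helper that is normally constructed and fit by
+    :class:`~gensim.models.ensemblelda.EnsembleLda` itself. A minimal sketch, assuming ``amatrix`` is an
+    asymmetric distance matrix such as the one produced by ``EnsembleLda._generate_asymmetric_distance_matrix()``:
+
+    .. sourcecode:: pycon
+
+        >>> cluster_model = CBDBSCAN(eps=0.1, min_samples=2)
+        >>> cluster_model.fit(amatrix)
+        >>> labels = [topic.label for topic in cluster_model.results]
+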
+ """
+
+ def __init__(self, eps, min_samples):
+ """Create a new CBDBSCAN object. Call fit in order to train it on an asymmetric distance matrix.
+
+ Parameters
+ ----------
+ eps : float
+ epsilon for the CBDBSCAN algorithm, having the same meaning as in classic DBSCAN clustering.
+ min_samples : int
+ The minimum number of samples in the neighborhood of a topic to be considered a core in CBDBSCAN.
+
+ """
+ self.eps = eps
+ self.min_samples = min_samples
+
+ def fit(self, amatrix):
+ """Apply the algorithm to an asymmetric distance matrix."""
+ self.next_label = 0
+
+ topic_clustering_results = [
+ Topic(
+ is_core=False,
+ neighboring_labels=set(),
+ neighboring_topic_indices=set(),
+ label=None,
+ num_neighboring_labels=0,
+ valid_neighboring_labels=set()
+ ) for i in range(len(amatrix))
+ ]
+
+ amatrix_copy = amatrix.copy()
+
+ # to avoid the problem of comparing the topic with itself
+ np.fill_diagonal(amatrix_copy, 1)
+
+ min_distance_per_topic = [(distance, index) for index, distance in enumerate(amatrix_copy.min(axis=1))]
+ min_distance_per_topic_sorted = sorted(min_distance_per_topic, key=lambda distance: distance[0])
+ ordered_min_similarity = [index for distance, index in min_distance_per_topic_sorted]
+
+ def scan_topic(topic_index, current_label=None, parent_neighbors=None):
+ """Extend the cluster in one direction.
+
+ Results are accumulated to ``self.results``.
+
+ Parameters
+ ----------
+ topic_index : int
+ The topic that might be added to the existing cluster, or which might create a new cluster if necessary.
+ current_label : int
+ The label of the cluster that might be suitable for ``topic_index``
+
+ """
+ neighbors_sorted = sorted(
+ [
+ (distance, index)
+ for index, distance in enumerate(amatrix_copy[topic_index])
+ ],
+ key=lambda x: x[0],
+ )
+ neighboring_topic_indices = [index for distance, index in neighbors_sorted if distance < self.eps]
+
+ num_neighboring_topics = len(neighboring_topic_indices)
+
+            # If the number of neighbor indices of a topic is large enough, it is considered a core.
+            # This also takes into account neighbor indices that have already been identified as cores.
+ if num_neighboring_topics >= self.min_samples:
+ # This topic is a core!
+ topic_clustering_results[topic_index].is_core = True
+
+ # if current_label is none, then this is the first core
+ # of a new cluster (hence next_label is used)
+ if current_label is None:
+ # next_label is initialized with 0 in fit() for the first cluster
+ current_label = self.next_label
+ self.next_label += 1
+
+ else:
+                    # In case the core has a parent, check the distance to the parent's neighbors (since the matrix is
+ # asymmetric, it takes return distances into account here)
+ # If less than 25% of the elements are close enough, then create a new cluster rather than further
+ # growing the current cluster in that direction.
+ close_parent_neighbors_mask = amatrix_copy[topic_index][parent_neighbors] < self.eps
+
+ if close_parent_neighbors_mask.mean() < 0.25:
+ # start new cluster by changing current_label
+ current_label = self.next_label
+ self.next_label += 1
+
+ topic_clustering_results[topic_index].label = current_label
+
+ for neighboring_topic_index in neighboring_topic_indices:
+ if topic_clustering_results[neighboring_topic_index].label is None:
+ ordered_min_similarity.remove(neighboring_topic_index)
+ # try to extend the cluster into the direction of the neighbor
+ scan_topic(neighboring_topic_index, current_label, neighboring_topic_indices + [topic_index])
+
+ topic_clustering_results[neighboring_topic_index].neighboring_topic_indices.add(topic_index)
+ topic_clustering_results[neighboring_topic_index].neighboring_labels.add(current_label)
+
+ else:
+ # this topic is not a core!
+ if current_label is None:
+ topic_clustering_results[topic_index].label = -1
+ else:
+ topic_clustering_results[topic_index].label = current_label
+
+ # Elements are removed from ordered_min_similarity inside scan_topic; loop until it is empty.
+ while len(ordered_min_similarity) != 0:
+ next_topic_index = ordered_min_similarity.pop(0)
+ scan_topic(next_topic_index)
+
+ self.results = topic_clustering_results
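For orientation while reading this diff, here is a minimal, purely illustrative sketch of how the CBDBSCAN clusterer above is driven (assuming the class remains importable from `gensim.models.ensemblelda`; the distance matrix is toy data):

```python
# Illustrative only: assumes CBDBSCAN is importable from gensim.models.ensemblelda (gensim 4.1).
import numpy as np
from gensim.models.ensemblelda import CBDBSCAN

# Tiny asymmetric "distance" matrix between 4 topics: topics 0-2 are mutually close, topic 3 is isolated.
amatrix = np.array([
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.1, 0.8],
    [0.2, 0.1, 0.0, 0.9],
    [0.9, 0.8, 0.9, 0.0],
])

clustering = CBDBSCAN(eps=0.5, min_samples=2)
clustering.fit(amatrix)

# Each entry in `results` carries a cluster label; -1 marks topics that ended up in no cluster.
print([topic.label for topic in clustering.results])
```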
diff --git a/gensim/models/fasttext.py b/gensim/models/fasttext.py
index 2c2750d70c..a94bc17f27 100644
--- a/gensim/models/fasttext.py
+++ b/gensim/models/fasttext.py
@@ -276,7 +276,7 @@ def __init__(self, sentences=None, corpus_file=None, sg=0, hs=0, vector_size=100
max_vocab_size=None, word_ngrams=1, sample=1e-3, seed=1, workers=3, min_alpha=0.0001,
negative=5, ns_exponent=0.75, cbow_mean=1, hashfxn=hash, epochs=5, null_word=0, min_n=3, max_n=6,
sorted_vocab=1, bucket=2000000, trim_rule=None, batch_words=MAX_WORDS_IN_BATCH, callbacks=(),
- max_final_vocab=None):
+ max_final_vocab=None, shrink_windows=True,):
"""Train, use and evaluate word representations learned using the method
described in `Enriching Word Vectors with Subword Information `_,
aka FastText.
@@ -301,7 +301,7 @@ def __init__(self, sentences=None, corpus_file=None, sg=0, hs=0, vector_size=100
uninitialized).
min_count : int, optional
The model ignores all words with total frequency lower than this.
- size : int, optional
+ vector_size : int, optional
Dimensionality of the word vectors.
window : int, optional
The maximum distance between the current and predicted word within a sentence.
@@ -385,6 +385,12 @@ def __init__(self, sentences=None, corpus_file=None, sg=0, hs=0, vector_size=100
``min_count```. If the specified ``min_count`` is more than the
automatically calculated ``min_count``, the former will be used.
Set to ``None`` if not required.
+ shrink_windows : bool, optional
+ New in 4.1. Experimental.
+ If True, the effective window size is uniformly sampled from [1, `window`]
+ for each target word during training, to match the original word2vec algorithm's
+ approximate weighting of context words by distance. Otherwise, the effective
+ window size is always fixed to `window` words to either side.
Examples
--------
@@ -432,7 +438,8 @@ def __init__(self, sentences=None, corpus_file=None, sg=0, hs=0, vector_size=100
max_vocab_size=max_vocab_size, max_final_vocab=max_final_vocab,
min_count=min_count, sample=sample, sorted_vocab=sorted_vocab,
null_word=null_word, ns_exponent=ns_exponent, hashfxn=hashfxn,
- seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha)
+ seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean,
+ min_alpha=min_alpha, shrink_windows=shrink_windows)
def _init_post_load(self, hidden_output):
num_vectors = len(self.wv.vectors)
@@ -489,16 +496,20 @@ def estimate_memory(self, vocab_size=None, report=None):
)
return report
- def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,
- total_examples=None, total_words=None, **kwargs):
+ def _do_train_epoch(
+ self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,
+ total_examples=None, total_words=None, **kwargs,
+ ):
work, neu1 = thread_private_mem
if self.sg:
- examples, tally, raw_tally = train_epoch_sg(self, corpus_file, offset, cython_vocab, cur_epoch,
- total_examples, total_words, work, neu1)
+ examples, tally, raw_tally = train_epoch_sg(
+ self, corpus_file, offset, cython_vocab, cur_epoch, total_examples, total_words, work, neu1,
+ )
else:
- examples, tally, raw_tally = train_epoch_cbow(self, corpus_file, offset, cython_vocab, cur_epoch,
- total_examples, total_words, work, neu1)
+ examples, tally, raw_tally = train_epoch_cbow(
+ self, corpus_file, offset, cython_vocab, cur_epoch, total_examples, total_words, work, neu1,
+ )
return examples, tally, raw_tally
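A minimal sketch of the new `shrink_windows` flag threaded through `FastText.__init__` above (hyperparameters chosen purely for illustration):

```python
from gensim.models import FastText
from gensim.test.utils import common_texts

# shrink_windows=False keeps every context window at exactly `window` words to each side,
# instead of a size sampled uniformly from [1, window] per target word (the default behaviour).
model = FastText(common_texts, vector_size=20, window=5, min_count=1, shrink_windows=False)
print(model.wv.most_similar('computer', topn=3))
```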
diff --git a/gensim/models/fasttext_corpusfile.pyx b/gensim/models/fasttext_corpusfile.pyx
index e5ec611aa0..5d275b42b6 100644
--- a/gensim/models/fasttext_corpusfile.pyx
+++ b/gensim/models/fasttext_corpusfile.pyx
@@ -46,7 +46,8 @@ cdef void prepare_c_structures_for_batch(
vector[vector[string]] &sentences, int sample, int hs, int window, long long *total_words,
int *effective_words, int *effective_sentences, unsigned long long *next_random, cvocab_t *vocab,
int *sentence_idx, np.uint32_t *indexes, int *codelens, np.uint8_t **codes, np.uint32_t **points,
- np.uint32_t *reduced_windows, int *subwords_idx_len, np.uint32_t **subwords_idx) nogil:
+ np.uint32_t *reduced_windows, int *subwords_idx_len, np.uint32_t **subwords_idx, int shrink_windows,
+ ) nogil:
cdef VocabItem word
cdef string token
cdef vector[string] sent
@@ -88,8 +89,12 @@ cdef void prepare_c_structures_for_batch(
break
# precompute "reduced window" offsets in a single randint() call
- for i in range(effective_words[0]):
- reduced_windows[i] = random_int32(next_random) % window
+ if shrink_windows:
+ for i in range(effective_words[0]):
+ reduced_windows[i] = random_int32(next_random) % window
+ else:
+ for i in range(effective_words[0]):
+ reduced_windows[i] = 0
def train_epoch_sg(
@@ -136,6 +141,7 @@ def train_epoch_sg(
cdef long long total_sentences = 0
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
+ cdef int shrink_windows = int(model.shrink_windows)
init_ft_config(&c, model, _alpha, _work, _l1)
@@ -153,7 +159,7 @@ def train_epoch_sg(
prepare_c_structures_for_batch(
sentences, c.sample, c.hs, c.window, &total_words, &effective_words, &effective_sentences,
&c.next_random, vocab.get_vocab_ptr(), c.sentence_idx, c.indexes, c.codelens,
- c.codes, c.points, c.reduced_windows, c.subwords_idx_len, c.subwords_idx)
+ c.codes, c.points, c.reduced_windows, c.subwords_idx_len, c.subwords_idx, shrink_windows)
for sent_idx in range(effective_sentences):
idx_start = c.sentence_idx[sent_idx]
@@ -226,6 +232,7 @@ def train_epoch_cbow(model, corpus_file, offset, _cython_vocab, _cur_epoch, _exp
cdef long long total_sentences = 0
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
+ cdef int shrink_windows = int(model.shrink_windows)
init_ft_config(&c, model, _alpha, _work, _neu1)
@@ -243,7 +250,7 @@ def train_epoch_cbow(model, corpus_file, offset, _cython_vocab, _cur_epoch, _exp
prepare_c_structures_for_batch(
sentences, c.sample, c.hs, c.window, &total_words, &effective_words, &effective_sentences,
&c.next_random, vocab.get_vocab_ptr(), c.sentence_idx, c.indexes, c.codelens,
- c.codes, c.points, c.reduced_windows, c.subwords_idx_len, c.subwords_idx)
+ c.codes, c.points, c.reduced_windows, c.subwords_idx_len, c.subwords_idx, shrink_windows)
for sent_idx in range(effective_sentences):
idx_start = c.sentence_idx[sent_idx]
diff --git a/gensim/models/fasttext_inner.pyx b/gensim/models/fasttext_inner.pyx
index e71ed6f31d..e27bd62feb 100644
--- a/gensim/models/fasttext_inner.pyx
+++ b/gensim/models/fasttext_inner.pyx
@@ -601,8 +601,12 @@ def train_batch_any(model, sentences, alpha, _work, _neu1):
num_words, num_sentences = populate_ft_config(&c, model.wv, model.wv.buckets_word, sentences)
# precompute "reduced window" offsets in a single randint() call
- for i, randint in enumerate(model.random.randint(0, c.window, num_words)):
- c.reduced_windows[i] = randint
+ if model.shrink_windows:
+ for i, randint in enumerate(model.random.randint(0, c.window, num_words)):
+ c.reduced_windows[i] = randint
+ else:
+ for i in range(num_words):
+ c.reduced_windows[i] = 0
# release GIL & train on all sentences in the batch
with nogil:
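The `reduced_windows` branches above are easier to see outside Cython; a standalone NumPy sketch of the same bookkeeping (hypothetical sizes, not gensim API):

```python
import numpy as np

window, num_words = 5, 8
rng = np.random.RandomState(1)
shrink_windows = True

if shrink_windows:
    # Each target word gets an offset in [0, window), so its effective window size lies in [1, window].
    reduced_windows = rng.randint(0, window, num_words)
else:
    # Fixed windows: every target word uses exactly `window` context words to either side.
    reduced_windows = np.zeros(num_words, dtype=int)

effective_windows = window - reduced_windows
print(effective_windows)  # in [1, window] when shrinking, all equal to window otherwise
```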
diff --git a/gensim/models/keyedvectors.py b/gensim/models/keyedvectors.py
index daa5482184..b5debb21c1 100644
--- a/gensim/models/keyedvectors.py
+++ b/gensim/models/keyedvectors.py
@@ -171,6 +171,7 @@
import itertools
import warnings
from numbers import Integral
+from typing import Iterable
from numpy import (
dot, float32 as REAL, double, array, zeros, vstack,
@@ -178,6 +179,7 @@
)
import numpy as np
from scipy import stats
+from scipy.spatial.distance import cdist
from gensim import utils, matutils # utility fnc for pickling, common scipy operations etc
from gensim.corpora.dictionary import Dictionary
@@ -187,7 +189,21 @@
logger = logging.getLogger(__name__)
-KEY_TYPES = (str, int, np.integer)
+_KEY_TYPES = (str, int, np.integer)
+
+_EXTENDED_KEY_TYPES = (str, int, np.integer, np.ndarray)
+
+
+def _ensure_list(value):
+ """Ensure that the specified value is wrapped in a list, for those supported cases
+ where we also accept a single key or vector."""
+ if value is None:
+ return []
+
+ if isinstance(value, _KEY_TYPES) or (isinstance(value, ndarray) and len(value.shape) == 1):
+ return [value]
+
+ return value
class KeyedVectors(utils.SaveLoad):
@@ -375,7 +391,7 @@ def __getitem__(self, key_or_keys):
Vector representation for `key_or_keys` (1D if `key_or_keys` is single key, otherwise - 2D).
"""
- if isinstance(key_or_keys, KEY_TYPES):
+ if isinstance(key_or_keys, _KEY_TYPES):
return self.get_vector(key_or_keys)
return vstack([self.get_vector(key) for key in key_or_keys])
@@ -489,7 +505,7 @@ def add_vectors(self, keys, weights, extras=None, replace=False):
if True - replace vectors, otherwise - keep old vectors.
"""
- if isinstance(keys, KEY_TYPES):
+ if isinstance(keys, _KEY_TYPES):
keys = [keys]
weights = np.array(weights).reshape(1, -1)
elif isinstance(weights, list):
@@ -727,10 +743,9 @@ def most_similar(
if isinstance(topn, Integral) and topn < 1:
return []
- if positive is None:
- positive = []
- if negative is None:
- negative = []
+ # allow passing a single string-key or vector for the positive/negative arguments
+ positive = _ensure_list(positive)
+ negative = _ensure_list(negative)
self.fill_norms()
clip_end = clip_end or len(self.vectors)
@@ -739,18 +754,14 @@ def most_similar(
clip_start = 0
clip_end = restrict_vocab
- if isinstance(positive, KEY_TYPES) and not negative:
- # allow calls like most_similar('dog'), as a shorthand for most_similar(['dog'])
- positive = [positive]
-
# add weights for each key, if not already present; default to 1.0 for positive and -1.0 for negative keys
positive = [
- (item, 1.0) if isinstance(item, KEY_TYPES + (ndarray,))
- else item for item in positive
+ (item, 1.0) if isinstance(item, _EXTENDED_KEY_TYPES) else item
+ for item in positive
]
negative = [
- (item, -1.0) if isinstance(item, KEY_TYPES + (ndarray,))
- else item for item in negative
+ (item, -1.0) if isinstance(item, _EXTENDED_KEY_TYPES) else item
+ for item in negative
]
# compute the weighted average of all keys
@@ -901,23 +912,16 @@ def wmdistance(self, document1, document2, norm=True):
# Both documents are composed of a single unique token => zero distance.
return 0.0
- # Sets for faster look-up.
- docset1 = set(document1)
- docset2 = set(document2)
+ doclist1 = list(set(document1))
+ doclist2 = list(set(document2))
+ v1 = np.array([self.get_vector(token, norm=norm) for token in doclist1])
+ v2 = np.array([self.get_vector(token, norm=norm) for token in doclist2])
+ doc1_indices = dictionary.doc2idx(doclist1)
+ doc2_indices = dictionary.doc2idx(doclist2)
# Compute distance matrix.
distance_matrix = zeros((vocab_len, vocab_len), dtype=double)
- for i, t1 in dictionary.items():
- if t1 not in docset1:
- continue
-
- for j, t2 in dictionary.items():
- if t2 not in docset2 or distance_matrix[i, j] != 0.0:
- continue
-
- # Compute Euclidean distance between (potentially unit-normed) word vectors.
- distance_matrix[i, j] = distance_matrix[j, i] = np.sqrt(
- np_sum((self.get_vector(t1, norm=norm) - self.get_vector(t2, norm=norm))**2))
+ distance_matrix[np.ix_(doc1_indices, doc2_indices)] = cdist(v1, v2)
if abs(np_sum(distance_matrix)) < 1e-8:
# `emd` gets stuck if the distance matrix contains only zeros.
@@ -974,21 +978,16 @@ def most_similar_cosmul(self, positive=None, negative=None, topn=10):
if isinstance(topn, Integral) and topn < 1:
return []
- if positive is None:
- positive = []
- if negative is None:
- negative = []
+ # allow passing a single string-key or vector for the positive/negative arguments
+ positive = _ensure_list(positive)
+ negative = _ensure_list(negative)
self.fill_norms()
- if isinstance(positive, str) and not negative:
- # allow calls like most_similar_cosmul('dog'), as a shorthand for most_similar_cosmul(['dog'])
- positive = [positive]
-
all_words = {
self.get_index(word) for word in positive + negative
if not isinstance(word, ndarray) and word in self.key_to_index
- }
+ }
positive = [
self.get_vector(word, norm=True) if isinstance(word, str) else word
@@ -1106,7 +1105,7 @@ def distances(self, word_or_vector, other_words=()):
If either `word_or_vector` or any word in `other_words` is absent from vocab.
"""
- if isinstance(word_or_vector, KEY_TYPES):
+ if isinstance(word_or_vector, _KEY_TYPES):
input_vector = self.get_vector(word_or_vector)
else:
input_vector = word_or_vector
@@ -1695,6 +1694,70 @@ def intersect_word2vec_format(self, fname, lockf=0.0, binary=False, encoding='ut
msg=f"merged {overlap_count} vectors into {self.vectors.shape} matrix from {fname}",
)
+ def vectors_for_all(self, keys: Iterable, allow_inference: bool = True,
+ copy_vecattrs: bool = False) -> 'KeyedVectors':
+ """Produce vectors for all given keys as a new :class:`KeyedVectors` object.
+
+ Notes
+ -----
+ The keys will always be deduplicated. For optimal performance, you should not pass entire
+ corpora to the method. Instead, you should construct a dictionary of unique words in your
+ corpus:
+
+ >>> from collections import Counter
+ >>> import itertools
+ >>>
+ >>> from gensim.models import FastText
+ >>> from gensim.test.utils import datapath, common_texts
+ >>>
+ >>> model_corpus_file = datapath('lee_background.cor') # train word vectors on some corpus
+ >>> model = FastText(corpus_file=model_corpus_file, vector_size=20, min_count=1)
+ >>> corpus = common_texts # infer word vectors for words from another corpus
+ >>> word_counts = Counter(itertools.chain.from_iterable(corpus)) # count words in your corpus
+ >>> words_by_freq = (k for k, v in word_counts.most_common())
+ >>> word_vectors = model.wv.vectors_for_all(words_by_freq) # create word-vectors for words in your corpus
+
+ Parameters
+ ----------
+ keys : iterable
+ The keys that will be vectorized.
+ allow_inference : bool, optional
+ In subclasses such as :class:`~gensim.models.fasttext.FastTextKeyedVectors`,
+ vectors for out-of-vocabulary keys (words) may be inferred. Default is True.
+ copy_vecattrs : bool, optional
+ Additional attributes set via the :meth:`KeyedVectors.set_vecattr` method
+ will be preserved in the produced :class:`KeyedVectors` object. Default is False.
+ To ensure that *all* the produced vectors will have vector attributes assigned,
+ you should set `allow_inference=False`.
+
+ Returns
+ -------
+ keyedvectors : :class:`~gensim.models.keyedvectors.KeyedVectors`
+ Vectors for all the given keys.
+
+ """
+ # Pick only the keys that actually exist & deduplicate them.
+ # We keep the original key order, to improve cache locality, for performance.
+ vocab, seen = [], set()
+ for key in keys:
+ if key not in seen:
+ seen.add(key)
+ if key in (self if allow_inference else self.key_to_index):
+ vocab.append(key)
+
+ kv = KeyedVectors(self.vector_size, len(vocab), dtype=self.vectors.dtype)
+
+ for key in vocab: # produce and index vectors for all the given keys
+ weights = self[key]
+ _add_word_to_kv(kv, None, key, weights, len(vocab))
+ if copy_vecattrs:
+ for attr in self.expandos:
+ try:
+ kv.set_vecattr(key, attr, self.get_vecattr(key, attr))
+ except KeyError:
+ pass
+ return kv
+
def _upconvert_old_d2vkv(self):
"""Convert a deserialized older Doc2VecKeyedVectors instance to latest generic KeyedVectors"""
self.vocab = self.doctags
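A standalone sketch of the vectorized distance-matrix fill that replaces the per-pair Euclidean loop in `wmdistance` above (toy shapes and dictionary indices, not real vocabulary):

```python
import numpy as np
from scipy.spatial.distance import cdist

vocab_len = 5
v1 = np.random.rand(2, 20)   # vectors of the unique tokens in document 1
v2 = np.random.rand(3, 20)   # vectors of the unique tokens in document 2
doc1_indices = [0, 3]        # their ids in the shared dictionary
doc2_indices = [1, 2, 4]

distance_matrix = np.zeros((vocab_len, vocab_len), dtype=np.double)
# A single cdist call computes all pairwise distances; np.ix_ scatters the block into place.
distance_matrix[np.ix_(doc1_indices, doc2_indices)] = cdist(v1, v2)
print(distance_matrix.round(2))
```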
diff --git a/gensim/models/ldamodel.py b/gensim/models/ldamodel.py
index 439aeb91fb..6691ddcc31 100755
--- a/gensim/models/ldamodel.py
+++ b/gensim/models/ldamodel.py
@@ -4,7 +4,7 @@
# Copyright (C) 2011 Radim Rehurek
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
-"""Optimized `Latent Dirichlet Allocation (LDA) ` in Python.
+"""Optimized `Latent Dirichlet Allocation (LDA) `_ in Python.
For a faster implementation of LDA (parallelized for multicore machines), see also :mod:`gensim.models.ldamulticore`.
@@ -13,9 +13,13 @@
for online training.
The core estimation code is based on the `onlineldavb.py script
-`_, by `Hoffman, Blei, Bach:
-Online Learning for Latent Dirichlet Allocation, NIPS 2010
-`_.
+`_, by
+Matthew D. Hoffman, David M. Blei, Francis Bach:
+`'Online Learning for Latent Dirichlet Allocation', NIPS 2010`_.
+
+.. _'Online Learning for Latent Dirichlet Allocation', NIPS 2010: online-lda_
+.. _'Online Learning for LDA' by Hoffman et al.: online-lda_
+.. _online-lda: https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf
The algorithm:
@@ -198,8 +202,7 @@ def blend(self, rhot, other, targetsize=None):
The number of documents is stretched in both state objects, so that they are of comparable magnitude.
This procedure corresponds to the stochastic gradient update from
- `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
- `_, see equations (5) and (9).
+ `'Online Learning for LDA' by Hoffman et al.`_, see equations (5) and (9).
Parameters
----------
@@ -311,8 +314,7 @@ def load(cls, fname, *args, **kwargs):
class LdaModel(interfaces.TransformationABC, basemodel.BaseTopicModel):
- """Train and use Online Latent Dirichlet Allocation (OLDA) models as presented in
- `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation" `_.
+ """Train and use Online Latent Dirichlet Allocation model as presented in `'Online Learning for LDA' by Hoffman et al.`_
Examples
-------
@@ -374,30 +376,31 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
update_every : int, optional
Number of documents to be iterated through for each update.
Set to 0 for batch learning, > 1 for online iterative learning.
- alpha : {numpy.ndarray, str}, optional
- Can be set to an 1D array of length equal to the number of expected topics that expresses
- our a-priori belief for each topics' probability.
- Alternatively default prior selecting strategies can be employed by supplying a string:
+ alpha : {float, numpy.ndarray of float, list of float, str}, optional
+ A-priori belief on document-topic distribution, this can be:
+ * scalar for a symmetric prior over document-topic distribution,
+ * 1D array of length equal to num_topics to denote an asymmetric user defined prior for each topic.
- * 'symmetric': Default; uses a fixed symmetric prior per topic,
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
* 'asymmetric': Uses a fixed normalized asymmetric prior of `1.0 / (topic_index + sqrt(num_topics))`,
* 'auto': Learns an asymmetric prior from the corpus (not available if `distributed==True`).
- eta : {float, np.array, str}, optional
- A-priori belief on word probability, this can be:
+ eta : {float, numpy.ndarray of float, list of float, str}, optional
+ A-priori belief on topic-word distribution, this can be:
+ * scalar for a symmetric prior over topic-word distribution,
+ * 1D array of length equal to num_words to denote an asymmetric user defined prior for each word,
+ * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination.
- * scalar for a symmetric prior over topic/word probability,
- * vector of length num_words to denote an asymmetric user defined probability for each word,
- * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination,
- * the string 'auto' to learn the asymmetric prior from the data.
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
+ * 'auto': Learns an asymmetric prior from the corpus.
decay : float, optional
A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
- when each new document is examined. Corresponds to Kappa from
- `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
+ when each new document is examined.
+ Corresponds to :math:`\\kappa` from `'Online Learning for LDA' by Hoffman et al.`_
offset : float, optional
Hyper-parameter that controls how much we will slow down the first steps the first few iterations.
- Corresponds to Tau_0 from `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
+ Corresponds to :math:`\\tau_0` from `'Online Learning for LDA' by Hoffman et al.`_
eval_every : int, optional
Log perplexity is estimated every that many updates. Setting this to one slows down training by ~2x.
iterations : int, optional
@@ -409,7 +412,7 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
random_state : {np.random.RandomState, int}, optional
Either a randomState object or a seed to generate one. Useful for reproducibility.
ns_conf : dict of (str, object), optional
- Key word parameters propagated to :func:`gensim.utils.getNS` to get a Pyro4 Nameserved.
+ Key word parameters propagated to :func:`gensim.utils.getNS` to get a Pyro4 nameserver.
Only used if `distributed` is set to True.
minimum_phi_value : float, optional
if `per_word_topics` is True, this represents a lower bound on the term probabilities.
@@ -459,22 +462,16 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
self.callbacks = callbacks
self.alpha, self.optimize_alpha = self.init_dir_prior(alpha, 'alpha')
-
assert self.alpha.shape == (self.num_topics,), \
"Invalid alpha shape. Got shape %s, but expected (%d, )" % (str(self.alpha.shape), self.num_topics)
- if isinstance(eta, str):
- if eta == 'asymmetric':
- raise ValueError("The 'asymmetric' option cannot be used for eta")
-
self.eta, self.optimize_eta = self.init_dir_prior(eta, 'eta')
-
- self.random_state = utils.get_random_state(random_state)
-
assert self.eta.shape == (self.num_terms,) or self.eta.shape == (self.num_topics, self.num_terms), (
"Invalid eta shape. Got shape %s, but expected (%d, 1) or (%d, %d)" %
(str(self.eta.shape), self.num_terms, self.num_topics, self.num_terms))
+ self.random_state = utils.get_random_state(random_state)
+
# VB constants
self.iterations = iterations
self.gamma_threshold = gamma_threshold
@@ -531,24 +528,36 @@ def init_dir_prior(self, prior, name):
Parameters
----------
- prior : {str, list of float, numpy.ndarray of float, float}
- A-priori belief on word probability. If `name` == 'eta' then the prior can be:
+ prior : {float, numpy.ndarray of float, list of float, str}
+ A-priori belief on document-topic distribution. If `name` == 'alpha', then the prior can be:
+ * scalar for a symmetric prior over document-topic distribution,
+ * 1D array of length equal to num_topics to denote an asymmetric user defined prior for each topic.
- * scalar for a symmetric prior over topic/word probability,
- * vector of length num_words to denote an asymmetric user defined probability for each word,
- * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination,
- * the string 'auto' to learn the asymmetric prior from the data.
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
+ * 'asymmetric': Uses a fixed normalized asymmetric prior of `1.0 / (topic_index + sqrt(num_topics))`,
+ * 'auto': Learns an asymmetric prior from the corpus (not available if `distributed==True`).
- If `name` == 'alpha', then the prior can be:
+ A-priori belief on topic-word distribution. If `name` == 'eta' then the prior can be:
+ * scalar for a symmetric prior over topic-word distribution,
+ * 1D array of length equal to num_words to denote an asymmetric user defined prior for each word,
+ * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination.
- * an 1D array of length equal to the number of expected topics,
- * 'symmetric': Uses a fixed symmetric prior per topic,
- * 'asymmetric': Uses a fixed normalized asymmetric prior of `1.0 / (topic_index + sqrt(num_topics))`,
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
* 'auto': Learns an asymmetric prior from the corpus.
name : {'alpha', 'eta'}
Whether the `prior` is parameterized by the alpha vector (1 parameter per topic)
or by the eta (1 parameter per unique term in the vocabulary).
+ Returns
+ -------
+ init_prior: numpy.ndarray
+ Initialized Dirichlet prior:
+ If 'alpha' was provided as `name` the shape is (self.num_topics, ).
+ If 'eta' was provided as `name` the shape is (len(self.id2word), ).
+ is_auto: bool
+ Flag that shows if hyperparameter optimization should be used or not.
"""
if prior is None:
prior = 'symmetric'
@@ -570,6 +579,8 @@ def init_dir_prior(self, prior, name):
dtype=self.dtype, count=prior_shape,
)
elif prior == 'asymmetric':
+ if name == 'eta':
+ raise ValueError("The 'asymmetric' option cannot be used for eta")
init_prior = np.fromiter(
(1.0 / (i + np.sqrt(prior_shape)) for i in range(prior_shape)),
dtype=self.dtype, count=prior_shape,
@@ -632,7 +643,7 @@ def inference(self, chunk, collect_sstats=False):
"""Given a chunk of sparse document vectors, estimate gamma (parameters controlling the topic weights)
for each document in the chunk.
- This function does not modify the model The whole input chunk of document is assumed to fit in RAM;
+ This function does not modify the model. The whole input chunk of documents is assumed to fit in RAM;
chunking of a large corpus must be done earlier in the pipeline. Avoids computing the `phi` variational
parameter directly using the optimization presented in
`Lee, Seung: Algorithms for non-negative matrix factorization"
@@ -693,7 +704,7 @@ def inference(self, chunk, collect_sstats=False):
expElogthetad = expElogtheta[d, :]
expElogbetad = self.expElogbeta[:, ids]
- # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
+ # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_kw.
# phinorm is the normalizer.
# TODO treat zeros explicitly, instead of adding epsilon?
phinorm = np.dot(expElogthetad, expElogbetad) + epsilon
@@ -849,13 +860,15 @@ def update(self, corpus, chunksize=None, decay=None, offset=None,
Notes
-----
- This update also supports updating an already trained model with new documents; the two models are then merged
- in proportion to the number of old vs. new documents. This feature is still experimental for non-stationary
- input streams. For stationary input (no topic drift in new documents), on the other hand, this equals the
- online update of `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
- and is guaranteed to converge for any `decay` in (0.5, 1.0). Additionally, for smaller corpus sizes, an
- increasing `offset` may be beneficial (see Table 1 in the same paper).
+ This update also supports updating an already trained model (`self`) with new documents from `corpus`;
+ the two models are then merged in proportion to the number of old vs. new documents.
+ This feature is still experimental for non-stationary input streams.
+
+ For stationary input (no topic drift in new documents), on the other hand,
+ this equals the online update of `'Online Learning for LDA' by Hoffman et al.`_
+ and is guaranteed to converge for any `decay` in (0.5, 1].
+ Additionally, for smaller corpus sizes,
+ an increasing `offset` may be beneficial (see Table 1 in the same paper).
Parameters
----------
@@ -866,13 +879,11 @@ def update(self, corpus, chunksize=None, decay=None, offset=None,
Number of documents to be used in each training chunk.
decay : float, optional
A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
- when each new document is examined. Corresponds to Kappa from
- `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
+ when each new document is examined. Corresponds to :math:`\\kappa` from
+ `'Online Learning for LDA' by Hoffman et al.`_
offset : float, optional
Hyper-parameter that controls how much we will slow down the first steps the first few iterations.
- Corresponds to Tau_0 from `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
+ Corresponds to :math:`\\tau_0` from `'Online Learning for LDA' by Hoffman et al.`_
passes : int, optional
Number of passes through the corpus during training.
update_every : int, optional
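A minimal sketch of the `alpha`/`eta` prior options documented above (tiny corpus and settings chosen only for illustration):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.test.utils import common_texts

dictionary = Dictionary(common_texts)
corpus = [dictionary.doc2bow(text) for text in common_texts]

# 'asymmetric' is valid for alpha only; passing it for eta raises ValueError (see init_dir_prior above).
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, alpha='asymmetric', eta='auto', passes=2)
print(lda.print_topics())
```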
diff --git a/gensim/models/ldamulticore.py b/gensim/models/ldamulticore.py
index b65fcdd240..fdb5ce70a9 100644
--- a/gensim/models/ldamulticore.py
+++ b/gensim/models/ldamulticore.py
@@ -38,8 +38,13 @@
unseen documents. The model can also be updated with new documents for online training.
The core estimation code is based on the `onlineldavb.py script
-`_, by `Hoffman, Blei, Bach:
-Online Learning for Latent Dirichlet Allocation, NIPS 2010 `_.
+`_, by
+Matthew D. Hoffman, David M. Blei, Francis Bach:
+`'Online Learning for Latent Dirichlet Allocation', NIPS 2010`_.
+
+.. _'Online Learning for Latent Dirichlet Allocation', NIPS 2010: online-lda_
+.. _'Online Learning for LDA' by Hoffman et al.: online-lda_
+.. _online-lda: https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf
Usage examples
--------------
@@ -128,28 +133,30 @@ def __init__(self, corpus=None, num_topics=100, id2word=None, workers=None,
Number of documents to be used in each training chunk.
passes : int, optional
Number of passes through the corpus during training.
- alpha : {np.ndarray, str}, optional
- Can be set to an 1D array of length equal to the number of expected topics that expresses
- our a-priori belief for the each topics' probability.
- Alternatively default prior selecting strategies can be employed by supplying a string:
+ alpha : {float, numpy.ndarray of float, list of float, str}, optional
+ A-priori belief on document-topic distribution, this can be:
+ * scalar for a symmetric prior over document-topic distribution,
+ * 1D array of length equal to num_topics to denote an asymmetric user defined prior for each topic.
- * 'asymmetric': Uses a fixed normalized asymmetric prior of `1.0 / topicno`.
- eta : {float, np.array, str}, optional
- A-priori belief on word probability, this can be:
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
+ * 'asymmetric': Uses a fixed normalized asymmetric prior of `1.0 / (topic_index + sqrt(num_topics))`.
+ eta : {float, numpy.ndarray of float, list of float, str}, optional
+ A-priori belief on topic-word distribution, this can be:
+ * scalar for a symmetric prior over topic-word distribution,
+ * 1D array of length equal to num_words to denote an asymmetric user defined prior for each word,
+ * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination.
- * scalar for a symmetric prior over topic/word probability,
- * vector of length num_words to denote an asymmetric user defined probability for each word,
- * matrix of shape (num_topics, num_words) to assign a probability for each word-topic combination,
- * the string 'auto' to learn the asymmetric prior from the data.
+ Alternatively default prior selecting strategies can be employed by supplying a string:
+ * 'symmetric': (default) Uses a fixed symmetric prior of `1.0 / num_topics`,
+ * 'auto': Learns an asymmetric prior from the corpus.
decay : float, optional
A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
- when each new document is examined. Corresponds to Kappa from
- `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
+ when each new document is examined. Corresponds to :math:`\\kappa` from
+ `'Online Learning for LDA' by Hoffman et al.`_
offset : float, optional
Hyper-parameter that controls how much we will slow down the first steps the first few iterations.
- Corresponds to Tau_0 from `Matthew D. Hoffman, David M. Blei, Francis Bach:
- "Online Learning for Latent Dirichlet Allocation NIPS'10" `_.
+ Corresponds to :math:`\\tau_0` from `'Online Learning for LDA' by Hoffman et al.`_
eval_every : int, optional
Log perplexity is estimated every that many updates. Setting this to one slows down training by ~2x.
iterations : int, optional
@@ -174,7 +181,7 @@ def __init__(self, corpus=None, num_topics=100, id2word=None, workers=None,
self.batch = batch
if isinstance(alpha, str) and alpha == 'auto':
- raise NotImplementedError("auto-tuning alpha not implemented in multicore LDA; use plain LdaModel.")
+ raise NotImplementedError("auto-tuning alpha not implemented in LdaMulticore; use plain LdaModel.")
super(LdaMulticore, self).__init__(
corpus=corpus, num_topics=num_topics,
@@ -194,14 +201,13 @@ def update(self, corpus, chunks_as_numpy=False):
Notes
-----
- This update also supports updating an already trained model (`self`)
- with new documents from `corpus`; the two models are then merged in
- proportion to the number of old vs. new documents. This feature is still
- experimental for non-stationary input streams.
+ This update also supports updating an already trained model (`self`) with new documents from `corpus`;
+ the two models are then merged in proportion to the number of old vs. new documents.
+ This feature is still experimental for non-stationary input streams.
For stationary input (no topic drift in new documents), on the other hand,
- this equals the online update of Hoffman et al. and is guaranteed to
- converge for any `decay` in (0.5, 1.0>.
+ this equals the online update of `'Online Learning for LDA' by Hoffman et al.`_
+ and is guaranteed to converge for any `decay` in (0.5, 1].
Parameters
----------
diff --git a/gensim/models/ldaseqmodel.py b/gensim/models/ldaseqmodel.py
index 100aed748f..0f222c9c6c 100644
--- a/gensim/models/ldaseqmodel.py
+++ b/gensim/models/ldaseqmodel.py
@@ -5,8 +5,8 @@
# Based on Copyright (C) 2016 Radim Rehurek
"""Lda Sequence model, inspired by `David M. Blei, John D. Lafferty: "Dynamic Topic Models"
-`_ .
-The original C/C++ implementation can be found on `blei-lab/dtm `.
+`_.
+The original C/C++ implementation can be found on `blei-lab/dtm `_.
TODO: The next steps to take this forward would be:
diff --git a/gensim/models/lsi_dispatcher.py b/gensim/models/lsi_dispatcher.py
index b593e94cd3..2265dc7811 100755
--- a/gensim/models/lsi_dispatcher.py
+++ b/gensim/models/lsi_dispatcher.py
@@ -278,7 +278,11 @@ def exit(self):
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s', level=logging.INFO)
parser = argparse.ArgumentParser(description=__doc__[:-135], formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument(
- 'maxsize', type=int, help='Maximum number of jobs to be kept pre-fetched in the queue.', default=MAX_JOBS_QUEUE
+ 'maxsize',
+ nargs='?',
+ type=int,
+ help='Maximum number of jobs to be kept pre-fetched in the queue.',
+ default=MAX_JOBS_QUEUE,
)
args = parser.parse_args()
diff --git a/gensim/models/lsimodel.py b/gensim/models/lsimodel.py
index 97cc921f34..06055722e1 100644
--- a/gensim/models/lsimodel.py
+++ b/gensim/models/lsimodel.py
@@ -670,7 +670,9 @@ def show_topic(self, topicno, topn=10):
c = np.asarray(self.projection.u.T[topicno, :]).flatten()
norm = np.sqrt(np.sum(np.dot(c, c)))
most = matutils.argsort(np.abs(c), topn, reverse=True)
- return [(self.id2word[val], 1.0 * c[val] / norm) for val in most]
+
+ # Output only (word, score) pairs for `val`s that are within `self.id2word`. See #3090 for details.
+ return [(self.id2word[val], 1.0 * c[val] / norm) for val in most if val in self.id2word]
def show_topics(self, num_topics=-1, num_words=10, log=False, formatted=True):
"""Get the most significant topics.
diff --git a/gensim/models/nmf.py b/gensim/models/nmf.py
index 626d9ca16a..132a0f8774 100644
--- a/gensim/models/nmf.py
+++ b/gensim/models/nmf.py
@@ -38,6 +38,7 @@
.. sourcecode:: pycon
+ >>> from gensim.models import Nmf
>>> from gensim.test.utils import common_texts
>>> from gensim.corpora.dictionary import Dictionary
>>>
diff --git a/gensim/models/phrases.py b/gensim/models/phrases.py
index 8e20333b8f..c95682fa5e 100644
--- a/gensim/models/phrases.py
+++ b/gensim/models/phrases.py
@@ -364,7 +364,7 @@ def load(cls, *args, **kwargs):
}
elif isinstance(component, tuple): # 3.8 => 4.0: phrasegram keys are strings, not tuples with bytestrings
model.phrasegrams = {
- str(model.delimiter.join(component), encoding='utf8'): score
+ str(model.delimiter.join(key), encoding='utf8'): val
for key, val in phrasegrams.items()
}
except StopIteration:
@@ -391,15 +391,13 @@ def load(cls, *args, **kwargs):
raise ValueError(f'failed to load {cls.__name__} model, unknown scoring "{model.scoring}"')
# common_terms didn't exist pre-3.?, and was renamed to connector in 4.0.0.
- if hasattr(model, "common_terms"):
- model.connector_words = model.common_terms
- del model.common_terms
- else:
- logger.warning(
- 'older version of %s loaded without common_terms attribute, setting connector_words to an empty set',
- cls.__name__,
- )
- model.connector_words = frozenset()
+ if not hasattr(model, "connector_words"):
+ if hasattr(model, "common_terms"):
+ model.connector_words = model.common_terms
+ del model.common_terms
+ else:
+ logger.warning('loaded older version of %s, setting connector_words to an empty set', cls.__name__)
+ model.connector_words = frozenset()
if not hasattr(model, 'corpus_word_count'):
logger.warning('older version of %s loaded without corpus_word_count', cls.__name__)
diff --git a/gensim/models/word2vec.py b/gensim/models/word2vec.py
index 23c5b90429..356f711408 100755
--- a/gensim/models/word2vec.py
+++ b/gensim/models/word2vec.py
@@ -240,7 +240,7 @@ def __init__(
max_vocab_size=None, sample=1e-3, seed=1, workers=3, min_alpha=0.0001,
sg=0, hs=0, negative=5, ns_exponent=0.75, cbow_mean=1, hashfxn=hash, epochs=5, null_word=0,
trim_rule=None, sorted_vocab=1, batch_words=MAX_WORDS_IN_BATCH, compute_loss=False, callbacks=(),
- comment=None, max_final_vocab=None,
+ comment=None, max_final_vocab=None, shrink_windows=True,
):
"""Train, use and evaluate neural networks described in https://code.google.com/p/word2vec/.
@@ -345,6 +345,12 @@ def __init__(
:meth:`~gensim.models.word2vec.Word2Vec.get_latest_training_loss`.
callbacks : iterable of :class:`~gensim.models.callbacks.CallbackAny2Vec`, optional
Sequence of callbacks to be executed at specific stages during training.
+ shrink_windows : bool, optional
+ New in 4.1. Experimental.
+ If True, the effective window size is uniformly sampled from [1, `window`]
+ for each target word during training, to match the original word2vec algorithm's
+ approximate weighting of context words by distance. Otherwise, the effective
+ window size is always fixed to `window` words to either side.
Examples
--------
@@ -377,6 +383,7 @@ def __init__(
self.min_alpha = float(min_alpha)
self.window = int(window)
+ self.shrink_windows = bool(shrink_windows)
self.random = np.random.RandomState(seed)
self.hs = int(hs)
@@ -522,7 +529,7 @@ def build_vocab_from_freq(
# to be directly the raw vocab
raw_vocab = word_freq
logger.info(
- "collected %i different raw word, with total frequency of %i",
+ "collected %i unique word types, with total frequency of %i",
len(raw_vocab), sum(raw_vocab.values()),
)
@@ -553,7 +560,7 @@ def _scan_vocab(self, sentences, progress_per, trim_rule):
if sentence_no % progress_per == 0:
logger.info(
"PROGRESS: at sentence #%i, processed %i words, keeping %i word types",
- sentence_no, total_words, len(vocab)
+ sentence_no, total_words, len(vocab),
)
for word in sentence:
vocab[word] += 1
@@ -604,8 +611,8 @@ def prepare_vocab(
# set effective_min_count to min_count in case max_final_vocab isn't set
self.effective_min_count = min_count
- # if max_final_vocab is specified instead of min_count
- # pick a min_count which satisfies max_final_vocab as well as possible
+ # If max_final_vocab is specified instead of min_count,
+ # pick a min_count which satisfies max_final_vocab as well as possible.
if self.max_final_vocab is not None:
sorted_vocab = sorted(self.raw_vocab.keys(), key=lambda word: self.raw_vocab[word], reverse=True)
calc_min_count = 1
@@ -910,12 +917,12 @@ def _do_train_epoch(
if self.sg:
examples, tally, raw_tally = train_epoch_sg(
self, corpus_file, offset, cython_vocab, cur_epoch,
- total_examples, total_words, work, neu1, self.compute_loss,
+ total_examples, total_words, work, neu1, self.compute_loss
)
else:
examples, tally, raw_tally = train_epoch_cbow(
self, corpus_file, offset, cython_vocab, cur_epoch,
- total_examples, total_words, work, neu1, self.compute_loss,
+ total_examples, total_words, work, neu1, self.compute_loss
)
return examples, tally, raw_tally
@@ -1039,7 +1046,7 @@ def train(
msg=(
f"training model with {self.workers} workers on {len(self.wv)} vocabulary and "
f"{self.layer1_size} features, using sg={self.sg} hs={self.hs} sample={self.sample} "
- f"negative={self.negative} window={self.window}"
+ f"negative={self.negative} window={self.window} shrink_windows={self.shrink_windows}"
),
)
@@ -1799,8 +1806,9 @@ def predict_output_word(self, context_words_list, topn=10):
Parameters
----------
- context_words_list : list of str
- List of context words.
+ context_words_list : list of (str and/or int)
+ List of context words, which may be words themselves (str)
+ or their index in `self.wv.vectors` (int).
topn : int, optional
Return `topn` words and their probabilities.
@@ -1818,8 +1826,8 @@ def predict_output_word(self, context_words_list, topn=10):
if not hasattr(self.wv, 'vectors') or not hasattr(self, 'syn1neg'):
raise RuntimeError("Parameters required for predicting the output words not found.")
-
word2_indices = [self.wv.get_index(w) for w in context_words_list if w in self.wv]
+
if not word2_indices:
logger.warning("All the input context words are out-of-vocabulary for the current model.")
return None
@@ -1830,7 +1838,7 @@ def predict_output_word(self, context_words_list, topn=10):
# propagate hidden -> output and take softmax to get probabilities
prob_values = np.exp(np.dot(l1, self.syn1neg.T))
- prob_values /= sum(prob_values)
+ prob_values /= np.sum(prob_values)
top_indices = matutils.argsort(prob_values, topn=topn, reverse=True)
# returning the most probable output words with their probabilities
return [(self.wv.index_to_key[index1], prob_values[index1]) for index1 in top_indices]
@@ -1970,6 +1978,8 @@ def _load_specials(self, *args, **kwargs):
self.syn1 = self.syn1
del self.syn1
del self.trainables
+ if not hasattr(self, 'shrink_windows'):
+ self.shrink_windows = True
def get_latest_training_loss(self):
"""Get current value of the training loss.
diff --git a/gensim/models/word2vec_corpusfile.pyx b/gensim/models/word2vec_corpusfile.pyx
index 19b9b8c165..5d7f5004e4 100644
--- a/gensim/models/word2vec_corpusfile.pyx
+++ b/gensim/models/word2vec_corpusfile.pyx
@@ -186,7 +186,9 @@ cdef void prepare_c_structures_for_batch(
vector[vector[string]] &sentences, int sample, int hs, int window, long long *total_words,
int *effective_words, int *effective_sentences, unsigned long long *next_random,
cvocab_t *vocab, int *sentence_idx, np.uint32_t *indexes, int *codelens,
- np.uint8_t **codes, np.uint32_t **points, np.uint32_t *reduced_windows) nogil:
+ np.uint8_t **codes, np.uint32_t **points, np.uint32_t *reduced_windows,
+ int shrink_windows,
+ ) nogil:
cdef VocabItem word
cdef string token
cdef vector[string] sent
@@ -224,8 +226,12 @@ cdef void prepare_c_structures_for_batch(
break # TODO: log warning, tally overflow?
# precompute "reduced window" offsets in a single randint() call
- for i in range(effective_words[0]):
- reduced_windows[i] = random_int32(next_random) % window
+ if shrink_windows:
+ for i in range(effective_words[0]):
+ reduced_windows[i] = random_int32(next_random) % window
+ else:
+ for i in range(effective_words[0]):
+ reduced_windows[i] = 0
cdef REAL_t get_alpha(REAL_t alpha, REAL_t end_alpha, int cur_epoch, int num_epochs) nogil:
@@ -250,7 +256,7 @@ cdef REAL_t get_next_alpha(
def train_epoch_sg(model, corpus_file, offset, _cython_vocab, _cur_epoch, _expected_examples, _expected_words, _work,
- _neu1, compute_loss):
+ _neu1, compute_loss,):
"""Train Skipgram model for one epoch by training on an input stream. This function is used only in multistream mode.
Called internally from :meth:`~gensim.models.word2vec.Word2Vec.train`.
@@ -295,6 +301,7 @@ def train_epoch_sg(model, corpus_file, offset, _cython_vocab, _cur_epoch, _expec
cdef long long total_sentences = 0
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
+ cdef int shrink_windows = int(model.shrink_windows)
init_w2v_config(&c, model, _alpha, compute_loss, _work)
@@ -311,7 +318,7 @@ def train_epoch_sg(model, corpus_file, offset, _cython_vocab, _cur_epoch, _expec
prepare_c_structures_for_batch(
sentences, c.sample, c.hs, c.window, &total_words, &effective_words, &effective_sentences,
&c.next_random, vocab.get_vocab_ptr(), c.sentence_idx, c.indexes,
- c.codelens, c.codes, c.points, c.reduced_windows)
+ c.codelens, c.codes, c.points, c.reduced_windows, shrink_windows)
for sent_idx in range(effective_sentences):
idx_start = c.sentence_idx[sent_idx]
@@ -350,7 +357,7 @@ def train_epoch_sg(model, corpus_file, offset, _cython_vocab, _cur_epoch, _expec
def train_epoch_cbow(model, corpus_file, offset, _cython_vocab, _cur_epoch, _expected_examples, _expected_words, _work,
- _neu1, compute_loss):
+ _neu1, compute_loss,):
"""Train CBOW model for one epoch by training on an input stream. This function is used only in multistream mode.
Called internally from :meth:`~gensim.models.word2vec.Word2Vec.train`.
@@ -395,6 +402,7 @@ def train_epoch_cbow(model, corpus_file, offset, _cython_vocab, _cur_epoch, _exp
cdef long long total_sentences = 0
cdef long long total_effective_words = 0, total_words = 0
cdef int sent_idx, idx_start, idx_end
+ cdef int shrink_windows = int(model.shrink_windows)
init_w2v_config(&c, model, _alpha, compute_loss, _work, _neu1)
@@ -411,7 +419,7 @@ def train_epoch_cbow(model, corpus_file, offset, _cython_vocab, _cur_epoch, _exp
prepare_c_structures_for_batch(
sentences, c.sample, c.hs, c.window, &total_words, &effective_words,
&effective_sentences, &c.next_random, vocab.get_vocab_ptr(), c.sentence_idx,
- c.indexes, c.codelens, c.codes, c.points, c.reduced_windows)
+ c.indexes, c.codelens, c.codes, c.points, c.reduced_windows, shrink_windows)
for sent_idx in range(effective_sentences):
idx_start = c.sentence_idx[sent_idx]
diff --git a/gensim/models/word2vec_inner.pyx b/gensim/models/word2vec_inner.pyx
index 50bfc803bd..ffdc908b5c 100755
--- a/gensim/models/word2vec_inner.pyx
+++ b/gensim/models/word2vec_inner.pyx
@@ -112,12 +112,12 @@ cdef void w2v_fast_sentence_sg_hs(
"""
cdef long long a, b
- cdef long long row1 = word2_index * size, row2, sgn
+ cdef long long row1 = word2_index * size, row2, sgn
cdef REAL_t f, g, f_dot, lprob
memset(work, 0, size * cython.sizeof(REAL_t))
for b in range(codelen):
- row2 = word_point[b] * size
+ row2 = word_point[b] * size
f_dot = our_dot(&size, &syn0[row1], &ONE, &syn1[row2], &ONE)
if f_dot <= -MAX_EXP or f_dot >= MAX_EXP:
continue
@@ -206,7 +206,7 @@ cdef unsigned long long w2v_fast_sentence_sg_neg(
"""
cdef long long a
- cdef long long row1 = word2_index * size, row2
+ cdef long long row1 = word2_index * size, row2
cdef unsigned long long modulo = 281474976710655ULL
cdef REAL_t f, g, label, f_dot, log_e_f_dot
cdef np.uint32_t target_index
@@ -225,7 +225,7 @@ cdef unsigned long long w2v_fast_sentence_sg_neg(
continue
label = 0.0
- row2 = target_index * size
+ row2 = target_index * size
f_dot = our_dot(&size, &syn0[row1], &ONE, &syn1neg[row2], &ONE)
if f_dot <= -MAX_EXP or f_dot >= MAX_EXP:
continue
@@ -309,7 +309,7 @@ cdef void w2v_fast_sentence_cbow_hs(
continue
else:
count += ONEF
- our_saxpy(&size, &ONEF, &syn0[indexes[m] * size], &ONE, neu1, &ONE)
+ our_saxpy(&size, &ONEF, &syn0[indexes[m] * size], &ONE, neu1, &ONE)
if count > (0.5):
inv_count = ONEF/count
if cbow_mean:
@@ -317,7 +317,7 @@ cdef void w2v_fast_sentence_cbow_hs(
memset(work, 0, size * cython.sizeof(REAL_t))
for b in range(codelens[i]):
- row2 = word_point[b] * size
+ row2 = word_point[b] * size
f_dot = our_dot(&size, neu1, &ONE, &syn1[row2], &ONE)
if f_dot <= -MAX_EXP or f_dot >= MAX_EXP:
continue
@@ -342,7 +342,7 @@ cdef void w2v_fast_sentence_cbow_hs(
if m == i:
continue
else:
- our_saxpy(&size, &words_lockf[indexes[m] % lockf_len], work, &ONE, &syn0[indexes[m] * size], &ONE)
+ our_saxpy(&size, &words_lockf[indexes[m] % lockf_len], work, &ONE, &syn0[indexes[m] * size], &ONE)
cdef unsigned long long w2v_fast_sentence_cbow_neg(
@@ -416,7 +416,7 @@ cdef unsigned long long w2v_fast_sentence_cbow_neg(
continue
else:
count += ONEF
- our_saxpy(&size, &ONEF, &syn0[indexes[m] * size], &ONE, neu1, &ONE)
+ our_saxpy(&size, &ONEF, &syn0[indexes[m] * size], &ONE, neu1, &ONE)
if count > (0.5):
inv_count = ONEF/count
if cbow_mean:
@@ -435,7 +435,7 @@ cdef unsigned long long w2v_fast_sentence_cbow_neg(
continue
label = 0.0
- row2 = target_index * size
+ row2 = target_index * size
f_dot = our_dot(&size, neu1, &ONE, &syn1neg[row2], &ONE)
if f_dot <= -MAX_EXP or f_dot >= MAX_EXP:
continue
@@ -459,7 +459,7 @@ cdef unsigned long long w2v_fast_sentence_cbow_neg(
if m == i:
continue
else:
- our_saxpy(&size, &words_lockf[indexes[m] % lockf_len], work, &ONE, &syn0[indexes[m]*size], &ONE)
+ our_saxpy(&size, &words_lockf[indexes[m] % lockf_len], work, &ONE, &syn0[indexes[m] * size], &ONE)
return next_random
@@ -566,8 +566,12 @@ def train_batch_sg(model, sentences, alpha, _work, compute_loss):
break # TODO: log warning, tally overflow?
# precompute "reduced window" offsets in a single randint() call
- for i, item in enumerate(model.random.randint(0, c.window, effective_words)):
- c.reduced_windows[i] = item
+ if model.shrink_windows:
+ for i, item in enumerate(model.random.randint(0, c.window, effective_words)):
+ c.reduced_windows[i] = item
+ else:
+ for i in range(effective_words):
+ c.reduced_windows[i] = 0
# release GIL & train on all sentences
with nogil:
@@ -662,8 +666,12 @@ def train_batch_cbow(model, sentences, alpha, _work, _neu1, compute_loss):
break # TODO: log warning, tally overflow?
# precompute "reduced window" offsets in a single randint() call
- for i, item in enumerate(model.random.randint(0, c.window, effective_words)):
- c.reduced_windows[i] = item
+ if model.shrink_windows:
+ for i, item in enumerate(model.random.randint(0, c.window, effective_words)):
+ c.reduced_windows[i] = item
+ else:
+ for i in range(effective_words):
+ c.reduced_windows[i] = 0
# release GIL & train on all sentences
with nogil:
@@ -784,11 +792,11 @@ cdef void score_pair_sg_hs(
const np.uint32_t word2_index, REAL_t *work) nogil:
cdef long long b
- cdef long long row1 = word2_index * size, row2, sgn
+ cdef long long row1 = word2_index * size, row2, sgn
cdef REAL_t f
for b in range(codelen):
- row2 = word_point[b] * size
+ row2 = word_point[b] * size
f = our_dot(&size, &syn0[row1], &ONE, &syn1[row2], &ONE)
sgn = (-1)**word_code[b] # ch function: 0-> 1, 1 -> -1
f *= sgn
@@ -889,14 +897,14 @@ cdef void score_pair_cbow_hs(
continue
else:
count += ONEF
- our_saxpy(&size, &ONEF, &syn0[indexes[m] * size], &ONE, neu1, &ONE)
+ our_saxpy(&size, &ONEF, &syn0[indexes[m] * size], &ONE, neu1, &ONE)
if count > (0.5):
inv_count = ONEF/count
if cbow_mean:
sscal(&size, &inv_count, neu1, &ONE)
for b in range(codelens[i]):
- row2 = word_point[b] * size
+ row2 = word_point[b] * size
f = our_dot(&size, neu1, &ONE, &syn1[row2], &ONE)
sgn = (-1)**word_code[b] # ch function: 0-> 1, 1 -> -1
f *= sgn
diff --git a/gensim/parsing/__init__.py b/gensim/parsing/__init__.py
index 5bbf84239e..c608bf399b 100644
--- a/gensim/parsing/__init__.py
+++ b/gensim/parsing/__init__.py
@@ -1,8 +1,18 @@
"""This package contains functions to preprocess raw text"""
from .porter import PorterStemmer # noqa:F401
-from .preprocessing import (remove_stopwords, strip_punctuation, strip_punctuation2, # noqa:F401
- strip_tags, strip_short, strip_numeric,
- strip_non_alphanum, strip_multiple_whitespaces,
- split_alphanum, stem_text, preprocess_string,
- preprocess_documents, read_file, read_files)
+from .preprocessing import ( # noqa:F401
+ preprocess_documents,
+ preprocess_string,
+ read_file,
+ read_files,
+ remove_stopwords,
+ split_alphanum,
+ stem_text,
+ strip_multiple_whitespaces,
+ strip_non_alphanum,
+ strip_numeric,
+ strip_punctuation,
+ strip_short,
+ strip_tags,
+)
diff --git a/gensim/parsing/preprocessing.py b/gensim/parsing/preprocessing.py
index 777ca46e8e..bb96b1ec2f 100644
--- a/gensim/parsing/preprocessing.py
+++ b/gensim/parsing/preprocessing.py
@@ -68,17 +68,20 @@
RE_WHITESPACE = re.compile(r"(\s)+", re.UNICODE)
-def remove_stopwords(s):
+def remove_stopwords(s, stopwords=None):
"""Remove :const:`~gensim.parsing.preprocessing.STOPWORDS` from `s`.
Parameters
----------
s : str
+ stopwords : iterable of str, optional
+ Sequence of stopwords to remove.
+ If None, :const:`~gensim.parsing.preprocessing.STOPWORDS` is used.
Returns
-------
str
- Unicode string without :const:`~gensim.parsing.preprocessing.STOPWORDS`.
+ Unicode string without `stopwords`.
Examples
--------
@@ -90,11 +93,33 @@ def remove_stopwords(s):
"""
s = utils.to_unicode(s)
- return " ".join(w for w in s.split() if w not in STOPWORDS)
+ return " ".join(remove_stopword_tokens(s.split(), stopwords))
+
+
+def remove_stopword_tokens(tokens, stopwords=None):
+ """Remove stopword tokens using list `stopwords`.
+
+ Parameters
+ ----------
+ tokens : iterable of str
+ Sequence of tokens.
+ stopwords : iterable of str, optional
+ Sequence of stopwords to remove.
+ If None, :const:`~gensim.parsing.preprocessing.STOPWORDS` is used.
+
+ Returns
+ -------
+ list of str
+ List of tokens without `stopwords`.
+
+ """
+ if stopwords is None:
+ stopwords = STOPWORDS
+ return [token for token in tokens if token not in stopwords]
def strip_punctuation(s):
- """Replace punctuation characters with spaces in `s` using :const:`~gensim.parsing.preprocessing.RE_PUNCT`.
+ """Replace ASCII punctuation characters with spaces in `s` using :const:`~gensim.parsing.preprocessing.RE_PUNCT`.
Parameters
----------
@@ -115,12 +140,10 @@ def strip_punctuation(s):
"""
s = utils.to_unicode(s)
+ # For unicode enhancement options see https://github.com/RaRe-Technologies/gensim/issues/2962
return RE_PUNCT.sub(" ", s)
-strip_punctuation2 = strip_punctuation
-
-
def strip_tags(s):
"""Remove tags from `s` using :const:`~gensim.parsing.preprocessing.RE_TAGS`.
@@ -172,7 +195,26 @@ def strip_short(s, minsize=3):
"""
s = utils.to_unicode(s)
- return " ".join(e for e in s.split() if len(e) >= minsize)
+ return " ".join(remove_short_tokens(s.split(), minsize))
+
+
+def remove_short_tokens(tokens, minsize=3):
+ """Remove tokens shorter than `minsize` chars.
+
+ Parameters
+ ----------
+ tokens : iterable of str
+ Sequence of tokens.
+ minsize : int, optional
+ Minimal length of a token to keep (inclusive).
+
+ Returns
+ -------
+ list of str
+ List of tokens without short tokens.
+ """
+
+ return [token for token in tokens if len(token) >= minsize]
def strip_numeric(s):
@@ -310,6 +352,49 @@ def stem_text(text):
stem = stem_text
+def lower_to_unicode(text, encoding='utf8', errors='strict'):
+ """Lowercase `text` and convert to unicode, using :func:`gensim.utils.any2unicode`.
+
+ Parameters
+ ----------
+ text : str
+ Input text.
+ encoding : str, optional
+ Encoding that will be used for conversion.
+ errors : str, optional
+ Error handling behaviour, passed as the `errors` argument of the unicode conversion.
+
+ Returns
+ -------
+ str
+ Unicode version of `text`.
+
+ See Also
+ --------
+ :func:`gensim.utils.any2unicode`
+ Convert any string to unicode-string.
+
+ """
+ return utils.to_unicode(text.lower(), encoding, errors)
+
+
+def split_on_space(s):
+ """Split line by spaces, used in :class:`gensim.corpora.lowcorpus.LowCorpus`.
+
+ Parameters
+ ----------
+ s : str
+ Input line.
+
+ Returns
+ -------
+ list of str
+ List of tokens from `s`.
+
+ """
+ return [word for word in utils.to_unicode(s).strip().split(' ') if word]
+
+
DEFAULT_FILTERS = [
lambda x: x.lower(), strip_tags, strip_punctuation,
strip_multiple_whitespaces, strip_numeric,
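A minimal sketch of the new optional `stopwords` argument introduced above (illustrative stopword set):

```python
from gensim.parsing.preprocessing import remove_stopwords, remove_stopword_tokens

print(remove_stopwords("the quick brown fox jumps over the lazy dog", stopwords={'the', 'over'}))
# 'quick brown fox jumps lazy dog'

print(remove_stopword_tokens(["the", "quick", "brown", "fox"], stopwords={'the'}))
# ['quick', 'brown', 'fox']
```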
diff --git a/gensim/scripts/make_wiki_online.py b/gensim/scripts/make_wiki_online.py
index 0ec9704724..e5ee11283a 100755
--- a/gensim/scripts/make_wiki_online.py
+++ b/gensim/scripts/make_wiki_online.py
@@ -16,7 +16,7 @@
* `OUTPUT_PREFIX_wordids.txt`: mapping between words and their integer ids
* `OUTPUT_PREFIX_bow.mm`: bag-of-words (word counts) representation, in
- Matrix Matrix format
+ Matrix Market format
* `OUTPUT_PREFIX_tfidf.mm`: TF-IDF representation
* `OUTPUT_PREFIX.tfidf_model`: TF-IDF model dump
@@ -35,7 +35,6 @@
python -m gensim.scripts.make_wikicorpus ~/gensim/results/enwiki-latest-pages-articles.xml.bz2 ~/gensim/results/wiki
"""
-
import logging
import os.path
import sys
@@ -43,13 +42,11 @@
from gensim.corpora import Dictionary, HashDictionary, MmCorpus, WikiCorpus
from gensim.models import TfidfModel
-
# Wiki is first scanned for all distinct word types (~7M). The types that
# appear in more than 10% of articles are removed and from the rest, the
# DEFAULT_DICT_SIZE most frequent types are kept.
DEFAULT_DICT_SIZE = 100000
-
if __name__ == '__main__':
program = os.path.basename(sys.argv[0])
logger = logging.getLogger(program)
diff --git a/gensim/scripts/make_wikicorpus.py b/gensim/scripts/make_wikicorpus.py
index 66056cf10b..b76c8c2bd5 100755
--- a/gensim/scripts/make_wikicorpus.py
+++ b/gensim/scripts/make_wikicorpus.py
@@ -18,7 +18,7 @@
* `OUTPUT_PREFIX_bow.mm`: bag-of-words (word counts) representation in Matrix Market format
* `OUTPUT_PREFIX_bow.mm.index`: index for `OUTPUT_PREFIX_bow.mm`
* `OUTPUT_PREFIX_bow.mm.metadata.cpickle`: titles of documents
-* `OUTPUT_PREFIX_tfidf.mm`: TF-IDF representation in Matix Market format
+* `OUTPUT_PREFIX_tfidf.mm`: TF-IDF representation in Matrix Market format
* `OUTPUT_PREFIX_tfidf.mm.index`: index for `OUTPUT_PREFIX_tfidf.mm`
* `OUTPUT_PREFIX.tfidf_model`: TF-IDF model
diff --git a/gensim/similarities/__init__.py b/gensim/similarities/__init__.py
index 00464d29c3..3fdf94bd3a 100644
--- a/gensim/similarities/__init__.py
+++ b/gensim/similarities/__init__.py
@@ -3,19 +3,7 @@
"""
# bring classes directly into package namespace, to save some typing
-import warnings
-try:
- import Levenshtein # noqa:F401
-except ImportError:
- msg = (
- "The gensim.similarities.levenshtein submodule is disabled, because the optional "
- "Levenshtein package is unavailable. "
- "Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning."
- )
- warnings.warn(msg)
- LevenshteinSimilarityIndex = None
-else:
- from .levenshtein import LevenshteinSimilarityIndex # noqa:F401
+from .levenshtein import LevenshteinSimilarityIndex # noqa:F401
from .docsim import ( # noqa:F401
Similarity,
MatrixSimilarity,
diff --git a/gensim/similarities/docsim.py b/gensim/similarities/docsim.py
old mode 100755
new mode 100644
index 4dd0528f50..db66db67e0
--- a/gensim/similarities/docsim.py
+++ b/gensim/similarities/docsim.py
@@ -27,6 +27,7 @@
.. sourcecode:: pycon
+ >>> from gensim.similarities import Similarity
>>> from gensim.test.utils import common_corpus, common_dictionary, get_tmpfile
>>>
>>> index_tmpfile = get_tmpfile("index")
@@ -937,7 +938,7 @@ def __init__(self, corpus, similarity_matrix, num_best=None, chunksize=256, norm
"""
self.similarity_matrix = similarity_matrix
- self.corpus = corpus
+ self.corpus = list(corpus)
self.num_best = num_best
self.chunksize = chunksize
self.normalized = normalized
@@ -1175,7 +1176,7 @@ def __init__(self, corpus, num_features=None, num_terms=None, num_docs=None, num
matutils.unitvec(v)) for v in corpus)
self.index = matutils.corpus2csc(
corpus, num_terms=num_terms, num_docs=num_docs, num_nnz=num_nnz,
- dtype=dtype, printprogress=10000
+ dtype=dtype, printprogress=10000,
).T
# convert to Compressed Sparse Row for efficient row slicing and multiplications
diff --git a/gensim/similarities/fastss.pyx b/gensim/similarities/fastss.pyx
new file mode 100644
index 0000000000..a4e8cba54b
--- /dev/null
+++ b/gensim/similarities/fastss.pyx
@@ -0,0 +1,199 @@
+#!/usr/bin/env cython
+# cython: language_level=3
+# cython: boundscheck=False
+# cython: wraparound=False
+# coding: utf-8
+#
+# Copyright (C) 2021 Radim Rehurek
+# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
+# Code adapted from TinyFastSS (public domain), https://github.com/fujimotos/TinyFastSS
+
+"""Fast approximate string similarity search using the FastSS algorithm."""
+
+import itertools
+
+from cpython.ref cimport PyObject
+
+
+DEF MAX_WORD_LENGTH = 1000 # Maximum allowed word length, in characters. Must fit in the C `int` range.
+
+
+cdef extern from *:
+ """
+ #define WIDTH int
+ #define MAX_WORD_LENGTH 1000
+
+ int ceditdist(PyObject * s1, PyObject * s2, WIDTH maximum) {
+ WIDTH row1[MAX_WORD_LENGTH + 1];
+ WIDTH row2[MAX_WORD_LENGTH + 1];
+ WIDTH * CYTHON_RESTRICT pos_new;
+ WIDTH * CYTHON_RESTRICT pos_old;
+ int row_flip = 1; /* Does pos_new represent row1 or row2? */
+ int kind1 = PyUnicode_KIND(s1); /* How many bytes per unicode codepoint? */
+ int kind2 = PyUnicode_KIND(s2);
+
+ WIDTH len_s1 = (WIDTH)PyUnicode_GET_LENGTH(s1);
+ WIDTH len_s2 = (WIDTH)PyUnicode_GET_LENGTH(s2);
+ if (len_s1 > len_s2) {
+ PyObject * tmp = s1; s1 = s2; s2 = tmp;
+ const WIDTH tmpi = len_s1; len_s1 = len_s2; len_s2 = tmpi;
+ }
+ if (len_s2 - len_s1 > maximum) return maximum + 1;
+ if (len_s2 > MAX_WORD_LENGTH) return -1;
+ void * s1_data = PyUnicode_DATA(s1);
+ void * s2_data = PyUnicode_DATA(s2);
+
+ for (WIDTH tmpi = 0; tmpi <= len_s1; tmpi++) row2[tmpi] = tmpi;
+
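+    /* Standard two-row dynamic programming over the edit-distance matrix:
+       pos_old holds the distances for the previous character of s2, pos_new is the row
+       being filled in for the current character; the two rows alternate via row_flip. */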
+ for (WIDTH i2 = 0; i2 < len_s2; i2++) {
+ int all_bad = i2 >= maximum;
+ const Py_UCS4 ch = PyUnicode_READ(kind2, s2_data, i2);
+ row_flip = 1 - row_flip;
+ if (row_flip) {
+ pos_new = row2; pos_old = row1;
+ } else {
+ pos_new = row1; pos_old = row2;
+ }
+ *pos_new = i2 + 1;
+
+ for (WIDTH i1 = 0; i1 < len_s1; i1++) {
+ WIDTH val = *(pos_old++);
+ if (ch != PyUnicode_READ(kind1, s1_data, i1)) {
+ const WIDTH _val1 = *pos_old;
+ const WIDTH _val2 = *pos_new;
+ if (_val1 < val) val = _val1;
+ if (_val2 < val) val = _val2;
+ val += 1;
+ }
+ *(++pos_new) = val;
+ if (all_bad && val <= maximum) all_bad = 0;
+ }
+ if (all_bad) return maximum + 1;
+ }
+
+ return row_flip ? row2[len_s1] : row1[len_s1];
+ }
+ """
+ int ceditdist(PyObject *s1, PyObject *s2, int maximum)
+
+
+def editdist(s1: str, s2: str, max_dist=None):
+ """
+ Return the Levenshtein distance between two strings.
+
+ Use `max_dist` to control the maximum distance you care about. If the actual distance is larger
+ than `max_dist`, editdist will return early, with the value `max_dist+1`.
+ This is a performance optimization – for example if anything above distance 2 is uninteresting
+ to your application, call editdist with `max_dist=2` and ignore any return value greater than 2.
+
+ Leave `max_dist=None` (default) to always return the full Levenshtein distance (slower).
+
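+    For instance (a quick sketch of the behaviour described above):
+
+    >>> editdist("kitten", "sitting")
+    3
+    >>> editdist("kitten", "sitting", max_dist=1)  # early exit: returns max_dist + 1
+    2
+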
+ """
+ if s1 == s2:
+ return 0
+
+ result = ceditdist(s1, s2, MAX_WORD_LENGTH if max_dist is None else int(max_dist))
+ if result >= 0:
+ return result
+ elif result == -1:
+ raise ValueError(f"editdist doesn't support strings longer than {MAX_WORD_LENGTH} characters")
+ else:
+ raise ValueError(f"editdist returned an error: {result}")
+
+
+def indexkeys(word, max_dist):
+ """Return the set of index keys ("variants") of a word.
+
+ >>> indexkeys('aiu', 1)
+ {'aiu', 'iu', 'au', 'ai'}
+ """
+ res = set()
+ wordlen = len(word)
+ limit = min(max_dist, wordlen) + 1
+
+ for dist in range(limit):
+ for variant in itertools.combinations(word, wordlen - dist):
+ res.add(''.join(variant))
+
+ return res
+
+
+def set2bytes(s):
+ """Serialize a set of unicode strings into bytes.
+
+    >>> set2bytes({u'a', u'b', u'c'})
+ b'a\x00b\x00c'
+ """
+ return '\x00'.join(s).encode('utf8')
+
+
+def bytes2set(b):
+ """Deserialize bytes into a set of unicode strings.
+
+ >>> bytes2set(b'a\x00b\x00c')
+ {u'a', u'b', u'c'}
+ """
+ return set(b.decode('utf8').split('\x00')) if b else set()
+
+
+class FastSS:
+
+ def __init__(self, words=None, max_dist=2):
+ """
+ Create a FastSS index. The index will contain encoded variants of all
+ indexed words, allowing fast "fuzzy string similarity" queries.
+
+ max_dist: maximum allowed edit distance of an indexed word to a query word. Keep
+ max_dist<=3 for sane performance.
+
+ """
+ self.db = {}
+ self.max_dist = max_dist
+ if words:
+ for word in words:
+ self.add(word)
+
+ def __str__(self):
+ return "%s" % (self.__class__.__name__, self.max_dist, len(self.db), )
+
+ def __contains__(self, word):
+ bkey = word.encode('utf8')
+ if bkey in self.db:
+ return word in bytes2set(self.db[bkey])
+ return False
+
+ def add(self, word):
+ """Add a string to the index."""
+ for key in indexkeys(word, self.max_dist):
+ bkey = key.encode('utf8')
+ wordset = {word}
+
+ if bkey in self.db:
+ wordset |= bytes2set(self.db[bkey])
+
+ self.db[bkey] = set2bytes(wordset)
+
+ def query(self, word, max_dist=None):
+ """Find all words from the index that are within max_dist of `word`."""
+ if max_dist is None:
+ max_dist = self.max_dist
+ if max_dist > self.max_dist:
+ raise ValueError(
+ f"query max_dist={max_dist} cannot be greater than max_dist={self.max_dist} from the constructor"
+ )
+
+ res = {d: [] for d in range(max_dist + 1)}
+ cands = set()
+
+ for key in indexkeys(word, max_dist):
+ bkey = key.encode('utf8')
+
+ if bkey in self.db:
+ cands.update(bytes2set(self.db[bkey]))
+
+ for cand in cands:
+ dist = editdist(word, cand, max_dist=max_dist)
+ if dist <= max_dist:
+ res[dist].append(cand)
+
+ return res
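+
+
+# A minimal usage sketch, assuming the class above (the word list is made up; ordering
+# within each distance bucket follows set iteration order, so it may vary):
+#
+#   >>> fastss = FastSS(words=["help", "hell", "hello"], max_dist=2)
+#   >>> "hello" in fastss
+#   True
+#   >>> fastss.query("hel")
+#   {0: [], 1: ['hell', 'help'], 2: ['hello']}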
diff --git a/gensim/similarities/levenshtein.py b/gensim/similarities/levenshtein.py
index ca39e68dd0..51da72c065 100644
--- a/gensim/similarities/levenshtein.py
+++ b/gensim/similarities/levenshtein.py
@@ -5,149 +5,101 @@
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
"""
-This module provides a namespace for functions that use the Levenshtein distance.
+This module allows fast fuzzy search between strings, using kNN queries with Levenshtein similarity.
"""
-import itertools
import logging
-from math import floor
from gensim.similarities.termsim import TermSimilarityIndex
+from gensim import utils
+try:
+ from gensim.similarities.fastss import FastSS, editdist # noqa:F401
+except ImportError:
+ raise utils.NO_CYTHON
-logger = logging.getLogger(__name__)
-
-
-def levdist(t1, t2, max_distance=float("inf")):
- """Get the Levenshtein distance between two terms.
-
- Return the Levenshtein distance between two terms. The distance is a
- number between <1.0, inf>, higher is less similar.
-
- Parameters
- ----------
- t1 : {bytes, str, unicode}
- The first compared term.
- t2 : {bytes, str, unicode}
- The second compared term.
- max_distance : {int, float}, optional
- If you don't care about distances larger than a known threshold, a more
- efficient code path can be taken. For terms that are clearly "too far
- apart", we will not compute the distance exactly, but we will return
- `max(len(t1), len(t2))` more quickly, meaning "more than
- `max_distance`".
- Default: always compute distance exactly, no threshold clipping.
-
- Returns
- -------
- int
- The Levenshtein distance between `t1` and `t2`.
-
- """
- import Levenshtein
-
- distance = Levenshtein.distance(t1, t2)
- if distance > max_distance:
- return max(len(t1), len(t2))
- return distance
-
-
-def levsim(t1, t2, alpha=1.8, beta=5.0, min_similarity=0.0):
- """Get the Levenshtein similarity between two terms.
-
- Return the Levenshtein similarity between two terms. The similarity is a
- number between <0.0, 1.0>, higher is more similar.
-
- Parameters
- ----------
- t1 : {bytes, str, unicode}
- The first compared term.
- t2 : {bytes, str, unicode}
- The second compared term.
- alpha : float, optional
- The multiplicative factor alpha defined by Charlet and Damnati (2017).
- beta : float, optional
- The exponential factor beta defined by Charlet and Damnati (2017).
- min_similarity : {int, float}, optional
- If you don't care about similarities smaller than a known threshold, a
- more efficient code path can be taken. For terms that are clearly "too
- far apart", we will not compute the distance exactly, but we will
- return zero more quickly, meaning "less than `min_similarity`".
- Default: always compute similarity exactly, no threshold clipping.
-
- Returns
- -------
- float
- The Levenshtein similarity between `t1` and `t2`.
-
- Notes
- -----
- This notion of Levenshtein similarity was first defined in section 2.2 of
- `Delphine Charlet and Geraldine Damnati, "SimBow at SemEval-2017 Task 3:
- Soft-Cosine Semantic Similarity between Questions for Community Question
- Answering", 2017 `_.
-
- """
- assert alpha >= 0
- assert beta >= 0
-
- max_lengths = max(len(t1), len(t2))
- if max_lengths == 0:
- return 1.0
- min_similarity = float(max(min(min_similarity, 1.0), 0.0))
- max_distance = int(floor(max_lengths * (1 - (min_similarity / alpha) ** (1 / beta))))
- distance = levdist(t1, t2, max_distance)
- similarity = alpha * (1 - distance * 1.0 / max_lengths)**beta
- return similarity
+logger = logging.getLogger(__name__)
class LevenshteinSimilarityIndex(TermSimilarityIndex):
- """
- Computes Levenshtein similarities between terms and retrieves most similar
- terms for a given term.
+ r"""
+ Retrieve the most similar terms from a static set of terms ("dictionary")
+ given a query term, using Levenshtein similarity.
- Notes
- -----
- This is a naive implementation that iteratively computes pointwise Levenshtein similarities
- between individual terms. Using this implementation to compute the similarity of all terms in
- real-world dictionaries such as the English Wikipedia will take years.
+ "Levenshtein similarity" is a modification of the Levenshtein (edit) distance,
+ defined in [charletetal17]_.
+
+ This implementation uses the FastSS neighbourhood algorithm
+    for fast k-nearest-neighbor (kNN) retrieval.
Parameters
----------
dictionary : :class:`~gensim.corpora.dictionary.Dictionary`
A dictionary that specifies the considered terms.
alpha : float, optional
- The multiplicative factor alpha defined by Charlet and Damnati (2017).
+ Multiplicative factor `alpha` for the Levenshtein similarity. See [charletetal17]_.
beta : float, optional
- The exponential factor beta defined by Charlet and Damnati (2017).
- threshold : float, optional
- Only terms more similar than `threshold` are considered when retrieving
- the most similar terms for a given term.
+ The exponential factor `beta` for the Levenshtein similarity. See [charletetal17]_.
+ max_distance : int, optional
+        Do not consider terms with a Levenshtein distance larger than this as
+        "similar". This is a performance setting: keep the value below 3
+        for reasonable retrieval performance. Default is 2.
See Also
--------
- :func:`gensim.similarities.levenshtein.levsim`
- The Levenshtein similarity.
+ :class:`~gensim.similarities.termsim.WordEmbeddingSimilarityIndex`
+ Retrieve most similar terms for a given term using the cosine
+ similarity over word embeddings.
:class:`~gensim.similarities.termsim.SparseTermSimilarityMatrix`
Build a term similarity matrix and compute the Soft Cosine Measure.
+ References
+ ----------
+
+ .. [charletetal17] Delphine Charlet and Geraldine Damnati, "SimBow at SemEval-2017 Task 3:
+ Soft-Cosine Semantic Similarity between Questions for Community Question Answering", 2017,
+ https://www.aclweb.org/anthology/S17-2051/.
+
"""
- def __init__(self, dictionary, alpha=1.8, beta=5.0, threshold=0.0):
+ def __init__(self, dictionary, alpha=1.8, beta=5.0, max_distance=2):
self.dictionary = dictionary
self.alpha = alpha
self.beta = beta
- self.threshold = threshold
+ self.max_distance = max_distance
+ logger.info("creating FastSS index from %s", dictionary)
+ self.index = FastSS(words=self.dictionary.values(), max_dist=max_distance)
super(LevenshteinSimilarityIndex, self).__init__()
+ def levsim(self, t1, t2, distance):
+ """Calculate the Levenshtein similarity between two terms given their Levenshtein distance."""
+ max_lengths = max(len(t1), len(t2)) or 1
+ return self.alpha * (1.0 - distance * 1.0 / max_lengths)**self.beta
+
def most_similar(self, t1, topn=10):
- similarities = (
- (levsim(t1, t2, self.alpha, self.beta, self.threshold), t2)
- for t2 in self.dictionary.values()
- if t1 != t2
- )
- most_similar = (
- (t2, similarity)
- for (similarity, t2) in sorted(similarities, reverse=True)
- if similarity > 0
- )
- return itertools.islice(most_similar, int(topn))
+ """kNN fuzzy search: find the `topn` most similar terms from `self.dictionary` to `t1`."""
+ result = {} # map of {dictionary term => its levenshtein similarity to t1}
+ if self.max_distance > 0:
+ effective_topn = topn + 1 if t1 in self.dictionary.token2id else topn
+ effective_topn = min(len(self.dictionary), effective_topn)
+
+ # Implement a "distance backoff" algorithm:
+ # Start with max_distance=1, for performance. And if that doesn't return enough results,
+ # continue with max_distance=2 etc, all the way until self.max_distance which
+ # is a hard cutoff.
+ # At that point stop searching, even if we don't have topn results yet.
+ #
+ # We use the backoff algo to speed up queries for short terms. These return enough results already
+ # with max_distance=1.
+ #
+ # See the discussion at https://github.com/RaRe-Technologies/gensim/pull/3146
+ for distance in range(1, self.max_distance + 1):
+ for t2 in self.index.query(t1, distance).get(distance, []):
+ if t1 == t2:
+ continue
+ similarity = self.levsim(t1, t2, distance)
+ if similarity > 0:
+ result[t2] = similarity
+ if len(result) >= effective_topn:
+ break
+
+ return sorted(result.items(), key=lambda x: (-x[1], x[0]))[:topn]
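+
+
+# A minimal usage sketch, assuming a toy dictionary (the terms are made up for illustration):
+#
+#   >>> from gensim.corpora import Dictionary
+#   >>> dictionary = Dictionary([["graph", "trees", "minors", "survey"]])
+#   >>> index = LevenshteinSimilarityIndex(dictionary, max_distance=2)
+#   >>> index.most_similar("tree", topn=2)  # fuzzy kNN search over the dictionary terms
+#   [('trees', 0.59)]  # approximate similarity value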
diff --git a/gensim/similarities/termsim.py b/gensim/similarities/termsim.py
index 8f39e9b36c..d2a3f6728f 100644
--- a/gensim/similarities/termsim.py
+++ b/gensim/similarities/termsim.py
@@ -99,10 +99,30 @@ def most_similar(self, t1, topn=10):
class WordEmbeddingSimilarityIndex(TermSimilarityIndex):
"""
- Use objects of this class to:
+ Computes cosine similarities between word embeddings and retrieves most
+ similar terms for a given term.
- 1) Compute cosine similarities between word embeddings.
- 2) Retrieve the closest word embeddings (by cosine similarity) to a given word embedding.
+ Notes
+ -----
+ By fitting the word embeddings to a vocabulary that you will be using, you
+ can eliminate all out-of-vocabulary (OOV) words that you would otherwise
+ receive from the `most_similar` method. In subword models such as fastText,
+ this procedure will also infer word-vectors for words from your vocabulary
+ that previously had no word-vector.
+
+ >>> from gensim.test.utils import common_texts, datapath
+ >>> from gensim.corpora import Dictionary
+ >>> from gensim.models import FastText
+ >>> from gensim.models.word2vec import LineSentence
+ >>> from gensim.similarities import WordEmbeddingSimilarityIndex
+ >>>
+ >>> model = FastText(common_texts, vector_size=20, min_count=1) # train word-vectors on a corpus
+ >>> different_corpus = LineSentence(datapath('lee_background.cor'))
+ >>> dictionary = Dictionary(different_corpus) # construct a vocabulary on a different corpus
+ >>> words = [word for word, count in dictionary.most_common()]
+ >>> word_vectors = model.wv.vectors_for_all(words) # remove OOV word-vectors and infer word-vectors for new words
+ >>> assert len(dictionary) == len(word_vectors) # all words from our vocabulary received their word-vectors
+ >>> termsim_index = WordEmbeddingSimilarityIndex(word_vectors)
Parameters
----------
@@ -114,13 +134,16 @@ class WordEmbeddingSimilarityIndex(TermSimilarityIndex):
exponent : float, optional
Take the word embedding similarities larger than `threshold` to the power of `exponent`.
kwargs : dict or None
- A dict with keyword arguments that will be passed to the `keyedvectors.most_similar` method
+ A dict with keyword arguments that will be passed to the
+ :meth:`~gensim.models.keyedvectors.KeyedVectors.most_similar` method
when retrieving the word embeddings closest to a given word embedding.
See Also
--------
+ :class:`~gensim.similarities.levenshtein.LevenshteinSimilarityIndex`
+ Retrieve most similar terms for a given term using the Levenshtein distance.
:class:`~gensim.similarities.termsim.SparseTermSimilarityMatrix`
- A sparse term similarity matrix built using a term similarity index.
+ Build a term similarity matrix and compute the Soft Cosine Measure.
"""
def __init__(self, keyedvectors, threshold=0.0, exponent=2.0, kwargs=None):
@@ -195,12 +218,12 @@ def tfidf_sort_key(term_index):
return (-term_idf, term_index)
if tfidf is None:
- logger.info("iterating over columns in dictionary order")
columns = sorted(dictionary.keys())
+ logger.info("iterating over %i columns in dictionary order", len(columns))
else:
assert max(tfidf.idfs) == matrix_order - 1
- logger.info("iterating over columns in tf-idf order")
columns = sorted(tfidf.idfs.keys(), key=tfidf_sort_key)
+ logger.info("iterating over %i columns in tf-idf order", len(columns))
nonzero_counter_dtype = _shortest_uint_dtype(nonzero_limit)
@@ -403,25 +426,29 @@ class SparseTermSimilarityMatrix(SaveLoad):
Examples
--------
- >>> from gensim.test.utils import common_texts
+ >>> from gensim.test.utils import common_texts as corpus, datapath
>>> from gensim.corpora import Dictionary
-    >>> from gensim.models import Word2Vec
+    >>> from gensim.models import TfidfModel, Word2Vec
>>> from gensim.similarities import SoftCosineSimilarity, SparseTermSimilarityMatrix, WordEmbeddingSimilarityIndex
>>> from gensim.similarities.index import AnnoyIndexer
- >>> from scikits.sparse.cholmod import cholesky
>>>
- >>> model = Word2Vec(common_texts, vector_size=20, min_count=1) # train word-vectors
- >>> annoy = AnnoyIndexer(model, num_trees=2) # use annoy for faster word similarity lookups
- >>> termsim_index = WordEmbeddingSimilarityIndex(model.wv, kwargs={'indexer': annoy})
- >>> dictionary = Dictionary(common_texts)
- >>> bow_corpus = [dictionary.doc2bow(document) for document in common_texts]
- >>> similarity_matrix = SparseTermSimilarityMatrix(termsim_index, dictionary, symmetric=True, dominant=True)
- >>> docsim_index = SoftCosineSimilarity(bow_corpus, similarity_matrix, num_best=10)
+ >>> model_corpus_file = datapath('lee_background.cor')
+ >>> model = Word2Vec(corpus_file=model_corpus_file, vector_size=20, min_count=1) # train word-vectors
>>>
- >>> query = 'graph trees computer'.split() # make a query
- >>> sims = docsim_index[dictionary.doc2bow(query)] # calculate similarity of query to each doc from bow_corpus
+ >>> dictionary = Dictionary(corpus)
+ >>> tfidf = TfidfModel(dictionary=dictionary)
+ >>> words = [word for word, count in dictionary.most_common()]
+ >>> word_vectors = model.wv.vectors_for_all(words, allow_inference=False) # produce vectors for words in corpus
+ >>>
+ >>> indexer = AnnoyIndexer(word_vectors, num_trees=2) # use Annoy for faster word similarity lookups
+ >>> termsim_index = WordEmbeddingSimilarityIndex(word_vectors, kwargs={'indexer': indexer})
+ >>> similarity_matrix = SparseTermSimilarityMatrix(termsim_index, dictionary, tfidf) # compute word similarities
>>>
- >>> word_embeddings = cholesky(similarity_matrix.matrix).L() # obtain word embeddings from similarity matrix
+    >>> tfidf_corpus = tfidf[[dictionary.doc2bow(document) for document in corpus]]
+ >>> docsim_index = SoftCosineSimilarity(tfidf_corpus, similarity_matrix, num_best=10) # index tfidf_corpus
+ >>>
+ >>> query = 'graph trees computer'.split() # make a query
+ >>> sims = docsim_index[dictionary.doc2bow(query)] # find the ten closest documents from tfidf_corpus
Check out `the Gallery `_
for more examples.
diff --git a/gensim/test/test_coherencemodel.py b/gensim/test/test_coherencemodel.py
new file mode 100644
index 0000000000..9396fe5ac0
--- /dev/null
+++ b/gensim/test/test_coherencemodel.py
@@ -0,0 +1,300 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2010 Radim Rehurek
+# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
+
+"""
+Automated tests for checking transformation algorithms (the models package).
+"""
+
+import logging
+import unittest
+import multiprocessing as mp
+from functools import partial
+
+import numpy as np
+from gensim.matutils import argsort
+from gensim.models.coherencemodel import CoherenceModel, BOOLEAN_DOCUMENT_BASED
+from gensim.models.ldamodel import LdaModel
+from gensim.test.utils import get_tmpfile, common_texts, common_dictionary, common_corpus
+
+
+class TestCoherenceModel(unittest.TestCase):
+
+ # set up vars used in testing ("Deerwester" from the web tutorial)
+ texts = common_texts
+ dictionary = common_dictionary
+ corpus = common_corpus
+
+ def setUp(self):
+        # Suppose the topics below are what two different LdaModels came up with.
+        # `topics1` is clearly better, as it cleanly separates system-human
+        # interaction from graphs. Hence both coherence measures for `topics1` should be
+        # greater.
+ self.topics1 = [
+ ['human', 'computer', 'system', 'interface'],
+ ['graph', 'minors', 'trees', 'eps']
+ ]
+ self.topics2 = [
+ ['user', 'graph', 'minors', 'system'],
+ ['time', 'graph', 'survey', 'minors']
+ ]
+ self.topics3 = [
+ ['token', 'computer', 'system', 'interface'],
+ ['graph', 'minors', 'trees', 'eps']
+ ]
+        # using this list, the model should be unable to interpret the topics
+        # as either a list of tokens or a list of ids
+ self.topics4 = [
+ ['not a token', 'not an id', 'tests using', "this list"],
+ ['should raise', 'an error', 'to pass', 'correctly']
+ ]
+ self.topicIds1 = []
+ for topic in self.topics1:
+ self.topicIds1.append([self.dictionary.token2id[token] for token in topic])
+
+ self.ldamodel = LdaModel(
+ corpus=self.corpus, id2word=self.dictionary, num_topics=2,
+ passes=0, iterations=0
+ )
+
+ def check_coherence_measure(self, coherence):
+ """Check provided topic coherence algorithm on given topics"""
+ if coherence in BOOLEAN_DOCUMENT_BASED:
+ kwargs = dict(corpus=self.corpus, dictionary=self.dictionary, coherence=coherence)
+ else:
+ kwargs = dict(texts=self.texts, dictionary=self.dictionary, coherence=coherence)
+
+ cm1 = CoherenceModel(topics=self.topics1, **kwargs)
+ cm2 = CoherenceModel(topics=self.topics2, **kwargs)
+ cm3 = CoherenceModel(topics=self.topics3, **kwargs)
+ cm4 = CoherenceModel(topics=self.topicIds1, **kwargs)
+ self.assertRaises(ValueError, lambda: CoherenceModel(topics=self.topics4, **kwargs))
+ self.assertEqual(cm1.get_coherence(), cm4.get_coherence())
+ self.assertIsInstance(cm3.get_coherence(), np.double)
+ self.assertGreater(cm1.get_coherence(), cm2.get_coherence())
+
+ def testUMass(self):
+ """Test U_Mass topic coherence algorithm on given topics"""
+ self.check_coherence_measure('u_mass')
+
+ def testCv(self):
+ """Test C_v topic coherence algorithm on given topics"""
+ self.check_coherence_measure('c_v')
+
+ def testCuci(self):
+ """Test C_uci topic coherence algorithm on given topics"""
+ self.check_coherence_measure('c_uci')
+
+ def testCnpmi(self):
+ """Test C_npmi topic coherence algorithm on given topics"""
+ self.check_coherence_measure('c_npmi')
+
+ def testUMassLdaModel(self):
+ """Perform sanity check to see if u_mass coherence works with LDA Model"""
+ # Note that this is just a sanity check because LDA does not guarantee a better coherence
+ # value on the topics if iterations are increased. This can be seen here:
+ # https://gist.github.com/dsquareindia/60fd9ab65b673711c3fa00509287ddde
+ CoherenceModel(model=self.ldamodel, corpus=self.corpus, coherence='u_mass')
+
+ def testCvLdaModel(self):
+ """Perform sanity check to see if c_v coherence works with LDA Model"""
+ CoherenceModel(model=self.ldamodel, texts=self.texts, coherence='c_v')
+
+ def testCw2vLdaModel(self):
+ """Perform sanity check to see if c_w2v coherence works with LDAModel."""
+ CoherenceModel(model=self.ldamodel, texts=self.texts, coherence='c_w2v')
+
+ def testCuciLdaModel(self):
+ """Perform sanity check to see if c_uci coherence works with LDA Model"""
+ CoherenceModel(model=self.ldamodel, texts=self.texts, coherence='c_uci')
+
+ def testCnpmiLdaModel(self):
+ """Perform sanity check to see if c_npmi coherence works with LDA Model"""
+ CoherenceModel(model=self.ldamodel, texts=self.texts, coherence='c_npmi')
+
+ def testErrors(self):
+ """Test if errors are raised on bad input"""
+ # not providing dictionary
+ self.assertRaises(
+ ValueError, CoherenceModel, topics=self.topics1, corpus=self.corpus,
+ coherence='u_mass'
+ )
+ # not providing texts for c_v and instead providing corpus
+ self.assertRaises(
+ ValueError, CoherenceModel, topics=self.topics1, corpus=self.corpus,
+ dictionary=self.dictionary, coherence='c_v'
+ )
+ # not providing corpus or texts for u_mass
+ self.assertRaises(
+ ValueError, CoherenceModel, topics=self.topics1, dictionary=self.dictionary,
+ coherence='u_mass'
+ )
+
+ def testProcesses(self):
+ get_model = partial(CoherenceModel,
+ topics=self.topics1, corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass'
+ )
+
+ model, used_cpus = get_model(), mp.cpu_count() - 1
+ self.assertEqual(model.processes, used_cpus)
+ for p in range(-2, 1):
+ self.assertEqual(get_model(processes=p).processes, used_cpus)
+
+ for p in range(1, 4):
+ self.assertEqual(get_model(processes=p).processes, p)
+
+ def testPersistence(self):
+ fname = get_tmpfile('gensim_models_coherence.tst')
+ model = CoherenceModel(
+ topics=self.topics1, corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass'
+ )
+ model.save(fname)
+ model2 = CoherenceModel.load(fname)
+ self.assertTrue(model.get_coherence() == model2.get_coherence())
+
+ def testPersistenceCompressed(self):
+ fname = get_tmpfile('gensim_models_coherence.tst.gz')
+ model = CoherenceModel(
+ topics=self.topics1, corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass'
+ )
+ model.save(fname)
+ model2 = CoherenceModel.load(fname)
+ self.assertTrue(model.get_coherence() == model2.get_coherence())
+
+ def testPersistenceAfterProbabilityEstimationUsingCorpus(self):
+ fname = get_tmpfile('gensim_similarities.tst.pkl')
+ model = CoherenceModel(
+ topics=self.topics1, corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass'
+ )
+ model.estimate_probabilities()
+ model.save(fname)
+ model2 = CoherenceModel.load(fname)
+ self.assertIsNotNone(model2._accumulator)
+ self.assertTrue(model.get_coherence() == model2.get_coherence())
+
+ def testPersistenceAfterProbabilityEstimationUsingTexts(self):
+ fname = get_tmpfile('gensim_similarities.tst.pkl')
+ model = CoherenceModel(
+ topics=self.topics1, texts=self.texts, dictionary=self.dictionary, coherence='c_v'
+ )
+ model.estimate_probabilities()
+ model.save(fname)
+ model2 = CoherenceModel.load(fname)
+ self.assertIsNotNone(model2._accumulator)
+ self.assertTrue(model.get_coherence() == model2.get_coherence())
+
+ def testAccumulatorCachingSameSizeTopics(self):
+ kwargs = dict(corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass')
+ cm1 = CoherenceModel(topics=self.topics1, **kwargs)
+ cm1.estimate_probabilities()
+ accumulator = cm1._accumulator
+ self.assertIsNotNone(accumulator)
+ cm1.topics = self.topics1
+ self.assertEqual(accumulator, cm1._accumulator)
+ cm1.topics = self.topics2
+ self.assertEqual(None, cm1._accumulator)
+
+ def testAccumulatorCachingTopicSubsets(self):
+ kwargs = dict(corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass')
+ cm1 = CoherenceModel(topics=self.topics1, **kwargs)
+ cm1.estimate_probabilities()
+ accumulator = cm1._accumulator
+ self.assertIsNotNone(accumulator)
+ cm1.topics = [t[:2] for t in self.topics1]
+ self.assertEqual(accumulator, cm1._accumulator)
+ cm1.topics = self.topics1
+ self.assertEqual(accumulator, cm1._accumulator)
+
+ def testAccumulatorCachingWithModelSetting(self):
+ kwargs = dict(corpus=self.corpus, dictionary=self.dictionary, coherence='u_mass')
+ cm1 = CoherenceModel(topics=self.topics1, **kwargs)
+ cm1.estimate_probabilities()
+ self.assertIsNotNone(cm1._accumulator)
+ cm1.model = self.ldamodel
+ topics = []
+ for topic in self.ldamodel.state.get_lambda():
+ bestn = argsort(topic, topn=cm1.topn, reverse=True)
+ topics.append(bestn)
+ self.assertTrue(np.array_equal(topics, cm1.topics))
+ self.assertIsNone(cm1._accumulator)
+
+ def testAccumulatorCachingWithTopnSettingGivenTopics(self):
+ kwargs = dict(corpus=self.corpus, dictionary=self.dictionary, topn=5, coherence='u_mass')
+ cm1 = CoherenceModel(topics=self.topics1, **kwargs)
+ cm1.estimate_probabilities()
+ self.assertIsNotNone(cm1._accumulator)
+
+ accumulator = cm1._accumulator
+ topics_before = cm1._topics
+ cm1.topn = 3
+ self.assertEqual(accumulator, cm1._accumulator)
+ self.assertEqual(3, len(cm1.topics[0]))
+ self.assertEqual(topics_before, cm1._topics)
+
+ # Topics should not have been truncated, so topn settings below 5 should work
+ cm1.topn = 4
+ self.assertEqual(accumulator, cm1._accumulator)
+ self.assertEqual(4, len(cm1.topics[0]))
+ self.assertEqual(topics_before, cm1._topics)
+
+ with self.assertRaises(ValueError):
+ cm1.topn = 6 # can't expand topics any further without model
+
+ def testAccumulatorCachingWithTopnSettingGivenModel(self):
+ kwargs = dict(corpus=self.corpus, dictionary=self.dictionary, topn=5, coherence='u_mass')
+ cm1 = CoherenceModel(model=self.ldamodel, **kwargs)
+ cm1.estimate_probabilities()
+ self.assertIsNotNone(cm1._accumulator)
+
+ accumulator = cm1._accumulator
+ topics_before = cm1._topics
+ cm1.topn = 3
+ self.assertEqual(accumulator, cm1._accumulator)
+ self.assertEqual(3, len(cm1.topics[0]))
+ self.assertEqual(topics_before, cm1._topics)
+
+ cm1.topn = 6 # should be able to expand given the model
+ self.assertEqual(6, len(cm1.topics[0]))
+
+ def testCompareCoherenceForTopics(self):
+ topics = [self.topics1, self.topics2]
+ cm = CoherenceModel.for_topics(
+ topics, dictionary=self.dictionary, texts=self.texts, coherence='c_v')
+ self.assertIsNotNone(cm._accumulator)
+
+ # Accumulator should have all relevant IDs.
+ for topic_list in topics:
+ cm.topics = topic_list
+ self.assertIsNotNone(cm._accumulator)
+
+ (coherence_topics1, coherence1), (coherence_topics2, coherence2) = \
+ cm.compare_model_topics(topics)
+
+ self.assertAlmostEqual(np.mean(coherence_topics1), coherence1, 4)
+ self.assertAlmostEqual(np.mean(coherence_topics2), coherence2, 4)
+ self.assertGreater(coherence1, coherence2)
+
+ def testCompareCoherenceForModels(self):
+ models = [self.ldamodel, self.ldamodel]
+ cm = CoherenceModel.for_models(
+ models, dictionary=self.dictionary, texts=self.texts, coherence='c_v')
+ self.assertIsNotNone(cm._accumulator)
+
+ # Accumulator should have all relevant IDs.
+ for model in models:
+ cm.model = model
+ self.assertIsNotNone(cm._accumulator)
+
+ (coherence_topics1, coherence1), (coherence_topics2, coherence2) = \
+ cm.compare_models(models)
+
+ self.assertAlmostEqual(np.mean(coherence_topics1), coherence1, 4)
+ self.assertAlmostEqual(np.mean(coherence_topics2), coherence2, 4)
+ self.assertAlmostEqual(coherence1, coherence2, places=4)
+
+
+if __name__ == '__main__':
+ logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.DEBUG)
+ unittest.main()
diff --git a/gensim/test/test_corpora_dictionary.py b/gensim/test/test_corpora_dictionary.py
index 7d6659e896..b1f4d4f33f 100644
--- a/gensim/test/test_corpora_dictionary.py
+++ b/gensim/test/test_corpora_dictionary.py
@@ -359,6 +359,18 @@ def test_patch_with_special_tokens(self):
self.assertNotIn((1, 1), d.doc2bow(corpus_with_special_tokens[0]))
self.assertIn((1, 1), d.doc2bow(corpus_with_special_tokens[1]))
+ def test_most_common_with_n(self):
+ texts = [['human', 'human', 'human', 'computer', 'computer', 'interface', 'interface']]
+ d = Dictionary(texts)
+ expected = [('human', 3), ('computer', 2)]
+ assert d.most_common(n=2) == expected
+
+ def test_most_common_without_n(self):
+ texts = [['human', 'human', 'human', 'computer', 'computer', 'interface', 'interface']]
+ d = Dictionary(texts)
+ expected = [('human', 3), ('computer', 2), ('interface', 2)]
+ assert d.most_common(n=None) == expected
+
# endclass TestDictionary
diff --git a/gensim/test/test_data/ensemblelda b/gensim/test/test_data/ensemblelda
new file mode 100644
index 0000000000..cf396af29c
Binary files /dev/null and b/gensim/test/test_data/ensemblelda differ
diff --git a/gensim/test/test_doc2vec.py b/gensim/test/test_doc2vec.py
index 60c9158744..c8b7516c99 100644
--- a/gensim/test/test_doc2vec.py
+++ b/gensim/test/test_doc2vec.py
@@ -589,6 +589,44 @@ def test_dmc_neg_fromfile(self):
)
self.model_sanity(model)
+ def test_dmm_fixedwindowsize(self):
+ """Test DMM doc2vec training with fixed window size."""
+ model = doc2vec.Doc2Vec(
+ list_corpus, vector_size=24,
+ dm=1, dm_mean=1, window=4, shrink_windows=False,
+ hs=0, negative=10, alpha=0.05, min_count=2, epochs=20
+ )
+ self.model_sanity(model)
+
+ def test_dmm_fixedwindowsize_fromfile(self):
+ """Test DMM doc2vec training with fixed window size, from file."""
+ with temporary_file(get_tmpfile('gensim_doc2vec.tst')) as corpus_file:
+ save_lee_corpus_as_line_sentence(corpus_file)
+ model = doc2vec.Doc2Vec(
+ corpus_file=corpus_file, vector_size=24,
+ dm=1, dm_mean=1, window=4, shrink_windows=False,
+ hs=0, negative=10, alpha=0.05, min_count=2, epochs=20
+ )
+ self.model_sanity(model)
+
+ def test_dbow_fixedwindowsize(self):
+ """Test DBOW doc2vec training with fixed window size."""
+ model = doc2vec.Doc2Vec(
+ list_corpus, vector_size=16, shrink_windows=False,
+ dm=0, hs=0, negative=5, min_count=2, epochs=20
+ )
+ self.model_sanity(model)
+
+ def test_dbow_fixedwindowsize_fromfile(self):
+ """Test DBOW doc2vec training with fixed window size, from file."""
+ with temporary_file(get_tmpfile('gensim_doc2vec.tst')) as corpus_file:
+ save_lee_corpus_as_line_sentence(corpus_file)
+ model = doc2vec.Doc2Vec(
+ corpus_file=corpus_file, vector_size=16, shrink_windows=False,
+ dm=0, hs=0, negative=5, min_count=2, epochs=20
+ )
+ self.model_sanity(model)
+
def test_parallel(self):
"""Test doc2vec parallel training with more than default 3 threads."""
# repeat the ~300 doc (~60000 word) Lee corpus to get 6000 docs (~1.2M words)
@@ -718,8 +756,8 @@ def __str__(self):
def epochs(self):
return self.models[0].epochs
- def infer_vector(self, document, alpha=None, min_alpha=None, epochs=None, steps=None):
- return np.concatenate([model.infer_vector(document, alpha, min_alpha, epochs, steps) for model in self.models])
+ def infer_vector(self, document, alpha=None, min_alpha=None, epochs=None):
+ return np.concatenate([model.infer_vector(document, alpha, min_alpha, epochs) for model in self.models])
def train(self, *ignore_args, **ignore_kwargs):
pass # train subcomponents individually
diff --git a/gensim/test/test_ensemblelda.py b/gensim/test/test_ensemblelda.py
new file mode 100644
index 0000000000..ad574108f9
--- /dev/null
+++ b/gensim/test/test_ensemblelda.py
@@ -0,0 +1,448 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Author: Tobias B
+
+"""
+Automated tests for checking the EnsembleLda Class
+"""
+
+import os
+import logging
+import unittest
+
+import numpy as np
+from copy import deepcopy
+
+import pytest
+
+from gensim.models import EnsembleLda, LdaMulticore, LdaModel
+from gensim.test.utils import datapath, get_tmpfile, common_corpus, common_dictionary
+
+NUM_TOPICS = 2
+NUM_MODELS = 4
+PASSES = 50
+RANDOM_STATE = 0
+
+# windows tests fail due to the required assertion precision being too high
+RTOL = 1e-04 if os.name == 'nt' else 1e-05
+
+
+class TestEnsembleLda(unittest.TestCase):
+ def get_elda(self):
+ return EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, num_topics=NUM_TOPICS,
+ passes=PASSES, num_models=NUM_MODELS, random_state=RANDOM_STATE,
+ topic_model_class=LdaModel,
+ )
+
+ def get_elda_mem_unfriendly(self):
+ return EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, num_topics=NUM_TOPICS,
+ passes=PASSES, num_models=NUM_MODELS, random_state=RANDOM_STATE,
+ memory_friendly_ttda=False, topic_model_class=LdaModel,
+ )
+
+ def assert_ttda_is_valid(self, elda):
+ """Check that ttda has one or more topic and that term probabilities add to one."""
+ assert len(elda.ttda) > 0
+ sum_over_terms = elda.ttda.sum(axis=1)
+ expected_sum_over_terms = np.ones(len(elda.ttda)).astype(np.float32)
+ np.testing.assert_allclose(sum_over_terms, expected_sum_over_terms, rtol=1e-04)
+
+ def test_elda(self):
+ elda = self.get_elda()
+
+        # given that the random_state doesn't change, this setup should
+        # always detect 2 topics.
+ assert elda.stable_topics.shape[1] == len(common_dictionary)
+ assert len(elda.ttda) == NUM_MODELS * NUM_TOPICS
+ self.assert_ttda_is_valid(elda)
+
+ def test_backwards_compatibility_with_persisted_model(self):
+ elda = self.get_elda()
+
+ # compare with a pre-trained reference model
+ loaded_elda = EnsembleLda.load(datapath('ensemblelda'))
+ np.testing.assert_allclose(elda.ttda, loaded_elda.ttda, rtol=RTOL)
+ atol = loaded_elda.asymmetric_distance_matrix.max() * 1e-05
+ np.testing.assert_allclose(
+ elda.asymmetric_distance_matrix,
+ loaded_elda.asymmetric_distance_matrix, atol=atol,
+ )
+
+ def test_recluster(self):
+        # the following test is quite specific to the current implementation and not part of any api,
+        # but it makes improving those sections of the code easier as long as sorted_clusters and the
+        # cluster_model results are supposed to stay the same. This test may eventually be deprecated.
+
+ elda = EnsembleLda.load(datapath('ensemblelda'))
+ loaded_cluster_model_results = deepcopy(elda.cluster_model.results)
+ loaded_valid_clusters = deepcopy(elda.valid_clusters)
+ loaded_stable_topics = deepcopy(elda.get_topics())
+
+ # continue training with the distance matrix of the pretrained reference and see if
+ # the generated clusters match.
+ elda.asymmetric_distance_matrix_outdated = True
+ elda.recluster()
+
+ self.assert_clustering_results_equal(elda.cluster_model.results, loaded_cluster_model_results)
+ assert elda.valid_clusters == loaded_valid_clusters
+ np.testing.assert_allclose(elda.get_topics(), loaded_stable_topics, rtol=RTOL)
+
+ def test_recluster_does_nothing_when_stable_topics_already_found(self):
+ elda = self.get_elda()
+
+ # reclustering shouldn't change anything without
+ # added models or different parameters
+ elda.recluster()
+
+ assert elda.stable_topics.shape[1] == len(common_dictionary)
+ assert len(elda.ttda) == NUM_MODELS * NUM_TOPICS
+ self.assert_ttda_is_valid(elda)
+
+ def test_not_trained_given_zero_passes(self):
+ elda = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, num_topics=NUM_TOPICS,
+ passes=0, num_models=NUM_MODELS, random_state=RANDOM_STATE,
+ )
+ assert len(elda.ttda) == 0
+
+ def test_not_trained_given_no_corpus(self):
+ elda = EnsembleLda(
+ id2word=common_dictionary, num_topics=NUM_TOPICS,
+ passes=PASSES, num_models=NUM_MODELS, random_state=RANDOM_STATE,
+ )
+ assert len(elda.ttda) == 0
+
+ def test_not_trained_given_zero_iterations(self):
+ elda = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, num_topics=NUM_TOPICS,
+ iterations=0, num_models=NUM_MODELS, random_state=RANDOM_STATE,
+ )
+ assert len(elda.ttda) == 0
+
+ def test_not_trained_given_zero_models(self):
+ elda = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, num_topics=NUM_TOPICS,
+ passes=PASSES, num_models=0, random_state=RANDOM_STATE
+ )
+ assert len(elda.ttda) == 0
+
+ def test_mem_unfriendly(self):
+        # elda and elda_mem_unfriendly should have topics that are
+        # the same up to floating-point variations caused by the two different
+        # implementations
+
+ elda = self.get_elda()
+ elda_mem_unfriendly = self.get_elda_mem_unfriendly()
+
+ assert len(elda_mem_unfriendly.tms) == NUM_MODELS
+ np.testing.assert_allclose(elda.ttda, elda_mem_unfriendly.ttda, rtol=RTOL)
+ np.testing.assert_allclose(elda.get_topics(), elda_mem_unfriendly.get_topics(), rtol=RTOL)
+ self.assert_ttda_is_valid(elda_mem_unfriendly)
+
+ def test_generate_gensim_representation(self):
+ elda = self.get_elda()
+
+ gensim_model = elda.generate_gensim_representation()
+ topics = gensim_model.get_topics()
+ np.testing.assert_allclose(elda.get_topics(), topics, rtol=RTOL)
+
+ def assert_clustering_results_equal(self, clustering_results_1, clustering_results_2):
+ """Assert important attributes of the cluster results"""
+ np.testing.assert_array_equal(
+ [element.label for element in clustering_results_1],
+ [element.label for element in clustering_results_2],
+ )
+ np.testing.assert_array_equal(
+ [element.is_core for element in clustering_results_1],
+ [element.is_core for element in clustering_results_2],
+ )
+
+ def test_persisting(self):
+ elda = self.get_elda()
+ elda_mem_unfriendly = self.get_elda_mem_unfriendly()
+
+ fname = get_tmpfile('gensim_models_ensemblelda')
+ elda.save(fname)
+ loaded_elda = EnsembleLda.load(fname)
+        # storing the ensemble without memory_friendly_ttda
+ elda_mem_unfriendly.save(fname)
+ loaded_elda_mem_unfriendly = EnsembleLda.load(fname)
+
+ # topic_model_class will be lazy loaded and should be None first
+ assert loaded_elda.topic_model_class is None
+
+ # was it stored and loaded correctly?
+ # memory friendly.
+ loaded_elda_representation = loaded_elda.generate_gensim_representation()
+
+ # generating the representation also lazily loads the topic_model_class
+ assert loaded_elda.topic_model_class == LdaModel
+
+ topics = loaded_elda_representation.get_topics()
+ ttda = loaded_elda.ttda
+ amatrix = loaded_elda.asymmetric_distance_matrix
+ np.testing.assert_allclose(elda.get_topics(), topics, rtol=RTOL)
+ np.testing.assert_allclose(elda.ttda, ttda, rtol=RTOL)
+ np.testing.assert_allclose(elda.asymmetric_distance_matrix, amatrix, rtol=RTOL)
+
+ expected_clustering_results = elda.cluster_model.results
+ loaded_clustering_results = loaded_elda.cluster_model.results
+
+ self.assert_clustering_results_equal(expected_clustering_results, loaded_clustering_results)
+
+ # memory unfriendly
+ loaded_elda_mem_unfriendly_representation = loaded_elda_mem_unfriendly.generate_gensim_representation()
+ topics = loaded_elda_mem_unfriendly_representation.get_topics()
+ np.testing.assert_allclose(elda.get_topics(), topics, rtol=RTOL)
+
+ def test_multiprocessing(self):
+ # same configuration
+ random_state = RANDOM_STATE
+
+ # use 3 processes for the ensemble and the distance,
+ # so that the 4 models and 8 topics cannot be distributed
+ # to each worker evenly
+ workers = 3
+
+        # memory friendly: contains a list of topic-word distributions
+ elda = self.get_elda()
+ elda_multiprocessing = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, topic_model_class=LdaModel,
+ num_topics=NUM_TOPICS, passes=PASSES, num_models=NUM_MODELS,
+ random_state=random_state, ensemble_workers=workers, distance_workers=workers,
+ )
+
+        # memory unfriendly: contains a list of models
+ elda_mem_unfriendly = self.get_elda_mem_unfriendly()
+ elda_multiprocessing_mem_unfriendly = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary, topic_model_class=LdaModel,
+ num_topics=NUM_TOPICS, passes=PASSES, num_models=NUM_MODELS,
+ random_state=random_state, ensemble_workers=workers, distance_workers=workers,
+ memory_friendly_ttda=False,
+ )
+
+ np.testing.assert_allclose(
+ elda.get_topics(),
+ elda_multiprocessing.get_topics(),
+ rtol=RTOL
+ )
+ np.testing.assert_allclose(
+ elda_mem_unfriendly.get_topics(),
+ elda_multiprocessing_mem_unfriendly.get_topics(),
+ rtol=RTOL
+ )
+
+ def test_add_models_to_empty(self):
+ elda = self.get_elda()
+
+ ensemble = EnsembleLda(id2word=common_dictionary, num_models=0)
+ ensemble.add_model(elda.ttda[0:1])
+ ensemble.add_model(elda.ttda[1:])
+ ensemble.recluster()
+ np.testing.assert_allclose(ensemble.get_topics(), elda.get_topics(), rtol=RTOL)
+
+ # persisting an ensemble that is entirely built from existing ttdas
+ fname = get_tmpfile('gensim_models_ensemblelda')
+ ensemble.save(fname)
+ loaded_ensemble = EnsembleLda.load(fname)
+ np.testing.assert_allclose(loaded_ensemble.get_topics(), elda.get_topics(), rtol=RTOL)
+ self.test_inference(loaded_ensemble)
+
+ def test_add_models(self):
+        # make sure counts and sizes after adding are correct:
+        # create new models and add other models to them.
+
+        # there are a ton of possible configurations for the first parameter,
+        # try them all
+
+ # quickly train something that can be used for counting results
+ num_new_models = 3
+ num_new_topics = 3
+
+ # 1. memory friendly
+ base_elda = self.get_elda()
+ cumulative_elda = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary,
+ num_topics=num_new_topics, passes=1, num_models=num_new_models,
+ iterations=1, random_state=RANDOM_STATE, topic_model_class=LdaMulticore,
+ workers=3, ensemble_workers=2,
+ )
+
+ # 1.1 ttda
+ num_topics_before_add_model = len(cumulative_elda.ttda)
+ num_models_before_add_model = cumulative_elda.num_models
+ cumulative_elda.add_model(base_elda.ttda)
+ assert len(cumulative_elda.ttda) == num_topics_before_add_model + len(base_elda.ttda)
+ assert cumulative_elda.num_models == num_models_before_add_model + 1 # defaults to 1 for one ttda matrix
+
+ # 1.2 an ensemble
+ num_topics_before_add_model = len(cumulative_elda.ttda)
+ num_models_before_add_model = cumulative_elda.num_models
+ cumulative_elda.add_model(base_elda, 5)
+ assert len(cumulative_elda.ttda) == num_topics_before_add_model + len(base_elda.ttda)
+ assert cumulative_elda.num_models == num_models_before_add_model + 5
+
+ # 1.3 a list of ensembles
+ num_topics_before_add_model = len(cumulative_elda.ttda)
+ num_models_before_add_model = cumulative_elda.num_models
+ # it should be totally legit to add a memory unfriendly object to a memory friendly one
+ base_elda_mem_unfriendly = self.get_elda_mem_unfriendly()
+ cumulative_elda.add_model([base_elda, base_elda_mem_unfriendly])
+ assert len(cumulative_elda.ttda) == num_topics_before_add_model + 2 * len(base_elda.ttda)
+ assert cumulative_elda.num_models == num_models_before_add_model + 2 * NUM_MODELS
+
+ # 1.4 a single gensim model
+ model = base_elda.classic_model_representation
+
+ num_topics_before_add_model = len(cumulative_elda.ttda)
+ num_models_before_add_model = cumulative_elda.num_models
+ cumulative_elda.add_model(model)
+ assert len(cumulative_elda.ttda) == num_topics_before_add_model + len(model.get_topics())
+ assert cumulative_elda.num_models == num_models_before_add_model + 1
+
+        # 1.5 a list of gensim models
+ num_topics_before_add_model = len(cumulative_elda.ttda)
+ num_models_before_add_model = cumulative_elda.num_models
+ cumulative_elda.add_model([model, model])
+ assert len(cumulative_elda.ttda) == num_topics_before_add_model + 2 * len(model.get_topics())
+ assert cumulative_elda.num_models == num_models_before_add_model + 2
+
+ self.assert_ttda_is_valid(cumulative_elda)
+
+ # 2. memory unfriendly
+ elda_mem_unfriendly = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary,
+ num_topics=num_new_topics, passes=1, num_models=num_new_models,
+ iterations=1, random_state=RANDOM_STATE, topic_model_class=LdaMulticore,
+ workers=3, ensemble_workers=2, memory_friendly_ttda=False,
+ )
+
+ # 2.1 a single ensemble
+ num_topics_before_add_model = len(elda_mem_unfriendly.tms)
+ num_models_before_add_model = elda_mem_unfriendly.num_models
+ elda_mem_unfriendly.add_model(base_elda_mem_unfriendly)
+ assert len(elda_mem_unfriendly.tms) == num_topics_before_add_model + NUM_MODELS
+ assert elda_mem_unfriendly.num_models == num_models_before_add_model + NUM_MODELS
+
+ # 2.2 a list of ensembles
+ num_topics_before_add_model = len(elda_mem_unfriendly.tms)
+ num_models_before_add_model = elda_mem_unfriendly.num_models
+ elda_mem_unfriendly.add_model([base_elda_mem_unfriendly, base_elda_mem_unfriendly])
+ assert len(elda_mem_unfriendly.tms) == num_topics_before_add_model + 2 * NUM_MODELS
+ assert elda_mem_unfriendly.num_models == num_models_before_add_model + 2 * NUM_MODELS
+
+ # 2.3 a single gensim model
+ num_topics_before_add_model = len(elda_mem_unfriendly.tms)
+ num_models_before_add_model = elda_mem_unfriendly.num_models
+ elda_mem_unfriendly.add_model(base_elda_mem_unfriendly.tms[0])
+ assert len(elda_mem_unfriendly.tms) == num_topics_before_add_model + 1
+ assert elda_mem_unfriendly.num_models == num_models_before_add_model + 1
+
+ # 2.4 a list of gensim models
+ num_topics_before_add_model = len(elda_mem_unfriendly.tms)
+ num_models_before_add_model = elda_mem_unfriendly.num_models
+ elda_mem_unfriendly.add_model(base_elda_mem_unfriendly.tms)
+ assert len(elda_mem_unfriendly.tms) == num_topics_before_add_model + NUM_MODELS
+ assert elda_mem_unfriendly.num_models == num_models_before_add_model + NUM_MODELS
+
+        # 2.5 raw topic-term distributions should raise an error, because the
+        # actual models are needed for the memory unfriendly ensemble
+ num_topics_before_add_model = len(elda_mem_unfriendly.tms)
+ num_models_before_add_model = elda_mem_unfriendly.num_models
+ with pytest.raises(ValueError):
+ elda_mem_unfriendly.add_model(base_elda_mem_unfriendly.tms[0].get_topics())
+ # remains unchanged
+ assert len(elda_mem_unfriendly.tms) == num_topics_before_add_model
+ assert elda_mem_unfriendly.num_models == num_models_before_add_model
+
+ assert elda_mem_unfriendly.num_models == len(elda_mem_unfriendly.tms)
+ self.assert_ttda_is_valid(elda_mem_unfriendly)
+
+ def test_add_and_recluster(self):
+ # See if after adding a model, the model still makes sense
+ num_new_models = 3
+ num_new_topics = 3
+ random_state = 1
+
+        # train two sets of models (memory friendly and unfriendly)
+ elda_1 = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary,
+ num_topics=num_new_topics, passes=10, num_models=num_new_models,
+ iterations=30, random_state=random_state, topic_model_class='lda',
+ distance_workers=4,
+ )
+ elda_mem_unfriendly_1 = EnsembleLda(
+ corpus=common_corpus, id2word=common_dictionary,
+ num_topics=num_new_topics, passes=10, num_models=num_new_models,
+ iterations=30, random_state=random_state, topic_model_class=LdaModel,
+ distance_workers=4, memory_friendly_ttda=False,
+ )
+ elda_2 = self.get_elda()
+ elda_mem_unfriendly_2 = self.get_elda_mem_unfriendly()
+ assert elda_1.random_state != elda_2.random_state
+ assert elda_mem_unfriendly_1.random_state != elda_mem_unfriendly_2.random_state
+
+ # both should be similar
+ np.testing.assert_allclose(elda_1.ttda, elda_mem_unfriendly_1.ttda, rtol=RTOL)
+ np.testing.assert_allclose(elda_1.get_topics(), elda_mem_unfriendly_1.get_topics(), rtol=RTOL)
+        # and every subsequent step applied to both should give similar results
+
+ # 1. adding to ttda and tms
+ elda_1.add_model(elda_2)
+ elda_mem_unfriendly_1.add_model(elda_mem_unfriendly_2)
+
+ np.testing.assert_allclose(elda_1.ttda, elda_mem_unfriendly_1.ttda, rtol=RTOL)
+ assert len(elda_1.ttda) == len(elda_2.ttda) + num_new_models * num_new_topics
+ assert len(elda_mem_unfriendly_1.ttda) == len(elda_mem_unfriendly_2.ttda) + num_new_models * num_new_topics
+ assert len(elda_mem_unfriendly_1.tms) == NUM_MODELS + num_new_models
+ self.assert_ttda_is_valid(elda_1)
+ self.assert_ttda_is_valid(elda_mem_unfriendly_1)
+
+ # 2. distance matrix
+ elda_1._generate_asymmetric_distance_matrix()
+ elda_mem_unfriendly_1._generate_asymmetric_distance_matrix()
+ np.testing.assert_allclose(
+ elda_1.asymmetric_distance_matrix,
+ elda_mem_unfriendly_1.asymmetric_distance_matrix,
+ )
+
+ # 3. CBDBSCAN results
+ elda_1._generate_topic_clusters()
+ elda_mem_unfriendly_1._generate_topic_clusters()
+ clustering_results = elda_1.cluster_model.results
+ mem_unfriendly_clustering_results = elda_mem_unfriendly_1.cluster_model.results
+ self.assert_clustering_results_equal(clustering_results, mem_unfriendly_clustering_results)
+
+ # 4. finally, the stable topics
+ elda_1._generate_stable_topics()
+ elda_mem_unfriendly_1._generate_stable_topics()
+ np.testing.assert_allclose(
+ elda_1.get_topics(),
+ elda_mem_unfriendly_1.get_topics(),
+ )
+
+ elda_1.generate_gensim_representation()
+ elda_mem_unfriendly_1.generate_gensim_representation()
+
+ # same random state, hence topics should be still similar
+ np.testing.assert_allclose(elda_1.get_topics(), elda_mem_unfriendly_1.get_topics(), rtol=RTOL)
+
+ def test_inference(self, elda=None):
+ if elda is None:
+ elda = self.get_elda()
+
+ # get the most likely token id from topic 0
+ max_id = np.argmax(elda.get_topics()[0, :])
+ assert elda.classic_model_representation.iterations > 0
+ # topic 0 should be dominant in the inference.
+ # the difference between the probabilities should be significant and larger than 0.3
+ inferred = elda[[(max_id, 1)]]
+ assert inferred[0][1] - 0.3 > inferred[1][1]
+
+
+if __name__ == '__main__':
+ logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.WARN)
+ unittest.main()
diff --git a/gensim/test/test_fasttext.py b/gensim/test/test_fasttext.py
index a91a9000a7..efc6a3ca8e 100644
--- a/gensim/test/test_fasttext.py
+++ b/gensim/test/test_fasttext.py
@@ -13,6 +13,7 @@
import sys
import numpy as np
+import pytest
from gensim import utils
from gensim.models.word2vec import LineSentence
@@ -397,144 +398,7 @@ def test_wm_distance(self):
dist = self.test_model.wv.wmdistance(doc, oov_doc)
self.assertNotEqual(float('inf'), dist)
- def test_cbow_hs_training(self):
-
- model_gensim = FT_gensim(
- vector_size=48, sg=0, cbow_mean=1, alpha=0.05, window=5, hs=1, negative=0,
- min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
- sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET)
-
- lee_data = LineSentence(datapath('lee_background.cor'))
- model_gensim.build_vocab(lee_data)
- orig0 = np.copy(model_gensim.wv.vectors[0])
- model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.epochs)
- self.assertFalse((orig0 == model_gensim.wv.vectors[0]).all()) # vector should vary after training
-
- sims_gensim = model_gensim.wv.most_similar('night', topn=10)
- sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
- expected_sims_words = [
- u'night,',
- u'night.',
- u'rights',
- u'kilometres',
- u'in',
- u'eight',
- u'according',
- u'flights',
- u'during',
- u'comes']
- overlaps = set(sims_gensim_words).intersection(expected_sims_words)
- overlap_count = len(overlaps)
- self.assertGreaterEqual(
- overlap_count, 2,
- "only %i overlap in expected %s & actual %s" % (overlap_count, expected_sims_words, sims_gensim_words))
-
- def test_cbow_hs_training_fromfile(self):
- with temporary_file('gensim_fasttext.tst') as corpus_file:
- model_gensim = FT_gensim(
- vector_size=48, sg=0, cbow_mean=1, alpha=0.05, window=5, hs=1, negative=0,
- min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
- sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET * 4)
-
- lee_data = LineSentence(datapath('lee_background.cor'))
- utils.save_as_line_sentence(lee_data, corpus_file)
-
- model_gensim.build_vocab(corpus_file=corpus_file)
- orig0 = np.copy(model_gensim.wv.vectors[0])
- model_gensim.train(corpus_file=corpus_file,
- total_words=model_gensim.corpus_total_words,
- epochs=model_gensim.epochs)
- self.assertFalse((orig0 == model_gensim.wv.vectors[0]).all()) # vector should vary after training
-
- sims_gensim = model_gensim.wv.most_similar('night', topn=10)
- sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
- expected_sims_words = [
- u'night,',
- u'night.',
- u'rights',
- u'kilometres',
- u'in',
- u'eight',
- u'according',
- u'flights',
- u'during',
- u'comes']
- overlaps = set(sims_gensim_words).intersection(expected_sims_words)
- overlap_count = len(overlaps)
- self.assertGreaterEqual(
- overlap_count, 2,
- "only %i overlap in expected %s & actual %s" % (overlap_count, expected_sims_words, sims_gensim_words))
-
- def test_sg_hs_training(self):
-
- model_gensim = FT_gensim(
- vector_size=48, sg=1, cbow_mean=1, alpha=0.025, window=5, hs=1, negative=0,
- min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
- sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET)
-
- lee_data = LineSentence(datapath('lee_background.cor'))
- model_gensim.build_vocab(lee_data)
- orig0 = np.copy(model_gensim.wv.vectors[0])
- model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.epochs)
- self.assertFalse((orig0 == model_gensim.wv.vectors[0]).all()) # vector should vary after training
-
- sims_gensim = model_gensim.wv.most_similar('night', topn=10)
- sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
- expected_sims_words = [
- u'night,',
- u'night.',
- u'eight',
- u'nine',
- u'overnight',
- u'crew',
- u'overnight.',
- u'manslaughter',
- u'north',
- u'flight']
- overlaps = set(sims_gensim_words).intersection(expected_sims_words)
- overlap_count = len(overlaps)
- self.assertGreaterEqual(
- overlap_count, 2,
- "only %i overlap in expected %s & actual %s" % (overlap_count, expected_sims_words, sims_gensim_words))
-
- def test_sg_hs_training_fromfile(self):
- with temporary_file('gensim_fasttext.tst') as corpus_file:
- model_gensim = FT_gensim(
- vector_size=48, sg=1, cbow_mean=1, alpha=0.025, window=5, hs=1, negative=0,
- min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
- sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET)
-
- lee_data = LineSentence(datapath('lee_background.cor'))
- utils.save_as_line_sentence(lee_data, corpus_file)
-
- model_gensim.build_vocab(corpus_file=corpus_file)
- orig0 = np.copy(model_gensim.wv.vectors[0])
- model_gensim.train(corpus_file=corpus_file,
- total_words=model_gensim.corpus_total_words,
- epochs=model_gensim.epochs)
- self.assertFalse((orig0 == model_gensim.wv.vectors[0]).all()) # vector should vary after training
-
- sims_gensim = model_gensim.wv.most_similar('night', topn=10)
- sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
- expected_sims_words = [
- u'night,',
- u'night.',
- u'eight',
- u'nine',
- u'overnight',
- u'crew',
- u'overnight.',
- u'manslaughter',
- u'north',
- u'flight']
- overlaps = set(sims_gensim_words).intersection(expected_sims_words)
- overlap_count = len(overlaps)
- self.assertGreaterEqual(
- overlap_count, 2,
- "only %i overlap in expected %s & actual %s" % (overlap_count, expected_sims_words, sims_gensim_words))
-
def test_cbow_neg_training(self):
-
model_gensim = FT_gensim(
vector_size=48, sg=0, cbow_mean=1, alpha=0.05, window=5, hs=0, negative=5,
min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
@@ -850,6 +714,194 @@ def obsolete_testLoadOldModel(self):
self.assertEqual(model.wv.vectors_vocab.shape, (12, 100))
self.assertEqual(model.wv.vectors_ngrams.shape, (2000000, 100))
+ def test_vectors_for_all_with_inference(self):
+ """Test vectors_for_all can infer new vectors."""
+ words = [
+ 'responding',
+ 'approached',
+ 'chairman',
+ 'an out-of-vocabulary word',
+ 'another out-of-vocabulary word',
+ ]
+ vectors_for_all = self.test_model.wv.vectors_for_all(words)
+
+ expected = 5
+ predicted = len(vectors_for_all)
+ assert expected == predicted
+
+ expected = self.test_model.wv['responding']
+ predicted = vectors_for_all['responding']
+ assert np.allclose(expected, predicted)
+
+ smaller_distance = np.linalg.norm(
+ vectors_for_all['an out-of-vocabulary word']
+ - vectors_for_all['another out-of-vocabulary word']
+ )
+ greater_distance = np.linalg.norm(
+ vectors_for_all['an out-of-vocabulary word']
+ - vectors_for_all['responding']
+ )
+ assert greater_distance > smaller_distance
+
+ def test_vectors_for_all_without_inference(self):
+ """Test vectors_for_all does not infer new vectors when prohibited."""
+ words = [
+ 'responding',
+ 'approached',
+ 'chairman',
+ 'an out-of-vocabulary word',
+ 'another out-of-vocabulary word',
+ ]
+ vectors_for_all = self.test_model.wv.vectors_for_all(words, allow_inference=False)
+
+ expected = 3
+ predicted = len(vectors_for_all)
+ assert expected == predicted
+
+ expected = self.test_model.wv['responding']
+ predicted = vectors_for_all['responding']
+ assert np.allclose(expected, predicted)
+
+
+@pytest.mark.parametrize('shrink_windows', [True, False])
+def test_cbow_hs_training(shrink_windows):
+ model_gensim = FT_gensim(
+ vector_size=48, sg=0, cbow_mean=1, alpha=0.05, window=5, hs=1, negative=0,
+ min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
+ sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET, shrink_windows=shrink_windows)
+
+ lee_data = LineSentence(datapath('lee_background.cor'))
+ model_gensim.build_vocab(lee_data)
+ orig0 = np.copy(model_gensim.wv.vectors[0])
+ model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.epochs)
+ assert not (orig0 == model_gensim.wv.vectors[0]).all() # vector should vary after training
+
+ sims_gensim = model_gensim.wv.most_similar('night', topn=10)
+ sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
+ expected_sims_words = [
+ u'night,',
+ u'night.',
+ u'rights',
+ u'kilometres',
+ u'in',
+ u'eight',
+ u'according',
+ u'flights',
+ u'during',
+ u'comes']
+ overlaps = set(sims_gensim_words).intersection(expected_sims_words)
+ overlap_count = len(overlaps)
+
+ message = f"only {overlap_count} overlap in expected {expected_sims_words} & actual {sims_gensim_words}"
+ assert overlap_count >= 2, message
+
+
+@pytest.mark.parametrize('shrink_windows', [True, False])
+def test_cbow_hs_training_fromfile(shrink_windows):
+ with temporary_file('gensim_fasttext.tst') as corpus_file:
+ model_gensim = FT_gensim(
+ vector_size=48, sg=0, cbow_mean=1, alpha=0.05, window=5, hs=1, negative=0,
+ min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
+ sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET * 4, shrink_windows=shrink_windows)
+
+ lee_data = LineSentence(datapath('lee_background.cor'))
+ utils.save_as_line_sentence(lee_data, corpus_file)
+
+ model_gensim.build_vocab(corpus_file=corpus_file)
+ orig0 = np.copy(model_gensim.wv.vectors[0])
+ model_gensim.train(corpus_file=corpus_file,
+ total_words=model_gensim.corpus_total_words,
+ epochs=model_gensim.epochs)
+ assert not (orig0 == model_gensim.wv.vectors[0]).all() # vector should vary after training
+
+ sims_gensim = model_gensim.wv.most_similar('night', topn=10)
+ sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
+ expected_sims_words = [
+ u'night,',
+ u'night.',
+ u'rights',
+ u'kilometres',
+ u'in',
+ u'eight',
+ u'according',
+ u'flights',
+ u'during',
+ u'comes']
+ overlaps = set(sims_gensim_words).intersection(expected_sims_words)
+ overlap_count = len(overlaps)
+ message = f"only {overlap_count} overlap in expected {expected_sims_words} & actual {sims_gensim_words}"
+ assert overlap_count >= 2, message
+
+
+@pytest.mark.parametrize('shrink_windows', [True, False])
+def test_sg_hs_training(shrink_windows):
+ model_gensim = FT_gensim(
+ vector_size=48, sg=1, cbow_mean=1, alpha=0.025, window=5, hs=1, negative=0,
+ min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
+ sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET, shrink_windows=shrink_windows)
+
+ lee_data = LineSentence(datapath('lee_background.cor'))
+ model_gensim.build_vocab(lee_data)
+ orig0 = np.copy(model_gensim.wv.vectors[0])
+ model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.epochs)
+ assert not (orig0 == model_gensim.wv.vectors[0]).all() # vector should vary after training
+
+ sims_gensim = model_gensim.wv.most_similar('night', topn=10)
+ sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
+ expected_sims_words = [
+ u'night,',
+ u'night.',
+ u'eight',
+ u'nine',
+ u'overnight',
+ u'crew',
+ u'overnight.',
+ u'manslaughter',
+ u'north',
+ u'flight']
+ overlaps = set(sims_gensim_words).intersection(expected_sims_words)
+ overlap_count = len(overlaps)
+
+ message = f"only {overlap_count} overlap in expected {expected_sims_words} & actual {sims_gensim_words}"
+ assert overlap_count >= 2, message
+
+
+@pytest.mark.parametrize('shrink_windows', [True, False])
+def test_sg_hs_training_fromfile(shrink_windows):
+ with temporary_file('gensim_fasttext.tst') as corpus_file:
+ model_gensim = FT_gensim(
+ vector_size=48, sg=1, cbow_mean=1, alpha=0.025, window=5, hs=1, negative=0,
+ min_count=5, epochs=10, batch_words=1000, word_ngrams=1, sample=1e-3, min_n=3, max_n=6,
+ sorted_vocab=1, workers=1, min_alpha=0.0, bucket=BUCKET, shrink_windows=shrink_windows)
+
+ lee_data = LineSentence(datapath('lee_background.cor'))
+ utils.save_as_line_sentence(lee_data, corpus_file)
+
+ model_gensim.build_vocab(corpus_file=corpus_file)
+ orig0 = np.copy(model_gensim.wv.vectors[0])
+ model_gensim.train(corpus_file=corpus_file,
+ total_words=model_gensim.corpus_total_words,
+ epochs=model_gensim.epochs)
+ assert not (orig0 == model_gensim.wv.vectors[0]).all() # vector should vary after training
+
+ sims_gensim = model_gensim.wv.most_similar('night', topn=10)
+ sims_gensim_words = [word for (word, distance) in sims_gensim] # get similar words
+ expected_sims_words = [
+ u'night,',
+ u'night.',
+ u'eight',
+ u'nine',
+ u'overnight',
+ u'crew',
+ u'overnight.',
+ u'manslaughter',
+ u'north',
+ u'flight']
+ overlaps = set(sims_gensim_words).intersection(expected_sims_words)
+ overlap_count = len(overlaps)
+ message = f"only {overlap_count} overlap in expected {expected_sims_words} & actual {sims_gensim_words}"
+ assert overlap_count >= 2, message
+
with open(datapath('toy-data.txt')) as fin:
TOY_SENTENCES = [fin.read().strip().split(' ')]
diff --git a/gensim/test/test_keyedvectors.py b/gensim/test/test_keyedvectors.py
index fd96f9f26f..d5eda547ea 100644
--- a/gensim/test/test_keyedvectors.py
+++ b/gensim/test/test_keyedvectors.py
@@ -9,6 +9,7 @@
Automated tests for checking the poincare module from the models package.
"""
+import functools
import logging
import unittest
@@ -39,6 +40,81 @@ def test_most_similar(self):
predicted = [result[0] for result in self.vectors.most_similar('war', topn=5)]
self.assertEqual(expected, predicted)
+ def test_most_similar_vector(self):
+ """Can we pass vectors to most_similar directly?"""
+ positive = self.vectors.vectors[0:5]
+ most_similar = self.vectors.most_similar(positive=positive)
+ assert most_similar is not None
+
+ def test_most_similar_parameter_types(self):
+ """Are the positive/negative parameter types are getting interpreted correctly?"""
+ partial = functools.partial(self.vectors.most_similar, topn=5)
+
+ position = partial('war', 'peace')
+ position_list = partial(['war'], ['peace'])
+ keyword = partial(positive='war', negative='peace')
+ keyword_list = partial(positive=['war'], negative=['peace'])
+
+ #
+ # The above calls should all yield identical results.
+ #
+ assert position == position_list
+ assert position == keyword
+ assert position == keyword_list
+
+ def test_most_similar_cosmul_parameter_types(self):
+ """Are the positive/negative parameter types are getting interpreted correctly?"""
+ partial = functools.partial(self.vectors.most_similar_cosmul, topn=5)
+
+ position = partial('war', 'peace')
+ position_list = partial(['war'], ['peace'])
+ keyword = partial(positive='war', negative='peace')
+ keyword_list = partial(positive=['war'], negative=['peace'])
+
+ #
+ # The above calls should all yield identical results.
+ #
+ assert position == position_list
+ assert position == keyword
+ assert position == keyword_list
+
+ def test_vectors_for_all_list(self):
+ """Test vectors_for_all returns expected results with a list of keys."""
+ words = [
+ 'conflict',
+ 'administration',
+ 'terrorism',
+ 'an out-of-vocabulary word',
+ 'another out-of-vocabulary word',
+ ]
+ vectors_for_all = self.vectors.vectors_for_all(words)
+
+ expected = 3
+ predicted = len(vectors_for_all)
+ assert expected == predicted
+
+ expected = self.vectors['conflict']
+ predicted = vectors_for_all['conflict']
+ assert np.allclose(expected, predicted)
+
+ def test_vectors_for_all_with_copy_vecattrs(self):
+ """Test vectors_for_all returns can copy vector attributes."""
+ words = ['conflict']
+ vectors_for_all = self.vectors.vectors_for_all(words, copy_vecattrs=True)
+
+ expected = self.vectors.get_vecattr('conflict', 'count')
+ predicted = vectors_for_all.get_vecattr('conflict', 'count')
+ assert expected == predicted
+
+ def test_vectors_for_all_without_copy_vecattrs(self):
+ """Test vectors_for_all returns can copy vector attributes."""
+ words = ['conflict']
+ vectors_for_all = self.vectors.vectors_for_all(words, copy_vecattrs=False)
+
+ not_expected = self.vectors.get_vecattr('conflict', 'count')
+ predicted = vectors_for_all.get_vecattr('conflict', 'count')
+ assert not_expected != predicted
+
def test_most_similar_topn(self):
"""Test most_similar returns correct results when `topn` is specified."""
self.assertEqual(len(self.vectors.most_similar('war', topn=5)), 5)
diff --git a/gensim/test/test_parsing.py b/gensim/test/test_parsing.py
index d61671bd85..f96ad332d2 100644
--- a/gensim/test/test_parsing.py
+++ b/gensim/test/test_parsing.py
@@ -7,11 +7,24 @@
import logging
import unittest
+
+import mock
import numpy as np
-from gensim.parsing.preprocessing import \
- remove_stopwords, strip_punctuation2, strip_tags, strip_short, strip_numeric, strip_non_alphanum, \
- strip_multiple_whitespaces, split_alphanum, stem_text
+from gensim.parsing.preprocessing import (
+ remove_short_tokens,
+ remove_stopword_tokens,
+ remove_stopwords,
+ stem_text,
+ split_alphanum,
+ split_on_space,
+ strip_multiple_whitespaces,
+ strip_non_alphanum,
+ strip_numeric,
+ strip_punctuation,
+ strip_short,
+ strip_tags,
+)
# several documents
doc1 = """C'est un trou de verdure où chante une rivière,
@@ -38,7 +51,7 @@
for many searching purposes, a little fuzziness would help. """
-dataset = [strip_punctuation2(x.lower()) for x in [doc1, doc2, doc3, doc4]]
+dataset = [strip_punctuation(x.lower()) for x in [doc1, doc2, doc3, doc4]]
# doc1 and doc2 have class 0, doc3 and doc4 have class 1
classes = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
@@ -67,6 +80,26 @@ def test_split_alphanum(self):
def test_strip_stopwords(self):
self.assertEqual(remove_stopwords("the world is square"), "world square")
+ # confirm that redefining the global `STOPWORDS` works
+ with mock.patch('gensim.parsing.preprocessing.STOPWORDS', frozenset(["the"])):
+ self.assertEqual(remove_stopwords("the world is square"), "world is square")
+
+ def test_strip_stopword_tokens(self):
+ self.assertEqual(remove_stopword_tokens(["the", "world", "is", "sphere"]), ["world", "sphere"])
+
+ # confirm that redefining the global `STOPWORDS` works
+ with mock.patch('gensim.parsing.preprocessing.STOPWORDS', frozenset(["the"])):
+ self.assertEqual(
+ remove_stopword_tokens(["the", "world", "is", "sphere"]),
+ ["world", "is", "sphere"]
+ )
+
+ def test_strip_short_tokens(self):
+ self.assertEqual(remove_short_tokens(["salut", "les", "amis", "du", "59"], 3), ["salut", "les", "amis"])
+
+ def test_split_on_space(self):
+ self.assertEqual(split_on_space(" salut les amis du 59 "), ["salut", "les", "amis", "du", "59"])
+
def test_stem_text(self):
target = \
"while it is quit us to be abl to search a larg " + \
diff --git a/gensim/test/test_phrases.py b/gensim/test/test_phrases.py
index bbfbfaad40..e8d9567b20 100644
--- a/gensim/test/test_phrases.py
+++ b/gensim/test/test_phrases.py
@@ -305,32 +305,42 @@ def test_pruning(self):
class TestPhrasesPersistence(PhrasesData, unittest.TestCase):
-
def test_save_load_custom_scorer(self):
"""Test saving and loading a Phrases object with a custom scorer."""
+ bigram = Phrases(self.sentences, min_count=1, threshold=.001, scoring=dumb_scorer)
with temporary_file("test.pkl") as fpath:
- bigram = Phrases(self.sentences, min_count=1, threshold=.001, scoring=dumb_scorer)
bigram.save(fpath)
bigram_loaded = Phrases.load(fpath)
- test_sentences = [['graph', 'minors', 'survey', 'human', 'interface', 'system']]
- seen_scores = list(bigram_loaded.find_phrases(test_sentences).values())
- assert all(score == 1 for score in seen_scores)
- assert len(seen_scores) == 3 # 'graph minors' and 'survey human' and 'interface system'
+ test_sentences = [['graph', 'minors', 'survey', 'human', 'interface', 'system']]
+ seen_scores = list(bigram_loaded.find_phrases(test_sentences).values())
+
+ assert all(score == 1 for score in seen_scores)
+ assert len(seen_scores) == 3 # 'graph minors' and 'survey human' and 'interface system'
def test_save_load(self):
"""Test saving and loading a Phrases object."""
+ bigram = Phrases(self.sentences, min_count=1, threshold=1)
+ with temporary_file("test.pkl") as fpath:
+ bigram.save(fpath)
+ bigram_loaded = Phrases.load(fpath)
+
+ test_sentences = [['graph', 'minors', 'survey', 'human', 'interface', 'system']]
+ seen_scores = set(round(score, 3) for score in bigram_loaded.find_phrases(test_sentences).values())
+ assert seen_scores == set([
+ 5.167, # score for graph minors
+ 3.444 # score for human interface
+ ])
+
+ def test_save_load_with_connector_words(self):
+ """Test saving and loading a Phrases object."""
+ connector_words = frozenset({'of'})
+ bigram = Phrases(self.sentences, min_count=1, threshold=1, connector_words=connector_words)
with temporary_file("test.pkl") as fpath:
- bigram = Phrases(self.sentences, min_count=1, threshold=1)
bigram.save(fpath)
bigram_loaded = Phrases.load(fpath)
- test_sentences = [['graph', 'minors', 'survey', 'human', 'interface', 'system']]
- seen_scores = set(round(score, 3) for score in bigram_loaded.find_phrases(test_sentences).values())
- assert seen_scores == set([
- 5.167, # score for graph minors
- 3.444 # score for human interface
- ])
+ assert bigram_loaded.connector_words == connector_words
def test_save_load_string_scoring(self):
"""Test backwards compatibility with a previous version of Phrases with custom scoring."""
@@ -385,6 +395,15 @@ def test_save_load(self):
bigram_loaded[['graph', 'minors', 'survey', 'human', 'interface', 'system']],
['graph_minors', 'survey', 'human_interface', 'system'])
+ def test_save_load_with_connector_words(self):
+ """Test saving and loading a FrozenPhrases object."""
+ connector_words = frozenset({'of'})
+ with temporary_file("test.pkl") as fpath:
+ bigram = FrozenPhrases(Phrases(self.sentences, min_count=1, threshold=1, connector_words=connector_words))
+ bigram.save(fpath)
+ bigram_loaded = FrozenPhrases.load(fpath)
+ self.assertEqual(bigram_loaded.connector_words, connector_words)
+
def test_save_load_string_scoring(self):
"""Test saving and loading a FrozenPhrases object with a string scoring parameter.
This should ensure backwards compatibility with the previous version of FrozenPhrases"""
diff --git a/gensim/test/test_similarities.py b/gensim/test/test_similarities.py
index 4929082c2a..35ddd03397 100644
--- a/gensim/test/test_similarities.py
+++ b/gensim/test/test_similarities.py
@@ -33,7 +33,7 @@
from gensim.similarities import SparseTermSimilarityMatrix
from gensim.similarities import LevenshteinSimilarityIndex
from gensim.similarities.docsim import _nlargest
-from gensim.similarities.levenshtein import levdist, levsim
+from gensim.similarities.fastss import editdist
try:
from pyemd import emd # noqa:F401
@@ -1544,123 +1544,48 @@ def test_inner_product_corpus_corpus_true_true(self):
self.assertTrue(numpy.allclose(expected_result, result.todense()))
-class TestLevenshteinDistance(unittest.TestCase):
- @unittest.skipIf(LevenshteinSimilarityIndex is None, "gensim.similarities.levenshtein is disabled")
- def test_max_distance(self):
- t1 = "holiday"
- t2 = "day"
- max_distance = max(len(t1), len(t2))
-
- self.assertEqual(4, levdist(t1, t2))
- self.assertEqual(4, levdist(t1, t2, 4))
- self.assertEqual(max_distance, levdist(t1, t2, 2))
- self.assertEqual(max_distance, levdist(t1, t2, -2))
-
-
-class TestLevenshteinSimilarity(unittest.TestCase):
- @unittest.skipIf(LevenshteinSimilarityIndex is None, "gensim.similarities.levenshtein is disabled")
- def test_empty_strings(self):
- t1 = ""
- t2 = ""
-
- self.assertEqual(1.0, levsim(t1, t2))
-
- @unittest.skipIf(LevenshteinSimilarityIndex is None, "gensim.similarities.levenshtein is disabled")
- def test_negative_hyperparameters(self):
- t1 = "holiday"
- t2 = "day"
- alpha = 2.0
- beta = 2.0
-
- with self.assertRaises(AssertionError):
- levsim(t1, t2, -alpha, beta)
-
- with self.assertRaises(AssertionError):
- levsim(t1, t2, alpha, -beta)
-
- with self.assertRaises(AssertionError):
- levsim(t1, t2, -alpha, -beta)
-
- @unittest.skipIf(LevenshteinSimilarityIndex is None, "gensim.similarities.levenshtein is disabled")
- def test_min_similarity(self):
- t1 = "holiday"
- t2 = "day"
- alpha = 2.0
- beta = 2.0
- similarity = alpha * (1 - 4.0 / 7)**beta
- assert similarity > 0.1 and similarity < 0.5
-
- self.assertAlmostEqual(similarity, levsim(t1, t2, alpha, beta))
-
- self.assertAlmostEqual(similarity, levsim(t1, t2, alpha, beta, -2))
- self.assertAlmostEqual(similarity, levsim(t1, t2, alpha, beta, -2.0))
-
- self.assertAlmostEqual(similarity, levsim(t1, t2, alpha, beta, 0))
- self.assertAlmostEqual(similarity, levsim(t1, t2, alpha, beta, 0.0))
-
- self.assertEqual(similarity, levsim(t1, t2, alpha, beta, 0.1))
- self.assertEqual(0.0, levsim(t1, t2, alpha, beta, 0.5))
- self.assertEqual(0.0, levsim(t1, t2, alpha, beta, 1.0))
-
- self.assertEqual(0.0, levsim(t1, t2, alpha, beta, 2))
- self.assertEqual(0.0, levsim(t1, t2, alpha, beta, 2.0))
-
-
class TestLevenshteinSimilarityIndex(unittest.TestCase):
def setUp(self):
self.documents = [[u"government", u"denied", u"holiday"], [u"holiday", u"slowing", u"hollingworth"]]
self.dictionary = Dictionary(self.documents)
+ max_distance = max(len(term) for term in self.dictionary.values())
+ self.index = LevenshteinSimilarityIndex(self.dictionary, max_distance=max_distance)
- @unittest.skipIf(LevenshteinSimilarityIndex is None, "gensim.similarities.levenshtein is disabled")
- def test_most_similar(self):
+ def test_most_similar_topn(self):
"""Test most_similar returns expected results."""
- index = LevenshteinSimilarityIndex(self.dictionary)
- results = list(index.most_similar(u"holiday", topn=1))
- self.assertLess(0, len(results))
- self.assertGreaterEqual(1, len(results))
- results = list(index.most_similar(u"holiday", topn=4))
- self.assertLess(1, len(results))
- self.assertGreaterEqual(4, len(results))
+ results = list(self.index.most_similar(u"holiday", topn=0))
+ self.assertEqual(0, len(results))
- # check the order of the results
- results = index.most_similar(u"holiday", topn=4)
- terms, _ = tuple(zip(*results))
- self.assertEqual((u"hollingworth", u"slowing", u"denied", u"government"), terms)
+ results = list(self.index.most_similar(u"holiday", topn=1))
+ self.assertEqual(1, len(results))
- # check that the term itself is not returned
- index = LevenshteinSimilarityIndex(self.dictionary)
- terms = [term for term, similarity in index.most_similar(u"holiday", topn=len(self.dictionary))]
- self.assertFalse(u"holiday" in terms)
+ results = list(self.index.most_similar(u"holiday", topn=4))
+ self.assertEqual(4, len(results))
- # check that the threshold works as expected
- index = LevenshteinSimilarityIndex(self.dictionary, threshold=0.0)
- results = list(index.most_similar(u"holiday", topn=10))
- self.assertLess(0, len(results))
- self.assertGreaterEqual(10, len(results))
+ results = list(self.index.most_similar(u"holiday", topn=len(self.dictionary)))
+ self.assertEqual(len(self.dictionary) - 1, len(results))
+ self.assertNotIn(u"holiday", results)
- index = LevenshteinSimilarityIndex(self.dictionary, threshold=1.0)
- results = list(index.most_similar(u"holiday", topn=10))
- self.assertEqual(0, len(results))
+ def test_most_similar_result_order(self):
+ results = self.index.most_similar(u"holiday", topn=4)
+ terms, _ = zip(*results)
+ expected_terms = (u"hollingworth", u"denied", u"slowing", u"government")
+ self.assertEqual(expected_terms, terms)
- # check that the alpha works as expected
+ def test_most_similar_alpha(self):
index = LevenshteinSimilarityIndex(self.dictionary, alpha=1.0)
first_similarities = numpy.array([similarity for term, similarity in index.most_similar(u"holiday", topn=10)])
index = LevenshteinSimilarityIndex(self.dictionary, alpha=2.0)
second_similarities = numpy.array([similarity for term, similarity in index.most_similar(u"holiday", topn=10)])
self.assertTrue(numpy.allclose(2.0 * first_similarities, second_similarities))
- # check that the beta works as expected
+ def test_most_similar_beta(self):
index = LevenshteinSimilarityIndex(self.dictionary, alpha=1.0, beta=1.0)
first_similarities = numpy.array([similarity for term, similarity in index.most_similar(u"holiday", topn=10)])
index = LevenshteinSimilarityIndex(self.dictionary, alpha=1.0, beta=2.0)
second_similarities = numpy.array([similarity for term, similarity in index.most_similar(u"holiday", topn=10)])
self.assertTrue(numpy.allclose(first_similarities ** 2.0, second_similarities))
- # check proper integration with SparseTermSimilarityMatrix
- index = LevenshteinSimilarityIndex(self.dictionary, alpha=1.0, beta=1.0)
- similarity_matrix = SparseTermSimilarityMatrix(index, DICTIONARY)
- self.assertTrue(scipy.sparse.issparse(similarity_matrix.matrix))
-
class TestWordEmbeddingSimilarityIndex(unittest.TestCase):
def setUp(self):
@@ -1707,6 +1632,32 @@ def test_most_similar(self):
self.assertTrue(numpy.allclose(first_similarities**2.0, second_similarities))
+class TestFastSS(unittest.TestCase):
+ def test_editdist_same_unicode_kind_latin1(self):
+ """Test editdist returns the expected result with two Latin-1 strings."""
+ expected = 2
+ actual = editdist('Zizka', 'siska')
+ assert expected == actual
+
+ def test_editdist_same_unicode_kind_ucs2(self):
+ """Test editdist returns the expected result with two UCS-2 strings."""
+ expected = 2
+ actual = editdist('Žižka', 'šiška')
+ assert expected == actual
+
+ def test_editdist_same_unicode_kind_ucs4(self):
+ """Test editdist returns the expected result with two UCS-4 strings."""
+ expected = 2
+ actual = editdist('Žižka 😀', 'šiška 😀')
+ assert expected == actual
+
+ def test_editdist_different_unicode_kinds(self):
+ """Test editdist returns the expected result with strings of different Unicode kinds."""
+ expected = 2
+ actual = editdist('Žižka', 'siska')
+ assert expected == actual
+
+
if __name__ == '__main__':
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.DEBUG)
unittest.main()
diff --git a/gensim/test/test_word2vec.py b/gensim/test/test_word2vec.py
index d46b2f3e37..43505b0be2 100644
--- a/gensim/test/test_word2vec.py
+++ b/gensim/test/test_word2vec.py
@@ -686,6 +686,38 @@ def test_cbow_neg_fromfile(self):
)
self.model_sanity(model, with_corpus_file=True)
+ def test_sg_fixedwindowsize(self):
+ """Test skipgram with fixed window size. Use NS."""
+ model = word2vec.Word2Vec(
+ sg=1, window=5, shrink_windows=False, hs=0,
+ negative=15, min_count=5, epochs=10, workers=2
+ )
+ self.model_sanity(model)
+
+ def test_sg_fixedwindowsize_fromfile(self):
+ """Test skipgram with fixed window size. Use HS and train from file."""
+ model = word2vec.Word2Vec(
+ sg=1, window=5, shrink_windows=False, hs=1,
+ negative=0, min_count=5, epochs=10, workers=2
+ )
+ self.model_sanity(model, with_corpus_file=True)
+
+ def test_cbow_fixedwindowsize(self, ranks=None):
+ """Test CBOW with fixed window size. Use HS."""
+ model = word2vec.Word2Vec(
+ sg=0, cbow_mean=1, alpha=0.1, window=5, shrink_windows=False,
+ hs=1, negative=0, min_count=5, epochs=10, workers=2
+ )
+ self.model_sanity(model, ranks=ranks)
+
+ def test_cbow_fixedwindowsize_fromfile(self):
+ """Test CBOW with fixed window size. Use NS and train from file."""
+ model = word2vec.Word2Vec(
+ sg=0, cbow_mean=1, alpha=0.1, window=5, shrink_windows=False,
+ hs=0, negative=15, min_count=5, epochs=10, workers=2
+ )
+ self.model_sanity(model, with_corpus_file=True)
+
def test_cosmul(self):
model = word2vec.Word2Vec(sentences, vector_size=2, min_count=1, hs=1, negative=0)
sims = model.wv.most_similar_cosmul('graph', topn=10)
@@ -843,6 +875,16 @@ def test_predict_output_word(self):
model_without_neg = word2vec.Word2Vec(sentences, min_count=1, negative=0)
self.assertRaises(RuntimeError, model_without_neg.predict_output_word, ['system', 'human'])
+ # passing indices instead of words in context
+ str_context = ['system', 'human']
+ mixed_context = [model_with_neg.wv.get_index(str_context[0]), str_context[1]]
+ idx_context = [model_with_neg.wv.get_index(w) for w in str_context]
+ prediction_from_str = model_with_neg.predict_output_word(str_context, topn=5)
+ prediction_from_mixed = model_with_neg.predict_output_word(mixed_context, topn=5)
+ prediction_from_idx = model_with_neg.predict_output_word(idx_context, topn=5)
+ self.assertEqual(prediction_from_str, prediction_from_mixed)
+ self.assertEqual(prediction_from_str, prediction_from_idx)
+
def test_load_old_model(self):
"""Test loading an old word2vec model of indeterminate version"""
diff --git a/gensim/utils.py b/gensim/utils.py
index 6a6d1f0c3a..30b6d85f58 100644
--- a/gensim/utils.py
+++ b/gensim/utils.py
@@ -50,11 +50,10 @@
RE_HTML_ENTITY = re.compile(r'&(#?)([xX]?)(\w{1,8});', re.UNICODE)
NO_CYTHON = RuntimeError(
- "Cython extensions are unavailable. "
- "Without them, this gensim functionality is disabled. "
- "If you've installed from a package, ask the package maintainer to include Cython extensions. "
- "If you're building gensim from source yourself, run `python setup.py build_ext --inplace` "
- "and retry. "
+ "Compiled extensions are unavailable. "
+ "If you've installed from a package, ask the package maintainer to include compiled extensions. "
+ "If you're building Gensim from source yourself, install Cython and a C compiler, and then "
+ "run `python setup.py build_ext --inplace` to retry. "
)
"""An exception that gensim code raises when Cython extensions are unavailable."""
@@ -835,6 +834,9 @@ def __getitem__(self, val):
return str(val)
raise ValueError("internal id out of bounds (%s, expected <0..%s))" % (val, self.num_terms))
+ def __contains__(self, val):
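+ """Return True iff `val` is a valid internal id, i.e. ``0 <= val < num_terms``."""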
+ return 0 <= val < self.num_terms
+
def iteritems(self):
"""Iterate over all keys and values.
diff --git a/release/annotate_pr.py b/release/annotate_pr.py
new file mode 100644
index 0000000000..1f5d1c8bcb
--- /dev/null
+++ b/release/annotate_pr.py
@@ -0,0 +1,39 @@
+"""Helper script for including change log entries in an open PR.
+
+Automatically constructs the change log entry from the PR title.
+Copies the entry to the window manager clipboard.
+Opens the change log belonging to the specific PR in a browser window.
+All you have to do is paste and click "commit changes".
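+
+Example usage (the single argument is the PR number; 1234 is a placeholder)::
+
+    python release/annotate_pr.py 1234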
+"""
+import json
+import sys
+import webbrowser
+
+import smart_open
+
+
+def copy_to_clipboard(text):
+ try:
+ import pyperclip
+ except ImportError:
+ print('pyperclip is missing.', file=sys.stderr)
+ print('copy-paste the following text manually:', file=sys.stderr)
+ print('\t', text, file=sys.stderr)
+ else:
+ pyperclip.copy(text)
+
+
+prid = int(sys.argv[1])
+url = "https://api.github.com/repos/RaRe-Technologies/gensim/pulls/%d" % prid
+with smart_open.open(url) as fin:
+ prinfo = json.load(fin)
+
+prinfo['user_login'] = prinfo['user']['login']
+prinfo['user_html_url'] = prinfo['user']['html_url']
+text = '[#%(number)s](%(html_url)s): %(title)s, by [@%(user_login)s](%(user_html_url)s)' % prinfo
+copy_to_clipboard(text)
+
+prinfo['head_repo_html_url'] = prinfo['head']['repo']['html_url']
+prinfo['head_ref'] = prinfo['head']['ref']
+edit_url = '%(head_repo_html_url)s/edit/%(head_ref)s/CHANGELOG.md' % prinfo
+webbrowser.open(edit_url)
diff --git a/release/generate_changelog.py b/release/generate_changelog.py
index 72b03c7cda..97cc306f62 100644
--- a/release/generate_changelog.py
+++ b/release/generate_changelog.py
@@ -8,7 +8,7 @@
"""Generate changelog entries for all PRs merged since the last release."""
import re
import requests
-
+import time
#
# The releases get sorted in reverse chronological order, so the first release
@@ -37,6 +37,8 @@ def iter_merged_prs(since=release_timestamp):
yield pr
page += 1
+ # Avoid Github API throttling; see https://github.com/RaRe-Technologies/gensim/pull/3203#issuecomment-887453109
+ time.sleep(1)
def iter_closed_issues(since=release_timestamp):
@@ -58,6 +60,8 @@ def iter_closed_issues(since=release_timestamp):
if 'pull_request' not in issue and issue['closed_at'] > since:
yield issue
page += 1
+ # Avoid Github API throttling; see https://github.com/RaRe-Technologies/gensim/pull/3203#issuecomment-887453109
+ time.sleep(1)
fixed_issue_numbers = set()
diff --git a/release/hijack_pr.py b/release/hijack_pr.py
new file mode 100644
index 0000000000..f885579985
--- /dev/null
+++ b/release/hijack_pr.py
@@ -0,0 +1,33 @@
+"""Hijack a PR to add commits as a maintainer.
+
+This is a two-step process:
+
+ 1. Add a git remote that points to the contributor's repo
+ 2. Check out the actual contribution by reference
+
+As a maintainer, you can add changes by making new commits and pushing them
+back to the remote.
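+
+Example usage (the single argument is the PR number; 1234 is a placeholder)::
+
+    python release/hijack_pr.py 1234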
+"""
+import json
+import subprocess
+import sys
+
+import smart_open
+
+prid = int(sys.argv[1])
+url = f"https://api.github.com/repos/RaRe-Technologies/gensim/pulls/{prid}"
+with smart_open.open(url) as fin:
+ prinfo = json.load(fin)
+
+user = prinfo['head']['user']['login']
+ssh_url = prinfo['head']['repo']['ssh_url']
+
+remotes = subprocess.check_output(['git', 'remote']).strip().decode('utf-8').split('\n')
+if user not in remotes:
+ subprocess.check_call(['git', 'remote', 'add', user, ssh_url])
+
+subprocess.check_call(['git', 'fetch', user])
+
+ref = prinfo['head']['ref']
+subprocess.check_call(['git', 'checkout', f'{user}/{ref}'])
+subprocess.check_call(['git', 'switch', '-c', f'{ref}'])
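+
+# From here, add commits as usual. Pushing them back to the contributor's branch is a
+# manual step, e.g. (a sketch, using the remote added above):
+#
+#   git push <user> HEAD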
diff --git a/setup.py b/setup.py
index 71bfb24cf7..779d6c1c26 100644
--- a/setup.py
+++ b/setup.py
@@ -27,6 +27,7 @@
'gensim.models.fasttext_inner': 'gensim/models/fasttext_inner.c',
'gensim._matutils': 'gensim/_matutils.c',
'gensim.models.nmf_pgd': 'gensim/models/nmf_pgd.c',
+ 'gensim.similarities.fastss': 'gensim/similarities/fastss.c',
}
cpp_extensions = {
@@ -155,13 +156,13 @@ def run(self):
gensim -- Topic Modelling in Python
==============================================
-|Travis|_
+|GA|_
|Wheel|_
-.. |Travis| image:: https://img.shields.io/travis/RaRe-Technologies/gensim/develop.svg
+.. |GA| image:: https://github.com/RaRe-Technologies/gensim/actions/workflows/tests.yml/badge.svg?branch=develop
.. |Wheel| image:: https://img.shields.io/pypi/wheel/gensim.svg
-.. _Travis: https://travis-ci.org/RaRe-Technologies/gensim
+.. _GA: https://github.com/RaRe-Technologies/gensim/actions
.. _Downloads: https://pypi.python.org/pypi/gensim
.. _License: http://radimrehurek.com/gensim/about.html
.. _Wheel: https://pypi.python.org/pypi/gensim
@@ -194,7 +195,7 @@ def run(self):
This software depends on `NumPy and Scipy `_, two Python packages for scientific computing.
You must have them installed prior to installing `gensim`.
-It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as `ATLAS `_ or `OpenBLAS `_ is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don't need to do anything special.
+It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as MKL, `ATLAS `_ or `OpenBLAS `_ is known to improve performance by as much as an order of magnitude. On OSX, NumPy picks up its vecLib BLAS automatically, so you don't need to do anything special.
Install the latest version of gensim::
@@ -205,9 +206,9 @@ def run(self):
python setup.py install
-For alternative modes of installation, see the `documentation `_.
+For alternative modes of installation, see the `documentation `_.
-Gensim is being `continuously tested `_ under Python 3.6, 3.7 and 3.8.
+Gensim is being `continuously tested `_ under all `supported Python versions `_.
Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.
@@ -271,14 +272,13 @@ def run(self):
'mock',
'cython',
'testfixtures',
- 'Morfessor==2.0.2a4',
+ 'Morfessor>=2.0.2a4',
]
if not (sys.platform.lower().startswith("win") and sys.version_info[:2] >= (3, 9)):
core_testenv.extend([
'pyemd',
'nmslib',
- 'python-Levenshtein >= 0.10.2',
])
# Add additional requirements for testing on Linux that are skipped on Windows.
@@ -315,13 +315,13 @@ def run(self):
'pandas',
]
-NUMPY_STR = 'numpy >= 1.11.3'
+NUMPY_STR = 'numpy >= 1.17.0'
#
# We pin the Cython version for reproducibility. We expect our extensions
# to build with any sane version of Cython, so we should update this pin
# periodically.
#
-CYTHON_STR = 'Cython==0.29.21'
+CYTHON_STR = 'Cython==0.29.23'
install_requires = [
NUMPY_STR,
@@ -338,7 +338,7 @@ def run(self):
setup(
name='gensim',
- version='4.0.1',
+ version='4.1.0',
description='Python framework for fast Vector Space Modelling',
long_description=LONG_DESCRIPTION,