Merge branch 'release-0.10.2'
piskvorky committed Sep 18, 2014
2 parents 62c9237 + b53d632 commit afd70ff
Showing 31 changed files with 17,205 additions and 318 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.txt
@@ -1,8 +1,21 @@
Changes
=======

0.10.1
0.10.2, 18/09/2014

* new parallelized LDA implementation, LdaMulticore (Jan Zikes, #232)
* Dynamic Topic Models (DTM) wrapper (Arttii, #205)
* word2vec compiled from bundled C file at install time: no more pyximport (#233)
* standardize show_/print_topics in LdaMallet (Benjamin Bray, #223)
* add new word2vec multiplicative objective (3CosMul) of Levy & Goldberg (Gordon Mohr, #224)
* preserve case in MALLET wrapper (mcburton, #222)
* support for matrix-valued topic/word prior eta in LdaModel (mjwillson, #208)
* py3k fix to SparseCorpus (Andreas Madsen, #234)
* fix to LowCorpus when switching dictionaries (Christopher Corley, #237)

0.10.1, 22/07/2014

* word2vec: new n_similarity method for comparing two sets of words (François Scharffe, #219)
* make LDA print/show topics parameters consistent with LSI (Bram Vandekerckhove, #201)
* add option for efficient word2vec subsampling (Gordon Mohr, #206)
* fix length calculation for corpora on empty files (Christopher Corley, #209)
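The 3CosMul entry above refers to the new ``most_similar_cosmul`` method on ``Word2Vec``. A minimal sketch, assuming a toy corpus of tokenized sentences (the sentences and the ``min_count=1`` setting are only for illustration)::

    from gensim.models import Word2Vec

    sentences = [["king", "queen", "man", "woman"],
                 ["human", "interface", "computer"],
                 ["graph", "trees", "survey"]]
    model = Word2Vec(sentences, min_count=1)

    # classic additive analogy query
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

    # the multiplicative 3CosMul objective of Levy & Goldberg, new in 0.10.2
    print(model.most_similar_cosmul(positive=["king", "woman"], negative=["man"], topn=3))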
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -8,5 +8,5 @@ include COPYING
include COPYING.LESSER
include ez_setup.py
include gensim/models/voidptr.h
include gensim/models/word2vec_inner.c
include gensim/models/word2vec_inner.pyx
include gensim_addons/models/word2vec_inner.pyx
10 changes: 5 additions & 5 deletions README.rst
@@ -25,9 +25,9 @@ Features
* easy to plug in your own input corpus/datastream (trivial streaming API)
* easy to extend with other Vector Space algorithms (trivial transformation API)

* Efficient implementations of popular algorithms, such as online **Latent Semantic Analysis (LSA/LSI)**,
* Efficient multicore implementations of popular algorithms, such as online **Latent Semantic Analysis (LSA/LSI)**,
**Latent Dirichlet Allocation (LDA)**, **Random Projections (RP)**, **Hierarchical Dirichlet Process (HDP)** or **word2vec deep learning**.
* **Distributed computing**: can run *Latent Semantic Analysis* and *Latent Dirichlet Allocation* on a cluster of computers, and *word2vec* on multiple cores.
* **Distributed computing**: can run *Latent Semantic Analysis* and *Latent Dirichlet Allocation* on a cluster of computers.
* Extensive `HTML documentation and tutorials <http://radimrehurek.com/gensim/>`_.


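The "trivial streaming API" mentioned in the feature list above is simply iteration: any object that yields sparse bag-of-words vectors, one document at a time, can be fed to the models. A minimal sketch, assuming a plain-text file ``mycorpus.txt`` (a placeholder name) with one whitespace-tokenized document per line::

    from gensim import corpora

    class MyCorpus(object):
        """Stream documents from disk without loading the whole file into RAM."""
        def __init__(self, path, dictionary):
            self.path = path
            self.dictionary = dictionary

        def __iter__(self):
            for line in open(self.path):
                # one document per line; convert tokens to a sparse bag-of-words vector
                yield self.dictionary.doc2bow(line.lower().split())

    dictionary = corpora.Dictionary(line.lower().split() for line in open('mycorpus.txt'))
    corpus = MyCorpus('mycorpus.txt', dictionary)  # nothing is loaded into memory yet
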
@@ -45,19 +45,19 @@ It is also recommended you install a fast BLAS library before installing NumPy.

The simple way to install `gensim` is::

sudo easy_install gensim
pip install -U gensim

Or, if you have instead downloaded and unzipped the `source tar.gz <http://pypi.python.org/pypi/gensim>`_ package,
you'll need to run::

python setup.py test
sudo python setup.py install
python setup.py install


For alternative modes of installation (without root privileges, development
installation, optional install features), see the `documentation <http://radimrehurek.com/gensim/install.html>`_.

This version has been tested under Python 2.6, 2.7 and 3.3. Gensim's github repo is hooked to `Travis CI for automated testing <https://travis-ci.org/piskvorky/gensim>`_ on every commit push and pull request.
This version has been tested under Python 2.6, 2.7, 3.3 and 3.4 (support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5). Gensim's github repo is hooked to `Travis CI for automated testing <https://travis-ci.org/piskvorky/gensim>`_ on every commit push and pull request.

How come gensim is so fast and memory efficient? Isn't it pure Python, and isn't Python slow and greedy?
--------------------------------------------------------------------------------------------------------
10 changes: 5 additions & 5 deletions docs/src/about.rst
@@ -8,7 +8,7 @@ History
--------

Gensim started off as a collection of various Python scripts for the Czech Digital Mathematics Library `dml.cz <http://dml.cz/>`_ in 2008,
where it served to generate a short list of the most similar articles to a given article (gensim = "generate similar").
where it served to generate a short list of the most similar articles to a given article (**gensim = "generate similar"**).
I also wanted to try these fancy "Latent Semantic Methods", but the libraries that
realized the necessary computation were `not much fun to work with <http://soi.stanford.edu/~rmunk/PROPACK/>`_.

@@ -39,9 +39,9 @@ the source code of these modifications.
Apart from that, you are free to redistribute gensim in any way you like, though you're
not allowed to modify its license (doh!).

My intent here is, of course, to get more help and community involvement with the development of gensim.
My intent here is, of course, to **get more help and community involvement** with the development of gensim.
The legalese is therefore less important to me than your input and contributions.
Contact me if LGPL doesn't fit your bill but you'd still like to use it -- we'll work something out.
Contact me if LGPL doesn't fit your bill but you'd still like to use gensim -- we'll work something out.

.. seealso::

@@ -56,7 +56,7 @@ Contributors
--------------

Credit goes to all the people who contributed to gensim, be it in `discussions <http://groups.google.com/group/gensim>`_,
ideas, `code contributions <https://github.com/piskvorky/gensim/pulls>`_ or bug reports.
ideas, `code contributions <https://github.com/piskvorky/gensim/pulls>`_ or `bug reports <https://github.com/piskvorky/gensim/issues>`_.
It's really useful and motivating to get feedback, in any shape or form, so big thanks to you all!

Some honorable mentions are included in the `CHANGELOG.txt <https://github.com/piskvorky/gensim/blob/develop/CHANGELOG.txt>`_.
@@ -65,7 +65,7 @@ Some honorable mentions are included in the `CHANGELOG.txt <https://github.com/p
Academic citing
----------------

Gensim has been used in many students' final theses as well as research papers. When citing gensim,
Gensim has been used in `many students' final theses as well as research papers <http://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC>`_. When citing gensim,
please use `this BibTeX entry <bibtex_gensim.bib>`_::

@inproceedings{rehurek_lrec,
2 changes: 2 additions & 0 deletions docs/src/apiref.rst
@@ -22,6 +22,7 @@ Modules:
corpora/ucicorpus
corpora/indexedcorpus
models/ldamodel
models/ldamulticore
models/ldamallet
models/lsimodel
models/tfidfmodel
@@ -33,6 +34,7 @@ Modules:
models/lda_dispatcher
models/lda_worker
models/word2vec
models/dtmmodel
similarities/docsim
similarities/simserver

4 changes: 2 additions & 2 deletions docs/src/conf.py
@@ -52,9 +52,9 @@
# built documents.
#
# The short X.Y version.
version = '0.10.1'
version = '0.10.2'
# The full version, including alpha/beta/rc tags.
release = '0.10.1'
release = '0.10.2'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
7 changes: 7 additions & 0 deletions docs/src/models/dtmmodel.rst
@@ -0,0 +1,7 @@
:mod:`models.dtmmodel` -- Dynamic Topic Models (DTM) and Dynamic Influence Models (DIM)
=======================================================================================

.. automodule:: gensim.models.dtmmodel
:synopsis: Dynamic Topic Models
:members:
:inherited-members:
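
A hedged sketch of driving the new DTM wrapper documented above. It assumes you have compiled Blei's DTM binary yourself; the binary path is a placeholder, and the keyword names (``time_slices``, ``num_topics``, ``id2word``) are assumptions based on the wrapper's description rather than a verified signature, so check the generated API docs before relying on them::

    from gensim import corpora
    from gensim.models import DtmModel

    texts = [["economy", "bank"], ["bank", "crisis"],
             ["economy", "growth"], ["growth", "crisis"]]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # '/path/to/dtm-binary' is a placeholder; the first two documents fall into
    # time slice one, the last two into time slice two (assumed keyword: time_slices)
    model = DtmModel('/path/to/dtm-binary', corpus, time_slices=[2, 2],
                     num_topics=2, id2word=dictionary)
    # per-slice topics can then be inspected via the model's show/print topic helpers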
7 changes: 7 additions & 0 deletions docs/src/models/ldamulticore.rst
@@ -0,0 +1,7 @@
:mod:`models.ldamulticore` -- parallelized Latent Dirichlet Allocation
======================================================================

.. automodule:: gensim.models.ldamulticore
:synopsis: Latent Dirichlet Allocation
:members:
:inherited-members:
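
A minimal sketch of training the parallelized model documented above, assuming a bag-of-words corpus and a ``Dictionary`` built as in the tutorials; the ``workers=3`` and ``passes=10`` values are arbitrary examples, not recommendations::

    from gensim import corpora
    from gensim.models import LdaMulticore

    texts = [["human", "interface", "computer"],
             ["survey", "user", "computer", "system"],
             ["graph", "trees", "minors"]]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # spreads training work over worker processes; the API otherwise mirrors LdaModel
    lda = LdaMulticore(corpus, id2word=dictionary, num_topics=2, workers=3, passes=10)
    for topic in lda.show_topics(num_topics=2, num_words=5):
        print(topic)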
10 changes: 9 additions & 1 deletion gensim/corpora/lowcorpus.py
@@ -81,7 +81,6 @@ def __init__(self, fname, id2word=None, line2words=split_on_space):
else:
logger.info("using provided word mapping (%i ids)" % len(id2word))
self.id2word = id2word
self.word2id = dict((v, k) for k, v in iteritems(self.id2word))
self.num_terms = len(self.word2id)
self.use_wordids = True # return documents as (wordIndex, wordCount) 2-tuples

@@ -179,4 +178,13 @@ def docbyoffset(self, offset):
f.seek(offset)
return self.line2doc(f.readline())

@property
def id2word(self):
return self._id2word

@id2word.setter
def id2word(self, val):
self._id2word = val
self.word2id = dict((v, k) for k, v in iteritems(val))

# endclass LowCorpus
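
The change above turns ``id2word`` into a managed property so that the reverse ``word2id`` mapping is rebuilt on every assignment, which is what fixes the dictionary-switching bug noted in the changelog. A standalone sketch of the same pattern (plain ``dict.items()`` is used here in place of gensim's ``iteritems`` import)::

    class MappingHolder(object):
        @property
        def id2word(self):
            return self._id2word

        @id2word.setter
        def id2word(self, val):
            self._id2word = val
            self.word2id = dict((v, k) for k, v in val.items())

    holder = MappingHolder()
    holder.id2word = {0: 'human', 1: 'computer'}
    assert holder.word2id == {'human': 0, 'computer': 1}

    holder.id2word = {0: 'graph', 1: 'trees'}           # switch dictionaries ...
    assert holder.word2id == {'graph': 0, 'trees': 1}   # ... and word2id follows automatically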
6 changes: 2 additions & 4 deletions gensim/corpora/wikicorpus.py
@@ -184,10 +184,8 @@ def extract_pages(f, filter_namespaces=False):
"""
Extract pages from MediaWiki database dump.
Returns
-------
pages : iterable over (str, str)
Generates (title, content) pairs.
Return an iterable over (str, str) which generates (title, content) pairs.
"""
elems = (elem for _, elem in iterparse(f, events=("end",)))

2 changes: 1 addition & 1 deletion gensim/matutils.py
@@ -307,7 +307,7 @@ def __init__(self, sparse, documents_columns=True):

def __iter__(self):
for indprev, indnow in izip(self.sparse.indptr, self.sparse.indptr[1:]):
yield zip(self.sparse.indices[indprev:indnow], self.sparse.data[indprev:indnow])
yield list(zip(self.sparse.indices[indprev:indnow], self.sparse.data[indprev:indnow]))

def __len__(self):
return self.sparse.shape[1]
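
The ``list(...)`` wrapper added above matters because ``zip`` returns a lazy, single-pass iterator under Python 3 (under Python 2 it already returns a list), so yielding the raw object would hand callers something that has no ``len()`` and goes empty after one traversal. A short illustration of the difference::

    lazy = zip([0, 2, 5], [1.0, 0.5, 0.25])           # iterator on Python 3
    eager = list(zip([0, 2, 5], [1.0, 0.5, 0.25]))    # plain list on both 2 and 3

    print(list(lazy))    # consumes the iterator ...
    print(list(lazy))    # ... so this prints [] on Python 3
    print(eager)         # the list survives repeated iteration
    print(eager[0])      # and supports indexing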
2 changes: 2 additions & 0 deletions gensim/models/__init__.py
@@ -12,6 +12,8 @@
from .rpmodel import RpModel
from .logentropy_model import LogEntropyModel
from .word2vec import Word2Vec
from .ldamulticore import LdaMulticore
from .dtmmodel import DtmModel

from gensim import interfaces, utils
