diff --git a/.gitignore b/.gitignore index a7f7289ad0..3bca94ef00 100644 --- a/.gitignore +++ b/.gitignore @@ -54,3 +54,6 @@ gensim.egg-info .*.sw[op] data *.bak +/build/ +/dist/ + diff --git a/CHANGELOG.txt b/CHANGELOG.txt index 690755f3b6..047551c6d4 100644 --- a/CHANGELOG.txt +++ b/CHANGELOG.txt @@ -1,6 +1,24 @@ Changes ======= +0.8.7 + +* initial version of word2vec, a neural network deep learning algo +* make distributed gensim compatible with the new Pyro +* allow merging dictionaries (by Florent Chandelier) +* new design for the gensim website! +* speed up handling of corner cases when returning top-n most similar +* make Random Projections compatible with new scipy (andrewjOc360, PR #110) +* allow "light" (faster) word lemmatization (by Karsten Jeschkies) +* save/load directly from bzip2 files (by Luis Pedro Coelho, PR #101) +* Blei corpus now tries harder to find its vocabulary file (by Luis Pedro Coelho, PR #100) +* sparse vector elements can now be a list (was: only a 2-tuple) +* simple_preprocess now optionally de-accents letters (ř/š/ú etc.) +* better serialization of numpy corpora +* print_topics() returns the topics, in addition to printing/logging +* fixes for more robust Windows multiprocessing +* lots of small fixes, data checks and documentation updates + 0.8.6 * added HashDictionary (by Homer Strong) diff --git a/README.rst b/README.rst index 29fef211cb..18d3be1f27 100644 --- a/README.rst +++ b/README.rst @@ -3,7 +3,6 @@ gensim -- Python Framework for Topic Modelling ============================================== - Gensim is a Python library for *topic modelling*, *document indexing* and *similarity retrieval* with large corpora. Target audience is the *natural language processing* (NLP) and *information retrieval* (IR) community. @@ -33,6 +32,8 @@ Installation This software depends on `NumPy and Scipy `_, two Python packages for scientific computing. You must have them installed prior to installing `gensim`. 
+It is also recommended you install a fast BLAS library prior to installing NumPy. This is optional, but using an optimized BLAS such as `ATLAS `_ or `OpenBLAS `_ is known to improve performance by as much as an order of magnitude. + The simple way to install `gensim` is:: sudo easy_install gensim @@ -60,4 +61,4 @@ It is also included in the source distribution package. Gensim is open source software, and has been released under the `GNU LGPL license `_. -Copyright (c) 2009-2012 Radim Rehurek +Copyright (c) 2009-2013 Radim Rehurek diff --git a/docs/_sources/about.txt b/docs/_sources/about.txt deleted file mode 100644 index c6cb4a79b8..0000000000 --- a/docs/_sources/about.txt +++ /dev/null @@ -1,87 +0,0 @@ -.. _about: - -============ -About -============ - -History --------- - -Gensim started off as a collection of various Python scripts for the Czech Digital Mathematics Library `dml.cz `_ in 2008, -where it served to generate a short list of the most similar articles to a given article (gensim = "generate similar"). -I also wanted to try these fancy "Latent Semantic Methods", but the libraries that -realized the necessary computation were `not much fun to work with `_. - -Naturally, I set out to reinvent the wheel. Our `2010 LREC publication `_ -describes the initial design decisions behind gensim (clarity, efficiency and scalability) -and is fairly representative of how gensim works even today. - -Later versions of gensim improved this efficiency and scalability tremendously (in fact, -I made algorithmic scalability of distributional semantics the topic of my `PhD thesis `_). - -By now, gensim is---to my knowledge---the most robust, efficient and hassle-free piece -of software to realize unsupervised semantic modelling from plain text. 
It stands -in contrast to brittle homework-assignment-implementations that do not scale on one hand, -and robust java-esque projects that do scale, but only if you're willing to sacrifice -several weeks of your, your technician's as well as your local scientist's time just to run "hello world". - -In 2011, I started using `Github `_ for source code hosting, -and the gensim website moved from my university hosting to its present domain. - - -Licensing ---------- - -Gensim is licensed under the OSI-approved `GNU LGPL license `_. -This means that it's free for both personal and commercial use, but if you make any -modification to gensim that you distribute to other people, you have to disclose -the source code of these modifications. - -Apart from that, you are free to redistribute gensim in any way you like, though you're -not allowed to modify its license (doh!). - -My intent here is, of course, to get more help and community involvement with the development of gensim. -The legalese is therefore less important to me than your input and contributions. -Contact me if LGPL doesn't fit your bill but you'd still like to use it -- we'll work something out. - -.. seealso:: - - I also host a document similarity package `gensim.simserver`. This is a high-level - interface to `gensim` functionality, and offers transactional remote (web-based) - document similarity queries and indexing. It uses gensim to do the heavy lifting: - you don't need the `simserver` to use gensim, but you do need gensim to use the `simserver`. - Note that unlike gensim, `gensim.simserver` is licensed under `Affero GPL `_, - which is much more restrictive for inclusion in commercial projects. - -Contributors -------------- - -Credit goes to all the people who contributed to gensim, be it in `discussions `_, -ideas, `code contributions `_ or bug reports. -It's really useful and motivating to get feedback, in any shape or form, so big thanks to you all! 
- -Some honorable mentions are included in the `CHANGELOG.txt `_. - - -Academic citing ----------------- - -Gensim has been used in many students' final theses as well as research papers. When citing gensim, -please use `this BibTeX entry `_:: - - @inproceedings{rehurek_lrec, - title = {{Software Framework for Topic Modelling with Large Corpora}}, - author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka}, - booktitle = {{Proceedings of the LREC 2010 Workshop on New - Challenges for NLP Frameworks}}, - pages = {45--50}, - year = 2010, - month = May, - day = 22, - publisher = {ELRA}, - address = {Valletta, Malta}, - note={\url{http://is.muni.cz/publication/884893/en}}, - language={English} - } - - diff --git a/docs/_sources/apiref.txt b/docs/_sources/apiref.txt deleted file mode 100644 index 14ed4a8f7d..0000000000 --- a/docs/_sources/apiref.txt +++ /dev/null @@ -1,36 +0,0 @@ -.. _apiref: - -API Reference -============= - -Modules: - -.. toctree:: - :maxdepth: 0 - - interfaces - utils - matutils - corpora/bleicorpus - corpora/dictionary - corpora/hashdictionary - corpora/lowcorpus - corpora/mmcorpus - corpora/svmlightcorpus - corpora/wikicorpus - corpora/textcorpus - corpora/ucicorpus - corpora/indexedcorpus - models/ldamodel - models/lsimodel - models/tfidfmodel - models/rpmodel - models/hdpmodel - models/logentropy_model - models/lsi_dispatcher - models/lsi_worker - models/lda_dispatcher - models/lda_worker - similarities/docsim - similarities/simserver - diff --git a/docs/_sources/changes_080.txt b/docs/_sources/changes_080.txt deleted file mode 100644 index be5df9ad15..0000000000 --- a/docs/_sources/changes_080.txt +++ /dev/null @@ -1,80 +0,0 @@ -.. _changes_080: - -Change Set for 0.8.0 -============================ - -Release 0.8.0 concludes the 0.7.x series, which was about API consolidation and performance. -In 0.8.x, I'd like to extend `gensim` with new functionality and features. 
- -Codestyle Changes ------------------- - -Codebase was modified to comply with `PEP8: Style Guide for Python Code `_. -This means the 0.8.0 API is **backward incompatible** with the 0.7.x series. - -That's not as tragic as it sounds, gensim was almost there anyway. The changes are few and pretty straightforward: - -1. the `numTopics` parameter is now `num_topics` -2. `addDocuments()` method becomes `add_documents()` -3. `toUtf8()` => `to_utf8()` -4. ... you get the idea: replace `camelCase` with `lowercase_with_underscores`. - -If you stored a model that is affected by this to disk, you'll need to rename its attributes manually: - ->>> lsa = gensim.models.LsiModel.load('/some/path') # load old <0.8.0 model ->>> lsa.num_terms, lsa.num_topics = lsa.numTerms, lsa.numTopics # rename attributes ->>> del lsa.numTerms, lsa.numTopics # clean up old attributes (optional) ->>> lsa.save('/some/path') # save again to disk, as 0.8.0 compatible - -Only attributes (variables) need to be renamed; method names (functions) are not affected, due to the way `pickle` works. - -Similarity Queries -------------------- - -Improved speed and scalability of :doc:`similarity queries `. - -The `Similarity` class can now index corpora of arbitrary size more efficiently. -Internally, this is done by splitting the index into several smaller pieces ("shards") that fit in RAM -and can be processed independently. In addition, documents can now be added to a `Similarity` index dynamically. - -There is also a new way to query the similarity indexes: - ->>> index = MatrixSimilarity(corpus) # create an index ->>> sims = index[document] # get cosine similarity of query "document" against every document in the index ->>> sims = index[chunk_of_documents] # new syntax! - -Advantage of the last line (querying multiple documents at the same time) is faster execution. 
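The gain from the last (chunked) form can be sketched in plain Python. This is an illustrative stand-in for what `index[chunk_of_documents]` computes, not gensim code; `sparse_cosine` and `query_chunk` are made-up names for the sketch:

```python
from math import sqrt

def sparse_cosine(a, b):
    """Cosine similarity of two sparse bag-of-words vectors, given as (id, weight) pairs."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    na = sqrt(sum(w * w for w in da.values()))
    nb = sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0

def query_chunk(index_corpus, chunk):
    """Similarity of every query in `chunk` against every indexed document, like index[chunk]."""
    return [[sparse_cosine(query, doc) for doc in index_corpus] for query in chunk]

indexed = [[(0, 1.0), (1, 1.0)], [(1, 1.0), (2, 1.0)]]   # two indexed documents
sims = query_chunk(indexed, [[(0, 1.0)]])                 # a "chunk" holding one query
```

A real index can turn a whole chunk into a single matrix-matrix multiplication instead of one vector product per query, which is where the faster execution comes from.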
- -This faster execution is also utilized *automatically for you* if you're using the ``for sims in index: ...`` syntax -(which returns pairwise similarities of documents in the index). - -To see the speed-up on your machine, run ``python -m gensim.test.simspeed`` (and compare to my results `here `_ to see how your machine fares). - -.. note:: - This current functionality of querying is as far as I wanted to get with gensim. - More optimizations and smarter indexing are certainly possible, but I'd like to - focus on other features now. Pull requests are still welcome though :) - -Check out the :mod:`updated documentation ` of the similarity classes for more info. - -Simplified Directory Structure -------------------------------- - -Instead of the java-esque ``ROOT_DIR/src/gensim`` directory structure of gensim, -the packages now reside directly in ``ROOT_DIR/gensim`` (no superfluous ``src``). See the new structure `on github `_. - -Other changes (that you're unlikely to notice unless you look) ---------------------------------------------------------------------- - -* Improved efficiency of ``lsi[corpus]`` transformations (documents are chunked internally for better performance). -* Large matrices (numpy/scipy.sparse, in `LsiModel`, `Similarity` etc.) are now mmapped to/from disk when doing `save/load`. The `cPickle` approach used previously was too `buggy `_ and `slow `_. -* Renamed `chunks` parameter to `chunksize` (i.e. `LsiModel(corpus, num_topics=100, chunksize=20000)`). This better reflects its purpose: size of a chunk=number of documents to be processed at once. -* Also improved memory efficiency of LSI and LDA model generation (again). -* Removed SciPy 0.6 from the list of supported SciPy versions (need >=0.7 now). -* Added more unit tests. -* Several smaller fixes; see the `commit history `_ for full account. - -.. admonition:: Future Directions? 
- - If you have ideas or proposals for new features for 0.8.x, now is the time to let me know: - `gensim mailing list `_. diff --git a/docs/_sources/corpora/bleicorpus.txt b/docs/_sources/corpora/bleicorpus.txt deleted file mode 100644 index 91c02213b1..0000000000 --- a/docs/_sources/corpora/bleicorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.bleicorpus` -- Corpus in Blei's LDA-C format -========================================================== - -.. automodule:: gensim.corpora.bleicorpus - :synopsis: Corpus in Blei's LDA-C format - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/corpora.txt b/docs/_sources/corpora/corpora.txt deleted file mode 100644 index 3ea5151c96..0000000000 --- a/docs/_sources/corpora/corpora.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora` -- Package for corpora I/O -========================================== - -.. automodule:: gensim.corpora - :synopsis: Package for corpora I/O - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/dictionary.txt b/docs/_sources/corpora/dictionary.txt deleted file mode 100644 index 8542a61d6e..0000000000 --- a/docs/_sources/corpora/dictionary.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.dictionary` -- Construct word<->id mappings -========================================================== - -.. automodule:: gensim.corpora.dictionary - :synopsis: Construct word<->id mappings - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/dmlcorpus.txt b/docs/_sources/corpora/dmlcorpus.txt deleted file mode 100644 index 3248522fa0..0000000000 --- a/docs/_sources/corpora/dmlcorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.dmlcorpus` -- Corpus in DML-CZ format -==================================================== - -.. 
automodule:: gensim.corpora.dmlcorpus - :synopsis: Corpus in DML-CZ format - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/indexedcorpus.txt b/docs/_sources/corpora/indexedcorpus.txt deleted file mode 100644 index 7f0a2764bf..0000000000 --- a/docs/_sources/corpora/indexedcorpus.txt +++ /dev/null @@ -1,12 +0,0 @@ -:mod:`corpora.indexedcorpus` -- Random access to corpus documents -================================================================= - -.. automodule:: gensim.corpora.indexedcorpus - :synopsis: Random access to corpus documents - :members: - :inherited-members: - - -.. autoclass:: IndexedCorpus - :members: - :inherited-members: \ No newline at end of file diff --git a/docs/_sources/corpora/lowcorpus.txt b/docs/_sources/corpora/lowcorpus.txt deleted file mode 100644 index 344495de31..0000000000 --- a/docs/_sources/corpora/lowcorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.lowcorpus` -- Corpus in List-of-Words format -=========================================================== - -.. automodule:: gensim.corpora.lowcorpus - :synopsis: Corpus in List-of-Words format - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/mmcorpus.txt b/docs/_sources/corpora/mmcorpus.txt deleted file mode 100644 index aec7b2d963..0000000000 --- a/docs/_sources/corpora/mmcorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.mmcorpus` -- Corpus in Matrix Market format -========================================================== - -.. automodule:: gensim.corpora.mmcorpus - :synopsis: Corpus in Matrix Market format - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/svmlightcorpus.txt b/docs/_sources/corpora/svmlightcorpus.txt deleted file mode 100644 index 9a2f8cb791..0000000000 --- a/docs/_sources/corpora/svmlightcorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.svmlightcorpus` -- Corpus in SVMlight format -================================================================== - -.. 
automodule:: gensim.corpora.svmlightcorpus - :synopsis: Corpus in SVMlight format - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/textcorpus.txt b/docs/_sources/corpora/textcorpus.txt deleted file mode 100644 index 01698db4de..0000000000 --- a/docs/_sources/corpora/textcorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.textcorpus` -- Building corpora with dictionaries -================================================================= - -.. automodule:: gensim.corpora.textcorpus - :synopsis: Building corpora with dictionaries - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/ucicorpus.txt b/docs/_sources/corpora/ucicorpus.txt deleted file mode 100644 index 3d48f81a1e..0000000000 --- a/docs/_sources/corpora/ucicorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.ucicorpus` -- Corpus in UCI bag-of-words format -============================================================================================================== - -.. automodule:: gensim.corpora.ucicorpus - :synopsis: Corpus in University of California, Irvine (UCI) bag-of-words format - :members: - :inherited-members: - diff --git a/docs/_sources/corpora/wikicorpus.txt b/docs/_sources/corpora/wikicorpus.txt deleted file mode 100644 index 5dda3cb7fc..0000000000 --- a/docs/_sources/corpora/wikicorpus.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`corpora.wikicorpus` -- Corpus from a Wikipedia dump -========================================================== - -.. automodule:: gensim.corpora.wikicorpus - :synopsis: Corpus from a Wikipedia dump - :members: - :inherited-members: - diff --git a/docs/_sources/dist_lda.txt b/docs/_sources/dist_lda.txt deleted file mode 100644 index baf2d28aba..0000000000 --- a/docs/_sources/dist_lda.txt +++ /dev/null @@ -1,81 +0,0 @@ -.. _dist_lda: - -Distributed Latent Dirichlet Allocation -============================================ - - -.. note:: - See :doc:`distributed` for an introduction to distributed computing in `gensim`. 
- - -Setting up the cluster -_______________________ - -See the tutorial on :doc:`dist_lsi`; setting up a cluster for LDA is completely -analogous, except you want to run `lda_worker` and `lda_dispatcher` scripts instead -of `lsi_worker` and `lsi_dispatcher`. - -Running LDA -____________ - -Run LDA like you normally would, but turn on the `distributed=True` constructor -parameter:: - - >>> # extract 100 LDA topics, using default parameters - >>> lda = LdaModel(corpus=mm, id2word=id2word, num_topics=100, distributed=True) - using distributed version with 4 workers - running online LDA training, 100 topics, 1 passes over the supplied corpus of 3199665 documents, updating model once every 40000 documents - .. - - -In serial mode (no distribution), creating this online LDA :doc:`model of Wikipedia ` -takes 10h56m on my laptop (OS X, C2D 2.53GHz, 4GB RAM with `vecLib`). -In distributed mode with four workers (Linux, Xeons of 2GHz, 4GB RAM -with `ATLAS `_), the wallclock time taken drops to 3h20m. - -To run standard batch LDA (no online updates of mini-batches) instead, you would similarly -call:: - - >>> lda = LdaModel(corpus=mm, id2word=id2token, num_topics=100, update_every=0, passes=20, distributed=True) - using distributed version with 4 workers - running batch LDA training, 100 topics, 20 passes over the supplied corpus of 3199665 documents, updating model once every 3199665 documents - initializing workers - iteration 0, dispatching documents up to #10000/3199665 - iteration 0, dispatching documents up to #20000/3199665 - ... 
- -and then, some two days later:: - - iteration 19, dispatching documents up to #3190000/3199665 - iteration 19, dispatching documents up to #3199665/3199665 - reached the end of input; now waiting for all remaining jobs to finish - -:: - - >>> lda.print_topics(20) - topic #0: 0.007*disease + 0.006*medical + 0.005*treatment + 0.005*cells + 0.005*cell + 0.005*cancer + 0.005*health + 0.005*blood + 0.004*patients + 0.004*drug - topic #1: 0.024*king + 0.013*ii + 0.013*prince + 0.013*emperor + 0.008*duke + 0.008*empire + 0.007*son + 0.007*china + 0.007*dynasty + 0.007*iii - topic #2: 0.031*film + 0.017*films + 0.005*movie + 0.005*directed + 0.004*man + 0.004*episode + 0.003*character + 0.003*cast + 0.003*father + 0.003*mother - topic #3: 0.022*user + 0.012*edit + 0.009*wikipedia + 0.007*block + 0.007*my + 0.007*here + 0.007*edits + 0.007*blocked + 0.006*revert + 0.006*me - topic #4: 0.045*air + 0.026*aircraft + 0.021*force + 0.018*airport + 0.011*squadron + 0.010*flight + 0.010*military + 0.008*wing + 0.007*aviation + 0.007*f - topic #5: 0.025*sun + 0.022*star + 0.018*moon + 0.015*light + 0.013*stars + 0.012*planet + 0.011*camera + 0.010*mm + 0.009*earth + 0.008*lens - topic #6: 0.037*radio + 0.026*station + 0.022*fm + 0.014*news + 0.014*stations + 0.014*channel + 0.013*am + 0.013*racing + 0.011*tv + 0.010*broadcasting - topic #7: 0.122*image + 0.099*jpg + 0.046*file + 0.038*uploaded + 0.024*png + 0.014*contribs + 0.013*notify + 0.013*logs + 0.013*picture + 0.013*flag - topic #8: 0.036*russian + 0.030*soviet + 0.028*polish + 0.024*poland + 0.022*russia + 0.013*union + 0.012*czech + 0.011*republic + 0.011*moscow + 0.010*finland - topic #9: 0.031*language + 0.014*word + 0.013*languages + 0.009*term + 0.009*words + 0.008*example + 0.007*names + 0.007*meaning + 0.006*latin + 0.006*form - topic #10: 0.029*w + 0.029*toronto + 0.023*l + 0.020*hockey + 0.019*nhl + 0.014*ontario + 0.012*calgary + 0.011*edmonton + 0.011*hamilton + 0.010*season - topic #11: 0.110*wikipedia + 
0.110*articles + 0.030*library + 0.029*wikiproject + 0.028*project + 0.019*data + 0.016*archives + 0.012*needing + 0.009*reference + 0.009*statements - topic #12: 0.032*http + 0.030*your + 0.022*request + 0.017*sources + 0.016*archived + 0.016*modify + 0.015*changes + 0.015*creation + 0.014*www + 0.013*try - topic #13: 0.011*your + 0.010*my + 0.009*we + 0.008*don + 0.008*get + 0.008*know + 0.007*me + 0.006*think + 0.006*question + 0.005*find - topic #14: 0.073*r + 0.066*japanese + 0.062*japan + 0.018*tokyo + 0.008*prefecture + 0.005*osaka + 0.004*j + 0.004*sf + 0.003*kyoto + 0.003*manga - topic #15: 0.045*da + 0.045*fr + 0.027*kategori + 0.026*pl + 0.024*nl + 0.021*pt + 0.017*en + 0.015*categoria + 0.014*es + 0.012*kategorie - topic #16: 0.010*death + 0.005*died + 0.005*father + 0.004*said + 0.004*himself + 0.004*took + 0.004*son + 0.004*killed + 0.003*murder + 0.003*wife - topic #17: 0.027*book + 0.021*published + 0.020*books + 0.014*isbn + 0.010*author + 0.010*magazine + 0.009*press + 0.009*novel + 0.009*writers + 0.008*story - topic #18: 0.027*football + 0.024*players + 0.023*cup + 0.019*club + 0.017*fc + 0.017*footballers + 0.017*league + 0.011*season + 0.007*teams + 0.007*goals - topic #19: 0.032*band + 0.024*album + 0.014*albums + 0.013*guitar + 0.013*rock + 0.011*records + 0.011*vocals + 0.009*live + 0.008*bass + 0.008*track - - - -If you used the distributed LDA implementation in `gensim`, please let me know (my -email is at the bottom of this page). I would like to hear about your application and -the possible (inevitable?) issues that you encountered, to improve `gensim` in the future. diff --git a/docs/_sources/dist_lsi.txt b/docs/_sources/dist_lsi.txt deleted file mode 100644 index 3cf8c5f065..0000000000 --- a/docs/_sources/dist_lsi.txt +++ /dev/null @@ -1,157 +0,0 @@ -.. _dist_lsi: - -Distributed Latent Semantic Analysis -============================================ - - -.. 
note:: - See :doc:`distributed` for an introduction to distributed computing in `gensim`. - - -Setting up the cluster -_______________________ - -We will show how to run distributed Latent Semantic Analysis by means of an example. -Let's say we have 5 computers at our disposal, all on the same network segment (=reachable -by network broadcast). To start with, install `gensim` and `Pyro` on each computer with:: - - $ sudo easy_install gensim[distributed] - -and run Pyro’s name server on exactly one of the machines (doesn’t matter which one):: - - $ python -m Pyro4.naming -n 0.0.0.0 & - -Let's say our example cluster consists of dual-core computers with loads of -memory. We will therefore run **two** worker scripts on four of the physical machines, -creating **eight** logical worker nodes:: - - $ python -m gensim.models.lsi_worker & - -This will execute `gensim`'s `lsi_worker.py` script (to be run twice on each of the -four computers). -This lets `gensim` know that it can run two jobs on each of the four computers in -parallel, so that the computation will be done faster, while also taking up twice -as much memory on each machine. - -Next, pick one computer that will be a job scheduler in charge of worker -synchronization, and on it, run `LSA dispatcher`. In our example, we will use the -fifth computer to act as the dispatcher and from there run:: - - $ python -m gensim.models.lsi_dispatcher & - -In general, the dispatcher can be run on the same machine as one of the worker nodes, or it -can be another, distinct computer (within the same broadcast domain). The dispatcher -won't be doing much with CPU most of the time, but pick a computer with ample memory. - -And that's it! The cluster is set up and running, ready to accept jobs. To remove -a worker later on, simply terminate its `lsi_worker` process. To add another worker, run another -`lsi_worker` (this will not affect a computation that is already running, the additions/deletions are not dynamic). 
-If you terminate `lsi_dispatcher`, you won't be able to run computations until you run it again -(surviving worker processes can be re-used though). - - -Running LSA -____________ - -So let's test our setup and run one computation of distributed LSA. Open a Python -shell on one of the five machines (again, this can be done on any computer -in the same `broadcast domain `_, -our choice is incidental) and try:: - - >>> from gensim import corpora, models, utils - >>> import logging - >>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - - >>> corpus = corpora.MmCorpus('/tmp/deerwester.mm') # load a corpus of nine documents, from the Tutorials - >>> id2word = corpora.Dictionary.load('/tmp/deerwester.dict') - - >>> lsi = models.LsiModel(corpus, id2word=id2word, num_topics=200, chunksize=1, distributed=True) # run distributed LSA on nine documents - -This uses the corpus and feature-token mapping created in the :doc:`tut1` tutorial. -If you look at the log in your Python session, you should see a line similar to:: - - 2010-08-09 23:44:25,746 : INFO : using distributed version with 8 workers - -which means all went well. You can also check the logs coming from your worker and dispatcher -processes --- this is especially helpful in case of problems. -To check the LSA results, let's print the first two latent topics:: - - >>> lsi.print_topics(num_topics=2, num_words=5) - topic #0(3.341): 0.644*"system" + 0.404*"user" + 0.301*"eps" + 0.265*"time" + 0.265*"response" - topic #1(2.542): 0.623*"graph" + 0.490*"trees" + 0.451*"minors" + 0.274*"survey" + -0.167*"system" - -Success! But a corpus of nine documents is no challenge for our powerful cluster... -In fact, we had to lower the job size (`chunksize` parameter above) to a single document -at a time, otherwise all documents would be processed by a single worker all at once. 
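The role the `chunksize` job size plays here can be sketched in plain Python (an illustration only; `chunked` is a hypothetical helper, not gensim's actual scheduler):

```python
from itertools import islice

def chunked(corpus, chunksize):
    """Split a document stream into jobs of up to `chunksize` documents each."""
    it = iter(corpus)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            break
        yield chunk

docs = ['doc%d' % i for i in range(9)]   # stand-in for the nine-document corpus
jobs_small = list(chunked(docs, 1))      # chunksize=1: nine jobs, enough to keep all workers busy
jobs_big = list(chunked(docs, 10000))    # large chunksize: one job, handled by a single worker
```

With `chunksize=1` the nine documents become nine independent jobs, so every worker can get one; with a large `chunksize` the whole corpus fits into a single job and only one worker does any work.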
- -So let's run LSA on **one million documents** instead:: - - >>> # inflate the corpus to 1M documents, by repeating its documents over&over - >>> corpus1m = utils.RepeatCorpus(corpus, 1000000) - >>> # run distributed LSA on 1 million documents - >>> lsi1m = models.LsiModel(corpus1m, id2word=id2word, num_topics=200, chunksize=10000, distributed=True) - - >>> lsi1m.print_topics(num_topics=2, num_words=5) - topic #0(1113.628): 0.644*"system" + 0.404*"user" + 0.301*"eps" + 0.265*"time" + 0.265*"response" - topic #1(847.233): 0.623*"graph" + 0.490*"trees" + 0.451*"minors" + 0.274*"survey" + -0.167*"system" - -The log from 1M LSA should look like:: - - 2010-08-10 02:46:35,087 : INFO : using distributed version with 8 workers - 2010-08-10 02:46:35,087 : INFO : updating SVD with new documents - 2010-08-10 02:46:35,202 : INFO : dispatched documents up to #10000 - 2010-08-10 02:46:35,296 : INFO : dispatched documents up to #20000 - ... - 2010-08-10 02:46:46,524 : INFO : dispatched documents up to #990000 - 2010-08-10 02:46:46,694 : INFO : dispatched documents up to #1000000 - 2010-08-10 02:46:46,694 : INFO : reached the end of input; now waiting for all remaining jobs to finish - 2010-08-10 02:46:47,195 : INFO : all jobs finished, downloading final projection - 2010-08-10 02:46:47,200 : INFO : decomposition complete - -Due to the small vocabulary size and trivial structure of our "one-million corpus", the computation -of LSA still takes only 12 seconds. To really stress-test our cluster, let's do -Latent Semantic Analysis on the English Wikipedia. 
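The corpus inflation used above can be mimicked with stdlib `itertools`, as a rough sketch of the idea behind `utils.RepeatCorpus` rather than its actual implementation (note that `cycle` caches the documents it has seen in memory):

```python
from itertools import cycle, islice

def repeat_corpus(corpus, target_size):
    """Yield documents from `corpus` over and over, until `target_size` documents in total."""
    return islice(cycle(corpus), target_size)

base = [[(0, 1.0)], [(1, 1.0)], [(2, 1.0)]]   # a tiny three-document corpus
inflated = list(repeat_corpus(base, 7))       # "inflate" it to seven documents
```

The inflated stream simply wraps around the base corpus, so document 4 is a repeat of document 1, and so on.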
- -Distributed LSA on Wikipedia -++++++++++++++++++++++++++++++ - -First, download and prepare the Wikipedia corpus as per :doc:`wiki`, then load -the corpus iterator with:: - - >>> import logging, gensim, bz2 - >>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - - >>> # load id->word mapping (the dictionary) - >>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt') - >>> # load corpus iterator - >>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm') - >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output - - >>> print mm - MmCorpus(3199665 documents, 100000 features, 495547400 non-zero entries) - -Now we're ready to run distributed LSA on the English Wikipedia:: - - >>> # extract 400 LSI topics, using a cluster of nodes - >>> lsi = gensim.models.lsimodel.LsiModel(corpus=mm, id2word=id2word, num_topics=400, chunksize=20000, distributed=True) - - >>> # print the most contributing words (both positively and negatively) for each of the first ten topics - >>> lsi.print_topics(10) - 2010-11-03 16:08:27,602 : INFO : topic #0(200.990): -0.475*"delete" + -0.383*"deletion" + -0.275*"debate" + -0.223*"comments" + -0.220*"edits" + -0.213*"modify" + -0.208*"appropriate" + -0.194*"subsequent" + -0.155*"wp" + -0.117*"notability" - 2010-11-03 16:08:27,626 : INFO : topic #1(143.129): -0.320*"diff" + -0.305*"link" + -0.199*"image" + -0.171*"www" + -0.162*"user" + 0.149*"delete" + -0.147*"undo" + -0.144*"contribs" + -0.122*"album" + 0.113*"deletion" - 2010-11-03 16:08:27,651 : INFO : topic #2(135.665): -0.437*"diff" + -0.400*"link" + -0.202*"undo" + -0.192*"user" + -0.182*"www" + -0.176*"contribs" + 0.168*"image" + -0.109*"added" + 0.106*"album" + 0.097*"copyright" - 2010-11-03 16:08:27,677 : INFO : topic #3(125.027): -0.354*"image" + 0.239*"age" + 0.218*"median" + -0.213*"copyright" + 0.204*"population" + -0.195*"fair" + 0.195*"income" + 0.167*"census" + 
0.165*"km" + 0.162*"households" - 2010-11-03 16:08:27,701 : INFO : topic #4(116.927): -0.307*"image" + 0.195*"players" + -0.184*"median" + -0.184*"copyright" + -0.181*"age" + -0.167*"fair" + -0.162*"income" + -0.151*"population" + -0.136*"households" + -0.134*"census" - 2010-11-03 16:08:27,728 : INFO : topic #5(100.326): 0.501*"players" + 0.318*"football" + 0.284*"league" + 0.193*"footballers" + 0.142*"image" + 0.133*"season" + 0.119*"cup" + 0.113*"club" + 0.110*"baseball" + 0.103*"f" - 2010-11-03 16:08:27,754 : INFO : topic #6(92.298): -0.411*"album" + -0.275*"albums" + -0.217*"band" + -0.214*"song" + -0.184*"chart" + -0.163*"songs" + -0.160*"singles" + -0.149*"vocals" + -0.139*"guitar" + -0.129*"track" - 2010-11-03 16:08:27,780 : INFO : topic #7(83.811): -0.248*"wikipedia" + -0.182*"keep" + 0.180*"delete" + -0.167*"articles" + -0.152*"your" + -0.150*"my" + 0.144*"film" + -0.130*"we" + -0.123*"think" + -0.120*"user" - 2010-11-03 16:08:27,807 : INFO : topic #8(78.981): 0.588*"film" + 0.460*"films" + -0.130*"album" + -0.127*"station" + 0.121*"television" + 0.115*"poster" + 0.112*"directed" + 0.110*"actors" + -0.096*"railway" + 0.086*"movie" - 2010-11-03 16:08:27,834 : INFO : topic #9(78.620): 0.502*"kategori" + 0.282*"categoria" + 0.248*"kategorija" + 0.234*"kategorie" + 0.172*"категория" + 0.165*"categoría" + 0.161*"kategoria" + 0.148*"categorie" + 0.126*"kategória" + 0.121*"catégorie" - -In serial mode, creating the LSI model of Wikipedia with this **one-pass algorithm** -takes about 5.25h on my laptop (OS X, C2D 2.53GHz, 4GB RAM with `vecLib`). -In distributed mode with four workers (Linux, dual-core Xeons of 2GHz, 4GB RAM -with `ATLAS`), the wallclock time taken drops to 1 hour and 41 minutes. You can -read more about various internal settings and experiments in my `research -paper `_. 
- diff --git a/docs/_sources/distributed.txt b/docs/_sources/distributed.txt deleted file mode 100644 index 5723937851..0000000000 --- a/docs/_sources/distributed.txt +++ /dev/null @@ -1,92 +0,0 @@ -.. _distributed: - -Distributed Computing -=================================== - -Why distributed computing? ---------------------------- - -Need to build semantic representation of a corpus that is millions of documents large and it's -taking forever? Have several idle machines at your disposal that you could use? -`Distributed computing `_ tries -to accelerate computations by splitting a given task into several smaller subtasks, -passing them on to several computing nodes in parallel. - -In the context of `gensim`, computing nodes are computers identified by their IP address/port, -and communication happens over TCP/IP. The whole collection of available machines is called -a *cluster*. The distribution is very coarse grained (not -much communication going on), so the network is allowed to be of relatively high latency. - -.. warning:: - The primary reason for using distributed computing is making things run faster. In `gensim`, - most of the time consuming stuff is done inside low-level routines for linear algebra, inside - NumPy, independent of any `gensim` code. - **Installing a fast** `BLAS (Basic Linear Algebra) `_ **library - for NumPy can improve performance up to 15 times!** So before you start buying those extra computers, - consider installing a fast, threaded BLAS that is optimized for your particular machine - (as opposed to a generic, binary-distributed library). - Options include your vendor's BLAS library (Intel's MKL, - AMD's ACML, OS X's vecLib, Sun's Sunperf, ...) or some open-source alternative (GotoBLAS, ATLAS). 
- - To see what BLAS and LAPACK you are using, type into your shell:: - - python -c 'import numpy; numpy.show_config()' - -Prerequisites ------------------ - -For communication between nodes, `gensim` uses `Pyro (PYthon Remote Objects) -`_, version >= 4.8. This is a library for low-level socket communication -and remote procedure calls (RPC) in Python. `Pyro` is a pure-Python library, so its -installation is quite painless and only involves copying its `*.py` files somewhere onto your Python's import path:: - - sudo easy_install Pyro4 - -You don't have to install `Pyro` to run `gensim`, but if you don't, you won't be able -to access the distributed features (i.e., everything will always run in serial mode, -the examples on this page don't apply). - - -Core concepts ------------------------------------ - -As always, `gensim` strives for a clear and straightforward API (see :ref:`design`). -To this end, *you do not need to make any changes in your code at all* in order to -run it over a cluster of computers! - -What you need to do is run a :term:`worker` script (see below) on each of your cluster nodes prior -to starting your computation. Running this script tells `gensim` that it may use the node -as a slave to delegate some work to it. During initialization, the algorithms -inside `gensim` will try to look for and enslave all available worker nodes. - -.. glossary:: - - Node - A logical working unit. Can correspond to a single physical machine, but you - can also run multiple workers on one machine, resulting in multiple - logical nodes. - - Cluster - Several nodes which communicate over TCP/IP. Currently, network broadcasting - is used to discover and connect all communicating nodes, so the nodes must lie - within the same `broadcast domain `_. - - Worker - A process which is created on each node. To remove a node from your cluster, - simply kill its worker process. 
- - Dispatcher - The dispatcher will be in charge of negotiating all computations, queueing and - distributing ("dispatching") individual jobs to the workers. Computations never - "talk" to worker nodes directly, only through this dispatcher. Unlike workers, - there can only be one active dispatcher at a time in the cluster. - - -Available distributed algorithms ---------------------------------- - -.. toctree:: - :maxdepth: 1 - - dist_lsi - dist_lda diff --git a/docs/_sources/index.txt b/docs/_sources/index.txt deleted file mode 100644 index f2f7e073de..0000000000 --- a/docs/_sources/index.txt +++ /dev/null @@ -1,63 +0,0 @@ -.. gensim documentation master file, created by - sphinx-quickstart on Tue Mar 16 19:45:41 2010. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. - -Gensim -- Topic Modelling for Humans -===================================================== - - -.. raw:: html - :file: _static/tagcloud.html - - -.. raw:: html - - - - - - - - - - - - - -Quick Reference Example ------------------------- - ->>> from gensim import corpora, models, similarities ->>> ->>> # Load corpus iterator from a Matrix Market file on disk. ->>> corpus = corpora.MmCorpus('/path/to/corpus.mm') ->>> ->>> # Initialize a transformation (Latent Semantic Indexing with 200 latent dimensions). ->>> lsi = models.LsiModel(corpus, num_topics=200) ->>> ->>> # Convert another corpus to the latent space and index it. ->>> index = similarities.MatrixSimilarity(lsi[another_corpus]) ->>> ->>> # determine similarity of a query document against each document in the index ->>> sims = index[query] - - -.. admonition:: What's new? 
- - * 15 Sep 2012: release 0.8.6 : added the `hashing trick `_ to allow online changes to the vocabulary; fixed parallel lemmatization + `other minor improvements `_ - * 22 Jul 2012: release 0.8.5 : better Wikipedia parsing, faster similarity queries, maintenance fixes - * 30 Apr 2012: William Bert's `interview with me `_ - * 9 Mar 2012: release 0.8.4: new model `Hierarchical Dirichlet Process `_ (full `CHANGELOG `_) - - -.. toctree:: - :hidden: - :maxdepth: 1 - - intro - install - tutorial - distributed - wiki - apiref diff --git a/docs/_sources/install.txt b/docs/_sources/install.txt deleted file mode 100644 index e4886b20b8..0000000000 --- a/docs/_sources/install.txt +++ /dev/null @@ -1,118 +0,0 @@ -.. _install: - -============= -Installation -============= - -Quick install --------------- - -Run in your terminal:: - - sudo easy_install -U gensim - -In case that fails, or you don't know what "terminal" means, read on. - ------ - -Dependencies -------------- -Gensim is known to run on Linux, Windows and Mac OS X and should run on any other -platform that supports Python 2.5 and NumPy. Gensim depends on the following software: - -* 3.0 > `Python `_ >= 2.5. Tested with versions 2.5, 2.6 and 2.7. -* `NumPy `_ >= 1.3. Tested with version 1.6.1rc2, 1.5.0rc1, 1.4.0, 1.3.0, 1.3.0rc2. -* `SciPy `_ >= 0.7. Tested with version 0.9.0, 0.8.0, 0.8.0b1, 0.7.1, 0.7.0. - -**Windows users** are well advised to try the `Enthought distribution `_, -which conveniently includes Python&NumPy&SciPy in a single bundle, and is free for academic use. - - -Install Python and `easy_install` ---------------------------------- - -Check what version of Python you have with:: - - python --version - -You can download Python from http://python.org/download. - -.. note:: Gensim requires Python 2.5 or greater and will not run under earlier versions. - -Next, install the `easy_install utility `_, -which will make installing other Python programs easier. 
-
-Install SciPy & NumPy
-----------------------
-
-These are quite popular Python packages, so chances are there are pre-built binary
-distributions available for your platform. You can try installing from source using easy_install::
-
- sudo easy_install numpy
- sudo easy_install scipy
-
-If that doesn't work or if you'd rather install using a binary package, consult
-http://www.scipy.org/Download.
-
-Install `gensim`
------------------
-
-You can now install (or upgrade) `gensim` with::
-
- sudo easy_install --upgrade gensim
-
-That's it! Congratulations, you can proceed to the :doc:`tutorials `.
-
------
-
-If you also want to run the algorithms over a cluster
-of computers, in :doc:`distributed`, you should install with::
-
- sudo easy_install gensim[distributed]
-
-The optional `distributed` feature installs `Pyro (PYthon Remote Objects) `_.
-If you don't know what distributed computing means, you can ignore it:
-`gensim` will work fine for you anyway.
-This optional extension can also be installed separately later with::
-
- sudo easy_install Pyro4
-
------
-
-There are also alternative routes to install:
-
-1. If you have downloaded and unzipped the `tar.gz source `_
-   for `gensim` (or you're installing `gensim` from `github `_),
-   you can run::
-
-   sudo python setup.py install
-
-   to install `gensim` into your ``site-packages`` folder.
-2. If you wish to make local changes to the `gensim` code (`gensim` is, after all, a
-   package which targets research prototyping and modifications), a preferred
-   way may be installing with::
-
-   sudo python setup.py develop
-
-   This will only place a symlink into your ``site-packages`` directory. The actual
-   files will stay wherever you unpacked them.
-3. If you don't have root privileges (or just don't want to put the package into
-   your ``site-packages``), simply unpack the source package somewhere and that's it! No
-   compilation or installation needed.
Just don't forget to set your PYTHONPATH - (or modify ``sys.path``), so that Python can find the unpacked package when importing. - - -Testing `gensim` ----------------- - -To test the package, unzip the `tar.gz source `_ and run:: - - python setup.py test - - -Contact --------- - -Use the `gensim discussion group `_ for -any questions and troubleshooting. For private enquiries, you can also send -me an email to the address at the bottom of this page. diff --git a/docs/_sources/interfaces.txt b/docs/_sources/interfaces.txt deleted file mode 100644 index f0f932c50d..0000000000 --- a/docs/_sources/interfaces.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`interfaces` -- Core gensim interfaces -============================================ - -.. automodule:: gensim.interfaces - :synopsis: Core gensim interfaces - :members: - :inherited-members: - diff --git a/docs/_sources/intro.txt b/docs/_sources/intro.txt deleted file mode 100644 index 2166f21b14..0000000000 --- a/docs/_sources/intro.txt +++ /dev/null @@ -1,139 +0,0 @@ -.. _intro: - -============ -Introduction -============ - -Gensim is a :ref:`free ` Python framework designed to automatically extract semantic -topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible. - - -Gensim aims at processing raw, unstructured digital texts ("*plain text*"). -The algorithms in `gensim`, such as **Latent Semantic Analysis**, **Latent Dirichlet Allocation** or **Random Projections**, -discover semantic structure of documents, by examining word statistical co-occurrence patterns within a corpus of training documents. -These algorithms are unsupervised, which means no human input is necessary -- you only need a corpus of plain text documents. - -Once these statistical patterns are found, any plain text documents can be succinctly -expressed in the new, semantic representation, and queried for topical similarity -against other documents. - -.. 
note:: - If the previous paragraphs left you confused, you can read more about the `Vector - Space Model `_ and `unsupervised - document analysis `_ on Wikipedia. - - -.. _design: - -Features ------------------- - -* **Memory independence** -- there is no need for the whole training corpus to - reside fully in RAM at any one time (can process large, web-scale corpora). -* Efficient implementations for several popular vector space algorithms, - including **Tf-Idf**, distributed incremental **Latent Semantic Analysis**, - distributed incremental **Latent Dirichlet Allocation (LDA)** or **Random Projection**; adding new ones is easy (really!). -* I/O wrappers and converters around **several popular data formats**. -* **Similarity queries** for documents in their semantic representation. - -Creation of `gensim` was motivated by a perceived lack of available, scalable software -frameworks that realize topic modelling, and/or their overwhelming internal complexity (hail java!). -You can read more about the motivation in our `LREC 2010 workshop paper `_. -If you want to cite `gensim` in your own work, please refer to that article (`BibTeX `_). - -You're welcome to share your results on the `mailing list `_, -so others can learn from your success :) - -The **principal design objectives** behind `gensim` are: - -1. Straightforward interfaces and low API learning curve for developers. Good for prototyping. -2. Memory independence with respect to the size of the input corpus; all intermediate - steps and algorithms operate in a streaming fashion, accessing one document - at a time. - -.. seealso:: - - If you're interested in document indexing/similarity retrieval, I also maintain a higher-level package - of `document similarity server `_. It uses gensim internally. - -.. 
_availability:
-
-Availability
-------------
-
-Gensim is licensed under the OSI-approved `GNU LGPL license `_
-and can be downloaded either from its `github repository `_
-or from the `Python Package Index `_.
-
-.. seealso::
-
-  See the :doc:`install ` page for more info on `gensim` deployment.
-
-
-Core concepts
--------------
-
-The whole gensim package revolves around the concepts of :term:`corpus`, :term:`vector` and
-:term:`model`.
-
-.. glossary::
-
-  Corpus
-    A collection of digital documents. This collection is used to automatically
-    infer structure of the documents, their topics etc. For
-    this reason, the collection is also called a *training corpus*. The inferred
-    latent structure can be later used to assign topics to new documents, which did
-    not appear in the training corpus.
-    No human intervention (such as tagging the documents by hand, or creating
-    other metadata) is required.
-
-  Vector
-    In the Vector Space Model (VSM), each document is represented by an
-    array of features. For example, a single feature may be thought of as a
-    question-answer pair:
-
-    1. How many times does the word *splonge* appear in the document? Zero.
-    2. How many paragraphs does the document consist of? Two.
-    3. How many fonts does the document use? Five.
-
-    The question is usually represented only by its integer id (such as `1`, `2` and `3` here),
-    so that the
-    representation of this document becomes a series of pairs like ``(1, 0.0), (2, 2.0), (3, 5.0)``.
-    If we know all the questions in advance, we may leave them implicit
-    and simply write ``(0.0, 2.0, 5.0)``.
-    This sequence of answers can be thought of as a high-dimensional (in this case 3-dimensional)
-    *vector*. For practical purposes, only questions to which the answer is (or
-    can be converted to) a single real number are allowed.
-
-    The questions are the same for each document, so that looking at two
-    vectors (representing two documents), we will hopefully be able to make
-    conclusions such as "The numbers in these two vectors are very similar, and
-    therefore the original documents must be similar, too". Of course, whether
-    such conclusions correspond to reality depends on how well we picked our questions.
-
-  Sparse vector
-    Typically, the answer to most questions will be ``0.0``. To save space,
-    we omit them from the document's representation, and write only ``(2, 2.0),
-    (3, 5.0)`` (note the missing ``(1, 0.0)``).
-    Since the set of all questions is known in advance, all the missing features
-    in a sparse representation of a document can be unambiguously resolved to zero, ``0.0``.
-
-    Gensim is unusual in that it doesn't prescribe any specific corpus format;
-    a corpus is anything that, when iterated over, successively yields these sparse vectors.
-    For example, `[[(2, 2.0), (3, 5.0)], [(0, -1.0), (3, -1.0)]]` is a trivial
-    corpus of two documents, each with two non-zero `feature-answer` pairs.
-
-
-
-  Model
-    For our purposes, a model is a transformation from one document representation
-    to another (or, in other words, from one vector space to another).
-    Both the initial and target representations are
-    still vectors -- they only differ in what the questions and answers are.
-    The transformation is automatically learned from the training :term:`corpus`, without human
-    supervision, and in hopes that the final document representation will be more compact
-    and more useful: with similar documents having similar representations.
-
-.. seealso::
-
-  For some examples on how this works out in code, go to :doc:`tutorials `. diff --git a/docs/_sources/matutils.txt b/docs/_sources/matutils.txt deleted file mode 100644 index 45de7a14fa..0000000000 --- a/docs/_sources/matutils.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`matutils` -- Math utils -============================== - -.. 
automodule:: gensim.matutils - :synopsis: Math utils - :members: - :inherited-members: - diff --git a/docs/_sources/models/hdpmodel.txt b/docs/_sources/models/hdpmodel.txt deleted file mode 100644 index 1975b28fc4..0000000000 --- a/docs/_sources/models/hdpmodel.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.hdpmodel` -- Hierarchical Dirichlet Process -======================================================== - -.. automodule:: gensim.models.hdpmodel - :synopsis: Hierarchical Dirichlet Process - :members: - :inherited-members: - diff --git a/docs/_sources/models/lda_dispatcher.txt b/docs/_sources/models/lda_dispatcher.txt deleted file mode 100644 index f6dffd28e1..0000000000 --- a/docs/_sources/models/lda_dispatcher.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.lda_dispatcher` -- Dispatcher for distributed LDA -================================================================ - -.. automodule:: gensim.models.lda_dispatcher - :synopsis: Dispatcher for distributed LDA - :members: - :inherited-members: - diff --git a/docs/_sources/models/lda_worker.txt b/docs/_sources/models/lda_worker.txt deleted file mode 100644 index 381556eaa6..0000000000 --- a/docs/_sources/models/lda_worker.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.lda_worker` -- Worker for distributed LDA -====================================================== - -.. automodule:: gensim.models.lda_worker - :synopsis: Worker for distributed LDA - :members: - :inherited-members: - diff --git a/docs/_sources/models/ldamodel.txt b/docs/_sources/models/ldamodel.txt deleted file mode 100644 index 2d6ae3adf0..0000000000 --- a/docs/_sources/models/ldamodel.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.ldamodel` -- Latent Dirichlet Allocation -====================================================== - -.. 
automodule:: gensim.models.ldamodel - :synopsis: Latent Dirichlet Allocation - :members: - :inherited-members: - diff --git a/docs/_sources/models/logentropy_model.txt b/docs/_sources/models/logentropy_model.txt deleted file mode 100644 index 1466f6fd1b..0000000000 --- a/docs/_sources/models/logentropy_model.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.logentropy_model` -- LogEntropy model -====================================================== - -.. automodule:: gensim.models.logentropy_model - :synopsis: LogEntropy model - :members: - :inherited-members: - diff --git a/docs/_sources/models/lsi_dispatcher.txt b/docs/_sources/models/lsi_dispatcher.txt deleted file mode 100644 index 59bc80e35c..0000000000 --- a/docs/_sources/models/lsi_dispatcher.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.lsi_dispatcher` -- Dispatcher for distributed LSI -=============================================================== - -.. automodule:: gensim.models.lsi_dispatcher - :synopsis: Dispatcher for distributed LSI - :members: - :inherited-members: - diff --git a/docs/_sources/models/lsi_worker.txt b/docs/_sources/models/lsi_worker.txt deleted file mode 100644 index baf999f105..0000000000 --- a/docs/_sources/models/lsi_worker.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.lsi_worker` -- Worker for distributed LSI -====================================================== - -.. automodule:: gensim.models.lsi_worker - :synopsis: Worker for distributed LSI - :members: - :inherited-members: - diff --git a/docs/_sources/models/lsimodel.txt b/docs/_sources/models/lsimodel.txt deleted file mode 100644 index c0340db439..0000000000 --- a/docs/_sources/models/lsimodel.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.lsimodel` -- Latent Semantic Indexing -====================================================== - -.. 
automodule:: gensim.models.lsimodel - :synopsis: Latent Semantic Indexing - :members: - :inherited-members: - diff --git a/docs/_sources/models/models.txt b/docs/_sources/models/models.txt deleted file mode 100644 index f18032b7ee..0000000000 --- a/docs/_sources/models/models.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models` -- Package for transformation models -====================================================== - -.. automodule:: gensim.models - :synopsis: Package for transformation models - :members: - :inherited-members: - diff --git a/docs/_sources/models/rpmodel.txt b/docs/_sources/models/rpmodel.txt deleted file mode 100644 index 47eba01262..0000000000 --- a/docs/_sources/models/rpmodel.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.rpmodel` -- Random Projections -====================================================== - -.. automodule:: gensim.models.rpmodel - :synopsis: Random Projections - :members: - :inherited-members: - diff --git a/docs/_sources/models/tfidfmodel.txt b/docs/_sources/models/tfidfmodel.txt deleted file mode 100644 index ea714e7268..0000000000 --- a/docs/_sources/models/tfidfmodel.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`models.tfidfmodel` -- TF-IDF model -====================================================== - -.. automodule:: gensim.models.tfidfmodel - :synopsis: TF-IDF model - :members: - :inherited-members: - diff --git a/docs/_sources/similarities/docsim.txt b/docs/_sources/similarities/docsim.txt deleted file mode 100644 index 9d330f23b9..0000000000 --- a/docs/_sources/similarities/docsim.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`similarities.docsim` -- Document similarity queries -======================================================================== - -.. 
automodule:: gensim.similarities.docsim - :synopsis: Document similarity queries - :members: - :inherited-members: - diff --git a/docs/_sources/similarities/simserver.txt b/docs/_sources/similarities/simserver.txt deleted file mode 100644 index 82b3101fc5..0000000000 --- a/docs/_sources/similarities/simserver.txt +++ /dev/null @@ -1,8 +0,0 @@ -:mod:`simserver` -- Document similarity server -====================================================== - -.. automodule:: simserver.simserver - :synopsis: Document similarity server - :members: - :inherited-members: - diff --git a/docs/_sources/simserver.txt b/docs/_sources/simserver.txt deleted file mode 100644 index de0785da57..0000000000 --- a/docs/_sources/simserver.txt +++ /dev/null @@ -1,330 +0,0 @@ -.. _simserver: - -Document Similarity Server -============================= - -The 0.7.x series of `gensim `_ was about improving performance and consolidating API. -0.8.x will be about new features --- 0.8.1, first of the series, is a **document similarity service**. - -The source code itself has been moved from gensim to its own, dedicated package, named `simserver`. -Get it from `PyPI `_ or clone it on `Github `_. - -What is a document similarity service? ---------------------------------------- - -Conceptually, a service that lets you : - -1. train a semantic model from a corpus of plain texts (no manual annotation and mark-up needed) -2. index arbitrary documents using this semantic model -3. 
query the index for similar documents (the query can be either an id of a document already in the index, or an arbitrary text) - - ->>> from simserver import SessionServer ->>> server = SessionServer('/tmp/my_server') # resume server (or create a new one) - ->>> server.train(training_corpus, method='lsi') # create a semantic model ->>> server.index(some_documents) # convert plain text to semantic representation and index it ->>> server.find_similar(query) # convert query to semantic representation and compare against index ->>> ... ->>> server.index(more_documents) # add to index: incremental indexing works ->>> server.find_similar(query) ->>> ... ->>> server.delete(ids_to_delete) # incremental deleting also works ->>> server.find_similar(query) ->>> ... - -.. note:: - "Semantic" here refers to semantics of the crude, statistical type -- - `Latent Semantic Analysis `_, - `Latent Dirichlet Allocation `_ etc. - Nothing to do with the semantic web, manual resource tagging or detailed linguistic inference. - - -What is it good for? ---------------------- - -Digital libraries of (mostly) text documents. More generally, it helps you annotate, -organize and navigate documents in a more abstract way, compared to plain keyword search. - -How is it unique? ------------------ - -1. **Memory independent**. Gensim has unique algorithms for statistical analysis that allow - you to create semantic models of arbitrarily large training corpora (larger than RAM) very quickly - and in constant RAM. -2. **Memory independent (again)**. Indexing shards are stored as files to disk/mmapped back as needed, - so you can index very large corpora. So again, constant RAM, this time independent of the number of indexed documents. -3. **Efficient**. Gensim makes heavy use of Python's NumPy and SciPy libraries to make indexing and - querying efficient. -4. **Robust**. Modifications of the index are transactional, so you can commit/rollback an - entire indexing session. 
-
-   Also, during the session, the service is still available
-   for querying (using its state from when the session started). Power failures leave
-   the service in a consistent state (implicit rollback).
-5. **Pure Python**. Well, technically, NumPy and SciPy are mostly wrapped C and Fortran, but
-   `gensim `_ itself is pure Python. No compiling, installing or root privileges needed.
-6. **Concurrency support**. The underlying service object is thread-safe and can
-   therefore be used as a daemon server: clients connect to it via RPC and issue train/index/query requests remotely.
-7. **Cross-network, cross-platform and cross-language**. While the Python server runs
-   over TCP using `Pyro `_,
-   clients in Java/.NET are trivial thanks to `Pyrolite `_.
-
-The rest of this document serves as a tutorial explaining the features in more detail.
-
------
-
-Prerequisites
----------------------
-
-It is assumed you have `gensim` properly :doc:`installed `. You'll also
-need the `sqlitedict `_ package that wraps
-Python's sqlite3 module in a thread-safe manner::
-
-  $ sudo easy_install -U sqlitedict
-
-To test the remote server capabilities, install Pyro4 (Python Remote Objects, at
-version 4.8 as of this writing)::
-
-  $ sudo easy_install Pyro4
-
-.. note::
-  Don't forget to initialize logging to see logging messages::
-
-  >>> import logging
-  >>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
-
-What is a document?
--------------------
-
-In case of text documents, the service expects::
-
->>> document = {'id': 'some_unique_string',
->>>             'tokens': ['content', 'of', 'the', 'document', '...'],
->>>             'other_fields_are_allowed_but_ignored': None}
-
-This format was chosen because it coincides with plain JSON and is therefore easy to serialize and send over the wire, in almost any language.
-All strings involved must be utf8-encoded.
-
-
-What is a corpus?
------------------
-
-A sequence of documents.
Anything that supports the `for document in corpus: ...`
-iterator protocol. Generators are ok. Plain lists are also ok (but consume more memory).
-
->>> from gensim import utils
->>> texts = ["Human machine interface for lab abc computer applications",
->>>          "A survey of user opinion of computer system response time",
->>>          "The EPS user interface management system",
->>>          "System and human system engineering testing of EPS",
->>>          "Relation of user perceived response time to error measurement",
->>>          "The generation of random binary unordered trees",
->>>          "The intersection graph of paths in trees",
->>>          "Graph minors IV Widths of trees and well quasi ordering",
->>>          "Graph minors A survey"]
->>> corpus = [{'id': 'doc_%i' % num, 'tokens': utils.simple_preprocess(text)}
->>>           for num, text in enumerate(texts)]
-
-Since corpora are allowed to be arbitrarily large, it is
-recommended that the client split them into smaller chunks before uploading them to the server:
-
->>> utils.upload_chunked(server, corpus, chunksize=1000) # send 1k docs at a time
-
-Wait, upload what, where?
--------------------------
-
-If you use the similarity service object (instance of :class:`simserver.SessionServer`) in
-your code directly---no remote access---that's perfectly fine. Using the service remotely, from a different process/machine, is an
-option, not a necessity.
-
-Document similarity can also act as a long-running service, a daemon process on a separate machine. In that
-case, I'll call the service object a *server*.
-
-But let's start with a local object. Open your `favourite shell `_ and::
-
->>> from gensim import utils
->>> from simserver import SessionServer
->>> service = SessionServer('/tmp/my_server/') # or wherever
-
-That initialized a new service, located in `/tmp/my_server` (you need write access rights to that directory).
-
-.. note::
-  The service is fully defined by the content of its location directory ("`/tmp/my_server/`").
-
-  If you use an existing location, the service object will resume
-  from the index found there. Also, to "clone" a service, just copy that
-  directory somewhere else. The copy will be a fully working duplicate of the
-  original service.
-
-
-Model training
---------------
-
-We can start indexing right away:
-
->>> service.index(corpus)
-AttributeError: must initialize model for /tmp/my_server/b before indexing documents
-
-Oops, we cannot. The service indexes documents in a semantic representation, which
-is different from the plain text we give it. We must teach the service how to convert
-between plain text and semantics first::
-
->>> service.train(corpus, method='lsi')
-
-That was easy. The `method='lsi'` parameter meant that we trained a model for
-`Latent Semantic Indexing `_
-with the default dimensionality (400) over a `tf-idf `_
-representation of our little `corpus`, all automatically. More on that later.
-
-Note that for the semantic model to make sense, it should be trained
-on a corpus that is:
-
-* Reasonably similar to the documents you want to index later. Training on a corpus
-  of recipes in French when all indexed documents will be about programming in English
-  will not help.
-* Reasonably large (at least thousands of documents), so that the statistical analysis has
-  a chance to kick in. Don't use my example corpus here of 9 documents in production O_o
-
-Indexing documents
-------------------
-
->>> service.index(corpus) # index the same documents that we trained on...
-
-Indexing can happen over any documents, but I'm too lazy to create another example corpus, so we index the same 9 docs used for training.
- -Delete documents with:: - - >>> service.delete(['doc_5', 'doc_8']) # supply a list of document ids to be removed from the index - -When you pass documents that have the same id as some already indexed document, -the indexed document is overwritten by the new input (=only the latest counts; -document ids are always unique per service):: - - >>> service.index(corpus[:3]) # overall index size unchanged (just 3 docs overwritten) - -The index/delete/overwrite calls can be arbitrarily interspersed with queries. -You don't have to index **all** documents first to start querying, indexing can be incremental. - -Querying ---------- - -There are two types of queries: - -1. by id: - - .. code-block:: python - - >>> print service.find_similar('doc_0') - [('doc_0', 1.0, None), ('doc_2', 0.30426699, None), ('doc_1', 0.25648531, None), ('doc_3', 0.25480536, None)] - - >>> print service.find_similar('doc_5') # we deleted doc_5 and doc_8, remember? - ValueError: document 'doc_5' not in index - - In the resulting 3-tuples, `doc_n` is the document id we supplied during indexing, - `0.30426699` is the similarity of `doc_n` to the query, but what's up with that `None`, you ask? - Well, you can associate each document with a "payload", during indexing. - This payload object (anything pickle-able) is later returned during querying. - If you don't specify `doc['payload']` during indexing, queries simply return `None` in the result tuple, as in our example here. - -2. or by document (using `document['tokens']`; id is ignored in this case): - - .. code-block:: python - - >>> doc = {'tokens': utils.simple_preprocess('Graph and minors and humans and trees.')} - >>> print service.find_similar(doc, min_score=0.4, max_results=50) - [('doc_7', 0.93350589, None), ('doc_3', 0.42718196, None)] - -Remote access -------------- - -So far, we did everything in our Python shell, locally. 
I very much like `Pyro `_,
-a pure Python package for Remote Procedure Calls (RPC), so I'll illustrate remote
-service access via Pyro. Pyro takes care of all the socket listening/request routing/data marshalling/thread
-spawning, so it saves us a lot of trouble.
-
-To create a similarity server, we just create a :class:`simserver.SessionServer` object and register it
-with a Pyro daemon for remote access. There is a small `example script `_
-included with simserver; run it with::
-
-  $ python -m simserver.run_simserver /tmp/testserver
-
-You can just `ctrl+c` to terminate the server, but leave it running for now.
-
-Now open your Python shell again, in another terminal window or possibly on another machine, and::
-
->>> import Pyro4
->>> service = Pyro4.Proxy(Pyro4.locateNS().lookup('gensim.testserver'))
-
-Now `service` is only a proxy object: every call is physically executed wherever
-you ran the `run_simserver` script, which can be a totally different computer
-(within a network broadcast domain), but you don't even know::
-
->>> print service.status()
->>> service.train(corpus)
->>> service.index(other_corpus)
->>> service.find_similar(query)
->>> ...
-
-It is worth mentioning that Irmen, the author of Pyro, also released
-`Pyrolite `_ recently. That is a package
-that lets you create Pyro proxies from Java and .NET as well, in addition to Python.
-That way you can call remote methods from there too---the client doesn't have to be in Python.
-
-Concurrency
------------
-
-Ok, now it's getting interesting. Since we can access the service remotely, what
-happens if multiple clients create proxies to it at the same time? What if they
-want to modify the server index at the same time?
-
-Answer: the `SessionServer` object is thread-safe, so that when each client spawns a request
-thread via Pyro, they don't step on each other's toes.
-
-This means that:
-
-1. 
There can be multiple simultaneous `service.find_similar` queries (or, in
-   general, multiple simultaneous calls that are "read-only").
-2. When two clients issue modification calls (`index`/`train`/`delete`/`drop_index`/...)
-   at the same time, an internal lock serializes them -- the later call has to wait.
-3. While one client is modifying the index, all other clients' queries still see
-   the original index. Only once the modifications are committed do they become
-   "visible".
-
-What do you mean, visible?
---------------------------
-
-The service uses transactions internally. This means that each modification is
-done over a clone of the service. If the modification session fails for whatever
-reason (exception in code; power failure that turns off the server; client unhappy
-with how the session went), it can be rolled back. It also means other clients can
-continue querying the original index during index updates.
-
-The mechanism is hidden from users by default through auto-committing (it was already happening
-in the examples above too), but auto-committing can be turned off explicitly::
-
-  >>> service.set_autosession(False)
-  >>> service.train(corpus)
-  RuntimeError: must open a session before modifying SessionServer
-  >>> service.open_session()
-  >>> service.train(corpus)
-  >>> service.index(corpus)
-  >>> service.delete(doc_ids)
-  >>> ...
-
-None of these changes are visible to other clients, yet. Also, other clients'
-calls to index/train/etc. will block until this session is committed/rolled back---there
-cannot be two open sessions at the same time.
-
-To end a session::
-
-  >>> service.rollback() # discard all changes since open_session()
-
-or::
-
-  >>> service.commit() # make changes public; now other clients can see changes/acquire the modification lock
-
-
-Other stuff
------------
-
-TODO Custom document parsing (in lieu of `utils.simple_preprocess`). Different models (not just `lsi`). Optimizing the index with `service.optimize()`. 
-TODO add some hard numbers; example tutorial for some bigger collection, e.g. for `arxiv.org `_ or wikipedia. - diff --git a/docs/_sources/tut1.txt b/docs/_sources/tut1.txt deleted file mode 100644 index bd7fc87b0e..0000000000 --- a/docs/_sources/tut1.txt +++ /dev/null @@ -1,269 +0,0 @@ -.. _tut1: - -Corpora and Vector Spaces -=================================== - -Don't forget to set - ->>> import logging ->>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - -if you want to see logging events. - - -.. _second example: - -From Strings to Vectors ------------------------- - -This time, let's start from documents represented as strings: - ->>> from gensim import corpora, models, similarities ->>> ->>> documents = ["Human machine interface for lab abc computer applications", ->>> "A survey of user opinion of computer system response time", ->>> "The EPS user interface management system", ->>> "System and human system engineering testing of EPS", ->>> "Relation of user perceived response time to error measurement", ->>> "The generation of random binary unordered trees", ->>> "The intersection graph of paths in trees", ->>> "Graph minors IV Widths of trees and well quasi ordering", ->>> "Graph minors A survey"] - - -This is a tiny corpus of nine documents, each consisting of only a single sentence. 
-
-First, let's tokenize the documents, remove common words (using a toy stoplist)
-as well as words that only appear once in the corpus:
-
->>> # remove common words and tokenize
->>> stoplist = set('for a of the and to in'.split())
->>> texts = [[word for word in document.lower().split() if word not in stoplist]
->>>         for document in documents]
->>>
->>> # remove words that appear only once
->>> all_tokens = sum(texts, [])
->>> tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
->>> texts = [[word for word in text if word not in tokens_once]
->>>         for text in texts]
->>>
->>> print texts
-[['human', 'interface', 'computer'],
- ['survey', 'user', 'computer', 'system', 'response', 'time'],
- ['eps', 'user', 'interface', 'system'],
- ['system', 'human', 'system', 'eps'],
- ['user', 'response', 'time'],
- ['trees'],
- ['graph', 'trees'],
- ['graph', 'minors', 'trees'],
- ['graph', 'minors', 'survey']]
-
-Your way of processing the documents will likely vary; here, I only split on whitespace
-to tokenize, followed by lowercasing each word. In fact, I use this particular
-(simplistic and inefficient) setup to mimic the experiment done in Deerwester et al.'s
-original LSA article [1]_.
-
-The ways to process documents are so varied and application- and language-dependent that I
-decided to *not* constrain them by any interface. Instead, a document is represented
-by the features extracted from it, not by its "surface" string form: how you get to
-the features is up to you. Below I describe one common, general-purpose approach (called
-:dfn:`bag-of-words`), but keep in mind that different application domains call for
-different features, and, as always, it's `garbage in, garbage out `_...
-
-To convert documents to vectors, we'll use a document representation called
-`bag-of-words `_. 
In this representation,
-each document is represented by one vector where each vector element represents
-a question-answer pair, in the style of:
-
- "How many times does the word `system` appear in the document? Once."
-
-It is advantageous to represent the questions only by their (integer) ids. The mapping
-between the questions and ids is called a dictionary:
-
->>> dictionary = corpora.Dictionary(texts)
->>> dictionary.save('/tmp/deerwester.dict') # store the dictionary, for future reference
->>> print dictionary
-Dictionary(12 unique tokens)
-
-Here we assigned a unique integer id to all words appearing in the corpus with the
-:class:`gensim.corpora.dictionary.Dictionary` class. This sweeps across the texts, collecting word counts
-and relevant statistics. In the end, we see there are twelve distinct words in the
-processed corpus, which means each document will be represented by twelve numbers (i.e., by a 12-D vector).
-To see the mapping between words and their ids:
-
->>> print dictionary.token2id
-{'minors': 11, 'graph': 10, 'system': 5, 'trees': 9, 'eps': 8, 'computer': 0,
-'survey': 4, 'user': 7, 'human': 1, 'time': 6, 'interface': 2, 'response': 3}
-
-To actually convert tokenized documents to vectors:
-
->>> new_doc = "Human computer interaction"
->>> new_vec = dictionary.doc2bow(new_doc.lower().split())
->>> print new_vec # the word "interaction" does not appear in the dictionary and is ignored
-[(0, 1), (1, 1)]
-
-The function :func:`doc2bow` simply counts the number of occurrences of
-each distinct word, converts the word to its integer word id
-and returns the result as a sparse vector. The sparse vector ``[(0, 1), (1, 1)]``
-therefore reads: in the document `"Human computer interaction"`, the words `computer`
-(id 0) and `human` (id 1) appear once; the other ten dictionary words appear (implicitly) zero times. 
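For illustration, here is a minimal pure-Python sketch of what :func:`doc2bow` does under the hood. This is a simplification: gensim's actual implementation can also update the dictionary with unseen words, which we skip here.

```python
def doc2bow(tokens, token2id):
    """Toy version of Dictionary.doc2bow: count the tokens that are
    known to the dictionary, map them to their integer ids, and return
    a sparse vector of (token_id, count) 2-tuples."""
    counts = {}
    for token in tokens:
        if token in token2id:  # words missing from the dictionary are silently ignored
            token_id = token2id[token]
            counts[token_id] = counts.get(token_id, 0) + 1
    return sorted(counts.items())

# the first two entries of the dictionary built above
token2id = {'computer': 0, 'human': 1}
print(doc2bow("Human computer interaction".lower().split(), token2id))
# → [(0, 1), (1, 1)]
```

The word `interaction` never makes it into the output, exactly as in the gensim example above.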
- - >>> corpus = [dictionary.doc2bow(text) for text in texts] - >>> corpora.MmCorpus.serialize('/tmp/deerwester.mm', corpus) # store to disk, for later use - >>> print corpus - [(0, 1), (1, 1), (2, 1)] - [(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)] - [(2, 1), (5, 1), (7, 1), (8, 1)] - [(1, 1), (5, 2), (8, 1)] - [(3, 1), (6, 1), (7, 1)] - [(9, 1)] - [(9, 1), (10, 1)] - [(9, 1), (10, 1), (11, 1)] - [(4, 1), (10, 1), (11, 1)] - -By now it should be clear that the vector feature with ``id=10`` stands for the question "How many -times does the word `graph` appear in the document?" and that the answer is "zero" for -the first six documents and "one" for the remaining three. As a matter of fact, -we have arrived at exactly the same corpus of vectors as in the :ref:`first-example`. - -Corpus Streaming -- One Document at a Time -------------------------------------------- - -Note that `corpus` above resides fully in memory, as a plain Python list. -In this simple example, it doesn't matter much, but just to make things clear, -let's assume there are millions of documents in the corpus. Storing all of them in RAM won't do. -Instead, let's assume the documents are stored in a file on disk, one document per line. Gensim -only requires that a corpus must be able to return one document vector at a time:: - ->>> class MyCorpus(object): ->>> def __iter__(self): ->>> for line in open('mycorpus.txt'): ->>> # assume there's one document per line, tokens separated by whitespace ->>> yield dictionary.doc2bow(line.lower().split()) - -Download the sample `mycorpus.txt file here <./mycorpus.txt>`_. The assumption that -each document occupies one line in a single file is not important; you can mold -the `__iter__` function to fit your input format, whatever it is. -Walking directories, parsing XML, accessing network... 
-Just parse your input to retrieve a clean list of tokens in each document, -then convert the tokens via a dictionary to their ids and yield the resulting sparse vector inside `__iter__`. - ->>> corpus_memory_friendly = MyCorpus() # doesn't load the corpus into memory! ->>> print corpus_memory_friendly -<__main__.MyCorpus object at 0x10d5690> - -Corpus is now an object. We didn't define any way to print it, so `print` just outputs address -of the object in memory. Not very useful. To see the constituent vectors, let's -iterate over the corpus and print each document vector (one at a time):: - - >>> for vector in corpus_memory_friendly: # load one vector into memory at a time - >>> print vector - [(0, 1), (1, 1), (2, 1)] - [(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)] - [(2, 1), (5, 1), (7, 1), (8, 1)] - [(1, 1), (5, 2), (8, 1)] - [(3, 1), (6, 1), (7, 1)] - [(9, 1)] - [(9, 1), (10, 1)] - [(9, 1), (10, 1), (11, 1)] - [(4, 1), (10, 1), (11, 1)] - -Although the output is the same as for the plain Python list, the corpus is now much -more memory friendly, because at most one vector resides in RAM at a time. Your -corpus can now be as large as you want. - -Similarly, to construct the dictionary without loading all texts into memory:: - - >>> # collect statistics about all tokens - >>> dictionary = corpora.Dictionary(line.lower().split() for line in open('mycorpus.txt')) - >>> # remove stop words and words that appear only once - >>> stop_ids = [dictionary.token2id[stopword] for stopword in stoplist - >>> if stopword in dictionary.token2id] - >>> once_ids = [tokenid for tokenid, docfreq in dictionary.dfs.iteritems() if docfreq == 1] - >>> dictionary.filter_tokens(stop_ids + once_ids) # remove stop words and words that appear only once - >>> dictionary.compactify() # remove gaps in id sequence after words that were removed - >>> print dictionary - Dictionary(12 unique tokens) - -And that is all there is to it! At least as far as bag-of-words representation is concerned. 
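As an aside, the ``all_tokens.count(word)`` idiom used earlier in this tutorial rescans the entire token list once per unique word. A one-pass sketch of the same stoplist-plus-hapax filtering, using the standard library's ``collections.Counter`` (names here are my own, for illustration):

```python
from collections import Counter

def filter_texts(texts, stoplist):
    """One-pass equivalent of the stopword + once-only filtering above:
    count every token once, then drop stopwords and words seen only once."""
    freq = Counter(token for text in texts for token in text)
    return [[token for token in text
             if token not in stoplist and freq[token] > 1]
            for text in texts]

texts = [['human', 'survey'], ['survey', 'of', 'user'], ['user', 'system']]
print(filter_texts(texts, stoplist={'of'}))
# → [['survey'], ['survey', 'user'], ['user']]
```

For the toy nine-document corpus the difference is negligible, but on large token lists the quadratic `count` version becomes noticeably slow.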
-Of course, what we do with such a corpus is another question; it is not at all clear
-how counting the frequency of distinct words could be useful. As it turns out, it isn't, and
-we will need to apply a transformation on this simple representation first, before
-we can use it to compute any meaningful document vs. document similarities.
-Transformations are covered in the :doc:`next tutorial `, but before that, let's
-briefly turn our attention to *corpus persistency*.
-
-
-.. _corpus-formats:
-
-Corpus Formats
----------------
-
-There exist several file formats for serializing a Vector Space corpus (~sequence of vectors) to disk.
-`Gensim` implements them via the *streaming corpus interface* mentioned earlier:
-documents are read from (resp. stored to) disk in a lazy fashion, one document at
-a time, without the whole corpus being read into main memory at once.
-
-One of the more notable file formats is the `Matrix Market format `_.
-To save a corpus in the Matrix Market format:
-
->>> from gensim import corpora
->>> # create a toy corpus of 2 documents, as a plain Python list
->>> corpus = [[(1, 0.5)], []] # make one document empty, for the heck of it
->>>
->>> corpora.MmCorpus.serialize('/tmp/corpus.mm', corpus)
-
-Other formats include `Joachims' SVMlight format `_,
-`Blei's LDA-C format `_ and
-`GibbsLDA++ format `_. 
- ->>> corpora.SvmLightCorpus.serialize('/tmp/corpus.svmlight', corpus) ->>> corpora.BleiCorpus.serialize('/tmp/corpus.lda-c', corpus) ->>> corpora.LowCorpus.serialize('/tmp/corpus.low', corpus) - - -Conversely, to load a corpus iterator from a Matrix Market file: - ->>> corpus = corpora.MmCorpus('/tmp/corpus.mm') - -Corpus objects are streams, so typically you won't be able to print them directly: - ->>> print corpus -MmCorpus(2 documents, 2 features, 1 non-zero entries) - -Instead, to view the contents of a corpus: - ->>> # one way of printing a corpus: load it entirely into memory ->>> print list(corpus) # calling list() will convert any sequence to a plain Python list -[[(1, 0.5)], []] - -or - ->>> # another way of doing it: print one document at a time, making use of the streaming interface ->>> for doc in corpus: ->>> print doc -[(1, 0.5)] -[] - -The second way is obviously more memory-friendly, but for testing and development -purposes, nothing beats the simplicity of calling ``list(corpus)``. - -To save the same Matrix Market document stream in Blei's LDA-C format, - ->>> corpora.BleiCorpus.serialize('/tmp/corpus.lda-c', corpus) - -In this way, `gensim` can also be used as a memory-efficient **I/O format conversion tool**: -just load a document stream using one format and immediately save it in another format. -Adding new formats is dead easy, check out the `code for the SVMlight corpus -`_ for an example. - -------------- - -For a complete reference (Want to prune the dictionary to a smaller size? -Convert between corpora and NumPy/SciPy arrays?), see the :doc:`API documentation `. -Or continue to the next tutorial on :doc:`tut2`. - - -.. [1] This is the same corpus as used in - `Deerwester et al. (1990): Indexing by Latent Semantic Analysis `_, Table 2. - diff --git a/docs/_sources/tut2.txt b/docs/_sources/tut2.txt deleted file mode 100644 index c130c23a1a..0000000000 --- a/docs/_sources/tut2.txt +++ /dev/null @@ -1,249 +0,0 @@ -.. 
_tut2: - -Topics and Transformations -=========================== - - -Don't forget to set - ->>> import logging ->>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - -if you want to see logging events. - -Transformation interface --------------------------- - -In the previous tutorial on :doc:`tut1`, we created a corpus of documents represented -as a stream of vectors. To continue, let's fire up gensim and use that corpus: - ->>> from gensim import corpora, models, similarities ->>> dictionary = corpora.Dictionary.load('/tmp/deerwester.dict') ->>> corpus = corpora.MmCorpus('/tmp/deerwester.mm') ->>> print corpus -MmCorpus(9 documents, 12 features, 28 non-zero entries) - -In this tutorial, I will show how to transform documents from one vector representation -into another. This process serves two goals: - -1. To bring out hidden structure in the corpus, discover relationships between - words and use them to describe the documents in a new and - (hopefully) more semantic way. -2. To make the document representation more compact. This both improves efficiency - (new representation consumes less resources) and efficacy (marginal data - trends are ignored, noise-reduction). - -Creating a transformation -++++++++++++++++++++++++++ - -The transformations are standard Python objects, typically initialized by means of -a :dfn:`training corpus`: - ->>> tfidf = models.TfidfModel(corpus) # step 1 -- initialize a model - -We used our old corpus from tutorial 1 to initialize (train) the transformation model. Different -transformations may require different initialization parameters; in case of TfIdf, the -"training" consists simply of going through the supplied corpus once and computing document frequencies -of all its features. Training other models, such as Latent Semantic Analysis or Latent Dirichlet -Allocation, is much more involved and, consequently, takes much more time. - -.. 
note:: - - Transformations always convert between two specific vector - spaces. The same vector space (= the same set of feature ids) must be used for training - as well as for subsequent vector transformations. Failure to use the same input - feature space, such as applying a different string preprocessing, using different - feature ids, or using bag-of-words input vectors where TfIdf vectors are expected, will - result in feature mismatch during transformation calls and consequently in either - garbage output and/or runtime exceptions. - - -Transforming vectors -+++++++++++++++++++++ - -From now on, ``tfidf`` is treated as a read-only object that can be used to convert -any vector from the old representation (bag-of-words integer counts) to the new representation -(TfIdf real-valued weights): - ->>> doc_bow = [(0, 1), (1, 1)] ->>> print tfidf[doc_bow] # step 2 -- use the model to transform vectors -[(0, 0.70710678), (1, 0.70710678)] - -Or to apply a transformation to a whole corpus: - ->>> corpus_tfidf = tfidf[corpus] ->>> for doc in corpus_tfidf: ->>> print doc -[(0, 0.57735026918962573), (1, 0.57735026918962573), (2, 0.57735026918962573)] -[(0, 0.44424552527467476), (3, 0.44424552527467476), (4, 0.44424552527467476), (5, 0.32448702061385548), (6, 0.44424552527467476), (7, 0.32448702061385548)] -[(2, 0.5710059809418182), (5, 0.41707573620227772), (7, 0.41707573620227772), (8, 0.5710059809418182)] -[(1, 0.49182558987264147), (5, 0.71848116070837686), (8, 0.49182558987264147)] -[(3, 0.62825804686700459), (6, 0.62825804686700459), (7, 0.45889394536615247)] -[(9, 1.0)] -[(9, 0.70710678118654746), (10, 0.70710678118654746)] -[(9, 0.50804290089167492), (10, 0.50804290089167492), (11, 0.69554641952003704)] -[(4, 0.62825804686700459), (10, 0.45889394536615247), (11, 0.62825804686700459)] - -In this particular case, we are transforming the same corpus that we used -for training, but this is only incidental. 
Once the transformation model has been initialized,
-it can be used on any vectors (provided they come from the same vector space, of course),
-even if they were not used in the training corpus at all. This is achieved by a process called
-folding-in for LSA, by topic inference for LDA etc.
-
-.. note::
-    Calling ``model[corpus]`` only creates a wrapper around the old ``corpus``
-    document stream -- actual conversions are done on-the-fly, during document iteration.
-    We cannot convert the entire corpus at the time of calling ``corpus_transformed = model[corpus]``,
-    because that would mean storing the result in main memory, and that contradicts gensim's objective of memory-independence.
-    If you will be iterating over the transformed ``corpus_transformed`` multiple times, and the
-    transformation is costly, :ref:`serialize the resulting corpus to disk first ` and continue
-    using that.
-
-Transformations can also be serialized, one on top of another, in a sort of chain:
-
->>> lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2) # initialize an LSI transformation
->>> corpus_lsi = lsi[corpus_tfidf] # create a double wrapper over the original corpus: bow->tfidf->fold-in-lsi
-
-Here we transformed our Tf-Idf corpus via `Latent Semantic Indexing `_
-into a latent 2-D space (2-D because we set ``num_topics=2``). Now you're probably wondering: what do these two latent
-dimensions stand for? 
Let's inspect with :func:`models.LsiModel.print_topics`: - - >>> lsi.print_topics(2) - topic #0(1.594): -0.703*"trees" + -0.538*"graph" + -0.402*"minors" + -0.187*"survey" + -0.061*"system" + -0.060*"response" + -0.060*"time" + -0.058*"user" + -0.049*"computer" + -0.035*"interface" - topic #1(1.476): -0.460*"system" + -0.373*"user" + -0.332*"eps" + -0.328*"interface" + -0.320*"response" + -0.320*"time" + -0.293*"computer" + -0.280*"human" + -0.171*"survey" + 0.161*"trees" - -(the topics are printed to log -- see the note at the top of this page about activating -logging) - -It appears that according to LSI, "trees", "graph" and "minors" are all related -words (and contribute the most to the direction of the first topic), while the -second topic practically concerns itself with all the other words. As expected, -the first five documents are more strongly related to the second topic while the -remaining four documents to the first topic: - ->>> for doc in corpus_lsi: # both bow->tfidf and tfidf->lsi transformations are actually executed here, on the fly ->>> print doc -[(0, -0.066), (1, 0.520)] # "Human machine interface for lab abc computer applications" -[(0, -0.197), (1, 0.761)] # "A survey of user opinion of computer system response time" -[(0, -0.090), (1, 0.724)] # "The EPS user interface management system" -[(0, -0.076), (1, 0.632)] # "System and human system engineering testing of EPS" -[(0, -0.102), (1, 0.574)] # "Relation of user perceived response time to error measurement" -[(0, -0.703), (1, -0.161)] # "The generation of random binary unordered trees" -[(0, -0.877), (1, -0.168)] # "The intersection graph of paths in trees" -[(0, -0.910), (1, -0.141)] # "Graph minors IV Widths of trees and well quasi ordering" -[(0, -0.617), (1, 0.054)] # "Graph minors A survey" - - -Model persistency is achieved with the :func:`save` and :func:`load` functions: - ->>> lsi.save('/tmp/model.lsi') # same for tfidf, lda, ... 
->>> lsi = models.LsiModel.load('/tmp/model.lsi')
-
-
-The next question might be: just how exactly similar are those documents to each other?
-Is there a way to formalize the similarity, so that for a given input document, we can
-order some other set of documents according to their similarity? Similarity queries
-are covered in the :doc:`next tutorial `.
-
-.. _transformations:
-
-Available transformations
--------------------------
-
-Gensim implements several popular Vector Space Model algorithms:
-
-* `Term Frequency * Inverse Document Frequency, Tf-Idf `_
-  expects a bag-of-words (integer values) training corpus during initialization.
-  During transformation, it will take a vector and return another vector of the
-  same dimensionality, except that features which were rare in the training corpus
-  will have their value increased.
-  It therefore converts integer-valued vectors into real-valued ones, while leaving
-  the number of dimensions intact. It can also optionally normalize the resulting
-  vectors to (Euclidean) unit length.
-
-  >>> model = tfidfmodel.TfidfModel(bow_corpus, normalize=True)
-
-* `Latent Semantic Indexing, LSI (or sometimes LSA) `_
-  transforms documents from either bag-of-words or (preferably) TfIdf-weighted space into
-  a latent space of a lower dimensionality. For the toy corpus above we used only
-  2 latent dimensions, but on real corpora, target dimensionality of 200--500 is recommended
-  as a "gold standard" [1]_.
-
-  >>> model = lsimodel.LsiModel(tfidf_corpus, id2word=dictionary, num_topics=300)
-
-  LSI training is unique in that we can continue "training" at any point, simply
-  by providing more training documents. This is done by incremental updates to
-  the underlying model, in a process called `online training`. Because of this feature, the
-  input document stream may even be infinite -- just keep feeding LSI new documents
-  as they arrive, while using the computed transformation model as read-only in the meanwhile! 
- - >>> model.add_documents(another_tfidf_corpus) # now LSI has been trained on tfidf_corpus + another_tfidf_corpus - >>> lsi_vec = model[tfidf_vec] # convert some new document into the LSI space, without affecting the model - >>> ... - >>> model.add_documents(more_documents) # tfidf_corpus + another_tfidf_corpus + more_documents - >>> lsi_vec = model[tfidf_vec] - >>> ... - - See the :mod:`gensim.models.lsimodel` documentation for details on how to make - LSI gradually "forget" old observations in infinite streams. If you want to get dirty, - there are also parameters you can tweak that affect speed vs. memory footprint vs. numerical - precision of the LSI algorithm. - - `gensim` uses a novel online incremental streamed distributed training algorithm (quite a mouthful!), - which I published in [5]_. `gensim` also executes a stochastic multi-pass algorithm - from Halko et al. [4]_ internally, to accelerate in-core part - of the computations. - See also :doc:`wiki` for further speed-ups by distributing the computation across - a cluster of computers. - -* `Random Projections, RP `_ aim to - reduce vector space dimensionality. This is a very efficient (both memory- and - CPU-friendly) approach to approximating TfIdf distances between documents, by throwing in a little randomness. - Recommended target dimensionality is again in the hundreds/thousands, depending on your dataset. - - >>> model = rpmodel.RpModel(tfidf_corpus, num_topics=500) - -* `Latent Dirichlet Allocation, LDA `_ - is yet another transformation from bag-of-words counts into a topic space of lower - dimensionality. LDA is a probabilistic extension of LSA (also called multinomial PCA), - so LDA's topics can be interpreted as probability distributions over words. These distributions are, - just like with LSA, inferred automatically from a training corpus. Documents - are in turn interpreted as a (soft) mixture of these topics (again, just like with LSA). 
- - >>> model = ldamodel.LdaModel(bow_corpus, id2word=dictionary, num_topics=100) - - `gensim` uses a fast implementation of online LDA parameter estimation based on [2]_, - modified to run in :doc:`distributed mode ` on a cluster of computers. - -* `Hierarchical Dirichlet Process, HDP `_ - is a non-parametric bayesian method (note the missing number of requested topics): - - >>> model = hdpmodel.HdpModel(bow_corpus, id2word=dictionary) - - `gensim` uses a fast, online implementation based on [3]_. - The HDP model is a new addition to `gensim`, and still rough around its academic edges -- use with care. - -Adding new :abbr:`VSM (Vector Space Model)` transformations (such as different weighting schemes) is rather trivial; -see the :doc:`API reference ` or directly the `Python code `_ -for more info and examples. - -It is worth repeating that these are all unique, **incremental** implementations, -which do not require the whole training corpus to be present in main memory all at once. -With memory taken care of, I am now improving :doc:`distributed`, -to improve CPU efficiency, too. -If you feel you could contribute (by testing, providing use-cases or code), -please `let me know `_. - -Continue on to the next tutorial on :doc:`tut3`. - ------- - -.. [1] Bradford. 2008. An empirical study of required dimensionality for large-scale latent semantic indexing applications. - -.. [2] Hoffman, Blei, Bach. 2010. Online learning for Latent Dirichlet Allocation. - -.. [3] Wang, Paisley, Blei. 2011. Online variational inference for the hierarchical Dirichlet process. - -.. [4] Halko, Martinsson, Tropp. 2009. Finding structure with randomness. - -.. [5] Řehůřek. 2011. Subspace tracking for Latent Semantic Analysis. diff --git a/docs/_sources/tut3.txt b/docs/_sources/tut3.txt deleted file mode 100644 index 2a0eca97ed..0000000000 --- a/docs/_sources/tut3.txt +++ /dev/null @@ -1,147 +0,0 @@ -.. 
_tut3: - -Similarity Queries -=========================== - - -Don't forget to set - ->>> import logging ->>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - -if you want to see logging events. - -Similarity interface --------------------------- - -In the previous tutorials on :doc:`tut1` and :doc:`tut2`, we covered what it means -to create a corpus in the Vector Space Model and how to transform it between different -vector spaces. A common reason for such a charade is that we want to determine -**similarity between pairs of documents**, or the **similarity between a specific document -and a set of other documents** (such as a user query vs. indexed documents). - -To show how this can be done in gensim, let us consider the same corpus as in the -previous examples (which really originally comes from Deerwester et al.'s -`"Indexing by Latent Semantic Analysis" `_ -seminal 1990 article): - ->>> from gensim import corpora, models, similarities ->>> dictionary = corpora.Dictionary.load('/tmp/deerwester.dict') ->>> corpus = corpora.MmCorpus('/tmp/deerwester.mm') # comes from the first tutorial, "From strings to vectors" ->>> print corpus -MmCorpus(9 documents, 12 features, 28 non-zero entries) - -To follow Deerwester's example, we first use this tiny corpus to define a 2-dimensional -LSI space: - ->>> lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2) - -Now suppose a user typed in the query `"Human computer interaction"`. We would -like to sort our nine corpus documents in decreasing order of relevance to this query. -Unlike modern search engines, here we only concentrate on a single aspect of possible -similarities---on apparent semantic relatedness of their texts (words). 
No hyperlinks, -no random-walk static ranks, just a semantic extension over the boolean keyword match: - ->>> doc = "Human computer interaction" ->>> vec_bow = dictionary.doc2bow(doc.lower().split()) ->>> vec_lsi = lsi[vec_bow] # convert the query to LSI space ->>> print vec_lsi -[(0, -0.461821), (1, 0.070028)] - -In addition, we will be considering `cosine similarity `_ -to determine the similarity of two vectors. Cosine similarity is a standard measure -in Vector Space Modeling, but wherever the vectors represent probability distributions, -`different similarity measures `_ -may be more appropriate. - -Initializing query structures -++++++++++++++++++++++++++++++++ - -To prepare for similarity queries, we need to enter all documents which we want -to compare against subsequent queries. In our case, they are the same nine documents -used for training LSI, converted to 2-D LSA space. But that's only incidental, we -might also be indexing a different corpus altogether. - ->>> index = similarities.MatrixSimilarity(lsi[corpus]) # transform corpus to LSI space and index it - -.. warning:: - The class :class:`similarities.MatrixSimilarity` is only appropriate when the whole - set of vectors fits into memory. For example, a corpus of one million documents - would require 2GB of RAM in a 256-dimensional LSI space, when used with this class. - Without 2GB of free RAM, you would need to use the :class:`similarities.Similarity` class. - This class operates in fixed memory, by splitting the index across multiple files on disk. - It uses :class:`similarities.MatrixSimilarity` and :class:`similarities.SparseMatrixSimilarity` internally, - so it is still fast, although slightly more complex. 
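The idea behind that splitting can be sketched in a few lines. This is only a toy illustration of the shard-at-a-time principle, not the actual :class:`similarities.Similarity` code (which memory-maps shards from disk and uses matrix algebra):

```python
def chunked(index_vectors, shard_size):
    """Split the indexed vectors into fixed-size shards."""
    for start in range(0, len(index_vectors), shard_size):
        yield index_vectors[start:start + shard_size]

def query_sharded(index_vectors, query, shard_size=2):
    """Score `query` against every indexed vector, touching only one
    shard at a time (plain dot product as a stand-in similarity)."""
    scores = []
    for shard in chunked(index_vectors, shard_size):
        scores.extend(sum(q * x for q, x in zip(query, vec)) for vec in shard)
    return scores

index = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(query_sharded(index, [1.0, 0.0]))
# → [1.0, 0.0, 0.5]
```

Because each shard is processed independently, peak memory is bounded by the shard size rather than the index size.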
- -Index persistency is handled via the standard :func:`save` and :func:`load` functions: - ->>> index.save('/tmp/deerwester.index') ->>> index = similarities.MatrixSimilarity.load('/tmp/deerwester.index') - -This is true for all similarity indexing classes (:class:`similarities.Similarity`, -:class:`similarities.MatrixSimilarity` and :class:`similarities.SparseMatrixSimilarity`). -Also in the following, `index` can be an object of any of these. When in doubt, -use :class:`similarities.Similarity`, as it is the most scalable version, and it also -supports adding more documents to the index later. - -Performing queries -+++++++++++++++++++++ - -To obtain similarities of our query document against the nine indexed documents: - ->>> sims = index[vec_lsi] # perform a similarity query against the corpus ->>> print list(enumerate(sims)) # print (document_number, document_similarity) 2-tuples -[(0, 0.99809301), (1, 0.93748635), (2, 0.99844527), (3, 0.9865886), (4, 0.90755945), -(5, -0.12416792), (6, -0.1063926), (7, -0.098794639), (8, 0.05004178)] - -Cosine measure returns similarities in the range `<-1, 1>` (the greater, the more similar), -so that the first document has a score of 0.99809301 etc. 
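The cosine measure itself is simple to state; here is a self-contained sketch of what gets computed per indexed document (gensim performs this with optimized matrix algebra rather than a Python loop):

```python
from math import sqrt

def cossim(vec1, vec2):
    """Cosine similarity of two dense vectors: the dot product divided
    by the product of their Euclidean lengths; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = sqrt(sum(a * a for a in vec1))
    norm2 = sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

print(cossim([1.0, 2.0], [1.0, 2.0]))   # same direction: ~1.0
print(cossim([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cossim([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: -1.0
```

Note that LSI vectors can have negative coordinates, which is why the similarities above can dip below zero.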
- -With some standard Python magic we sort these similarities into descending -order, and obtain the final answer to the query `"Human computer interaction"`: - ->>> sims = sorted(enumerate(sims), key=lambda item: -item[1]) ->>> print sims # print sorted (document number, similarity score) 2-tuples -[(2, 0.99844527), # The EPS user interface management system -(0, 0.99809301), # Human machine interface for lab abc computer applications -(3, 0.9865886), # System and human system engineering testing of EPS -(1, 0.93748635), # A survey of user opinion of computer system response time -(4, 0.90755945), # Relation of user perceived response time to error measurement -(8, 0.050041795), # Graph minors A survey -(7, -0.098794639), # Graph minors IV Widths of trees and well quasi ordering -(6, -0.1063926), # The intersection graph of paths in trees -(5, -0.12416792)] # The generation of random binary unordered trees - -(I added the original documents in their "string form" to the output comments, to -improve clarity.) - -The thing to note here is that documents no. 2 (``"The EPS user interface management system"``) -and 4 (``"Relation of user perceived response time to error measurement"``) would never be returned by -a standard boolean fulltext search, because they do not share any common words with ``"Human -computer interaction"``. However, after applying LSI, we can observe that both of -them received quite high similarity scores (no. 2 is actually the most similar!), -which corresponds better to our intuition of -them sharing a "computer-human" related topic with the query. In fact, this semantic -generalization is the reason why we apply transformations and do topic modelling -in the first place. - - -Where next? 
------------- - -Congratulations, you have finished the tutorials -- now you know how gensim works :-) -To delve into more details, you can browse through the :doc:`API documentation `, -see the :doc:`Wikipedia experiments ` or perhaps check out :doc:`distributed computing ` in `gensim`. - -Please remember that gensim is an experimental package, aimed at the NLP research community. -This means that: - -* there certainly are parts that could be implemented more efficiently (in C, for example), and there may also be bugs in the code -* your **feedback is most welcome** and appreciated, be it in code and - `idea contributions `_, - `bug reports `_ or just - `user stories and general questions `_. - -Gensim has no ambition to become an all-encompassing production level tool, with robust failure handling -and error recoveries. Its main goal is to help NLP newcomers try out popular algorithms -and to facilitate prototyping of new algorithms for NLP researchers. diff --git a/docs/_sources/tutorial.txt b/docs/_sources/tutorial.txt deleted file mode 100644 index a25f54f80b..0000000000 --- a/docs/_sources/tutorial.txt +++ /dev/null @@ -1,115 +0,0 @@ -.. _tutorial: - -Tutorials -========= - - -The tutorials are organized as a series of examples that highlight various features -of `gensim`. It is assumed that the reader is familiar with the Python language -and has read the :doc:`intro`. - -The examples are divided into parts on: - -.. toctree:: - :maxdepth: 2 - - tut1 - tut2 - tut3 - wiki - distributed - -Preliminaries --------------- - -All the examples can be directly copied to your Python interpreter shell (assuming -you have :doc:`gensim installed `, of course). -`IPython `_'s ``cpaste`` command is especially handy for copypasting code fragments which include superfluous -characters, such as the leading ``>>>``. 
- -Gensim uses Python's standard :mod:`logging` module to log various stuff at various -priority levels; to activate logging (this is optional), run - ->>> import logging ->>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - - -.. _first-example: - -Quick Example -------------- - -First, let's import gensim and create a small corpus of nine documents [1]_: - ->>> from gensim import corpora, models, similarities ->>> ->>> corpus = [[(0, 1.0), (1, 1.0), (2, 1.0)], ->>> [(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0), (6, 1.0), (8, 1.0)], ->>> [(1, 1.0), (3, 1.0), (4, 1.0), (7, 1.0)], ->>> [(0, 1.0), (4, 2.0), (7, 1.0)], ->>> [(3, 1.0), (5, 1.0), (6, 1.0)], ->>> [(9, 1.0)], ->>> [(9, 1.0), (10, 1.0)], ->>> [(9, 1.0), (10, 1.0), (11, 1.0)], ->>> [(8, 1.0), (10, 1.0), (11, 1.0)]] - -:dfn:`Corpus` is simply an object which, when iterated over, returns its documents represented -as sparse vectors. - -If you're familiar with the `Vector Space Model `_, -you'll probably know that the way you parse your documents and convert them to vectors -has major impact on the quality of any subsequent applications. If you're not familiar -with :abbr:`VSM (Vector Space Model)`, we'll bridge the gap between **raw strings** -and **sparse vectors** in the next tutorial -on :doc:`tut1`. - -.. note:: - In this example, the whole corpus is stored in memory, as a Python list. However, - the corpus interface only dictates that a corpus must support iteration over its - constituent documents. For very large corpora, it is advantageous to keep the - corpus on disk, and access its documents sequentially, one at a time. All the - operations and transformations are implemented in such a way that makes - them independent of the size of the corpus, memory-wise. 
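The memory-independence point in the note above can be sketched with a streamed corpus. This is an illustrative `MyCorpus` class (a hypothetical name, not part of gensim): any object whose `__iter__` yields one sparse document vector at a time satisfies the corpus interface, so only a single document ever needs to sit in RAM:

```python
class MyCorpus(object):
    """A streamed corpus: yields one sparse document vector at a time."""

    def __init__(self, source):
        # `source` can be any re-iterable provider of documents; in practice
        # it would typically read from a file on disk, one document per line.
        self.source = source

    def __iter__(self):
        for doc in self.source:
            # each document is a list of (feature_id, weight) 2-tuples
            yield doc


corpus = MyCorpus([[(0, 1.0), (1, 1.0)], [(1, 1.0), (2, 1.0)]])
for vector in corpus:  # iterable any number of times, one document in memory at a time
    print(vector)
```

All gensim transformations and similarity indexes accept such a streamed corpus in place of an in-memory list.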
- - -Next, let's initialize a :dfn:`transformation`: - ->>> tfidf = models.TfidfModel(corpus) - -A transformation is used to convert documents from one vector representation into another: - ->>> vec = [(0, 1), (4, 1)] ->>> print tfidf[vec] -[(0, 0.8075244), (4, 0.5898342)] - -Here, we used `Tf-Idf `_, a simple -transformation which takes documents represented as bag-of-words counts and applies -a weighting which discounts common terms (or, equivalently, promotes rare terms). -It also scales the resulting vector to unit length (in the `Euclidean norm `_). - -Transformations are covered in detail in the tutorial on :doc:`tut2`. - -To transform the whole corpus via TfIdf and index it, in preparation for similarity queries: - ->>> index = similarities.SparseMatrixSimilarity(tfidf[corpus]) - -and to query the similarity of our query vector ``vec`` against every document in the corpus: - ->>> sims = index[tfidf[vec]] ->>> print list(enumerate(sims)) -[(0, 0.4662244), (1, 0.19139354), (2, 0.24600551), (3, 0.82094586), (4, 0.0), (5, 0.0), (6, 0.0), (7, 0.0), (8, 0.0)] - -How to read this output? Document number zero (the first document) has a similarity score of 0.466=46.6\%, -the second document has a similarity score of 19.1\% etc. - -Thus, according to TfIdf document representation and cosine similarity measure, -the most similar to our query document `vec` is document no. 3, with a similarity score of 82.1%. -Note that in the TfIdf representation, any documents which do not share any common features -with ``vec`` at all (documents no. 4--8) get a similarity score of 0.0. See the :doc:`tut3` tutorial for more detail. - ------- - -.. [1] This is the same corpus as used in - `Deerwester et al. (1990): Indexing by Latent Semantic Analysis `_, Table 2. 
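The zero scores for documents no. 4--8 follow directly from the cosine measure: two sparse vectors with no features in common have a zero dot product. A minimal pure-Python sketch of cosine similarity over (id, weight) vectors (not gensim's actual, matrix-based implementation, which additionally works on the Tf-Idf-weighted vectors):

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two sparse vectors, each a list of (id, weight) 2-tuples."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())  # only shared ids contribute
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0


print(cosine_sim([(0, 1.0), (4, 1.0)], [(4, 2.0), (7, 1.0)]))  # shared feature 4 => positive score
print(cosine_sim([(0, 1.0), (4, 1.0)], [(9, 1.0)]))            # no shared features => 0.0
```

Because the Tf-Idf transformation already scales vectors to unit length, the cosine of two transformed vectors reduces to their plain dot product.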
- - diff --git a/docs/_sources/utils.txt b/docs/_sources/utils.txt deleted file mode 100644 index ffc98e25f8..0000000000 --- a/docs/_sources/utils.txt +++ /dev/null @@ -1,7 +0,0 @@ -:mod:`utils` -- Various utility functions -========================================== - -.. automodule:: gensim.utils - :synopsis: Various utility functions - :members: - :inherited-members: diff --git a/docs/_sources/wiki.txt b/docs/_sources/wiki.txt deleted file mode 100644 index 3bb6aa097f..0000000000 --- a/docs/_sources/wiki.txt +++ /dev/null @@ -1,206 +0,0 @@ -.. _wiki: - -Experiments on the English Wikipedia -============================================ - -To test `gensim` performance, we run it against the English version of Wikipedia. - -This page describes the process of obtaining and processing Wikipedia, so that -anyone can reproduce the results. It is assumed you have `gensim` properly :doc:`installed `. - - - -Preparing the corpus ----------------------- - -1. First, download the dump of all Wikipedia articles from http://download.wikimedia.org/enwiki/ - (you want a file like `enwiki-latest-pages-articles.xml.bz2`). This file is about 8GB in size - and contains (a compressed version of) all articles from the English Wikipedia. - -2. Convert the articles to plain text (process Wiki markup) and store the result as - sparse TF-IDF vectors. In Python, this is easy to do on-the-fly and we don't - even need to uncompress the whole archive to disk. There is a script included in - `gensim` that does just that, run:: - - $ python -m gensim.scripts.make_wiki - -.. note:: - This pre-processing step makes two passes over the 8.2GB compressed wiki dump (one to extract - the dictionary, one to create and store the sparse vectors) and takes about - 9 hours on my laptop, so you may want to go have a coffee or two. - - Also, you will need about 35GB of free disk space to store the sparse output vectors. - I recommend compressing these files immediately, e.g. with bzip2 (down to ~13GB). 
Gensim - can work with compressed files directly, so this lets you save disk space. - -Latent Semantic Analysis - -------------------------- - -First, let's load the corpus iterator and dictionary, created in the second step above:: - - >>> import logging, gensim, bz2 - >>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - - >>> # load id->word mapping (the dictionary), one of the results of step 2 above - >>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt') - >>> # load corpus iterator - >>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm') - >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output (recommended) - - >>> print mm - MmCorpus(3931787 documents, 100000 features, 756379027 non-zero entries) - -We see that our corpus contains 3.9M documents, 100K features (distinct -tokens) and 0.76G non-zero entries in the sparse TF-IDF matrix. The Wikipedia corpus -contains about 2.24 billion tokens in total.

- -Now we're ready to compute LSA of the English Wikipedia:: - - >>> # extract 400 LSI topics; use the default one-pass algorithm - >>> lsi = gensim.models.lsimodel.LsiModel(corpus=mm, id2word=id2word, num_topics=400) - - >>> # print the most contributing words (both positively and negatively) for each of the first ten topics - >>> lsi.print_topics(10) - topic #0(332.762): 0.425*"utc" + 0.299*"talk" + 0.293*"page" + 0.226*"article" + 0.224*"delete" + 0.216*"discussion" + 0.205*"deletion" + 0.198*"should" + 0.146*"debate" + 0.132*"be" - topic #1(201.852): 0.282*"link" + 0.209*"he" + 0.145*"com" + 0.139*"his" + -0.137*"page" + -0.118*"delete" + 0.114*"blacklist" + -0.108*"deletion" + -0.105*"discussion" + 0.100*"diff" - topic #2(191.991): -0.565*"link" + -0.241*"com" + -0.238*"blacklist" + -0.202*"diff" + -0.193*"additions" + -0.182*"users" + -0.158*"coibot" + -0.136*"user" + 0.133*"he" + -0.130*"resolves" - topic #3(141.284): -0.476*"image" + -0.255*"copyright" + -0.245*"fair" + -0.225*"use" + -0.173*"album" + -0.163*"cover" + -0.155*"resolution" + -0.141*"licensing" + 0.137*"he" + -0.121*"copies" - topic #4(130.909): 0.264*"population" + 0.246*"age" + 0.243*"median" + 0.213*"income" + 0.195*"census" + -0.189*"he" + 0.184*"households" + 0.175*"were" + 0.167*"females" + 0.166*"males" - topic #5(120.397): 0.304*"diff" + 0.278*"utc" + 0.213*"you" + -0.171*"additions" + 0.165*"talk" + -0.159*"image" + 0.159*"undo" + 0.155*"www" + -0.152*"page" + 0.148*"contribs" - topic #6(115.414): -0.362*"diff" + -0.203*"www" + 0.197*"you" + -0.180*"undo" + -0.180*"kategori" + 0.164*"users" + 0.157*"additions" + -0.150*"contribs" + -0.139*"he" + -0.136*"image" - topic #7(111.440): 0.429*"kategori" + 0.276*"categoria" + 0.251*"category" + 0.207*"kategorija" + 0.198*"kategorie" + -0.188*"diff" + 0.163*"категория" + 0.153*"categoría" + 0.139*"kategoria" + 0.133*"categorie" - topic #8(109.907): 0.385*"album" + 0.224*"song" + 0.209*"chart" + 0.204*"band" + 0.169*"released" + 0.151*"music" 
+ 0.142*"diff" + 0.141*"vocals" + 0.138*"she" + 0.132*"guitar" - topic #9(102.599): -0.237*"league" + -0.214*"he" + -0.180*"season" + -0.174*"football" + -0.166*"team" + 0.159*"station" + -0.137*"played" + -0.131*"cup" + 0.131*"she" + -0.128*"utc" - -Creating the LSI model of Wikipedia takes about 4 hours and 9 minutes on my laptop [1]_. -That's about **16,000 documents per minute, including all I/O**. - -.. note:: - If you need your results even faster, see the tutorial on :doc:`distributed`. Note - that the BLAS libraries inside `gensim` make use of multiple cores transparently, so the same data - will be processed faster on a multicore machine "for free", without any distributed setup. - -We see that the total processing time is dominated by the preprocessing step of -preparing the TF-IDF corpus from a raw Wikipedia XML dump, which took 9h. [2]_ - -The algorithm used in `gensim` only needs to see each input document once, so it -is suitable for environments where the documents come as a non-repeatable stream, -or where the cost of storing/iterating over the corpus multiple times is too high. - - -Latent Dirichlet Allocation ----------------------------- - -As with Latent Semantic Analysis above, first load the corpus iterator and dictionary:: - - >>> import logging, gensim, bz2 - >>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) - - >>> # load id->word mapping (the dictionary), one of the results of step 2 above - >>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt') - >>> # load corpus iterator - >>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm') - >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output - - >>> print mm - MmCorpus(3931787 documents, 100000 features, 756379027 non-zero entries) - -We will run online LDA (see Hoffman et al. 
[3]_), which is an algorithm that takes a chunk of documents, -updates the LDA model, takes another chunk, updates the model etc. Online LDA can be contrasted -with batch LDA, which processes the whole corpus (one full pass), then updates -the model, then another pass, another update... The difference is that given a -reasonably stationary document stream (not much topic drift), the online updates -over the smaller chunks (subcorpora) are pretty good in themselves, so that the -model estimation converges faster. As a result, we will perhaps only need a single full -pass over the corpus: if the corpus has 3 million articles, and we update once after -every 10,000 articles, this means we will have done 300 updates in one pass, quite likely -enough to have a very accurate topics estimate:: - - >>> # extract 100 LDA topics, using 1 pass and updating once every 1 chunk (10,000 documents) - >>> lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1) - using serial LDA version on this node - running online LDA training, 100 topics, 1 passes over the supplied corpus of 3931787 documents, updating model once every 10000 documents - ... 
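The chunked-update arithmetic above (3 million articles, one update per 10,000 documents, hence 300 updates in a single pass) can be sketched generically. This is an illustrative skeleton of the mini-batch scheme only; `update_model` is a hypothetical stand-in for the actual variational LDA update:

```python
def chunked_updates(doc_stream, chunksize, update_model):
    """Feed documents to `update_model` in chunks, as online LDA does.

    Returns the number of model updates performed over one pass."""
    chunk, n_updates = [], 0
    for doc in doc_stream:
        chunk.append(doc)
        if len(chunk) == chunksize:
            update_model(chunk)  # one online update per full chunk
            n_updates += 1
            chunk = []
    if chunk:  # a final, partial chunk still triggers one update
        update_model(chunk)
        n_updates += 1
    return n_updates


# 3,000,000 docs in chunks of 10,000 => 300 updates in a single pass
print(chunked_updates(iter(range(3_000_000)), 10_000, lambda chunk: None))
```

With `update_every=1` and `chunksize=10000`, gensim's `LdaModel` follows this schedule, which is why one pass over Wikipedia already yields a well-converged model.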
- -Unlike LSA, the topics coming from LDA are easier to interpret:: - - >>> # print the most contributing words for 20 randomly selected topics - >>> lda.print_topics(20) - topic #0: 0.009*river + 0.008*lake + 0.006*island + 0.005*mountain + 0.004*area + 0.004*park + 0.004*antarctic + 0.004*south + 0.004*mountains + 0.004*dam - topic #1: 0.026*relay + 0.026*athletics + 0.025*metres + 0.023*freestyle + 0.022*hurdles + 0.020*ret + 0.017*divisão + 0.017*athletes + 0.016*bundesliga + 0.014*medals - topic #2: 0.002*were + 0.002*he + 0.002*court + 0.002*his + 0.002*had + 0.002*law + 0.002*government + 0.002*police + 0.002*patrolling + 0.002*their - topic #3: 0.040*courcelles + 0.035*centimeters + 0.023*mattythewhite + 0.021*wine + 0.019*stamps + 0.018*oko + 0.017*perennial + 0.014*stubs + 0.012*ovate + 0.011*greyish - topic #4: 0.039*al + 0.029*sysop + 0.019*iran + 0.015*pakistan + 0.014*ali + 0.013*arab + 0.010*islamic + 0.010*arabic + 0.010*saudi + 0.010*muhammad - topic #5: 0.020*copyrighted + 0.020*northamerica + 0.014*uncopyrighted + 0.007*rihanna + 0.005*cloudz + 0.005*knowles + 0.004*gaga + 0.004*zombie + 0.004*wigan + 0.003*maccabi - topic #6: 0.061*israel + 0.056*israeli + 0.030*sockpuppet + 0.025*jerusalem + 0.025*tel + 0.023*aviv + 0.022*palestinian + 0.019*ifk + 0.016*palestine + 0.014*hebrew - topic #7: 0.015*melbourne + 0.014*rovers + 0.013*vfl + 0.012*australian + 0.012*wanderers + 0.011*afl + 0.008*dinamo + 0.008*queensland + 0.008*tracklist + 0.008*brisbane - topic #8: 0.011*film + 0.007*her + 0.007*she + 0.004*he + 0.004*series + 0.004*his + 0.004*episode + 0.003*films + 0.003*television + 0.003*best - topic #9: 0.019*wrestling + 0.013*château + 0.013*ligue + 0.012*discus + 0.012*estonian + 0.009*uci + 0.008*hockeyarchives + 0.008*wwe + 0.008*estonia + 0.007*reign - topic #10: 0.078*edits + 0.059*notability + 0.035*archived + 0.025*clearer + 0.022*speedy + 0.021*deleted + 0.016*hook + 0.015*checkuser + 0.014*ron + 0.011*nominator - topic #11: 
0.013*admins + 0.009*acid + 0.009*molniya + 0.009*chemical + 0.007*ch + 0.007*chemistry + 0.007*compound + 0.007*anemone + 0.006*mg + 0.006*reaction - topic #12: 0.018*india + 0.013*indian + 0.010*tamil + 0.009*singh + 0.008*film + 0.008*temple + 0.006*kumar + 0.006*hindi + 0.006*delhi + 0.005*bengal - topic #13: 0.047*bwebs + 0.024*malta + 0.020*hobart + 0.019*basa + 0.019*columella + 0.019*huon + 0.018*tasmania + 0.016*popups + 0.014*tasmanian + 0.014*modèle - topic #14: 0.014*jewish + 0.011*rabbi + 0.008*bgwhite + 0.008*lebanese + 0.007*lebanon + 0.006*homs + 0.005*beirut + 0.004*jews + 0.004*hebrew + 0.004*caligari - topic #15: 0.025*german + 0.020*der + 0.017*von + 0.015*und + 0.014*berlin + 0.012*germany + 0.012*die + 0.010*des + 0.008*kategorie + 0.007*cross - topic #16: 0.003*can + 0.003*system + 0.003*power + 0.003*are + 0.003*energy + 0.002*data + 0.002*be + 0.002*used + 0.002*or + 0.002*using - topic #17: 0.049*indonesia + 0.042*indonesian + 0.031*malaysia + 0.024*singapore + 0.022*greek + 0.021*jakarta + 0.016*greece + 0.015*dord + 0.014*athens + 0.011*malaysian - topic #18: 0.031*stakes + 0.029*webs + 0.018*futsal + 0.014*whitish + 0.013*hyun + 0.012*thoroughbred + 0.012*dnf + 0.012*jockey + 0.011*medalists + 0.011*racehorse - topic #19: 0.119*oblast + 0.034*uploaded + 0.034*uploads + 0.033*nordland + 0.025*selsoviet + 0.023*raion + 0.022*krai + 0.018*okrug + 0.015*hålogaland + 0.015*russiae + 0.020*manga + 0.017*dragon + 0.012*theme + 0.011*dvd + 0.011*super + 0.011*hunter + 0.009*ash + 0.009*dream + 0.009*angel - -Creating this LDA model of Wikipedia takes about 6 hours and 20 minutes on my laptop [1]_. -If you need your results faster, consider running :doc:`dist_lda` on a cluster of -computers. - -Note two differences between the LDA and LSA runs: we asked LSA -to extract 400 topics, LDA only 100 topics (so the difference in speed is in fact -even greater). 
Secondly, the LSA implementation in `gensim` is truly online: if the nature of the input -stream changes in time, LSA will re-orient itself to reflect these changes, in a reasonably -small number of updates. In contrast, LDA is not truly online (the name of the [3]_ -article notwithstanding), as the impact of later updates on the model gradually -diminishes. If there is topic drift in the input document stream, LDA will get -confused and be increasingly slower at adjusting itself to the new state of affairs. - -In short, be careful if using LDA to incrementally add new documents to the model -over time. **Batch usage of LDA**, where the entire training corpus is either known beforehand or does -not exhibit topic drift, **is ok and not affected**. - -To run batch LDA (not online), train `LdaModel` with:: - - >>> # extract 100 LDA topics, using 20 full passes, no online updates - >>> lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=0, passes=20) - -As usual, a trained model can be used to transform new, unseen documents (plain bag-of-words count vectors) -into LDA topic distributions: - - >>> doc_lda = lda[doc_bow] - --------------------- - -.. [1] My laptop = MacBook Pro, Intel Core i7 2.3GHz, 16GB DDR3 RAM, OS X with `vecLib`. - -.. [2] - Here we're mostly interested in performance, but it is interesting to look at the - retrieved LSA concepts, too. I am no Wikipedia expert and don't see into Wiki's bowels, - but Brian Mingus had this to say about the result:: - - There appears to be a lot of noise in your dataset. The first three topics - in your list appear to be meta topics, concerning the administration and - cleanup of Wikipedia.
These show up because you didn't exclude templates - such as these, some of which are included in most articles for quality - control: http://en.wikipedia.org/wiki/Wikipedia:Template_messages/Cleanup - - The fourth and fifth topics clearly shows the influence of bots that import - massive databases of cities, countries, etc. and their statistics such as - population, capita, etc. - - The sixth shows the influence of sports bots, and the seventh of music bots. - - So the top ten concepts are apparently dominated by Wikipedia robots and expanded - templates; this is a good reminder that LSA is a powerful tool for data analysis, - but no silver bullet. As always, it's `garbage in, garbage out - `_... - By the way, improvements to the Wiki markup parsing code are welcome :-) - -.. [3] Hoffman, Blei, Bach. 2010. Online learning for Latent Dirichlet Allocation - [`pdf `_] [`code `_] - diff --git a/docs/_static/basic.css b/docs/_static/basic.css deleted file mode 100644 index 43e8bafaf3..0000000000 --- a/docs/_static/basic.css +++ /dev/null @@ -1,540 +0,0 @@ -/* - * basic.css - * ~~~~~~~~~ - * - * Sphinx stylesheet -- basic theme. - * - * :copyright: Copyright 2007-2011 by the Sphinx team, see AUTHORS. - * :license: BSD, see LICENSE for details. 
- * - */ - -/* -- main layout ----------------------------------------------------------- */ - -div.clearer { - clear: both; -} - -/* -- relbar ---------------------------------------------------------------- */ - -div.related { - width: 100%; - font-size: 90%; -} - -div.related h3 { - display: none; -} - -div.related ul { - margin: 0; - padding: 0 0 0 10px; - list-style: none; -} - -div.related li { - display: inline; -} - -div.related li.right { - float: right; - margin-right: 5px; -} - -/* -- sidebar --------------------------------------------------------------- */ - -div.sphinxsidebarwrapper { - padding: 10px 5px 0 10px; -} - -div.sphinxsidebar { - float: left; - width: 230px; - margin-left: -100%; - font-size: 90%; -} - -div.sphinxsidebar ul { - list-style: none; -} - -div.sphinxsidebar ul ul, -div.sphinxsidebar ul.want-points { - margin-left: 20px; - list-style: square; -} - -div.sphinxsidebar ul ul { - margin-top: 0; - margin-bottom: 0; -} - -div.sphinxsidebar form { - margin-top: 10px; -} - -div.sphinxsidebar input { - border: 1px solid #98dbcc; - font-family: sans-serif; - font-size: 1em; -} - -div.sphinxsidebar #searchbox input[type="text"] { - width: 170px; -} - -div.sphinxsidebar #searchbox input[type="submit"] { - width: 30px; -} - -img { - border: 0; -} - -/* -- search page ----------------------------------------------------------- */ - -ul.search { - margin: 10px 0 0 20px; - padding: 0; -} - -ul.search li { - padding: 5px 0 5px 20px; - background-image: url(file.png); - background-repeat: no-repeat; - background-position: 0 7px; -} - -ul.search li a { - font-weight: bold; -} - -ul.search li div.context { - color: #888; - margin: 2px 0 0 30px; - text-align: left; -} - -ul.keywordmatches li.goodmatch a { - font-weight: bold; -} - -/* -- index page ------------------------------------------------------------ */ - -table.contentstable { - width: 90%; -} - -table.contentstable p.biglink { - line-height: 150%; -} - -a.biglink { - font-size: 1.3em; -} - 
-span.linkdescr { - font-style: italic; - padding-top: 5px; - font-size: 90%; -} - -/* -- general index --------------------------------------------------------- */ - -table.indextable { - width: 100%; -} - -table.indextable td { - text-align: left; - vertical-align: top; -} - -table.indextable dl, table.indextable dd { - margin-top: 0; - margin-bottom: 0; -} - -table.indextable tr.pcap { - height: 10px; -} - -table.indextable tr.cap { - margin-top: 10px; - background-color: #f2f2f2; -} - -img.toggler { - margin-right: 3px; - margin-top: 3px; - cursor: pointer; -} - -div.modindex-jumpbox { - border-top: 1px solid #ddd; - border-bottom: 1px solid #ddd; - margin: 1em 0 1em 0; - padding: 0.4em; -} - -div.genindex-jumpbox { - border-top: 1px solid #ddd; - border-bottom: 1px solid #ddd; - margin: 1em 0 1em 0; - padding: 0.4em; -} - -/* -- general body styles --------------------------------------------------- */ - -a.headerlink { - visibility: hidden; -} - -h1:hover > a.headerlink, -h2:hover > a.headerlink, -h3:hover > a.headerlink, -h4:hover > a.headerlink, -h5:hover > a.headerlink, -h6:hover > a.headerlink, -dt:hover > a.headerlink { - visibility: visible; -} - -div.body p.caption { - text-align: inherit; -} - -div.body td { - text-align: left; -} - -.field-list ul { - padding-left: 1em; -} - -.first { - margin-top: 0 !important; -} - -p.rubric { - margin-top: 30px; - font-weight: bold; -} - -img.align-left, .figure.align-left, object.align-left { - clear: left; - float: left; - margin-right: 1em; -} - -img.align-right, .figure.align-right, object.align-right { - clear: right; - float: right; - margin-left: 1em; -} - -img.align-center, .figure.align-center, object.align-center { - display: block; - margin-left: auto; - margin-right: auto; -} - -.align-left { - text-align: left; -} - -.align-center { - text-align: center; -} - -.align-right { - text-align: right; -} - -/* -- sidebars -------------------------------------------------------------- */ - -div.sidebar { - 
margin: 0 0 0.5em 1em; - border: 1px solid #ddb; - padding: 7px 7px 0 7px; - background-color: #ffe; - width: 40%; - float: right; -} - -p.sidebar-title { - font-weight: bold; -} - -/* -- topics ---------------------------------------------------------------- */ - -div.topic { - border: 1px solid #ccc; - padding: 7px 7px 0 7px; - margin: 10px 0 10px 0; -} - -p.topic-title { - font-size: 1.1em; - font-weight: bold; - margin-top: 10px; -} - -/* -- admonitions ----------------------------------------------------------- */ - -div.admonition { - margin-top: 10px; - margin-bottom: 10px; - padding: 7px; -} - -div.admonition dt { - font-weight: bold; -} - -div.admonition dl { - margin-bottom: 0; -} - -p.admonition-title { - margin: 0px 10px 5px 0px; - font-weight: bold; -} - -div.body p.centered { - text-align: center; - margin-top: 25px; -} - -/* -- tables ---------------------------------------------------------------- */ - -table.docutils { - border: 0; - border-collapse: collapse; -} - -table.docutils td, table.docutils th { - padding: 1px 8px 1px 5px; - border-top: 0; - border-left: 0; - border-right: 0; - border-bottom: 1px solid #aaa; -} - -table.field-list td, table.field-list th { - border: 0 !important; -} - -table.footnote td, table.footnote th { - border: 0 !important; -} - -th { - text-align: left; - padding-right: 5px; -} - -table.citation { - border-left: solid 1px gray; - margin-left: 1px; -} - -table.citation td { - border-bottom: none; -} - -/* -- other body styles ----------------------------------------------------- */ - -ol.arabic { - list-style: decimal; -} - -ol.loweralpha { - list-style: lower-alpha; -} - -ol.upperalpha { - list-style: upper-alpha; -} - -ol.lowerroman { - list-style: lower-roman; -} - -ol.upperroman { - list-style: upper-roman; -} - -dl { - margin-bottom: 15px; -} - -dd p { - margin-top: 0px; -} - -dd ul, dd table { - margin-bottom: 10px; -} - -dd { - margin-top: 3px; - margin-bottom: 10px; - margin-left: 30px; -} - -dt:target, 
.highlighted { - background-color: #fbe54e; -} - -dl.glossary dt { - font-weight: bold; - font-size: 1.1em; -} - -.field-list ul { - margin: 0; - padding-left: 1em; -} - -.field-list p { - margin: 0; -} - -.refcount { - color: #060; -} - -.optional { - font-size: 1.3em; -} - -.versionmodified { - font-style: italic; -} - -.system-message { - background-color: #fda; - padding: 5px; - border: 3px solid red; -} - -.footnote:target { - background-color: #ffa; -} - -.line-block { - display: block; - margin-top: 1em; - margin-bottom: 1em; -} - -.line-block .line-block { - margin-top: 0; - margin-bottom: 0; - margin-left: 1.5em; -} - -.guilabel, .menuselection { - font-family: sans-serif; -} - -.accelerator { - text-decoration: underline; -} - -.classifier { - font-style: oblique; -} - -abbr, acronym { - border-bottom: dotted 1px; - cursor: help; -} - -/* -- code displays --------------------------------------------------------- */ - -pre { - overflow: auto; - overflow-y: hidden; /* fixes display issues on Chrome browsers */ -} - -td.linenos pre { - padding: 5px 0px; - border: 0; - background-color: transparent; - color: #aaa; -} - -table.highlighttable { - margin-left: 0.5em; -} - -table.highlighttable td { - padding: 0 0.5em 0 0.5em; -} - -tt.descname { - background-color: transparent; - font-weight: bold; - font-size: 1.2em; -} - -tt.descclassname { - background-color: transparent; -} - -tt.xref, a tt { - background-color: transparent; - font-weight: bold; -} - -h1 tt, h2 tt, h3 tt, h4 tt, h5 tt, h6 tt { - background-color: transparent; -} - -.viewcode-link { - float: right; -} - -.viewcode-back { - float: right; - font-family: sans-serif; -} - -div.viewcode-block:target { - margin: -1px -10px; - padding: 0 10px; -} - -/* -- math display ---------------------------------------------------------- */ - -img.math { - vertical-align: middle; -} - -div.body div.math p { - text-align: center; -} - -span.eqno { - float: right; -} - -/* -- printout stylesheet 
--------------------------------------------------- */ - -@media print { - div.document, - div.documentwrapper, - div.bodywrapper { - margin: 0 !important; - width: 100%; - } - - div.sphinxsidebar, - div.related, - div.footer, - #top-link { - display: none; - } -} \ No newline at end of file diff --git a/docs/_static/contents.png b/docs/_static/contents.png deleted file mode 100644 index 7fb82154a1..0000000000 Binary files a/docs/_static/contents.png and /dev/null differ diff --git a/docs/_static/default.css b/docs/_static/default.css deleted file mode 100644 index 611cb55720..0000000000 --- a/docs/_static/default.css +++ /dev/null @@ -1,574 +0,0 @@ -/** - * Alternate Sphinx design - * Originally created by Armin Ronacher for Werkzeug, adapted by Georg Brandl. - * Minor modifications by Radim Rehurek, who was too lazy to fork the theme. - */ - -table.links { - border-collapse: separate; - border-spacing: 15px; - margin: 10px auto 0px auto; -} - -table.links tr td { - -webkit-border-radius: 15px; - -moz-border-radius: 15px; - border-radius: 15px; - padding: 10px 30px; - color: white; -/* border: 1px solid #86989B;*/ - font-weight: bold; - background-color: #eee; - text-align: center; - background: -webkit-gradient(linear, left top, left bottom, from(#eee), to(#ddd)); - background: -moz-linear-gradient(top, #eee, #ddd); -} - -table.links a { text-decoration: none; } - -#tagcloud { - font-size: 50%; - width: 80%; - margin: 20px auto 0px auto; - padding: 15px; - line-height: 2.4em; - word-spacing: normal; - letter-spacing: normal; - text-transform: none; - text-align: justify; - text-indent: 0; - border: 1px dotted; -} - -.wrd { padding:0; position:relative } -.tagcloud0 { font-size:1.0em; color:#ACC1F3; z-index:10 } -.tagcloud1 { font-size:1.4em; color:#ACC1F3; z-index:9 } -.tagcloud2 { font-size:1.8em; color:#86A0DC; z-index:8} -.tagcloud3 { font-size:2.2em; color:#86A0DC; z-index:7} -.tagcloud4 { font-size:2.6em; color:#607EC5; z-index:6} -.tagcloud5 { 
font-size:3.0em; color:#607EC5; z-index:5} -.tagcloud6 { font-size:3.3em; color:#4C6DB9; z-index:4} -.tagcloud7 { font-size:3.6em; color:#395CAE; z-index:3} -.tagcloud8 { font-size:3.9em; color:#264CA2; z-index:2} -.tagcloud9 { font-size:4.2em; color:#133B97; z-index:1} -.tagcloud10 { font-size:4.5em; color:#002A8B; z-index:0} - - -body { - font-family: 'Lucida Grande', 'Lucida Sans Unicode', 'Geneva', 'Verdana', sans-serif; - font-size: 14px; - letter-spacing: -0.01em; - line-height: 150%; - text-align: center; - /*background-color: #AFC1C4; */ - background-color: #BFD1D4; - color: black; - padding: 0; - border: 1px solid #aaa; - - margin: 0px 80px 0px 80px; - min-width: 740px; -} - -a { - color: #CA7900; - text-decoration: none; -} - -a:hover { - color: #2491CF; -} - -pre { - font-family: 'Consolas', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace; - font-size: 0.95em; - letter-spacing: 0.015em; - padding: 0.5em; - border: 1px solid #ccc; - background-color: #f8f8f8; -} - -td.linenos pre { - padding: 0.5em 0; - border: 0; - background-color: transparent; - color: #aaa; -} - -table.highlighttable { - margin-left: 0.5em; -} - -table.highlighttable td { - padding: 0 0.5em 0 0.5em; -} - -cite, code, tt { - font-family: 'Consolas', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace; - font-size: 0.95em; - letter-spacing: 0.01em; -} - -hr { - border: 1px solid #abc; - margin: 2em; -} - -tt { - background-color: #f2f2f2; - border-bottom: 1px solid #ddd; - color: #333; -} - -tt.descname { - background-color: transparent; - font-weight: bold; - font-size: 1.2em; - border: 0; -} - -tt.descclassname { - background-color: transparent; - border: 0; -} - -tt.xref { - background-color: transparent; - font-weight: bold; - border: 0; -} - -a tt { - background-color: transparent; - font-weight: bold; - border: 0; - color: #CA7900; -} - -a tt:hover { - color: #2491CF; -} - -dl { - margin-bottom: 15px; -} - -dd p { - margin-top: 0px; -} - -dd ul, dd table { - 
margin-bottom: 10px; -} - -dd { - margin-top: 3px; - margin-bottom: 10px; - margin-left: 30px; -} - -.refcount { - color: #060; -} - -dt:target, -.highlight { - background-color: #fbe54e; -} - -dl.class, dl.function { - border-top: 2px solid #888; -} - -dl.method, dl.attribute { - border-top: 1px solid #aaa; -} - -dl.glossary dt { - font-weight: bold; - font-size: 1.1em; -} - -pre { - line-height: 120%; -} - -pre a { - color: inherit; - text-decoration: underline; -} - -.first { - margin-top: 0 !important; -} - -div.document { - background-color: white; - text-align: left; - background-image: url(contents.png); - background-repeat: repeat-x; -} - -/* -div.documentwrapper { - width: 100%; -} -*/ - -div.clearer { - clear: both; -} - -div.related h3 { - display: none; -} - -div.related ul { - background-image: url(navigation.png); - height: 2em; - list-style: none; - border-top: 1px solid #ddd; - border-bottom: 1px solid #ddd; - margin: 0; - padding-left: 10px; -} - -div.related ul li { - margin: 0; - padding: 0; - height: 2em; - float: left; -} - -div.related ul li.right { - float: right; - margin-right: 5px; -} - -div.related ul li a { - margin: 0; - padding: 0 5px 0 5px; - line-height: 1.75em; - color: #EE9816; -} - -div.related ul li a:hover { - color: #3CA8E7; -} - -div.body { - margin: 0; - padding: 0.5em 20px 20px 20px; -} - -div.bodywrapper { - margin: 0 280px 0 0; - border-right: 1px solid #ccc; -} - -div.body a { - text-decoration: underline; -} - -div.sphinxsidebar { - margin: 0; - padding: 0.5em 15px 15px 15px; - width: 250px; - float: right; - text-align: left; - background-color: #eee; -/* margin-left: -100%; */ -} - -div.sphinxsidebarl { - margin: 0; - padding: 0.5em 15px 15px 0px; - width: 210px; - float: left; - clear: left; - text-align: left; -/* margin-left: -100%; */ -} - -div.sphinxsidebar h4, div.sphinxsidebar h3, div.sphinxsidebarl h4, div.sphinxsidebarl h3 { - margin: 1em 0 0.5em 0; - font-size: 0.9em; - padding: 0.1em 0 0.1em 0.5em; - color: 
white; - border: 1px solid #86989B; - background-color: #AFC1C4; -} - -div.sphinxsidebar ul, div.sphinxsidebarl ul { - padding-left: 1.5em; - margin-top: 7px; - list-style: none; - padding: 0; - line-height: 130%; -} - -div.sphinxsidebar ul ul, div.sphinxsidebarl ul ul { - list-style: square; - margin-left: 20px; -} - -p { - margin: 0.8em 0 0.5em 0; -} - -p.rubric { - font-weight: bold; -} - -h1 { - margin: 0; - padding: 0.7em 0 0.3em 0; - font-size: 1.8em; - /*color: #11557C;*/ - color: #000; -} - -h2 { - margin: 1.3em 0 0.2em 0; - font-size: 1.35em; - padding: 0; -} - -h3 { - margin: 1em 0 -0.3em 0; - font-size: 1.2em; -} - -h1 a, h2 a, h3 a, h4 a, h5 a, h6 a { - color: black!important; -} - -h1 a.anchor, h2 a.anchor, h3 a.anchor, h4 a.anchor, h5 a.anchor, h6 a.anchor { - display: none; - margin: 0 0 0 0.3em; - padding: 0 0.2em 0 0.2em; - color: #aaa!important; -} - -h1:hover a.anchor, h2:hover a.anchor, h3:hover a.anchor, h4:hover a.anchor, -h5:hover a.anchor, h6:hover a.anchor { - display: inline; -} - -h1 a.anchor:hover, h2 a.anchor:hover, h3 a.anchor:hover, h4 a.anchor:hover, -h5 a.anchor:hover, h6 a.anchor:hover { - color: #777; - background-color: #eee; -} - -table { - border-collapse: collapse; - margin: 0 -0.5em 0 -0.5em; -} - -table td, table th { - padding: 0.2em 0.5em 0.2em 0.5em; -} - -div.footer { - background-color: #E3EFF1; - color: #86989B; - padding: 3px 8px 3px 0; - clear: both; - font-size: 0.8em; - text-align: right; -} - -div.footer a { - color: #86989B; - text-decoration: underline; -} - -div.pagination { - margin-top: 2em; - padding-top: 0.5em; - border-top: 1px solid black; - text-align: center; -} - -div.sphinxsidebar ul.toc, div.sphinxsidebarl ul.toc { - margin: 1em 0 1em 0; - padding: 0 0 0 0.5em; - list-style: none; -} - -div.sphinxsidebar ul.toc li, div.sphinxsidebarl ul.toc li { - margin: 0.5em 0 0.5em 0; - font-size: 0.9em; - line-height: 130%; -} - -div.sphinxsidebar ul.toc li p, div.sphinxsidebarl ul.toc li p { - margin: 0; - 
padding: 0; -} - -div.sphinxsidebar ul.toc ul, div.sphinxsidebarl ul.toc ul { - margin: 0.2em 0 0.2em 0; - padding: 0 0 0 1.8em; -} - -div.sphinxsidebar ul.toc ul li, div.sphinxsidebarl ul.toc ul li { - padding: 0; -} - -div.admonition, div.warning { - font-size: 0.9em; - margin: 1em 0 0 0; - border: 1px solid #86989B; - background-color: #f7f7f7; -} - -div.admonition p, div.warning p { - margin: 0.5em 1em 0.5em 1em; - padding: 0; -} - -div.admonition pre, div.warning pre { - margin: 0.4em 1em 0.4em 1em; -} - -div.admonition p.admonition-title, -div.warning p.admonition-title { - margin: 0; - padding: 0.1em 0 0.1em 0.5em; - color: white; - border-bottom: 1px solid #86989B; - font-weight: bold; - background-color: #AFC1C4; -} - -div.warning { - border: 1px solid #940000; -} - -div.warning p.admonition-title { - background-color: #CF0000; - border-bottom-color: #940000; -} - -div.admonition ul, div.admonition ol, -div.warning ul, div.warning ol { - margin: 0.1em 0.5em 0.5em 3em; - padding: 0; -} - -div.versioninfo { - margin: 1em 0 0 0; - border: 1px solid #ccc; - background-color: #DDEAF0; - padding: 8px; - line-height: 1.3em; - font-size: 0.9em; -} - - -a.headerlink { - color: #c60f0f!important; - font-size: 1em; - margin-left: 6px; - padding: 0 4px 0 4px; - text-decoration: none!important; - visibility: hidden; -} - -h1:hover > a.headerlink, -h2:hover > a.headerlink, -h3:hover > a.headerlink, -h4:hover > a.headerlink, -h5:hover > a.headerlink, -h6:hover > a.headerlink, -dt:hover > a.headerlink { - visibility: visible; -} - -a.headerlink:hover { - background-color: #ccc; - color: white!important; -} - -table.indextable td { - text-align: left; - vertical-align: top; -} - -table.indextable dl, table.indextable dd { - margin-top: 0; - margin-bottom: 0; -} - -table.indextable tr.pcap { - height: 10px; -} - -table.indextable tr.cap { - margin-top: 10px; - background-color: #f2f2f2; -} - -img.toggler { - margin-right: 3px; - margin-top: 3px; - cursor: pointer; -} - 
-img.inheritance { - border: 0px -} - -form.pfform { - margin: 10px 0 20px 0; -} - -table.contentstable { - width: 90%; -} - -table.contentstable p.biglink { - line-height: 150%; -} - -a.biglink { - font-size: 1.3em; -} - -span.linkdescr { - font-style: italic; - padding-top: 5px; - font-size: 90%; -} - -ul.search { - margin: 10px 0 0 20px; - padding: 0; -} - -ul.search li { - padding: 5px 0 5px 20px; - background-image: url(file.png); - background-repeat: no-repeat; - background-position: 0 7px; -} - -ul.search li a { - font-weight: bold; -} - -ul.search li div.context { - color: #888; - margin: 2px 0 0 30px; - text-align: left; -} - -ul.keywordmatches li.goodmatch a { - font-weight: bold; -} - -.twtr-ft { - display: none; -} diff --git a/docs/_static/file.png b/docs/_static/file.png deleted file mode 100644 index d18082e397..0000000000 Binary files a/docs/_static/file.png and /dev/null differ diff --git a/docs/_static/jquery.js b/docs/_static/jquery.js deleted file mode 100644 index 7c24308023..0000000000 --- a/docs/_static/jquery.js +++ /dev/null @@ -1,154 +0,0 @@ -/*! - * jQuery JavaScript Library v1.4.2 - * http://jquery.com/ - * - * Copyright 2010, John Resig - * Dual licensed under the MIT or GPL Version 2 licenses. - * http://jquery.org/license - * - * Includes Sizzle.js - * http://sizzlejs.com/ - * Copyright 2010, The Dojo Foundation - * Released under the MIT, BSD, and GPL Licenses. 
- * - * Date: Sat Feb 13 22:33:48 2010 -0500 - */ -(function(A,w){function ma(){if(!c.isReady){try{s.documentElement.doScroll("left")}catch(a){setTimeout(ma,1);return}c.ready()}}function Qa(a,b){b.src?c.ajax({url:b.src,async:false,dataType:"script"}):c.globalEval(b.text||b.textContent||b.innerHTML||"");b.parentNode&&b.parentNode.removeChild(b)}function X(a,b,d,f,e,j){var i=a.length;if(typeof b==="object"){for(var o in b)X(a,o,b[o],f,e,d);return a}if(d!==w){f=!j&&f&&c.isFunction(d);for(o=0;o)[^>]*$|^#([\w-]+)$/,Ua=/^.[^:#\[\.,]*$/,Va=/\S/, -Wa=/^(\s|\u00A0)+|(\s|\u00A0)+$/g,Xa=/^<(\w+)\s*\/?>(?:<\/\1>)?$/,P=navigator.userAgent,xa=false,Q=[],L,$=Object.prototype.toString,aa=Object.prototype.hasOwnProperty,ba=Array.prototype.push,R=Array.prototype.slice,ya=Array.prototype.indexOf;c.fn=c.prototype={init:function(a,b){var d,f;if(!a)return this;if(a.nodeType){this.context=this[0]=a;this.length=1;return this}if(a==="body"&&!b){this.context=s;this[0]=s.body;this.selector="body";this.length=1;return this}if(typeof a==="string")if((d=Ta.exec(a))&& -(d[1]||!b))if(d[1]){f=b?b.ownerDocument||b:s;if(a=Xa.exec(a))if(c.isPlainObject(b)){a=[s.createElement(a[1])];c.fn.attr.call(a,b,true)}else a=[f.createElement(a[1])];else{a=sa([d[1]],[f]);a=(a.cacheable?a.fragment.cloneNode(true):a.fragment).childNodes}return c.merge(this,a)}else{if(b=s.getElementById(d[2])){if(b.id!==d[2])return T.find(a);this.length=1;this[0]=b}this.context=s;this.selector=a;return this}else if(!b&&/^\w+$/.test(a)){this.selector=a;this.context=s;a=s.getElementsByTagName(a);return c.merge(this, -a)}else return!b||b.jquery?(b||T).find(a):c(b).find(a);else if(c.isFunction(a))return T.ready(a);if(a.selector!==w){this.selector=a.selector;this.context=a.context}return c.makeArray(a,this)},selector:"",jquery:"1.4.2",length:0,size:function(){return this.length},toArray:function(){return R.call(this,0)},get:function(a){return a==null?this.toArray():a<0?this.slice(a)[0]:this[a]},pushStack:function(a,b,d){var 
f=c();c.isArray(a)?ba.apply(f,a):c.merge(f,a);f.prevObject=this;f.context=this.context;if(b=== -"find")f.selector=this.selector+(this.selector?" ":"")+d;else if(b)f.selector=this.selector+"."+b+"("+d+")";return f},each:function(a,b){return c.each(this,a,b)},ready:function(a){c.bindReady();if(c.isReady)a.call(s,c);else Q&&Q.push(a);return this},eq:function(a){return a===-1?this.slice(a):this.slice(a,+a+1)},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},slice:function(){return this.pushStack(R.apply(this,arguments),"slice",R.call(arguments).join(","))},map:function(a){return this.pushStack(c.map(this, -function(b,d){return a.call(b,d,b)}))},end:function(){return this.prevObject||c(null)},push:ba,sort:[].sort,splice:[].splice};c.fn.init.prototype=c.fn;c.extend=c.fn.extend=function(){var a=arguments[0]||{},b=1,d=arguments.length,f=false,e,j,i,o;if(typeof a==="boolean"){f=a;a=arguments[1]||{};b=2}if(typeof a!=="object"&&!c.isFunction(a))a={};if(d===b){a=this;--b}for(;b
a"; -var e=d.getElementsByTagName("*"),j=d.getElementsByTagName("a")[0];if(!(!e||!e.length||!j)){c.support={leadingWhitespace:d.firstChild.nodeType===3,tbody:!d.getElementsByTagName("tbody").length,htmlSerialize:!!d.getElementsByTagName("link").length,style:/red/.test(j.getAttribute("style")),hrefNormalized:j.getAttribute("href")==="/a",opacity:/^0.55$/.test(j.style.opacity),cssFloat:!!j.style.cssFloat,checkOn:d.getElementsByTagName("input")[0].value==="on",optSelected:s.createElement("select").appendChild(s.createElement("option")).selected, -parentNode:d.removeChild(d.appendChild(s.createElement("div"))).parentNode===null,deleteExpando:true,checkClone:false,scriptEval:false,noCloneEvent:true,boxModel:null};b.type="text/javascript";try{b.appendChild(s.createTextNode("window."+f+"=1;"))}catch(i){}a.insertBefore(b,a.firstChild);if(A[f]){c.support.scriptEval=true;delete A[f]}try{delete b.test}catch(o){c.support.deleteExpando=false}a.removeChild(b);if(d.attachEvent&&d.fireEvent){d.attachEvent("onclick",function k(){c.support.noCloneEvent= -false;d.detachEvent("onclick",k)});d.cloneNode(true).fireEvent("onclick")}d=s.createElement("div");d.innerHTML="";a=s.createDocumentFragment();a.appendChild(d.firstChild);c.support.checkClone=a.cloneNode(true).cloneNode(true).lastChild.checked;c(function(){var k=s.createElement("div");k.style.width=k.style.paddingLeft="1px";s.body.appendChild(k);c.boxModel=c.support.boxModel=k.offsetWidth===2;s.body.removeChild(k).style.display="none"});a=function(k){var n= -s.createElement("div");k="on"+k;var r=k in n;if(!r){n.setAttribute(k,"return;");r=typeof n[k]==="function"}return r};c.support.submitBubbles=a("submit");c.support.changeBubbles=a("change");a=b=d=e=j=null}})();c.props={"for":"htmlFor","class":"className",readonly:"readOnly",maxlength:"maxLength",cellspacing:"cellSpacing",rowspan:"rowSpan",colspan:"colSpan",tabindex:"tabIndex",usemap:"useMap",frameborder:"frameBorder"};var 
G="jQuery"+J(),Ya=0,za={};c.extend({cache:{},expando:G,noData:{embed:true,object:true, -applet:true},data:function(a,b,d){if(!(a.nodeName&&c.noData[a.nodeName.toLowerCase()])){a=a==A?za:a;var f=a[G],e=c.cache;if(!f&&typeof b==="string"&&d===w)return null;f||(f=++Ya);if(typeof b==="object"){a[G]=f;e[f]=c.extend(true,{},b)}else if(!e[f]){a[G]=f;e[f]={}}a=e[f];if(d!==w)a[b]=d;return typeof b==="string"?a[b]:a}},removeData:function(a,b){if(!(a.nodeName&&c.noData[a.nodeName.toLowerCase()])){a=a==A?za:a;var d=a[G],f=c.cache,e=f[d];if(b){if(e){delete e[b];c.isEmptyObject(e)&&c.removeData(a)}}else{if(c.support.deleteExpando)delete a[c.expando]; -else a.removeAttribute&&a.removeAttribute(c.expando);delete f[d]}}}});c.fn.extend({data:function(a,b){if(typeof a==="undefined"&&this.length)return c.data(this[0]);else if(typeof a==="object")return this.each(function(){c.data(this,a)});var d=a.split(".");d[1]=d[1]?"."+d[1]:"";if(b===w){var f=this.triggerHandler("getData"+d[1]+"!",[d[0]]);if(f===w&&this.length)f=c.data(this[0],a);return f===w&&d[1]?this.data(d[0]):f}else return this.trigger("setData"+d[1]+"!",[d[0],b]).each(function(){c.data(this, -a,b)})},removeData:function(a){return this.each(function(){c.removeData(this,a)})}});c.extend({queue:function(a,b,d){if(a){b=(b||"fx")+"queue";var f=c.data(a,b);if(!d)return f||[];if(!f||c.isArray(d))f=c.data(a,b,c.makeArray(d));else f.push(d);return f}},dequeue:function(a,b){b=b||"fx";var d=c.queue(a,b),f=d.shift();if(f==="inprogress")f=d.shift();if(f){b==="fx"&&d.unshift("inprogress");f.call(a,function(){c.dequeue(a,b)})}}});c.fn.extend({queue:function(a,b){if(typeof a!=="string"){b=a;a="fx"}if(b=== -w)return c.queue(this[0],a);return this.each(function(){var d=c.queue(this,a,b);a==="fx"&&d[0]!=="inprogress"&&c.dequeue(this,a)})},dequeue:function(a){return this.each(function(){c.dequeue(this,a)})},delay:function(a,b){a=c.fx?c.fx.speeds[a]||a:a;b=b||"fx";return this.queue(b,function(){var 
d=this;setTimeout(function(){c.dequeue(d,b)},a)})},clearQueue:function(a){return this.queue(a||"fx",[])}});var Aa=/[\n\t]/g,ca=/\s+/,Za=/\r/g,$a=/href|src|style/,ab=/(button|input)/i,bb=/(button|input|object|select|textarea)/i, -cb=/^(a|area)$/i,Ba=/radio|checkbox/;c.fn.extend({attr:function(a,b){return X(this,a,b,true,c.attr)},removeAttr:function(a){return this.each(function(){c.attr(this,a,"");this.nodeType===1&&this.removeAttribute(a)})},addClass:function(a){if(c.isFunction(a))return this.each(function(n){var r=c(this);r.addClass(a.call(this,n,r.attr("class")))});if(a&&typeof a==="string")for(var b=(a||"").split(ca),d=0,f=this.length;d-1)return true;return false},val:function(a){if(a===w){var b=this[0];if(b){if(c.nodeName(b,"option"))return(b.attributes.value||{}).specified?b.value:b.text;if(c.nodeName(b,"select")){var d=b.selectedIndex,f=[],e=b.options;b=b.type==="select-one";if(d<0)return null;var j=b?d:0;for(d=b?d+1:e.length;j=0;else if(c.nodeName(this,"select")){var u=c.makeArray(r);c("option",this).each(function(){this.selected= -c.inArray(c(this).val(),u)>=0});if(!u.length)this.selectedIndex=-1}else this.value=r}})}});c.extend({attrFn:{val:true,css:true,html:true,text:true,data:true,width:true,height:true,offset:true},attr:function(a,b,d,f){if(!a||a.nodeType===3||a.nodeType===8)return w;if(f&&b in c.attrFn)return c(a)[b](d);f=a.nodeType!==1||!c.isXMLDoc(a);var e=d!==w;b=f&&c.props[b]||b;if(a.nodeType===1){var j=$a.test(b);if(b in a&&f&&!j){if(e){b==="type"&&ab.test(a.nodeName)&&a.parentNode&&c.error("type property can't be changed"); -a[b]=d}if(c.nodeName(a,"form")&&a.getAttributeNode(b))return a.getAttributeNode(b).nodeValue;if(b==="tabIndex")return(b=a.getAttributeNode("tabIndex"))&&b.specified?b.value:bb.test(a.nodeName)||cb.test(a.nodeName)&&a.href?0:w;return a[b]}if(!c.support.style&&f&&b==="style"){if(e)a.style.cssText=""+d;return a.style.cssText}e&&a.setAttribute(b,""+d);a=!c.support.hrefNormalized&&f&&j?a.getAttribute(b,2):a.getAttribute(b);return 
a===null?w:a}return c.style(a,b,d)}});var O=/\.(.*)$/,db=function(a){return a.replace(/[^\w\s\.\|`]/g, -function(b){return"\\"+b})};c.event={add:function(a,b,d,f){if(!(a.nodeType===3||a.nodeType===8)){if(a.setInterval&&a!==A&&!a.frameElement)a=A;var e,j;if(d.handler){e=d;d=e.handler}if(!d.guid)d.guid=c.guid++;if(j=c.data(a)){var i=j.events=j.events||{},o=j.handle;if(!o)j.handle=o=function(){return typeof c!=="undefined"&&!c.event.triggered?c.event.handle.apply(o.elem,arguments):w};o.elem=a;b=b.split(" ");for(var k,n=0,r;k=b[n++];){j=e?c.extend({},e):{handler:d,data:f};if(k.indexOf(".")>-1){r=k.split("."); -k=r.shift();j.namespace=r.slice(0).sort().join(".")}else{r=[];j.namespace=""}j.type=k;j.guid=d.guid;var u=i[k],z=c.event.special[k]||{};if(!u){u=i[k]=[];if(!z.setup||z.setup.call(a,f,r,o)===false)if(a.addEventListener)a.addEventListener(k,o,false);else a.attachEvent&&a.attachEvent("on"+k,o)}if(z.add){z.add.call(a,j);if(!j.handler.guid)j.handler.guid=d.guid}u.push(j);c.event.global[k]=true}a=null}}},global:{},remove:function(a,b,d,f){if(!(a.nodeType===3||a.nodeType===8)){var e,j=0,i,o,k,n,r,u,z=c.data(a), -C=z&&z.events;if(z&&C){if(b&&b.type){d=b.handler;b=b.type}if(!b||typeof b==="string"&&b.charAt(0)==="."){b=b||"";for(e in C)c.event.remove(a,e+b)}else{for(b=b.split(" ");e=b[j++];){n=e;i=e.indexOf(".")<0;o=[];if(!i){o=e.split(".");e=o.shift();k=new RegExp("(^|\\.)"+c.map(o.slice(0).sort(),db).join("\\.(?:.*\\.)?")+"(\\.|$)")}if(r=C[e])if(d){n=c.event.special[e]||{};for(B=f||0;B=0){a.type= -e=e.slice(0,-1);a.exclusive=true}if(!d){a.stopPropagation();c.event.global[e]&&c.each(c.cache,function(){this.events&&this.events[e]&&c.event.trigger(a,b,this.handle.elem)})}if(!d||d.nodeType===3||d.nodeType===8)return 
w;a.result=w;a.target=d;b=c.makeArray(b);b.unshift(a)}a.currentTarget=d;(f=c.data(d,"handle"))&&f.apply(d,b);f=d.parentNode||d.ownerDocument;try{if(!(d&&d.nodeName&&c.noData[d.nodeName.toLowerCase()]))if(d["on"+e]&&d["on"+e].apply(d,b)===false)a.result=false}catch(j){}if(!a.isPropagationStopped()&& -f)c.event.trigger(a,b,f,true);else if(!a.isDefaultPrevented()){f=a.target;var i,o=c.nodeName(f,"a")&&e==="click",k=c.event.special[e]||{};if((!k._default||k._default.call(d,a)===false)&&!o&&!(f&&f.nodeName&&c.noData[f.nodeName.toLowerCase()])){try{if(f[e]){if(i=f["on"+e])f["on"+e]=null;c.event.triggered=true;f[e]()}}catch(n){}if(i)f["on"+e]=i;c.event.triggered=false}}},handle:function(a){var b,d,f,e;a=arguments[0]=c.event.fix(a||A.event);a.currentTarget=this;b=a.type.indexOf(".")<0&&!a.exclusive; -if(!b){d=a.type.split(".");a.type=d.shift();f=new RegExp("(^|\\.)"+d.slice(0).sort().join("\\.(?:.*\\.)?")+"(\\.|$)")}e=c.data(this,"events");d=e[a.type];if(e&&d){d=d.slice(0);e=0;for(var j=d.length;e-1?c.map(a.options,function(f){return f.selected}).join("-"):"";else if(a.nodeName.toLowerCase()==="select")d=a.selectedIndex;return d},fa=function(a,b){var d=a.target,f,e;if(!(!da.test(d.nodeName)||d.readOnly)){f=c.data(d,"_change_data");e=Fa(d);if(a.type!=="focusout"||d.type!=="radio")c.data(d,"_change_data", -e);if(!(f===w||e===f))if(f!=null||e){a.type="change";return c.event.trigger(a,b,d)}}};c.event.special.change={filters:{focusout:fa,click:function(a){var b=a.target,d=b.type;if(d==="radio"||d==="checkbox"||b.nodeName.toLowerCase()==="select")return fa.call(this,a)},keydown:function(a){var b=a.target,d=b.type;if(a.keyCode===13&&b.nodeName.toLowerCase()!=="textarea"||a.keyCode===32&&(d==="checkbox"||d==="radio")||d==="select-multiple")return fa.call(this,a)},beforeactivate:function(a){a=a.target;c.data(a, -"_change_data",Fa(a))}},setup:function(){if(this.type==="file")return false;for(var a in ea)c.event.add(this,a+".specialChange",ea[a]);return 
da.test(this.nodeName)},teardown:function(){c.event.remove(this,".specialChange");return da.test(this.nodeName)}};ea=c.event.special.change.filters}s.addEventListener&&c.each({focus:"focusin",blur:"focusout"},function(a,b){function d(f){f=c.event.fix(f);f.type=b;return c.event.handle.call(this,f)}c.event.special[b]={setup:function(){this.addEventListener(a, -d,true)},teardown:function(){this.removeEventListener(a,d,true)}}});c.each(["bind","one"],function(a,b){c.fn[b]=function(d,f,e){if(typeof d==="object"){for(var j in d)this[b](j,f,d[j],e);return this}if(c.isFunction(f)){e=f;f=w}var i=b==="one"?c.proxy(e,function(k){c(this).unbind(k,i);return e.apply(this,arguments)}):e;if(d==="unload"&&b!=="one")this.one(d,f,e);else{j=0;for(var o=this.length;j0){y=t;break}}t=t[g]}m[q]=y}}}var f=/((?:\((?:\([^()]+\)|[^()]+)+\)|\[(?:\[[^[\]]*\]|['"][^'"]*['"]|[^[\]'"]+)+\]|\\.|[^ >+~,(\[\\]+)+|[>+~])(\s*,\s*)?((?:.|\r|\n)*)/g, -e=0,j=Object.prototype.toString,i=false,o=true;[0,0].sort(function(){o=false;return 0});var k=function(g,h,l,m){l=l||[];var q=h=h||s;if(h.nodeType!==1&&h.nodeType!==9)return[];if(!g||typeof g!=="string")return l;for(var p=[],v,t,y,S,H=true,M=x(h),I=g;(f.exec(""),v=f.exec(I))!==null;){I=v[3];p.push(v[1]);if(v[2]){S=v[3];break}}if(p.length>1&&r.exec(g))if(p.length===2&&n.relative[p[0]])t=ga(p[0]+p[1],h);else for(t=n.relative[p[0]]?[h]:k(p.shift(),h);p.length;){g=p.shift();if(n.relative[g])g+=p.shift(); -t=ga(g,t)}else{if(!m&&p.length>1&&h.nodeType===9&&!M&&n.match.ID.test(p[0])&&!n.match.ID.test(p[p.length-1])){v=k.find(p.shift(),h,M);h=v.expr?k.filter(v.expr,v.set)[0]:v.set[0]}if(h){v=m?{expr:p.pop(),set:z(m)}:k.find(p.pop(),p.length===1&&(p[0]==="~"||p[0]==="+")&&h.parentNode?h.parentNode:h,M);t=v.expr?k.filter(v.expr,v.set):v.set;if(p.length>0)y=z(t);else H=false;for(;p.length;){var D=p.pop();v=D;if(n.relative[D])v=p.pop();else D="";if(v==null)v=h;n.relative[D](y,v,M)}}else y=[]}y||(y=t);y||k.error(D|| -g);if(j.call(y)==="[object 
Array]")if(H)if(h&&h.nodeType===1)for(g=0;y[g]!=null;g++){if(y[g]&&(y[g]===true||y[g].nodeType===1&&E(h,y[g])))l.push(t[g])}else for(g=0;y[g]!=null;g++)y[g]&&y[g].nodeType===1&&l.push(t[g]);else l.push.apply(l,y);else z(y,l);if(S){k(S,q,l,m);k.uniqueSort(l)}return l};k.uniqueSort=function(g){if(B){i=o;g.sort(B);if(i)for(var h=1;h":function(g,h){var l=typeof h==="string";if(l&&!/\W/.test(h)){h=h.toLowerCase();for(var m=0,q=g.length;m=0))l||m.push(v);else if(l)h[p]=false;return false},ID:function(g){return g[1].replace(/\\/g,"")},TAG:function(g){return g[1].toLowerCase()}, -CHILD:function(g){if(g[1]==="nth"){var h=/(-?)(\d*)n((?:\+|-)?\d*)/.exec(g[2]==="even"&&"2n"||g[2]==="odd"&&"2n+1"||!/\D/.test(g[2])&&"0n+"+g[2]||g[2]);g[2]=h[1]+(h[2]||1)-0;g[3]=h[3]-0}g[0]=e++;return g},ATTR:function(g,h,l,m,q,p){h=g[1].replace(/\\/g,"");if(!p&&n.attrMap[h])g[1]=n.attrMap[h];if(g[2]==="~=")g[4]=" "+g[4]+" ";return g},PSEUDO:function(g,h,l,m,q){if(g[1]==="not")if((f.exec(g[3])||"").length>1||/^\w/.test(g[3]))g[3]=k(g[3],null,null,h);else{g=k.filter(g[3],h,l,true^q);l||m.push.apply(m, -g);return false}else if(n.match.POS.test(g[0])||n.match.CHILD.test(g[0]))return true;return g},POS:function(g){g.unshift(true);return g}},filters:{enabled:function(g){return g.disabled===false&&g.type!=="hidden"},disabled:function(g){return g.disabled===true},checked:function(g){return g.checked===true},selected:function(g){return g.selected===true},parent:function(g){return!!g.firstChild},empty:function(g){return!g.firstChild},has:function(g,h,l){return!!k(l[3],g).length},header:function(g){return/h\d/i.test(g.nodeName)}, 
-text:function(g){return"text"===g.type},radio:function(g){return"radio"===g.type},checkbox:function(g){return"checkbox"===g.type},file:function(g){return"file"===g.type},password:function(g){return"password"===g.type},submit:function(g){return"submit"===g.type},image:function(g){return"image"===g.type},reset:function(g){return"reset"===g.type},button:function(g){return"button"===g.type||g.nodeName.toLowerCase()==="button"},input:function(g){return/input|select|textarea|button/i.test(g.nodeName)}}, -setFilters:{first:function(g,h){return h===0},last:function(g,h,l,m){return h===m.length-1},even:function(g,h){return h%2===0},odd:function(g,h){return h%2===1},lt:function(g,h,l){return hl[3]-0},nth:function(g,h,l){return l[3]-0===h},eq:function(g,h,l){return l[3]-0===h}},filter:{PSEUDO:function(g,h,l,m){var q=h[1],p=n.filters[q];if(p)return p(g,l,h,m);else if(q==="contains")return(g.textContent||g.innerText||a([g])||"").indexOf(h[3])>=0;else if(q==="not"){h= -h[3];l=0;for(m=h.length;l=0}},ID:function(g,h){return g.nodeType===1&&g.getAttribute("id")===h},TAG:function(g,h){return h==="*"&&g.nodeType===1||g.nodeName.toLowerCase()===h},CLASS:function(g,h){return(" "+(g.className||g.getAttribute("class"))+" ").indexOf(h)>-1},ATTR:function(g,h){var l=h[1];g=n.attrHandle[l]?n.attrHandle[l](g):g[l]!=null?g[l]:g.getAttribute(l);l=g+"";var m=h[2];h=h[4];return g==null?m==="!=":m=== -"="?l===h:m==="*="?l.indexOf(h)>=0:m==="~="?(" "+l+" ").indexOf(h)>=0:!h?l&&g!==false:m==="!="?l!==h:m==="^="?l.indexOf(h)===0:m==="$="?l.substr(l.length-h.length)===h:m==="|="?l===h||l.substr(0,h.length+1)===h+"-":false},POS:function(g,h,l,m){var q=n.setFilters[h[2]];if(q)return q(g,l,h,m)}}},r=n.match.POS;for(var u in n.match){n.match[u]=new RegExp(n.match[u].source+/(?![^\[]*\])(?![^\(]*\))/.source);n.leftMatch[u]=new RegExp(/(^(?:.|\r|\n)*?)/.source+n.match[u].source.replace(/\\(\d+)/g,function(g, -h){return"\\"+(h-0+1)}))}var 
z=function(g,h){g=Array.prototype.slice.call(g,0);if(h){h.push.apply(h,g);return h}return g};try{Array.prototype.slice.call(s.documentElement.childNodes,0)}catch(C){z=function(g,h){h=h||[];if(j.call(g)==="[object Array]")Array.prototype.push.apply(h,g);else if(typeof g.length==="number")for(var l=0,m=g.length;l";var l=s.documentElement;l.insertBefore(g,l.firstChild);if(s.getElementById(h)){n.find.ID=function(m,q,p){if(typeof q.getElementById!=="undefined"&&!p)return(q=q.getElementById(m[1]))?q.id===m[1]||typeof q.getAttributeNode!=="undefined"&& -q.getAttributeNode("id").nodeValue===m[1]?[q]:w:[]};n.filter.ID=function(m,q){var p=typeof m.getAttributeNode!=="undefined"&&m.getAttributeNode("id");return m.nodeType===1&&p&&p.nodeValue===q}}l.removeChild(g);l=g=null})();(function(){var g=s.createElement("div");g.appendChild(s.createComment(""));if(g.getElementsByTagName("*").length>0)n.find.TAG=function(h,l){l=l.getElementsByTagName(h[1]);if(h[1]==="*"){h=[];for(var m=0;l[m];m++)l[m].nodeType===1&&h.push(l[m]);l=h}return l};g.innerHTML=""; -if(g.firstChild&&typeof g.firstChild.getAttribute!=="undefined"&&g.firstChild.getAttribute("href")!=="#")n.attrHandle.href=function(h){return h.getAttribute("href",2)};g=null})();s.querySelectorAll&&function(){var g=k,h=s.createElement("div");h.innerHTML="

";if(!(h.querySelectorAll&&h.querySelectorAll(".TEST").length===0)){k=function(m,q,p,v){q=q||s;if(!v&&q.nodeType===9&&!x(q))try{return z(q.querySelectorAll(m),p)}catch(t){}return g(m,q,p,v)};for(var l in g)k[l]=g[l];h=null}}(); -(function(){var g=s.createElement("div");g.innerHTML="
";if(!(!g.getElementsByClassName||g.getElementsByClassName("e").length===0)){g.lastChild.className="e";if(g.getElementsByClassName("e").length!==1){n.order.splice(1,0,"CLASS");n.find.CLASS=function(h,l,m){if(typeof l.getElementsByClassName!=="undefined"&&!m)return l.getElementsByClassName(h[1])};g=null}}})();var E=s.compareDocumentPosition?function(g,h){return!!(g.compareDocumentPosition(h)&16)}: -function(g,h){return g!==h&&(g.contains?g.contains(h):true)},x=function(g){return(g=(g?g.ownerDocument||g:0).documentElement)?g.nodeName!=="HTML":false},ga=function(g,h){var l=[],m="",q;for(h=h.nodeType?[h]:h;q=n.match.PSEUDO.exec(g);){m+=q[0];g=g.replace(n.match.PSEUDO,"")}g=n.relative[g]?g+"*":g;q=0;for(var p=h.length;q=0===d})};c.fn.extend({find:function(a){for(var b=this.pushStack("","find",a),d=0,f=0,e=this.length;f0)for(var j=d;j0},closest:function(a,b){if(c.isArray(a)){var d=[],f=this[0],e,j= -{},i;if(f&&a.length){e=0;for(var o=a.length;e-1:c(f).is(e)){d.push({selector:i,elem:f});delete j[i]}}f=f.parentNode}}return d}var k=c.expr.match.POS.test(a)?c(a,b||this.context):null;return this.map(function(n,r){for(;r&&r.ownerDocument&&r!==b;){if(k?k.index(r)>-1:c(r).is(a))return r;r=r.parentNode}return null})},index:function(a){if(!a||typeof a=== -"string")return c.inArray(this[0],a?c(a):this.parent().children());return c.inArray(a.jquery?a[0]:a,this)},add:function(a,b){a=typeof a==="string"?c(a,b||this.context):c.makeArray(a);b=c.merge(this.get(),a);return this.pushStack(qa(a[0])||qa(b[0])?b:c.unique(b))},andSelf:function(){return this.add(this.prevObject)}});c.each({parent:function(a){return(a=a.parentNode)&&a.nodeType!==11?a:null},parents:function(a){return c.dir(a,"parentNode")},parentsUntil:function(a,b,d){return c.dir(a,"parentNode", -d)},next:function(a){return c.nth(a,2,"nextSibling")},prev:function(a){return c.nth(a,2,"previousSibling")},nextAll:function(a){return c.dir(a,"nextSibling")},prevAll:function(a){return 
c.dir(a,"previousSibling")},nextUntil:function(a,b,d){return c.dir(a,"nextSibling",d)},prevUntil:function(a,b,d){return c.dir(a,"previousSibling",d)},siblings:function(a){return c.sibling(a.parentNode.firstChild,a)},children:function(a){return c.sibling(a.firstChild)},contents:function(a){return c.nodeName(a,"iframe")? -a.contentDocument||a.contentWindow.document:c.makeArray(a.childNodes)}},function(a,b){c.fn[a]=function(d,f){var e=c.map(this,b,d);eb.test(a)||(f=d);if(f&&typeof f==="string")e=c.filter(f,e);e=this.length>1?c.unique(e):e;if((this.length>1||gb.test(f))&&fb.test(a))e=e.reverse();return this.pushStack(e,a,R.call(arguments).join(","))}});c.extend({filter:function(a,b,d){if(d)a=":not("+a+")";return c.find.matches(a,b)},dir:function(a,b,d){var f=[];for(a=a[b];a&&a.nodeType!==9&&(d===w||a.nodeType!==1||!c(a).is(d));){a.nodeType=== -1&&f.push(a);a=a[b]}return f},nth:function(a,b,d){b=b||1;for(var f=0;a;a=a[d])if(a.nodeType===1&&++f===b)break;return a},sibling:function(a,b){for(var d=[];a;a=a.nextSibling)a.nodeType===1&&a!==b&&d.push(a);return d}});var Ja=/ jQuery\d+="(?:\d+|null)"/g,V=/^\s+/,Ka=/(<([\w:]+)[^>]*?)\/>/g,hb=/^(?:area|br|col|embed|hr|img|input|link|meta|param)$/i,La=/<([\w:]+)/,ib=/"},F={option:[1,""],legend:[1,"
","
"],thead:[1,"","
"],tr:[2,"","
"],td:[3,"","
"],col:[2,"","
"],area:[1,"",""],_default:[0,"",""]};F.optgroup=F.option;F.tbody=F.tfoot=F.colgroup=F.caption=F.thead;F.th=F.td;if(!c.support.htmlSerialize)F._default=[1,"div
","
"];c.fn.extend({text:function(a){if(c.isFunction(a))return this.each(function(b){var d= -c(this);d.text(a.call(this,b,d.text()))});if(typeof a!=="object"&&a!==w)return this.empty().append((this[0]&&this[0].ownerDocument||s).createTextNode(a));return c.text(this)},wrapAll:function(a){if(c.isFunction(a))return this.each(function(d){c(this).wrapAll(a.call(this,d))});if(this[0]){var b=c(a,this[0].ownerDocument).eq(0).clone(true);this[0].parentNode&&b.insertBefore(this[0]);b.map(function(){for(var d=this;d.firstChild&&d.firstChild.nodeType===1;)d=d.firstChild;return d}).append(this)}return this}, -wrapInner:function(a){if(c.isFunction(a))return this.each(function(b){c(this).wrapInner(a.call(this,b))});return this.each(function(){var b=c(this),d=b.contents();d.length?d.wrapAll(a):b.append(a)})},wrap:function(a){return this.each(function(){c(this).wrapAll(a)})},unwrap:function(){return this.parent().each(function(){c.nodeName(this,"body")||c(this).replaceWith(this.childNodes)}).end()},append:function(){return this.domManip(arguments,true,function(a){this.nodeType===1&&this.appendChild(a)})}, -prepend:function(){return this.domManip(arguments,true,function(a){this.nodeType===1&&this.insertBefore(a,this.firstChild)})},before:function(){if(this[0]&&this[0].parentNode)return this.domManip(arguments,false,function(b){this.parentNode.insertBefore(b,this)});else if(arguments.length){var a=c(arguments[0]);a.push.apply(a,this.toArray());return this.pushStack(a,"before",arguments)}},after:function(){if(this[0]&&this[0].parentNode)return this.domManip(arguments,false,function(b){this.parentNode.insertBefore(b, -this.nextSibling)});else if(arguments.length){var a=this.pushStack(this,"after",arguments);a.push.apply(a,c(arguments[0]).toArray());return a}},remove:function(a,b){for(var d=0,f;(f=this[d])!=null;d++)if(!a||c.filter(a,[f]).length){if(!b&&f.nodeType===1){c.cleanData(f.getElementsByTagName("*"));c.cleanData([f])}f.parentNode&&f.parentNode.removeChild(f)}return 
this},empty:function(){for(var a=0,b;(b=this[a])!=null;a++)for(b.nodeType===1&&c.cleanData(b.getElementsByTagName("*"));b.firstChild;)b.removeChild(b.firstChild); -return this},clone:function(a){var b=this.map(function(){if(!c.support.noCloneEvent&&!c.isXMLDoc(this)){var d=this.outerHTML,f=this.ownerDocument;if(!d){d=f.createElement("div");d.appendChild(this.cloneNode(true));d=d.innerHTML}return c.clean([d.replace(Ja,"").replace(/=([^="'>\s]+\/)>/g,'="$1">').replace(V,"")],f)[0]}else return this.cloneNode(true)});if(a===true){ra(this,b);ra(this.find("*"),b.find("*"))}return b},html:function(a){if(a===w)return this[0]&&this[0].nodeType===1?this[0].innerHTML.replace(Ja, -""):null;else if(typeof a==="string"&&!ta.test(a)&&(c.support.leadingWhitespace||!V.test(a))&&!F[(La.exec(a)||["",""])[1].toLowerCase()]){a=a.replace(Ka,Ma);try{for(var b=0,d=this.length;b0||e.cacheable||this.length>1?k.cloneNode(true):k)}o.length&&c.each(o,Qa)}return this}});c.fragments={};c.each({appendTo:"append",prependTo:"prepend",insertBefore:"before",insertAfter:"after",replaceAll:"replaceWith"},function(a,b){c.fn[a]=function(d){var f=[];d=c(d);var e=this.length===1&&this[0].parentNode;if(e&&e.nodeType===11&&e.childNodes.length===1&&d.length===1){d[b](this[0]); -return this}else{e=0;for(var j=d.length;e0?this.clone(true):this).get();c.fn[b].apply(c(d[e]),i);f=f.concat(i)}return this.pushStack(f,a,d.selector)}}});c.extend({clean:function(a,b,d,f){b=b||s;if(typeof b.createElement==="undefined")b=b.ownerDocument||b[0]&&b[0].ownerDocument||s;for(var e=[],j=0,i;(i=a[j])!=null;j++){if(typeof i==="number")i+="";if(i){if(typeof i==="string"&&!jb.test(i))i=b.createTextNode(i);else if(typeof i==="string"){i=i.replace(Ka,Ma);var o=(La.exec(i)||["", 
-""])[1].toLowerCase(),k=F[o]||F._default,n=k[0],r=b.createElement("div");for(r.innerHTML=k[1]+i+k[2];n--;)r=r.lastChild;if(!c.support.tbody){n=ib.test(i);o=o==="table"&&!n?r.firstChild&&r.firstChild.childNodes:k[1]===""&&!n?r.childNodes:[];for(k=o.length-1;k>=0;--k)c.nodeName(o[k],"tbody")&&!o[k].childNodes.length&&o[k].parentNode.removeChild(o[k])}!c.support.leadingWhitespace&&V.test(i)&&r.insertBefore(b.createTextNode(V.exec(i)[0]),r.firstChild);i=r.childNodes}if(i.nodeType)e.push(i);else e= -c.merge(e,i)}}if(d)for(j=0;e[j];j++)if(f&&c.nodeName(e[j],"script")&&(!e[j].type||e[j].type.toLowerCase()==="text/javascript"))f.push(e[j].parentNode?e[j].parentNode.removeChild(e[j]):e[j]);else{e[j].nodeType===1&&e.splice.apply(e,[j+1,0].concat(c.makeArray(e[j].getElementsByTagName("script"))));d.appendChild(e[j])}return e},cleanData:function(a){for(var b,d,f=c.cache,e=c.event.special,j=c.support.deleteExpando,i=0,o;(o=a[i])!=null;i++)if(d=o[c.expando]){b=f[d];if(b.events)for(var k in b.events)e[k]? 
-c.event.remove(o,k):Ca(o,k,b.handle);if(j)delete o[c.expando];else o.removeAttribute&&o.removeAttribute(c.expando);delete f[d]}}});var kb=/z-?index|font-?weight|opacity|zoom|line-?height/i,Na=/alpha\([^)]*\)/,Oa=/opacity=([^)]*)/,ha=/float/i,ia=/-([a-z])/ig,lb=/([A-Z])/g,mb=/^-?\d+(?:px)?$/i,nb=/^-?\d/,ob={position:"absolute",visibility:"hidden",display:"block"},pb=["Left","Right"],qb=["Top","Bottom"],rb=s.defaultView&&s.defaultView.getComputedStyle,Pa=c.support.cssFloat?"cssFloat":"styleFloat",ja= -function(a,b){return b.toUpperCase()};c.fn.css=function(a,b){return X(this,a,b,true,function(d,f,e){if(e===w)return c.curCSS(d,f);if(typeof e==="number"&&!kb.test(f))e+="px";c.style(d,f,e)})};c.extend({style:function(a,b,d){if(!a||a.nodeType===3||a.nodeType===8)return w;if((b==="width"||b==="height")&&parseFloat(d)<0)d=w;var f=a.style||a,e=d!==w;if(!c.support.opacity&&b==="opacity"){if(e){f.zoom=1;b=parseInt(d,10)+""==="NaN"?"":"alpha(opacity="+d*100+")";a=f.filter||c.curCSS(a,"filter")||"";f.filter= -Na.test(a)?a.replace(Na,b):b}return f.filter&&f.filter.indexOf("opacity=")>=0?parseFloat(Oa.exec(f.filter)[1])/100+"":""}if(ha.test(b))b=Pa;b=b.replace(ia,ja);if(e)f[b]=d;return f[b]},css:function(a,b,d,f){if(b==="width"||b==="height"){var e,j=b==="width"?pb:qb;function i(){e=b==="width"?a.offsetWidth:a.offsetHeight;f!=="border"&&c.each(j,function(){f||(e-=parseFloat(c.curCSS(a,"padding"+this,true))||0);if(f==="margin")e+=parseFloat(c.curCSS(a,"margin"+this,true))||0;else e-=parseFloat(c.curCSS(a, -"border"+this+"Width",true))||0})}a.offsetWidth!==0?i():c.swap(a,ob,i);return Math.max(0,Math.round(e))}return c.curCSS(a,b,d)},curCSS:function(a,b,d){var f,e=a.style;if(!c.support.opacity&&b==="opacity"&&a.currentStyle){f=Oa.test(a.currentStyle.filter||"")?parseFloat(RegExp.$1)/100+"":"";return f===""?"1":f}if(ha.test(b))b=Pa;if(!d&&e&&e[b])f=e[b];else if(rb){if(ha.test(b))b="float";b=b.replace(lb,"-$1").toLowerCase();e=a.ownerDocument.defaultView;if(!e)return 
null;if(a=e.getComputedStyle(a,null))f= -a.getPropertyValue(b);if(b==="opacity"&&f==="")f="1"}else if(a.currentStyle){d=b.replace(ia,ja);f=a.currentStyle[b]||a.currentStyle[d];if(!mb.test(f)&&nb.test(f)){b=e.left;var j=a.runtimeStyle.left;a.runtimeStyle.left=a.currentStyle.left;e.left=d==="fontSize"?"1em":f||0;f=e.pixelLeft+"px";e.left=b;a.runtimeStyle.left=j}}return f},swap:function(a,b,d){var f={};for(var e in b){f[e]=a.style[e];a.style[e]=b[e]}d.call(a);for(e in b)a.style[e]=f[e]}});if(c.expr&&c.expr.filters){c.expr.filters.hidden=function(a){var b= -a.offsetWidth,d=a.offsetHeight,f=a.nodeName.toLowerCase()==="tr";return b===0&&d===0&&!f?true:b>0&&d>0&&!f?false:c.curCSS(a,"display")==="none"};c.expr.filters.visible=function(a){return!c.expr.filters.hidden(a)}}var sb=J(),tb=//gi,ub=/select|textarea/i,vb=/color|date|datetime|email|hidden|month|number|password|range|search|tel|text|time|url|week/i,N=/=\?(&|$)/,ka=/\?/,wb=/(\?|&)_=.*?(&|$)/,xb=/^(\w+:)?\/\/([^\/?#]+)/,yb=/%20/g,zb=c.fn.load;c.fn.extend({load:function(a,b,d){if(typeof a!== -"string")return zb.call(this,a);else if(!this.length)return this;var f=a.indexOf(" ");if(f>=0){var e=a.slice(f,a.length);a=a.slice(0,f)}f="GET";if(b)if(c.isFunction(b)){d=b;b=null}else if(typeof b==="object"){b=c.param(b,c.ajaxSettings.traditional);f="POST"}var j=this;c.ajax({url:a,type:f,dataType:"html",data:b,complete:function(i,o){if(o==="success"||o==="notmodified")j.html(e?c("
").append(i.responseText.replace(tb,"")).find(e):i.responseText);d&&j.each(d,[i.responseText,o,i])}});return this}, -serialize:function(){return c.param(this.serializeArray())},serializeArray:function(){return this.map(function(){return this.elements?c.makeArray(this.elements):this}).filter(function(){return this.name&&!this.disabled&&(this.checked||ub.test(this.nodeName)||vb.test(this.type))}).map(function(a,b){a=c(this).val();return a==null?null:c.isArray(a)?c.map(a,function(d){return{name:b.name,value:d}}):{name:b.name,value:a}}).get()}});c.each("ajaxStart ajaxStop ajaxComplete ajaxError ajaxSuccess ajaxSend".split(" "), -function(a,b){c.fn[b]=function(d){return this.bind(b,d)}});c.extend({get:function(a,b,d,f){if(c.isFunction(b)){f=f||d;d=b;b=null}return c.ajax({type:"GET",url:a,data:b,success:d,dataType:f})},getScript:function(a,b){return c.get(a,null,b,"script")},getJSON:function(a,b,d){return c.get(a,b,d,"json")},post:function(a,b,d,f){if(c.isFunction(b)){f=f||d;d=b;b={}}return c.ajax({type:"POST",url:a,data:b,success:d,dataType:f})},ajaxSetup:function(a){c.extend(c.ajaxSettings,a)},ajaxSettings:{url:location.href, -global:true,type:"GET",contentType:"application/x-www-form-urlencoded",processData:true,async:true,xhr:A.XMLHttpRequest&&(A.location.protocol!=="file:"||!A.ActiveXObject)?function(){return new A.XMLHttpRequest}:function(){try{return new A.ActiveXObject("Microsoft.XMLHTTP")}catch(a){}},accepts:{xml:"application/xml, text/xml",html:"text/html",script:"text/javascript, application/javascript",json:"application/json, text/javascript",text:"text/plain",_default:"*/*"}},lastModified:{},etag:{},ajax:function(a){function b(){e.success&& -e.success.call(k,o,i,x);e.global&&f("ajaxSuccess",[x,e])}function d(){e.complete&&e.complete.call(k,x,i);e.global&&f("ajaxComplete",[x,e]);e.global&&!--c.active&&c.event.trigger("ajaxStop")}function f(q,p){(e.context?c(e.context):c.event).trigger(q,p)}var 
e=c.extend(true,{},c.ajaxSettings,a),j,i,o,k=a&&a.context||e,n=e.type.toUpperCase();if(e.data&&e.processData&&typeof e.data!=="string")e.data=c.param(e.data,e.traditional);if(e.dataType==="jsonp"){if(n==="GET")N.test(e.url)||(e.url+=(ka.test(e.url)? -"&":"?")+(e.jsonp||"callback")+"=?");else if(!e.data||!N.test(e.data))e.data=(e.data?e.data+"&":"")+(e.jsonp||"callback")+"=?";e.dataType="json"}if(e.dataType==="json"&&(e.data&&N.test(e.data)||N.test(e.url))){j=e.jsonpCallback||"jsonp"+sb++;if(e.data)e.data=(e.data+"").replace(N,"="+j+"$1");e.url=e.url.replace(N,"="+j+"$1");e.dataType="script";A[j]=A[j]||function(q){o=q;b();d();A[j]=w;try{delete A[j]}catch(p){}z&&z.removeChild(C)}}if(e.dataType==="script"&&e.cache===null)e.cache=false;if(e.cache=== -false&&n==="GET"){var r=J(),u=e.url.replace(wb,"$1_="+r+"$2");e.url=u+(u===e.url?(ka.test(e.url)?"&":"?")+"_="+r:"")}if(e.data&&n==="GET")e.url+=(ka.test(e.url)?"&":"?")+e.data;e.global&&!c.active++&&c.event.trigger("ajaxStart");r=(r=xb.exec(e.url))&&(r[1]&&r[1]!==location.protocol||r[2]!==location.host);if(e.dataType==="script"&&n==="GET"&&r){var z=s.getElementsByTagName("head")[0]||s.documentElement,C=s.createElement("script");C.src=e.url;if(e.scriptCharset)C.charset=e.scriptCharset;if(!j){var B= -false;C.onload=C.onreadystatechange=function(){if(!B&&(!this.readyState||this.readyState==="loaded"||this.readyState==="complete")){B=true;b();d();C.onload=C.onreadystatechange=null;z&&C.parentNode&&z.removeChild(C)}}}z.insertBefore(C,z.firstChild);return w}var E=false,x=e.xhr();if(x){e.username?x.open(n,e.url,e.async,e.username,e.password):x.open(n,e.url,e.async);try{if(e.data||a&&a.contentType)x.setRequestHeader("Content-Type",e.contentType);if(e.ifModified){c.lastModified[e.url]&&x.setRequestHeader("If-Modified-Since", 
-c.lastModified[e.url]);c.etag[e.url]&&x.setRequestHeader("If-None-Match",c.etag[e.url])}r||x.setRequestHeader("X-Requested-With","XMLHttpRequest");x.setRequestHeader("Accept",e.dataType&&e.accepts[e.dataType]?e.accepts[e.dataType]+", */*":e.accepts._default)}catch(ga){}if(e.beforeSend&&e.beforeSend.call(k,x,e)===false){e.global&&!--c.active&&c.event.trigger("ajaxStop");x.abort();return false}e.global&&f("ajaxSend",[x,e]);var g=x.onreadystatechange=function(q){if(!x||x.readyState===0||q==="abort"){E|| -d();E=true;if(x)x.onreadystatechange=c.noop}else if(!E&&x&&(x.readyState===4||q==="timeout")){E=true;x.onreadystatechange=c.noop;i=q==="timeout"?"timeout":!c.httpSuccess(x)?"error":e.ifModified&&c.httpNotModified(x,e.url)?"notmodified":"success";var p;if(i==="success")try{o=c.httpData(x,e.dataType,e)}catch(v){i="parsererror";p=v}if(i==="success"||i==="notmodified")j||b();else c.handleError(e,x,i,p);d();q==="timeout"&&x.abort();if(e.async)x=null}};try{var h=x.abort;x.abort=function(){x&&h.call(x); -g("abort")}}catch(l){}e.async&&e.timeout>0&&setTimeout(function(){x&&!E&&g("timeout")},e.timeout);try{x.send(n==="POST"||n==="PUT"||n==="DELETE"?e.data:null)}catch(m){c.handleError(e,x,null,m);d()}e.async||g();return x}},handleError:function(a,b,d,f){if(a.error)a.error.call(a.context||a,b,d,f);if(a.global)(a.context?c(a.context):c.event).trigger("ajaxError",[b,a,f])},active:0,httpSuccess:function(a){try{return!a.status&&location.protocol==="file:"||a.status>=200&&a.status<300||a.status===304||a.status=== -1223||a.status===0}catch(b){}return false},httpNotModified:function(a,b){var d=a.getResponseHeader("Last-Modified"),f=a.getResponseHeader("Etag");if(d)c.lastModified[b]=d;if(f)c.etag[b]=f;return a.status===304||a.status===0},httpData:function(a,b,d){var 
f=a.getResponseHeader("content-type")||"",e=b==="xml"||!b&&f.indexOf("xml")>=0;a=e?a.responseXML:a.responseText;e&&a.documentElement.nodeName==="parsererror"&&c.error("parsererror");if(d&&d.dataFilter)a=d.dataFilter(a,b);if(typeof a==="string")if(b=== -"json"||!b&&f.indexOf("json")>=0)a=c.parseJSON(a);else if(b==="script"||!b&&f.indexOf("javascript")>=0)c.globalEval(a);return a},param:function(a,b){function d(i,o){if(c.isArray(o))c.each(o,function(k,n){b||/\[\]$/.test(i)?f(i,n):d(i+"["+(typeof n==="object"||c.isArray(n)?k:"")+"]",n)});else!b&&o!=null&&typeof o==="object"?c.each(o,function(k,n){d(i+"["+k+"]",n)}):f(i,o)}function f(i,o){o=c.isFunction(o)?o():o;e[e.length]=encodeURIComponent(i)+"="+encodeURIComponent(o)}var e=[];if(b===w)b=c.ajaxSettings.traditional; -if(c.isArray(a)||a.jquery)c.each(a,function(){f(this.name,this.value)});else for(var j in a)d(j,a[j]);return e.join("&").replace(yb,"+")}});var la={},Ab=/toggle|show|hide/,Bb=/^([+-]=)?([\d+-.]+)(.*)$/,W,va=[["height","marginTop","marginBottom","paddingTop","paddingBottom"],["width","marginLeft","marginRight","paddingLeft","paddingRight"],["opacity"]];c.fn.extend({show:function(a,b){if(a||a===0)return this.animate(K("show",3),a,b);else{a=0;for(b=this.length;a").appendTo("body");f=e.css("display");if(f==="none")f="block";e.remove();la[d]=f}c.data(this[a],"olddisplay",f)}}a=0;for(b=this.length;a=0;f--)if(d[f].elem===this){b&&d[f](true);d.splice(f,1)}});b||this.dequeue();return this}});c.each({slideDown:K("show",1),slideUp:K("hide",1),slideToggle:K("toggle",1),fadeIn:{opacity:"show"},fadeOut:{opacity:"hide"}},function(a,b){c.fn[a]=function(d,f){return this.animate(b,d,f)}});c.extend({speed:function(a,b,d){var f=a&&typeof a==="object"?a:{complete:d||!d&&b||c.isFunction(a)&&a,duration:a,easing:d&&b||b&&!c.isFunction(b)&&b};f.duration=c.fx.off?0:typeof f.duration=== 
-"number"?f.duration:c.fx.speeds[f.duration]||c.fx.speeds._default;f.old=f.complete;f.complete=function(){f.queue!==false&&c(this).dequeue();c.isFunction(f.old)&&f.old.call(this)};return f},easing:{linear:function(a,b,d,f){return d+f*a},swing:function(a,b,d,f){return(-Math.cos(a*Math.PI)/2+0.5)*f+d}},timers:[],fx:function(a,b,d){this.options=b;this.elem=a;this.prop=d;if(!b.orig)b.orig={}}});c.fx.prototype={update:function(){this.options.step&&this.options.step.call(this.elem,this.now,this);(c.fx.step[this.prop]|| -c.fx.step._default)(this);if((this.prop==="height"||this.prop==="width")&&this.elem.style)this.elem.style.display="block"},cur:function(a){if(this.elem[this.prop]!=null&&(!this.elem.style||this.elem.style[this.prop]==null))return this.elem[this.prop];return(a=parseFloat(c.css(this.elem,this.prop,a)))&&a>-10000?a:parseFloat(c.curCSS(this.elem,this.prop))||0},custom:function(a,b,d){function f(j){return e.step(j)}this.startTime=J();this.start=a;this.end=b;this.unit=d||this.unit||"px";this.now=this.start; -this.pos=this.state=0;var e=this;f.elem=this.elem;if(f()&&c.timers.push(f)&&!W)W=setInterval(c.fx.tick,13)},show:function(){this.options.orig[this.prop]=c.style(this.elem,this.prop);this.options.show=true;this.custom(this.prop==="width"||this.prop==="height"?1:0,this.cur());c(this.elem).show()},hide:function(){this.options.orig[this.prop]=c.style(this.elem,this.prop);this.options.hide=true;this.custom(this.cur(),0)},step:function(a){var b=J(),d=true;if(a||b>=this.options.duration+this.startTime){this.now= -this.end;this.pos=this.state=1;this.update();this.options.curAnim[this.prop]=true;for(var f in 
this.options.curAnim)if(this.options.curAnim[f]!==true)d=false;if(d){if(this.options.display!=null){this.elem.style.overflow=this.options.overflow;a=c.data(this.elem,"olddisplay");this.elem.style.display=a?a:this.options.display;if(c.css(this.elem,"display")==="none")this.elem.style.display="block"}this.options.hide&&c(this.elem).hide();if(this.options.hide||this.options.show)for(var e in this.options.curAnim)c.style(this.elem, -e,this.options.orig[e]);this.options.complete.call(this.elem)}return false}else{e=b-this.startTime;this.state=e/this.options.duration;a=this.options.easing||(c.easing.swing?"swing":"linear");this.pos=c.easing[this.options.specialEasing&&this.options.specialEasing[this.prop]||a](this.state,e,0,1,this.options.duration);this.now=this.start+(this.end-this.start)*this.pos;this.update()}return true}};c.extend(c.fx,{tick:function(){for(var a=c.timers,b=0;b
"; -a.insertBefore(b,a.firstChild);d=b.firstChild;f=d.firstChild;e=d.nextSibling.firstChild.firstChild;this.doesNotAddBorder=f.offsetTop!==5;this.doesAddBorderForTableAndCells=e.offsetTop===5;f.style.position="fixed";f.style.top="20px";this.supportsFixedPosition=f.offsetTop===20||f.offsetTop===15;f.style.position=f.style.top="";d.style.overflow="hidden";d.style.position="relative";this.subtractsBorderForOverflowNotVisible=f.offsetTop===-5;this.doesNotIncludeMarginInBodyOffset=a.offsetTop!==j;a.removeChild(b); -c.offset.initialize=c.noop},bodyOffset:function(a){var b=a.offsetTop,d=a.offsetLeft;c.offset.initialize();if(c.offset.doesNotIncludeMarginInBodyOffset){b+=parseFloat(c.curCSS(a,"marginTop",true))||0;d+=parseFloat(c.curCSS(a,"marginLeft",true))||0}return{top:b,left:d}},setOffset:function(a,b,d){if(/static/.test(c.curCSS(a,"position")))a.style.position="relative";var f=c(a),e=f.offset(),j=parseInt(c.curCSS(a,"top",true),10)||0,i=parseInt(c.curCSS(a,"left",true),10)||0;if(c.isFunction(b))b=b.call(a, -d,e);d={top:b.top-e.top+j,left:b.left-e.left+i};"using"in b?b.using.call(a,d):f.css(d)}};c.fn.extend({position:function(){if(!this[0])return null;var a=this[0],b=this.offsetParent(),d=this.offset(),f=/^body|html$/i.test(b[0].nodeName)?{top:0,left:0}:b.offset();d.top-=parseFloat(c.curCSS(a,"marginTop",true))||0;d.left-=parseFloat(c.curCSS(a,"marginLeft",true))||0;f.top+=parseFloat(c.curCSS(b[0],"borderTopWidth",true))||0;f.left+=parseFloat(c.curCSS(b[0],"borderLeftWidth",true))||0;return{top:d.top- -f.top,left:d.left-f.left}},offsetParent:function(){return this.map(function(){for(var a=this.offsetParent||s.body;a&&!/^body|html$/i.test(a.nodeName)&&c.css(a,"position")==="static";)a=a.offsetParent;return a})}});c.each(["Left","Top"],function(a,b){var d="scroll"+b;c.fn[d]=function(f){var e=this[0],j;if(!e)return null;if(f!==w)return this.each(function(){if(j=wa(this))j.scrollTo(!a?f:c(j).scrollLeft(),a?f:c(j).scrollTop());else this[d]=f});else 
return(j=wa(e))?"pageXOffset"in j?j[a?"pageYOffset": -"pageXOffset"]:c.support.boxModel&&j.document.documentElement[d]||j.document.body[d]:e[d]}});c.each(["Height","Width"],function(a,b){var d=b.toLowerCase();c.fn["inner"+b]=function(){return this[0]?c.css(this[0],d,false,"padding"):null};c.fn["outer"+b]=function(f){return this[0]?c.css(this[0],d,false,f?"margin":"border"):null};c.fn[d]=function(f){var e=this[0];if(!e)return f==null?null:this;if(c.isFunction(f))return this.each(function(j){var i=c(this);i[d](f.call(this,j,i[d]()))});return"scrollTo"in -e&&e.document?e.document.compatMode==="CSS1Compat"&&e.document.documentElement["client"+b]||e.document.body["client"+b]:e.nodeType===9?Math.max(e.documentElement["client"+b],e.body["scroll"+b],e.documentElement["scroll"+b],e.body["offset"+b],e.documentElement["offset"+b]):f===w?c.css(e,d):this.css(d,typeof f==="string"?f:f+"px")}});A.jQuery=A.$=c})(window); diff --git a/docs/_static/minus.png b/docs/_static/minus.png deleted file mode 100644 index da1c5620d1..0000000000 Binary files a/docs/_static/minus.png and /dev/null differ diff --git a/docs/_static/navigation.png b/docs/_static/navigation.png deleted file mode 100644 index 1081dc1439..0000000000 Binary files a/docs/_static/navigation.png and /dev/null differ diff --git a/docs/_static/plus.png b/docs/_static/plus.png deleted file mode 100644 index b3cb37425e..0000000000 Binary files a/docs/_static/plus.png and /dev/null differ diff --git a/docs/_static/pygments.css b/docs/_static/pygments.css deleted file mode 100644 index 1a14f2ae1a..0000000000 --- a/docs/_static/pygments.css +++ /dev/null @@ -1,62 +0,0 @@ -.highlight .hll { background-color: #ffffcc } -.highlight { background: #eeffcc; } -.highlight .c { color: #408090; font-style: italic } /* Comment */ -.highlight .err { border: 1px solid #FF0000 } /* Error */ -.highlight .k { color: #007020; font-weight: bold } /* Keyword */ -.highlight .o { color: #666666 } /* Operator */ -.highlight .cm { color: #408090; 
font-style: italic } /* Comment.Multiline */ -.highlight .cp { color: #007020 } /* Comment.Preproc */ -.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ -.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ -.highlight .gd { color: #A00000 } /* Generic.Deleted */ -.highlight .ge { font-style: italic } /* Generic.Emph */ -.highlight .gr { color: #FF0000 } /* Generic.Error */ -.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ -.highlight .gi { color: #00A000 } /* Generic.Inserted */ -.highlight .go { color: #303030 } /* Generic.Output */ -.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ -.highlight .gs { font-weight: bold } /* Generic.Strong */ -.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ -.highlight .gt { color: #0040D0 } /* Generic.Traceback */ -.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ -.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ -.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ -.highlight .kp { color: #007020 } /* Keyword.Pseudo */ -.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ -.highlight .kt { color: #902000 } /* Keyword.Type */ -.highlight .m { color: #208050 } /* Literal.Number */ -.highlight .s { color: #4070a0 } /* Literal.String */ -.highlight .na { color: #4070a0 } /* Name.Attribute */ -.highlight .nb { color: #007020 } /* Name.Builtin */ -.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ -.highlight .no { color: #60add5 } /* Name.Constant */ -.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ -.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ -.highlight .ne { color: #007020 } /* Name.Exception */ -.highlight .nf { color: #06287e } /* Name.Function */ -.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ 
-.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ -.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ -.highlight .nv { color: #bb60d5 } /* Name.Variable */ -.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ -.highlight .w { color: #bbbbbb } /* Text.Whitespace */ -.highlight .mf { color: #208050 } /* Literal.Number.Float */ -.highlight .mh { color: #208050 } /* Literal.Number.Hex */ -.highlight .mi { color: #208050 } /* Literal.Number.Integer */ -.highlight .mo { color: #208050 } /* Literal.Number.Oct */ -.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ -.highlight .sc { color: #4070a0 } /* Literal.String.Char */ -.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ -.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ -.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ -.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ -.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ -.highlight .sx { color: #c65d09 } /* Literal.String.Other */ -.highlight .sr { color: #235388 } /* Literal.String.Regex */ -.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ -.highlight .ss { color: #517918 } /* Literal.String.Symbol */ -.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ -.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ -.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ -.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ -.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/docs/_static/searchtools.js b/docs/_static/searchtools.js deleted file mode 100644 index 663be4c909..0000000000 --- a/docs/_static/searchtools.js +++ /dev/null @@ -1,560 +0,0 @@ -/* - * searchtools.js_t - * ~~~~~~~~~~~~~~~~ - * - * Sphinx JavaScript utilties for the full-text search. 
- * - * :copyright: Copyright 2007-2011 by the Sphinx team, see AUTHORS. - * :license: BSD, see LICENSE for details. - * - */ - -/** - * helper function to return a node containing the - * search summary for a given text. keywords is a list - * of stemmed words, hlwords is the list of normal, unstemmed - * words. the first one is used to find the occurance, the - * latter for highlighting it. - */ - -jQuery.makeSearchSummary = function(text, keywords, hlwords) { - var textLower = text.toLowerCase(); - var start = 0; - $.each(keywords, function() { - var i = textLower.indexOf(this.toLowerCase()); - if (i > -1) - start = i; - }); - start = Math.max(start - 120, 0); - var excerpt = ((start > 0) ? '...' : '') + - $.trim(text.substr(start, 240)) + - ((start + 240 - text.length) ? '...' : ''); - var rv = $('
').text(excerpt); - $.each(hlwords, function() { - rv = rv.highlightText(this, 'highlighted'); - }); - return rv; -} - - -/** - * Porter Stemmer - */ -var Stemmer = function() { - - var step2list = { - ational: 'ate', - tional: 'tion', - enci: 'ence', - anci: 'ance', - izer: 'ize', - bli: 'ble', - alli: 'al', - entli: 'ent', - eli: 'e', - ousli: 'ous', - ization: 'ize', - ation: 'ate', - ator: 'ate', - alism: 'al', - iveness: 'ive', - fulness: 'ful', - ousness: 'ous', - aliti: 'al', - iviti: 'ive', - biliti: 'ble', - logi: 'log' - }; - - var step3list = { - icate: 'ic', - ative: '', - alize: 'al', - iciti: 'ic', - ical: 'ic', - ful: '', - ness: '' - }; - - var c = "[^aeiou]"; // consonant - var v = "[aeiouy]"; // vowel - var C = c + "[^aeiouy]*"; // consonant sequence - var V = v + "[aeiou]*"; // vowel sequence - - var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 - var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 - var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 - var s_v = "^(" + C + ")?" 
+ v; // vowel in stem - - this.stemWord = function (w) { - var stem; - var suffix; - var firstch; - var origword = w; - - if (w.length < 3) - return w; - - var re; - var re2; - var re3; - var re4; - - firstch = w.substr(0,1); - if (firstch == "y") - w = firstch.toUpperCase() + w.substr(1); - - // Step 1a - re = /^(.+?)(ss|i)es$/; - re2 = /^(.+?)([^s])s$/; - - if (re.test(w)) - w = w.replace(re,"$1$2"); - else if (re2.test(w)) - w = w.replace(re2,"$1$2"); - - // Step 1b - re = /^(.+?)eed$/; - re2 = /^(.+?)(ed|ing)$/; - if (re.test(w)) { - var fp = re.exec(w); - re = new RegExp(mgr0); - if (re.test(fp[1])) { - re = /.$/; - w = w.replace(re,""); - } - } - else if (re2.test(w)) { - var fp = re2.exec(w); - stem = fp[1]; - re2 = new RegExp(s_v); - if (re2.test(stem)) { - w = stem; - re2 = /(at|bl|iz)$/; - re3 = new RegExp("([^aeiouylsz])\\1$"); - re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); - if (re2.test(w)) - w = w + "e"; - else if (re3.test(w)) { - re = /.$/; - w = w.replace(re,""); - } - else if (re4.test(w)) - w = w + "e"; - } - } - - // Step 1c - re = /^(.+?)y$/; - if (re.test(w)) { - var fp = re.exec(w); - stem = fp[1]; - re = new RegExp(s_v); - if (re.test(stem)) - w = stem + "i"; - } - - // Step 2 - re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; - if (re.test(w)) { - var fp = re.exec(w); - stem = fp[1]; - suffix = fp[2]; - re = new RegExp(mgr0); - if (re.test(stem)) - w = stem + step2list[suffix]; - } - - // Step 3 - re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; - if (re.test(w)) { - var fp = re.exec(w); - stem = fp[1]; - suffix = fp[2]; - re = new RegExp(mgr0); - if (re.test(stem)) - w = stem + step3list[suffix]; - } - - // Step 4 - re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; - re2 = /^(.+?)(s|t)(ion)$/; - if (re.test(w)) { - var fp = re.exec(w); - stem = fp[1]; - re = new RegExp(mgr1); - if 
(re.test(stem)) - w = stem; - } - else if (re2.test(w)) { - var fp = re2.exec(w); - stem = fp[1] + fp[2]; - re2 = new RegExp(mgr1); - if (re2.test(stem)) - w = stem; - } - - // Step 5 - re = /^(.+?)e$/; - if (re.test(w)) { - var fp = re.exec(w); - stem = fp[1]; - re = new RegExp(mgr1); - re2 = new RegExp(meq1); - re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); - if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) - w = stem; - } - re = /ll$/; - re2 = new RegExp(mgr1); - if (re.test(w) && re2.test(w)) { - re = /.$/; - w = w.replace(re,""); - } - - // and turn initial Y back to y - if (firstch == "y") - w = firstch.toLowerCase() + w.substr(1); - return w; - } -} - - -/** - * Search Module - */ -var Search = { - - _index : null, - _queued_query : null, - _pulse_status : -1, - - init : function() { - var params = $.getQueryParameters(); - if (params.q) { - var query = params.q[0]; - $('input[name="q"]')[0].value = query; - this.performSearch(query); - } - }, - - loadIndex : function(url) { - $.ajax({type: "GET", url: url, data: null, success: null, - dataType: "script", cache: true}); - }, - - setIndex : function(index) { - var q; - this._index = index; - if ((q = this._queued_query) !== null) { - this._queued_query = null; - Search.query(q); - } - }, - - hasIndex : function() { - return this._index !== null; - }, - - deferQuery : function(query) { - this._queued_query = query; - }, - - stopPulse : function() { - this._pulse_status = 0; - }, - - startPulse : function() { - if (this._pulse_status >= 0) - return; - function pulse() { - Search._pulse_status = (Search._pulse_status + 1) % 4; - var dotString = ''; - for (var i = 0; i < Search._pulse_status; i++) - dotString += '.'; - Search.dots.text(dotString); - if (Search._pulse_status > -1) - window.setTimeout(pulse, 500); - }; - pulse(); - }, - - /** - * perform a search for something - */ - performSearch : function(query) { - // create the required interface elements - this.out = $('#search-results'); - 
this.title = $('

' + _('Searching') + '

').appendTo(this.out); - this.dots = $('').appendTo(this.title); - this.status = $('

').appendTo(this.out); - this.output = $('