Reuters21578 SPIMI Indexer and Searcher

Installation

Download the Reuters 21578 corpus from http://www.daviddlewis.com/resources/testcollections/reuters21578/. Make sure the unzipped folder is at the same level as this repository and is called reuters21578.

Install the dependencies listed in requirements.txt, for example with: $ pip install -r requirements.txt.

Run the first subproject with: $ python subproject1.py. This subproject:

Gets all articles in reuters21578.
Computes statistics about them for later use.
Creates the naive and SPIMI indexes for the files.
Compares how long each indexer took to create its first 10,000 dictionary terms.
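
For readers unfamiliar with the two approaches, here is a minimal sketch of how a naive indexer and a SPIMI indexer differ. It is not the code in subproject1.py: the function names, the whitespace tokenizer, and the max_postings_per_block parameter are assumptions made for illustration, and the blocks are kept in memory rather than written to disk as a real SPIMI implementation would do.

```python
from collections import defaultdict


def naive_index(docs):
    """Collect every (term, doc_id) pair, sort them all, then group into postings."""
    # The set removes duplicate pairs; the global sort is the expensive step.
    pairs = sorted({(term, doc_id) for doc_id, text in docs for term in text.split()})
    index = {}
    for term, doc_id in pairs:
        index.setdefault(term, []).append(doc_id)
    return index


def spimi_blocks(docs, max_postings_per_block):
    """Yield blocks built by appending postings directly, with no global sort."""
    block, count = defaultdict(list), 0
    for doc_id, text in docs:
        for term in text.split():
            postings = block[term]
            # Skip a doc_id that is already the last posting for this term.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
                count += 1
            # Hypothetical memory limit: flush the block once it is "full".
            if count >= max_postings_per_block:
                yield dict(block)
                block, count = defaultdict(list), 0
    if block:
        yield dict(block)


def merge_blocks(blocks):
    """Merge the blocks into one index with sorted terms and sorted postings."""
    merged = defaultdict(set)
    for block in blocks:
        for term, postings in block.items():
            merged[term].update(postings)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}
```

Calling merge_blocks(spimi_blocks(docs, 2)) on a list of (doc_id, text) tuples should produce the same postings as naive_index(docs). The difference lies in how the work is done: the naive version sorts every term and document pair at once, while the SPIMI version only sorts when blocks are merged.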

Run the second subproject with: $ python subproject2.py. This subproject:
