Classifying and ranking text using NLTK and The Nameless Horror

This is a small demo showing basic NLTK functionality (tokenizing, classifying, and frequency counting), using The Collected Works of H.P. Lovecraft as a corpus. The code ought to be fairly self-explanatory; however, note the following:

The script writes a file, results.pickle, to your current working directory on its first run, because classification is quite slow. This lets you tune the tag set used for frequency counting without waiting for re-classification each time.
There's also a Jupyter notebook for interactive exploration.
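
The classify-once, count-many workflow described above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: classify_corpus, load_or_classify and rank_by_tags are hypothetical names, and classify_corpus returns a hard-coded list of (token, tag) pairs in place of real NLTK tokenizing and tagging. The shapes are chosen to match NLTK's, since nltk.pos_tag returns a list of (token, tag) tuples and nltk.FreqDist is a subclass of collections.Counter. The tags used here are Penn Treebank tags, which is an assumption about the tag set.

```python
import os
import pickle
from collections import Counter


def classify_corpus():
    """Hypothetical stand-in for the slow NLTK step (tokenize + tag).

    Returns hard-coded (token, tag) pairs instead of tagging real text.
    """
    return [("The", "DT"), ("nameless", "JJ"), ("horror", "NN"),
            ("crept", "VBD"), ("through", "IN"), ("the", "DT"),
            ("ancient", "JJ"), ("cyclopean", "JJ"), ("horror", "NN")]


def load_or_classify(path="results.pickle"):
    """Return tagged tokens, classifying only when no cached pickle exists."""
    if os.path.exists(path):
        # Cache hit: skip the slow classification entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    tagged = classify_corpus()
    # Cache miss: classify once and store the result for later runs.
    with open(path, "wb") as f:
        pickle.dump(tagged, f)
    return tagged


def rank_by_tags(tagged, tags, n=5):
    """Count lower-cased tokens whose tag is in `tags`; return the n most common."""
    counts = Counter(word.lower() for word, tag in tagged if tag in tags)
    return counts.most_common(n)


if __name__ == "__main__":
    tagged = load_or_classify()                # slow only on the first run
    print(rank_by_tags(tagged, {"JJ"}))        # adjectives
    print(rank_by_tags(tagged, {"NN", "NNS"}))  # nouns
```

With this arrangement, changing the tag set passed to rank_by_tags needs no re-classification, and deleting the pickle forces a fresh run.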

Requirements

Requests
BeautifulSoup4
NLTK
Matplotlib >= 1.5.x

And for the Notebook:

Pandas
Jupyter
