Classifying and ranking text using NLTK and The Nameless Horror

This is a small demo showing basic NLTK functionality (tokenizing, classifying, frequency counting), using The Collected Works of H.P. Lovecraft as a corpus. The code should be fairly self-explanatory, but two things are worth noting:

  • On its first run, the script writes a file, results.pickle, to your current working directory, because classification is quite slow. This cache lets you tune the tag set used for frequency counting without waiting for re-classification each time.
  • There's a Jupyter notebook for interactive exploration.
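
The caching behaviour described above can be sketched roughly as follows. This is an illustration of the pattern, not the script's actual code: the function names (load_or_classify, count_by_tags, toy_classify) are hypothetical, and toy_classify is a trivial stand-in for the real, slow classification step. Only the file name results.pickle comes from this README.

```python
import os
import pickle
from collections import Counter


def load_or_classify(tokens, classify, cache_path="results.pickle"):
    """Return tagged tokens, reusing the pickle cache when it exists."""
    if os.path.exists(cache_path):
        # Cache hit: skip the slow classification entirely.
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    tagged = classify(tokens)  # the slow step, run only when no cache exists
    with open(cache_path, "wb") as fh:
        pickle.dump(tagged, fh)
    return tagged


def count_by_tags(tagged, wanted_tags):
    """Count lowercased words whose tag is in wanted_tags."""
    return Counter(word.lower() for word, tag in tagged if tag in wanted_tags)


def toy_classify(tokens):
    # Hypothetical stand-in for the real classifier: marks two known
    # adjectives as "JJ" and everything else as "NN".
    adjectives = {"nameless", "eldritch"}
    return [(t, "JJ" if t.lower() in adjectives else "NN") for t in tokens]
```

With this shape, changing the tag set passed to count_by_tags only repeats the cheap counting step. Deleting results.pickle forces re-classification on the next run.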

Requirements

  • Requests
  • BeautifulSoup4
  • NLTK
  • Matplotlib >= 1.5.x

And for the Notebook:

  • Pandas
  • Jupyter

License

MIT, copyright Stephan Hügel 2013

Fhtagn!