This is a small demo of basic NLTK functionality (tokenizing, classifying, frequency counting), using The Collected Works of H.P. Lovecraft as a corpus. The code ought to be fairly self-explanatory, but note the following:
- The script writes a file, `results.pickle`, to your current working directory on its first run, because classification is quite slow. This lets you tune the tag set used for frequency counting without waiting for re-classification each time.
- There's a Jupyter notebook for interactive exploration.
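The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the function names `load_or_classify`, `count_by_tags`, and `toy_classify` are hypothetical, and the toy tagger merely stands in for the slow classification step.

```python
import pickle
from collections import Counter
from pathlib import Path


def load_or_classify(tokens, classify, cache_path="results.pickle"):
    """Return tagged tokens, reusing the pickle cache if it already exists."""
    path = Path(cache_path)
    if path.exists():
        # Later runs: skip classification and load the stored result.
        with path.open("rb") as fh:
            return pickle.load(fh)
    # First run: do the slow work once, then store it for next time.
    tagged = classify(tokens)
    with path.open("wb") as fh:
        pickle.dump(tagged, fh)
    return tagged


def count_by_tags(tagged, tag_set):
    """Count lowercased words whose tag is in tag_set."""
    return Counter(word.lower() for word, tag in tagged if tag in tag_set)


def toy_classify(tokens):
    # Hypothetical stand-in for a real tagger: capitalised tokens are
    # tagged "NNP", everything else "NN".
    return [(tok, "NNP" if tok[0].isupper() else "NN") for tok in tokens]
```

With something like this in place, changing the tag set passed to `count_by_tags` only re-runs the cheap counting step. A real tagger that returns `(word, tag)` pairs, such as `nltk.pos_tag`, could be passed in place of `toy_classify`. Delete `results.pickle` to force re-classification.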
The script requires:
- Requests
- BeautifulSoup4
- NLTK
- Matplotlib >= 1.5.x
And for the Notebook:
- Pandas
- Jupyter
MIT, copyright Stephan Hügel 2013