Code and data for the chapter "Computational Methods for the Analysis of Fiction Genres"

The Project Gutenberg metadata is from: https://github.com/dh-trier/pg-fiction/

The genre labels were selected using makelabels.py.

The cleaned corpus of Project Gutenberg texts is available here: http://corpus.leeds.ac.uk/serge/webgenres/gutenberg-clean.ol.xz
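The corpus is distributed as a single xz-compressed file, so it can be streamed without unpacking it to disk first. The following is a minimal sketch, assuming the file is UTF-8 text with one document per line (the `.ol` suffix suggests this, but it is not confirmed here); `iter_documents` is a hypothetical helper name, not part of this repository.

```python
import lzma
from typing import Iterator


def iter_documents(path: str) -> Iterator[str]:
    """Yield each non-empty line of an xz-compressed text file."""
    # lzma.open in text mode decompresses lazily, so the whole corpus
    # never has to fit in memory at once.
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # assumption: blank lines carry no document
                yield line
```

For example, `sum(1 for _ in iter_documents("gutenberg-clean.ol.xz"))` would count the lines under that one-document-per-line assumption.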

The topic model was created using Mallet (https://mimno.github.io/Mallet/); see lda.sh.
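To inspect a document-topic table such as doctopics_pertext.tsv.gz, something along the following lines may help. This is only a sketch: it assumes the tab-separated layout that recent Mallet versions write for document-topic output (an index column, a document name, then one proportion per topic), which may not match the file in this repository exactly. The function names are hypothetical.

```python
import csv
import gzip
from typing import Dict, List


def read_doc_topics(path: str) -> Dict[str, List[float]]:
    """Map each document name to its list of topic proportions."""
    doc_topics: Dict[str, List[float]] = {}
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank lines and comment-style header lines.
            if not row or row[0].startswith("#"):
                continue
            # Assumed columns: index, document name, then proportions.
            doc_topics[row[1]] = [float(value) for value in row[2:]]
    return doc_topics


def dominant_topic(proportions: List[float]) -> int:
    """Return the index of the topic with the largest proportion."""
    return max(range(len(proportions)), key=proportions.__getitem__)
```

If the file has a different column layout, only the two lines marked as assumptions in `read_doc_topics` should need adjusting.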

The list of stop words and names, sw_jockers.txt (included here for reproducibility), comes from https://www.matthewjockers.net/2013/04/12/secret-recipe-for-topic-modeling-themes/
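As an illustration of how such a list is typically applied before topic modeling, the sketch below filters a token sequence against a stop list. It assumes the entries are separated by commas and/or whitespace and that matching should be case-insensitive. Neither assumption is confirmed for sw_jockers.txt, and both helper names are hypothetical. In this repository the actual filtering is presumably handled by the Mallet pipeline.

```python
import re
from typing import Iterable, List, Set


def parse_stopwords(raw: str) -> Set[str]:
    """Split a stop list on commas and/or whitespace, lowercased."""
    return {word.lower() for word in re.split(r"[,\s]+", raw) if word}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep only tokens whose lowercase form is not in the stop list."""
    return [token for token in tokens if token.lower() not in stopwords]
```

Because the list includes character names as well as function words, filtering this way removes names such as "Elizabeth" along with words such as "the", which is the point of Jockers's recipe: topics should reflect themes rather than individual novels' casts.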

An interactive browser of the topic model is available here: https://urd2.let.rug.nl/~andreas/lorentztopics/

The data for the topic model browser is generated by convmeta.py; the browser itself is dfr-browser: https://github.com/agoldst/dfr-browser

The Biber features were extracted using https://github.com/ssharoff/biberpy

Readability features were extracted using https://github.com/andreasvc/readability/

Files in this repository:

Biber MDA plots.ipynb
Classification Experiments.ipynb
LICENSE
README.md
bow5000.csv.gz
convmeta.py
doctopics_pertext.tsv.gz
extractfreqs.py
getreadability.py
gutenberg-biber.dat.gz
gutenberg-features-fa.dat.xz
gutenberg-genres-fa.dat.xz
gutenberg_readability.csv
lda.sh
makelabels.py
metadata-pg-genres-subset.tsv
readability analysis.ipynb
requirements.txt
sw_jockers.txt
test.txt
topic model.ipynb
topickeys.txt
topicmodelpreprocess.py
train.txt
