Video Demo: https://youtu.be/rakVnIghJYE
Austen Aloud traces speech acts in Jane Austen's novels. My work is related to a larger project (which will hopefully appear here eventually); this CS50x final project focuses only on Austen's most popular novel, Pride and Prejudice. The program prints a chronological list of the speech acts in Pride and Prejudice, extracted from a TEI/XML file, that could, theoretically, be used as a playtext.
PnP.xml: This TEI/XML-encoded version of Jane Austen's Pride and Prejudice was adapted from the meticulously coded TEI/XML document on Austen Said at the University of Nebraska. With the help of two research assistants, Katie Haire and Ziona Kocher, my colleague Gerard Cohen-Vrignaud and I directed checking and revisions to the UNL files. We had Katie and Ziona flag each utterance of spoken language using "ref" tags and each conversation with "q" tags. This project is a small part of a larger effort to publicize this encoding work. In this case, the final project displays all the spoken language in Pride and Prejudice and thus reads like a script of the novel. The program will work for the other Austen novels we have encoded; only the XML file name in characterWrite.py and speechWrite.py and the title of the index webpage need to be changed.
databaseBuilder.py and speeches.db: Before I could upload the data from the TEI/XML file, I first had to export it to a database. I built the database "speeches.db" with the program databaseBuilder.py, which creates two tables: one for character names and ids, and one for speech acts and character ids. If you are switching TEI/XML files, you need to run databaseBuilder.py first.
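As a rough illustration of the two-table layout described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (characters, speeches) and column names are assumptions made for this example and may not match the ones databaseBuilder.py actually uses; the function name build_database is likewise hypothetical.

```python
import sqlite3


def build_database(path="speeches.db"):
    """Create the two tables; table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    # One row per character: the id from the TEI header plus a readable name.
    conn.execute(
        "CREATE TABLE characters (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    # One row per speech act, linked to its speaker by character id.
    conn.execute(
        "CREATE TABLE speeches ("
        "id INTEGER PRIMARY KEY, "
        "character_id TEXT NOT NULL, "
        "speech TEXT NOT NULL, "
        "FOREIGN KEY (character_id) REFERENCES characters (id))"
    )
    conn.commit()
    return conn


# Use an in-memory database here so the sketch leaves no file behind.
conn = build_database(":memory:")
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

The integer id on the speeches table grows with each insert, so it can double as the chronological order of the speech acts, provided they are inserted in document order.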
speechWrite.py and /drafts: speechWrite.py was the hardest part of the project for me because I had to figure out how to use the ElementTree Python library to parse XML files. It took me a long time to work out how to access the material contained in a specific element; the call that finally worked was root.iter(tag="{http://www.tei-c.org/ns/1.0}ref"). The goal of this program is to identify all of the speech acts in the XML file (via "ref") and associate them, in a list of lists, with the id of the character who speaks them. Once I conquered that part of the project, everything moved quickly. For all of my trial-and-error tests and code parcels, feel free to browse the drafts folder.
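For readers facing the same ElementTree hurdle, the namespace-qualified iteration can be sketched as follows. This is not the code in speechWrite.py: the sample XML is a tiny stand-in for PnP.xml, the "who" attribute used to carry the speaker id is an assumption, and the function name extract_speeches is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree stores namespaced tags as "{namespace-uri}localname".
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# A tiny stand-in for PnP.xml. The "who" attribute is an assumption.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><q><ref who="#MrsB">My dear Mr. Bennet,</ref> said his lady to him one day,
      <ref who="#MrsB">have you heard that Netherfield Park
      is let at last?</ref></q></p>
    <p><q><ref who="#MrB">You want to tell me, and I have no objection
      to hearing it.</ref></q></p>
  </body></text>
</TEI>"""


def extract_speeches(root):
    """Return a list of [speaker_id, text] lists in document order."""
    speeches = []
    for ref in root.iter(tag=TEI_NS + "ref"):
        speaker = ref.get("who", "").lstrip("#")
        # itertext() also collects text inside nested elements;
        # split/join collapses line breaks and runs of spaces.
        text = " ".join("".join(ref.itertext()).split())
        speeches.append([speaker, text])
    return speeches


root = ET.fromstring(SAMPLE)
speeches = extract_speeches(root)
for speaker, text in speeches:
    print(speaker, ":", text)
```

Because iter() walks the tree in document order, the resulting list is already chronological.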
characterWrite.py: Like speechWrite.py, characterWrite.py was a difficult program: I had to go through the header of the TEI/XML file to extract character names and ids. I had some trouble accessing the xml:id attribute because the namespace wasn't working, but I was proud of my workaround using list(dict.values(i.attrib))[1]. I also had to add some code to add or remove whitespace in the actual names. If the files are switched, the second step (before starting Flask) is to run both speechWrite.py and characterWrite.py. After that, the user may start the Flask server.
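As a possible alternative to the positional workaround, ElementTree exposes xml:id under the key "{http://www.w3.org/XML/1998/namespace}id", so the attribute can be looked up by name. The sketch below assumes the header lists characters as "person" elements containing "persName"; those element names, the sample data, and the function name extract_characters are assumptions for illustration, not a description of PnP.xml or characterWrite.py.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
# The "xml:" prefix is always bound to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical header fragment; the real header may be structured differently.
SAMPLE_HEADER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person sex="F" xml:id="EB"><persName>
        Elizabeth   Bennet
    </persName></person>
    <person sex="M" xml:id="FD"><persName>Fitzwilliam Darcy</persName></person>
  </listPerson></particDesc></profileDesc></teiHeader>
</TEI>"""


def extract_characters(root):
    """Return a list of [character_id, name] lists from the header."""
    characters = []
    for person in root.iter(TEI_NS + "person"):
        char_id = person.get(XML_ID)
        name_el = person.find(TEI_NS + "persName")
        raw = "".join(name_el.itertext()) if name_el is not None else ""
        # Collapse indentation and line breaks inside the name.
        characters.append([char_id, " ".join(raw.split())])
    return characters


root = ET.fromstring(SAMPLE_HEADER)
characters = extract_characters(root)
print(characters)
```

Looking the attribute up by key does not depend on the order in which attributes appear on the element, whereas an index such as [1] assumes xml:id is always the second attribute.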
joinTables.sql: This is simply the SQL command (that later appears in app.py) that joins the character and speech tables so that character names are associated with speeches instead of merely ids.
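The kind of join described here might look like the sketch below, run through sqlite3 against an in-memory database. The table names, column names, and sample rows are assumptions for illustration; the actual command in joinTables.sql may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows standing in for speeches.db.
conn.executescript("""
CREATE TABLE characters (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE speeches (
    id INTEGER PRIMARY KEY,
    character_id TEXT,
    speech TEXT
);
INSERT INTO characters VALUES ('MrsB', 'Mrs. Bennet'), ('MrB', 'Mr. Bennet');
INSERT INTO speeches (character_id, speech) VALUES
    ('MrsB', 'My dear Mr. Bennet,'),
    ('MrB', 'You want to tell me, and I have no objection to hearing it.');
""")

# Replace each speech's character id with the character's name,
# keeping the speeches in their original (insertion) order.
rows = conn.execute(
    "SELECT characters.name, speeches.speech "
    "FROM speeches "
    "JOIN characters ON speeches.character_id = characters.id "
    "ORDER BY speeches.id"
).fetchall()

for name, speech in rows:
    print(f"{name}: {speech}")
```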
app.py and requirements.txt: app.py is the Python file that builds the Flask web application and actually joins the two tables. It then gives directions for the joined data to be displayed on the index webpage. requirements.txt is the standard list of requirements needed for Flask.
/templates and /static: These folders contain the index and layout HTML pages and the CSS file. The display is pretty straightforward: I adapted the layout from the finance problem set, and I use a Jinja loop to iterate through all of the rows of the joined table within the index. I tried to make the CSS styling Austen-like.
I am very pleased with my project. "Had I but world enough, and time", I might have incorporated the three Python programs (databaseBuilder.py, speechWrite.py, and characterWrite.py) into a helpers.py file so that the user could go straight to Flask if they decided to change novels, but I didn't want to mess with anything since it all worked! databaseBuilder.py would also need commands to drop the previous database and tables.
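One way the drop step could work is sketched below: each table is dropped if it exists and then recreated, so the builder can be rerun safely when switching novels. The function name reset_tables and the table and column names are hypothetical, not taken from databaseBuilder.py.

```python
import sqlite3


def reset_tables(conn):
    """Drop and recreate both tables (hypothetical names and columns)."""
    conn.executescript("""
    DROP TABLE IF EXISTS speeches;
    DROP TABLE IF EXISTS characters;
    CREATE TABLE characters (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE speeches (
        id INTEGER PRIMARY KEY,
        character_id TEXT NOT NULL,
        speech TEXT NOT NULL
    );
    """)


conn = sqlite3.connect(":memory:")
reset_tables(conn)  # first novel
conn.execute("INSERT INTO characters VALUES ('EB', 'Elizabeth Bennet')")
conn.commit()

reset_tables(conn)  # switching novels: the old rows are gone
count = conn.execute("SELECT COUNT(*) FROM characters").fetchone()[0]
print(count)
```

DROP TABLE IF EXISTS does nothing when the table is missing, so the same function works on a fresh database and on one left over from a previous novel.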