My project was inspired by my interest in US politics and US history, along with my passion for learning new technologies. So when I was given the task of creating a final project for my data visualization class, I took it upon myself to go big. This project was created using data from every presidential speech in US history, which I sourced from the Miller Center. I learned a lot about working with text data during this project, as well as about how language models are built. The data cleaning alone took weeks and required a lot of trial and error. I am very happy with how this project turned out, and I hope you find the results as interesting as I did!
My visualization tells much of the story of American politics in one place. All of the presidential speeches given in the United States have been included in this model. Say, for example, you would like to compare the Democratic and Republican parties of the Reagan era with the parties of today; with this visualization, that is easy to do. It is the job of presidential speechwriters to distill the talking points and debate topics that are popular in the mainstream into their speeches. To put it another way: our nation's top issues of the day are the ones that make it into presidential speeches.
The main results I was interested in when creating this visualization came from comparing the hawkishness of the two modern mainstream parties with that of the parties during the 1940s (during the Fifth Party System of United States politics). The results were pretty interesting: I discovered that the Republican Party has not strayed as far from its roots in its modern presidential speeches as many might believe. The China hawkishness seen in its speeches, and the framing of its economic policies as "pro-business", really haven't changed since the 1940s. The Democratic Party, on the other hand, seems to have changed a lot more in the way it frames political issues through its presidential speeches. During the 1940s, its speeches used much more overtly pro-consumer language, whereas in more recent years they seem to walk the line between pro-business and pro-consumer language much more.
My visualization, as previously stated, can be accessed primarily through the HTML page I set up to act as the "home page" of the project, which you can access via GitHub Pages here. Once this page loads in your browser, the eras are ordered from top to bottom, so the speeches become more recent as you scroll down. In each category you can select the party you would like to look at and click on it to load the model in a new window. Once the model loads (keep in mind it will be a tad slow), you will see a scatter chart with the party you are looking at on the Y-axis, and the parties you are comparing it against during that era on the X-axis. Further up and to the right, you will see words that appear more frequently for all parties during that era. Further down and to the left, you will see words that are used less frequently by all parties during that era. In the top left corner of the chart you will see words that are used almost exclusively by the main party you are looking at within that time period, and in the bottom right corner you will see words that are predominantly used by the other parties in that era. For interactivity, you can hover over a word to see the exact frequency with which each group uses it, and you can click on a word to have its context within the speeches where it was found shown along the bottom of the screen. The speech excerpts shown there have had their "stopwords" (words such as 'a', 'is', 'the') stripped out to reduce loading times. You can still get a good idea of the context in which each word is used in each specific speech. The last bit of interactivity available in the scattertext chart is the search function.
The search function works much like clicking on a word in the chart; the main difference is that you can search for any word you like and see its context within the model, regardless of how frequently it appears. The words you can see directly on the scattertext chart are those used about 8 or more times within the presidential speeches of each era.
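To make the chart layout and the click/search behavior described above more concrete, here is a minimal sketch in plain Python. It is not the scattertext library's code and not the project's actual pipeline: the function names, the tiny stopword list, and the example sentences are all hypothetical, and it places words by raw counts, whereas the real chart may scale frequencies differently.

```python
from collections import Counter
import re

# Hypothetical stopword list; the project's actual list is not given here.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase a speech and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def strip_stopwords(text):
    """Return the speech text with stopwords removed (as in the context view)."""
    return " ".join(t for t in tokenize(text) if t not in STOPWORDS)

def chart_position(word, focal_speeches, other_speeches):
    """Return (x, y): count in the comparison parties, count in the focal party.

    High y and low x corresponds to the top left corner (focal party only);
    low y and high x corresponds to the bottom right corner (other parties).
    """
    focal = Counter(t for s in focal_speeches for t in tokenize(s))
    other = Counter(t for s in other_speeches for t in tokenize(s))
    return other[word], focal[word]

def search_context(word, speeches):
    """Return stopword-stripped versions of the speeches that contain the word."""
    return [strip_stopwords(s) for s in speeches if word in tokenize(s)]

# Made-up example sentences, not taken from the Miller Center data.
focal = ["We must protect the consumer and the worker.",
         "The consumer deserves fair prices."]
other = ["Business is the engine of growth.",
         "We support business and free enterprise."]

print(chart_position("consumer", focal, other))  # focal-only word
print(chart_position("business", focal, other))  # other-parties-only word
print(search_context("consumer", focal))
```

Because the search works on the full token counts rather than on the plotted points, a word can be looked up even when it is too rare to appear on the chart.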
The biggest design decision for the project visuals was how to present the large amount of data I had, since I had created 20 charts that could each be used in the way I laid out in the Visualization Summary section of this final project writeup. In the end I decided to use the principle of small multiples, displaying the charts side by side so that the viewer can easily navigate through them and directly compare each one with the others. I color-coded the title and subtitle regions on the HTML page I created so that each era would have a clear dividing line between it and the next. I also upscaled the charts from the Fourth Party System onward, since there were only two major parties in each time period and it made the charts on the main page a lot more legible. I also considered giving the page a more decorative background instead of a simple, plain one, but in the end I decided against it. There was already a lot of color being provided by the scattertext chart, so I chose to leave the background white so that the colorful points would pop a lot more and be easier to read.
I had to settle for not letting perfect be the enemy of good in the case of loading times. If I had more time to work on the project, I would like to try to further optimize the scattertext model. The slow loading times are likely due to inefficiencies in the model itself when dealing with as wide a body of text as I used for this project. I optimized the model by increasing the number of times a word had to occur in order to be included, which led to the removal of points from each scattertext chart and helped to speed up the interactive webpage that each chart renders to. This made the model's interactive segments a good deal faster. I originally had it set to require a minimum of 5 occurrences per word from each grouping of speeches. I tested different numbers until I arrived at 8 as the minimum number of occurrences. In retrospect, having finished the project, I likely could have further optimized the model by bumping that number up a little more. I would also like to try comparing parties during other time periods, or possibly narrowing the scope to compare individual presidents against the rest of the presidents.
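The effect of the minimum-occurrence threshold described above can be illustrated with a minimal sketch. This is plain Python rather than the scattertext model itself, the function names and toy text are hypothetical, and it applies the threshold to a word's total count in one group of speeches, which is an assumption about how the real setting works (in scattertext this is likely the `minimum_term_frequency` argument, but the exact setting used in the project is not stated).

```python
from collections import Counter
import re

def term_counts(speeches):
    """Count how often each word occurs across a group of speeches."""
    counts = Counter()
    for speech in speeches:
        counts.update(re.findall(r"[a-z']+", speech.lower()))
    return counts

def plotted_terms(speeches, min_occurrences):
    """Return the words that would still get a point on the chart."""
    return {word for word, n in term_counts(speeches).items()
            if n >= min_occurrences}

# Toy text: "tariff" occurs 8 times, "china" 6 times, "rare" twice.
speeches = ["tariff " * 8 + "china " * 6 + "rare " * 2]

for threshold in (5, 8):
    kept = plotted_terms(speeches, threshold)
    print(threshold, len(kept), sorted(kept))
```

Raising the threshold from 5 to 8 drops the words that sit between the two cutoffs, so fewer points need to be rendered, at the cost of hiding rarer vocabulary from the chart.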