University of Cincinnati Senior Design 2020
Deep Pollster is a data analysis application that uses deep learning to predict the political leaning of Twitter users in a particular geographical region of the USA.
- Project Description
- Architecture
- Testing & Results
- Future Work
- User Manual
- Video
- Presentation
- Poster
- Assessments
- Summary of Hours
- Assignments
- Appendix
Social networking has risen to prominence as a medium for publishing information. The power to sway and portray political opinions is shifting from traditional media, such as newspapers and television networks, to social media platforms like Twitter. This has given rise to new directions of research in computational political science.
In this project we reexamine the problem of measuring and predicting the political orientation of Twitter users. We expect to contribute to the study of the political blogosphere by incorporating multiple hypotheses about the behavior of both the average Twitter user and registered politicians. Incorporating signals such as tweets, retweets, subtweets, follower and followee networks, and degrees of separation helps us understand the political landscape on Twitter and how to leverage these sources of information. Many researchers have used Twitter to analyze its effect on major political events such as the 2016 and 2020 U.S. elections; our technical contribution is a reimagining of the traditional problem of predicting the political leaning of a given user.
By studying the political orientation of Twitter users, it is possible to target advertisements at individuals, shape digital profiles, and deliver personalized news, articles, views and products. The same approach could also be used to predict the outcome of an election by predicting the leaning of users in a geographical location.
Index Terms - Twitter, Political Science, NLP, Deep Learning, Neural Networks
Sentiment analysis on tweets to assess political leaning has its disadvantages:
- It does not paint a complete and holistic picture of a user's ideological views
- It cannot build a digital profile of a user from a single tweet, or even from a temporal series of tweets
- Assessing the political leaning of a demographic does not capture the orientation of an individual user
We leverage more than just tweet-retweet maximization or a network matrix:
- Binary classification of the latest tweets, retweets and liked tweets on the basis of political leaning
- Identifying the degree of separation between the user and politicians from both sides of the political spectrum
Our proposed system is designed to provide a more rounded and holistic sense of the individual user, painting an overall picture of their digital profile, with potential applications in marketing and business.
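The degree-of-separation idea above can be illustrated with a breadth-first search over a follow graph. This is only a minimal sketch, not the project's implementation: the `degrees_of_separation` function, the `follows` dictionary and every account name in it are hypothetical, and a real graph would have to be built from data fetched through the Twitter API.

```python
from collections import deque


def degrees_of_separation(graph, source, targets):
    """Return the fewest follow hops from source to any target, or None."""
    targets = set(targets)
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        account, hops = queue.popleft()
        for followee in graph.get(account, ()):
            if followee in seen:
                continue
            if followee in targets:
                return hops + 1
            seen.add(followee)
            queue.append((followee, hops + 1))
    return None  # no follow path reaches any target


# Hypothetical follow graph: account -> accounts it follows.
follows = {
    "user": ["friend_a", "friend_b"],
    "friend_a": ["dem_politician"],
    "friend_b": ["friend_c"],
    "friend_c": ["rep_politician"],
}

dem_distance = degrees_of_separation(follows, "user", ["dem_politician"])
rep_distance = degrees_of_separation(follows, "user", ["rep_politician"])
```

In this toy graph the user is closer to the Democratic politician than to the Republican one, which is the kind of signal that could be combined with the tweet classification.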
Shivchander Sudalairaj - sudalasr@mail.uc.edu
Sagar Panwar - panwarsr@mail.uc.edu
Anca Ralescu - ralescal@ucmail.uc.edu
- The model was tested with multiple sets of test cases to eliminate any innate bias
- Each set of test cases consists of 50 previously untested politicians' Twitter handles
Hypothesis: Politicians are relatively consistent in the language patterns they follow when publishing tweets on Twitter
After running our tests, we observed that our model predicted Democratic politicians with 100% accuracy, while the misclassifications arose with Republican politicians, resulting in an accuracy of 80%.
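Accuracy broken down by party, as discussed above, can be computed along the following lines. The labels below are toy data made up for illustration, not the project's test set or results, and `per_class_accuracy` is a hypothetical helper name.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's examples predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(
        truth
        for truth, predicted in zip(true_labels, predicted_labels)
        if truth == predicted
    )
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels: "D" = Democrat, "R" = Republican (illustrative only).
true_labels = ["D", "D", "D", "R", "R", "R", "R"]
predicted_labels = ["D", "D", "D", "R", "R", "D", "R"]

accuracy_by_party = per_class_accuracy(true_labels, predicted_labels)
```

Reporting accuracy per party rather than as a single number makes an imbalance between the two classes visible.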
After running our tests, we observed that our model identified Democrats with significantly higher confidence than Republicans.
We also observed that predictions from the tweets model were more stable than those from the retweets and liked-tweets models. This supports our initial hypothesis that the three sources should carry variable weights.
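One way to read the variable-weights hypothesis is as a weighted average of the per-source predictions. The sketch below assumes each source yields a leaning score between 0 (left) and 1 (right). That score convention, the weight values and the `combine_leanings` name are all assumptions made for illustration, not values taken from the project.

```python
def combine_leanings(scores, weights):
    """Weighted average of per-source leaning scores, each in [0, 1]."""
    total_weight = sum(weights[source] for source in scores)
    if total_weight == 0:
        raise ValueError("at least one source needs a non-zero weight")
    weighted_sum = sum(scores[source] * weights[source] for source in scores)
    return weighted_sum / total_weight


# Hypothetical weights: tweets count most because they were the most stable.
weights = {"tweets": 0.5, "retweets": 0.3, "likes": 0.2}

# Hypothetical per-source scores for one user (0 = left, 1 = right).
scores = {"tweets": 0.2, "retweets": 0.4, "likes": 0.6}

overall_leaning = combine_leanings(scores, weights)
```

Dividing by the total weight of the sources present means a user with no liked tweets, for example, can still be scored from the remaining sources.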
- Democratic tweets consistently fall on the political left
- Republican tweets fall more on the political centre and centre-right
- Democrats are more consistent in their language, and in turn in the political ideologies they express on Twitter
- Republicans use language that is more ambiguous and tends to waver between the left and right of the political spectrum
- Naively extrapolating this model to general public users would introduce inherent bias
- Tweet content analysis and classification can be further improved to handle spam accounts. Sarcasm is also not yet handled by our model, and additions could be made to account for it
- To develop a generalized model applicable to general public users, we would need to survey users about their political orientation, build a more general dataset, and retrain the model on it
Week | Date | Category | Shiv (hours) | Sagar (hours) |
---|---|---|---|---|
1 | 1/20 - 1/25 | Twitter API | 4 | 10 |
2 | 1/26 - 2/1 | Data Exploration | 5 | 10 |
3 | 2/2 - 2/15 | Research Exploration | 5 | 10 |
4 | 2/16 - 2/22 | Experiments with LSTM | 5 | 3 |
5 | 2/23 - 2/29 | Data Preprocessing | 5 | 2 |
6 | 3/1 - 3/7 | Model Architecture Experiments | 10 | 5 |
7 | 3/8 - 3/14 | Model Training | 6 | 2 |
8 | 3/15 - 3/21 | Model Testing | 4 | 2 |
9 | 3/22 - 4/4 | Integration | 7 | 2 |
10 | 4/5 - 4/11 | Documentation and Github | 4 | 4 |
Total Hours | | | 55 | 50 |
- MIT license
- Copyright 2020