Read our report: CS294 Final_Report.pdf
RQ1: How does the quality of science communication differ by the type of source used? (e.g., news article, research paper)
Clear and accurate science communication is critical for helping the public become better informed about scientific topics. Scholars have recently identified several issues with the quality of scientific discourse on social media, noting that the results of scientific research are often distorted or sensationalized by major media outlets. However, little is known about how qualities of science communication such as accuracy and clarity are rewarded in online environments, how those qualities have changed over time, or how they differ across media sources. Such insights would help inform how recommendation algorithms and social media interfaces could be designed to support more effective science communication. To address this gap, we systematically evaluate posts shared on r/science from 2016 to 2022, measuring the level of jargon, sensationalism, and factual consistency in each post, and examining how these metrics relate to the type of source linked in each post and to the post's engagement. We find that posts linking to news sources are more sensational and contain less jargon than posts linking directly to academic papers. Furthermore, we find that moderate levels of sensationalism and factual consistency are associated with high numbers of upvotes, while jargon is negatively associated with upvotes. Finally, we observe that the quality of science communication on the subreddit improved in 2020, coinciding with shifts in topical content during the COVID-19 pandemic. Ultimately, our results suggest that optimizing solely for engagement will not reward the most effective science communication, but they also offer critical insights into how platforms could be redesigned to promote high-quality science communication.
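To make these metrics concrete, below is a minimal, hypothetical sketch of a surface-level jargon score, computed as the fraction of words falling outside a small common-word list. This is only an illustration; the actual implementations live in the sub-folders listed below.

```python
import re

# Hypothetical stand-in: in practice this would be a large
# frequency-based vocabulary (e.g., the top-N most common English words).
COMMON_WORDS = {
    "the", "of", "and", "to", "in", "a", "is", "that", "for", "are",
    "study", "new", "people", "found", "shows", "research", "may",
}

def jargon_score(text: str) -> float:
    """Toy jargon metric: fraction of tokens outside a common-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    rare = sum(1 for t in tokens if t not in COMMON_WORDS)
    return rare / len(tokens)

# Example: a post title with several technical terms scores high.
print(jargon_score("Study shows new mRNA vaccine elicits robust humoral immunity"))
```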
This repository is organized into several sub-folders, one for each focus area and metric we evaluated:
- Sensationalism
- Jargon
- Factual Consistency
- (Scraping), illustrated by the sketch at the end of this README
Python Version: 3.11
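As an illustration of the data-collection step, the sketch below shows how r/science posts could be pulled with PRAW (the Python Reddit API Wrapper) and bucketed by source type for RQ1. The credentials are placeholders and the domain list is a hypothetical example; this is not necessarily the exact code in the Scraping folder.

```python
import praw

# Placeholder credentials: register an app at reddit.com/prefs/apps.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="science-communication-study",
)

# Hypothetical mapping from link domains to source types (paper vs. news).
ACADEMIC_DOMAINS = {"nature.com", "science.org", "pnas.org", "doi.org"}

# Fetch recent top posts and label each by the domain it links to.
for submission in reddit.subreddit("science").top(time_filter="year", limit=100):
    source = "paper" if submission.domain in ACADEMIC_DOMAINS else "news/other"
    print(submission.score, source, submission.title[:80])
```

Note that Reddit's live API only surfaces a limited window of submissions; reconstructing a full 2016 to 2022 history typically relies on an archival source such as Pushshift.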