hetyey-b/RedditCommentsDataset

Reddit Comments Dataset

This is a set of comments scraped from posts on Reddit. Top-level comments were saved from the fifty largest subreddits by subscriber count (as of April 2020). For each subreddit, at most 100 comments were saved per post, from at most the top 1,000 posts.
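The collection procedure above can be sketched with PRAW, which is listed under Tools used. This is a minimal sketch, not the author's actual script: the all-time `time_filter`, the reading of the limits as up to 1,000 posts per subreddit and up to 100 top-level comments per post, and the function name `collect_top_level_comments` are all assumptions. A real client would be created with `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` using your own API credentials.

```python
from itertools import islice


def collect_top_level_comments(reddit, subreddit_name, max_posts=1000, max_comments=100):
    """Return the bodies of top-level comments from a subreddit's top posts.

    `reddit` is assumed to be a praw.Reddit instance (or any object with the
    same interface). The all-time time filter and the per-post comment cap
    are assumptions, not confirmed details of how this dataset was built.
    """
    bodies = []
    subreddit = reddit.subreddit(subreddit_name)
    # Walk at most `max_posts` of the subreddit's all-time top submissions.
    for submission in subreddit.top(time_filter="all", limit=max_posts):
        # Drop "load more comments" placeholders instead of fetching them,
        # so iteration only sees real Comment objects.
        submission.comments.replace_more(limit=0)
        # Iterating a CommentForest yields only top-level comments.
        for comment in islice(submission.comments, max_comments):
            bodies.append(comment.body)
    return bodies
```

`replace_more(limit=0)` discards "load more comments" stubs rather than fetching them. That keeps the number of API requests down, but on very large threads it can skip top-level comments hidden behind such a stub; raise the limit if completeness matters more than speed.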

The comments are stored in one .txt file per subreddit. Two additional .txt files are included: one contains statistics on each file (word count and character count), and the other lists all subreddits formatted as a Python list, for easy use.
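Because the subreddit list is stored as a Python list literal, it can be parsed safely with `ast.literal_eval`. The sketch below also recomputes per-file word and character counts. It assumes the list file and the comment files are UTF-8 text and that each subreddit's comments live in a file named `<subreddit>.txt`; those file names and the helper names are hypothetical, and the whitespace-based word count may differ from how the included statistics file was produced.

```python
import ast
from pathlib import Path


def load_subreddit_list(path):
    """Parse a text file containing a Python list literal of subreddit names."""
    text = Path(path).read_text(encoding="utf-8")
    # literal_eval only accepts Python literals, so unlike eval() it
    # cannot execute arbitrary code from the file.
    names = ast.literal_eval(text.strip())
    if not isinstance(names, list):
        raise ValueError(f"expected a Python list in {path}")
    return names


def file_stats(path):
    """Return (word_count, character_count) for one comment file."""
    text = Path(path).read_text(encoding="utf-8")
    # Words are whitespace-separated tokens; this may not match the
    # counting rule used for the dataset's own statistics file.
    return len(text.split()), len(text)


def stats_for_dataset(directory, subreddits):
    """Map each subreddit name to the stats of its (hypothetical) <name>.txt file."""
    root = Path(directory)
    return {name: file_stats(root / f"{name}.txt") for name in subreddits}
```

For example, `stats_for_dataset("comments", load_subreddit_list("subreddits.txt"))` would return a dictionary keyed by subreddit name, with both paths being placeholders for wherever the unzipped files end up.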

Getting started

Uploading the zip file directly to GitHub ran into problems, so I uploaded it to Google Drive instead. I kept this repository so people can find the dataset more easily.

Link to the dataset

Tools used

  • Python
  • Python Reddit API Wrapper - PRAW

About

Around 800 million characters' worth of Reddit comments
