
JoachimStanislaus/DS_Analyse_Reddit_data

DS_Analyse_Reddit_data

This project answers the following questions:

What were the top 50 most popular subreddits in terms of the number of active users?
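One way to approach this question is sketched below. It is only an illustration under assumptions: the records are taken to be dictionaries with `author` and `subreddit` keys, and an "active user" is taken to mean a distinct author who posted at least one comment. Neither the field names nor that definition is stated in this README, and the function name is hypothetical.

```python
from collections import defaultdict


def top_subreddits_by_active_users(comments, n=50):
    """Rank subreddits by the number of distinct comment authors."""
    authors = defaultdict(set)
    for c in comments:
        # Assumption: removed accounts appear as "[deleted]" and are skipped.
        if c["author"] != "[deleted]":
            authors[c["subreddit"]].add(c["author"])
    counts = {sub: len(users) for sub, users in authors.items()}
    # Sort by descending user count; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Tiny made-up sample, for illustration only.
sample = [
    {"author": "a", "subreddit": "AskReddit"},
    {"author": "b", "subreddit": "AskReddit"},
    {"author": "a", "subreddit": "AskReddit"},
    {"author": "a", "subreddit": "unitedkingdom"},
    {"author": "[deleted]", "subreddit": "unitedkingdom"},
]
print(top_subreddits_by_active_users(sample, n=2))
# → [('AskReddit', 2), ('unitedkingdom', 1)]
```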

What does the Probability Density Function (PDF) of the number of active users per subreddit look like for all subreddits?
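Subreddit sizes typically span several orders of magnitude, so an empirical PDF is often easier to read with logarithmically spaced bins. The sketch below is one possible estimate, not necessarily the method used in this repository. The function name and the `bins_per_decade` parameter are hypothetical.

```python
import math


def empirical_pdf(values, bins_per_decade=4):
    """Return (bin_left, bin_right, density) triples using log-spaced bins."""
    values = [v for v in values if v > 0]
    lo = math.floor(math.log10(min(values)))
    hi = math.floor(math.log10(max(values))) + 1
    n_bins = (hi - lo) * bins_per_decade
    edges = [10 ** (lo + i / bins_per_decade) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        idx = min(int((math.log10(v) - lo) * bins_per_decade), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    # Divide by the bin width so that the densities integrate to 1.
    return [
        (edges[i], edges[i + 1], counts[i] / (total * (edges[i + 1] - edges[i])))
        for i in range(n_bins)
    ]


# Made-up user counts per subreddit, for illustration only.
pdf = empirical_pdf([1, 2, 2, 3, 10, 15, 120, 900])
```

Plotting the bin centres against the densities on log-log axes makes a heavy tail show up as a roughly straight line.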

What is the ratio of the number of users in the i-th most popular subreddit to the number in the (i + 1)-th, for i ∈ [1...100]? Comment on how fast the popularity drops and how this ratio changes with i.
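The ratio between consecutive ranks can be computed directly from the per-subreddit user counts. This is a minimal sketch; the function name is hypothetical and the input is assumed to be a list of positive counts.

```python
def consecutive_ratios(user_counts, limit=100):
    """Ratio of the i-th largest count to the (i + 1)-th, for i = 1..limit."""
    ranked = sorted(user_counts, reverse=True)
    return [ranked[i] / ranked[i + 1] for i in range(min(limit, len(ranked) - 1))]


# Made-up counts, for illustration only.
print(consecutive_ratios([1000, 500, 400, 100]))
# → [2.0, 1.25, 4.0]
```

As a reference point, if the counts followed an exact power law in the rank (count proportional to rank^(-s)), the ratio would be ((i + 1) / i)^s, which decreases towards 1 as i grows.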

How many comments does each of these subreddits receive in each hour of the day (i.e., 1 AM, 2 AM, 3 AM, ..., 11 PM, 12 AM)?
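A possible way to build the hourly counts is sketched below. It assumes each record carries a `created_utc` field holding a Unix timestamp in seconds and a `subreddit` field. These field names are assumptions and are not confirmed by this README.

```python
from collections import defaultdict
from datetime import datetime, timezone


def comments_per_hour(comments):
    """Map each subreddit to a list of 24 comment counts, indexed by UTC hour."""
    hourly = defaultdict(lambda: [0] * 24)
    for c in comments:
        # Interpret the timestamp in UTC so all subreddits share one clock.
        hour = datetime.fromtimestamp(int(c["created_utc"]), tz=timezone.utc).hour
        hourly[c["subreddit"]][hour] += 1
    return dict(hourly)


# Made-up sample: timestamps 0 and 3600 fall in UTC hours 0 and 1.
sample_comments = [
    {"subreddit": "unitedkingdom", "created_utc": 0},
    {"subreddit": "unitedkingdom", "created_utc": 3600},
    {"subreddit": "unitedkingdom", "created_utc": 86400 + 3600},
]
hourly_counts = comments_per_hour(sample_comments)
```

Each resulting list can be plotted with hours 0 to 23 on the x-axis and counts on the y-axis.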

When you plot these curves where the x-axis is hours from 0 to 23 and the y-axis is counts, can you see patterns in these curves? How do these curves compare to each other? Do they have offsets relative to each other?

If you treat /r/unitedkingdom as being on UTC, what can you say about the timezones of the users in the other subreddits?
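One rough way to estimate an offset is to find the circular shift that best aligns a subreddit's hourly curve with the reference curve. The sketch below takes two lists of 24 hourly counts in UTC and uses a simple circular cross-correlation. It is an illustrative heuristic with a hypothetical function name, and it assumes the two communities follow a similar daily rhythm.

```python
def best_hour_shift(reference, other):
    """Return s in 0..23 such that other[(h + s) % 24] best matches reference[h]."""

    def normalise(curve):
        total = sum(curve)
        return [x / total for x in curve]

    ref, oth = normalise(reference), normalise(other)

    def score(s):
        # Circular cross-correlation of the two normalised curves at shift s.
        return sum(ref[h] * oth[(h + s) % 24] for h in range(24))

    return max(range(24), key=score)


# Made-up curves: the reference peaks at 20:00 UTC, the other 5 hours later.
reference_curve = [1] * 24
reference_curve[20] = 10
other_curve = [reference_curve[(h - 5) % 24] for h in range(24)]
print(best_hour_shift(reference_curve, other_curve))
# → 5
```

A shift of s hours (for s up to 12) suggests users roughly s hours behind UTC, and a shift above 12 suggests users 24 - s hours ahead. Daylight saving time and mixed user populations blur this estimate.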

What are the top 10 most frequent words in each of the five subreddits above? Do you see differences/similarities?
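A minimal word-count sketch is shown below. The tokenisation rule (lowercase runs of letters and apostrophes) is an assumption made for illustration, and no stop-word filtering is applied, so common function words will likely dominate the result. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def top_words(bodies, n=10):
    """Return the n most frequent words across an iterable of comment texts."""
    counts = Counter()
    for body in bodies:
        counts.update(WORD_RE.findall(body.lower()))
    return counts.most_common(n)


# Made-up comment texts, for illustration only.
print(top_words(["The cat and the dog", "the end"], n=1))
# → [('the', 3)]
```

Running this once per subreddit and comparing the resulting lists side by side makes overlaps and differences easy to spot.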


What does the word-frequency distribution look like? Plot the relative frequencies of the words as a probability density function. How does the word frequency you observed compare with the one predicted by Zipf's law (https://en.wikipedia.org/wiki/Zipf%27s_law)?
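To compare observed frequencies with Zipf's law, the relative frequency at each rank can be set against the normalised prediction (1 / r^s) / Σ_k (1 / k^s). The sketch below is one way to tabulate both. The function name is hypothetical, and the exponent `s = 1` is a common default rather than a value fitted to this data.

```python
def zipf_comparison(word_counts, s=1.0):
    """Return (rank, observed_frequency, zipf_frequency) for every word rank."""
    ranked = sorted(word_counts.values(), reverse=True)
    total = sum(ranked)
    # Normalising constant so the predicted frequencies sum to 1.
    norm = sum(1 / r ** s for r in range(1, len(ranked) + 1))
    return [
        (r, c / total, (1 / r ** s) / norm)
        for r, c in enumerate(ranked, start=1)
    ]


# Made-up counts that happen to follow Zipf's law exactly with s = 1.
rows = zipf_comparison({"the": 6, "of": 3, "and": 2})
for rank, observed, predicted in rows:
    print(rank, round(observed, 3), round(predicted, 3))
```

Plotting both columns against rank on log-log axes shows how closely the observed curve follows the straight line that Zipf's law predicts.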
