This is a guided project from Dataquest (Data Engineering track) in which I analyzed a data set of submissions to Hacker News (HN), a website focusing on computer science and entrepreneurship. I am interested in two specific categories of user-submitted posts: `Ask HN` posts, where users ask the HN community a specific question, and `Show HN` posts, where users present an interesting project or product to the HN community. I wanted to know the following:
- Do `Ask HN` or `Show HN` posts receive more comments on average?
- Do posts created at a certain time receive more comments on average?
- Do `Ask HN` or `Show HN` posts receive more upvotes (points) on average?
- Do posts created at a certain time receive more points on average?
- Do posts other than `Ask HN` or `Show HN` receive more comments and points on average?
I performed the following for the data analysis:
- Opening and exploration of the `hacker_news.csv` data set
- Extraction of `Ask HN` and `Show HN` posts
- Calculation of the average number of comments for `Ask HN` and `Show HN` posts
- Determination of the number of `Ask HN` posts and comments by hour created
- Calculation of the average number of comments for `Ask HN` posts by hour
- Calculation of the average number of upvotes (points) for `Ask HN` and `Show HN` posts
- Determination of the number of points by hour created for `Ask HN` posts
- Calculation of the average number of points for `Ask HN` posts by hour
- Calculation of the average number of comments and points for other posts
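As a rough illustration of the extraction and averaging steps, here is a minimal sketch using only the standard library. It is not the notebook's code: the column names `title`, `num_comments`, and `num_points` are assumptions about the CSV header, the helper names are hypothetical, and the sample rows are made up so the sketch runs without the file.

```python
import csv
from collections import defaultdict


def load_posts(path):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def categorize(title):
    """Label a post by its title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


def average_by_category(posts, field):
    """Average a numeric column (assumed name, e.g. num_comments) per category."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        category = categorize(post["title"])
        totals[category] += int(post[field])
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in counts}


# Made-up sample rows; with the real file, use load_posts("hacker_news.csv").
sample_posts = [
    {"title": "Ask HN: How do you learn SQL?", "num_comments": "10", "num_points": "4"},
    {"title": "Ask HN: Best laptop?", "num_comments": "20", "num_points": "6"},
    {"title": "Show HN: My side project", "num_comments": "6", "num_points": "30"},
    {"title": "Some news article", "num_comments": "2", "num_points": "8"},
]

print(average_by_category(sample_posts, "num_comments"))
print(average_by_category(sample_posts, "num_points"))
```

Keeping the categorization in one small function means the same pass can answer the comment, point, and "other posts" questions by changing only the `field` argument.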
The analysis shows that, to maximize the number of comments and upvotes a post receives, the post should be an `Ask HN` post created around 3:00 - 4:00 EST.
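The hour ranking behind a result like this could be computed along the following lines. This is a sketch under assumptions, not the notebook's implementation: it assumes a `created_at` column formatted like `8/4/2016 11:52` and a `num_comments` column, the function name is hypothetical, and the sample rows are invented for demonstration.

```python
from collections import defaultdict
from datetime import datetime


def average_by_hour(posts, field, fmt="%m/%d/%Y %H:%M"):
    """Return (hour, average) pairs sorted from highest to lowest average."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        # Parse the timestamp and keep only the hour of day (0-23).
        hour = datetime.strptime(post["created_at"], fmt).hour
        totals[hour] += int(post[field])
        counts[hour] += 1
    averages = {hour: totals[hour] / counts[hour] for hour in counts}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Invented rows standing in for the extracted Ask HN posts.
sample_ask_posts = [
    {"created_at": "8/16/2016 9:55", "num_comments": "6"},
    {"created_at": "11/22/2015 15:10", "num_comments": "40"},
    {"created_at": "5/2/2016 15:45", "num_comments": "20"},
]

for hour, average in average_by_hour(sample_ask_posts, "num_comments"):
    print(f"{hour:02d}:00 -> {average:.2f} average comments per post")
```

The hours produced this way are in whatever time zone the timestamps were recorded in, so any conversion to another zone would be a separate step.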
Please see the `hacker_news.csv` data set and the full exploratory data analysis in the Project 2.ipynb notebook above.