tiechengsu/Intro-Hadoop-MapReduce
Intro-Hadoop-MapReduce

Analyze discussion forum data with Hadoop and MapReduce

Useful field names:

Final Project:

"forum_node.tsv": "id", "title", "tagnames", "author_id", "body", "node_type", "parent_id", "abs_parent_id", "added_at", "score", "state_string", "last_edited_id", "last_activity_by_id", "last_activity_at", "active_revision_id", "extra", "extra_ref_id", "extra_count", "marked"

"forum_users.tsv": "user_ptr_id", "reputation", "gold", "silver", "bronze"
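
As a rough illustration of how the forum_node.tsv columns might be used in a Hadoop Streaming mapper, the sketch below reads tab-separated rows and emits one `(author_id, 1)` pair per node. The helper names `read_forum_nodes` and `map_posts_per_author` are hypothetical, and the code assumes that fields are double-quoted and that the first row is a header; neither is confirmed by this README.

```python
import csv

# Column order copied from the forum_node.tsv field list above.
FORUM_NODE_FIELDS = [
    "id", "title", "tagnames", "author_id", "body", "node_type",
    "parent_id", "abs_parent_id", "added_at", "score", "state_string",
    "last_edited_id", "last_activity_by_id", "last_activity_at",
    "active_revision_id", "extra", "extra_ref_id", "extra_count", "marked",
]


def read_forum_nodes(stream):
    """Yield one dict per well-formed row (hypothetical helper)."""
    # Assumption: fields are tab-separated and double-quoted, so tabs or
    # newlines inside "body" stay within a single field.
    reader = csv.reader(stream, delimiter="\t")
    for row in reader:
        # Skip the header row and rows with an unexpected column count.
        if len(row) != len(FORUM_NODE_FIELDS) or row[0] == "id":
            continue
        yield dict(zip(FORUM_NODE_FIELDS, row))


def map_posts_per_author(stream):
    """Mapper-style helper: emit (author_id, 1) for every node."""
    for node in read_forum_nodes(stream):
        yield node["author_id"], 1
```

In a streaming job the mapper would typically be fed `sys.stdin` and print each pair as a tab-separated line for a reducer to sum.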

StackExchange:

[('Body', "text here"), ('ViewCount', '1191'), ('Title', 'Why does the Macbook Pro Unibody crash on hibernate under Windows?'), ('LastActivityDate', '2009-07-15T21:15:21.323'), ('AnswerCount', '3'), ('CommentCount', '2'), ('AcceptedAnswerId', '3841'), ('Score', '4'), ('PostTypeId', '1'), ('OwnerUserId', '26'), ('Tags', ''), ('CreationDate', '2009-07-15T07:17:13.970'), ('FavoriteCount', '1'), ('Id', '37')]
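
A record in this list-of-pairs shape is easier to work with as a dictionary. The following is a minimal sketch, not the project's own code: `to_post` and `INT_FIELDS` are hypothetical names, and the choice of which fields to treat as integers is an assumption based only on the sample record above.

```python
# Assumption: these fields hold integers, judging from the sample record.
INT_FIELDS = {
    "ViewCount", "AnswerCount", "CommentCount", "AcceptedAnswerId",
    "Score", "PostTypeId", "OwnerUserId", "FavoriteCount", "Id",
}


def to_post(pairs):
    """Convert a list of (name, value) pairs into a dict (hypothetical helper).

    Values of fields listed in INT_FIELDS are cast to int; empty strings
    and all other fields are kept as they are.
    """
    post = {}
    for name, value in pairs:
        if name in INT_FIELDS and value != "":
            post[name] = int(value)
        else:
            post[name] = value
    return post


# Shortened version of the sample record shown above.
sample = [
    ("Body", "text here"), ("ViewCount", "1191"), ("AnswerCount", "3"),
    ("Score", "4"), ("PostTypeId", "1"), ("Tags", ""), ("Id", "37"),
]
post = to_post(sample)
```

With the numeric fields cast, a mapper can filter or aggregate on values such as `post["Score"]` or `post["ViewCount"]` without repeated string conversion.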

Analyzing Reddit comments:

  • subreddit: The subreddit the comment was posted in
  • author: Username of the comment author
  • body: Comment text
  • create_utc: UTC timestamp of when the comment was posted
  • ups: Comment upvotes
  • downs: Comment downvotes
  • gilded: 1 if the user was given Reddit gold for the comment, 0 otherwise
  • archived: 1 if the comment was archived, 0 otherwise
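
The fields above lend themselves to a simple map and reduce pair, for example summing net votes (`ups` minus `downs`) per subreddit. The sketch below assumes each input line is one JSON object carrying these fields; the input format is not stated in this README, and `map_comment` and `reduce_net_votes` are hypothetical names.

```python
import json
from collections import defaultdict


def map_comment(line):
    """Return (subreddit, net_votes) for one comment line, or None if unusable.

    Assumption: each line is a JSON object with the fields listed above.
    """
    try:
        comment = json.loads(line)
        net = int(comment["ups"]) - int(comment["downs"])
        return comment["subreddit"], net
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing field, or non-numeric vote count.
        return None


def reduce_net_votes(pairs):
    """Sum net votes per subreddit from (subreddit, net_votes) pairs."""
    totals = defaultdict(int)
    for subreddit, net in pairs:
        totals[subreddit] += net
    return dict(totals)
```

In an actual Hadoop Streaming job the reducer would receive its input sorted by key and could keep a running total instead of a dictionary; the dictionary version is used here only to keep the sketch short.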

About

Projects for the Udacity courses Intro to Hadoop and MapReduce and Deploying a Hadoop Cluster
