hydradon/devpost_hackathons-LDA

Description

In this project, I analyze the text descriptions written by hackathon participants.

First, I crawled Devpost for the metadata of all hackathon projects.

Then, for each project, I used Latent Dirichlet Allocation (LDA) to automatically extract topics from the written description. Each project page is structured as follows:

Inspiration
What it does
How we built it (or How I built it)
Challenges
Accomplishment
What we learned
What's next

I applied topic modeling to each of the sections above, separately, to find the main topics mentioned in it.
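To illustrate what fitting LDA on one section involves, here is a minimal sketch using a collapsed Gibbs sampler written with the Python standard library only. It is not the code from the notebook in Python_scripts, which may well rely on a dedicated library. The function names (tokenize, fit_lda, top_words), the stopword list, and the hyperparameter values are assumptions made for this example.

```python
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "we", "our", "for", "in", "it", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (topic_word, doc_topic) count tables.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


if __name__ == "__main__":
    # Made-up "How we built it" texts, for illustration only.
    texts = [
        "We built an app with React and Firebase",
        "Our model learned to classify images",
        "The app uses React for the frontend",
    ]
    topic_word, _ = fit_lda([tokenize(t) for t in texts], num_topics=2)
    print(top_words(topic_word))
```

In practice each of the seven sections would get its own model, so that, for example, topics about challenges are not mixed with topics about inspiration.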

The spiders

devpost: visits all projects at devpost/software/trending and crawls their metadata. Output: all_projects_raw.csv
devpost_hackathon: visits all hackathon pages extracted from the project metadata and crawls the hackathon metadata. Output: all_hackathons_raw.csv
devpost_app_page: visits all project pages and saves their HTML. Output is stored on figshare: https://figshare.com/s/73a5686bf6b1670092d4
dev_proj_desc_local: crawls the offline HTML pages (obtained above) and parses each text description into the 7 sections listed in the Description. Output: proj_description_raw_local.csv
devpost_hack_num_submission: crawls the hackathon pages and retrieves the number of submissions for each hackathon. Output: all_hackathons_numsub.csv
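The section-splitting step performed by dev_proj_desc_local can be sketched roughly as follows, using only the standard library. This is not the spider's actual code. It assumes that each section of a saved page starts with an h2 heading whose text begins with one of the section names; the real markup may differ. The names SECTION_PREFIXES, SectionParser, and split_sections are hypothetical.

```python
from html.parser import HTMLParser

# Lowercased heading prefix -> canonical section name (assumed mapping).
SECTION_PREFIXES = {
    "inspiration": "Inspiration",
    "what it does": "What it does",
    "how we built it": "How we built it",
    "how i built it": "How we built it",
    "challenges": "Challenges",
    "accomplishment": "Accomplishment",
    "what we learned": "What we learned",
    "what's next": "What's next",
}


class SectionParser(HTMLParser):
    """Collect the text that follows each recognized <h2> heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._in_heading = False
        self._heading_parts = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_heading:
            self._in_heading = False
            heading = " ".join("".join(self._heading_parts).split()).lower()
            # Unrecognized headings switch collection off until the next match.
            self._current = next(
                (name for prefix, name in SECTION_PREFIXES.items()
                 if heading.startswith(prefix)),
                None,
            )
            if self._current is not None:
                self.sections.setdefault(self._current, [])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def split_sections(html):
    """Return {section name: text} for one saved project page."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.sections.items()}
```

Each returned dictionary would correspond to one row of a file such as proj_description_raw_local.csv, with one column per section.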

Datasets

Three initial datasets can be downloaded at: https://figshare.com/s/73a5686bf6b1670092d4

They are:

raw_html_text.rar -> offline HTML pages of hackathon projects
githubdata.rar, gitlabdata.rar -> source-control activity data for the projects that provide a public Git link

Data cleaning and analysis

The Python scripts for data cleaning and a notebook for LDA are in the Python_scripts folder.
