All the data from GSoC-archive in JSON format.
NOTE: To run the scrapers, you must install the following dependencies:
- asyncio
- aiohttp
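You can do that by running: pip install asyncio aiohttp
(On Python 3.4 and later, asyncio is part of the standard library, so installing aiohttp is enough.)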
Repository layout:
- Data/
  - orgs/ - all orgs that have been a part of GSoC from 2005 to 2017
  - projects/ - all projects completed under the GSoC program from 2005 to 2017
- Scrapers/ - contains all the scrapers used for scraping the data
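As a rough illustration of how asyncio and aiohttp can work together in a scraper, here is a minimal sketch. It is not the code in Scrapers/; the function names `fetch_text`, `fetch_all` and `scrape`, and the concurrency limit of 5, are assumptions made for this example.

```python
import asyncio


async def fetch_text(session, url):
    """Download one page and return its body as text."""
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()


async def fetch_all(urls, fetch, limit=5):
    """Await fetch(url) for every URL, at most `limit` at a time.

    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(url) for url in urls))


async def scrape(urls):
    """Fetch all URLs through one shared aiohttp session."""
    import aiohttp  # imported here so the helpers above work without it

    async with aiohttp.ClientSession() as session:
        return await fetch_all(urls, lambda url: fetch_text(session, url))
```

With placeholder URLs, it could be run as `pages = asyncio.run(scrape(["https://example.org/a", "https://example.org/b"]))`. The semaphore keeps the number of simultaneous requests small so the scraped site is not flooded.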
Fields in the orgs/ files:
- 2005.json-2008.json
  - link: URL of the org
  - name: Name of the org
- 2009-2013.json
  - about: Work that the org does
  - link: URL of the org
  - mail: Mailing list of the org
  - name: Name of the org
  - page: Idea page of the org
- 2014-2015.json
  - link: URL of the org
  - mail: Mailing list of the org
  - page: Idea page of the org
  - name: Name of the selected org
- 2016-2017.json
  - about: Info about the organization
  - link: URL of the org
  - name: Name of the org
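Because the org fields differ by year, it can help to check a record against the fields documented for its year before using it. The sketch below assumes each org record is a JSON object (a Python dict once loaded) keyed by the field names listed above; `ORG_FIELDS`, `expected_org_fields` and `missing_org_fields` are names made up for this example.

```python
# Documented org fields per year range, copied from the field lists above.
ORG_FIELDS = {
    range(2005, 2009): {"link", "name"},
    range(2009, 2014): {"about", "link", "mail", "name", "page"},
    range(2014, 2016): {"link", "mail", "page", "name"},
    range(2016, 2018): {"about", "link", "name"},
}


def expected_org_fields(year):
    """Return the set of field names documented for the given year."""
    for years, fields in ORG_FIELDS.items():
        if year in years:
            return fields
    raise ValueError(f"no org data documented for {year}")


def missing_org_fields(record, year):
    """Return the documented fields that are absent from one org record."""
    return expected_org_fields(year) - set(record)
```

For example, `missing_org_fields({"link": "...", "name": "..."}, 2010)` reports that `about`, `mail` and `page` are absent from that record.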
Fields in the projects/ files:
- 2005.json-2008.json
  - Mentor: Name of the mentor of the project
  - project: Name of the project
  - student: Name of the student
- 2009-2013.json & 2014-2015.json
  - Organization: Name of the organization
  - detail: Detail about the project
  - link: Link to the project
  - student: Name of the selected student
  - title: Name of the project
- 2016-2017.json
  - Organization: Name of the organization
  - link: Link to the project
  - mentors: Names of the mentors
  - student: Name of the student
  - title: Name of the project
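Since the project fields change between year ranges, analysis code is simpler if every record is first mapped onto one common shape. The following is a minimal sketch under two assumptions that are not confirmed by this README: each file holds a JSON array of objects, and the caller knows which year a file belongs to. The output keys and the function names are hypothetical.

```python
import json


def normalize_project(record, year):
    """Map one raw project record onto a common set of keys.

    Fields that are not documented for a year range are set to None.
    """
    if year <= 2008:
        # 2005-2008 records use Mentor / project / student.
        return {
            "year": year,
            "organization": None,
            "title": record.get("project"),
            "student": record.get("student"),
            "mentors": record.get("Mentor"),
            "detail": None,
            "link": None,
        }
    # 2009-2017 records use Organization / title / student / link,
    # plus detail (2009-2015) or mentors (2016-2017).
    return {
        "year": year,
        "organization": record.get("Organization"),
        "title": record.get("title"),
        "student": record.get("student"),
        "mentors": record.get("mentors"),
        "detail": record.get("detail"),
        "link": record.get("link"),
    }


def load_projects(path, year):
    """Read one projects file and normalize every record in it."""
    with open(path, encoding="utf-8") as handle:
        return [normalize_project(record, year) for record in json.load(handle)]
```

Keeping absent fields as None, rather than dropping the keys, lets later code treat all years uniformly and decide for itself how to handle gaps.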
This data will be used for improving the functionality of Soccer.
It can also be used to generate various stats, plots or answer data-related questions like:
- Who did the most GSoCs, and under which org?
- Which org has the highest student-to-mentor conversion rate? (students who first did GSoC under the org, and then became mentors)
- Run some magic on the project descriptions over the years to find out whether there is a trend of ML-related projects.
etc. etc.
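As a starting point for the first question, here is a small sketch that ranks students by number of projects. It assumes the project records have already been loaded into a list of dicts with the `student` and `Organization` keys described earlier; `most_frequent_students` is a made-up name. Matching students by name alone is only approximate, since two people can share a name and one person's name can be spelled differently across years, and the 2005-2008 field list has no `Organization` key.

```python
from collections import Counter, defaultdict


def most_frequent_students(projects, top=3):
    """Rank students by project count and list the orgs they worked with."""
    counts = Counter()
    orgs = defaultdict(set)
    for record in projects:
        student = record.get("student")
        if not student:
            continue  # skip records without a student name
        counts[student] += 1
        org = record.get("Organization")
        if org:
            orgs[student].add(org)
    return [(name, n, sorted(orgs[name])) for name, n in counts.most_common(top)]
```

Each result is a `(student, project_count, organizations)` tuple, so the top entry answers both "who" and "under which org".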
Feel free to open issues to discuss any more ideas!