gousiosg edited this page Mar 21, 2012 · 17 revisions

# GHTorrent

Welcome to the GHTorrent project, an effort to bring GitHub's data into the hands of the software engineering research community without taxing GitHub.

GHTorrent reads events from the GitHub Events stream, stores them in a MongoDB database, retrieves the data each event refers to, and distributes the result as MongoDB dumps over BitTorrent.
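The read, store, and retrieve steps described above can be illustrated with a minimal Python sketch. This is not GHTorrent's actual code: `InMemoryStore` is a hypothetical stand-in for a MongoDB collection, `mirror_events` and `fetch_json` are names invented for this example, and the sketch assumes each event carries an `id` and a `repo.url` field, as events from GitHub's public `https://api.github.com/events` endpoint do. Producing the dumps and distributing them over BitTorrent is left out.

```python
import json
import urllib.request

# GitHub's public events endpoint.
EVENTS_URL = "https://api.github.com/events"


def fetch_json(url):
    """Fetch a URL and decode its JSON body (requires network access)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "ghtorrent-sketch",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class InMemoryStore:
    """Hypothetical stand-in for a MongoDB collection, keyed by a unique id."""

    def __init__(self):
        self.docs = {}

    def insert_if_new(self, key, doc):
        """Store doc under key; return False if the key was already present."""
        if key in self.docs:
            return False
        self.docs[key] = doc
        return True


def mirror_events(fetch, events, entities, events_url=EVENTS_URL):
    """Store unseen events, then retrieve and store the repository each refers to.

    Returns the number of newly stored events.
    """
    stored = 0
    for event in fetch(events_url):
        if not events.insert_if_new(event["id"], event):
            continue  # event already mirrored on an earlier pass
        stored += 1
        repo_url = event.get("repo", {}).get("url")
        # Retrieve each referenced repository only once.
        if repo_url and repo_url not in entities.docs:
            entities.insert_if_new(repo_url, fetch(repo_url))
    return stored
```

Calling `mirror_events(fetch_json, InMemoryStore(), InMemoryStore())` would run one pass against the live API. Passing the fetch function as an argument keeps the sketch testable without network access, and keying events by `id` makes repeated passes idempotent.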

All original data is the copyright of its respective owners.

The GHTorrent project is brought to you by the SENSE group at the Athens University of Economics and Business. If you use the dataset for research, please provide a reference to the following work:

Georgios Gousios and Diomidis Spinellis, "GHTorrent: GitHub’s data from a firehose," in MSR '12: Proceedings of the 9th Working Conference on Mining Software Repositories, June 2–3, 2012. Zurich, Switzerland.
