Creates a GitHub index for similar-repositories discovery. You can see the working website here: Gazing Stargazers.
WIP.
To index popular repositories (> 200 stars):
node repoIndexer.js --tokens="COMMA_SEPARATED_LIST_OF_GITHUB_TOKENS" > allrepo.json
This will save a JSON stream of repositories with >= 200 stars into the file allrepo.json.
If you think something is not going right, you can enable logging by setting the ENABLE_LOG variable:
ENABLE_LOG=1 node repoIndexer.js --tokens=...
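Once this step completes, the output can be consumed line by line. This is only a sketch, assuming each line of allrepo.json is a standalone JSON record (the exact record shape and field names, such as full_name, are assumptions):

```js
// Sketch: stream repositories out of allrepo.json, one JSON record per line (assumed format).
const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('allrepo.json'),
  crlfDelay: Infinity
});

rl.on('line', (line) => {
  if (!line.trim()) return;       // skip empty lines
  const repo = JSON.parse(line);  // one repository record (assumed shape)
  console.log(repo.full_name);    // field name is an assumption
});
```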
The second step in building the recommendation index is to gather followers of the popular repositories. To do so, run:
node followersIndex.js allrepo.json ./db/followers --tokens="COMMA_SEPARATED_LIST_OF_GITHUB_TOKENS"
This will create a new leveldb database followers inside the db folder. The database will include all repositories from allrepo.json, along with the users who gave them a star.
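To sanity-check the result, you can peek into the database with the level package. This is only a sketch, assuming the classic callback-style level API (pre-v8) and that each key is a repository with its stargazers stored as the value (the exact key/value encoding is an assumption):

```js
// Sketch: print the first few records of the followers database.
const level = require('level');

const db = level('./db/followers');

db.createReadStream({ limit: 5 })
  .on('data', ({ key, value }) => {
    // key: repository; value: its stargazers (encoding is an assumption)
    console.log(key, '->', value);
  })
  .on('end', () => db.close());
```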
The last indexing step is to collect all repositories that are starred by the users found in step 2. To do so, run:
node starsIndexer.js ./db/followers ./db/stars --tokens="COMMA_SEPARATED_LIST_OF_GITHUB_TOKENS"
This will read all unique followers from the followers database ./db/followers, constructed in step 2, and will output results into a database called ./db/stars. Each record in ./db/stars will have a user name as the key and that user's starred repositories as the value.
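For example, a single user's record could be looked up like this. The sketch assumes the classic callback-style level API, a JSON-encoded value, and a hypothetical user name someUser:

```js
// Sketch: look up one (hypothetical) user's starred repositories in ./db/stars.
const level = require('level');

const db = level('./db/stars');

db.get('someUser', (err, value) => {
  if (err) return console.error('lookup failed:', err);
  const starred = JSON.parse(value); // JSON encoding is an assumption
  console.log('someUser starred', starred.length, 'repositories');
  db.close();
});
```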
This is the most time-consuming step. As of June 2014, GitHub had 13,000+ repositories with more than 200 stars, which translates to 600,000+ unique users who gave stars to popular repositories.
Even though the majority of users starred fewer than 100 projects, we still need at least one request per user to fetch their stars. In other words, we need to make more than 600,000 requests to GitHub.
GitHub's current rate limit is 5,000 requests per hour, so indexing with a single token takes 600,000 / 5,000 = 120 hours of work.
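The same estimate in code, so you can plug in your own token count:

```js
// Back-of-the-envelope estimate of indexing time for the stars step.
const uniqueUsers = 600000;            // users discovered in step 2
const requestsPerHourPerToken = 5000;  // GitHub API rate limit per token
const tokens = 1;                      // how many tokens you pass via --tokens

const hours = uniqueUsers / (requestsPerHourPerToken * tokens);
console.log(hours + ' hours'); // 120 hours with a single token
```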
Good news: this indexer can be interrupted and resumed at any time.
Now that we have all popular repositories along with their stargazers, let's construct the recommendations database.
MIT