pga-create: improve files generated from the analyzed data #125
Changes:
repack command: it now repacks only watchers.csv and projects.csv from
the GHTorrent MySQL dump.
discover command: it now generates a single file, repositories.csv.gz,
containing all the information needed in the next steps
(repository name, number of stars).
select command: it can only filter by number of stars. It produces two
outputs: a list of repository URLs on stdout, one per line, to feed
borges, and a file named repositories-index.csv.gz (filtered from
repositories.csv.gz) to be used by the index generation command.
index command: it uses repositories.csv.gz by default. To make it use
the filtered information in repositories-index.csv.gz instead, the -r
flag must be used.
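
The select step described above can be sketched roughly as follows. This is
an illustration only, not the pga-create implementation: the column names
`name` and `stars`, the `https://github.com/` URL prefix, the sample rows,
and the helper names `select_by_stars` and `split_outputs` are all
assumptions made for the example. The real files are gzip-compressed; the
sketch works on plain in-memory CSV text to stay short.

```python
import csv
import io

# Hypothetical sample data; the real column layout is not given in this PR.
SAMPLE = "name,stars\nexample/alpha,50\nexample/beta,3\n"


def select_by_stars(rows, min_stars):
    """Keep only the rows whose star count is at least min_stars."""
    return [row for row in rows if int(row["stars"]) >= min_stars]


def split_outputs(rows):
    """Build the two outputs of the select step for the given rows.

    Returns (url_lines, index_csv_text):
    - url_lines: one repository URL per line (what would go to stdout)
    - index_csv_text: the filtered CSV (what would become the index file)
    """
    # Assumed URL scheme: host prefix plus the repository name.
    urls = "".join(f"https://github.com/{row['name']}\n" for row in rows)

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "stars"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return urls, buf.getvalue()


if __name__ == "__main__":
    all_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    urls, index_csv = split_outputs(select_by_stars(all_rows, 10))
    print(urls, end="")
```

Keeping the filter separate from the output step mirrors the description:
the same selected rows feed both the URL list and the filtered index file.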
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>