Releases: src-d/datasets
v5.3.0
v5.2.0
Change Log
v5.2.0 (2019-09-26)
New features:
- New pga siva set of commands to work with siva files: raw unpack, dump revisions, list Git commits and references.
- New pga2uast CLI app to extract Babelfish UASTs from siva.
- New list-pga-heads CLI app to list files in HEAD revisions in siva.
- pga and pga-create are built with Go modules instead of dep.
v5.1.0
Change Log
v5.1.0 (2019-07-31)
Closed issues:
Merged pull requests:
- cmd: fix get, do not download concurrently the same siva file #150 (mcarmonaa)
- Add notebooks related to function deduplication experiments #149 (EgorBu)
- Add configurable timeout in pga2uast #148 (vmarkovtsev)
- Add the missing information about the duplicates dataset #147 (vmarkovtsev)
- Fix blacklisting bugs #146 (vmarkovtsev)
- Add monitor mode #145 (vmarkovtsev)
- Add go.sum and refactor the output file name composition #144 (vmarkovtsev)
- Survive some huge repositories #143 (vmarkovtsev)
- Add pga2uast #141 (vmarkovtsev)
- Add the missing stage3 script by @EgorBu #139 (vmarkovtsev)
- Clarify the dockerfile claim #138 (vmarkovtsev)
- Fix the typos download size #136 (vmarkovtsev)
- Add the typos dataset #135 (vmarkovtsev)
- Add DockerHub Metadata dataset #134 (vmarkovtsev)
- Harden the message parsing script #133 (vmarkovtsev)
- Fix MAINTAINERS file #131 (zurk)
- Add siva-head tool #130 (vmarkovtsev)
v5.0.1-rc.6
v5.0.1-rc.5
v5.0.1-rc.4
Change Log
v5.0.1-rc.4 (2019-04-11)
Merged pull requests:
- Add the commit features dataset to the root README index #126 (warenlg)
- pga-create: improve files generated from the analyzed data #125 (mcarmonaa)
- Update the notebook and the link of the PR comments dataset #124 (warenlg)
- Update the review comments dataset README #123 (vmarkovtsev)
- Structural features dataset #122 (Jan21)
- Add commit messages dataset #120 (vmarkovtsev)
v5.0.1-rc.3
v5.0.1-rc.2
v5.0.1-rc.1
Change Log
v5.0.1-rc.1 (2018-03-20)
Implemented enhancements:
- Provide md5 for index #57
- [Feature request] Add stars to index file #43
- Make PGA downloader safe to cancel #39
- [Feature request] Add repository size to index #27
Fixed bugs:
- borges-indexer fails to run with database schema from latest borges version #48
- update lint repo #89 (kuba--)
Closed issues:
- Support selecting version in pga downloader #108
- When would one repository contains more than one sivaFile? #98
- pga get --stdin file.txt freezes #82
- Merge gitbook documentation and make it the main one #75
- Support https on pga.sourced.tech #60
- Logo for Public Git Archive #55
- check md5 files to decide whether they should be downloaded again #37
- unexpected EOF leads to corrupted siva files. #36
Merged pull requests:
- *: fix travis file to push the pga-create docker image correctly #115 (mcarmonaa)
- pga-create docker image #114 (mcarmonaa)
- pga-create: add GO_TAGS = norwfs because it uses go-git v4.5.0 #113 (mcarmonaa)
- pga: add flag to the command to select pga version to download #111 (mcarmonaa)
- pga: add field SIZE to the index #109 (mcarmonaa)
- Improve dataset descriptions #106 (marnovo)
- Add Duplicates dataset #105 (vmarkovtsev)
- pga: add compatibility to read index with or without stars field. #104 (mcarmonaa)
- Add stars field to pga index #101 (mcarmonaa)
- Supress md5 logs. #88 (kuba--)
- fixed typo #87 (namhsuya)
- Fix typo #84 (gomesfernanda)
- [PGA] simplify pga-create usage #81 (smola)
- [PGA] fix stars handling, reduce memory #80 (smola)
- [PGA] add pga-create repack command #79 (smola)
- [PGA] consolidate multitool and borges-indexer into pga-create #78 (smola)
- pga: avoid warning when local copy doesn't exist #72 (campoy)
- Add the Identifiers dataset #71 (vmarkovtsev)
- pga: safe downloads #69 (smola)
- Add material to compile the PGA poster #67 (warenlg)
- Upgrade to upstream core-retrieval #64 (erizocosmico)
- provide windows binaries #62 (campoy)
- Update the list of maintainers #61 (vmarkovtsev)
- use MD5 hashes when available #58 (campoy)
v5.0.0-rc2
This new version introduces a new tool to explore and download the Public Git Archive dataset: pga.
Read the pga documentation for more details on how to use it, and file any issues you might find.