ENH: Speed up S3DataGrabber using prefix arg #2143
Merged
Changes proposed in this pull request
When finding files on S3 (using `S3DataGrabber`), the command `bkt_files = list(k.key for k in bkt.list())` lists every file in the given bucket, which takes a very long time for the `openneuro` and `openfmri` buckets. On my computer, this command (using the example from `nipype/interfaces/tests/test_io.py`) takes 170 seconds. With the change proposed in this PR, which passes `bucket_path` as the `prefix` argument in the `bkt.list()` call, it takes only 116 milliseconds, because the file search is restricted to the files under the specified `bucket_path`. The interface still works when `bucket_path` is not set as input (i.e., it works with the default `''` value of the `bucket_path` parameter).
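
For illustration, here is a minimal sketch of the idea rather than the actual diff. `FakeBucket`, `Key`, `list_bucket_files`, and the file names are hypothetical stand-ins made up so the example runs without network access; only the `bkt.list(prefix=...)` call pattern and the `''` default come from the description above.

```python
from collections import namedtuple

# Hypothetical stand-in for a bucket key object; only `.key` is used here.
Key = namedtuple("Key", ["key"])


class FakeBucket:
    """Hypothetical in-memory stand-in for an S3 bucket (illustration only)."""

    def __init__(self, names):
        self._names = sorted(names)

    def list(self, prefix=""):
        # This stand-in filters locally. With a real bucket, the prefix is
        # expected to limit which keys are returned, which is where the
        # speed-up described above would come from.
        for name in self._names:
            if name.startswith(prefix):
                yield Key(name)


def list_bucket_files(bkt, bucket_path=""):
    """Return key names under bucket_path; '' matches every key."""
    return [k.key for k in bkt.list(prefix=bucket_path)]


# Made-up file names, for demonstration only.
bkt = FakeBucket([
    "ds000101/sub-01/anat/sub-01_T1w.nii.gz",
    "ds000101/sub-02/anat/sub-02_T1w.nii.gz",
    "ds000102/sub-01/anat/sub-01_T1w.nii.gz",
])

subset = list_bucket_files(bkt, "ds000101/")  # only keys under the prefix
everything = list_bucket_files(bkt)           # default '' keeps old behaviour
```

Because every string starts with the empty string, the default `bucket_path=''` lists the whole bucket, which matches the behaviour before this change.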