Faster ListingTable Listing #6182

Closed
tustvold opened this issue May 1, 2023 · 0 comments · Fixed by #6183
Assignees
Labels
enhancement New feature or request

Comments

@tustvold
Contributor

tustvold commented May 1, 2023

Is your feature request related to a problem or challenge?

ListingTable currently uses a very naive algorithm to find the files in a dataset: it serially lists every file in the dataset and only then applies pruning to each returned file.

Describe the solution you'd like

The partition pruning logic should instead list each partition separately, ideally in parallel, prune those partitions, and then return the files they contain.
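
To make the proposal concrete, here is a minimal std-only sketch of the idea. It is not the DataFusion implementation. `FlatStore`, `Listing`, `partition_value` and `list_pruned` are hypothetical names invented for illustration, and `list_with_delimiter` only loosely imitates the object store method of the same name. The sketch assumes a single level of hive-style `key=value` partition directories and spawns one thread per surviving partition, so it does not cap concurrency.

```rust
use std::collections::BTreeSet;
use std::thread;

/// Hypothetical stand-in for an object store: a flat list of object paths.
pub struct FlatStore {
    paths: Vec<String>,
}

/// Direct children of a prefix: sub-"directories" and plain objects.
pub struct Listing {
    pub common_prefixes: Vec<String>,
    pub objects: Vec<String>,
}

impl FlatStore {
    pub fn new(paths: &[&str]) -> Self {
        Self { paths: paths.iter().map(|p| p.to_string()).collect() }
    }

    /// List only the direct children of `prefix` (empty, or ending in '/').
    pub fn list_with_delimiter(&self, prefix: &str) -> Listing {
        let mut dirs = BTreeSet::new();
        let mut objects = Vec::new();
        for path in &self.paths {
            let Some(rest) = path.strip_prefix(prefix) else { continue };
            match rest.split_once('/') {
                // Something deeper exists: record only the first path segment.
                Some((dir, _)) => {
                    dirs.insert(format!("{prefix}{dir}/"));
                }
                // No further delimiter: this is a file directly under `prefix`.
                None => objects.push(path.clone()),
            }
        }
        Listing { common_prefixes: dirs.into_iter().collect(), objects }
    }
}

/// Extract `value` from a prefix whose last segment is `key=value`.
fn partition_value<'a>(prefix: &'a str, key: &str) -> Option<&'a str> {
    let segment = prefix.trim_end_matches('/').rsplit('/').next()?;
    let (k, v) = segment.split_once('=')?;
    (k == key).then_some(v)
}

/// List partitions, prune them by value, then list the survivors in parallel.
pub fn list_pruned<F>(store: &FlatStore, table_root: &str, key: &str, keep: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    // One shallow listing discovers the partitions without listing their files.
    let partitions = store.list_with_delimiter(table_root).common_prefixes;

    // Prune on the partition value before any file is listed.
    let kept: Vec<String> = partitions
        .into_iter()
        .filter(|p| partition_value(p, key).map_or(false, |v| keep(v)))
        .collect();

    // List each surviving partition on its own scoped thread.
    let mut files: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = kept
            .iter()
            .map(|p| s.spawn(move || store.list_with_delimiter(p).objects))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("listing thread panicked"))
            .collect()
    });
    files.sort();
    files
}
```

With a store holding files under `table/year=2022/` and `table/year=2023/`, calling `list_pruned(&store, "table/", "year", |v| v == "2023")` never lists the contents of the 2022 partition. A real implementation would more likely use async listing with a bounded number of in-flight requests rather than one OS thread per partition.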

Describe alternatives you've considered

No response

Additional context

No response

@tustvold tustvold added the enhancement New feature or request label May 1, 2023
@tustvold tustvold self-assigned this May 1, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue May 1, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue May 1, 2023
tustvold added a commit that referenced this issue May 17, 2023
* Faster ListingTable partition listing (#6182)

* Fix strip_prefix

* Fix strip_prefix

* Implement list_with_delimiter for MirroringObjectStore

* Use split_terminator

* Fix MirroringObjectStore::list_with_delimiter

* Fix logical conflict

* Add logs

* Limit concurrency

* Increase concurrency limit

* Review feedback