Add a memory bound FileStatisticsCache for the Listing Table #20047
Open
mkleen wants to merge 6 commits into apache:main
mkleen (Contributor, Author): @kosiew Thank you for the feedback!
Which issue does this PR close?
This change introduces a default `FileStatisticsCache` implementation for the Listing-Table with a size limit, implementing the steps outlined in #19052 (comment):
- Add heap size estimation for file statistics and the relevant data types used in caching. (This is temporary until "Add heap memory estimation for statistics" #19599 and "Add a crate for HeapSize trait" arrow-rs#9138 are resolved.)
- Redesign `DefaultFileStatisticsCache` to use a `LruQueue` to make it memory-bound, following "Adds memory-bound DefaultListFilesCache" #18855.
- Introduce a size limit and use it together with the heap size to limit the memory usage of the cache.
- Move `FileStatisticsCache` creation into `CacheManager`, making it session-scoped and shared across statements and tables.
- Disable caching in some of the SQL-logic tests where the change altered the output result, because the cache is now session-scoped and no longer query-scoped.
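
To illustrate how a heap-size estimate and a size limit can work together, here is a minimal, standard-library-only sketch. It is not the code in this PR: the `HeapSize` trait, `FileStats`, and `BoundedStatsCache` below are hypothetical stand-ins, a plain `VecDeque` replaces `LruQueue`, and the size estimate (key length plus vector payload) is a deliberate simplification.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a heap-size estimation trait.
trait HeapSize {
    /// Estimated number of heap bytes owned by `self`.
    fn heap_size(&self) -> usize;
}

/// Hypothetical, heavily simplified per-file statistics.
#[derive(Clone, Debug, PartialEq)]
struct FileStats {
    num_rows: usize,
    column_mins: Vec<i64>,
}

impl HeapSize for FileStats {
    fn heap_size(&self) -> usize {
        // Only the vector payload lives on the heap in this toy type.
        self.column_mins.len() * std::mem::size_of::<i64>()
    }
}

/// LRU cache bounded by estimated bytes rather than by entry count.
struct BoundedStatsCache {
    limit: usize,
    used: usize,
    /// Path -> (statistics, estimated size recorded at insertion).
    entries: HashMap<String, (FileStats, usize)>,
    /// Recency order: front is least recently used.
    order: VecDeque<String>,
}

impl BoundedStatsCache {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Drops an entry (if present) and releases its accounted bytes.
    fn remove(&mut self, path: &str) {
        if let Some((_, size)) = self.entries.remove(path) {
            self.used -= size;
            self.order.retain(|p| p.as_str() != path);
        }
    }

    /// Looks up an entry and marks it as most recently used.
    fn get(&mut self, path: &str) -> Option<&FileStats> {
        if self.entries.contains_key(path) {
            self.order.retain(|p| p.as_str() != path);
            self.order.push_back(path.to_string());
        }
        self.entries.get(path).map(|(stats, _)| stats)
    }

    /// Inserts an entry, evicting least recently used entries until it fits.
    fn put(&mut self, path: &str, stats: FileStats) {
        self.remove(path); // replace any previous value for this path
        let size = path.len() + stats.heap_size();
        if size > self.limit {
            return; // an entry larger than the whole budget is never cached
        }
        while self.used + size > self.limit {
            match self.order.pop_front() {
                Some(victim) => {
                    if let Some((_, s)) = self.entries.remove(&victim) {
                        self.used -= s;
                    }
                }
                None => break,
            }
        }
        self.entries.insert(path.to_string(), (stats, size));
        self.order.push_back(path.to_string());
        self.used += size;
    }
}
```

In this sketch the budget is enforced on every `put`, so `used` never exceeds `limit`, and an entry that alone exceeds the limit is simply skipped. A session-scoped variant would additionally wrap the cache in something like `Arc<Mutex<...>>` so that several tables and statements can share it; that part is omitted here.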
- Closes "Add a default `FileStatisticsCache` implementation for the `ListingTable`" #19217
- Closes "Add limit to `DefaultFileStatisticsCache`" #19052

Rationale for this change
See above.
What changes are included in this PR?
See above.
Are these changes tested?
Yes.
Are there any user-facing changes?
A new runtime setting `datafusion.runtime.file_statistics.cache_limit`