Add runtime config options for list_files_cache_limit and list_files_cache_ttl#19108
Add runtime config options for list_files_cache_limit and list_files_cache_ttl#19108alamb merged 11 commits intoapache:mainfrom
list_files_cache_limit and list_files_cache_ttl#19108Conversation
list_files_cache_limit and list_files_cache_ttl
|
Hi @BlakeOrth @alamb, this is ready for review |
BlakeOrth
left a comment
There was a problem hiding this comment.
This overall looks very nice to me, thanks! Perhaps I'm missing it, but is there anyway to set the TTL back to infinity (None) after it's been set to Some(Duration)?
|
@BlakeOrth Yep, you can run |
BlakeOrth
left a comment
There was a problem hiding this comment.
My approval doesn't have any real power here, but this all looks good to me. The CI failure seems like it's probably unrelated. I can't imagine this work having any effect on benchmarks.
|
Thanks @delamarch3 and @BlakeOrth -- I'll try and check this out soon |
alamb
left a comment
There was a problem hiding this comment.
Thanks for this PR @delamarch3 and @BlakeOrth 🙏
I think this PR would be better with unit tests for time parsing but perhaps we can add that as a follow on PR. Otherwise it looks really nice 👌
The values for these will not update after running SET, unless the runtime has been configured with a ListFilesCache (it's None by default, so there is nothing to update)
This confused me for a while but it makes sense to me and will be fixed with
Ah sorry, should have been more clear, I meant in the datafusion-cli. I configured it like this to test it out: diff --git a/datafusion-cli/src/main.rs b/datafusion-cli/src/main.rs
index de666fced..abed30ea3 100644
--- a/datafusion-cli/src/main.rs
+++ b/datafusion-cli/src/main.rs
@@ -23,6 +23,8 @@ use std::process::ExitCode;
use std::sync::{Arc, LazyLock};
use datafusion::error::{DataFusionError, Result};
+use datafusion::execution::cache::cache_manager::CacheManagerConfig;
+use datafusion::execution::cache::DefaultListFilesCache;
use datafusion::execution::context::SessionConfig;
use datafusion::execution::memory_pool::{
FairSpillPool, GreedyMemoryPool, MemoryPool, TrackConsumersPool,
@@ -222,6 +224,11 @@ async fn main_inner() -> Result<()> {
);
rt_builder = rt_builder.with_object_store_registry(instrumented_registry.clone());
+ rt_builder = rt_builder.with_cache_manager(
+ CacheManagerConfig::default()
+ .with_list_files_cache(Some(Arc::new(DefaultListFilesCache::default()))),
+ );
+
let runtime_env = rt_builder.build_arc()?;
// enable dynamic file query |
No worries -- I think it is a temporary situation so we should be good to go |
|
Thanks @delamarch3 and @BlakeOrth |
Which issue does this PR close?
Rationale for this change
Make the list file cache memory limit and TTL configurable via runtime config.
What changes are included in this PR?
SETandRESETlist_files_cache_limitandlist_files_cache_ttllist_files_cache_ttlwill expect the duration to look like either1m30sor30(I'm wondering if it would be simpler for it to just accept a single unit?)update_cache_ttl()to theListFilesCachetrait so we can update it fromRuntimeEnvBuilder::build()Are these changes tested?
Yes
Are there any user-facing changes?
update_cache_ttl()has been added to theListFilesCachetrait