-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: configurable IO runtime #2789
Conversation
5726f77
to
41b1fdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ion-elgreco, looking great.
The only comment is around how to enable this... As we are looking into some performance optimizations, the delta_rs_storage_hadler
, may actually also be the place where we may want to introduce some simple caching for the commit files (specifically the json commits) in order to keep the log replay logic as simple as possible.
The above is just another argument, why i think we should consider passing the config via the DeltaTableOptions
...
this will also allow us to pass in configuration for the runtime itself. For now it's ok, but if we GA this feature, we probably want to expose some sort of way to pass configuration to the runtime.
@roeap do you mean the DeltaTableLoadOptions which is used in the table builder? |
i thought maybe this one: delta-rs/crates/core/src/table/builder.rs Lines 53 to 82 in eea53f4
|
@roeap Hmm does it need to be accessible later though? Below could work also I believe, we can still save it in the DeltaTableConfig but wondering where else it will be used #[derive(Debug)]
pub struct DeltaTableLoadOptions {
....
pub log_batch_size: usize,
pub io_runtime_config: Option<IORuntimeConfig>, // If passed enable experimental IO runtime with custom config
} Then DeltaTableBuilder has |
Yeah, this triggers some memories - at some point i felt there is a bunch of config distributed in various ways throughout our codebase, and we need to consolidate into a single way of handling config. The duplication of Naming wise I do find that "..LoadOptions" is the worse name, as it implies that this only relates to the initial table load? |
@ion-elgreco - after a quick glance at the relevant code, it feels maybe the some options should be flattened into the |
Yeah that makes sense! Now it's double 😆 I'll do another round on this tomorrow then to refactor the table builder a bit and add the IO runtime config on the table config. Will have to take a look at what is normally configurable on a Tokio runtime :) |
c45e0fc
to
5a949e9
Compare
4f39227
to
aecfb60
Compare
aecfb60
to
5fa5c7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this turned out quote nice, great work! 👍
Description
A configured Tokio handle can be passed into the TableBuilder nested in the RuntTime enum, conveniently added a default which simply creates a new runtime and uses that handle.
Related issues:
to_pyarrow_table()
on a table in S3 kept getting "Generic S3 error: error decoding response body" #2595, AsyncChunkReader::get_bytes error: Generic MicrosoftAzure error: error decoding response body #2592