-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Adds Object Store Profiling options/commands to CLI #18004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Adds a CLI option and command to datafusion-cli to enable or disabled object store profiling - Integrates the command with the instrumented object stores to allow the user input to change the mode of the instrumented stores - Adds tests to exercise the expected behavior of the commands - Adds user docs for the commands/CLI options - Updates visibility of `InstrumentedObjectStore` now that it needs to be interacted with outside of its module
7afd282 to
4060b14
Compare
|
Oops, forgot to cc @alamb Sorry this one is a bit longer. A decent chunk is added/new tests so hopefully the review for those lines ends up being quicker than the runtime code. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @BlakeOrth -- this looks great to me
I also played around with it locally:
> \object_store_profiling
ObjectStore Profile mode is Disabled
> \object_store_profiling
ObjectStore Profile mode is Disabled
> \object_store_profiling on
Execution error: Failed to parse input: on. Valid options are disabled, enabled
> \object_store_profiling enabled
ObjectStore Profile mode set to Enabled
> \object_store_profiling enabled
ObjectStore Profile mode set to Enabled
> \object_store_profiling disabled
ObjectStore Profile mode set to Disabled| pub async fn main() { | ||
| let my_ctx = MyUnionerContext::default(); | ||
|
|
||
| let profile_mode = InstrumentedObjectStoreMode::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor code structure thing (not needed) but I think it would be nicer to encapsulate more of this construction into the InstrumentedObjectStoreRegistry
Maybe something like putting the default registries in ::new()
``rust
let instrumented_registry = Arc::new(InstrumentedObjectStoreRegistry::new());
And then maybe you can add builder style APIs if you need, like
```rust
let instrumented_registry = InstrumentedObjectStoreRegistry::new()
.with_inner(inner_registry)
.with_profile_mode(mode)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicated code was also bothering me, but I was having trouble of thinking of a clean way to reduce the duplication at the time. I like this suggestion. Considering the InstrumentedObjectStoreRegistry is currently only ever used with a DefaultObjectStoreRegistry as it's inner I'm inclined to only implement with_profile_mode to avoid functionally dead code. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and modified new() and added the with_profile_mode() as a builder style method. I'm happy to also add with_inner() quickly if you'd prefer having that available in spite of it not (yet?) being used. This was an excellent change for the ergonomics of this type!
|
|
||
| /// Returns the current [`InstrumentedObjectStoreMode`] for this | ||
| /// [`InstrumentedObjectStoreRegistry`] | ||
| pub fn mode(&self) -> InstrumentedObjectStoreMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please name the methods consistently? mode and set_instrument_mode are not consistent
I think this would be consistent with the other method and the name of the field
| pub fn mode(&self) -> InstrumentedObjectStoreMode { | |
| pub fn instrument_mode(&self) -> InstrumentedObjectStoreMode { |
| for store in self.instrumented_registry.stores() { | ||
| writeln!(writer, "{store}")?; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if (as a follow on PR) we should also move the timing into this structure too
I think we mentioned before it is somewhat strange that something named PrintOptions has an object store registry, but I see why it is like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the PrintOptions change is unfortunate here. If you have a recommendation on a better way to handle this I'd be happy to explore a different solution. The fact that we need to support the commands which all leverage print options is what pushed me towards this solution versus something like changing the method signature(s) to pass the registry through. Perhaps PrintOptions has just outgrown its original purpose and could use a refactor/rethink at some point down the road.
I wonder if (as a follow on PR) we should also move the timing into this structure too
I'm not sure I'm understanding you here. Do you mean keeping the records from the store(s) that will be printed in PrintOptions?
0e1b685 to
e1a95af
Compare
- Adds better methods to build an InstrumentedObjectStoreRegistry to reduce code duplication in common usage - Enhances test success criteria - Normalizes method names
| quiet: false, | ||
| maxrows: datafusion_cli::print_options::MaxRows::Unlimited, | ||
| color: true, | ||
| instrumented_registry: Arc::new(InstrumentedObjectStoreRegistry::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Which issue does this PR close?
This does not fully close, but is an incremental building block component for:
datafusion-cli] Add a way to see what object store requests are made #17207The full context of how this code is likely to progress can be seen in the POC for this effort:
Rationale for this change
This change allows a user of
datafusion-clito enable or disable object store profiling, either from the command line or from an internal command while in a cli session.Note that since no actual instrumentation is implemented for the
InstrumentedObjectStores the output for the user at this point is generally not interesting (yet). This PR implements a tapas-sized functional unit of work towards #17207What changes are included in this PR?
InstrumentedObjectStorenow that it needs to be interacted with outside of its moduleAre these changes tested?
Yes. New Unit tests have been added for all meaningful changes.
Example functional output from
datafusion-cli$ ../../target/debug/datafusion-cli --object-store-profiling enabledAre there any user-facing changes?
Yes. Unfortunately this work seems to necessitate adding a new public item to the
datafusion-cli::print_options::PrintOptionstype which is exposed as a public type. Other obvious user-facing changes are the addition of the new command and CLI option todatafusion-cli, which have been documented in the CLI user guide accordingly.