-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add memory profiling support to DataFusion CLI and memory pool metrics #17021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…y profiling and tracker fields in SessionStateBuilder
…blank lines - Fixed the spelling of "Apche" to "Apache" in the license comment. - Removed unnecessary blank lines in `no_grouping.rs`, `symmetric_hash_join.rs`, and `sort.rs`.
…porting" This reverts commit 1258405.
…hensive reporting" This reverts commit 8dbdfa0.
…aFusion" This reverts commit 7db0933.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kosiew -- I am sorry for the slow reviews, but I am really having trouble reviewing this PR -- I don't know why there are more unrelated changes that showed up here the changes to the CLI testing
Are you by any chance using a AI coding tool? If so, perhaps you can spend some time reviewing the PRs a bit more yourself and spit them into smaller PRs for review?
Some of the other changes in this PR are nice (like the change to tracked pool, potentially) but they seem unrelated to printing the contents of the memory
I am sorry for another round of feedback (this is like the 3rd time and there are already 280 commits 🤯 )
In general, I think we would have an easier time reviewing PRs (and get them merged faster) if you could keep them smaller and more focused
| let lower_stdout = stdout.to_lowercase(); | ||
| let lower_stderr = stderr.to_lowercase(); | ||
|
|
||
| let has_planning_error = lower_stdout.contains("failed to coerce arguments") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand any of these changes -- why would we accept arrow cast errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Returns a string with peak usage, cumulative allocations, and totals per | ||
| /// operator category. The caller is responsible for printing the returned | ||
| /// string if desired. | ||
| pub fn format_metrics(metrics: &[ConsumerMemoryMetrics]) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this formatting stuff should be put in the CLI code for now (not the main repo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved
| ) -> Result<DataFrame, DataFusionError>; | ||
|
|
||
| /// Return true if memory profiling is enabled. | ||
| fn memory_profiling(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the memory profiling flag would best be stored on PrintOptions, similarly to the quiet mode flag (that suppresses execution time printing). Then you would not need to introduce so much new code and a new trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would require print_options to be mut because we can
toggle memory_profiling
It was moved out of print_options after a comment that print_options should not be mut for memory profiling.
| @@ -1,9 +1,9 @@ | |||
| --- | |||
| source: tests/cli_integration.rs | |||
| source: datafusion-cli/tests/cli_integration.rs | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change these files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These unrelated changes have been cleaned up.
… documentation Updated the default value for the `--top-memory-consumers` option from 3 to 5 in the CLI usage documentation to reflect the correct configuration.
…acking Added detailed metrics collection for memory consumption in the memory profiling example. This includes peak memory and cumulative allocation bytes for each operator or execution stage, along with a summarized output by operator category.
Resolved conflicts in memory pool implementation: - Kept cumulative memory tracking feature in TrackedConsumer - Updated test snapshots to match current error message format - All conflicts resolved in favor of HEAD branch (memory-16904)
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |

Which issue does this PR close?
Rationale for this change
Currently, the DataFusion CLI has limited visibility into query memory usage. Users cannot easily track peak memory, cumulative allocations, or operator-level memory breakdowns. This PR introduces memory profiling features, providing actionable insights for debugging, tuning, and capacity planning. It also strengthens error reporting with clearer context about memory consumers.
What changes are included in this PR?
CLI Enhancements:
\memory_profilingto toggle profiling during CLI sessions.Core Enhancements:
TrackedPooltrait for memory pools with consumer tracking.ConsumerMemoryMetricsandformat_metricsfor reporting.MemoryReservationto track peak usage and provide human-readable summaries.Examples and Documentation:
memory_profiling.rsexample demonstrating programmatic usage.Tests and Snapshots:
Are these changes tested?
Yes:
TrackConsumersPooland memory metrics.cli_memory_auto_report,cli_memory_disable_stops_report).Are there any user-facing changes?
Yes:
\memory_profiling(toggle profiling on/off).--top-memory-consumers N(default now 5, previously 3).memory_profiling.rs.No breaking API changes; all new features are additive.