Add memory profiling support to DataFusion CLI and memory pool metrics #17021
Open
kosiew wants to merge 293 commits into apache:main from kosiew:memory-16904
Commits
340f92f
Remove unused import of LightweightMemoryTracker and initialize memor…
kosiew 1204df1
Add memory profiling methods to SessionContext and enhance documentation
kosiew b482cc3
Fix memory profiling methods in MyUnionerContext to ensure proper ret…
kosiew b566cf9
Remove ExplainMemory features and related code from various modules
kosiew 714ac4c
Merge branch 'main' into memory-16904a
kosiew 05b5fc2
Refactor memory profiling test to compare enabled vs disabled states
kosiew 2de502f
Fix typo in memory profiling test configuration method
kosiew 1bd3c18
Fix memory profiling test context initialization for configuration
kosiew 8cac4fb
Refactor memory profiling test to initialize context with configuration
kosiew 8135ec8
Update memory profiling test to use 'on_demand' setting and adjust ov…
kosiew 9537eb1
Fix memory profiling configuration path in test
kosiew aa3ce0b
Fix memory profiling configuration path and adjust overhead calculati…
kosiew ec26adf
fix: correct spelling of "Apache" in comments and remove unnecessary …
kosiew 85481d4
feat: add memory profiling example with detailed usage and reporting
kosiew d0fb807
Revert "feat: add memory profiling example with detailed usage and re…
kosiew 7a381d3
feat: add memory profiling example for DataFusion with comprehensive …
kosiew 07edefb
Revert "feat: add memory profiling example for DataFusion with compre…
kosiew 4fcc6a4
feat: add qwen comprehensive memory profiling example for DataFusion
kosiew 3a3f5cc
Revert "feat: add qwen comprehensive memory profiling example for Dat…
kosiew 6087e8f
feat: add memory profiling example for DataFusion (kimi)
kosiew 0d63a88
feat: add memory profiling example for DataFusion (qwen)
kosiew 62497b4
feat: add memory profiling example for DataFusion (codex)
kosiew 2fd387a
feat: add memory profiling for DataFrame collect methods
kosiew ced0904
Revert "feat: add memory profiling for DataFrame collect methods"
kosiew 52941fa
feat: add memory profiling to DataFrame collect methods
kosiew c3f5b74
feat: enhance memory profiling examples with multi-stage queries and …
kosiew fd9f047
feat: format memory usage output in MB for better readability
kosiew 07312b7
feat: enhance memory profiling with detailed analysis and operator ca…
kosiew f1b1aa7
fix: correct formatting and improve readability in memory profiling e…
kosiew 2e05a15
feat: implement enhanced memory profiling report with detailed analys…
kosiew 0cd6d62
feat: implement enhanced memory profiling with detailed categorizatio…
kosiew a6acf76
fix: remove unused EnhancedMemoryReport import from prelude
kosiew 4044951
feat: add global memory tracker for enhanced memory management
kosiew cc37d64
feat: add memory profiling example with detailed analysis and operato…
kosiew 6e1063f
refactor: remove redundant memory profiling notes and summary from ex…
kosiew 4ad99c7
feat: remove unused diagnostic configuration test
kosiew c9b04c9
fix: return None for empty memory report in CLI session context
kosiew ee23d74
fix: clarify memory command description in CLI
kosiew 053745e
fix: update categorized_operators type to static str for enhanced mem…
kosiew 885b2a1
feat: add memory tracking for incremental allocations in MemoryReserv…
kosiew 4ce2330
fix: simplify mutex usage in memory tracker by removing redundant imp…
kosiew ce1954d
fix: remove unnecessary unwrap calls from mutex lock in LightweightMe…
kosiew 78f7990
fix: add memory profiling configuration to execution settings
kosiew 9fccada
fix: update memory profiling test to assert duration overhead within …
kosiew 8e11a2f
fix: update memory profiling test to use a complex query for baseline…
kosiew 24b5d3c
fix: add memory profiling integration tests to evaluate performance o…
kosiew 7daa999
fix: remove memory profiling test for enabled vs disabled comparison
kosiew 3342b17
fix: add memory profiling report content test to verify metrics capture
kosiew f40b74d
fix: update memory profiling report test to assert expected operator …
kosiew 65329e6
fix: update memory profiling report test to include additional expect…
kosiew 578ad5b
fix: update memory profiling report test to validate non-zero entries…
kosiew 4ec463c
fix: improve comment clarity in memory profiling report content test
kosiew b23228b
test: add memory profiling report test for disabled profiling scenario
kosiew f760140
fix: enhance error message for non-zero memory entry assertion in pro…
kosiew 265bb38
fix: update expected operator prefixes in memory profiling report test
kosiew d528496
fix: update comment to reflect accurate operator names in memory prof…
kosiew 1b8fff0
fix: remove top consumer example as it is no longer needed
kosiew 55e6738
fix: remove AutoSample variant from MemoryProfilingMode enum
kosiew 1413f0d
fix: update memory report handling in documentation for clarity
kosiew 568e19c
fix: conditionally enable Avro example in doctests based on feature flag
kosiew 0831b3a
fix: remove outdated memory profiling status messages from EnhancedMe…
kosiew 5456ef1
fix: add memory commands to DataFusion CLI usage documentation
kosiew 122d5dd
fix: update memory commands in DataFusion CLI usage documentation to …
kosiew 989a08f
fix: add memory profiling support to DataFusion CLI
kosiew 22f0f95
fix: enable memory profiling in execute
kosiew d58d13f
Revert "fix: enable memory profiling in execute"
kosiew e3295e7
Revert "fix: add memory profiling support to DataFusion CLI"
kosiew b32cd2b
feat: add memory profiling support to DataFusion CLI
kosiew e7aacc9
Revert "feat: add memory profiling support to DataFusion CLI"
kosiew 1655e99
feat: add memory profiling commands, tests and update documentation
kosiew 405d323
fix(tests): update snapshot for CLI memory profiling output
kosiew c135750
Revert "fix(tests): update snapshot for CLI memory profiling output"
kosiew 682aedf
```
kosiew 9a4ff0e
fix(tests): restore snapshot for CLI memory profiling enable and show…
kosiew 5e1cf5b
fix(tests): remove obsolete snapshot for CLI memory profiling enable …
kosiew 9c1e3f1
fix(tests): add new snapshot for CLI memory profiling enable and show…
kosiew 6ecbf65
fix(cli): rename memory command to memory profiling and update usage …
kosiew 7df3a9c
fix(tests): update CLI memory command to memory profiling in integrat…
kosiew 5cb4a5d
feat(cli): add enhanced memory report functionality and update comman…
kosiew dcf4f1a
feat(tests): update snapshot for memory profiling commands in CLI tests
kosiew e279780
Merge branch 'main' into memory-16904a
kosiew 8e02d83
fix(docs): update memory profiling commands in README for consistency
kosiew 9b071eb
refactor: delegate memory report logic to core SessionContext impleme…
kosiew 5812981
refactor: remove once_cell dependency and replace with LazyLock in me…
kosiew fd96dc8
feat: add operator categorization and utility function for query plans
kosiew 27f00dd
test: add unit tests for EnhancedMemoryReport categorization and memo…
kosiew e3efb72
refactor: reorganize EnhancedMemoryReport structure and improve opera…
kosiew 5953b0d
fix: add missing import for Url in EnhancedMemoryReport
kosiew de14455
feat: add print_analysis method to EnhancedMemoryReport for CLI output
kosiew 1965384
feat: add join operation categorization to EnhancedMemoryReport
kosiew a90fc3a
feat: implement MemoryUsage struct and MemoryExplain trait for memory…
kosiew c93b848
feat: add license headers and module documentation for memory profili…
kosiew 79bf76e
fix fmt errors
kosiew fd5e5a8
refactor: remove obsolete CLI memory snapshot test
kosiew 448f491
feat: remove blank line in Cargo.toml for cleaner formatting
kosiew b9cf128
refactor: replace LightweightMemoryTracker with MemoryTracker for imp…
kosiew 5fbcea1
```
kosiew eb0791b
fix: remove unnecessary whitespace in row_hash.rs file
kosiew b1f5c69
Merge branch 'main' into memory-16904
kosiew 4b163a2
fix clippy error
kosiew 08eaa4a
refactor: reorganize imports in memory_profiling.rs for better readab…
kosiew d8f32f7
refactor: reorganize and group imports for improved clarity in sessio…
kosiew 5d0e5f5
feat(docs): add memory profiling configuration option to user guide
kosiew 018a593
fix: implement Default for MemoryTracker to satisfy clippy lint
kosiew 7279659
fix: use Arc::clone for memory tracker in DataFrame and SessionContext
kosiew 1c15b66
fix md errors
kosiew 08e2d25
Update md docs
kosiew 75d7d32
fix: use Arc::clone for memory tracker in DataFrame to improve memory…
kosiew de25176
fix: replace std::time::Instant with datafusion_common::instant::Inst…
kosiew 6e8207d
fix: add datafusion-common dependency to Cargo.toml for example projects
kosiew 12b9fa4
fix: add datafusion-common dependency and improve print formatting in…
kosiew d34e0d7
Merge branch 'main' into memory-16904
kosiew 173486c
fix: Allow 'on' as an alias for 'enable' in MemoryProfiling command
kosiew 30c22ae
refactor: Remove tests for large dataset creation and memory profilin…
kosiew bfcc17f
Remove comment of unused import of LightweightMemoryTracker
kosiew 5aef326
refactor: Remove unnecessary blank line in tests module
kosiew 61b827a
refactor: Remove OperatorCategory enum and categorize_operator function
kosiew 0720d56
refactor: Remove lz4, zstd features
kosiew 59b4b1f
Merge branch 'main' into memory-16904
kosiew e394307
fix: Use options_mut() to set memory profiling mode in SessionConfig
kosiew 922cc45
refactor: Remove unused Operator import from execution context
kosiew 3a8af55
fix: Correct case of memory profiling commands in CLI usage documenta…
kosiew 9e06580
fix: Update memory profiling documentation and improve related code c…
kosiew f3aac60
feat: Enhance memory profiling with new MemoryReport struct and updat…
kosiew e3b731e
fix: Add missing import for clap::ValueEnum in command.rs
kosiew 38b8061
refactor: Rename print method to print_analysis in EnhancedMemoryRepo…
kosiew deef59b
refactor: Replace StdMutex with parking_lot::Mutex for improved perfo…
kosiew 5afd8f5
docs: Add documentation for print_analysis method in EnhancedMemoryRe…
kosiew 1a112e3
fix prettier errors
kosiew d82ee4c
refactor: Update memory profiling mode description and fix formatting…
kosiew 32749d6
Merge branch 'main' into memory-16904
kosiew 9922603
feat: Implement IntoIterator for MemoryReport to enable iteration ove…
kosiew 2909ed3
docs: Update comment for EnhancedMemoryReport to clarify its purpose
kosiew c58d5f6
docs: Fix formatting of memory profiling commands in README.md
kosiew c39916c
docs: Update memory profiling section in README.md with example and e…
kosiew aef5480
Merge branch 'main' into memory-16904
kosiew 208a540
Refactor memory profiling functionality in DataFusion
kosiew d18838d
Add memory profiling example to demonstrate tracking and reporting me…
kosiew c2631df
test(cli): update memory_enable_show snapshot to reflect recorded mem…
kosiew a480b78
feat(memory): integrate tracked memory pool and enhance profiling met…
kosiew dc85ae1
fix(tests): update memory profiling snapshot to reflect accurate output
kosiew da2b2c0
fix(tests): format memory profiling snapshot for consistency
kosiew 107a770
fix(metrics): standardize operator categorization to lowercase for co…
kosiew 3e09c22
fix(docs): update memory profiling output in README for accuracy
kosiew dc38e85
refactor(command): reorganize imports for improved readability
kosiew c8ce5ed
fix(command): datafusion-cli don't store metrics in print_options
kosiew 86db315
fix(exec): remove disable_tracking in exec_and_print
kosiew abeeff5
test(cli): add snapshot for memory profiling integration test
kosiew 2519537
fix(memory): immutable print_options
kosiew 6f6b111
fix(print_options): remove last_memory_metrics from PrintOptions
kosiew 1d36617
fix(config): remove unnecessary blank lines in config.rs
kosiew f4013db
fix(dataframe): remove unnecessary blank line in cache method
kosiew d549fe4
fix(mod.rs): add missing newline before module declarations
kosiew fe101cf
fix(session_state): reorganize imports for better readability
kosiew 2c9e4b9
fix(usage): update memory profiling output for clarity
kosiew 368c0e5
fix(configs): format license comment for improved readability
kosiew 7c591c3
fix fmt errors
kosiew 66ca74d
fix(license): improve formatting of license comments for consistency
kosiew 3a5694d
fix(command): update memory profiling command syntax for consistency
kosiew afa4018
fix(command): clarify memory profiling command description for better…
kosiew e3fb00c
fix(docs): amend README memory profiling command description for cons…
kosiew 9cdef04
Merge branch 'main' into memory-16904a
kosiew 0345c12
feat(memory): implement memory profiling support in CLI context
kosiew b08d4e2
Revert "feat(memory): implement memory profiling support in CLI context"
kosiew c6a7ba1
feat(memory): refactor memory profiling support in CLI context
kosiew dd48d0b
fix(reader): rename parameter for clarity in get_metadata function
kosiew 40b7663
fix(metrics): update log message for clarity in print_metrics function
kosiew cccdd14
docs: add memory profiling top_memory_consumers tip to README and usa…
kosiew 53f3a11
refactor(metrics): rename print_metrics to format_metrics and update …
kosiew 5488251
test(datafusion-cli): update cli_memory_enable_show snapshot (add tra…
kosiew b7014f9
fix(docs): correct memory usage label from 'Other' to 'Repartition' i…
kosiew 73b24a4
refactor(metrics): reorganize import statements for clarity
kosiew 292bdbb
docs(cli): fix memory profiling tip formatting in CLI usage docs
kosiew ecd135f
fix(reader): rename parameter for clarity in get_metadata function
kosiew 5d83cb8
fix(cli): update usage message for memory profiling command
kosiew 16d5072
fix(cli): loosen memory profiling output by replacing dynamic values …
kosiew 11d8b28
fix(cli): update memory profiling output to use placeholders for clarity
kosiew c78b834
fix(reader): rename parameter for clarity in get_metadata function
kosiew ac15b2a
feat(memory): add peak size method to MemoryReservation and update di…
kosiew e4e1cee
feat(cli): enhance memory pool management in ReplSessionContext
kosiew 3e8feee
refactor: remove memory profiling documentation from SessionContext
kosiew 4829d01
refactor: simplify Avro example documentation condition
kosiew 51a8a76
refactor: simplify operator category matching using a lookup table
kosiew 62edbdb
refactor: add additional operator categories for memory usage reporting
kosiew 47e9de3
refactor: remove unnecessary pool tracking enabling in exec_and_print
kosiew 6b5ffdb
refactor: change operator_category function visibility to public
kosiew cb3dbc2
refactor: update memory profiling test to include larger dataset and …
kosiew d860a5a
refactor: update memory profiling test to include additional query an…
kosiew 7b7c984
refactor: update memory profiling command usage message to include al…
kosiew a55fdc3
refactor: increase default value for top memory consumers from 3 to 5
kosiew be5c1b4
refactor: add CLI tests for memory profiling show commands
kosiew 5e9dde7
Merge branch 'main' into memory-16904
kosiew 062a19c
refactor: clean up imports in main.rs by removing redundant entry
kosiew 6d88978
fix: update context reference for registering `metadata_cache` UDTF i…
kosiew caf1863
fix: update parameter name for options in get_metadata method in Cach…
kosiew fba5cb1
remove \memory_profiling show
kosiew 3cf46f1
Merge branch 'main' into memory-16904
kosiew b51998a
refactor: simplify memory profiling logic in ReplSessionContext
kosiew 78efcf5
test: add CLI snapshot tests for various output formats and memory pr…
kosiew 45bc14d
test: update CLI memory profiling test input for improved formatting
kosiew d44798b
style: reorder import statements for improved readability
kosiew d80f994
refactor: remove mut base_pool
kosiew 5683217
test: enhance backtrace output verification to include planning error…
kosiew b6765ab
refactor: streamline memory pool implementation and enhance Arc<T> su…
kosiew 907c6af
refactor: simplify memory profiling logic in ReplSessionContext
kosiew 921be60
refactor: update memory profiling command to toggle state without arg…
kosiew 8224541
Merge branch 'main' into memory-16904
kosiew 19e2fb8
docs: clarify memory profiling command as a toggle in CLI usage
kosiew 6db9339
docs: improve documentation for MemoryPool implementation with Arc<T>
kosiew be23d39
fix(tests): update AWS region auto resolution snapshots to include me…
kosiew 28559b3
docs: add tip for memory profiling requirement in CLI usage documenta…
kosiew 27f85d3
fix(tests): update snapshot paths and add AWS environment variables
kosiew 660899b
docs: clarify memory profiling toggle note in CLI usage documentation
kosiew 333e5da
prettier config docs
kosiew eb33d21
fix(memory): update MemoryPool implementation for Arc<dyn MemoryPool>
kosiew d847244
Merge branch 'main' into memory-16904
kosiew d47fcec
refactor(session_state): reorganize use statements for improved reada…
kosiew a492a67
refactor(dataframe): simplify collect method by removing unnecessary …
kosiew 896d16d
refactor(command): remove FromStr implementation for MemoryProfilingC…
kosiew 20eec41
refactor(docs): move memory profiling instructions from README to usage
kosiew fe1b656
refactor(command): update MemoryProfiling command to disallow arguments
kosiew ab1aaab
refactor(memory): disable memory profiling by default and update rela…
kosiew 03ea519
refactor(memory): streamline memory pool initialization and remove un…
kosiew efabb40
refactor(cli): simplify memory profiling command syntax and update do…
kosiew 88c6343
Added a new tracking_enabled method to the TrackedPool trait and impl…
kosiew b6d07fa
Merge branch 'main' into memory-16904
kosiew e67f1d9
Merge branch 'main' into memory-16904
kosiew 16382fc
fix: update AWS endpoint in snapshot tests for consistency
kosiew 07bd009
style: reorder and format use statements for improved readability
kosiew df45d18
style: reorder and format use statements for improved readability
kosiew e055441
refactor(cli): simplify memory profiling command syntax and update do…
kosiew b791042
Merge branch 'main' into memory-16904
kosiew c161355
Merge branch 'main' into memory-16904
kosiew 38137b1
fix: update AWS endpoint in snapshot tests for consistency
kosiew 84e3458
style: reorder and format use statements for improved readability
kosiew 5c387e4
Merge branch 'memory-16904-toggle' into memory-16904
kosiew 63de515
fix: simplify memory profiling commands in cli_memory_disable_stops_r…
kosiew c3d4382
Merge branch 'main' into memory-16904
kosiew 5b08791
Remove unrelated changes and tidy-up
kosiew fe8f856
fix(docs): update default value for top memory consumers in CLI usage…
kosiew 9dd2518
feat(tests): add memory profiling command to top memory consumers test
kosiew 0a377c0
feat(examples): enhance memory profiling example with per-consumer tr…
kosiew 3d00ee2
fix test_cli_top_memory_consumers
kosiew f1cf29e
amend filter for backtrace
kosiew e09db5f
Merge branch 'main' into memory-16904
kosiew a7b04cc
Merge branch 'main' into memory-16904
kosiew a064cf0
fix(tests): update memory consumer regex to match optional peak memor…
kosiew b4b852a
Merge branch 'main' into memory-16904
alamb e049ebc
Merge branch 'main' into memory-16904
kosiew ea0f65d
Merge branch 'memory-16904' of github.com:kosiew/datafusion into memo…
kosiew
I think the memory profiling flag would best be stored on PrintOptions, similarly to the quiet flag (which suppresses printing of the execution time). Then you would not need to introduce so much new code and a new trait: https://github.com/apache/datafusion/blob/df45d186d34f2ac131d64e4a068d9f39b35e99c7/datafusion-cli/src/print_options.rs#L73-L72
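To make the suggestion concrete, here is a minimal Rust sketch. `PrintOptions` below is a hypothetical, simplified stand-in (not datafusion-cli's real struct), and `footer` and `toggle_memory_profiling` are illustrative names. It shows the flag sitting next to `quiet` and being checked at print time, and also why a `\memory_profiling`-style toggle then needs `&mut PrintOptions`.

```rust
/// Hypothetical, simplified stand-in for datafusion-cli's `PrintOptions`.
#[derive(Debug, Clone, Default)]
struct PrintOptions {
    /// Suppress the per-query timing footer.
    quiet: bool,
    /// Hypothetical flag: append a memory summary after each query.
    memory_profiling: bool,
}

impl PrintOptions {
    /// Builds the text printed after a query, honoring both flags.
    /// The wording is illustrative, not the CLI's actual output.
    fn footer(&self, rows: usize, elapsed_ms: u128, peak_bytes: usize) -> String {
        let mut out = String::new();
        if !self.quiet {
            out.push_str(&format!("{rows} row(s) fetched. Elapsed {elapsed_ms} ms\n"));
        }
        if self.memory_profiling {
            out.push_str(&format!("Peak memory: {peak_bytes} bytes\n"));
        }
        out
    }
}

/// A toggle command: with the flag stored on `PrintOptions`,
/// the caller must hold a mutable reference to it.
fn toggle_memory_profiling(opts: &mut PrintOptions) -> bool {
    opts.memory_profiling = !opts.memory_profiling;
    opts.memory_profiling
}

fn main() {
    let mut opts = PrintOptions { quiet: true, memory_profiling: false };
    toggle_memory_profiling(&mut opts);
    print!("{}", opts.footer(3, 12, 4096));
}
```

Reading the flag is trivial in this design; the cost is the `&mut` requirement on every code path that handles the toggle command.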
That would require print_options to be mutable, because memory_profiling can be toggled at runtime.
The flag was moved out of print_options after an earlier review comment said print_options should not be made mutable for memory profiling.
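A minimal sketch of one alternative that keeps print options immutable: hold the toggle in shared session state behind interior mutability (here an `AtomicBool`), so it can be flipped through `&self`. This is an assumed design for illustration, not necessarily what the PR does; `ReplState` and its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical session-level state kept separate from an immutable
/// print-options struct (names are illustrative, not datafusion-cli's API).
#[derive(Debug, Default)]
struct ReplState {
    memory_profiling: AtomicBool,
}

impl ReplState {
    /// Flips the flag through a shared reference and returns the new value.
    fn toggle_memory_profiling(&self) -> bool {
        // fetch_xor(true) flips the bit atomically and returns the old value.
        !self.memory_profiling.fetch_xor(true, Ordering::Relaxed)
    }

    /// Read by the query/print path to decide whether to report memory.
    fn memory_profiling_enabled(&self) -> bool {
        self.memory_profiling.load(Ordering::Relaxed)
    }
}

fn main() {
    let state = Arc::new(ReplState::default());
    // Only shared references are needed, so the same `Arc` can be handed to
    // the REPL loop and to query execution without `&mut`.
    assert!(state.toggle_memory_profiling()); // off -> on
    assert!(state.memory_profiling_enabled());
    assert!(!state.toggle_memory_profiling()); // on -> off
}
```

The trade-off is that the flag now lives outside the printing options, so the print path needs access to both. A `Cell<bool>` would also work in a single-threaded REPL, but `AtomicBool` keeps the state `Sync` if it is shared via `Arc`.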