-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge?
BaselineMetrics provides a common set of metrics that should be included by all operators. This way we can keep those basic metrics consistent and also simplify implementation when adding more metrics into BaselineMetrics
datafusion/datafusion/physical-plan/src/metrics/baseline.rs
Lines 47 to 56 in a4f4b17
| pub struct BaselineMetrics { | |
| /// end_time is set when `ExecutionMetrics::done()` is called | |
| end_time: Timestamp, | |
| /// amount of time the operator was actively trying to use the CPU | |
| elapsed_compute: Time, | |
| /// output rows: the total output rows | |
| output_rows: Count, | |
| } |
However, @hendrikmakait noticed in #16244 (comment) that, some operators are not using BaselineMetrics, and instead include the same basic metrics into their specific metrics struct.
Specifically, SMJ operator keeps the same output_rows metrics but it's missing other baseline metircs.
datafusion/datafusion/physical-plan/src/joins/sort_merge_join.rs
Lines 603 to 619 in a4f4b17
| struct SortMergeJoinMetrics { | |
| /// Total time for joining probe-side batches to the build-side batches | |
| join_time: metrics::Time, | |
| /// Number of batches consumed by this operator | |
| input_batches: Count, | |
| /// Number of rows consumed by this operator | |
| input_rows: Count, | |
| /// Number of batches produced by this operator | |
| output_batches: Count, | |
| /// Number of rows produced by this operator | |
| output_rows: Count, | |
| /// Peak memory used for buffered data. | |
| /// Calculated as sum of peak memory values across partitions | |
| peak_mem_used: metrics::Gauge, | |
| /// Metrics related to spilling | |
| spill_metrics: SpillMetrics, | |
| } |
I think it's a good idea to clean it up by reusing BaselineMetrics.
Describe the solution you'd like
Refactor SortMergeJoinMetrics to reuse BaselineMetrics
Describe alternatives you've considered
No response
Additional context
No response