Skip to content

Refactor SortMergeJoinMetrics to reuse BaselineMetrics #16493

@2010YOUY01

Description

@2010YOUY01

Is your feature request related to a problem or challenge?

BaselineMetrics provides a common set of metrics that should be included by all operators. This way we can keep those basic metrics consistent and also simplify implementation when adding more metrics into BaselineMetrics

pub struct BaselineMetrics {
/// end_time is set when `ExecutionMetrics::done()` is called
end_time: Timestamp,
/// amount of time the operator was actively trying to use the CPU
elapsed_compute: Time,
/// output rows: the total output rows
output_rows: Count,
}

However, @hendrikmakait noticed in #16244 (comment) that, some operators are not using BaselineMetrics, and instead include the same basic metrics into their specific metrics struct.

Specifically, SMJ operator keeps the same output_rows metrics but it's missing other baseline metircs.

struct SortMergeJoinMetrics {
/// Total time for joining probe-side batches to the build-side batches
join_time: metrics::Time,
/// Number of batches consumed by this operator
input_batches: Count,
/// Number of rows consumed by this operator
input_rows: Count,
/// Number of batches produced by this operator
output_batches: Count,
/// Number of rows produced by this operator
output_rows: Count,
/// Peak memory used for buffered data.
/// Calculated as sum of peak memory values across partitions
peak_mem_used: metrics::Gauge,
/// Metrics related to spilling
spill_metrics: SpillMetrics,
}

I think it's a good idea to clean it up by reusing BaselineMetrics.

Describe the solution you'd like

Refactor SortMergeJoinMetrics to reuse BaselineMetrics

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions