feat: Add `selectivity` metrics to `FilterExec` by 2010YOUY01 · Pull Request #18406 · apache/datafusion

2010YOUY01 · 2025-10-31T11:21:29Z

Which issue does this PR close?

Part of apache#18217

Rationale for this change

In `FilterExec`, selectivity is calculated as `output_rows/input_rows`.
This PR adds that metric. I think it provides important application-level insight and will be commonly used, so it is displayed at the `summary` verbosity level.
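The `datafusion-cli` demo shows the formula in action. `generate_series(100)` produces 101 input rows (0 through 100) and 10 of them satisfy `v1 < 10`, so the filter's selectivity works out as follows. A lower value means the predicate discards a larger share of its input.

```math
\text{selectivity} = \frac{\texttt{output\_rows}}{\texttt{input\_rows}} = \frac{10}{101} \approx 9.9\%
```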

Demo in `datafusion-cli`

```
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select * from generate_series(100) as t1(v1) where v1 <10;
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                   |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=10, elapsed_compute=1.763µs, output_bytes=64.0 KB]                                                                          |
|                   |   CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=10, elapsed_compute=25.833µs, output_bytes=64.0 KB]                                                                |
|                   |     FilterExec: value@0 < 10, metrics=[output_rows=10, elapsed_compute=34.888µs, output_bytes=128.0 B, selectivity=9.9% (10/101)]                                                      |
|                   |       RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1, metrics=[]                                                                                                |
|                   |         LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=33.167µs, output_bytes=64.0 KB] |
|                   |                                                                                                                                                                                        |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
```

What changes are included in this PR?

1. Add a new `MetricValue` for ratios.
2. Track selectivity in `FilterExec` with `MetricValue::Ratio`.
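As a rough, standalone illustration of how a ratio metric like this can be tracked, here is a minimal sketch. It assumes two `Arc<AtomicUsize>` counters (a part and a total) that are thread-safe and shared across clones, matching the doc comment in the review excerpt of `value.rs`; the method names `add_part`/`add_total`, the `N/A` output for zero input, and the display format are hypothetical and do not reproduce DataFusion's actual `RatioMetrics` or `MetricValue::Ratio` API. Each partition holds a clone, adds its input rows to the total and its passing rows to the part, and the shared counters then yield the overall selectivity.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical ratio metric: a `part` and a `total` counter.
/// Clones share the same counters, so every partition can update them.
#[derive(Debug, Clone, Default)]
pub struct RatioMetrics {
    part: Arc<AtomicUsize>,
    total: Arc<AtomicUsize>,
}

impl RatioMetrics {
    pub fn new() -> Self {
        Self::default()
    }

    /// Selectivity numerator: rows that passed the predicate.
    pub fn add_part(&self, n: usize) {
        self.part.fetch_add(n, Ordering::Relaxed);
    }

    /// Selectivity denominator: rows fed into the filter.
    pub fn add_total(&self, n: usize) {
        self.total.fetch_add(n, Ordering::Relaxed);
    }

    pub fn part(&self) -> usize {
        self.part.load(Ordering::Relaxed)
    }

    pub fn total(&self) -> usize {
        self.total.load(Ordering::Relaxed)
    }
}

impl fmt::Display for RatioMetrics {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (part, total) = (self.part(), self.total());
        if total == 0 {
            // Avoid dividing by zero when the filter saw no input.
            write!(f, "N/A ({}/{})", part, total)
        } else {
            let pct = part as f64 / total as f64 * 100.0;
            write!(f, "{:.1}% ({}/{})", pct, part, total)
        }
    }
}

fn main() {
    let selectivity = RatioMetrics::new();
    // Hypothetical (input_rows, output_rows) seen by four partitions.
    let per_partition = [(25usize, 3usize), (25, 3), (25, 2), (26, 2)];
    let handles: Vec<_> = per_partition
        .iter()
        .map(|&(input_rows, output_rows)| {
            let metric = selectivity.clone(); // shares the same counters
            thread::spawn(move || {
                metric.add_total(input_rows);
                metric.add_part(output_rows);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("selectivity={}", selectivity);
}
```

The made-up per-partition counts sum to the demo's 101 input and 10 output rows, so this prints `selectivity=9.9% (10/101)`. `Ordering::Relaxed` is enough here because the counters are only read after every worker thread has been joined.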

Are these changes tested?

Yes, with unit tests.

Are there any user-facing changes?

No

alamb

This makes sense to me, though I have a suggestion to reduce the duplication between this and pruned metrics

alamb · 2025-10-31T20:50:11Z

datafusion/physical-plan/src/metrics/value.rs

```diff
+///
+/// The counters are thread-safe and shared across clones.
+#[derive(Debug, Clone, Default)]
+pub struct RatioMetrics {
```


this is basically the same as Pruned metrics except the display is different -- I wonder if we could consolidate the two somehow 🤔

2010YOUY01 · 2025-11-02

Indeed, the core logic is mostly duplicated, but I think trading more lines of code for better readability is worthwhile here: the pruning APIs named `add_matched`/`add_pruned` are easier to understand than the ratio metric's APIs.
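For comparison, here is one hypothetical way to do the consolidation alamb suggests while keeping the descriptive `add_matched`/`add_pruned` names: a single private counter pair wrapped by two thin public types that differ only in method names and display. This is a sketch, not what the author chose (the author preferred separate types); `CounterPair`, its methods, `add_part`/`add_total`, and both display formats are assumptions for illustration, not DataFusion's actual `PruningMetrics`/`RatioMetrics` APIs.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical shared core: two counters, thread-safe and shared across clones.
#[derive(Debug, Clone, Default)]
struct CounterPair {
    first: Arc<AtomicUsize>,
    second: Arc<AtomicUsize>,
}

impl CounterPair {
    fn add_first(&self, n: usize) {
        self.first.fetch_add(n, Ordering::Relaxed);
    }
    fn add_second(&self, n: usize) {
        self.second.fetch_add(n, Ordering::Relaxed);
    }
    fn values(&self) -> (usize, usize) {
        (self.first.load(Ordering::Relaxed), self.second.load(Ordering::Relaxed))
    }
}

/// Pruning-style wrapper: keeps the descriptive `add_pruned`/`add_matched` names.
#[derive(Debug, Clone, Default)]
pub struct PruningMetrics(CounterPair);

impl PruningMetrics {
    pub fn add_pruned(&self, n: usize) {
        self.0.add_first(n)
    }
    pub fn add_matched(&self, n: usize) {
        self.0.add_second(n)
    }
}

impl fmt::Display for PruningMetrics {
    // Illustrative format only.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (pruned, matched) = self.0.values();
        write!(f, "pruned={}, matched={}", pruned, matched)
    }
}

/// Ratio-style wrapper over the same core; only names and display differ.
#[derive(Debug, Clone, Default)]
pub struct RatioMetrics(CounterPair);

impl RatioMetrics {
    pub fn add_part(&self, n: usize) {
        self.0.add_first(n)
    }
    pub fn add_total(&self, n: usize) {
        self.0.add_second(n)
    }
}

impl fmt::Display for RatioMetrics {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (part, total) = self.0.values();
        if total == 0 {
            write!(f, "N/A ({}/{})", part, total)
        } else {
            let pct = part as f64 / total as f64 * 100.0;
            write!(f, "{:.1}% ({}/{})", pct, part, total)
        }
    }
}

fn main() {
    let pruning = PruningMetrics::default();
    pruning.add_pruned(7);
    pruning.add_matched(3);

    let selectivity = RatioMetrics::default();
    selectivity.add_total(101);
    selectivity.add_part(10);

    println!("{} | {}", pruning, selectivity);
}
```

The wrappers remove the duplicated atomics, but the generic core hides meaning: `second` holds matched rows for pruning yet the total for the ratio, which is the kind of readability cost the author was weighing.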

xudong963

Thank you, love this one.

2010YOUY01 · 2025-11-02T01:30:00Z

Thank you @alamb and @xudong963 for the review!


2010YOUY01 added 4 commits October 31, 2025 10:48

better format floats

c47de0f

Merge branch 'main' into metrics-selectivity

1255b38

fix test

4ed85bd

unit test for selectivity metrics

9d70df6

github-actions bot added the `core` (Core DataFusion crate) and `physical-plan` (Changes to the physical-plan crate) labels Oct 31, 2025

fix typo

6218bfc

alamb approved these changes Oct 31, 2025


xudong963 approved these changes Nov 1, 2025


2010YOUY01 added this pull request to the merge queue Nov 2, 2025

Merged via the queue into apache:main with commit d256caa Nov 2, 2025
32 checks passed


2010YOUY01 merged 5 commits into `apache:main` from `2010YOUY01:metrics-selectivity`
