Skip to content

Optimize TopK with threshold filter ~1.4x speedup #15697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Apr 13, 2025

Which issue does this PR close?

Rationale for this change

This optimizes our TopK by filtering early based on the threshold values, avoiding conversion to Row-values and slower conversions.
While pushing down to the scan is yielding more gains when possible, this is only possible if it is supported / enabled, has relevant statistics that allow pruning / filter pushdown is enabled and the TopK happens directly after a scan.

┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ improve_topk ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  24.28ms │      16.45ms │ +1.48x faster │
│ Q2           │  26.17ms │      20.65ms │ +1.27x faster │
│ Q3           │  79.67ms │      54.37ms │ +1.47x faster │
│ Q4           │  27.44ms │      21.16ms │ +1.30x faster │
│ Q5           │  17.38ms │      13.30ms │ +1.31x faster │
│ Q6           │  30.91ms │      28.07ms │ +1.10x faster │
│ Q7           │  74.48ms │      73.20ms │     no change │
│ Q8           │  76.44ms │      45.69ms │ +1.67x faster │
│ Q9           │  88.62ms │      58.62ms │ +1.51x faster │
│ Q10          │ 128.54ms │     100.68ms │ +1.28x faster │
│ Q11          │  72.47ms │      57.48ms │ +1.26x faster │
└──────────────┴──────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)           │ 646.41ms │
│ Total Time (improve_topk)   │ 489.66ms │
│ Average Time (main)         │  58.76ms │
│ Average Time (improve_topk) │  44.51ms │
│ Queries Faster              │       10 │
│ Queries Slower              │        0 │
│ Queries with No Change      │        1 │
└─────────────────────────────┴──────────┘

Also some clickbench queries seems to be improved:

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ improve_topk ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.32ms │       1.23ms │ +1.07x faster │
│ QQuery 1     │    24.07ms │      23.99ms │     no change │
│ QQuery 2     │    66.63ms │      64.99ms │     no change │
│ QQuery 3     │    54.47ms │      52.82ms │     no change │
│ QQuery 4     │   571.78ms │     480.98ms │ +1.19x faster │
│ QQuery 5     │   634.56ms │     621.52ms │     no change │
│ QQuery 6     │     1.05ms │       1.07ms │     no change │
│ QQuery 7     │    27.49ms │      28.62ms │     no change │
│ QQuery 8     │   593.15ms │     599.61ms │     no change │
│ QQuery 9     │   875.49ms │     859.69ms │     no change │
│ QQuery 10    │   193.04ms │     192.22ms │     no change │
│ QQuery 11    │   220.40ms │     212.42ms │     no change │
│ QQuery 12    │   764.13ms │     785.23ms │     no change │
│ QQuery 13    │  1006.88ms │    1041.78ms │     no change │
│ QQuery 14    │   673.52ms │     681.64ms │     no change │
│ QQuery 15    │   713.13ms │     706.34ms │     no change │
│ QQuery 16    │  1447.87ms │    1385.00ms │     no change │
│ QQuery 17    │  1325.88ms │    1285.75ms │     no change │
│ QQuery 18    │  2599.18ms │    2571.18ms │     no change │
│ QQuery 19    │    48.23ms │      49.82ms │     no change │
│ QQuery 20    │   950.18ms │     956.24ms │     no change │
│ QQuery 21    │  1135.24ms │    1164.32ms │     no change │
│ QQuery 22    │  1838.18ms │    1858.80ms │     no change │
│ QQuery 23    │  6291.70ms │    6117.67ms │     no change │
│ QQuery 24    │   387.35ms │     370.48ms │     no change │
│ QQuery 25    │   358.61ms │     300.32ms │ +1.19x faster │
│ QQuery 26    │   435.01ms │     366.38ms │ +1.19x faster │
│ QQuery 27    │  1456.11ms │    1410.06ms │     no change │
│ QQuery 28    │ 11483.18ms │   11470.65ms │     no change │
│ QQuery 29    │   433.16ms │     441.81ms │     no change │
│ QQuery 30    │   590.06ms │     602.67ms │     no change │
│ QQuery 31    │   595.45ms │     611.93ms │     no change │
│ QQuery 32    │  2735.39ms │    2203.37ms │ +1.24x faster │
│ QQuery 33    │  2928.70ms │    2918.67ms │     no change │
│ QQuery 34    │  3165.39ms │    3146.05ms │     no change │
│ QQuery 35    │   912.50ms │     911.56ms │     no change │
│ QQuery 36    │    83.08ms │      84.13ms │     no change │
│ QQuery 37    │    38.31ms │      38.61ms │     no change │
│ QQuery 38    │    83.66ms │      82.82ms │     no change │
│ QQuery 39    │   138.21ms │     139.02ms │     no change │
│ QQuery 40    │    28.63ms │      28.20ms │     no change │
│ QQuery 41    │    27.61ms │      28.06ms │     no change │
│ QQuery 42    │    23.49ms │      22.86ms │     no change │
└──────────────┴────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)           │ 47961.50ms │
│ Total Time (improve_topk)   │ 46920.59ms │
│ Average Time (main)         │  1115.38ms │
│ Average Time (improve_topk) │  1091.18ms │
│ Queries Faster              │          5 │
│ Queries Slower              │          0 │
│ Queries with No Change      │         38 │
└─────────────────────────────┴────────────┘

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

// If the heap doesn't have k elements yet, we can't create thresholds
match self.heap.max() {
Some(max_row) => {
// Get the batch that contains the max row
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a bit of code from @adriangb

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of it can probably combined when dynamic filter for topk is ready

@Dandandan Dandandan changed the title Optimize TopK with filter Optimize TopK with filter Apr 13, 2025
@Dandandan Dandandan changed the title Optimize TopK with filter Optimize TopK with filter ~1.4x Apr 13, 2025
@Dandandan Dandandan changed the title Optimize TopK with filter ~1.4x Optimize TopK with filter ~1.4x faster Apr 13, 2025
Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the ideas to basically do the same thing we're going to do for the dynamic filters but essentially do the filtering inside of top K to avoid some extra work. Is that correct? If so, it sounds like a great idea and we're going to be able to reuse a lot of the code

@Dandandan
Copy link
Contributor Author

Dandandan commented Apr 13, 2025

If I understand correctly, the ideas to basically do the same thing we're going to do for the dynamic filters but essentially do the filtering inside of top K to avoid some extra work. Is that correct? If so, it sounds like a great idea and we're going to be able to reuse a lot of the code

Yeah that's totally correct! The gains won't be as impressive as with dynamic filter being able to push it down to a scan, but still avoid work in TopK by not having to convert the sorting keys to row format.

@adriangb
Copy link
Contributor

adriangb commented Apr 13, 2025

Nice! We can even wire it up with the filter pushdown so that if an operator under us "absorbs" the filter (eg it got pushed down to the scan) we skip doing this internally.

But 1.4x faster is a great reason to merge this and re-use the code later.

@Dandandan
Copy link
Contributor Author

Nice! We can even wire it up with the filter pushdown so that if an operator under us "absorbs" the filter (eg it got pushed down to the scan) we skip doing this internally.

Yeah, would be useful to avoid filtering twice and the way to go👍

@adriangb
Copy link
Contributor

@Dandandan will be happy to review once CI is passing 😄

@Dandandan
Copy link
Contributor Author

Dandandan commented Apr 13, 2025

@adriangb FYI CI is passing, it's ready for review.
I had to make some changes to the filter that is applied to respect lexicographic ordering (which made Q7 lose the speedup), but it looks like it is still a big improvement while I can see benchmarks. I filed #15698 to support multiple columns + use BinaryExpr to utilize some further optimizations.

@Dandandan Dandandan changed the title Optimize TopK with filter ~1.4x faster Optimize TopK with filter ~1.4x speedup Apr 13, 2025
@Dandandan Dandandan changed the title Optimize TopK with filter ~1.4x speedup Optimize TopK with threshold filter ~1.4x speedup Apr 13, 2025
@adriangb
Copy link
Contributor

I'll take a look tomorrow! Why do we have to use only the first column? Is it just to break up the change into smaller units? We had multi-column support working in the now closed PR that added it.

@Dandandan
Copy link
Contributor Author

I'll take a look tomorrow! Why do we have to use only the first column? Is it just to break up the change into smaller units? We had multi-column support working in the now closed PR that added it.

Thanks!
Time was up yesterday.
I see your PR also handled it.

I think it is not super hard to add support for all columns, but want to benchmark the change well as well. As the first column(s) filter out most of the rows the gains for adding more filters become smaller and with many rows it might be faster to only keep a smaller number of first sort columns instead of filtering on all.

}
let filter_predicate = FilterBuilder::new(&filter);
let filter_predicate = if sort_keys.len() > 1 {
filter_predicate.optimize().build()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comments to explain this optimize()? The original doc is not super clear I think.

@@ -212,6 +212,10 @@ main() {
# same data as for tpch
data_tpch "1"
;;
sort_tpch_limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the best name would be, but I feel it would be useful for discoverability to have topk in it. tpch_topk? sort_tpch_topk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please also add a description of this benchmark in https://github.com/apache/datafusion/tree/main/benchmarks#benchmarks ?

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be nicer (and tie in better with future work 😉) if we essentially followed the structure of #15301 but do the filtering in TopK or SortExec:

  1. Keep track of a thresholds: Arc<RwLock<Vec<Option<ScalarValue>>>> and filter: Option<Arc>onTopK`.
  2. For each batch check pass it through the existing filter, if any, and exit early if no rows remain.
  3. If we updated our heap propagate the update to thresholds and filter.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dandandan and @2010YOUY01 and @adriangb and @geoffreyclaude !

One thing I was wondering about for this PR is how much will it help once we implement actual topk filter pushdown into the scan (aka #15037)

I am thinking that the topk filter pushdown will already filter out rows that are known not to be in the topK

Specifically, once we implement topk filter pushdown the rows should already be filtered and so checking again in the TopK itself won't add any benefit, will it?

@@ -212,6 +212,10 @@ main() {
# same data as for tpch
data_tpch "1"
;;
sort_tpch_limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please also add a description of this benchmark in https://github.com/apache/datafusion/tree/main/benchmarks#benchmarks ?

@adriangb
Copy link
Contributor

Thanks @Dandandan and @2010YOUY01 and @adriangb and @geoffreyclaude !

One thing I was wondering about for this PR is how much will it help once we implement actual topk filter pushdown into the scan (aka #15037)

I am thinking that the topk filter pushdown will already filter out rows that are known not to be in the topK

Specifically, once we implement topk filter pushdown the rows should already be filtered and so checking again in the TopK itself won't add any benefit, will it?

Yes that's right for Parquet, but not all data sources support filter pushdown, so there's still benefit for those. But yeah, I'm hoping we can structure this in a way that we get an immediate win that justifies the change but also introduces all of the code necessary for filter push down later on.

@Dandandan
Copy link
Contributor Author

Dandandan commented Apr 15, 2025

I am am thinking that the topk filter pushdown will already filter out rows that are known not to be in the topK

Yes, in those cases it might not be adding something but it is still useful in the following cases:

  • Filter pushdown not enabled (this is not yet default)
  • Stats not enabled or not in useful distribution to allow effective pruning
  • Non Parquet sources
  • TopK on a plan that doesn't allow pushing down the filter to a source (i.e. most plans involving aggregate, joins, ...)

Dandandan and others added 3 commits April 15, 2025 03:13
@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu SMP Fri Oct 4 22:16:29 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve_topk (923089f) to 0b01fdf diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@Dandandan
Copy link
Contributor Author

I think this would be nicer (and tie in better with future work 😉) if we essentially followed the structure of #15301 but do the filtering in TopK or SortExec:

  1. Keep track of a thresholds: Arc<RwLock<Vec<Option<ScalarValue>>>> and filter: OptiononTopK`.
  2. For each batch check pass it through the existing filter, if any, and exit early if no rows remain.
  3. If we updated our heap propagate the update to thresholds and filter.

Yeah this sounds like a good idea, let me have a look to make some changes in this direction within the scope of this PR.

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

🤖: Benchmark completed

Details

Comparing HEAD and improve_topk
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ improve_topk ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1943.28ms │    1969.83ms │    no change │
│ QQuery 1     │   681.96ms │     728.58ms │ 1.07x slower │
│ QQuery 2     │  1436.48ms │    1490.62ms │    no change │
│ QQuery 3     │   722.59ms │     703.61ms │    no change │
│ QQuery 4     │  1510.12ms │    1488.30ms │    no change │
│ QQuery 5     │ 17120.46ms │   17285.82ms │    no change │
│ QQuery 6     │  2018.04ms │    2130.12ms │ 1.06x slower │
└──────────────┴────────────┴──────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 25432.93ms │
│ Total Time (improve_topk)   │ 25796.88ms │
│ Average Time (HEAD)         │  3633.28ms │
│ Average Time (improve_topk) │  3685.27ms │
│ Queries Faster              │          0 │
│ Queries Slower              │          2 │
│ Queries with No Change      │          5 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ improve_topk ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.09ms │       2.30ms │  1.10x slower │
│ QQuery 1     │    35.96ms │      41.81ms │  1.16x slower │
│ QQuery 2     │    89.21ms │      94.35ms │  1.06x slower │
│ QQuery 3     │   100.11ms │     106.10ms │  1.06x slower │
│ QQuery 4     │   740.65ms │     844.59ms │  1.14x slower │
│ QQuery 5     │   893.88ms │     872.62ms │     no change │
│ QQuery 6     │     2.08ms │       2.15ms │     no change │
│ QQuery 7     │    43.60ms │      42.68ms │     no change │
│ QQuery 8     │   975.40ms │     926.48ms │ +1.05x faster │
│ QQuery 9     │  1215.04ms │    1248.24ms │     no change │
│ QQuery 10    │   269.69ms │     279.78ms │     no change │
│ QQuery 11    │   311.36ms │     308.34ms │     no change │
│ QQuery 12    │   970.22ms │     957.57ms │     no change │
│ QQuery 13    │  1298.95ms │    1405.24ms │  1.08x slower │
│ QQuery 14    │   895.31ms │     888.76ms │     no change │
│ QQuery 15    │  1091.69ms │    1055.89ms │     no change │
│ QQuery 16    │  1794.06ms │    1763.62ms │     no change │
│ QQuery 17    │  1702.88ms │    1671.17ms │     no change │
│ QQuery 18    │  3417.49ms │    3264.16ms │     no change │
│ QQuery 19    │    86.44ms │      90.38ms │     no change │
│ QQuery 20    │  1208.05ms │    1160.72ms │     no change │
│ QQuery 21    │  1350.88ms │    1396.17ms │     no change │
│ QQuery 22    │  2391.91ms │    2464.59ms │     no change │
│ QQuery 23    │  8396.24ms │    8636.63ms │     no change │
│ QQuery 24    │   475.01ms │     482.72ms │     no change │
│ QQuery 25    │   397.48ms │     331.76ms │ +1.20x faster │
│ QQuery 26    │   542.99ms │     470.24ms │ +1.15x faster │
│ QQuery 27    │  1692.92ms │    1733.53ms │     no change │
│ QQuery 28    │ 12848.64ms │   12869.68ms │     no change │
│ QQuery 29    │   542.01ms │     525.71ms │     no change │
│ QQuery 30    │   831.35ms │     851.85ms │     no change │
│ QQuery 31    │   886.12ms │     874.50ms │     no change │
│ QQuery 32    │  2733.36ms │    2737.27ms │     no change │
│ QQuery 33    │  3387.35ms │    3435.62ms │     no change │
│ QQuery 34    │  3403.40ms │    3421.25ms │     no change │
│ QQuery 35    │  1332.67ms │    1304.71ms │     no change │
│ QQuery 36    │   129.50ms │     130.03ms │     no change │
│ QQuery 37    │    57.73ms │      59.25ms │     no change │
│ QQuery 38    │   128.54ms │     127.34ms │     no change │
│ QQuery 39    │   208.60ms │     202.00ms │     no change │
│ QQuery 40    │    51.24ms │      50.71ms │     no change │
│ QQuery 41    │    46.59ms │      44.84ms │     no change │
│ QQuery 42    │    41.40ms │      39.66ms │     no change │
└──────────────┴────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 59020.08ms │
│ Total Time (improve_topk)   │ 59217.06ms │
│ Average Time (HEAD)         │  1372.56ms │
│ Average Time (improve_topk) │  1377.14ms │
│ Queries Faster              │          3 │
│ Queries Slower              │          6 │
│ Queries with No Change      │         34 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     HEAD ┃ improve_topk ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 121.42ms │     121.00ms │     no change │
│ QQuery 2     │  24.65ms │      23.99ms │     no change │
│ QQuery 3     │  37.12ms │      36.85ms │     no change │
│ QQuery 4     │  20.75ms │      21.37ms │     no change │
│ QQuery 5     │  58.62ms │      56.99ms │     no change │
│ QQuery 6     │   8.25ms │       8.55ms │     no change │
│ QQuery 7     │ 108.04ms │     103.83ms │     no change │
│ QQuery 8     │  26.38ms │      27.04ms │     no change │
│ QQuery 9     │  64.14ms │      65.01ms │     no change │
│ QQuery 10    │  60.60ms │      60.32ms │     no change │
│ QQuery 11    │  13.10ms │      13.68ms │     no change │
│ QQuery 12    │  38.69ms │      37.79ms │     no change │
│ QQuery 13    │  30.78ms │      29.11ms │ +1.06x faster │
│ QQuery 14    │  10.42ms │      10.44ms │     no change │
│ QQuery 15    │  26.47ms │      26.78ms │     no change │
│ QQuery 16    │  24.46ms │      24.51ms │     no change │
│ QQuery 17    │ 100.25ms │      97.99ms │     no change │
│ QQuery 18    │ 249.59ms │     243.85ms │     no change │
│ QQuery 19    │  29.34ms │      30.67ms │     no change │
│ QQuery 20    │  40.78ms │      40.13ms │     no change │
│ QQuery 21    │ 168.74ms │     176.15ms │     no change │
│ QQuery 22    │  17.23ms │      18.98ms │  1.10x slower │
└──────────────┴──────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 1279.82ms │
│ Total Time (improve_topk)   │ 1275.03ms │
│ Average Time (HEAD)         │   58.17ms │
│ Average Time (improve_topk) │   57.96ms │
│ Queries Faster              │         1 │
│ Queries Slower              │         1 │
│ Queries with No Change      │        20 │
└─────────────────────────────┴───────────┘

@Dandandan
Copy link
Contributor Author

This slowdown in the queries probably is because the branch didn't have the upgrade to arrow 55 in it #15466

🤖: Benchmark completed

Details

Would maybe nice to have the benchmarking code to automatically apply the changes onto master to avoid this?

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

Would maybe nice to have the benchmarking code to automatically apply the changes onto master to avoid this?

My benchmarking script compares the branch with merge base (aka the last commit common with main)

At least that was the intention

https://github.com/alamb/datafusion-benchmarking/blob/a638da44f509907d704115a6d425672bed95484f/gh_compare_branch.sh#L49

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

I'll run it again to see if the results are repeatable (For the queries that run in very small time limits I think tie benchmarks are noisy)

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu SMP Fri Oct 4 22:16:29 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve_topk (6fb5b59) to cde8690 diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

🤖: Benchmark completed

Details

Comparing HEAD and improve_topk
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ improve_topk ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2026.61ms │    1972.90ms │ no change │
│ QQuery 1     │   715.64ms │     733.88ms │ no change │
│ QQuery 2     │  1439.69ms │    1457.79ms │ no change │
│ QQuery 3     │   703.93ms │     730.90ms │ no change │
│ QQuery 4     │  1536.30ms │    1538.20ms │ no change │
│ QQuery 5     │ 15657.54ms │   15848.80ms │ no change │
│ QQuery 6     │  2071.27ms │    2106.06ms │ no change │
└──────────────┴────────────┴──────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 24150.98ms │
│ Total Time (improve_topk)   │ 24388.53ms │
│ Average Time (HEAD)         │  3450.14ms │
│ Average Time (improve_topk) │  3484.08ms │
│ Queries Faster              │          0 │
│ Queries Slower              │          0 │
│ Queries with No Change      │          7 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ improve_topk ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.09ms │       2.63ms │  1.26x slower │
│ QQuery 1     │    36.65ms │      38.61ms │  1.05x slower │
│ QQuery 2     │    90.81ms │      95.54ms │  1.05x slower │
│ QQuery 3     │    99.37ms │     100.80ms │     no change │
│ QQuery 4     │   774.97ms │     773.49ms │     no change │
│ QQuery 5     │   866.06ms │     892.78ms │     no change │
│ QQuery 6     │     2.02ms │       2.11ms │     no change │
│ QQuery 7     │    41.51ms │      42.36ms │     no change │
│ QQuery 8     │   930.75ms │     946.94ms │     no change │
│ QQuery 9     │  1230.00ms │    1254.72ms │     no change │
│ QQuery 10    │   273.05ms │     279.80ms │     no change │
│ QQuery 11    │   304.70ms │     318.68ms │     no change │
│ QQuery 12    │   929.11ms │     943.79ms │     no change │
│ QQuery 13    │  1415.14ms │    1329.56ms │ +1.06x faster │
│ QQuery 14    │   893.55ms │     887.62ms │     no change │
│ QQuery 15    │  1069.65ms │    1054.51ms │     no change │
│ QQuery 16    │  1763.26ms │    1764.22ms │     no change │
│ QQuery 17    │  1615.11ms │    1663.77ms │     no change │
│ QQuery 18    │  3111.15ms │    3124.48ms │     no change │
│ QQuery 19    │    87.64ms │      85.66ms │     no change │
│ QQuery 20    │  1153.52ms │    1178.78ms │     no change │
│ QQuery 21    │  1329.54ms │    1391.11ms │     no change │
│ QQuery 22    │  2350.61ms │    2477.51ms │  1.05x slower │
│ QQuery 23    │  8507.13ms │    8762.70ms │     no change │
│ QQuery 24    │   476.97ms │     467.63ms │     no change │
│ QQuery 25    │   397.00ms │     327.40ms │ +1.21x faster │
│ QQuery 26    │   537.83ms │     471.77ms │ +1.14x faster │
│ QQuery 27    │  1708.33ms │    1729.52ms │     no change │
│ QQuery 28    │ 12735.31ms │   12838.94ms │     no change │
│ QQuery 29    │   536.00ms │     538.85ms │     no change │
│ QQuery 30    │   843.94ms │     836.01ms │     no change │
│ QQuery 31    │   885.37ms │     865.55ms │     no change │
│ QQuery 32    │  2750.56ms │    2642.99ms │     no change │
│ QQuery 33    │  3417.81ms │    3392.42ms │     no change │
│ QQuery 34    │  3515.91ms │    3454.96ms │     no change │
│ QQuery 35    │  1359.29ms │    1333.59ms │     no change │
│ QQuery 36    │   131.54ms │     128.80ms │     no change │
│ QQuery 37    │    57.62ms │      58.17ms │     no change │
│ QQuery 38    │   126.41ms │     128.85ms │     no change │
│ QQuery 39    │   205.97ms │     209.53ms │     no change │
│ QQuery 40    │    47.59ms │      49.37ms │     no change │
│ QQuery 41    │    46.52ms │      49.23ms │  1.06x slower │
│ QQuery 42    │    37.84ms │      41.18ms │  1.09x slower │
└──────────────┴────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 58695.21ms │
│ Total Time (improve_topk)   │ 58976.93ms │
│ Average Time (HEAD)         │  1365.00ms │
│ Average Time (improve_topk) │  1371.56ms │
│ Queries Faster              │          3 │
│ Queries Slower              │          6 │
│ Queries with No Change      │         34 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃     HEAD ┃ improve_topk ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 123.73ms │     129.65ms │    no change │
│ QQuery 2     │  24.28ms │      24.16ms │    no change │
│ QQuery 3     │  36.72ms │      35.07ms │    no change │
│ QQuery 4     │  20.42ms │      21.08ms │    no change │
│ QQuery 5     │  56.03ms │      55.11ms │    no change │
│ QQuery 6     │   8.02ms │       8.29ms │    no change │
│ QQuery 7     │ 100.49ms │     106.63ms │ 1.06x slower │
│ QQuery 8     │  25.88ms │      26.98ms │    no change │
│ QQuery 9     │  62.54ms │      63.99ms │    no change │
│ QQuery 10    │  58.84ms │      60.65ms │    no change │
│ QQuery 11    │  13.08ms │      13.25ms │    no change │
│ QQuery 12    │  39.76ms │      38.79ms │    no change │
│ QQuery 13    │  29.76ms │      30.09ms │    no change │
│ QQuery 14    │   9.76ms │      11.90ms │ 1.22x slower │
│ QQuery 15    │  24.64ms │      25.52ms │    no change │
│ QQuery 16    │  22.78ms │      24.05ms │ 1.06x slower │
│ QQuery 17    │  98.59ms │      94.64ms │    no change │
│ QQuery 18    │ 240.57ms │     237.52ms │    no change │
│ QQuery 19    │  28.10ms │      27.83ms │    no change │
│ QQuery 20    │  38.31ms │      40.77ms │ 1.06x slower │
│ QQuery 21    │ 172.17ms │     168.69ms │    no change │
│ QQuery 22    │  16.74ms │      17.34ms │    no change │
└──────────────┴──────────┴──────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 1251.20ms │
│ Total Time (improve_topk)   │ 1261.99ms │
│ Average Time (HEAD)         │   56.87ms │
│ Average Time (improve_topk) │   57.36ms │
│ Queries Faster              │         0 │
│ Queries Slower              │         4 │
│ Queries with No Change      │        18 │
└─────────────────────────────┴───────────┘

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 18, 2025
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Apr 18, 2025
@adriangb
Copy link
Contributor

FYI @Dandandan although very rough I put up a draft of filter pushdown in #15770.

The interaction with this PR is something to think about. In particular it’d be nice if TopK could know if it’s filter got pushed down as exact or not, or if a FilterExec got created with that filter, so that it can avoid re-filtering

@Dandandan
Copy link
Contributor Author

FYI @Dandandan although very rough I put up a draft of filter pushdown in #15770.

The interaction with this PR is something to think about. In particular it’d be nice if TopK could know if it’s filter got pushed down as exact or not, or if a FilterExec got created with that filter, so that it can avoid re-filtering

Yeah would be good (I think it could follow when pushding down the filter into the scan is completed).
Probably the effect would be small, as the filter itself seems not very expensive and if topk is pushed down, the number of rows coming into topk is also itself significantly reduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize TopK with filter
6 participants