@xudong963
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

If two consecutive projections have the same exprs (my real case), we can directly remove the current projection and return the child projection, which avoids actually running the merge_consecutive_projections method. This would improve performance when the projections have many exprs.
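
As a rough illustration of the fast path described above, here is a minimal sketch on a toy plan model. The `Plan` and `Projection` types, `merge_slow`, and `merge_consecutive` are hypothetical stand-ins, not DataFusion's actual types or the code in this PR. Expressions are modeled as plain strings, and the sketch assumes, as in the case described, that identical expression lists make the outer projection redundant.

```rust
use std::sync::Arc;

/// Toy stand-in for a logical plan (assumption: NOT DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf node, e.g. a table scan identified by name.
    Scan(String),
    /// A projection over a child plan.
    Projection(Projection),
}

/// Toy projection: expressions are plain strings for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Projection {
    exprs: Vec<String>,
    input: Arc<Plan>,
}

/// Hypothetical placeholder for the expensive path that rewrites every
/// outer expression in terms of the inner projection's expressions.
/// Here it simply returns the outer projection unchanged.
fn merge_slow(outer: Projection) -> Projection {
    outer
}

/// Returns the projection to keep and whether the fast path fired.
fn merge_consecutive(outer: Projection) -> (Projection, bool) {
    if let Plan::Projection(inner) = outer.input.as_ref() {
        // Fast path: both projections list the same expressions, so drop
        // the outer projection and keep the child as-is.
        if inner.exprs == outer.exprs {
            return (inner.clone(), true);
        }
    }
    // Otherwise fall back to the (placeholder) full merge.
    (merge_slow(outer), false)
}
```

For two stacked projections over `a, b`, `merge_consecutive` returns the inner projection and reports that the fast path fired; when the expression lists differ, it falls through to the placeholder slow path.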

Benchmark

use criterion::{criterion_group, criterion_main, Criterion};
use datafusion_common::{DFSchema, ScalarValue, Result};
use datafusion_expr::{col, lit, Expr, LogicalPlan, Projection};
use std::sync::Arc;
use datafusion_optimizer::optimize_projections::{merge_consecutive_projections, merge_consecutive_projections_v2};

/// Creates a projection with another projection as input, both with n expressions
fn create_nested_projection(n_exprs: usize) -> Result<Projection> {
    // Create a simple empty relation as the base
    let schema = Arc::new(DFSchema::empty());
    let base_plan = LogicalPlan::EmptyRelation(
        datafusion_expr::EmptyRelation {
            produce_one_row: true,
            schema: schema.clone(),
        }
    );

    // Create inner projection with n expressions (i AS col_i)
    let inner_exprs: Vec<Expr> = (0..n_exprs)
        .map(|i| lit(ScalarValue::Int32(Some(i as i32))).alias(format!("col_{}", i)))
        .collect();

    let inner_projection = Projection::try_new(
        inner_exprs.clone(),
        Arc::new(base_plan)
    )?;

    Projection::try_new(
        inner_exprs,
        Arc::new(LogicalPlan::Projection(inner_projection))
    )
}

fn bench_merge_consecutive_projections(c: &mut Criterion) {
    let n_exprs = 1000;
    let projection = create_nested_projection(n_exprs).unwrap();

    let mut group = c.benchmark_group("projection_merge");

    group.bench_function("merge_consecutive_projections", |b| {
        b.iter(|| {
            let proj_clone = projection.clone();
            merge_consecutive_projections(proj_clone).unwrap()
        });
    });

    group.bench_function("merge_consecutive_projections_v2", |b| {
        b.iter(|| {
            let proj_clone = projection.clone();
            merge_consecutive_projections_v2(proj_clone).unwrap()
        });
    });

    group.finish();
}

criterion_group!(benches, bench_merge_consecutive_projections);
criterion_main!(benches);

Benchmark result

    Finished `bench` profile [optimized] target(s) in 36.80s
     Running benches/projection_merge.rs (arrow-datafusion/target/release/deps/projection_merge-968d896acadbdb0b)
projection_merge/merge_consecutive_projections
                        time:   [219.66 µs 220.36 µs 221.21 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
projection_merge/merge_consecutive_projections_v2
                        time:   [66.409 µs 66.974 µs 67.756 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
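
To put the two Criterion medians side by side, the small helper below computes the speedup ratio and the relative time reduction. The function names are illustrative only. With 220.36 µs for the existing function and 66.974 µs for the v2 variant, this works out to roughly a 3.3x speedup, or about 70% less time, on this 1000-expression microbenchmark.

```rust
/// Ratio of baseline time to candidate time.
/// Values above 1.0 mean the candidate is faster.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

/// Percentage of the baseline time saved by the candidate.
fn time_reduction_pct(baseline_us: f64, candidate_us: f64) -> f64 {
    (1.0 - candidate_us / baseline_us) * 100.0
}
```

Calling `speedup(220.36, 66.974)` and `time_reduction_pct(220.36, 66.974)` reproduces the figures quoted above from the median values in the benchmark output.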

What changes are included in this PR?

Add a fast path

Are these changes tested?

By existing tests

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Apr 17, 2025
@xudong963 xudong963 changed the title Add fast path for optimize_projection Add a fast path for optimize_projection Apr 17, 2025
@xudong963 xudong963 added the performance Make DataFusion faster label Apr 17, 2025
@xudong963 xudong963 force-pushed the speed_up_optimize_projection branch from cd85701 to f62173e Compare April 17, 2025 09:41
Contributor

@goldmedal goldmedal left a comment

Thanks @xudong963 LGTM.
I ran the clickbench_1 benchmark to ensure the long projection case won't be slower.

Comparing speed_up_optimize_projection-disable and speed_up_optimize_projection
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ speed_up_optimize_projection-disable ┃ speed_up_optimize_projection ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │                               0.21ms │                       0.27ms │  1.29x slower │
│ QQuery 1     │                              26.94ms │                      27.52ms │     no change │
│ QQuery 2     │                              53.18ms │                      50.71ms │     no change │
│ QQuery 3     │                              46.54ms │                      46.98ms │     no change │
│ QQuery 4     │                             306.56ms │                     296.81ms │     no change │
│ QQuery 5     │                             435.87ms │                     430.61ms │     no change │
│ QQuery 6     │                               0.20ms │                       0.20ms │     no change │
│ QQuery 7     │                              29.09ms │                      30.44ms │     no change │
│ QQuery 8     │                             356.99ms │                     351.61ms │     no change │
│ QQuery 9     │                             508.16ms │                     506.84ms │     no change │
│ QQuery 10    │                             124.69ms │                     128.00ms │     no change │
│ QQuery 11    │                             146.35ms │                     148.31ms │     no change │
│ QQuery 12    │                             469.04ms │                     467.58ms │     no change │
│ QQuery 13    │                             563.89ms │                     587.40ms │     no change │
│ QQuery 14    │                             437.09ms │                     434.55ms │     no change │
│ QQuery 15    │                             352.85ms │                     357.64ms │     no change │
│ QQuery 16    │                             809.19ms │                     827.07ms │     no change │
│ QQuery 17    │                             737.81ms │                     737.52ms │     no change │
│ QQuery 18    │                            2107.86ms │                    1865.77ms │ +1.13x faster │
│ QQuery 19    │                              44.79ms │                      44.96ms │     no change │
│ QQuery 20    │                             639.96ms │                     624.59ms │     no change │
│ QQuery 21    │                             747.94ms │                     793.88ms │  1.06x slower │
│ QQuery 22    │                            1480.86ms │                    1533.00ms │     no change │
│ QQuery 23    │                            4606.85ms │                    4562.09ms │     no change │
│ QQuery 24    │                             273.64ms │                     267.35ms │     no change │
│ QQuery 25    │                             272.76ms │                     271.14ms │     no change │
│ QQuery 26    │                             303.19ms │                     315.82ms │     no change │
│ QQuery 27    │                             980.59ms │                     980.28ms │     no change │
│ QQuery 28    │                            7661.86ms │                    7857.94ms │     no change │
│ QQuery 29    │                             345.65ms │                     358.20ms │     no change │
│ QQuery 30    │                             383.39ms │                     376.86ms │     no change │
│ QQuery 31    │                             389.84ms │                     395.84ms │     no change │
│ QQuery 32    │                            1705.60ms │                    1648.16ms │     no change │
│ QQuery 33    │                            1816.90ms │                    1898.96ms │     no change │
│ QQuery 34    │                            2069.67ms │                    2047.80ms │     no change │
│ QQuery 35    │                             517.60ms │                     507.64ms │     no change │
│ QQuery 36    │                              79.65ms │                      79.48ms │     no change │
│ QQuery 37    │                              39.88ms │                      39.67ms │     no change │
│ QQuery 38    │                              79.42ms │                      79.24ms │     no change │
│ QQuery 39    │                             131.94ms │                     127.71ms │     no change │
│ QQuery 40    │                              28.20ms │                      29.04ms │     no change │
│ QQuery 41    │                              27.70ms │                      28.23ms │     no change │
│ QQuery 42    │                              26.41ms │                      23.72ms │ +1.11x faster │
└──────────────┴──────────────────────────────────────┴──────────────────────────────┴───────────────┘

@alamb
Contributor

alamb commented Apr 17, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu SMP Fri Oct 4 22:16:29 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Comparing speed_up_optimize_projection (f62173e) to 2cba3ad diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Apr 17, 2025

🤖: Benchmark completed

Details

Comparing HEAD and speed_up_optimize_projection
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ speed_up_optimize_projection ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1878.10ms │                    1982.90ms │ 1.06x slower │
│ QQuery 1     │   752.74ms │                     733.41ms │    no change │
│ QQuery 2     │  1486.85ms │                    1480.35ms │    no change │
│ QQuery 3     │   722.30ms │                     724.74ms │    no change │
│ QQuery 4     │  1461.54ms │                    1471.33ms │    no change │
│ QQuery 5     │ 15306.71ms │                   15481.21ms │    no change │
│ QQuery 6     │  2057.42ms │                    2052.96ms │    no change │
└──────────────┴────────────┴──────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 23665.66ms │
│ Total Time (speed_up_optimize_projection)   │ 23926.90ms │
│ Average Time (HEAD)                         │  3380.81ms │
│ Average Time (speed_up_optimize_projection) │  3418.13ms │
│ Queries Faster                              │          0 │
│ Queries Slower                              │          1 │
│ Queries with No Change                      │          6 │
└─────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ speed_up_optimize_projection ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.06ms │                       2.46ms │  1.19x slower │
│ QQuery 1     │    38.58ms │                      38.28ms │     no change │
│ QQuery 2     │    92.70ms │                      92.70ms │     no change │
│ QQuery 3     │    97.61ms │                      99.82ms │     no change │
│ QQuery 4     │   757.33ms │                     754.91ms │     no change │
│ QQuery 5     │   868.68ms │                     866.85ms │     no change │
│ QQuery 6     │     2.11ms │                       2.39ms │  1.13x slower │
│ QQuery 7     │    44.61ms │                      44.49ms │     no change │
│ QQuery 8     │   914.34ms │                     958.88ms │     no change │
│ QQuery 9     │  1197.35ms │                    1335.72ms │  1.12x slower │
│ QQuery 10    │   280.35ms │                     277.21ms │     no change │
│ QQuery 11    │   317.00ms │                     311.89ms │     no change │
│ QQuery 12    │   933.43ms │                     953.26ms │     no change │
│ QQuery 13    │  1353.78ms │                    1187.68ms │ +1.14x faster │
│ QQuery 14    │   849.73ms │                     858.20ms │     no change │
│ QQuery 15    │  1057.46ms │                    1057.40ms │     no change │
│ QQuery 16    │  1756.38ms │                    1763.64ms │     no change │
│ QQuery 17    │  1610.05ms │                    1643.65ms │     no change │
│ QQuery 18    │  3115.25ms │                    3159.56ms │     no change │
│ QQuery 19    │    84.70ms │                      84.77ms │     no change │
│ QQuery 20    │  1151.16ms │                    1138.42ms │     no change │
│ QQuery 21    │  1307.50ms │                    1343.72ms │     no change │
│ QQuery 22    │  2361.41ms │                    2381.62ms │     no change │
│ QQuery 23    │  8523.42ms │                    8618.68ms │     no change │
│ QQuery 24    │   477.11ms │                     499.04ms │     no change │
│ QQuery 25    │   391.38ms │                     402.51ms │     no change │
│ QQuery 26    │   538.03ms │                     554.73ms │     no change │
│ QQuery 27    │  1678.72ms │                    1714.18ms │     no change │
│ QQuery 28    │ 12668.72ms │                   13275.12ms │     no change │
│ QQuery 29    │   536.05ms │                     531.67ms │     no change │
│ QQuery 30    │   850.19ms │                     843.19ms │     no change │
│ QQuery 31    │   923.59ms │                     886.73ms │     no change │
│ QQuery 32    │  2773.10ms │                    2707.61ms │     no change │
│ QQuery 33    │  3486.97ms │                    3429.91ms │     no change │
│ QQuery 34    │  3691.14ms │                    3529.34ms │     no change │
│ QQuery 35    │  1344.73ms │                    1376.59ms │     no change │
│ QQuery 36    │   124.92ms │                     128.34ms │     no change │
│ QQuery 37    │    59.39ms │                      57.88ms │     no change │
│ QQuery 38    │   129.03ms │                     124.86ms │     no change │
│ QQuery 39    │   203.61ms │                     208.92ms │     no change │
│ QQuery 40    │    48.29ms │                      49.77ms │     no change │
│ QQuery 41    │    48.00ms │                      48.52ms │     no change │
│ QQuery 42    │    40.93ms │                      39.50ms │     no change │
└──────────────┴────────────┴──────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 58730.89ms │
│ Total Time (speed_up_optimize_projection)   │ 59384.62ms │
│ Average Time (HEAD)                         │  1365.83ms │
│ Average Time (speed_up_optimize_projection) │  1381.04ms │
│ Queries Faster                              │          1 │
│ Queries Slower                              │          3 │
│ Queries with No Change                      │         39 │
└─────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     HEAD ┃ speed_up_optimize_projection ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 122.85ms │                     122.96ms │     no change │
│ QQuery 2     │  24.88ms │                      23.93ms │     no change │
│ QQuery 3     │  34.40ms │                      35.19ms │     no change │
│ QQuery 4     │  20.67ms │                      20.20ms │     no change │
│ QQuery 5     │  55.01ms │                      54.97ms │     no change │
│ QQuery 6     │   8.18ms │                       8.10ms │     no change │
│ QQuery 7     │ 102.05ms │                     105.43ms │     no change │
│ QQuery 8     │  27.28ms │                      26.27ms │     no change │
│ QQuery 9     │  63.39ms │                      60.53ms │     no change │
│ QQuery 10    │  57.52ms │                      57.67ms │     no change │
│ QQuery 11    │  12.95ms │                      13.03ms │     no change │
│ QQuery 12    │  38.07ms │                      40.44ms │  1.06x slower │
│ QQuery 13    │  29.63ms │                      30.05ms │     no change │
│ QQuery 14    │  10.18ms │                       9.96ms │     no change │
│ QQuery 15    │  24.72ms │                      24.69ms │     no change │
│ QQuery 16    │  22.45ms │                      23.54ms │     no change │
│ QQuery 17    │  98.96ms │                      94.86ms │     no change │
│ QQuery 18    │ 249.95ms │                     241.20ms │     no change │
│ QQuery 19    │  27.76ms │                      27.86ms │     no change │
│ QQuery 20    │  41.10ms │                      37.18ms │ +1.11x faster │
│ QQuery 21    │ 175.70ms │                     171.94ms │     no change │
│ QQuery 22    │  17.71ms │                      18.18ms │     no change │
└──────────────┴──────────┴──────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 1265.42ms │
│ Total Time (speed_up_optimize_projection)   │ 1248.16ms │
│ Average Time (HEAD)                         │   57.52ms │
│ Average Time (speed_up_optimize_projection) │   56.73ms │
│ Queries Faster                              │         1 │
│ Queries Slower                              │         1 │
│ Queries with No Change                      │        20 │
└─────────────────────────────────────────────┴───────────┘

@xudong963
Member Author

🤖: Benchmark completed

Details

The change doesn't seem to have any obvious impact on these benchmarks.

@Dandandan
Contributor

Maybe it is possible to run some SQL planner benchmarks to quantify the performance improvement for full queries?

@xudong963
Member Author

sql planner benchmarks

Do we have some existing sql planner benchmarks?

@xudong963
Member Author

xudong963 commented Apr 18, 2025

sql planner benchmarks

Do we have some existing sql planner benchmarks?

Also, there is a real production case:
image

optimize_projection matches 10.9%, and merge_consecutive_projections, shown in the green box (the matched part), accounts for about 5%.

@Dandandan
Contributor

sql planner benchmarks

Do we have some existing sql planner benchmarks?

Yes, there is a sql_planner bench.
https://github.com/apache/datafusion/blob/main/datafusion/core/benches/sql_planner.rs

@xudong963
Member Author

sql planner benchmarks

Do we have some existing sql planner benchmarks?

Yes, there is a sql_planner bench. https://github.com/apache/datafusion/blob/main/datafusion/core/benches/sql_planner.rs

I don't find SQL in the file that has the following pattern

Projection ..
  Projection ..
    ...

@xudong963
Member Author

Thank you for your review, let's go and continue to optimize.

@xudong963 xudong963 merged commit 185f5d9 into apache:main Apr 18, 2025
27 checks passed
@Dandandan
Contributor

I don't find SQL in the file that has the following pattern

They might sometimes have that pattern after an optimization rule adds an extra projection? Not sure if that happens in those queries though, but it's always good to test.

@xudong963
Member Author

I don't find SQL in the file that has the following pattern

They might sometimes have that pattern after an optimization rule adds an extra projection? Not sure if that happens in those queries though, but it's always good to test.

Oh, I'll do it

@xudong963
Member Author

I don't find SQL in the file that has the following pattern

They might sometimes have that pattern after an optimization rule adds an extra projection? Not sure if that happens in those queries though, but it's always good to test.

Oh, I'll do it

Something is broken, will retest after #15762 is fixed

xudong963 added a commit to massive-com/arrow-datafusion that referenced this pull request Apr 23, 2025
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
xudong963 added a commit to massive-com/arrow-datafusion that referenced this pull request May 7, 2025