@e-dard commented Jul 15, 2021

This pulls in the recent perf work to reuse built comparators.

e-dard added 2 commits July 13, 2021 18:04
This commit stores the built Arrow comparators for pairs of arrays on each of the sort key cursors, so they can be reused instead of rebuilt. This significantly reduces the cost of merging record batches with the `SortPreservingMerge` operator (1.18× to 1.83× faster in the benchmarks below).
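A minimal, dependency-free Rust sketch of the caching idea, assuming a single nullable `i64` sort column. Each hypothetical `SortKeyCursor` keeps a map from the other batch's id to a comparator, so a comparator is built the first time two batches meet and reused afterwards. `Column`, `Comparator`, `build_comparator`, `SortKeyCursor`, and `merge_two` are illustrative names, not DataFusion or Arrow APIs; in real Arrow code the costly step is building the comparator for two arrays, which here only clones two `Rc`s.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;
use std::rc::Rc;

// Stand-in for an Arrow array: one nullable sort-key column (hypothetical).
type Column = Vec<Option<i64>>;

// Compares row `i` of one column with row `j` of another (boxed closure).
type Comparator = Box<dyn Fn(usize, usize) -> Ordering>;

// In Arrow this step would involve type dispatch; here it just captures Rc clones.
fn build_comparator(left: &Rc<Column>, right: &Rc<Column>) -> Comparator {
    let (l, r) = (Rc::clone(left), Rc::clone(right));
    Box::new(move |i: usize, j: usize| l[i].cmp(&r[j]))
}

// Hypothetical cursor over one sorted batch.
struct SortKeyCursor {
    batch_id: usize,
    column: Rc<Column>,
    row: usize,
    // Comparators built against other batches, keyed by the other batch's id.
    comparators: HashMap<usize, Comparator>,
    builds: usize,
    comparisons: usize,
}

impl SortKeyCursor {
    fn new(batch_id: usize, column: Column) -> Self {
        SortKeyCursor {
            batch_id,
            column: Rc::new(column),
            row: 0,
            comparators: HashMap::new(),
            builds: 0,
            comparisons: 0,
        }
    }

    fn is_finished(&self) -> bool {
        self.row >= self.column.len()
    }

    fn current(&self) -> Option<i64> {
        self.column[self.row]
    }

    // Compare current rows, building a comparator only on first contact with `other`'s batch.
    fn compare(&mut self, other: &SortKeyCursor) -> Ordering {
        if !self.comparators.contains_key(&other.batch_id) {
            let cmp = build_comparator(&self.column, &other.column);
            self.comparators.insert(other.batch_id, cmp);
            self.builds += 1;
        }
        self.comparisons += 1;
        let cmp = &self.comparators[&other.batch_id];
        cmp(self.row, other.row)
    }
}

// Two-way merge of sorted batches; returns (output, comparator builds, comparisons).
fn merge_two(a: Column, b: Column) -> (Column, usize, usize) {
    let mut left = SortKeyCursor::new(0, a);
    let mut right = SortKeyCursor::new(1, b);
    let mut out = Vec::new();
    while !left.is_finished() && !right.is_finished() {
        // Ties go to the left batch, which keeps the merge stable.
        if left.compare(&right) != Ordering::Greater {
            out.push(left.current());
            left.row += 1;
        } else {
            out.push(right.current());
            right.row += 1;
        }
    }
    // Drain whichever batch still has rows.
    for cursor in [&mut left, &mut right] {
        while !cursor.is_finished() {
            out.push(cursor.current());
            cursor.row += 1;
        }
    }
    (out, left.builds + right.builds, left.comparisons + right.comparisons)
}

fn main() {
    let (merged, builds, comparisons) =
        merge_two(vec![Some(1), Some(4), Some(7)], vec![Some(2), Some(3), Some(9)]);
    println!("{:?} builds={} comparisons={}", merged, builds, comparisons);
}
```

For these two batches the merge performs five comparisons but builds only one comparator; without the cache, every comparison would pay the build cost.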

Benchmarks improved as follows:

```
⇒  critcmp master pr
group                               master                                 pr
-----                               ------                                 --
interleave_batches                  1.83   623.8±12.41µs        ? ?/sec    1.00    341.2±6.98µs        ? ?/sec
merge_batches_no_overlap_large      1.56    400.6±4.94µs        ? ?/sec    1.00    256.3±6.57µs        ? ?/sec
merge_batches_no_overlap_small      1.63   425.1±24.88µs        ? ?/sec    1.00    261.1±7.46µs        ? ?/sec
merge_batches_small_into_large      1.18    228.0±3.95µs        ? ?/sec    1.00    193.6±2.86µs        ? ?/sec
merge_batches_some_overlap_large    1.68   505.4±10.27µs        ? ?/sec    1.00    301.3±6.63µs        ? ?/sec
merge_batches_some_overlap_small    1.64    515.7±5.21µs        ? ?/sec    1.00   314.6±12.66µs        ? ?/sec
```
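In each column, the first number appears to be critcmp's ratio to the faster of the two runs, so the `master` ratio is the PR's speedup. As an illustrative sanity check, the ratios and the share of time saved can be recomputed from the mean times copied out of the table; the names `RESULTS`, `speedup`, and `time_saved_pct` are hypothetical.

```rust
// Mean times in µs copied from the critcmp output above: (benchmark, master, pr).
const RESULTS: [(&str, f64, f64); 6] = [
    ("interleave_batches", 623.8, 341.2),
    ("merge_batches_no_overlap_large", 400.6, 256.3),
    ("merge_batches_no_overlap_small", 425.1, 261.1),
    ("merge_batches_small_into_large", 228.0, 193.6),
    ("merge_batches_some_overlap_large", 505.4, 301.3),
    ("merge_batches_some_overlap_small", 515.7, 314.6),
];

/// How many times faster the PR's mean is than master's.
fn speedup(master_us: f64, pr_us: f64) -> f64 {
    master_us / pr_us
}

/// Percentage of master's time saved by the PR.
fn time_saved_pct(master_us: f64, pr_us: f64) -> f64 {
    (1.0 - pr_us / master_us) * 100.0
}

fn main() {
    for (name, master, pr) in RESULTS {
        println!(
            "{:<34} {:.2}x faster, {:.1}% less time",
            name,
            speedup(master, pr),
            time_saved_pct(master, pr)
        );
    }
}
```

Rounded to two decimals, these quotients reproduce the 1.83, 1.56, 1.63, 1.18, 1.68 and 1.64 ratios in the table; for `interleave_batches`, for example, the PR takes about 45% less time.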
@e-dard closed this Jul 15, 2021
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
* deps: update datafusion to 39.0.0, pyo3 to 0.21, and object_store to 0.10.1

`datafusion-common` also depends on `pyo3`, so they need to be upgraded together.

* feat: remove GetIndexField

datafusion replaced Expr::GetIndexField with a FieldAccessor trait.

Ref apache#10568
Ref apache#10769

* feat: update ScalarFunction

The field `func_name` was changed to `func` as part of removing `ScalarFunctionDefinition` upstream.

Ref apache#10325

* feat: incorporate upstream array_slice fixes

Fixes apache#670

* update ExecutionPlan::children impl for DatasetExec

Ref apache#10543

* update value_interval_daytime

Ref apache/arrow-rs#5769

* update regexp_replace and regexp_match

Fixes apache#677

* add gil-refs feature to pyo3

This silences pyo3's deprecation warnings for the GIL Refs API, which its new `Bound` API supersedes.

It's the first step of the migration, and the feature should be removed before merging.

Ref https://pyo3.rs/v0.21.0/migration#from-020-to-021

* fix signature for octet_length

Ref apache#10726

* update signature for covar_samp

AggregateUDF expressions now use a builder API, which removes positional arguments like `filter` and `order_by`.

Ref apache#10545
Ref apache#10492

* convert covar_pop to expr_fn api

Ref: https://github.com/apache/datafusion/pull/10418/files

* convert median to expr_fn api

Ref apache#10644

* convert variance sample to UDF

Ref apache#10667

* convert first_value and last_value to UDFs

Ref apache#10648

* checkpointing with a few todos to fix remaining compile errors

* impl PyExpr::python_value for IntervalDayTime and IntervalMonthDayNano

* convert sum aggregate function to UDF

* remove unnecessary clone on double reference

* apply cargo fmt

* remove duplicate allow-dead-code annotation

* update tpch examples for new pyarrow interval

Fixes apache#665

* marked q11 tpch example as expected fail

Ref apache#730

* add default stride of None back to array_slice
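The `covar_samp` item above notes that aggregate expressions moved to a builder design, so optional clauses such as `filter` and `order_by` are no longer positional arguments. The following is a generic sketch of that builder pattern only; `AggregateCall`, `AggregateBuilder`, and this `covar_samp` helper are hypothetical, use plain strings for expressions, and are not DataFusion's actual types or signatures.

```rust
// Hypothetical, simplified aggregate expression; not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub struct AggregateCall {
    pub func: String,
    pub args: Vec<String>,
    pub filter: Option<String>,
    pub order_by: Vec<String>,
}

// Optional clauses are attached by chained methods instead of positional arguments.
pub struct AggregateBuilder {
    call: AggregateCall,
}

impl AggregateBuilder {
    pub fn new(func: &str, args: &[&str]) -> Self {
        AggregateBuilder {
            call: AggregateCall {
                func: func.to_string(),
                args: args.iter().map(|a| a.to_string()).collect(),
                filter: None,
                order_by: Vec::new(),
            },
        }
    }

    pub fn filter(mut self, predicate: &str) -> Self {
        self.call.filter = Some(predicate.to_string());
        self
    }

    pub fn order_by(mut self, keys: &[&str]) -> Self {
        self.call.order_by = keys.iter().map(|k| k.to_string()).collect();
        self
    }

    pub fn build(self) -> AggregateCall {
        self.call
    }
}

// Hypothetical expr_fn-style helper: only the required arguments are positional.
pub fn covar_samp(expr1: &str, expr2: &str) -> AggregateBuilder {
    AggregateBuilder::new("covar_samp", &[expr1, expr2])
}

fn main() {
    let plain = covar_samp("a", "b").build();
    let filtered = covar_samp("a", "b").filter("a > 0").order_by(&["ts"]).build();
    println!("{:?}\n{:?}", plain, filtered);
}
```

Callers that never need a filter or ordering simply call `build()`, which is what lets the positional `filter` and `order_by` parameters disappear from the function signature.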
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
* provides workaround for half-migrated UDAF `sum`

Ref apache#730

* provide compatibility for sqlparser::ast::NullTreatment

This is now exposed as part of the API to `first_value` and `last_value` functions.

If there's a more elegant way to achieve this, please let me know.
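One common way a bindings crate exposes an upstream enum, assumed here rather than taken from the commit, is a local mirror enum with `From` conversions in both directions. Rust's orphan rule forbids implementing a foreign trait on a foreign type, and an attribute such as pyo3's `#[pyclass]` can only go on a type the crate defines. In this dependency-free sketch, the `upstream` module is a hypothetical stand-in for `sqlparser::ast::NullTreatment`, with its two variants assumed to match.

```rust
mod upstream {
    // Hypothetical stand-in for `sqlparser::ast::NullTreatment`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum NullTreatment {
        IgnoreNulls,
        RespectNulls,
    }
}

/// Local mirror owned by the bindings crate (hypothetical); in a pyo3 crate
/// this is the type that could carry `#[pyclass]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NullTreatment {
    IgnoreNulls,
    RespectNulls,
}

impl From<NullTreatment> for upstream::NullTreatment {
    fn from(value: NullTreatment) -> Self {
        match value {
            NullTreatment::IgnoreNulls => upstream::NullTreatment::IgnoreNulls,
            NullTreatment::RespectNulls => upstream::NullTreatment::RespectNulls,
        }
    }
}

impl From<upstream::NullTreatment> for NullTreatment {
    fn from(value: upstream::NullTreatment) -> Self {
        match value {
            upstream::NullTreatment::IgnoreNulls => NullTreatment::IgnoreNulls,
            upstream::NullTreatment::RespectNulls => NullTreatment::RespectNulls,
        }
    }
}

fn main() {
    // A Python-facing value converts losslessly to the upstream type and back.
    let local = NullTreatment::IgnoreNulls;
    let up: upstream::NullTreatment = local.into();
    assert_eq!(NullTreatment::from(up), local);
    println!("{:?} -> {:?}", local, up);
}
```

Exhaustive `match` arms mean that if upstream ever adds a variant, the conversion stops compiling instead of silently mapping it to the wrong value.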