Closed
Conversation
This commit stores built Arrow comparators for two arrays on each of the sort key cursors, resulting in a significant reduction in the cost associated with merging record batches using the `SortPreservingMerge` operator. Benchmarks improved as follows: ``` ⇒ critcmp master pr group master pr ----- ------ -- interleave_batches 1.83 623.8±12.41µs ? ?/sec 1.00 341.2±6.98µs ? ?/sec merge_batches_no_overlap_large 1.56 400.6±4.94µs ? ?/sec 1.00 256.3±6.57µs ? ?/sec merge_batches_no_overlap_small 1.63 425.1±24.88µs ? ?/sec 1.00 261.1±7.46µs ? ?/sec merge_batches_small_into_large 1.18 228.0±3.95µs ? ?/sec 1.00 193.6±2.86µs ? ?/sec merge_batches_some_overlap_large 1.68 505.4±10.27µs ? ?/sec 1.00 301.3±6.63µs ? ?/sec merge_batches_some_overlap_small 1.64 515.7±5.21µs ? ?/sec 1.00 314.6±12.66µs ? ?/sec ```
H0TB0X420
pushed a commit
to H0TB0X420/datafusion
that referenced
this pull request
Oct 7, 2025
* deps: update datafusion to 39.0.0, pyo3 to 0.21, and object_store to 0.10.1 `datafusion-common` also depends on `pyo3`, so they need to be upgraded together. * feat: remove GetIndexField datafusion replaced Expr::GetIndexField with a FieldAccessor trait. Ref apache#10568 Ref apache#10769 * feat: update ScalarFunction The field `func_name` was changed to `func` as part of removing `ScalarFunctionDefinition` upstream. Ref apache#10325 * feat: incorporate upstream array_slice fixes Fixes apache#670 * update ExectionPlan::children impl for DatasetExec Ref apache#10543 * update value_interval_daytime Ref apache/arrow-rs#5769 * update regexp_replace and regexp_match Fixes apache#677 * add gil-refs feature to pyo3 This silences pyo3's deprecation warnings for its new Bounds api. It's the 1st step of the migration, and should be removed before merge. Ref https://pyo3.rs/v0.21.0/migration#from-020-to-021 * fix signature for octet_length Ref apache#10726 * update signature for covar_samp AggregateUDF expressions now have a builder API design, which removes arguments like filter and order_by Ref apache#10545 Ref apache#10492 * convert covar_pop to expr_fn api Ref: https://github.com/apache/datafusion/pull/10418/files * convert median to expr_fn api Ref apache#10644 * convert variance sample to UDF Ref apache#10667 * convert first_value and last_value to UDFs Ref apache#10648 * checkpointing with a few todos to fix remaining compile errors * impl PyExpr::python_value for IntervalDayTime and IntervalMonthDayNano * convert sum aggregate function to UDF * remove unnecessary clone on double reference * apply cargo fmt * remove duplicate allow-dead-code annotation * update tpch examples for new pyarrow interval Fixes apache#665 * marked q11 tpch example as expected fail Ref apache#730 * add default stride of None back to array_slice
H0TB0X420
pushed a commit
to H0TB0X420/datafusion
that referenced
this pull request
Oct 7, 2025
* provides workaround for half-migrated UDAF `sum` Ref apache#730 * provide compatibility for sqlparser::ast::NullTreatment This is now exposed as part of the API to `first_value` and `last_value` functions. If there's a more elegant way to achieve this, please let me know.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pulls in the recent perf work to re-use build comparators