fix: column indices in FFI partition evaluator #16480

Merged
merged 2 commits into from
Jun 27, 2025

Conversation

timsaucer
Contributor

@timsaucer timsaucer commented Jun 20, 2025

Which issue does this PR close?

Rationale for this change

There is a bug in how we compute the column indices for which to create a schema. The computation misses the last index, and it can erroneously create an index 0 when no column exists.

What changes are included in this PR?

Properly handle the `None` case when finding the maximum column index.
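
As a rough illustration of the fix described above, here is a minimal, standard-library-only sketch. It is not the code from this PR: `PlaceholderField` and `build_fields` are hypothetical names, and the map of required columns is assumed to be keyed by column index. It shows the two points the description mentions: treating an empty set of columns as "no fields" rather than "index 0", and using an inclusive range so the last index is kept.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field (the real code builds an Arrow schema).
#[derive(Debug, Clone, PartialEq)]
struct PlaceholderField {
    name: String,
    is_placeholder: bool,
}

/// Build one field per index in `0..=max_index`, or no fields at all
/// when no columns are required.
fn build_fields(required_columns: &HashMap<usize, String>) -> Vec<PlaceholderField> {
    // `max()` yields `None` for an empty map. Keep that case distinct
    // instead of defaulting to index 0, which would invent a column.
    let Some(&max_column) = required_columns.keys().max() else {
        return Vec::new();
    };

    // Inclusive range so the last required index is not dropped.
    (0..=max_column)
        .map(|idx| match required_columns.get(&idx) {
            // A column that is actually referenced keeps its name.
            Some(name) => PlaceholderField {
                name: name.clone(),
                is_placeholder: false,
            },
            // Gaps below the maximum index are filled with placeholders.
            None => PlaceholderField {
                name: format!("placeholder_{idx}"),
                is_placeholder: true,
            },
        })
        .collect()
}
```

With an empty map, `build_fields` returns an empty vector. With a single entry at index 2, it returns three fields, the last of which is the real column.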

Are these changes tested?

A unit test will be added before marking this PR ready. Tested in datafusion-python.

Are there any user-facing changes?

No

@github-actions github-actions bot added the ffi Changes to the ffi crate label Jun 20, 2025
@timsaucer timsaucer marked this pull request as ready for review June 26, 2025 12:13
Contributor

@alamb alamb left a comment

Thank you @timsaucer -- this makes sense to me

DataType::Null,
true,
),
let max_column = required_columns.keys().max();
Contributor

I wonder if it is time to simply pass in the real schema to PartitionEvaluatorArgs 🤔

}

#[tokio::test]
async fn test_lag_udwf() -> datafusion::common::Result<()> {
Contributor

I verified that these tests fail without the code change in this PR:

    thread 'udwf::tests::test_lag_udwf' panicked at /Users/andrewlamb/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-schema-55.1.0/src/schema.rs:382:10:
    index out of bounds: the len is 0 but the index is 0
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    thread 'udwf::tests::test_lag_udwf' panicked at library/core/src/panicking.rs:218:5:
    panic in a function that cannot unwind
    stack backtrace:
       0:        0x108eec990 - std::backtrace_rs::backtrace::libunwind::trace::hd09570c029a6744a
                                   at /rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
       1:        0x108eec990 - std::backtrace_rs::backtrace::trace_unsynchronized::h8d2fa64833f91cb3

@alamb alamb merged commit cce3f3f into apache:main Jun 27, 2025
27 checks passed
@timsaucer timsaucer deleted the bugfix/ffi-column-indices branch June 30, 2025 11:28
alamb pushed a commit to alamb/datafusion that referenced this pull request Jul 2, 2025
* Column indices were not computed correctly, causing a panic

* Add unit tests
alamb added a commit that referenced this pull request Jul 4, 2025
* Column indices were not computed correctly, causing a panic

* Add unit tests

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Labels
ffi Changes to the ffi crate
Development

Successfully merging this pull request may close these issues.

Panic in FFI UDWF when using wrapping lead function
2 participants