chore: upgrade datafusion to 50.0.0 and arrow to 56.1 #4751

ddupg · 2025-09-17T07:20:10Z

This PR is to update the following dependencies:

DataFusion to 50.0.0
Arrow to 56.1

I am working on supporting GEO type #4678 , which is blocked by the DataFusion upgrade. Recently, DataFusion released 50 apache/datafusion#16799 , so follow up on the upgrade

Closes #4753

LuQQiu · 2025-09-17T16:51:46Z

@ddupg Could you help fix the build issues?

ddupg · 2025-09-18T02:09:29Z

@ddupg Could you help fix the build issues?

@LuQQiu Sure, now blocked in datafusion-python is not published to pypi, but it may take a few weeks apache/datafusion#16799 (comment)

westonpace · 2025-09-18T22:15:13Z

Sure, now blocked in datafusion-python is not published to pypi

Hmm...we use the FFI for our boundary between datafusion-python and lance rust which in theory has a stable ABI. Do we know why we have a hard pin on the datafusion-python version? Do the tests pass if we loosen the pin?

codecov-commenter · 2025-09-19T04:19:37Z

Codecov Report

❌ Patch coverage is 48.00000% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.38%. Comparing base (e464c0a) to head (77f4aa7).

Files with missing lines	Patch %	Lines
rust/lance-datafusion/src/planner.rs	13.33%	13 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4751      +/-   ##
==========================================
+ Coverage   80.85%   81.38%   +0.52%     
==========================================
  Files         333      333              
  Lines      131944   130020    -1924     
  Branches   131944   130020    -1924     
==========================================
- Hits       106687   105811     -876     
+ Misses      21508    20653     -855     
+ Partials     3749     3556     -193

Flag	Coverage Δ
unittests	`81.38% <48.00%> (+0.52%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ddupg · 2025-09-19T04:48:33Z

rust/lance/src/io/exec/rowids.rs

        let mut added_byte_size = stats
            .num_rows
-            .map(|n| (n * 8).max(64))
+            .map(|n| n * 8)


Closes: apache/arrow-rs#7383

ddupg · 2025-09-19T10:23:25Z

Do we know why we have a hard pin on the datafusion-python version? Do the tests pass if we loosen the pin?

Related to #4577, maybe Json type has a requirement for datafusion version.

Do the tests pass if we loosen the pin?

The currently failing tests are not related to datafusion version, I'm trying to fix it.

ddupg · 2025-09-20T00:29:18Z

The failing test python/tests/test_scalar_index.py::test_null_handling is seems to be caused by a bug of DataFusion rust 50.0.0, and I have raised an issue apache/datafusion#17681 to the DataFusion

ddupg · 2025-09-22T06:41:20Z

The failing test python/tests/test_scalar_index.py::test_null_handling is seems to be caused by a bug of DataFusion rust 50.0.0, and I have raised an issue apache/datafusion#17681 to the DataFusion

This is not a bug with DataFusion, it seems to be because Lance using indexes and DataFusion have different logic on how to handle NULL. After apache/datafusion#17088, datafusion simplifies the handling of NULL expressions, which affects the results of the test python/tests/test_scalar_index.py::test_null_handling.

In rust/lance-index/src/scalar/expression.rs:create_filter_plan https://github.com/lancedb/lance/blob/5c60975b2c032314304ca1d38865d6eefde4d790/rust/lance-index/src/scalar/expression.rs#L1560, if the filter executed is IN (1, NULL), then the passed Expr is looks like this:

BinaryExpr(
    BinaryExpr {
        left: BinaryExpr(
            BinaryExpr {
                left: Column(
                    Column {
                        relation: None,
                        name: "x",
                    },
                ),
                op: Eq,
                right: Literal(
                    Int32(1),
                    None,
                ),
            },
        ),
        op: Or,
        right: BinaryExpr(
            BinaryExpr {
                left: Column(
                    Column {
                        relation: None,
                        name: "x",
                    },
                ),
                op: Eq,
                right: Cast(
                    Cast {
                        expr: Literal(
                            NULL,
                            None,
                        ),
                        data_type: Int32,
                    },
                ),
            },
        ),
    },
)

but after self.optimize_expr(filter)?, the Eq Expr about NULL is simplified, it looks like this:

BinaryExpr(
    BinaryExpr {
        left: BinaryExpr(
            BinaryExpr {
                left: Column(
                    Column {
                        relation: None,
                        name: "x",
                    },
                ),
                op: Eq,
                right: Literal(
                    Int32(1),
                    None,
                ),
            },
        ),
        op: Or,
        right: Literal(
            Boolean(NULL),
            None,
        ),
    },
)

and then Literal sub Expr cannot apply indexes, resulting in the entire Expr not being able to use indexes, which eventually causes the test to fail.

Can @westonpace please give me some help on how to fix this test?

timsaucer · 2025-09-24T13:18:18Z

This appears to be a duplicate of my PR: #4778 but since you're actively working on the issue I was just looking at with the null unit test failure, I can close my PR in favor of this one.

timsaucer · 2025-09-24T14:10:05Z

A little more from my own investigation comparing DF 49 to 50:

~~I suspect the issue may not be in the change to the DF optimizer. I am not sure why the prior test did work right now.~~

Correction: I am beginning to suspect that because the new plan has a Boolean(NULL) that the index read filters are no longer passing these through. It also appears that the output now is closer to what DataFusion expects to give by accident not intention. By that I mean it the optimizer were enhanced to give Int64(NULL) we would go back to the prior result, which is not expected from a DF perspective.

jackye1995 · 2025-09-25T03:45:36Z

Thank you @timsaucer for sharing this. I also chatted with @ddupg offline a bit, sorry I forgot to share the analysis here.

I think we did not handle the case of a Literal(Boolean(Null)) in visit_node.

In the past because the right hand side is a X = NULL, it is handled by visit_comparison, but now it directly returns None, indicating there is no index match which is where the wrong logic is.

timsaucer · 2025-09-25T11:33:13Z

Thank you @timsaucer for sharing this. I also chatted with @ddupg offline a bit, sorry I forgot to share the analysis here.

I think we did not handle the case of a Literal(Boolean(Null)) in visit_node.

In the past because the right hand side is a X = NULL, it is handled by visit_comparison, but now it directly returns None, indicating there is no index match which is where the wrong logic is.

I do want to point out that "fixing" the issue by handling the literal null boolean will still give different results than an equivalent SQL command. From a SQL perspective not returning the row with the null IS the correct behavior, even though I think it is counter intuitive.

ddupg · 2025-09-25T13:11:28Z

Thank @jackye1995 @timsaucer both for sharing and thinking, which helped me a lot

I do want to point out that "fixing" the issue by handling the literal null boolean will still give different results than an equivalent SQL command. From a SQL perspective not returning the row with the null IS the correct behavior, even though I think it is counter intuitive.

Agreed.

My personal opinion is that root cause stems from the different semantics of handling NULL when indexing is enabled or not. This is weird behavior, so I think we should make semantics consistent first. I proposed a draft #4812 to solve this problem. This will fix the currently failed test.

Secondly, the DataFusion upgrade exposed a situation where the index is not working as expected. DF 50 optimizes Expr of NULL handling, for example, the Expr x = 1 or x = NULL is optimized to x = 1 or NULL, where the sub Expr NULL cannot infer indexes, resulting in the entire query being unable to apply indexes. This may be something we need to optimize in the future.

westonpace · 2025-09-25T17:48:09Z

I agree that we should migrate to the SQL behavior and I agree with @ddupg's summary. Shall I make a PR to fix the second issue?

westonpace · 2025-09-25T17:49:09Z

Also, there is a third issue, reported here, which will unfortunately be quite a bit trickier to fix. I don't think that one needs to block the DF upgrade however.

westonpace · 2025-09-25T18:59:11Z

I ended up creating #4815 which complements #4812 although maybe #4812 is no longer necessary.

jackye1995 · 2025-09-26T05:45:56Z

looks like there are some additional errors, any clue?

ddupg · 2025-09-26T12:14:41Z

looks like there are some additional errors, any clue?

called `Result::unwrap()` on an `Err` value: IO { source: Custom { kind: Other, error: Generic { store: "LocalFileSystem", source: InvalidUrl { url: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/C:%5CUsers%5CRUNNER~1%5CAppData%5CLocal%5CTemp%5C.tmpIGGVXH/some_file.lance", query: None, fragment: None } } } }, location: Location { file: "rust\\lance-file\\src\\v2\\writer.rs", line: 191, column: 9 } }

This unusual local path "/C:%5CUsers%5CRUNNER~1%5CAppData%5CLocal%5CTemp%5C.tmpIGGVXH/some_file.lance" appears to have a parsing issue in Windows.

jackye1995 · 2025-09-28T03:02:01Z

hmm this is quite strange, I don't see any change that could affect the windows path handling... any clue? @westonpace @wjones127 @timsaucer

timsaucer · 2025-09-28T11:42:01Z

hmm this is quite strange, I don't see any change that could affect the windows path handling... any clue? @westonpace @wjones127 @timsaucer

I think I've seen this happen in CI in this repo before, but having trouble finding an exact example from my PRs. In the past rerunning CI has resolved it.

timsaucer · 2025-09-29T12:03:04Z

These new failures I'm also seeing on PRs that are completely unrelated. example: https://github.com/lancedb/lance/actions/runs/18080818538/job/51443808660?pr=4834

westonpace · 2025-09-29T12:12:14Z

These new failures I'm also seeing on PRs that are completely unrelated. example: https://github.com/lancedb/lance/actions/runs/18080818538/job/51443808660?pr=4834

Agreed, the test_zonemap_zone_size test was fixed on main

westonpace · 2025-09-29T12:13:22Z

I'll see if I can make sense of the Windows failures or at least see what changed

ddupg · 2025-09-29T23:08:14Z

I rebased the main branch code and reran the tests multiple times. This error consistently occurs in Windows.

IO { source: Custom { kind: Other, error: Generic { store: "LocalFileSystem", source: InvalidUrl { url: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/C:%5CUsers%5CRUNNER~1%5CAppData%5CLocal%5CTemp%5C.tmpllPc01/some_file.lance", query: None, fragment: None } } } }, location: Location { file: "rust\\lance-file\\src\\v2\\writer.rs", line: 191, column: 9 } }

westonpace · 2025-09-29T23:22:52Z

Tracked down the root cause at least. Still working on fix. The upgrade to DF caused a transitive dependency (url) to upgrade from 2.5.4 to 2.5.7 and this is the root of the failure. My guess is we are hitting this change: servo/rust-url#1060

Anyways, I'm sure there is just some magic incantation of path manipulation that has to happen. My trial and error continues 🫠

timsaucer · 2025-09-30T15:39:58Z

Cargo.lock

+version = "2.5.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "32f8b686cadd1473f4bd0117a5d28d36b1ade384ea9b5069a1c40aefed7fda60"
+checksum = "08bc136a29a3d1758e07a9cca267be308aeebf5cfd5a10f3f67ab2097683ef5b"
 dependencies = [
 "form_urlencoded",
 "idna",
 "percent-encoding",
+ "serde",


Since there are issues filed upstream in both object-store and url crates, can we simply revert these lines of the lock file? Or we can pin url in the cargo.toml to 2.5.4

Datafusion asks for 2.5.7 specifically so we can't revert sadly 😢

westonpace · 2025-09-30T17:18:52Z

I've got a fix (more of a workaround) in progress that avoids these tricky paths by reworking how we do temporary directories: #4860

westonpace · 2025-09-17T12:30:02Z

rust/lance-datafusion/src/planner.rs

                enable_options_value_normalization: false,
                collect_spans: false,
                map_string_types_to_utf8view: false,
+                default_null_ordering: NullOrdering::NullsMax,


Do we know if this was the same as before?

I saw the PR apache/datafusion#16963, nulls_max is the default behavior.

westonpace · 2025-09-30T17:18:32Z

Cargo.lock

+version = "2.5.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "32f8b686cadd1473f4bd0117a5d28d36b1ade384ea9b5069a1c40aefed7fda60"
+checksum = "08bc136a29a3d1758e07a9cca267be308aeebf5cfd5a10f3f67ab2097683ef5b"
 dependencies = [
 "form_urlencoded",
 "idna",
 "percent-encoding",
+ "serde",


Datafusion asks for 2.5.7 specifically so we can't revert sadly 😢

westonpace · 2025-10-02T01:13:30Z

The Windows issue is fixed in #4860 if anyone is up for a review (I'm shamelessly 🎣 for reviewers)

westonpace · 2025-10-02T04:31:32Z

Ok, merged. If you rebase then hopefully you are good to go.

Change-Id: Iedc2a3133d2caf8f9630edde5723618a244cb958

Change-Id: Ic0a4e492eb128316df95d04e3780547100b93de1

Change-Id: Iddf384fafe1a59157b780eea0ffa8e037599d372

Change-Id: I833aa60aee64fd06907c1de10a973d3db7c93ec3

ddupg · 2025-10-03T06:17:04Z

Ok, merged. If you rebase then hopefully you are good to go.

Thank @westonpace very much for the fix, all the tests have passed after I rebased

ddupg · 2025-10-03T06:20:51Z

Could @jackye1995 @westonpace @timsaucer please help review this?

jackye1995

thank you for pushing this through!

westonpace

🎉 🎉 🎉

github-actions bot added python chore labels Sep 17, 2025

ddupg force-pushed the chore/upgrade-df-50 branch 3 times, most recently from bbfcabc to a29bc29 Compare September 17, 2025 11:36

github-actions bot added the java label Sep 17, 2025

LuQQiu mentioned this pull request Sep 17, 2025

Bump datafusion (-> 50.0) and arrow (-> 56.0) #4753

Closed

jackye1995 linked an issue Sep 17, 2025 that may be closed by this pull request

Bump datafusion (-> 50.0) and arrow (-> 56.0) #4753

Closed

ddupg commented Sep 19, 2025

View reviewed changes

timsaucer mentioned this pull request Sep 25, 2025

chore: update DataFusion to 50, Arrow to 56 #4778

Closed

ddupg force-pushed the chore/upgrade-df-50 branch from c6493af to a28bf28 Compare September 26, 2025 00:37

ddupg force-pushed the chore/upgrade-df-50 branch 2 times, most recently from 1597a63 to ccac35f Compare September 29, 2025 02:42

ddupg force-pushed the chore/upgrade-df-50 branch 3 times, most recently from 9ed1a11 to 0cd5af9 Compare September 29, 2025 22:14

timsaucer reviewed Sep 30, 2025

View reviewed changes

westonpace reviewed Sep 30, 2025

View reviewed changes

ddupg added 4 commits October 3, 2025 13:09

chore: upgrade datafusion to 50.0.0 and arrow to 56.1

0a48b30

Change-Id: Iedc2a3133d2caf8f9630edde5723618a244cb958

fix io::exec::rowids::test::test_stats

dd17e28

Change-Id: Ic0a4e492eb128316df95d04e3780547100b93de1

unpin datafusion in pyproject

0217b5a

Change-Id: Iddf384fafe1a59157b780eea0ffa8e037599d372

update cargo.lock

77f4aa7

Change-Id: I833aa60aee64fd06907c1de10a973d3db7c93ec3

ddupg force-pushed the chore/upgrade-df-50 branch from 0cd5af9 to 77f4aa7 Compare October 3, 2025 05:11

jackye1995 approved these changes Oct 3, 2025

View reviewed changes

westonpace approved these changes Oct 3, 2025

View reviewed changes

westonpace merged commit 8a0f525 into lance-format:main Oct 3, 2025
29 checks passed

ddupg deleted the chore/upgrade-df-50 branch November 7, 2025 02:45

chore: upgrade datafusion to 50.0.0 and arrow to 56.1 #4751

chore: upgrade datafusion to 50.0.0 and arrow to 56.1 #4751

Uh oh!

Conversation

ddupg commented Sep 17, 2025 • edited by jackye1995 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LuQQiu commented Sep 17, 2025

Uh oh!

ddupg commented Sep 18, 2025

Uh oh!

westonpace commented Sep 18, 2025

Uh oh!

codecov-commenter commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ddupg Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

ddupg commented Sep 19, 2025

Uh oh!

ddupg commented Sep 20, 2025

Uh oh!

ddupg commented Sep 22, 2025

Uh oh!

timsaucer commented Sep 24, 2025

Uh oh!

timsaucer commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jackye1995 commented Sep 25, 2025

Uh oh!

timsaucer commented Sep 25, 2025

Uh oh!

ddupg commented Sep 25, 2025

Uh oh!

westonpace commented Sep 25, 2025

Uh oh!

westonpace commented Sep 25, 2025

Uh oh!

westonpace commented Sep 25, 2025

Uh oh!

jackye1995 commented Sep 26, 2025

Uh oh!

ddupg commented Sep 26, 2025

Uh oh!

jackye1995 commented Sep 28, 2025

Uh oh!

timsaucer commented Sep 28, 2025

Uh oh!

timsaucer commented Sep 29, 2025

Uh oh!

westonpace commented Sep 29, 2025

Uh oh!

westonpace commented Sep 29, 2025

Uh oh!

ddupg commented Sep 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

westonpace commented Sep 29, 2025

Uh oh!

timsaucer Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

westonpace Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

westonpace commented Sep 30, 2025

Uh oh!

westonpace Sep 17, 2025

Choose a reason for hiding this comment

Uh oh!

ddupg Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

westonpace Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

westonpace commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

ddupg commented Sep 17, 2025 •

edited by jackye1995

Loading

codecov-commenter commented Sep 19, 2025 •

edited

Loading

timsaucer commented Sep 24, 2025 •

edited

Loading

ddupg commented Sep 29, 2025 •

edited

Loading

westonpace commented Oct 2, 2025 •

edited

Loading