@mbutrovich
Contributor

@mbutrovich mbutrovich commented Sep 2, 2025

Which issue does this PR close?

Closes #2343

Rationale for this change

  • DataFusion 50 is needed to unblock Parquet modular encryption support, along with other fixes.

What changes are included in this PR?

  • Bump DataFusion to 50.0.0
  • Bump Arrow to 56.1.0
  • cargo update

How are these changes tested?

Existing tests.

@codecov-commenter

codecov-commenter commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.51%. Comparing base (f09f8af) to head (bcb1208).
⚠️ Report is 511 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2286      +/-   ##
============================================
+ Coverage     56.12%   57.51%   +1.39%     
- Complexity      976     1295     +319     
============================================
  Files           119      147      +28     
  Lines         11743    13469    +1726     
  Branches       2251     2352     +101     
============================================
+ Hits           6591     7747    +1156     
- Misses         4012     4457     +445     
- Partials       1140     1265     +125     

@mbutrovich mbutrovich changed the title from "chore: test DataFusion 50" to "chore: prepare for DataFusion 50" Sep 3, 2025
@wForget wForget mentioned this pull request Sep 8, 2025
@mbutrovich mbutrovich self-assigned this Sep 11, 2025
@mbutrovich mbutrovich added this to the 0.11.0 milestone Sep 11, 2025
@andygrove
Member

@mbutrovich DataFusion 50 has now been released

@mbutrovich
Contributor Author

> @mbutrovich DataFusion 50 has now been released

Yep I'm testing this morning and will update the PR.

@mbutrovich mbutrovich marked this pull request as ready for review September 16, 2025 14:27
@mbutrovich mbutrovich marked this pull request as draft September 16, 2025 14:28
@mbutrovich mbutrovich changed the title from "chore: prepare for DataFusion 50" to "chore: upgrade to DataFusion 50" Sep 16, 2025
@mbutrovich mbutrovich changed the title from "chore: upgrade to DataFusion 50" to "chore: upgrade to DataFusion 50.0.0" Sep 16, 2025
@mbutrovich
Contributor Author

Note that there's discussion of DF 50.0.1, but I want to unblock a few features and fixes that are waiting on 50, so let's get 50.0.0 in first.

apache/datafusion#17594

comet_hour,
args,
field_ref,
Arc::new(ConfigOptions::default()),
Contributor Author

Instead of possibly instantiating multiple default ConfigOptions, in the future we could stash one somewhere. This would have the benefits of:

  1. A custom config would propagate throughout
  2. Reduced memory overhead
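
A minimal sketch of the "stash one somewhere" idea, assuming a context struct owns the configuration behind an `Arc`. The `ConfigOptions` and `ExecutionContext` types below are hypothetical stand-ins written for illustration, not the DataFusion or Comet types, and the `batch_size` field is only an example setting.

```rust
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `ConfigOptions` (assumption: one example field).
#[derive(Debug, Default)]
struct ConfigOptions {
    batch_size: usize,
}

// Hypothetical context that owns the single shared configuration.
struct ExecutionContext {
    config: Arc<ConfigOptions>,
}

impl ExecutionContext {
    // A custom config passed in here is the one every caller will see.
    fn new(config: ConfigOptions) -> Self {
        Self { config: Arc::new(config) }
    }

    // Hands out another handle to the same allocation instead of building a new default.
    fn config(&self) -> Arc<ConfigOptions> {
        Arc::clone(&self.config)
    }
}

fn main() {
    let ctx = ExecutionContext::new(ConfigOptions { batch_size: 8192 });
    let for_first_call = ctx.config();
    let for_second_call = ctx.config();

    // Both handles share one ConfigOptions, and both observe the custom value.
    assert!(Arc::ptr_eq(&for_first_call, &for_second_call));
    assert_eq!(for_second_call.batch_size, 8192);
    println!("shared batch_size = {}", for_first_call.batch_size);
}
```

Cloning the `Arc` only bumps a reference count, so each call site that currently builds `Arc::new(ConfigOptions::default())` could take a handle like this instead.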

Contributor

> Instead of possibly instantiating multiple default ConfigOptions, in the future we stash one somewhere. This would have the benefits of:
>
>   1. A custom config would propagate throughout
>   2. Reduced memory overhead

That would be in the ExecutionContext perhaps?

fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
    self.name.hash(state);
    self.signature.hash(state);
    (self.stats_type as u8).hash(state);
Contributor Author

@mbutrovich mbutrovich Sep 16, 2025

StatsType does not implement Hash, so here we do a cast. std::mem::discriminant is the solution if the enum gets any more complex.
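
The two options mentioned here can be compared side by side. This is a sketch only: the `StatsType` enum below is a local stand-in defined for the example, and the helper names `hash_by_cast`, `hash_by_discriminant`, and `finish_with` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

// Local stand-in for `StatsType`: a fieldless enum that does not derive `Hash`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum StatsType {
    Sample,
    Population,
}

// Cast approach: valid only while every variant is fieldless.
fn hash_by_cast<H: Hasher>(value: StatsType, state: &mut H) {
    (value as u8).hash(state);
}

// Discriminant approach: keeps compiling if variants later carry data.
fn hash_by_discriminant<H: Hasher>(value: &StatsType, state: &mut H) {
    mem::discriminant(value).hash(state);
}

// Runs a hashing closure against a fresh hasher and returns the digest.
fn finish_with(f: impl Fn(&mut DefaultHasher)) -> u64 {
    let mut hasher = DefaultHasher::new();
    f(&mut hasher);
    hasher.finish()
}

fn main() {
    // Hashing the same variant twice yields the same digest with either approach.
    let cast_a = finish_with(|h| hash_by_cast(StatsType::Sample, h));
    let cast_b = finish_with(|h| hash_by_cast(StatsType::Sample, h));
    assert_eq!(cast_a, cast_b);

    let disc_a = finish_with(|h| hash_by_discriminant(&StatsType::Population, h));
    let disc_b = finish_with(|h| hash_by_discriminant(&StatsType::Population, h));
    assert_eq!(disc_a, disc_b);
}
```

The cast stops compiling as soon as a variant gains a payload, which is the case where `std::mem::discriminant` becomes the safer choice.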


impl DynEq for RLike {
    fn dyn_eq(&self, other: &dyn Any) -> bool {
        if let Some(other) = other.downcast_ref::<Self>() {
Contributor Author

The old DynEq implementation did not look at the child. Is that expected?

}

impl Hash for RLike {
    fn hash<H: Hasher>(&self, state: &mut H) {
Contributor Author

The old hash implementation did not hash the child. Is that expected?


impl Hash for RLike {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.child.hash(state);
Contributor Author

The new implementation hashes the child.
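
To illustrate why the child matters for both equality and hashing, here is a simplified sketch. `RLikeSketch` is a hypothetical stand-in: its child is a plain `String` and its `pattern` field is assumed for the example, whereas the real expression holds a child physical expression.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified stand-in for `RLike`.
#[derive(Debug)]
struct RLikeSketch {
    child: String,
    pattern: String,
}

impl PartialEq for RLikeSketch {
    fn eq(&self, other: &Self) -> bool {
        // Compare the same fields that `hash` feeds to the hasher.
        self.child == other.child && self.pattern == other.pattern
    }
}

impl Eq for RLikeSketch {}

impl Hash for RLikeSketch {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.child.hash(state);
        self.pattern.hash(state);
    }
}

// Computes a digest with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let on_a = RLikeSketch { child: "col_a".into(), pattern: "^x".into() };
    let on_a_again = RLikeSketch { child: "col_a".into(), pattern: "^x".into() };
    let on_b = RLikeSketch { child: "col_b".into(), pattern: "^x".into() };

    // Equal values must produce equal hashes.
    assert_eq!(on_a, on_a_again);
    assert_eq!(hash_of(&on_a), hash_of(&on_a_again));

    // Same pattern on a different child is a different expression.
    assert_ne!(on_a, on_b);
}
```

If equality ignored the child, `on_a` and `on_b` would compare equal even though they apply the pattern to different inputs.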

@mbutrovich mbutrovich marked this pull request as ready for review September 16, 2025 17:21
@mbutrovich mbutrovich changed the title from "chore: upgrade to DataFusion 50.0.0" to "chore: upgrade to DataFusion 50.0.0 and Arrow 56.1.0" Sep 16, 2025
@mbutrovich mbutrovich changed the title from "chore: upgrade to DataFusion 50.0.0 and Arrow 56.1.0" to "chore: upgrade to DataFusion 50.0.0 and Arrow 56.1.0, among other dependencies" Sep 16, 2025
Contributor

@comphead comphead left a comment

Thanks @mbutrovich

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
@comphead
Contributor

For scalar functions' PartialEq implementations, DataFusion relies on a built-in mechanism:

/// Implements [`ScalarUDFImpl`] for functions that have a single signature and
/// return type.
#[derive(PartialEq, Eq, Hash)]
pub struct SimpleScalarUDF {
    name: String,
    signature: Signature,
    return_type: DataType,
    fun: PtrEq<ScalarFunctionImplementation>,
}
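
As a rough illustration of how a pointer-equality wrapper lets a struct holding a function pointer derive `PartialEq`, `Eq`, and `Hash`, consider the sketch below. `PtrEqSketch`, `ScalarFn`, and `SimpleUdfSketch` are hypothetical names invented for this example; the actual `PtrEq` in DataFusion may be implemented differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Hypothetical function type standing in for a scalar function implementation.
type ScalarFn = Arc<dyn Fn(i64) -> i64 + Send + Sync>;

// Hypothetical wrapper: two values are equal only when they share one allocation.
struct PtrEqSketch(ScalarFn);

impl PtrEqSketch {
    // Address of the shared allocation with the vtable metadata dropped.
    fn addr(&self) -> *const () {
        Arc::as_ptr(&self.0) as *const ()
    }
}

impl PartialEq for PtrEqSketch {
    fn eq(&self, other: &Self) -> bool {
        self.addr() == other.addr()
    }
}

impl Eq for PtrEqSketch {}

impl Hash for PtrEqSketch {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.addr().hash(state);
    }
}

// With the wrapper in place, the outer struct can simply derive the traits.
#[derive(PartialEq, Eq, Hash)]
struct SimpleUdfSketch {
    name: String,
    fun: PtrEqSketch,
}

// Computes a digest with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let f: ScalarFn = Arc::new(|x: i64| x + 1);
    let a = SimpleUdfSketch { name: "inc".into(), fun: PtrEqSketch(Arc::clone(&f)) };
    let b = SimpleUdfSketch { name: "inc".into(), fun: PtrEqSketch(Arc::clone(&f)) };

    // Same name and same underlying function allocation: equal, with equal hashes.
    assert!(a == b);
    assert_eq!(hash_of(&a), hash_of(&b));
    println!("{} -> {}", a.name, (a.fun.0)(41));
}
```

Two separately allocated closures with identical bodies would compare unequal under this scheme, which is the trade-off of comparing by pointer rather than by behavior.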

@mbutrovich mbutrovich changed the title from "chore: upgrade to DataFusion 50.0.0 and Arrow 56.1.0, among other dependencies" to "chore: upgrade to DataFusion 50.0.0 and Arrow 56, among other dependencies" Sep 16, 2025
@mbutrovich mbutrovich changed the title from "chore: upgrade to DataFusion 50.0.0 and Arrow 56, among other dependencies" to "chore: upgrade to DataFusion 50.0.0 and Arrow 56.0.0, among other dependencies" Sep 16, 2025
@mbutrovich mbutrovich changed the title from "chore: upgrade to DataFusion 50.0.0 and Arrow 56.0.0, among other dependencies" to "chore: upgrade to DataFusion 50.0.0, Arrow 56.1.0, Parquet 56.0.0 among others" Sep 16, 2025
@mbutrovich mbutrovich merged commit 24f5209 into apache:main Sep 17, 2025
94 checks passed
@mbutrovich mbutrovich deleted the df50 branch September 22, 2025 15:04
Development

Successfully merging this pull request may close these issues.

Upgrade to DataFusion 50.0.0

5 participants