Blog: Add blog post about DataFusion 50.0.0 release by nuno-faria · Pull Request #115 · apache/datafusion-site

nuno-faria · 2025-09-23T08:07:10Z

Closes datafusion#16931 (Blog post for the DataFusion 50.0.0 release)

content/blog/2025-09-29-datafusion-50.0.0.md

content/blog/2025-09-24-datafusion-50.0.0.md


alamb · 2025-09-23T17:29:32Z

Amazing! Thank you @nuno-faria -- I will review this PR today or tomorrow

Co-authored-by: Yongting You <2010youy01@gmail.com>


Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>

alamb

Thank you @nuno-faria -- I think this post looks great. Thank you so much for writing it

I am working on getting some performance numbers and will update the post when completed.

I pushed a commit to change the date to next Monday as well as add new committers and the post that @timsaucer published yesterday https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata/

I also have a few other suggestions I am working on that I will post shortly


alamb

Here are a few more suggestions.

The only other thing I noticed is that the performance section is somewhat duplicated by the new features section (the dynamic filters are mentioned twice, for example)

I am going to take a pass at trying to make that a bit better, but I don't think I can pull it off as github suggestions. I'll make a suggestion PR instead


alamb · 2025-09-25T13:00:40Z

@nuno-faria would it be ok if I pushed some edits directly to this branch to avoid back/forth and PRs? Then you could review the changes commit by commit

alamb

Thanks again @nuno-faria and @adriangb and @2010YOUY01

I made some non-trivial suggestions here:

nuno-faria#1

Let me know what you think


Suggestions for DataFusion 50 blog post

nuno-faria · 2025-09-25T14:40:41Z

> @nuno-faria would it be ok if I pushed some edits directly to this branch to avoid back/forth and PRs? Then you could review the changes commit by commit

Please, feel free to do so. Later today I will also take a look at the suggestions.

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

nuno-faria · 2025-09-25T20:03:20Z

Suggestions applied.

vegarsti · 2025-09-26T16:11:39Z

Great post! Somehow I get an error trying to comment in review mode, but for the section on the cache:

Maybe this is too much hedging, but: this sounds like we get orders of magnitude gains always. But that probably depends on the query!

alamb · 2025-09-26T16:26:45Z

> Maybe this is too much hedging, but: this sounds like we get orders of magnitude gains always. But that probably depends on the query!

yeah, I think specifically it really helps:

1. When the files are entirely remote (on object_store)
2. When the queries are relatively low latency (10s of ms), as parsing the footer can be a substantial amount of the overall query processing time

I'll try and add that detail to the post

alamb · 2025-09-26T16:27:50Z

Thanks @nuno-faria I think we need to include a Known issues section and point users to the upcoming hotfix release and what's in it.

Just point to apache/datafusion#17594

Maybe I can also make a more general "patch set" section that talks about how we have been stabilizing the releases recently by releasing patches as the community upgrades and finds issues 🤔

alamb · 2025-09-26T16:39:12Z

Also, is it ok if I put contributors' names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community and also serves as additional motivation for future contributors

Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

nuno-faria · 2025-09-26T18:49:00Z

> Thanks @nuno-faria I think we need to include a Known issues section and point users to the upcoming hotfix release and what's in it.
>
> Just point to apache/datafusion#17594

Agreed, added.

nuno-faria · 2025-09-26T18:50:03Z

> Also, is it ok if I put contributors' names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community and also serves as additional motivation for future contributors

Sounds good to me.

Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>

nuno-faria · 2025-09-26T18:57:29Z

> > Maybe this is too much hedging, but: this sounds like we get orders of magnitude gains always. But that probably depends on the query!
>
> yeah, I think specifically it really helps:
>
> 1. When the files are entirely remote (on object_store)
> 2. When the queries are relatively low latency (10s of ms), as parsing the footer can be a substantial amount of the overall query processing time
>
> I'll try and add that detail to the post

I added a small clarification when mentioning the speedup.

alamb · 2025-09-29T11:13:26Z

I am going to take one final pass to incorporate the feedback here and get this post published!

alamb · 2025-09-29T13:38:45Z

content/blog/2025-09-29-datafusion-50.0.0.md

+[ticket](https://github.com/apache/datafusion/pull/16971)). This optimization
+is production ready and enabled by default (more details in the
+[Epic](https://github.com/apache/datafusion/issues/17000)).
+Thanks to [Nuno Faria], [Jonathan Chen], [Shehab Amin], [Oleks V], [Tim Saucer], and [Blake Orth] for delivering this feature.


fyi @nuno-faria @jonathanc-n, @shehabgamin, @comphead, @timsaucer and @BlakeOrth as you are mentioned here

alamb · 2025-09-29T13:39:18Z

content/blog/2025-09-29-datafusion-50.0.0.md

+More information can be found in the respective
+[ticket](https://github.com/apache/datafusion/pull/16445) and the next step will be to
+[extend the dynamic filters to other types of joins](https://github.com/apache/datafusion/issues/16973), such as `LEFT` and
+`RIGHT` outer joins. Thanks to [Adrian Garcia Badaracco], [Qi Zhu], [xudong963], [Daniël Heres], and [Lía Adriana]


fyi @adriangb @zhuqi-lucas , @xudong963 @Dandandan and @LiaCastaneda as you are mentioned here

alamb · 2025-09-29T13:39:47Z

content/blog/2025-09-29-datafusion-50.0.0.md

+of multi-level merge sorts (more details in the respective
+[ticket](https://github.com/apache/datafusion/pull/15700)). It is now
+possible to execute almost any sorting query that would have previously triggered *out-of-memory*
+errors, by relying on disk spilling. Thanks to [Raz Luvaton], [Yongting You], and


fyi @rluvaton, @2010YOUY01 and @ding-young as you are mentioned here

alamb · 2025-09-29T13:40:12Z

content/blog/2025-09-29-datafusion-50.0.0.md

+
+Although it is not part of the SQL standard (yet), it has been gaining
+adoption in several SQL analytical systems such as DuckDB, Snowflake, and
+BigQuery. Thanks to [Huaijin] and [Jonah Gao] for delivering this feature.


FYI @haohuaijin and @jonahgao as you are mentioned here

alamb · 2025-09-29T13:40:30Z

content/blog/2025-09-29-datafusion-50.0.0.md

+FROM table
+```
+
+Thanks to [Geoffrey Claude] and [Jeffrey Vo] for delivering this feature.


FYI @geoffreyclaude and @Jefffrey as you are mentioned here

alamb · 2025-09-29T13:40:42Z

content/blog/2025-09-29-datafusion-50.0.0.md

+behavior that varies based on runtime state; for example, time UDFs can use the
+session-specified time zone instead of just UTC.
+
+Thanks to [Bruce Ritchie], [Piotr Findeisen], [Oleks V], and [Andrew Lamb] for delivering this feature.


FYI @comphead @findepi @comphead as you are mentioned here

alamb

Ok, I think this blog post is looking good so let's publish it. We can make a follow on PR with any edits that are needed

Thanks again everyone!

alamb · 2025-09-29T13:42:36Z

And the blog is live: https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0/

Blog: Add blog post about DataFusion 50.0.0 release

516b520

nuno-faria commented Sep 23, 2025


2010YOUY01 reviewed Sep 23, 2025

content/blog/2025-09-24-datafusion-50.0.0.md (outdated, resolved)

content/blog/2025-09-24-datafusion-50.0.0.md (outdated, resolved)

nuno-faria and others added 2 commits September 23, 2025 19:34

Update content/blog/2025-09-24-datafusion-50.0.0.md

bb7393c

Co-authored-by: Yongting You <2010youy01@gmail.com>

Add ref to future work of dynamic filter pushdown

fc2eb2a

adriangb reviewed Sep 23, 2025

content/blog/2025-09-29-datafusion-50.0.0.md (outdated, resolved)

adriangb reviewed Sep 23, 2025

content/blog/2025-09-24-datafusion-50.0.0.md (outdated, resolved)

adriangb reviewed Sep 23, 2025

content/blog/2025-09-29-datafusion-50.0.0.md (outdated, resolved)

nuno-faria and others added 4 commits September 24, 2025 08:16

Update content/blog/2025-09-24-datafusion-50.0.0.md

0ce5b67

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>

Add clarification about dynamic filters

5a2e15b

Adjust date

ff3bed0

Add new committers and additional blog

0c96542

alamb approved these changes Sep 25, 2025


alamb reviewed Sep 25, 2025

content/blog/2025-09-29-datafusion-50.0.0.md (outdated, resolved)

content/blog/2025-09-29-datafusion-50.0.0.md (outdated, resolved)

content/blog/2025-09-29-datafusion-50.0.0.md (outdated, resolved)

alamb added 4 commits September 25, 2025 09:46

Move dynamic predicate content into section

abf5ce4

Improve spilling sorts section

39633d0

Update filter pushdown section

06bda1d

Edit parquet metadata cache section

c8013e2

alamb mentioned this pull request Sep 25, 2025

Suggestions for DataFusion 50 blog post nuno-faria/datafusion-site#1

Merged

alamb reviewed Sep 25, 2025

content/blog/2025-09-29-datafusion-50.0.0.md (outdated, resolved)

Merge pull request #1 from alamb/alamb/df50_suggestions

6dcb94f

Suggestions for DataFusion 50 blog post

alamb and others added 5 commits September 25, 2025 15:17

Update performance numbers

08fb67d

Update content/blog/2025-09-29-datafusion-50.0.0.md

a0f8cc3

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Update content/blog/2025-09-29-datafusion-50.0.0.md

fe95e61

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Update content/blog/2025-09-29-datafusion-50.0.0.md

e53bb50

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Apply suggestions, Minor fixes

312e260

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

nuno-faria and others added 2 commits September 26, 2025 19:38

Apply suggestions

3714364

Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>

Add 'Known Issues' section

7f8369e

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

Clarify cache improvements

9cb7f8c

Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>

alamb added 8 commits September 29, 2025 07:17

reword known issues section

aa4b697

Tighten up intro and figure caption

c38da61

Add thanks for contributors

9665a38

Add thanks for contributors for metadata cache

39fd971

Thanks for filter, qualify, and configs

9f624bd

more thanks

bc658a0

fixups

62fed22

final touchups

fe5b498

alamb reviewed Sep 29, 2025


alamb merged commit 286c09f into apache:main Sep 29, 2025
1 check passed

nuno-faria deleted the datafusion_50 branch September 29, 2025 18:20

alamb mentioned this pull request Nov 8, 2025

Blog post for the DataFusion 51.0.0 release apache/datafusion#18548

Closed
