Blog: Add blog post about DataFusion 50.0.0 release #115
Amazing! Thank you @nuno-faria -- I will review this PR today or tomorrow
Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
alamb left a comment
Thank you @nuno-faria -- I think this post looks great. Thank you so much for writing it
I am working on getting some performance numbers and will update the post when completed.
I pushed a commit to change the date to next Monday as well as add new committers and the post that @timsaucer published yesterday https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata/
I also have a few other suggestions I am working on that I will post shortly
alamb left a comment
Here are a few more suggestions.
The only other thing I noticed is that the performance section is somewhat duplicated by the new features section (the dynamic filters are mentioned twice, for example).
I am going to take a pass at trying to make that a bit better, but I don't think I can pull it off as GitHub suggestions. I'll make a suggestion PR instead.
@nuno-faria would it be ok if I pushed some edits directly to this branch to avoid back/forth and PRs? Then you could review the changes commit by commit
alamb left a comment
Thanks again @nuno-faria and @adriangb and @2010YOUY01
I made some non-trivial suggestions here:
Let me know what you think
Suggestions for DataFusion 50 blog post
Please, feel free to do so. Later today I will also take a look at the suggestions.
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Suggestions applied.
Great post! Somehow I get an error trying to comment in review mode, but for the section on the cache: maybe this is too much hedging, but this sounds like we always get orders of magnitude gains. That probably depends on the query!
yeah, I think specifically it really helps:
I'll try and add that detail to the post
Maybe I can also make a more general "patch set" section that talks about how we have been stabilizing the releases recently by releasing patches as the community upgrades and finds issues 🤔
Also, is it ok if I put contributors' names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community and also serves as additional motivation for future contributors
Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Agreed, added.
Sounds good to me.
Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
I added a small clarification when mentioning the speedup.
I am going to take one final pass to incorporate the feedback here and get this post published!
| [ticket](https://github.com/apache/datafusion/pull/16971)). This optimization
| is production ready and enabled by default (more details in the
| [Epic](https://github.com/apache/datafusion/issues/17000)).
| Thanks to [Nuno Faria], [Jonathan Chen], [Shehab Amin], [Oleks V], [Tim Saucer], and [Blake Orth] for delivering this feature.
fyi @nuno-faria, @jonathanc-n, @shehabgamin, @comphead, @timsaucer and @BlakeOrth as you are mentioned here
| More information can be found in the respective
| [ticket](https://github.com/apache/datafusion/pull/16445) and the next step will be to
| [extend the dynamic filters to other types of joins](https://github.com/apache/datafusion/issues/16973), such as `LEFT` and
| `RIGHT` outer joins. Thanks to [Adrian Garcia Badaracco], [Qi Zhu], [xudong963], [Daniël Heres], and [Lía Adriana]
fyi @adriangb, @zhuqi-lucas, @xudong963, @Dandandan and @LiaCastaneda as you are mentioned here
| of multi-level merge sorts (more details in the respective
| [ticket](https://github.com/apache/datafusion/pull/15700)). It is now
| possible to execute almost any sorting query that would have previously triggered *out-of-memory*
| errors, by relying on disk spilling. Thanks to [Raz Luvaton], [Yongting You], and
fyi @rluvaton, @2010YOUY01 and @ding-young as you are mentioned here
| Although it is not part of the SQL standard (yet), it has been gaining
| adoption in several SQL analytical systems such as DuckDB, Snowflake, and
| BigQuery. Thanks to [Huaijin] and [Jonah Gao] for delivering this feature.
FYI @haohuaijin and @jonahgao as you are mentioned here
| FROM table
| ```
| Thanks to [Geoffrey Claude] and [Jeffrey Vo] for delivering this feature.
FYI @geoffreyclaude and @Jefffrey as you are mentioned here
| behavior that varies based on runtime state; for example, time UDFs can use the
| session-specified time zone instead of just UTC.
| Thanks to [Bruce Ritchie], [Piotr Findeisen], [Oleks V], and [Andrew Lamb] for delivering this feature.
alamb left a comment
Ok, I think this blog post is looking good, so let's publish it. We can make a follow-on PR with any edits that are needed.
Thanks again everyone!
And the blog is live: https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0/
50.0.0 release datafusion#16931