Blog: Add blog post about DataFusion 50.0.0 release #115
Amazing! Thank you @nuno-faria -- I will review this PR today or tomorrow.
Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
alamb left a comment
Thank you @nuno-faria -- I think this post looks great. Thank you so much for writing it.
I am working on getting some performance numbers and will update the post when they are ready.
I pushed a commit that changes the date to next Monday and adds the new committers and the post that @timsaucer published yesterday: https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata/
I also have a few other suggestions that I will post shortly.
alamb left a comment
Here are a few more suggestions.
The only other thing I noticed is that the performance section is somewhat duplicated by the new features section (the dynamic filters are mentioned twice, for example).
I am going to try to improve that, but I don't think I can pull it off as GitHub suggestions, so I'll make a suggestion PR instead.
@nuno-faria would it be ok if I pushed some edits directly to this branch to avoid back/forth and PRs? Then you could review the changes commit by commit.
alamb left a comment
Thanks again @nuno-faria, @adriangb, and @2010YOUY01.
I made some non-trivial suggestions here:
Let me know what you think.
Suggestions for DataFusion 50 blog post
Please feel free to do so. I will also take a look at the suggestions later today.
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Suggestions applied.
Great post! Somehow I get an error when trying to comment in review mode, so here is a note on the cache section: maybe this is too much hedging, but it sounds as if we always get orders-of-magnitude gains, and that probably depends on the query!
yeah, I think specifically it really helps:
I'll try and add that detail to the post.
Maybe I can also make a more general "patch set" section that talks about how we have been stabilizing the releases recently by releasing patches as the community upgrades and finds issues 🤔
Also, is it ok if I put contributors' names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community and also serves as additional motivation for future contributors.
Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Agreed, added.
Sounds good to me.
Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
I added a small clarification when mentioning the speedup.
I am going to take one final pass to incorporate the feedback here and get this post published!
[ticket](https://github.com/apache/datafusion/pull/16971)). This optimization
is production ready and enabled by default (more details in the
[Epic](https://github.com/apache/datafusion/issues/17000)).
Thanks to [Nuno Faria], [Jonathan Chen], [Shehab Amin], [Oleks V], [Tim Saucer], and [Blake Orth] for delivering this feature.
FYI @nuno-faria, @jonathanc-n, @shehabgamin, @comphead, @timsaucer, and @BlakeOrth as you are mentioned here
More information can be found in the respective
[ticket](https://github.com/apache/datafusion/pull/16445) and the next step will be to
[extend the dynamic filters to other types of joins](https://github.com/apache/datafusion/issues/16973), such as `LEFT` and
`RIGHT` outer joins. Thanks to [Adrian Garcia Badaracco], [Qi Zhu], [xudong963], [Daniël Heres], and [Lía Adriana]
FYI @adriangb, @zhuqi-lucas, @xudong963, @Dandandan, and @LiaCastaneda as you are mentioned here
of multi-level merge sorts (more details in the respective
[ticket](https://github.com/apache/datafusion/pull/15700)). It is now
possible to execute almost any sorting query that would have previously triggered *out-of-memory*
errors, by relying on disk spilling. Thanks to [Raz Luvaton], [Yongting You], and
fyi @rluvaton, @2010YOUY01 and @ding-young as you are mentioned here
Although it is not part of the SQL standard (yet), it has been gaining
adoption in several SQL analytical systems such as DuckDB, Snowflake, and
BigQuery. Thanks to [Huaijin] and [Jonah Gao] for delivering this feature.
FYI @haohuaijin and @jonahgao as you are mentioned here
```
FROM table
```
Thanks to [Geoffrey Claude] and [Jeffrey Vo] for delivering this feature.
FYI @geoffreyclaude and @Jefffrey as you are mentioned here
behavior that varies based on runtime state; for example, time UDFs can use the
session-specified time zone instead of just UTC.
Thanks to [Bruce Ritchie], [Piotr Findeisen], [Oleks V], and [Andrew Lamb] for delivering this feature.
alamb left a comment
OK, I think this blog post is looking good, so let's publish it. We can make a follow-on PR with any edits that are needed.
Thanks again everyone!
And the blog is live: https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0/
50.0.0 release datafusion#16931