Introduction
- Follow on to Nov 20, 2024: This week in DataFusion #13503
This ticket is a weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community.
Loosely inspired by https://this-week-in-rust.org/
Highlights
- Apache DataFusion is now the fastest single node engine for querying Apache Parquet files
- DataFusion is featured as one of the coolest 10 open source software tools by CRN
- DataFusion Blog: Comparing approaches to User Defined Functions in Apache DataFusion using Python
- DataFusion Blog: Apache DataFusion Comet 0.4.0 Release
- Building Databases over a Weekend -- @ameyc ❤️ (thanks to @devanbenz for this link)
@findepi became a committer 🎉
Discussions
- [DISCUSSION] More SqlLogicTest test coverage for queries, including join queries #13470
- [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) #13525: discussion on making it easier to build on DataFusion
Major Projects / Discussions under way
DataFusion Related Reading List
(looking for help updating this list):
- New Concepts, Readings, Events page: https://datafusion.apache.org/user-guide/concepts-readings-events.html
Upcoming Releases
- DataFusion python 43.0.0
- sqlparser: Release sqlparser-rs version 0.53.0 / sqlparser_derive 0.3.0 datafusion-sqlparser-rs#1517
- DataFusion (minor): Release Minor DataFusion 43.1.0 release #13499
Highlights from last week(s):
(I am sorry if I missed you -- please add a note to this ticket with anything you would like to highlight)
Performance
- @jayzhan211, @Rachelint, @jonathanc-n and others started looking at [EPIC] Improved performance in H2O.ai benchmarks #13548 (the h2o benchmark)
- @jonathanc-n is cranking away expanding support for different types
🐛 🔨
- @findepi completed fixes to LIKE: Update tests and resolve TODOs after arrow update #13538
- Fix panic when hashing empty FixedSizeList Array #13533
- @zhuliquan is helping with Windows support: test: allow external_access_plan run on windows #13531
- @timsaucer fixed some issues with lists / schemas: Preserve field name when casting List #13468
Features
- @comphead made macros to create function documentation: Doc gen: Attributes to support related_udf, alternative_syntax #13575
- @joseph-isaacs added a good way to implement ScalarUDF: Add ScalarUDFImpl::invoke_with_args to support passing the return type created for the udf instance #13290
- @rluvaton added feat(function): add greatest function #12474
- @2010YOUY01 made a streaming generate_series: Add generate_series() udtf (and introduce 'lazy' MemoryExec) #13540
There were many small improvements / bug fixes as well.
Others
- I started documenting how to use multiple threadpools, see Improve documentation (and ASCII art) about streaming execution, and thread pools #13423
Looking to get more involved? Try code review!
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. The TL;DR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.
Help wanted
- I would love to see some additional help testing and triaging bugs, helping to make DataFusion a more stable foundation for building systems
Please feel free to leave your own comments on this ticket if you are looking for help
Community
- Weekly Call
- Slack/Discord: info links
Upcoming meetups:
- 2024 Dec 18 Chicago: https://lu.ma/eq5myc5i @adriangb @timsaucer
- DISCUSSION: January 2025 DataFusion Meetup in Amsterdam / CIDR 2025 #12988
- 2025 Jan 15 Boston
Background:
Previous update: