This is my weekly plan, mostly for my own organizational needs (as I am dropping too many things). I am making it public in the hope that it helps others see what I am working on. Also, I spend so much time in GitHub that the interface is very familiar to me, and I can cross-link all the issues I am working on.
(it is also my excuse as to why I haven't reviewed many good looking PRs)
I will update this list as I run into more things
Note to myself: an unchecked duplicate entry means I need to go back and re-review
PR review queue (rough order)
- Speed up `collect_bool` and remove `unsafe`, optimize `take_bits`, `take_native` for null values arrow-rs#8849
- Add ability to skip or transform page encoding statistics in Parquet metadata arrow-rs#8797
- Removed incorrect union check in enforce_sorting and updated tests #18661
- Allow Users to Provide Custom `ArrayFormatter`s when Pretty-Printing Record Batches arrow-rs#8829
- Refactor InListExpr to support structs by re-using existing hashing infrastructure #18449
- Optimize planning for projected nested union #18713
- optimizer: Support dynamic filter in `MIN/MAX` aggregates #18644
- Removed incorrect union check in enforce_sorting and updated tests #18661
- Add comparison support for Union arrays arrow-rs#8838
- Add support for `Union` types in `RowConverter` arrow-rs#8839
- [DRAFT] Extension Type Registry Draft #18552
- move projection handling into FileSource #18627
- Make Parquet SBBF serialize/deserialize helpers public for external reuse arrow-rs#8762
- feat: implement GroupArrayAggAccumulator attempt 3 #17915
- feat: Deduplicating recursive CTE implementation #18254
- [Variant] Enforce shredded-type validation in `shred_variant` arrow-rs#8796 (review)
- Test datafusion with new sqlparser: Eliminating whitespace from the parser logic datafusion-sqlparser-rs#2076 (comment)
- Refactor state management in `HashJoinExec` and use CASE expressions for more precise filters #18451
- Add `merge` and `merge_n` algorithms arrow-rs#8753
- Add relation planner extension support #17843
What am I personally working on now
- Internal error: Physical input schema should be the same as the one converted from logical input schema. #18337 (and related fallout)
Projects I am supporting actively (high on my priority list)
- Next DF release: Release DataFusion `51.0.0` (Nov 2025) #17558 with @xudong963
- Improve DataFusion ClickBench performance: [EPIC] Make DataFusion the top of the ClickBench Parquet leaderboard #18489
- Make adaptive Parquet predicate pushdown evaluation faster with @hhhizzz: [Parquet] Adaptive Parquet Predicate Pushdown arrow-rs#8733
- Make low-level arrow bit manipulation fast with @rluvaton: feat: add `apply_unary_op` and `apply_binary_op` bitwise operations arrow-rs#8619
- Make DataFusion object store requests go faster with @BlakeOrth: [EPIC] ListingTable object store usage improvements #17214
- Release object store 0.13.0: Release object store `0.13.0` (breaking) - Target Nov 2025 arrow-rs-object-store#367
Projects on my backlog
These are projects I would like to support but don't currently have the capacity to push, in rough priority order
- Help integrate Variant with @friendlymatthew: [EPIC] Support `VARIANT` type for unstructured data #16116
- Epic: Join Order Enumeration #18249 from @NGA-TRAN
PRs that look great but need a thorough review (looking for help here 🎣 from anyone else)
- Examples of extending SQL syntax #17824 from @theirix
- external tables for multiple locations: feat(cli): support external tables on multiple locations #17702
- relation extension planner: Add relation planner extension support #17843
- writing REE arrays to parquet: Support writing RunEndEncoded as Parquet arrow-rs#8069