Insights: apache/datafusion
Overview
41 Pull requests merged by 22 people
-
Fix CI Failure: replace false with NullEqualsNothing
#16437 merged
Jun 18, 2025 -
chore: generate basic spark function tests
#16409 merged
Jun 17, 2025 -
Use dedicated NullEquality enum instead of null_equals_null boolean
#16419 merged
Jun 17, 2025 -
fix: Enable WASM compilation by making sqlparser's recursive-protection optional
#16418 merged
Jun 17, 2025 -
TopK dynamic filter pushdown attempt 2
#15770 merged
Jun 17, 2025 -
Unify Metadata Handing: use `FieldMetadata` in `Expr::Alias` and `ExprSchemable`
#16320 merged
Jun 17, 2025 -
chore(deps): bump libc from 0.2.172 to 0.2.173
#16421 merged
Jun 17, 2025 -
chore(deps): bump mimalloc from 0.1.46 to 0.1.47
#16426 merged
Jun 17, 2025 -
Migrate core test to insta, part1
#16324 merged
Jun 17, 2025 -
feat: mapping sql Char/Text/String default to Utf8View
#16290 merged
Jun 17, 2025 -
chore(deps): bump rust_decimal from 1.37.1 to 1.37.2
#16422 merged
Jun 16, 2025 -
Add design process section to the docs
#16397 merged
Jun 16, 2025 -
fix: Fixed error handling for `generate_series/range`
#16391 merged
Jun 16, 2025 -
feat: Support RightMark join for NestedLoop and Hash join
#16083 merged
Jun 16, 2025 -
Update PMC management instructions to follow new ASF process
#16417 merged
Jun 16, 2025 -
Add note in upgrade guide about changes to `Expr::Scalar` in 48.0.0
#16360 merged
Jun 16, 2025 -
Minor: Clean-up `bench.sh` usage message
#16416 merged
Jun 15, 2025 -
Add fast paths for try_process_unnest
#16389 merged
Jun 15, 2025 -
Simplify expressions passed to table functions
#16388 merged
Jun 15, 2025 -
[datafusion-spark] Example of using Spark compatible function library
#16384 merged
Jun 15, 2025 -
chore(deps): bump syn from 2.0.102 to 2.0.103
#16393 merged
Jun 15, 2025 -
Reduce some cloning
#16404 merged
Jun 14, 2025 -
Add topk_tpch benchmark
#16410 merged
Jun 14, 2025 -
fix typo in test file name
#16403 merged
Jun 13, 2025 -
fix: Fix SparkSha2 to be compliant with Spark response and add support for Int32
#16350 merged
Jun 13, 2025 -
chore(deps): bump aws-config from 1.6.3 to 1.8.0
#16394 merged
Jun 13, 2025 -
Minor: add testing case for add YieldStreamExec and polish docs
#16369 merged
Jun 13, 2025 -
doc: Add SQL examples for SEMI + ANTI Joins
#16316 merged
Jun 12, 2025 -
Fix array_concat with NULL arrays
#16348 merged
Jun 12, 2025 -
Disable `datafusion-cli` tests for hash_collision tests, fix extended CI
#16382 merged
Jun 12, 2025 -
chore(deps): bump object_store from 0.12.1 to 0.12.2
#16368 merged
Jun 12, 2025 -
feat: Support tpch and tpch10 benchmark for csv format
#16373 merged
Jun 12, 2025 -
chore: refactor Substrait consumer's "rename_field" and implement the rest of types
#16345 merged
Jun 12, 2025 -
Document Table Constraint Enforcement Behavior in Custom Table Providers Guide
#16340 merged
Jun 12, 2025 -
bug: remove busy-wait while sort is ongoing
#16322 merged
Jun 12, 2025 -
feat: support FixedSizeList for array_has
#16333 merged
Jun 11, 2025 -
fix: preserve null_equals_null flag in eliminate_cross_join rule
#16356 merged
Jun 11, 2025 -
Fix: datafusion-sqllogictest 48.0.0 can't be published
#16376 merged
Jun 11, 2025 -
Add more context to error message for datafusion-cli config failure
#16379 merged
Jun 11, 2025 -
Update publish command
#16377 merged
Jun 11, 2025 -
Fix array_agg memory over use
#16346 merged
Jun 11, 2025
18 Pull requests opened by 13 people
-
Enable schema evolution for nested structs via adapt_column and custom adapter support in ListingTable
#16371 opened
Jun 11, 2025 -
POC: Reduce `Arc` cloning on hashmap build side
#16380 opened
Jun 11, 2025 -
Improved experience when remote object store URL does not end in /
#16386 opened
Jun 12, 2025 -
enhancement (datafusion-cli): Add support for glob patterns in CREATE EXTERNAL TABLE commands
#16387 opened
Jun 12, 2025 -
Add an example of embedding indexes inside a parquet file
#16395 opened
Jun 13, 2025 -
Use Tokio's task budget consistently, better APIs to support task cancellation
#16398 opened
Jun 13, 2025 -
Update Roadmap documentation
#16399 opened
Jun 13, 2025 -
feat: add SchemaProvider::table_type(table_name: &str)
#16401 opened
Jun 13, 2025 -
[WIP] fix: respect inexact flags in row group metadata
#16412 opened
Jun 14, 2025 -
feat: support fixed size list for array reverse
#16423 opened
Jun 16, 2025 -
Prune files during streams and avoid additional pruning if there are no dynamic filters
#16424 opened
Jun 16, 2025 -
Fix constant window for evaluate stateful
#16430 opened
Jun 17, 2025 -
Only update TopK dynamic filters if the new ones are more selective
#16433 opened
Jun 17, 2025 -
feat: Support `u32` indices for `HashJoinExec`
#16434 opened
Jun 18, 2025 -
doc: Add comments to clarify algorithm for `MarkJoin`s
#16436 opened
Jun 18, 2025 -
chore(deps): bump prost-build from 0.13.5 to 0.14.1 in the proto group
#16439 opened
Jun 18, 2025 -
chore(deps): bump libc from 0.2.173 to 0.2.174
#16440 opened
Jun 18, 2025 -
chore(deps): bump bzip2 from 0.5.2 to 0.6.0
#16441 opened
Jun 18, 2025
22 Issues closed by 8 people
-
Dynamic pruning filters from TopK state (optimize `ORDER BY LIMIT` queries)
#15037 closed
Jun 17, 2025 -
Optimize TopK with filter
#15699 closed
Jun 17, 2025 -
Mapping Char/Text/String default to Utf8View
#16288 closed
Jun 17, 2025 -
Improve sql planing performance (optimize `try_process_unnest`)
#16242 closed
Jun 15, 2025 -
Table function supports non-literal args
#14958 closed
Jun 15, 2025 -
Can't publish datafusion-spark crate due to error
#16383 closed
Jun 15, 2025 -
[datafusion-spark] Example of using Spark compatible function library
#15915 closed
Jun 15, 2025 -
Add TopK benchmark
#16411 closed
Jun 14, 2025 -
SparkSha2 is not compliant with Spark and does not support Int32 type
#16336 closed
Jun 13, 2025 -
Request to update crates.io ownership
#16323 closed
Jun 13, 2025 -
row-wise min and max
#16366 closed
Jun 13, 2025 -
Document semi join, anti semi join and more supported join types
#16245 closed
Jun 12, 2025 -
array_concat fails with when passed NULL list literals
#16349 closed
Jun 12, 2025 -
CI failure on Datafusion extended tests / cargo test hash collisions (amd64) (push)
#16378 closed
Jun 12, 2025 -
Support RightMark join for `SortMergeJoin`
#16226 closed
Jun 12, 2025 -
Add tpch csv support to bench.sh
#16370 closed
Jun 12, 2025 -
Add documentation on constraint enforcements
#16309 closed
Jun 12, 2025 -
Busy-waiting in SortPreservingMergeStream
#16321 closed
Jun 12, 2025 -
Release DataFusion `48.0.0` (June 2025)
#15771 closed
Jun 11, 2025 -
datafusion-sqllogictest 48.0.0 can't be published
#16375 closed
Jun 11, 2025 -
June 2025 ASF Board Report
#15182 closed
Jun 11, 2025 -
DF 48 upgrade guide missing window function breaking change
#16326 closed
Jun 11, 2025
17 Issues opened by 9 people
-
404 for indexed docs page
#16438 opened
Jun 18, 2025 -
Add BloomFilter PhysicalExpr
#16435 opened
Jun 18, 2025 -
Only update topk filter when updated filter is more selective
#16432 opened
Jun 17, 2025 -
Extend `DESCRIBE` statement to output the schema
#16429 opened
Jun 17, 2025 -
Add support for clickbench data and benchmark with page index
#16427 opened
Jun 17, 2025 -
Implement Single Join
#16425 opened
Jun 17, 2025 -
Add documentation to clarify algorithms for Mark Joins
#16415 opened
Jun 15, 2025 -
Automate `Upgrade Guide` on top of most recent `deprecated` methods
#16414 opened
Jun 15, 2025 -
[Blog] Proposal: Add categorical-tags to blogs for better navigation
#16407 opened
Jun 14, 2025 -
decimal calculate overflow but not throw error
#16406 opened
Jun 14, 2025 -
Upgrade to sqlparser 0.56.0
#16405 opened
Jun 13, 2025 -
Add statistics to ParquetExec for *files* pruned
#16402 opened
Jun 13, 2025 -
Blog post about DataFusion Async / Stream execution model / cancellation
#16396 opened
Jun 13, 2025 -
RightMark Join support to SortMergeJoin execution
#16385 opened
Jun 12, 2025 -
Add an example of embedding indexes *inside* a parquet file
#16374 opened
Jun 11, 2025 -
Blog Post for Accelerating Query Processing with Specialized Indexes
#16372 opened
Jun 11, 2025
57 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add compression option to SpillManager
#16268 commented on
Jun 18, 2025 • 13 new comments -
Implementation for regex_instr
#15928 commented on
Jun 17, 2025 • 12 new comments -
Eliminate Self Joins
#16023 commented on
Jun 17, 2025 • 10 new comments -
Simplify predicates in filter
#16362 commented on
Jun 12, 2025 • 3 new comments -
feat: Parquet modular encryption
#16351 commented on
Jun 18, 2025 • 2 new comments -
Add hooks to `SchemaAdapter` to add custom column generators
#15261 commented on
Jun 16, 2025 • 1 new comment -
Support data source sampling with TABLESAMPLE
#16325 commented on
Jun 14, 2025 • 1 new comment -
Draft: Use take-in kernel in repartitioning
#15392 commented on
Jun 12, 2025 • 0 new comments -
refactor!: consistent null handling in coercible signatures
#15404 commented on
Jun 17, 2025 • 0 new comments -
feat: implement GroupsAccumulator for `count(DISTINCT)` aggr
#15324 commented on
Jun 12, 2025 • 0 new comments -
Add `deserialize` to `BatchSerializer`
#15411 commented on
Jun 12, 2025 • 0 new comments -
refactor(joins::utils): Replace OnceAsync/OnceFut with tokio's OnceCell
#15431 commented on
Jun 12, 2025 • 0 new comments -
fix!: incorrect coercion when comparing with string literals
#15482 commented on
Jun 16, 2025 • 0 new comments -
Use pager and allow configuration via `\pset`
#15597 commented on
Jun 16, 2025 • 0 new comments -
fix: miss output ordering during projection
#15683 commented on
Jun 16, 2025 • 0 new comments -
dynamic filter refactor
#15685 commented on
Jun 18, 2025 • 0 new comments -
[datafusion-spark] Implement ceil&floor function for spark
#15958 commented on
Jun 18, 2025 • 0 new comments -
Optimize char expression
#16076 commented on
Jun 15, 2025 • 0 new comments -
Optimize Hex Function
#16077 commented on
Jun 13, 2025 • 0 new comments -
[datafusion-spark] Implement `factorical` function
#16125 commented on
Jun 18, 2025 • 0 new comments -
feat: optimize and unparse grouping
#16161 commented on
Jun 17, 2025 • 0 new comments -
Fix: Map functions crash on out of bounds cases
#16203 commented on
Jun 17, 2025 • 0 new comments -
Always add parentheses when formatting `BinaryExpr` with `SchemaDisplay`
#16209 commented on
Jun 16, 2025 • 0 new comments -
Perf: load default Utf8View for CSV datatype
#16243 commented on
Jun 11, 2025 • 0 new comments -
Draft: Use upstream arrow `coalesce` kernel in DataFusion
#16249 commented on
Jun 17, 2025 • 0 new comments -
feat: use spawned tasks to reduce call stack depth and avoid busy waiting
#16319 commented on
Jun 11, 2025 • 0 new comments -
Example for using a separate threadpool for CPU bound work (try 3)
#16331 commented on
Jun 17, 2025 • 0 new comments -
Add support for glob string in datafusion-cli query
#16332 commented on
Jun 13, 2025 • 0 new comments -
fix: create file for empty stream
#16342 commented on
Jun 17, 2025 • 0 new comments -
Investigate performance tradeoff in compressing spill files
#16367 commented on
Jun 12, 2025 • 0 new comments -
Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable`
#13433 commented on
Jun 12, 2025 • 0 new comments -
Enable merge queue in github to avoid commit confliction.
#6880 commented on
Jun 12, 2025 • 0 new comments -
[Epic] DataFusion Blogs
#14836 commented on
Jun 13, 2025 • 0 new comments -
`datafusion-cli`: Use correct S3 region if it is not specified
#16306 commented on
Jun 13, 2025 • 0 new comments -
[Epic] Pipeline breaking cancellation support and improvement
#16353 commented on
Jun 13, 2025 • 0 new comments -
Improved experience when remote object store URL does not end in `/`
#16302 commented on
Jun 13, 2025 • 0 new comments -
Panic in `datafusion_expr::window_state::WindowAggState::update`
#16308 commented on
Jun 14, 2025 • 0 new comments -
[EPIC] Complete `datafusion-spark` Spark Compatible Functions
#15914 commented on
Jun 15, 2025 • 0 new comments -
Support Aggregating by `RunArray`s
#16011 commented on
Jun 16, 2025 • 0 new comments -
Optimize `NestedLoopJoinExec` Memory Usage
#16364 commented on
Jun 16, 2025 • 0 new comments -
Blog post about parquet vs custom file formats
#16149 commented on
Jun 16, 2025 • 0 new comments -
Support reading multiple parquet files via `datafusion-cli`
#16303 commented on
Jun 16, 2025 • 0 new comments -
Improve performance of `datafusion-cli` when reading from remote storage
#16365 commented on
Jun 16, 2025 • 0 new comments -
Evaluate filter pushdown against the physical schema for performance and correctness
#15780 commented on
Jun 17, 2025 • 0 new comments -
[DISCUSSION] JOIN "task force" / project team
#15885 commented on
Jun 17, 2025 • 0 new comments -
Update all github workflow to use actions tied to sha hashes
#15298 commented on
Jun 17, 2025 • 0 new comments -
Reduce page metadata loading to only what is necessary for query execution in ParquetOpen
#16200 commented on
Jun 17, 2025 • 0 new comments -
Release DataFusion `49.0.0` (July 2025)
#16235 commented on
Jun 17, 2025 • 0 new comments -
Blog post about TopK filter pushdown
#15513 commented on
Jun 18, 2025 • 0 new comments -
[Epic] A collection of dynamic filtering related items
#15512 commented on
Jun 18, 2025 • 0 new comments -
Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc)
#7955 commented on
Jun 18, 2025 • 0 new comments -
Add hook for sharing join state in distributed execution
#12523 commented on
Jun 14, 2025 • 0 new comments -
Support bounds evaluation for temporal data types
#14523 commented on
Jun 17, 2025 • 0 new comments -
wip: proto to physical plan conversion
#14530 commented on
Jun 17, 2025 • 0 new comments -
Introduce Async User Defined Functions
#14837 commented on
Jun 18, 2025 • 0 new comments -
refactor: use TypeSignature::Coercible for math functions
#14872 commented on
Jun 14, 2025 • 0 new comments -
updatted github action by change version tag to sha hashes
#15315 commented on
Jun 17, 2025 • 0 new comments