Closed
135 commits
3ca09a6
benchmark: Support sort_tpch10 for benchmark (#16671)
zhuqi-lucas Jul 3, 2025
18f00a7
chore(deps): bump tokio from 1.45.1 to 1.46.0 (#16666)
dependabot[bot] Jul 3, 2025
06e5bbe
Fix TopK Sort incorrectly pushed down past operators that do not acce…
zhuqi-lucas Jul 3, 2025
50dc83a
Convert Option<Vec<sort expression>> to Vec<sort expression> (#16615)
ViggoC Jul 3, 2025
65242a6
Improve error message when ScalarValue fails to cast array (#16670)
findepi Jul 3, 2025
5a48857
Add an example of embedding indexes inside a parquet file (#16395)
zhuqi-lucas Jul 3, 2025
1cc67ab
`datafusion-cli`: Refactor statement execution logic (#16634)
liamzwbao Jul 3, 2025
3118b81
Implementation for regex_instr (#15928)
nirnayroy Jul 3, 2025
acf0bbe
Refactor error handling to use boxed errors for DataFusionError varia…
kosiew Jul 4, 2025
8b9c1f1
Add SchemaAdapterFactory Support for ListingTable with Schema Evoluti…
kosiew Jul 4, 2025
8c5d06d
Reuse Rows allocation in RowCursorStream (#16647)
Dandandan Jul 4, 2025
0185da6
Perf: fast CursorValues compare for StringViewArray using inline_key_…
zhuqi-lucas Jul 4, 2025
a715173
refactor: shrink `SchemaError` (#16653)
crepererum Jul 4, 2025
aadb79b
rustup version (#16663)
melroy12 Jul 4, 2025
85a9251
Refactor StreamJoinMetrics to reuse BaselineMetrics (#16674)
Standing-Man Jul 5, 2025
12c40ca
Remove unused AggregateUDF struct (#16683)
ViggoC Jul 6, 2025
01698cb
chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` (#16…
Samyak2 Jul 7, 2025
2741c60
Use compression type in CSV file suffices (#16609)
theirix Jul 7, 2025
1e7c3a1
Clarify the generality of the embedded parquet index (#16692)
alamb Jul 7, 2025
698155a
Refactor SortMergeJoinMetrics to reuse BaselineMetrics (#16675)
Standing-Man Jul 7, 2025
d359d64
Add support for Arrow Dictionary type in Substrait (#16608)
jkosh44 Jul 7, 2025
e162ec5
fix: sqllogictest runner label condition mismatch (#16633)
lliangyu-lin Jul 7, 2025
ec15558
Fix duplicate field name error in Join::try_new_with_project_input du…
LiaCastaneda Jul 7, 2025
e95d038
fix: port arrow inline fast key fix to datafusion (#16698)
zhuqi-lucas Jul 7, 2025
ca3071e
chore(deps): bump tokio from 1.46.0 to 1.46.1 (#16700)
dependabot[bot] Jul 7, 2025
ebb8e95
Update Upgrade Guide for 48.0.1 (#16699)
alamb Jul 7, 2025
9e144b2
Add reproducer for tpch Q16 deserialization bug (#16662)
NGA-TRAN Jul 7, 2025
5db5fbc
optimize ScalarValue::to_array_of_size for structural types (#16706)
ding-young Jul 8, 2025
e950df5
Update release instructions (#16701)
alamb Jul 8, 2025
a089eff
refactor filter pushdown APIs (#16642)
adriangb Jul 8, 2025
92d8632
Add comments to ClickBench queries about setting binary_as_string (#1…
alamb Jul 8, 2025
e45de7f
Improve display output for FFI execution plans (#16713)
timsaucer Jul 8, 2025
985eb49
feat: Support `u32` indices for `HashJoinExec` (#16434)
jonathanc-n Jul 8, 2025
2884612
fix: try to lower plain reserved functions to columns as well (#16669)
crepererum Jul 8, 2025
33be09a
Revert "fix: create file for empty stream" (#16682)
brunal Jul 8, 2025
8596812
Add the missing equivalence info for filter pushdown (#16686)
liamzwbao Jul 8, 2025
2afb681
Fix sqllogictests test running compatibility (ignore `--test-threads`…
mjgarton Jul 8, 2025
2093551
Fix: CopyTo logical plan outputs 1 column (#16705)
bert-beyondloops Jul 8, 2025
a17292d
chore(devcontainer): use debian's `protobuf-compiler` package (#16687)
fvj Jul 8, 2025
5648201
fix: Fix CI failing due to #16686 (#16718)
jonathanc-n Jul 9, 2025
2bdf167
Add link to upgrade guide in changelog script (#16680)
alamb Jul 9, 2025
f38f52f
Improve display format of BoundedWindowAggExec (#16645)
geetanshjuneja Jul 9, 2025
bd8fd29
Bump the MSRV due to transitive dependencies (#16728)
rtyler Jul 9, 2025
de8cbd0
Fix: optimize projections for unnest logical plan. (#16632)
bert-beyondloops Jul 9, 2025
76ff87b
Use the `test-threads` option in sqllogictests (#16722)
mjgarton Jul 10, 2025
54592e8
chore(deps): bump clap from 4.5.40 to 4.5.41 (#16735)
dependabot[bot] Jul 10, 2025
8d7b11b
chore: make more clarity for internal errors (#16741)
comphead Jul 11, 2025
eb4f852
Remove parquet_filter and parquet `sort` benchmarks (#16730)
alamb Jul 11, 2025
4bc66c8
Refactor filter pushdown APIs to enable joins to pass through filters…
adriangb Jul 11, 2025
4dd7825
Perform type coercion for corr aggregate function (#15776)
kumarlokesh Jul 11, 2025
2e08d5c
Improve dictionary null handling in hashing and expand aggregate test…
kosiew Jul 12, 2025
95e583f
Improve Ci cache (#16709)
blaginin Jul 12, 2025
ce3f62a
fix in list round trip in df proto (#16744)
XiangpengHao Jul 12, 2025
c01c71f
ensure MemTable has at least one partition (#16754)
waynexia Jul 12, 2025
c8bd776
chore: Make `GroupValues` and APIs on `PhysicalGroupBy` aggregation …
haohuaijin Jul 13, 2025
8a0227e
Extend binary coercion rules to support Decimal arithmetic operations…
jatin510 Jul 13, 2025
14df97b
feat: expose intersect distinct/except distinct in dataframe api (#16…
chenkovsky Jul 13, 2025
8674454
Support Type Coercion for NULL in Binary Arithmetic Expressions (#16761)
kosiew Jul 14, 2025
3291e4e
chore(deps): bump chrono-tz from 0.10.3 to 0.10.4 (#16769)
dependabot[bot] Jul 14, 2025
04b006c
perf: Optimize hash joins with an empty build side (#16716)
nuno-faria Jul 14, 2025
6965fd3
feat: Add a configuration to make parquet encryption optional (#16649)
corwinjoy Jul 14, 2025
36991ac
limit intermediate batch size in nested_loop_join (#16443)
UBarney Jul 14, 2025
a45a4c4
Add serialization/deserialization and round-trip tests for all tpc-h …
NGA-TRAN Jul 14, 2025
e1a1889
Auto start testcontainers for `datafusion-cli` (#16644)
blaginin Jul 14, 2025
d1e6eb4
Refactor BinaryTypeCoercer to Handle Null Coercion Early and Avoid Re…
kosiew Jul 15, 2025
62dbebd
Remove fixed version from MSRV check (#16786)
findepi Jul 15, 2025
1d14e56
Per file filter evaluation (#15057)
adriangb Jul 15, 2025
18a30ce
Add `clickbench_pushdown` benchmark (#16731)
alamb Jul 15, 2025
fe45258
add filter to handle backtrace (#16752)
geetanshjuneja Jul 15, 2025
37ee8fa
fix: return NULL if any of the param to make_date is NULL (#16759)
feniljain Jul 15, 2025
a614716
Support min/max aggregates for FixedSizeBinary type (#16765)
theirix Jul 15, 2025
c4b9995
fix: add `order_requirement` & `dist_requirement` to `OutputRequireme…
Loaki07 Jul 16, 2025
38b87bf
fix tests in page_pruning when filter pushdown is enabled by default …
XiangpengHao Jul 16, 2025
6abc162
Restore custom SchemaAdapter functionality for Parquet (#16791)
adriangb Jul 17, 2025
63dd4e2
Automatically split large single RecordBatches in `MemorySource` into…
kosiew Jul 17, 2025
d3cacac
Fix slow join test (#16796)
2010YOUY01 Jul 17, 2025
44e63e0
Benchmark for char expression (#16743)
ajita-asthana Jul 17, 2025
50e6114
Update `upgrading.md` for new unified config for sql string mapping t…
zhuqi-lucas Jul 17, 2025
eabf3b7
Add example of custom file schema casting rules (#16803)
adriangb Jul 17, 2025
4e32ab9
Fix discrepancy in Float64 to timestamp(9) casts for constants (#16639)
findepi Jul 17, 2025
2a33c87
fix: support nullable columns in pre-sorted data sources (#16783)
crepererum Jul 17, 2025
01234eb
Fix: Preserve sorting for the COPY TO plan (#16785)
bert-beyondloops Jul 17, 2025
d4d5bfd
chore(deps): bump object_store from 0.12.2 to 0.12.3 (#16807)
dependabot[bot] Jul 17, 2025
afd8235
Implement equals for stateful functions (#16781)
findepi Jul 18, 2025
46afb3b
benchmark: Add parquet h2o support (#16804)
zhuqi-lucas Jul 18, 2025
2cc83be
docs: Remove reference to forthcoming example (#16817) (#16818)
m09526 Jul 18, 2025
28b6e9d
fix: The inconsistency between scalar and array on the cast decimal t…
chenkovsky Jul 18, 2025
acff1b6
chore: use `equals_datatype` for `BinaryExpr` (#16813)
comphead Jul 18, 2025
8a3ea87
fix: unit test for object_storage (#16824)
chenkovsky Jul 19, 2025
8484c96
chore: add tests for out of bounds for NullArray (#16802)
comphead Jul 20, 2025
eb25e8d
Refactor binary.rs tests into modular submodules under `binary/tests`…
kosiew Jul 20, 2025
3869857
cache generation of dictionary keys and null arrays for ScalarValue (…
adriangb Jul 20, 2025
9d6a443
fix(docs): Update broken links to `TableProvider` docs (#16830)
jcsherin Jul 21, 2025
350c61b
refactor(examples): remove redundant call to create directory (#16825)
jcsherin Jul 21, 2025
70e7eb3
Add benchmark for ByteViewGroupValueBuilder (#16826)
zhuqi-lucas Jul 21, 2025
c74fcaf
fix broken links (#16839)
2010YOUY01 Jul 21, 2025
1b6a382
Simplify try cast expr evaluation (#16834)
lewiszlw Jul 22, 2025
2d949e1
Add note to upgrade guide about MSRV update (#16845)
alamb Jul 22, 2025
1d5b2ca
Fix flaky test case in joins.slt (#16849)
findepi Jul 22, 2025
3ec51eb
chore(deps): bump sysinfo from 0.35.2 to 0.36.1 (#16850)
dependabot[bot] Jul 22, 2025
4d1db63
chore(deps): bump aws-credential-types from 1.2.3 to 1.2.4 (#16815)
dependabot[bot] Jul 22, 2025
a0ce581
feat: Allow tree explain format width to be customizable (#16827)
nuno-faria Jul 22, 2025
3c95281
feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a…
haohuaijin Jul 23, 2025
5dd706d
[main] Update version to 49.0.0, add 49.0.0 changelog (#16855)
alamb Jul 23, 2025
518fb4b
fix(build-wasm): put `arrow-ipc/zstd` dep under `compression` feature…
chrisvander Jul 23, 2025
3abd638
chore(deps): bump serde_json from 1.0.140 to 1.0.141 (#16863)
dependabot[bot] Jul 23, 2025
14ac31d
chore(deps): bump aws-config from 1.8.1 to 1.8.2 (#16864)
dependabot[bot] Jul 23, 2025
3a40625
test: Fix flaky join tests (#16860)
2010YOUY01 Jul 24, 2025
3d4fdf2
chore(deps): bump rand from 0.9.1 to 0.9.2 (#16882)
dependabot[bot] Jul 24, 2025
51f04c8
Report error when `SessionState::sql_to_expr_with_alias` does not con…
pepijnve Jul 24, 2025
dbc03fa
fix another flaky join test (#16880)
2010YOUY01 Jul 24, 2025
d553ffd
Improve async_udf example and docs (#16846)
alamb Jul 24, 2025
07516aa
Chore: add unit tests for chr function (#16856)
waynexia Jul 24, 2025
d9e963e
remove deprecated methods from FileScanConfig / DataSourceExec (#16901)
adriangb Jul 24, 2025
3b4eda5
Support utf8view for spark hex (#16885)
xudong963 Jul 25, 2025
a6d4798
Fixes 3 bugs during serialization and deserialization of physical pla…
NGA-TRAN Jul 25, 2025
b7da86e
feat(spark): Implement Spark `string` function `luhn_check` (#16848)
Standing-Man Jul 25, 2025
bb1b55c
chore(deps): bump aws-config from 1.8.2 to 1.8.3 (#16912)
dependabot[bot] Jul 25, 2025
070517a
Derive UDF equality from PartialEq, Hash (#16842)
findepi Jul 25, 2025
675b96c
Ensure Substrait consumer can handle expressions in VirtualTable (#16…
lorenarosati Jul 25, 2025
8b9204c
Mutable Join Unwind (#16883)
berkaysynnada Jul 25, 2025
2ff02b2
feat(spark): implement Spark datetime function last_day (#16828)
Standing-Man Jul 26, 2025
5e0b2d0
fix(datafusion-proto): support serializing/deserilizing ArrowFormat t…
colinmarc Jul 26, 2025
871d4b5
ScalarValue Default + Min + Max (#16891)
berkaysynnada Jul 26, 2025
4f53358
fix: `PlaceholderRowExec::partition_statistics` (#16851)
crepererum Jul 26, 2025
84c8881
minor: add is_superset() method for Interval's (#16895)
berkaysynnada Jul 26, 2025
9deec2a
minor: implement with_new_expressions for AggregateFunctionExpr (#16897)
berkaysynnada Jul 26, 2025
cbda394
Update enforce_distribution.rs (#16913)
berkaysynnada Jul 27, 2025
4b9a468
feat: Add `ScalarValue::{new_one,new_zero,new_ten,distance}` support …
theirix Jul 27, 2025
fd08e72
Implement Helpers for ScopedTimerGuard and Time Structs (#16911)
berkaysynnada Jul 27, 2025
8bf7123
Fix Partial Sort Get Slice Point Between Batches (#16881)
berkaysynnada Jul 27, 2025
71dbdf6
fix: skip predicates on struct unnest in PushDownFilter (#16790)
akoshchiy Jul 27, 2025
e033d42
optimize initcap function by avoiding memory allocation (#16878)
waynexia Jul 27, 2025
1cb7bcb
Merge remote-tracking branch 'upstream/main' into alamb/thread_config…
Omega359 Jul 27, 2025
547bcac
Add ConfigOptions to ScalarFunctionArgs, refactor AsyncScalarUDF.invo…
Omega359 Jul 27, 2025
10 changes: 4 additions & 6 deletions .devcontainer/Dockerfile
@@ -5,9 +5,7 @@ RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
 && apt-get purge -y imagemagick imagemagick-6-common
 
 # Add protoc
-# https://datafusion.apache.org/contributor-guide/getting_started.html#protoc-installation
-RUN curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v25.1/protoc-25.1-linux-x86_64.zip \
-&& unzip protoc-25.1-linux-x86_64.zip -d $HOME/.local \
-&& rm protoc-25.1-linux-x86_64.zip
-
-ENV PATH="$PATH:$HOME/.local/bin"
+# https://datafusion.apache.org/contributor-guide/development_environment.html#protoc-installation
+RUN apt-get update \
+&& apt-get install -y --no-install-recommends protobuf-compiler libprotobuf-dev \
+&& rm -rf /var/lib/apt/lists/*
2 changes: 2 additions & 0 deletions .github/actions/setup-macos-aarch64-builder/action.yaml
@@ -45,5 +45,7 @@ runs:
 rustup component add rustfmt
 - name: Setup rust cache
 uses: Swatinem/rust-cache@v2
+with:
+save-if: ${{ github.ref_name == 'main' }}
 - name: Configure rust runtime env
 uses: ./.github/actions/setup-rust-runtime
72 changes: 50 additions & 22 deletions .github/workflows/rust.yml
@@ -51,6 +51,11 @@ jobs:
 uses: ./.github/actions/setup-builder
 with:
 rust-version: stable
+- name: Rust Dependency Cache
+uses: Swatinem/rust-cache@v2
+with:
+shared-key: "amd-ci-check" # this job uses its own cache because check has a separate cache and we need it to be fast as it blocks other jobs
+save-if: ${{ github.ref_name == 'main' }}
 - name: Prepare cargo build
 run: |
 # Adding `--locked` here to assert that the `Cargo.lock` file is up to
@@ -99,6 +104,11 @@ jobs:
 uses: ./.github/actions/setup-builder
 with:
 rust-version: stable
+- name: Rust Dependency Cache
+uses: Swatinem/rust-cache@v2
+with:
+save-if: false # set in linux-test
+shared-key: "amd-ci"
 - name: Check datafusion-substrait (default features)
 run: cargo check --profile ci --all-targets -p datafusion-substrait
 #
@@ -162,6 +172,11 @@ jobs:
 uses: ./.github/actions/setup-builder
 with:
 rust-version: stable
+- name: Rust Dependency Cache
+uses: Swatinem/rust-cache@v2
+with:
+save-if: false # set in linux-test
+shared-key: "amd-ci"
 - name: Check datafusion (default features)
 run: cargo check --profile ci --all-targets -p datafusion
 #
@@ -203,6 +218,8 @@ jobs:
 run: cargo check --profile ci --no-default-features -p datafusion --features=string_expressions
 - name: Check datafusion (unicode_expressions)
 run: cargo check --profile ci --no-default-features -p datafusion --features=unicode_expressions
+- name: Check parquet encryption (parquet_encryption)
+run: cargo check --profile ci --no-default-features -p datafusion --features=parquet_encryption
 
 # Check datafusion-functions crate features
 #
@@ -247,15 +264,22 @@ jobs:
 name: cargo test (amd64)
 needs: linux-build-lib
 runs-on: ubuntu-latest
+container:
+image: amd64/rust
 steps:
 - uses: actions/checkout@v4
 with:
 submodules: true
 fetch-depth: 1
 - name: Setup Rust toolchain
-run: rustup toolchain install stable
-- name: Install Protobuf Compiler
-run: sudo apt-get install -y protobuf-compiler
+uses: ./.github/actions/setup-builder
+with:
+rust-version: stable
+- name: Rust Dependency Cache
+uses: Swatinem/rust-cache@v2
+with:
+save-if: ${{ github.ref_name == 'main' }}
+shared-key: "amd-ci"
 - name: Run tests (excluding doctests and datafusion-cli)
 env:
 RUST_BACKTRACE: 1
@@ -279,25 +303,17 @@ jobs:
 name: cargo test datafusion-cli (amd64)
 needs: linux-build-lib
 runs-on: ubuntu-latest
+# should be uncommented once https://github.com/apache/datafusion/pull/16644 is merged
+# and cache should be added
+# container:
+# image: amd64/rust
 steps:
 - uses: actions/checkout@v4
 with:
 submodules: true
 fetch-depth: 1
 - name: Setup Rust toolchain
 run: rustup toolchain install stable
-- name: Setup Minio - S3-compatible storage
-run: |
-docker run -d --name minio-container \
--p 9000:9000 \
--e MINIO_ROOT_USER=TEST-DataFusionLogin -e MINIO_ROOT_PASSWORD=TEST-DataFusionPassword \
--v $(pwd)/datafusion/core/tests/data:/source quay.io/minio/minio \
-server /data
-docker exec minio-container /bin/sh -c "\
-mc ready local
-mc alias set localminio http://localhost:9000 TEST-DataFusionLogin TEST-DataFusionPassword && \
-mc mb localminio/data && \
-mc cp -r /source/* localminio/data"
 - name: Run tests (excluding doctests)
 env:
 RUST_BACKTRACE: 1
@@ -309,9 +325,6 @@ jobs:
 run: cargo test --profile ci -p datafusion-cli --lib --tests --bins
 - name: Verify Working Directory Clean
 run: git diff --exit-code
-- name: Minio Output
-if: ${{ !cancelled() }}
-run: docker logs minio-container
 
 
 linux-test-example:
@@ -329,6 +342,11 @@ jobs:
 uses: ./.github/actions/setup-builder
 with:
 rust-version: stable
+- name: Rust Dependency Cache
+uses: Swatinem/rust-cache@v2
+with:
+save-if: ${{ github.ref_name == 'main' }}
+shared-key: "amd-ci-linux-test-example"
 - name: Run examples
 run: |
 # test datafusion-sql examples
@@ -655,6 +673,11 @@ jobs:
 rust-version: stable
 - name: Install Clippy
 run: rustup component add clippy
+- name: Rust Dependency Cache
+uses: Swatinem/rust-cache@v2
+with:
+save-if: ${{ github.ref_name == 'main' }}
+shared-key: "amd-ci-clippy"
 - name: Run clippy
 run: ci/scripts/rust_clippy.sh
 
@@ -733,10 +756,15 @@ jobs:
 # `rust-version` key of `Cargo.toml`.
 #
 # To reproduce:
-# 1. Install the version of Rust that is failing. Example:
-# rustup install 1.80.1
-# 2. Run the command that failed with that version. Example:
-# cargo +1.80.1 check -p datafusion
+# 1. Install the version of Rust that is failing.
+# 2. Run the command that failed with that version.
+#
+# Example:
+# # MSRV looks like "1.80.0" and is specified in Cargo.toml. We can read the value with the following command:
+# msrv="$(cargo metadata --format-version=1 | jq '.packages[] | select( .name == "datafusion" ) | .rust_version' -r)"
+# echo "MSRV: ${msrv}"
+# rustup install "${msrv}"
+# cargo "+${msrv}" check
 #
 # To resolve, either:
 # 1. Change your code to use older Rust features,
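The MSRV comment in the last rust.yml hunk reads the `rust_version` of the `datafusion` package out of `cargo metadata` with a jq filter. As a rough illustration of what that filter selects, here is a minimal Python sketch. The function name `msrv_from_metadata` and the `SAMPLE` document are hypothetical and not part of the PR; the sample is a trimmed-down stand-in for real `cargo metadata --format-version=1` output, which carries many more fields.

```python
import json
from typing import Optional


def msrv_from_metadata(metadata_json: str, package: str = "datafusion") -> Optional[str]:
    """Return the `rust_version` of `package` from `cargo metadata` JSON.

    Mirrors the jq filter
    `.packages[] | select(.name == "datafusion") | .rust_version`.
    Returns None when the package is absent or declares no rust_version.
    """
    metadata = json.loads(metadata_json)
    for pkg in metadata.get("packages", []):
        if pkg.get("name") == package:
            return pkg.get("rust_version")
    return None


# Hypothetical sample input (assumption): only the two fields the filter reads.
SAMPLE = json.dumps(
    {
        "packages": [
            {"name": "some-dependency", "rust_version": None},
            {"name": "datafusion", "rust_version": "1.80.0"},
        ]
    }
)

if __name__ == "__main__":
    print("MSRV:", msrv_from_metadata(SAMPLE))
```

In practice the input would come from running `cargo metadata --format-version=1` in the workspace. The value returned is what the workflow comment passes to `rustup install` and `cargo "+${msrv}" check`.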