Commit 7cd491c

Merge branch 'main' into support_overflow_sum_function
2 parents: 607c9de + e89f553

740 files changed: +56128 / -43816 lines


.github/actions/java-test/action.yaml

Lines changed: 26 additions & 1 deletion
```diff
@@ -62,7 +62,32 @@ runs:
   - name: Run Maven compile
     shell: bash
     run: |
-      ./mvnw -B compile test-compile scalafix:scalafix -Dscalafix.mode=CHECK -Psemanticdb ${{ inputs.maven_opts }}
+      ./mvnw -B package -DskipTests scalafix:scalafix -Dscalafix.mode=CHECK -Psemanticdb ${{ inputs.maven_opts }}
+
+  - name: Setup Node.js
+    uses: actions/setup-node@v6
+    with:
+      node-version: '24'
+
+  - name: Install prettier
+    shell: bash
+    run: |
+      npm install -g prettier
+
+  - name: Run prettier
+    shell: bash
+    run: |
+      npx prettier "**/*.md" --write
+
+  - name: Mark workspace as safe for git
+    shell: bash
+    run: |
+      git config --global --add safe.directory "$GITHUB_WORKSPACE"
+
+  - name: Check for any local git changes (such as generated docs)
+    shell: bash
+    run: |
+      ./dev/ci/check-working-tree-clean.sh

   - name: Run all tests
     shell: bash
```
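The last new step calls `dev/ci/check-working-tree-clean.sh`, whose contents are not shown in this commit view. A minimal sketch of what such a check typically does — fail the build if any file was modified during the run (e.g. markdown rewritten by prettier) and not committed — using a hypothetical `assert_clean` helper:

```shell
# Hypothetical sketch; the real script's contents are not part of this diff.
assert_clean() {
  # $1: output of `git status --porcelain` (one line per changed or
  # untracked file, empty when the working tree is clean)
  if [ -n "${1:-}" ]; then
    echo "working tree is not clean; commit or regenerate these files:" >&2
    printf '%s\n' "$1" >&2
    return 1
  fi
  return 0
}

# In CI the real script would do roughly:
#   assert_clean "$(git status --porcelain)"
```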
Lines changed: 39 additions & 0 deletions
```diff
@@ -0,0 +1,39 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Setup Iceberg Builder
+description: 'Setup Apache Iceberg to run Spark SQL tests'
+inputs:
+  iceberg-version:
+    description: 'The Apache Iceberg version (e.g., 1.8.1) to build'
+    required: true
+runs:
+  using: "composite"
+  steps:
+    - name: Clone Iceberg repo
+      uses: actions/checkout@v4
+      with:
+        repository: apache/iceberg
+        path: apache-iceberg
+        ref: apache-iceberg-${{inputs.iceberg-version}}
+        fetch-depth: 1
+
+    - name: Setup Iceberg for Comet
+      shell: bash
+      run: |
+        cd apache-iceberg
+        git apply ../dev/diffs/iceberg-rust/${{inputs.iceberg-version}}.diff
```
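The composite action above can be reproduced by hand. A sketch using hypothetical helpers (`iceberg_ref`, `iceberg_diff_path`) that derive the checkout tag and patch path the same way the action's `${{inputs.iceberg-version}}` expressions do:

```shell
# Hypothetical helpers mirroring the action's expressions:
#   ref:  apache-iceberg-${{inputs.iceberg-version}}
#   diff: dev/diffs/iceberg-rust/${{inputs.iceberg-version}}.diff
iceberg_ref() {
  echo "apache-iceberg-$1"
}

iceberg_diff_path() {
  echo "dev/diffs/iceberg-rust/$1.diff"
}

# Manual equivalent of the two steps (requires network; illustration only):
#   git clone --depth 1 --branch "$(iceberg_ref 1.10.0)" \
#       https://github.com/apache/iceberg apache-iceberg
#   (cd apache-iceberg && git apply "../$(iceberg_diff_path 1.10.0)")
```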

.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@ Closes #.

 <!--
 Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
-Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
+Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
 -->

 ## What changes are included in this PR?
```

The removed and re-added lines are textually identical here; the difference appears to be only trailing whitespace (the kind of change prettier makes).

.github/workflows/iceberg_spark_test.yml

Lines changed: 120 additions & 3 deletions
```diff
@@ -46,7 +46,7 @@ jobs:
       matrix:
         os: [ubuntu-24.04]
         java-version: [11, 17]
-        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}]
+        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
         spark-version: [{short: '3.5', full: '3.5.7'}]
         scala-version: ['2.13']
       fail-fast: false
@@ -85,7 +85,7 @@ jobs:
       matrix:
         os: [ubuntu-24.04]
         java-version: [11, 17]
-        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}]
+        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
         spark-version: [{short: '3.5', full: '3.5.7'}]
         scala-version: ['2.13']
       fail-fast: false
@@ -124,7 +124,7 @@ jobs:
       matrix:
         os: [ubuntu-24.04]
         java-version: [11, 17]
-        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}]
+        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
         spark-version: [{short: '3.5', full: '3.5.7'}]
         scala-version: ['2.13']
       fail-fast: false
@@ -156,3 +156,120 @@ jobs:
           ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
             :iceberg-spark:iceberg-spark-runtime-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:integrationTest \
             -Pquick=true -x javadoc
+
+  iceberg-spark-rust:
+    if: contains(github.event.pull_request.title, '[iceberg]')
+    strategy:
+      matrix:
+        os: [ubuntu-24.04]
+        java-version: [11, 17]
+        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
+        spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.7'}]
+        scala-version: ['2.13']
+      fail-fast: false
+    name: iceberg-spark-rust/${{ matrix.os }}/iceberg-${{ matrix.iceberg-version.full }}/spark-${{ matrix.spark-version.full }}/scala-${{ matrix.scala-version }}/java-${{ matrix.java-version }}
+    runs-on: ${{ matrix.os }}
+    container:
+      image: amd64/rust
+    env:
+      SPARK_LOCAL_IP: localhost
+    steps:
+      - uses: actions/checkout@v5
+      - name: Setup Rust & Java toolchain
+        uses: ./.github/actions/setup-builder
+        with:
+          rust-version: ${{env.RUST_VERSION}}
+          jdk-version: ${{ matrix.java-version }}
+      - name: Build Comet
+        shell: bash
+        run: |
+          PROFILES="-Pspark-${{matrix.spark-version.short}} -Pscala-${{matrix.scala-version}}" make release
+      - name: Setup Iceberg
+        uses: ./.github/actions/setup-iceberg-rust-builder
+        with:
+          iceberg-version: ${{ matrix.iceberg-version.full }}
+      - name: Run Iceberg Spark tests (Rust)
+        run: |
+          cd apache-iceberg
+          rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
+          ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
+            :iceberg-spark:iceberg-spark-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:test \
+            -Pquick=true -x javadoc
+
+  iceberg-spark-extensions-rust:
+    if: contains(github.event.pull_request.title, '[iceberg]')
+    strategy:
+      matrix:
+        os: [ubuntu-24.04]
+        java-version: [11, 17]
+        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
+        spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.7'}]
+        scala-version: ['2.13']
+      fail-fast: false
+    name: iceberg-spark-extensions-rust/${{ matrix.os }}/iceberg-${{ matrix.iceberg-version.full }}/spark-${{ matrix.spark-version.full }}/scala-${{ matrix.scala-version }}/java-${{ matrix.java-version }}
+    runs-on: ${{ matrix.os }}
+    container:
+      image: amd64/rust
+    env:
+      SPARK_LOCAL_IP: localhost
+    steps:
+      - uses: actions/checkout@v5
+      - name: Setup Rust & Java toolchain
+        uses: ./.github/actions/setup-builder
+        with:
+          rust-version: ${{env.RUST_VERSION}}
+          jdk-version: ${{ matrix.java-version }}
+      - name: Build Comet
+        shell: bash
+        run: |
+          PROFILES="-Pspark-${{matrix.spark-version.short}} -Pscala-${{matrix.scala-version}}" make release
+      - name: Setup Iceberg
+        uses: ./.github/actions/setup-iceberg-rust-builder
+        with:
+          iceberg-version: ${{ matrix.iceberg-version.full }}
+      - name: Run Iceberg Spark extensions tests (Rust)
+        run: |
+          cd apache-iceberg
+          rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
+          ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
+            :iceberg-spark:iceberg-spark-extensions-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:test \
+            -Pquick=true -x javadoc
+
+  iceberg-spark-runtime-rust:
+    if: contains(github.event.pull_request.title, '[iceberg]')
+    strategy:
+      matrix:
+        os: [ubuntu-24.04]
+        java-version: [11, 17]
+        iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
+        spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.7'}]
+        scala-version: ['2.13']
+      fail-fast: false
+    name: iceberg-spark-runtime-rust/${{ matrix.os }}/iceberg-${{ matrix.iceberg-version.full }}/spark-${{ matrix.spark-version.full }}/scala-${{ matrix.scala-version }}/java-${{ matrix.java-version }}
+    runs-on: ${{ matrix.os }}
+    container:
+      image: amd64/rust
+    env:
+      SPARK_LOCAL_IP: localhost
+    steps:
+      - uses: actions/checkout@v5
+      - name: Setup Rust & Java toolchain
+        uses: ./.github/actions/setup-builder
+        with:
+          rust-version: ${{env.RUST_VERSION}}
+          jdk-version: ${{ matrix.java-version }}
+      - name: Build Comet
+        shell: bash
+        run: |
+          PROFILES="-Pspark-${{matrix.spark-version.short}} -Pscala-${{matrix.scala-version}}" make release
+      - name: Setup Iceberg
+        uses: ./.github/actions/setup-iceberg-rust-builder
+        with:
+          iceberg-version: ${{ matrix.iceberg-version.full }}
+      - name: Run Iceberg Spark runtime tests (Rust)
+        run: |
+          cd apache-iceberg
+          rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
+          ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
+            :iceberg-spark:iceberg-spark-runtime-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:integrationTest \
+            -Pquick=true -x javadoc
```
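Each of the three new `*-rust` jobs fans out over the same matrix, and GitHub Actions expands a matrix to the cross product of its axes. A quick sketch of the resulting combination count per job family, with the axis values copied from the workflow above:

```shell
# Matrix axes copied from the iceberg-spark-rust / -extensions-rust /
# -runtime-rust jobs above (scala has a single value).
java_versions=(11 17)
iceberg_versions=(1.8.1 1.9.1 1.10.0)
spark_versions=(3.4.3 3.5.7)
scala_versions=(2.13)

# Cross product of all axes = number of CI jobs spawned per family.
jobs_per_family=$(( ${#java_versions[@]} * ${#iceberg_versions[@]} \
                  * ${#spark_versions[@]} * ${#scala_versions[@]} ))
echo "$jobs_per_family"  # prints 12
```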

.github/workflows/pr_build_linux.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -103,6 +103,7 @@ jobs:
         value: |
           org.apache.comet.CometFuzzTestSuite
           org.apache.comet.CometFuzzAggregateSuite
+          org.apache.comet.CometFuzzIcebergSuite
           org.apache.comet.CometFuzzMathSuite
           org.apache.comet.DataGeneratorSuite
       - name: "shuffle"
@@ -124,6 +125,7 @@ jobs:
           org.apache.spark.sql.comet.ParquetDatetimeRebaseV2Suite
           org.apache.spark.sql.comet.ParquetEncryptionITCase
           org.apache.comet.exec.CometNativeReaderSuite
+          org.apache.comet.CometIcebergNativeSuite
       - name: "exec"
         value: |
           org.apache.comet.exec.CometAggregateSuite
```

.github/workflows/pr_build_macos.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -68,6 +68,7 @@ jobs:
         value: |
           org.apache.comet.CometFuzzTestSuite
           org.apache.comet.CometFuzzAggregateSuite
+          org.apache.comet.CometFuzzIcebergSuite
           org.apache.comet.CometFuzzMathSuite
           org.apache.comet.DataGeneratorSuite
       - name: "shuffle"
@@ -89,6 +90,7 @@ jobs:
           org.apache.spark.sql.comet.ParquetDatetimeRebaseV2Suite
           org.apache.spark.sql.comet.ParquetEncryptionITCase
           org.apache.comet.exec.CometNativeReaderSuite
+          org.apache.comet.CometIcebergNativeSuite
       - name: "exec"
         value: |
           org.apache.comet.exec.CometAggregateSuite
```
Lines changed: 49 additions & 0 deletions
```diff
@@ -0,0 +1,49 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Check Markdown Formatting
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+on:
+  pull_request:
+    paths:
+      - '**.md'
+
+jobs:
+  prettier-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '24'
+
+      - name: Install prettier
+        run: npm install -g prettier
+
+      - name: Check markdown formatting
+        run: |
+          # if you encounter error, run prettier locally and commit changes using instructions at:
+          #
+          # https://datafusion.apache.org/comet/contributor-guide/development.html#how-to-format-md-document
+          #
+          prettier --check "**/*.md"
```
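The workflow above triggers only on pull requests that touch markdown files (the `'**.md'` path filter). A rough sketch of that filter's behavior using a hypothetical `matches_md_filter` helper — GitHub's actual glob matching has more rules; this only approximates the `.md` suffix check:

```shell
# Hypothetical approximation of the workflow's '**.md' path filter:
# the check runs only if some changed file path ends in .md.
matches_md_filter() {
  case "$1" in
    *.md) return 0 ;;
    *)    return 1 ;;
  esac
}
```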

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,4 +19,4 @@ under the License.

 # Apache DataFusion Comet Changelog

-Comprehensive changelogs for each release are available [here](dev/changelog).
+Comprehensive changelogs for each release are available [here](dev/changelog).
```

As in the pull-request template, the old and new lines are textually identical; the change appears to be trailing whitespace only.

README.md

Lines changed: 6 additions & 6 deletions
```diff
@@ -34,7 +34,7 @@ Apache DataFusion Comet is a high-performance accelerator for Apache Spark, buil
 performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
 Spark ecosystem without requiring any code changes.

-Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark.
+Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark.

 [Apache DataFusion]: https://datafusion.apache.org

@@ -44,7 +44,7 @@ Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark.

 Comet delivers a performance speedup for many queries, enabling faster data processing and shorter time-to-insights.

-The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format
+The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format
 using a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html)
 for details of the environment used for these benchmarks.

@@ -66,16 +66,16 @@ The following charts shows how much Comet currently accelerates each query from

 ![](docs/source/_static/images/benchmark-results/0.11.0/tpch_queries_speedup_abs.png)

-These benchmarks can be reproduced in any environment using the documentation in the
-[Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). We encourage
+These benchmarks can be reproduced in any environment using the documentation in the
+[Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). We encourage
 you to run your own benchmarks.

 Results for our benchmark derived from TPC-DS are available in the [benchmarking guide](https://datafusion.apache.org/comet/contributor-guide/benchmark-results/tpc-ds.html).

 ## Use Commodity Hardware

 Comet leverages commodity hardware, eliminating the need for costly hardware upgrades or
-specialized hardware accelerators, such as GPUs or FPGA. By maximizing the utilization of commodity hardware, Comet
+specialized hardware accelerators, such as GPUs or FPGA. By maximizing the utilization of commodity hardware, Comet
 ensures cost-effectiveness and scalability for your Spark deployments.

 ## Spark Compatibility
@@ -102,7 +102,7 @@ To get started with Apache DataFusion Comet, follow the
 [DataFusion Slack and Discord channels](https://datafusion.apache.org/contributor-guide/communication.html) to connect
 with other users, ask questions, and share your experiences with Comet.

-Follow [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/user-guide/overview.html) to get more detailed information
+Follow [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/user-guide/overview.html) to get more detailed information

 ## Contributing
```
benchmarks/README.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -102,4 +102,3 @@ $SPARK_HOME/bin/spark-submit \
   --queries /opt/datafusion-benchmarks/tpcds/queries-spark \
   --iterations 1
 ```
-
````
105-
