Conversation

@andygrove
Member

Summary

  • Add scripts to benchmark TPC-H queries against Iceberg tables using Comet's native iceberg-rust integration
  • create-iceberg-tpch.py: Convert Parquet TPC-H data to Iceberg tables
  • tpcbench-iceberg.py: Run TPC-H queries against Iceberg catalog tables
  • comet-tpch-iceberg.sh: Shell script to run the benchmark with Comet
  • Update README.md with Iceberg benchmarking documentation

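The conversion step can be sketched as SQL generation over the eight TPC-H tables. This is a hypothetical sketch only: the CTAS approach, the table list ordering, and the per-table directory layout are assumptions, and create-iceberg-tpch.py may do this differently (for example via the DataFrame API).

```python
# Hypothetical sketch of the Parquet-to-Iceberg conversion step.
# The CTAS statements and path layout are assumptions, not taken from
# create-iceberg-tpch.py itself.
TPCH_TABLES = ["customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier"]

def conversion_sql(parquet_path, catalog, database):
    """Yield one CREATE TABLE ... AS SELECT statement per TPC-H table."""
    for table in TPCH_TABLES:
        yield (
            f"CREATE TABLE {catalog}.{database}.{table} USING iceberg AS "
            f"SELECT * FROM parquet.`{parquet_path}/{table}`"
        )
```

Each generated statement would then be executed with `spark.sql(...)` against a session whose Iceberg catalog is already configured.
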
Test plan

  • Run create-iceberg-tpch.py to create Iceberg tables from Parquet data
  • Run comet-tpch-iceberg.sh and verify CometIcebergNativeScanExec appears in plans
  • Compare benchmark results between Parquet and Iceberg formats

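Checking that the native scan actually kicks in can be automated by scanning the physical plan text for the operator named in the test plan above. The helper below is a hypothetical sketch, not part of the PR:

```python
import re

def comet_operators(explain_output):
    """Collect Comet physical operator names (e.g. CometIcebergNativeScanExec)
    from EXPLAIN / df.explain() output."""
    return re.findall(r"Comet\w*Exec", explain_output)

def uses_native_iceberg_scan(explain_output):
    """True when Comet's native Iceberg scan operator appears in the plan."""
    return "CometIcebergNativeScanExec" in comet_operators(explain_output)
```
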
🤖 Generated with Claude Code

@andygrove changed the title from "[WIP] Add Iceberg TPC-H benchmarking scripts" to "chore: Add Iceberg TPC-H benchmarking scripts [WIP]" on Jan 27, 2026
andygrove and others added 3 commits January 27, 2026 13:32
Merge tpcbench-iceberg.py into tpcbench.py using mutually exclusive args:
- --data for Parquet files
- --catalog/--database for Iceberg tables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
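
The mutually exclusive wiring described in the commit message can be sketched with argparse. The flag names come from the commit message above; everything else (help strings, the "tpch" default) is an assumption, not taken from the actual tpcbench.py:

```python
import argparse

def build_parser():
    # Sketch of the merged tpcbench.py argument wiring: the benchmark reads
    # either local Parquet files (--data) or Iceberg catalog tables
    # (--catalog/--database), never both.
    parser = argparse.ArgumentParser(description="TPC-H benchmark driver")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--data", help="path to Parquet TPC-H data")
    source.add_argument("--catalog", help="Iceberg catalog name")
    # The "tpch" default is an assumption for illustration.
    parser.add_argument("--database", default="tpch",
                        help="Iceberg database containing the TPC-H tables")
    return parser
```

With this wiring, argparse rejects --data combined with --catalog automatically and requires at least one data source.
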
@andygrove changed the title from "chore: Add Iceberg TPC-H benchmarking scripts [WIP]" to "chore: Add Iceberg TPC-H benchmarking scripts" on Jan 27, 2026
@andygrove andygrove marked this pull request as ready for review January 27, 2026 20:45
@andygrove
Member Author

@mbutrovich I can now run TPC-H w/ Iceberg native scan locally

Resolve conflict in tpcbench.py by combining:
- Upstream: --format and --options for multiple file formats
- Branch: --catalog and --database for Iceberg tables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

```shell
$SPARK_HOME/bin/spark-submit \
  --master $SPARK_MASTER \
  --jars $ICEBERG_JAR \
```
Contributor

It should work either way, but this doesn't match the usage in create-iceberg-tpch.py: there we use --packages, while here we're defining the jar directly. Both should work, I think, but it would be best to be consistent.

```shell
  --conf spark.cores.max=8 \
  --conf spark.executor.memory=16g \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
```
@mbutrovich (Contributor) commented on Jan 27, 2026

This hardcodes the catalog. Above you have ICEBERG_CATALOG=${ICEBERG_CATALOG:-local}. I'd be consistent.

```shell
  --conf spark.sql.catalog.local.warehouse=$ICEBERG_WAREHOUSE \
  create-iceberg-tpch.py \
  --parquet-path $TPCH_DATA \
  --catalog local \
```
Contributor

Same hardcoded catalog.

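The suggested fix, deriving every spark.sql.catalog.* key from one catalog name instead of hardcoding "local", can be sketched in a Python driver. The helper name is hypothetical; the three conf values mirror the spark-submit snippet in the review above, and the default mirrors the shell fallback ICEBERG_CATALOG=${ICEBERG_CATALOG:-local}.

```python
import os

def iceberg_catalog_confs(warehouse, catalog=None):
    # Build the spark.sql.catalog.* settings from a single catalog name so
    # nothing is hardcoded. Falls back to the ICEBERG_CATALOG environment
    # variable, then to "local", matching the shell default.
    catalog = catalog or os.environ.get("ICEBERG_CATALOG", "local")
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }
```

Every --conf key is then derived from one variable, so renaming the catalog cannot leave a stale hardcoded "local" behind.
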
@codecov-commenter

codecov-commenter commented Jan 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.95%. Comparing base (f09f8af) to head (07e1fa3).
⚠️ Report is 904 commits behind head on main.

Additional details and impacted files
```
@@             Coverage Diff              @@
##               main    #3294      +/-   ##
============================================
+ Coverage     56.12%   59.95%   +3.82%     
- Complexity      976     1473     +497     
============================================
  Files           119      175      +56     
  Lines         11743    16167    +4424     
  Branches       2251     2682     +431     
============================================
+ Hits           6591     9693    +3102     
- Misses         4012     5126    +1114     
- Partials       1140     1348     +208     
```

- Use --packages instead of --jars for table creation to match
  create-iceberg-tpch.py usage
- Use $ICEBERG_CATALOG variable instead of hardcoding 'local' in
  spark.sql.catalog config to be consistent with comet-tpch-iceberg.sh
- Clarify that JAR download is only needed for benchmark execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
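
The --packages vs --jars distinction from the review can be captured in one place when building the spark-submit command line. A sketch under stated assumptions: the helper name, the Maven coordinate, and the Iceberg version below are illustrative only, not taken from the PR.

```python
def iceberg_dependency_args(jar_path=None,
                            package="org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1"):
    # --packages lets Spark resolve the Iceberg runtime from Maven, while
    # --jars points at a local file; the commit above standardizes on
    # --packages for table creation. The default coordinate here is an
    # assumption and should be pinned to whatever version the benchmark
    # actually targets.
    if jar_path:
        return ["--jars", jar_path]
    return ["--packages", package]
```

Routing both scripts through one helper like this keeps the two invocations consistent by construction.
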
@andygrove
Member Author

Thanks @mbutrovich! I pushed a commit to address the feedback.
