I tried benchmarking TPC-DS for Spark vs. DataFusion Comet on AWS Glue Catalog Iceberg tables, and Spark was faster. #3199

NoahKus · 2026-01-15T22:53:21Z


I tried switching many different Comet configs, such as native vs. non-native Iceberg scans and sparkToColumnar.

Is this a known limitation? Would it help if I shared the detailed configs I used?

vigneshsiva11 · 2026-01-30T17:59:47Z


Hi NoahKus, results like this are common. They usually don't mean Spark is inherently 'faster'; they come from hidden overhead when data moves between Spark and Comet.

Here are the 3 main reasons this happens:

The 'Moving' Tax: Every time data passes from Comet back to Spark (or vice versa), it has to be converted between Comet's columnar format and Spark's row format, and copied. If your query plan has many fallback nodes (operators Comet can't run natively), these conversions can make Comet slower than staying in Spark the whole time.
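
One way to see how much of this tax a query pays is to count the transition operators in its physical plan. The Python sketch below is a hypothetical helper, not part of Comet: it parses the text printed by `df.explain()` and assumes that native operators carry a `Comet` prefix and that row/columnar transitions appear under names like `ColumnarToRow` or `CometColumnarToRow`. Exact operator names differ across Spark and Comet versions, so adjust the `TRANSITIONS` set to match your own plans.

```python
import re

# Assumed transition operator names; these vary by Spark/Comet version.
TRANSITIONS = {
    "ColumnarToRow",
    "CometColumnarToRow",
    "RowToColumnar",
    "CometSparkRowToColumnar",
}

# Tree-drawing characters (":", "+-", spaces) and codegen markers like "*(1) ".
_PREFIX = re.compile(r"^[\s:+\-]*(\*\(\d+\)\s*)?")
# The operator name is the first identifier after the prefix.
_NAME = re.compile(r"[A-Za-z_]\w*")


def operator_name(line: str) -> str:
    """Return the operator name at the start of one physical-plan line."""
    rest = _PREFIX.sub("", line, count=1)
    match = _NAME.match(rest)
    return match.group(0) if match else ""


def summarize_plan(plan_text: str) -> dict:
    """Count native (Comet*) operators, JVM operators, and transitions."""
    summary = {"comet": 0, "spark": 0, "transitions": 0}
    for line in plan_text.splitlines():
        name = operator_name(line)
        if not name:
            continue  # blank lines and "== Physical Plan ==" headers
        if name in TRANSITIONS:
            summary["transitions"] += 1
        elif name.startswith("Comet"):
            summary["comet"] += 1
        else:
            summary["spark"] += 1
    return summary
```

Pasting the plan of a slow TPC-DS query into `summarize_plan` gives a rough picture: many transitions relative to Comet operators suggests the query keeps switching engines. Setting `spark.comet.explainFallback.enabled=true` also makes Comet log why a stage could not run natively.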

Small Data Batches: Native engines like DataFusion (Comet's core) work best on large batches of rows, because fixed per-batch costs get spread over more rows. If your Iceberg scans produce very small batches, Comet can't reach its full speed.
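
A toy cost model makes the batch-size effect concrete: if each batch carries a fixed overhead, shrinking the batch multiplies the number of batches, and that overhead with it. The function and its cost parameters below are illustrative assumptions, not measurements of Comet or DataFusion.

```python
import math


def estimated_scan_cost(rows: int, batch_size: int,
                        per_batch_overhead: float, per_row_cost: float) -> float:
    """Toy model: a fixed cost per batch plus a linear cost per row.

    Costs are in arbitrary units; the parameters are hypothetical.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = math.ceil(rows / batch_size)
    return batches * per_batch_overhead + rows * per_row_cost


# Same 1,000,000 rows and hypothetical costs, two different batch sizes.
print(estimated_scan_cost(1_000_000, 8192, 100.0, 1.0))  # 1012300.0
print(estimated_scan_cost(1_000_000, 128, 100.0, 1.0))   # 1781300.0
```

Under these made-up costs, 128-row batches cost roughly 75% more than 8192-row batches for identical work. Comet exposes the maximum number of rows per batch as `spark.comet.batchSize`.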

Cloud Metadata: Since you are using AWS Glue, a lot of time is spent just locating the data (catalog lookups and reading Iceberg metadata files) before any actual processing starts. Spark and Comet handle this metadata differently, and this fixed planning cost can mask the native execution gains.
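
The metadata effect is essentially Amdahl's law: a fixed planning cost that neither engine speeds up caps the end-to-end gain. For simplicity, the sketch assumes planning and metadata time are identical for Spark and Comet and that only execution gets faster; the timings in the example are hypothetical.

```python
def end_to_end_speedup(planning_s: float, exec_s: float, exec_speedup: float) -> float:
    """Overall speedup when only the execution phase is accelerated.

    Assumes planning/metadata time is the same for both engines.
    """
    if exec_speedup <= 0:
        raise ValueError("exec_speedup must be positive")
    baseline = planning_s + exec_s
    accelerated = planning_s + exec_s / exec_speedup
    return baseline / accelerated


# Hypothetical query: 30 s of planning/metadata work and 30 s of execution.
# Making execution 2x faster gives only about 1.33x end to end.
print(round(end_to_end_speedup(30.0, 30.0, 2.0), 2))  # 1.33
```

So if a query spends half its wall time on Glue and Iceberg metadata work, doubling execution speed makes it only about a third faster overall.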


NoahKus (Author) · Jan 30, 2026

Are there particular configs I should focus on to make Comet perform well on Iceberg?
https://datafusion.apache.org/comet/user-guide/0.13/configs.html
I can't tell which ones make the greatest difference, and flipping configs on and off while tuning batch sizes and thread counts gets complicated quickly.
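
When the number of combinations explodes, one common way to keep it manageable is a one-factor-at-a-time sweep: start from a fixed baseline and change exactly one setting per run, so each timing difference can be attributed to that setting. The Python sketch below only builds the run list and the matching `spark-submit --conf` arguments. The keys are settings from the Comet and Spark configuration docs (worth double-checking against the 0.13 page linked above), but the values are placeholders, not tuned recommendations.

```python
from typing import Dict, List, Tuple

# Baseline settings; values are illustrative placeholders.
BASELINE: Dict[str, str] = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
    "spark.comet.explainFallback.enabled": "true",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "4g",
}

# Candidate values to try, one setting at a time (hypothetical choices).
CANDIDATES: Dict[str, List[str]] = {
    "spark.comet.batchSize": ["4096", "8192", "16384"],
    "spark.comet.sparkToColumnar.enabled": ["false", "true"],
}


def one_at_a_time(baseline: Dict[str, str],
                  candidates: Dict[str, List[str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (label, config) pairs; each non-baseline run changes one key."""
    runs = [("baseline", dict(baseline))]
    for key, values in candidates.items():
        for value in values:
            if baseline.get(key) == value:
                continue  # identical to the baseline run, so skip it
            config = dict(baseline)
            config[key] = value
            runs.append((f"{key}={value}", config))
    return runs


def to_submit_args(config: Dict[str, str]) -> List[str]:
    """Render a config as a flat list of spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in config.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    for label, config in one_at_a_time(BASELINE, CANDIDATES):
        print(label, " ".join(to_submit_args(config)))
```

The trade-off is that a one-at-a-time sweep won't reveal interactions (for example, a batch size that only helps in combination with another setting), so once a single setting shows a clear effect, it may be worth re-running a few combinations around it.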
