Replies: 1 comment 1 reply
Hi NoahKus, it's common to see results like this. It usually isn't because Spark is "faster," but because there is hidden overhead when moving data between Spark and Comet. Here are the three main reasons this happens:

1. The "moving" tax: Every time Comet hands data back to Spark (or the other way around), it costs time to convert and copy that data. If your query plan has many fallback nodes, these copies can make Comet slower than staying in Spark for the whole query.
2. Small data batches: Native engines like DataFusion (Comet’s core) work best with large batches of data. If your Iceberg tables are producing very small batches of rows, Comet cannot run at full speed.
3. Cloud metadata: Since you are using AWS Glue, a lot of time is spent just locating the data in the cloud before the actual processing starts. Spark and Comet handle this metadata differently, which can hide the native speed gains.
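To get a rough feel for the first point, you can count how many operators in the physical plan stayed in Spark versus Comet. The snippet below is only a sketch in plain Python: `summarize_plan` is a hypothetical helper (not part of Comet or Spark), and it assumes that Comet operators appear in the plan text with a `Comet` name prefix and that transition nodes contain `ColumnarToRow` or `RowToColumnar` in their name. Check those assumptions against your own `df.explain()` output before trusting the numbers.

```python
import re
from collections import Counter

# Matches the first operator name on a plan line, skipping tree-drawing
# characters and codegen markers such as "+- " or "*(1) ".
OPERATOR = re.compile(r"^[\s:+\-|*()\d]*([A-Za-z]\w*)")


def summarize_plan(plan_text):
    """Count Comet operators, Spark operators, and row/columnar transitions.

    Hypothetical helper. Assumes Comet operators are named "Comet..." and
    transitions contain "ColumnarToRow" or "RowToColumnar".
    """
    counts = Counter()
    for line in plan_text.splitlines():
        match = OPERATOR.match(line)
        if match is None:
            # Header lines such as "== Physical Plan ==" and blank lines.
            continue
        name = match.group(1)
        if "ColumnarToRow" in name or "RowToColumnar" in name:
            counts["transition"] += 1
        elif name.startswith("Comet"):
            counts["comet"] += 1
        else:
            counts["spark"] += 1
    return counts


if __name__ == "__main__":
    # Made-up plan text for illustration; paste your own explain() output here.
    sample = """== Physical Plan ==
*(2) HashAggregate(keys=[id#1], functions=[count(1)])
+- Exchange hashpartitioning(id#1, 200)
   +- *(1) ColumnarToRow
      +- CometFilter [id#1], isnotnull(id#1)
         +- CometScan parquet [id#1]
"""
    print(dict(summarize_plan(sample)))
```

A high `spark` or `transition` count relative to `comet` suggests the query is paying the conversion cost described in point 1.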
I tried switching many different configs for Comet, such as native vs. non-native Iceberg, sparkToColumnar, etc.
Is this a known limitation? Would it help if I shared the detailed configs I used?