Memory over-reservation when running native shuffle write #887
I found that the memory reserved by the native shuffle writer is based on guesses of data sizes and is proportional to the number of partitions. For TPC-H query 10, the schema of the shuffled record batches is:

The estimated size of each record batch is 4588544 bytes, given the batch size of 8192. The shuffle repartitioner reserves one batch for each partition, so with 200 partitions (the default value of `spark.sql.shuffle.partitions`) each shuffle writer reserves about 0.9 GB. We have multiple cores on each executor, and each core runs a shuffle writer that reserves its own memory, so on a 6-core worker instance the amount of reserved memory will be about 5.5 GB. If we tune the number of shuffle partitions up for large ETL jobs, the native shuffle writer has to reserve even more memory: for partition number = 2000, the total reserved memory will be about 55 GB. This is usually more than the total memory of the worker instance, so even if we allocate most of the worker instance's memory to Comet, we'll still run into problems. Maybe a better approach is to always reserve
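The reservation described above grows linearly in both the partition count and the number of cores. A back-of-the-envelope sketch (the batch-size estimate comes from the issue; the function name is illustrative, not Comet's actual code):

```python
def reserved_bytes(batch_size_bytes: int, num_partitions: int, cores: int) -> int:
    """Each core's shuffle writer reserves one batch per output partition."""
    return batch_size_bytes * num_partitions * cores

BATCH = 4_588_544  # estimated size of one 8192-row batch for TPC-H Q10

# Default 200 shuffle partitions on a 6-core worker: ~5.5e9 bytes
print(reserved_bytes(BATCH, 200, 6))

# Tuned to 2000 partitions: ~5.5e10 bytes, more than a typical worker has
print(reserved_bytes(BATCH, 2000, 6))
```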
I'm working on this.
That's great to hear! A few days ago, I tried to tackle this issue myself; you can see my approach here. However, I suspect my solution might not be the most efficient, as it changes the initial size of the array builders, which could incur the cost of reallocations. I'm really looking forward to seeing a better solution!
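The tradeoff this comment alludes to can be sketched generically: starting a growable buffer small saves up-front memory but pays for repeated reallocations as rows arrive. The helper below is illustrative only and models a simple capacity-doubling buffer, not Comet's actual array builders:

```python
def realloc_count(initial_capacity: int, final_len: int) -> int:
    """Count the capacity doublings a growable buffer performs to hold final_len items."""
    cap, reallocs = initial_capacity, 0
    while cap < final_len:
        cap *= 2
        reallocs += 1
    return reallocs

# Pre-sizing to the full batch size avoids reallocations entirely...
print(realloc_count(8192, 8192))  # 0
# ...while a small initial capacity trades memory for repeated copy work.
print(realloc_count(8, 8192))     # 10
```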
Describe the bug
We've seen this exception when running queries with `spark.comet.exec.shuffle.mode=native`:

This happens when running TPC-H Query 10 with scale factor = 1. The memory allocated for Comet is quite small, but it should not prevent the query from finishing.
Steps to reproduce
Run TPC-H query 10 on a Spark cluster. The detailed environment and Spark configurations are listed under Additional context.
Expected behavior
All TPC-H queries should finish successfully.
Additional context
The problem was reproduced on a self-deployed K8S Spark cluster on AWS.
Here are the relevant Spark configurations:
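The configuration list itself did not survive extraction. As a hedged illustration only, the setting named in the bug description, together with values matching the 6-core / 200-partition scenario discussed above (these sizing values are assumptions, not the reporter's actual configuration), might look like:

```properties
# From the bug description: route shuffle through Comet's native writer
spark.comet.exec.shuffle.mode=native
# Assumed prerequisite for the above (Comet must be enabled)
spark.comet.enabled=true
# Hypothetical sizing consistent with the scenario in this issue
spark.executor.cores=6
spark.sql.shuffle.partitions=200
```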