Description / Background
If two bulk import jobs run against the same Sleeper table while the table does not have enough leaf partitions for bulk import, a Spark driver starts for each job. Both drivers generate sketches for their input data and create a transaction to extend the partition tree, then both attempt to add their transaction to the state store. One wins the race, and the other fails because its transaction no longer validates against the newly updated partition tree.
The Spark driver that fails to update the partition tree then fails the whole bulk import, when it could simply load the updated partition tree and continue.
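For context, this is optimistic concurrency control at work: each transaction is validated against the partition tree it was planned from, so a concurrent commit invalidates the loser. A minimal sketch of the current failure mode, using hypothetical names (planSplits, addTransaction, the conflict exception) rather than Sleeper's actual API:

```java
// Illustrative sketch only: these names are hypothetical, not Sleeper's API.
// Both drivers plan their splits from the same initial partition tree.
PartitionTree tree = stateStore.getPartitionTree();
PartitionTransaction splitTransaction = planSplits(tree, sketches);

// The state store validates the transaction against its current tree.
// The second driver to commit planned against a stale tree, so validation
// fails and, today, the whole bulk import job is aborted.
stateStore.addTransaction(splitTransaction); // throws on conflict
```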
Steps to reproduce
- Create a Sleeper table with a single partition
- Set a minimum number of leaf partitions required for bulk import
- Send two bulk import jobs for the table at the same time
- One of them will probably fail
Expected behaviour
If a bulk import Spark driver fails to extend the partition tree, it should retry after loading the updated partition tree. If the updated partition tree has enough leaf partitions, it should carry on with the bulk import.
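A minimal sketch of that retry loop, assuming hypothetical names throughout (getPartitionTree, planSplits, addTransaction, StateStoreConflictException, and the minimum-leaf-partitions check are illustrative, not necessarily Sleeper's real API):

```java
// Sketch of the proposed retry, using hypothetical names throughout.
PartitionTree tree = stateStore.getPartitionTree();
while (tree.getLeafPartitions().size() < minLeafPartitions) {
    try {
        // Plan splits from the tree we just loaded and try to commit them.
        stateStore.addTransaction(planSplits(tree, sketches));
    } catch (StateStoreConflictException e) {
        // Another driver updated the tree first; fall through and reload.
    }
    // Reload the tree; if the other driver's splits already produced enough
    // leaf partitions, the loop exits and the import continues as normal.
    tree = stateStore.getPartitionTree();
}
runBulkImport(tree);
```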
Technical Notes / Implementation Details
This behaviour is in BulkImportDriver. We can probably reproduce it in a unit test in BulkImportDriverTest.
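A rough shape for such a test, again with hypothetical helpers (the in-memory state store, the hook that injects a concurrent split between planning and committing, and the property name are assumptions about the test harness, not existing code):

```java
@Test
void shouldRetryWhenAnotherJobExtendsPartitionTreeFirst() {
    // Hypothetical harness: in-memory state store starting with one root
    // partition, and a table configured to require two leaf partitions.
    tableProperties.setNumber(BULK_IMPORT_MIN_LEAF_PARTITION_COUNT, 2);

    // Simulate the race: another job commits a partition split between this
    // driver planning its transaction and attempting to commit it.
    stateStore.failNextTransactionThenApply(splitRootIntoTwoLeaves());

    driver.run(bulkImportJobWithInputData());

    // The driver should reload the tree, see enough leaf partitions, and
    // complete the import instead of failing the job.
    assertThat(stateStore.getFileReferences()).isNotEmpty();
}
```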