
When two bulk import jobs extend the partition tree at once, one fails #6471

@patchwork01

Description / Background

If two bulk import jobs run against the same Sleeper table while the table has fewer leaf partitions than bulk import requires, a Spark driver starts for each job. Both drivers generate sketches for their input data and create a transaction to extend the partition tree, and both attempt to add their transaction to the state store. One wins the race; the other fails because its transaction no longer validates against the newly updated partition tree.

The Spark driver that loses the race then fails outright, when it could simply load the updated partition tree and continue the bulk import.
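
The race can be modelled with a toy optimistic commit, where the partition tree is reduced to a version number and a split transaction only commits if the tree is unchanged since the driver planned against it. This is a minimal sketch for illustration; none of the names below are Sleeper's actual state store API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the race: both drivers plan against the same tree version,
// but only one optimistic commit can succeed.
public class PartitionTreeRaceDemo {

    // Stands in for the state store's view of the partition tree, reduced to
    // a version number that every successful split transaction bumps.
    static final AtomicInteger treeVersion = new AtomicInteger(0);

    // Optimistic commit: succeeds only if no other job extended the tree first.
    static boolean commitSplit(int plannedAgainstVersion) {
        return treeVersion.compareAndSet(plannedAgainstVersion, plannedAgainstVersion + 1);
    }

    public static void main(String[] args) {
        // Both drivers load the same tree and plan their splits against it.
        int seenByDriverA = treeVersion.get();
        int seenByDriverB = treeVersion.get();

        System.out.println("Driver A commit: " + commitSplit(seenByDriverA)); // true, wins the race
        System.out.println("Driver B commit: " + commitSplit(seenByDriverB)); // false, tree has moved on
    }
}
```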

Steps to reproduce

  1. Create a Sleeper table with a single partition
  2. Set a minimum required number of leaf partitions for bulk import
  3. Send two bulk import jobs for the table at the same time
  4. One of them will probably fail

Expected behaviour

If a bulk import Spark driver fails to extend the partition tree, it should retry after loading the updated partition tree. If the updated partition tree has enough leaf partitions, it should carry on with the bulk import.
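
Extending the toy model above, the proposed behaviour is a retry loop: on a failed commit the driver reloads the tree and re-checks whether it still needs to split, instead of failing the whole import. The leaf count and the doubling split here are illustrative only, not how Sleeper actually splits partitions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the proposed fix: a losing driver reloads the latest tree and
// retries, exiting as soon as the table has enough leaf partitions.
public class PartitionTreeRetryDemo {

    static final AtomicInteger leafPartitions = new AtomicInteger(1);

    // Split until the table has enough leaf partitions, retrying on conflict.
    static void ensureEnoughLeafPartitions(int minLeaves) {
        int seen = leafPartitions.get();
        while (seen < minLeaves) {
            // Plan a split against the tree we last saw, then try to commit it.
            if (leafPartitions.compareAndSet(seen, seen * 2)) {
                seen = seen * 2;
            } else {
                // Another bulk import job extended the tree first. Reload the
                // updated tree and loop: if it now has enough leaf partitions,
                // the loop exits and the bulk import carries on.
                seen = leafPartitions.get();
            }
        }
    }

    public static void main(String[] args) {
        ensureEnoughLeafPartitions(8);
        System.out.println("Leaf partitions: " + leafPartitions.get());
    }
}
```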

Technical Notes / Implementation Details

This behaviour is in BulkImportDriver. We can probably reproduce it in a unit test in BulkImportDriverTest.
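
A unit test along these lines might drive the retry path with an in-memory state store. The names BulkImportDriver and BulkImportDriverTest come from this issue; everything else below (the in-memory store, the commit hook, the helper methods) is a hypothetical outline, not Sleeper's actual test harness.

```java
// Hypothetical outline only: InMemoryStateStore, beforeNextCommit and the
// helper methods are assumptions sketched for this issue, not Sleeper's API.
@Test
void shouldContinueBulkImportWhenAnotherJobExtendsPartitionTreeFirst() {
    // Given a table with a single partition and a minimum of 2 leaf partitions
    InMemoryStateStore stateStore = stateStoreWithSinglePartition();
    // And another job extends the partition tree between this driver planning
    // its split transaction and committing it
    stateStore.beforeNextCommit(() -> extendTreeAsOtherJob(stateStore));

    // When the driver runs a bulk import
    new BulkImportDriver(stateStore, minLeafPartitions(2)).run(bulkImportJob());

    // Then it retried against the updated tree rather than failing
    assertThat(stateStore.loadPartitionTree().leafPartitionCount())
            .isGreaterThanOrEqualTo(2);
}
```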
