feat: support and optimize Spark MERGE INTO #172
+360
−2
Background
Currently, when Spark writes data into a Lance table via the MERGE INTO syntax, it shuffles the data with segment_id as the shuffle key and writes the shuffled partitions concurrently.
During the join between the source data and the Lance target table, the joined rows are split into three categories: insert rows, update rows, and delete rows. For insert rows, the segment_id field in the intermediate join result is null, so the shuffle groups all of them under the null key. Every insert row therefore lands in the same write task, causing data skew.
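For illustration, a minimal MERGE INTO statement of the kind this path handles might look like the sketch below (the table and column names are hypothetical, not from this PR). Source rows that match no target row fall into the INSERT branch, and these are exactly the rows that carry a null segment_id in the join output.

```java
import org.apache.spark.sql.SparkSession;

public class MergeIntoExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("lance-merge-into-example")
        .getOrCreate();

    // Hypothetical tables: `target` is a Lance table whose existing rows carry a
    // segment_id; `source` is the incoming batch to merge.
    spark.sql(
        "MERGE INTO target t "
        + "USING source s "
        + "ON t.id = s.id "
        // Matched rows keep the target's segment_id, so the shuffle clusters them per segment.
        + "WHEN MATCHED THEN UPDATE SET * "
        // Unmatched source rows have no target segment_id, so their shuffle key is null.
        + "WHEN NOT MATCHED THEN INSERT *");

    spark.stop();
  }
}
```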
Solution
Modify the requiredDistribution() method and reconstruct the expression passed to Distributions.clustered(new NamedReference[] {segmentId}) so that a random value is used as the shuffle key whenever segment_id is null. Insert rows are then spread evenly across write tasks, while update and delete rows still cluster by their original segment_id.
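A minimal sketch of that distribution change, assuming the Spark DataSource V2 Expressions/Distributions API and a segment_id column; the PR may build the null-safe clustering expression differently (for example at the Catalyst level), so treat the function names below as assumptions rather than the actual implementation:

```java
import org.apache.spark.sql.connector.distributions.Distribution;
import org.apache.spark.sql.connector.distributions.Distributions;
import org.apache.spark.sql.connector.expressions.Expression;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.NamedReference;

final class MergeWriteDistribution {

  // Before: clustering purely on segment_id sends every insert row
  // (segment_id IS NULL) to the same write task.
  static Distribution before() {
    NamedReference segmentId = Expressions.column("segment_id");
    return Distributions.clustered(new NamedReference[] {segmentId});
  }

  // After (sketch): fall back to a random value when segment_id is null so
  // insert rows spread across write tasks. Whether Spark can resolve these
  // function names when planning the write is an assumption of this sketch.
  static Distribution after() {
    NamedReference segmentId = Expressions.column("segment_id");
    Expression randomFallback = Expressions.apply("rand");
    Expression clustering = Expressions.apply("coalesce", segmentId, randomFallback);
    return Distributions.clustered(new Expression[] {clustering});
  }
}
```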
Before
(image)

After
(image)