
feat: merge-insert jni support#4172

Closed
kaori-seasons wants to merge 19 commits into lance-format:main from kaori-seasons:issue-4050

@kaori-seasons commented Jul 8, 2025

To support the merge insert API for Flink CDC streaming writes, Lance needs the corresponding Java (JNI) bindings.

MergeInsertBuilder

The MergeInsertBuilder class provides a fluent API for building merge insert operations, which merge new data into an existing Lance dataset by matching rows on a key column (here `id`).

Usage Example

import com.lancedb.lance.MergeInsertBuilder;
import com.lancedb.lance.MergeInsertResult;
import org.apache.arrow.vector.VectorSchemaRoot;

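// Create a merge insert builder keyed on the "id" column.
// `dataset` is an already-opened Lance Dataset and createNewData() stands for
// whatever produces the incoming rows; neither is defined in this snippet.
MergeInsertBuilder builder = MergeInsertBuilder.create(dataset, "id");

// Configure the operation
MergeInsertBuilder configured = builder
    .whenMatchedUpdateAll("source.value > target.value")  // update a matched row only if the new value is larger
    .whenNotMatchedInsertAll()                            // insert rows whose id is not yet in the dataset
    .whenNotMatchedBySourceDelete("target.id > 100")      // delete dataset rows missing from the new data, if id > 100
    .conflictRetries(5)                                   // retry up to 5 times on commit conflicts
    .retryTimeout(1000L);                                 // retry timeout, presumably in milliseconds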

// Execute the merge insert
try (VectorSchemaRoot newData = createNewData()) {
    MergeInsertResult result = configured.execute(newData);

    System.out.println("Inserted: " + result.getNumInsertedRows());
    System.out.println("Updated: " + result.getNumUpdatedRows());
    System.out.println("Deleted: " + result.getNumDeletedRows());
}
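To make the three clauses in the example concrete, here is a plain-Java sketch of their row-level semantics on in-memory maps keyed by `id`. It does not use Lance or the JNI layer: `Row`, `MergeCounts`, and `MergeSemantics.merge` are hypothetical names, and the two conditions are hard-coded to mirror `source.value > target.value` and `target.id > 100`, whereas the real conditions are SQL expressions evaluated by the native engine.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical row: key column "id" plus one payload column "value".
record Row(long id, long value) {}

// Counts analogous to MergeInsertResult's getters (illustrative only).
record MergeCounts(long inserted, long updated, long deleted) {}

final class MergeSemantics {
    // Applies to `target`, in place, the equivalent of:
    //   whenMatchedUpdateAll("source.value > target.value")
    //   whenNotMatchedInsertAll()
    //   whenNotMatchedBySourceDelete("target.id > 100")
    static MergeCounts merge(Map<Long, Row> target, Map<Long, Row> source) {
        long inserted = 0, updated = 0, deleted = 0;

        // Pass 1: match each source row against the target by id.
        for (Row s : source.values()) {
            Row t = target.get(s.id());
            if (t == null) {
                target.put(s.id(), s);      // not matched: insert the source row
                inserted++;
            } else if (s.value() > t.value()) {
                target.put(s.id(), s);      // matched and condition true: replace all columns
                updated++;
            }                               // matched, condition false: keep the target row
        }

        // Pass 2: target rows with no source match are deleted when the condition holds.
        Iterator<Row> it = target.values().iterator();
        while (it.hasNext()) {
            Row t = it.next();
            if (!source.containsKey(t.id()) && t.id() > 100) {
                it.remove();
                deleted++;
            }
        }
        return new MergeCounts(inserted, updated, deleted);
    }
}
```

Rows inserted in the first pass come from the source, so the second pass never deletes them; the result therefore matches evaluating all clauses against the pre-merge dataset.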

Performance Considerations

  • Native memory is managed by the binding; callers do not free it manually
  • Arrow data is passed to Rust through the Arrow C stream interface (FFI)
  • Async operations are executed on a dedicated runtime
  • Native resources are released through AutoCloseable, so try-with-resources guarantees cleanup
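The AutoCloseable point follows the usual Java pattern for owning native resources: a wrapper holds an opaque handle and releases it exactly once in `close()`. The sketch below is a minimal, hypothetical version: `NativeHandle` and its `releaser` callback are not part of this PR and stand in for a JNI call that would free the Rust-side object.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongConsumer;

// Hypothetical owner of a native pointer. In a real binding, `releaser` would be
// a native method that drops the Rust object behind `handle`.
final class NativeHandle implements AutoCloseable {
    private final long handle;
    private final LongConsumer releaser;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    NativeHandle(long handle, LongConsumer releaser) {
        this.handle = handle;
        this.releaser = releaser;
    }

    // Refuse to hand out a handle whose native object has already been freed.
    long get() {
        if (closed.get()) {
            throw new IllegalStateException("native handle already closed");
        }
        return handle;
    }

    @Override
    public void close() {
        // compareAndSet makes close() idempotent, so the native object is freed at
        // most once even if close() is called again or from another thread.
        if (closed.compareAndSet(false, true)) {
            releaser.accept(handle);
        }
    }
}
```

Used with try-with-resources, the release runs even if the body throws, and a later explicit `close()` is a harmless no-op.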

@github-actions github-actions bot added the java label Jul 8, 2025
@github-actions
Contributor

github-actions bot commented Jul 8, 2025

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@jackye1995 jackye1995 self-requested a review July 8, 2025 04:20
@kaori-seasons kaori-seasons changed the title 【ISSUE#4050】Merge-Insert JNI Support feat: Merge-Insert JNI Support Jul 8, 2025
@github-actions github-actions bot added the enhancement New feature or request label Jul 8, 2025
@kaori-seasons kaori-seasons changed the title feat: Merge-Insert JNI Support feat: merge-insert jni support Jul 8, 2025
@majin1102
Contributor

Thanks for this contribution.

The code brings in many unnecessary changes (maybe from some AI framework?) and the comments are in Chinese. Please check it.

* @param schema dataset schema
* @param params write params
* @return Dataset
* @return Datase
Contributor

How was this change introduced?

}

/**
* Create a merge insert operation builder (single-column version)
Contributor

Use English for comments?

* @param datasetUri the dataset uri
* @param allocator the buffer allocator
* @param root the vector schema root
* @param root the vector schema roo
Contributor

How did this change happen?

@kaori-seasons
Author

@majin1102 Thanks for your review. This happened because I used an MCP server I wrote myself, which connects DeepWiki and Claude. I will try to fix it this weekend.


The `MergeInsertBuilder` class provides a fluent API for building merge insert operations, which allow you to merge new data with existing data in a Lance dataset. This is similar to SQL's MERGE statement.

#### Basic Usage
Contributor

Documentation should in general go on the website, not in a separate README.

}
```

#### API Reference
Contributor

this looks very AI lol, I think we should just keep the basic usage.

}

#[no_mangle]
pub extern "system" fn Java_com_lancedb_lance_MergeInsertBuilder_createNativeBuilder(
Contributor

I think it is better to keep these builder and boilerplate classes in Java and pass the finished configuration to Rust when building the actual job, so we don't need this back-and-forth conversion.
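A minimal sketch of that suggestion, assuming nothing about the API that was eventually merged: the fluent builder is plain Java and produces one immutable value, which the native layer would receive in a single call instead of a Rust builder being mutated through repeated JNI round trips. `MergeInsertParams`, its fields, and the placeholder defaults are hypothetical; an unset clause is represented as `null`.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical immutable configuration, handed to native code in a single call.
// A null condition means "this clause was not configured".
record MergeInsertParams(
        List<String> onColumns,
        String whenMatchedUpdateCondition,        // null: matched rows are left unchanged
        boolean insertWhenNotMatched,
        String deleteNotMatchedBySourceCondition, // null: no rows are deleted
        int conflictRetries,
        long retryTimeoutMillis) {

    static Builder builder(List<String> onColumns) {
        return new Builder(onColumns);
    }

    // Plain Java fluent builder: no JNI calls happen while configuring.
    static final class Builder {
        private final List<String> onColumns;
        private String whenMatchedUpdateCondition;
        private boolean insertWhenNotMatched;
        private String deleteNotMatchedBySourceCondition;
        private int conflictRetries;      // placeholder default, not taken from Lance
        private long retryTimeoutMillis;  // placeholder default, not taken from Lance

        private Builder(List<String> onColumns) {
            this.onColumns = List.copyOf(Objects.requireNonNull(onColumns, "onColumns"));
        }

        Builder whenMatchedUpdateAll(String condition) {
            this.whenMatchedUpdateCondition = Objects.requireNonNull(condition, "condition");
            return this;
        }

        Builder whenNotMatchedInsertAll() {
            this.insertWhenNotMatched = true;
            return this;
        }

        Builder whenNotMatchedBySourceDelete(String condition) {
            this.deleteNotMatchedBySourceCondition = Objects.requireNonNull(condition, "condition");
            return this;
        }

        Builder conflictRetries(int retries) {
            if (retries < 0) throw new IllegalArgumentException("retries must be >= 0");
            this.conflictRetries = retries;
            return this;
        }

        Builder retryTimeout(long millis) {
            if (millis < 0) throw new IllegalArgumentException("timeout must be >= 0");
            this.retryTimeoutMillis = millis;
            return this;
        }

        // The only object that would cross the JNI boundary.
        MergeInsertParams build() {
            return new MergeInsertParams(onColumns, whenMatchedUpdateCondition, insertWhenNotMatched,
                    deleteNotMatchedBySourceCondition, conflictRetries, retryTimeoutMillis);
        }
    }
}
```

Because the record is immutable, the native side can read its fields once (for example with JNI field or accessor lookups) and build the Rust merge insert job in one step.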

) -> jobject {
let builder = unsafe { Box::from_raw(builder_handle as *mut LanceMergeInsertBuilder) };

// Create an ArrowArrayStream from the memory address
Contributor

nit: please remove Chinese comments

// limitations under the License.

#[cfg(test)]
mod tests {
Contributor

Tests should go in the same file as the relevant module; see the other Rust tests for reference.

@@ -0,0 +1,791 @@
// Copyright 2024 Lance Developers.
Contributor

nit: wrong license header

@jackye1995
Contributor

Thank you for working on this! I added some basic comments. It looks like each file contains some unintentional changes that need to be removed. Let me know when this is ready for another review!

}

#[no_mangle]
pub extern "system" fn Java_com_lancedb_lance_MergeInsertBuilder_executeNative(
Contributor

We should try to consolidate this with the existing code in BlockingDataset and make merge_insert another of its methods.

undertaker86001 added 3 commits July 31, 2025 19:44
# Conflicts:
#	java/core/lance-jni/src/blocking_dataset.rs
#	java/core/src/main/java/com/lancedb/lance/Dataset.java
#	java/core/src/main/java/com/lancedb/lance/ReadOptions.java
Contributor

@majin1102 left a comment

Thanks for following up. I left some suggestions.

PTAL

}

/**
* Execute the merge insert operation (using BlockingDataset)
Contributor

English please

private final BufferAllocator allocator;

// Configuration options
private String whenMatchedConfig = "do_nothing";
Contributor

Would just using null be a little better?

private long timeoutMillis = 0;

// Native method declarations
private static native Object executeWithConfigNative(
Contributor

@majin1102 Jul 31, 2025

  1. This datasetHandle looks a little weird outside of Dataset. I think just passing the dataset works as well. What do you think?
  2. Why return Object? I think just returning the target class works.
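A sketch of the signature shape described in these two points, with the native call hidden behind an interface so it runs without the JNI library. `SketchDataset`, `NativeOps`, `MergeStats`, and `mergeInsert` are hypothetical stand-ins rather than Lance classes: the handle never leaves the dataset object, and the caller receives a typed result instead of an `Object` that must be cast.

```java
// Typed result. In a JNI binding the native side would construct this class
// directly (for example with the JNI NewObject function) instead of returning Object.
record MergeStats(long numInsertedRows, long numUpdatedRows, long numDeletedRows) {}

// Hypothetical seam standing in for the native method, so the sketch runs without JNI.
interface NativeOps {
    MergeStats mergeInsert(long datasetHandle, String onColumn, long arrowStreamAddress);
}

final class SketchDataset {
    private final long nativeHandle; // private: callers never see or pass the raw handle
    private final NativeOps ops;

    SketchDataset(long nativeHandle, NativeOps ops) {
        this.nativeHandle = nativeHandle;
        this.ops = ops;
    }

    // Merge insert as an instance method: the dataset supplies its own handle.
    MergeStats mergeInsert(String onColumn, long arrowStreamAddress) {
        return ops.mergeInsert(nativeHandle, onColumn, arrowStreamAddress);
    }
}
```

Keeping the handle private to the dataset class also means only one place has to check that the dataset is still open before calling into native code.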

);

static {
System.loadLibrary("lance_jni");
Contributor

I suggest putting the static block at the top.


@Override
public void close() {
// No cleanup needed for the new design
Contributor

If it is not necessary, why not delete it and remove the interface?

@jackye1995
Contributor

Another contributor has implemented this feature in #4685

@jackye1995 jackye1995 closed this Sep 12, 2025

Labels

enhancement New feature or request java

4 participants