feat: merge-insert jni support#4172
feat: merge-insert jni support#4172kaori-seasons wants to merge 19 commits intolance-format:mainfrom
Conversation
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
|
Thanks for this contribution The code has imported many unnecessary changes(maybe some AI framework?) and the comments are Chinese. |
| * @param schema dataset schema | ||
| * @param params write params | ||
| * @return Dataset | ||
| * @return Datase |
There was a problem hiding this comment.
How is this change imported?
| } | ||
|
|
||
| /** | ||
| * 创建 merge insert 操作构建器(单列版本) |
There was a problem hiding this comment.
Use English for comments?
| * @param datasetUri the dataset uri | ||
| * @param allocator the buffer allocator | ||
| * @param root the vector schema root | ||
| * @param root the vector schema roo |
There was a problem hiding this comment.
How did this change happen?
|
@majin1102 Thanks for your review. This is because I wrote an MCP server myself. It connects DeepWiki and Claude. I will try to fix it this weekend. |
|
|
||
| The `MergeInsertBuilder` class provides a fluent API for building merge insert operations, which allow you to merge new data with existing data in a Lance dataset. This is similar to SQL's MERGE statement. | ||
|
|
||
| #### Basic Usage |
There was a problem hiding this comment.
documentations should in general go to the website not in a separated README
| } | ||
| ``` | ||
|
|
||
| #### API Reference |
There was a problem hiding this comment.
this looks very AI lol, I think we should just keep the basic usage.
| } | ||
|
|
||
| #[no_mangle] | ||
| pub extern "system" fn Java_com_lancedb_lance_MergeInsertBuilder_createNativeBuilder( |
There was a problem hiding this comment.
I think it is better to have these builder and boilerplate classes in Java, and when building the actual job we can pass it to rust, so we don't need to do this back and forth conversion.
| ) -> jobject { | ||
| let builder = unsafe { Box::from_raw(builder_handle as *mut LanceMergeInsertBuilder) }; | ||
|
|
||
| // 从内存地址创建 ArrowArrayStream |
There was a problem hiding this comment.
nit: please remove Chinese comments
| // limitations under the License. | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
tests should go under the same file of relevant modules, see other rust tests for reference.
| @@ -0,0 +1,791 @@ | |||
| // Copyright 2024 Lance Developers. | |||
There was a problem hiding this comment.
nit: wrong license header
|
Thank you for working on this! Added some basic comments, and looks like there are some unintentional changes introduced in each file that need to be removed. Let me know when this is ready for another review! |
java/core/lance-jni/src/lib.rs
Outdated
| } | ||
|
|
||
| #[no_mangle] | ||
| pub extern "system" fn Java_com_lancedb_lance_MergeInsertBuilder_executeNative( |
There was a problem hiding this comment.
we should try to consolidate this with what is already there in BlockingDataset and make merge_insert another method of it.
# Conflicts: # java/core/lance-jni/src/blocking_dataset.rs # java/core/src/main/java/com/lancedb/lance/Dataset.java # java/core/src/main/java/com/lancedb/lance/ReadOptions.java
majin1102
left a comment
There was a problem hiding this comment.
Thanks for following up. Give some suggestions.
PTAL
| } | ||
|
|
||
| /** | ||
| * 执行 merge insert 操作(使用 BlockingDataset) |
| private final BufferAllocator allocator; | ||
|
|
||
| // Configuration options | ||
| private String whenMatchedConfig = "do_nothing"; |
There was a problem hiding this comment.
Is just null a little better?
| private long timeoutMillis = 0; | ||
|
|
||
| // Native method declarations | ||
| private static native Object executeWithConfigNative( |
There was a problem hiding this comment.
- This datasetHandle a little weird outside of Dataset. I think just passing the dataset works as well. What do you think
- Why return Object? I think just return the target class works
| ); | ||
|
|
||
| static { | ||
| System.loadLibrary("lance_jni"); |
There was a problem hiding this comment.
I suggest put the static block to the top
|
|
||
| @Override | ||
| public void close() { | ||
| // No cleanup needed for the new design |
There was a problem hiding this comment.
If not necessary, why not delete it and remove the interface
|
Another contributor has implemented this feature in #4685 |
In order to support the Merge insert API of flink cdc streaming writing, it is necessary to support the relevant Java JNI.
MergeInsertBuilder
The
MergeInsertBuilderclass provides a fluent API for building merge insert operations, which allow you to merge new data with existing data in a Lance dataset.Usage Example
Performance Considerations
AutoCloseable