Replies: 4 comments 1 reply
-
A C++ API would be great. It should be possible to build C bindings on top of the dataset interface, and then a C++ wrapper with better ergonomics on top of those bindings. The Arrow C data interface can be used for serialization and deserialization of data.
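To make the layering concrete, here is a minimal sketch of a C entry point that exports a schema through the Arrow C data interface, with a small C++ RAII wrapper on top. The `ArrowSchema` layout follows the Arrow specification; everything else (`lance_sketch_export_schema`, `SchemaHandle`, the single `id` column) is a hypothetical stand-in and not an existing Lance API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Struct layout as defined by the Arrow C data interface specification.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// ARROW_FLAG_NULLABLE from the same specification.
constexpr int64_t kArrowFlagNullable = 2;

// Counts release calls so the ownership handoff can be observed.
static int g_release_calls = 0;

// Producer-side release callback. The spec requires the producer to mark
// the struct as released by setting `release` to NULL.
static void sketch_release(ArrowSchema* schema) {
  assert(schema != nullptr);
  ++g_release_calls;
  schema->release = nullptr;
}

// HYPOTHETICAL C entry point standing in for a future Lance C binding.
// It exports a single nullable int64 column named "id".
extern "C" int lance_sketch_export_schema(ArrowSchema* out) {
  if (out == nullptr) return -1;
  out->format = "l";  // Arrow format string for int64
  out->name = "id";
  out->metadata = nullptr;
  out->flags = kArrowFlagNullable;
  out->n_children = 0;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = sketch_release;
  out->private_data = nullptr;
  return 0;
}

// HYPOTHETICAL C++ wrapper: owns the exported schema and releases it
// exactly once when it goes out of scope.
class SchemaHandle {
 public:
  SchemaHandle() {
    if (lance_sketch_export_schema(&schema_) != 0) {
      throw std::runtime_error("schema export failed");
    }
  }
  ~SchemaHandle() {
    if (schema_.release != nullptr) schema_.release(&schema_);
  }
  SchemaHandle(const SchemaHandle&) = delete;
  SchemaHandle& operator=(const SchemaHandle&) = delete;

  std::string name() const { return schema_.name ? schema_.name : ""; }
  std::string format() const { return schema_.format; }
  bool nullable() const { return (schema_.flags & kArrowFlagNullable) != 0; }

 private:
  ArrowSchema schema_{};
};
```

The C layer only deals in plain structs and error codes, so any language with a C FFI could consume it; the C++ layer adds the ergonomics (exceptions, automatic release).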
-
Distributed index creation APIs would also be possible and a nice addition. There is a prototype for this in Python with the [...].

The first step, k-means, is currently fastest when run on the GPU and tends to be relatively fast even with billions of rows (1-2 hours when I last profiled it). The second step, training the PQ subvector centroids, is also quite fast. The slowest step at scale today is quantizing (and potentially normalizing) all the vectors. That step is also embarrassingly parallel, which makes it a good candidate for distributed compute. The final step is to reorder the data and write it in k-means partition order. It is relatively fast compared to the first (k-means) and third (quantization) steps.
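As a rough illustration of why the quantization step distributes well, the sketch below assigns PQ codes one row shard at a time and concatenates the results. Each shard only needs the trained codebook and its own rows, so in a distributed setting each call could run on a separate worker. The names (`PqCodebook`, `quantize_shard`, `quantize_sharded`) and the codebook layout are assumptions made for this example, not Lance's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// HYPOTHETICAL codebook layout: centroids are stored as
// [num_subvectors][num_centroids][dim / num_subvectors], row-major.
struct PqCodebook {
  std::size_t dim;
  std::size_t num_subvectors;
  std::size_t num_centroids;  // must fit in one byte (<= 256)
  std::vector<float> centroids;
};

// Quantize rows [begin_row, end_row) of a row-major matrix. The function
// reads only the codebook and its own rows, so shards are independent.
std::vector<std::uint8_t> quantize_shard(const PqCodebook& cb,
                                         const std::vector<float>& vectors,
                                         std::size_t begin_row,
                                         std::size_t end_row) {
  assert(cb.dim % cb.num_subvectors == 0);
  assert(cb.num_centroids >= 1 && cb.num_centroids <= 256);
  const std::size_t dsub = cb.dim / cb.num_subvectors;
  std::vector<std::uint8_t> codes;
  codes.reserve((end_row - begin_row) * cb.num_subvectors);
  for (std::size_t row = begin_row; row < end_row; ++row) {
    for (std::size_t m = 0; m < cb.num_subvectors; ++m) {
      const float* sub = vectors.data() + row * cb.dim + m * dsub;
      float best = std::numeric_limits<float>::max();
      std::uint8_t best_id = 0;
      for (std::size_t k = 0; k < cb.num_centroids; ++k) {
        const float* c = cb.centroids.data() + (m * cb.num_centroids + k) * dsub;
        float dist = 0.0f;  // squared L2 distance to centroid k
        for (std::size_t d = 0; d < dsub; ++d) {
          const float diff = sub[d] - c[d];
          dist += diff * diff;
        }
        if (dist < best) {
          best = dist;
          best_id = static_cast<std::uint8_t>(k);
        }
      }
      codes.push_back(best_id);
    }
  }
  return codes;
}

// Split the rows into shards, quantize each one, and concatenate the codes
// in shard order. The loop body is what a remote worker would execute.
std::vector<std::uint8_t> quantize_sharded(const PqCodebook& cb,
                                           const std::vector<float>& vectors,
                                           std::size_t num_shards) {
  assert(vectors.size() % cb.dim == 0);
  const std::size_t num_rows = vectors.size() / cb.dim;
  num_shards = std::max<std::size_t>(1, num_shards);
  const std::size_t chunk = (num_rows + num_shards - 1) / num_shards;
  std::vector<std::uint8_t> all_codes;
  all_codes.reserve(num_rows * cb.num_subvectors);
  for (std::size_t begin = 0; begin < num_rows; begin += chunk) {
    const std::size_t end = std::min(num_rows, begin + chunk);
    const std::vector<std::uint8_t> part = quantize_shard(cb, vectors, begin, end);
    all_codes.insert(all_codes.end(), part.begin(), part.end());
  }
  return all_codes;
}
```

Because no shard depends on another, the sharded result is identical to quantizing all rows in one call; only the codebook has to be broadcast to the workers.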
-
I'm not entirely sure what is meant by distributed vector search. Our vector search already exploits thread-level parallelism, and the API is thread-safe, so you can issue multiple searches simultaneously.
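For readers unfamiliar with the pattern, the following sketch shows what issuing several searches at once against a shared, read-only index can look like from the caller's side. `FlatIndex`, `nearest`, and `search_many` are hypothetical stand-ins using brute-force search; they are not Lance's search API.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <limits>
#include <vector>

// HYPOTHETICAL stand-in for a thread-safe index: row-major vectors that are
// never modified after construction.
struct FlatIndex {
  std::size_t dim;
  std::vector<float> data;
};

// Exhaustive nearest-neighbour search by squared L2 distance. It only reads
// from `index`, so concurrent calls need no locking.
std::size_t nearest(const FlatIndex& index, const std::vector<float>& query) {
  assert(query.size() == index.dim);
  const std::size_t num_rows = index.data.size() / index.dim;
  std::size_t best_row = 0;
  float best = std::numeric_limits<float>::max();
  for (std::size_t row = 0; row < num_rows; ++row) {
    float dist = 0.0f;
    for (std::size_t d = 0; d < index.dim; ++d) {
      const float diff = index.data[row * index.dim + d] - query[d];
      dist += diff * diff;
    }
    if (dist < best) {
      best = dist;
      best_row = row;
    }
  }
  return best_row;
}

// Launch one asynchronous search per query and collect the results in
// query order.
std::vector<std::size_t> search_many(
    const FlatIndex& index, const std::vector<std::vector<float>>& queries) {
  std::vector<std::future<std::size_t>> pending;
  pending.reserve(queries.size());
  for (const auto& query : queries) {
    pending.push_back(std::async(std::launch::async, [&index, &query] {
      return nearest(index, query);
    }));
  }
  std::vector<std::size_t> results;
  results.reserve(pending.size());
  for (auto& f : pending) results.push_back(f.get());
  return results;
}
```

This covers concurrency within one process; spreading queries or index partitions across machines would be a separate layer on top.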
-
If I understand correctly, the official Lance format APIs cover Python, Rust, and Java, and lance-duckdb uses something like C++ bindings over Rust. So my question is: is there a plan for official C or C++ APIs? Would they be based on the lance-duckdb implementation?
-
The current integration between the vectorized compute engine and Lance relies solely on JNI, which incurs some performance overhead [1]. Additionally, the existing API lacks support for distributed index building and vector search. We propose starting a Lance C++ project with two primary objectives: (1) develop a native C++ API to facilitate seamless integration with native compute engines; (2) enable distributed index construction and vector search.