feat: add describeIndex method for detailed index metadata retrieval #5561
Pull request overview
This PR adds a describeIndex API method across Rust, Java, and Python to retrieve detailed metadata for a specific index by name, complementing the existing describe_indices method that returns all indices. The implementation provides index type, row coverage statistics, and distance metrics (for vector indices) without loading the full index into memory.
Key Changes:
- Added describe_index method as a convenience wrapper that filters by index name
- Introduced IndexDescription class in Java to encapsulate index metadata
- Implemented comprehensive tests across all three language bindings
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| rust/lance-index/src/traits.rs | Added default implementation of describe_index trait method that filters describe_indices by name |
| rust/lance/src/dataset/tests/dataset_index.rs | Added comprehensive tests for BTree and Inverted indices with non-existent index handling |
| python/src/dataset.rs | Implemented Rust FFI binding that calls native describe_index and raises PyKeyError for missing indices |
| python/python/lance/dataset.py | Added Python API wrapper with docstring and error handling |
| python/python/tests/test_scalar_index.py | Added tests covering INVERTED, BITMAP, and BTREE indices plus error case |
| java/src/main/java/org/lance/index/IndexDescription.java | New class with builder pattern for index metadata (type, distance, row counts) |
| java/src/main/java/org/lance/Dataset.java | Added public describeIndex method with validation and locking |
| java/lance-jni/src/blocking_dataset.rs | Implemented JNI native method with index type detection and row count calculation |
| java/src/test/java/org/lance/ScalarIndexTest.java | Added test for BTree index description |
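The Python rows above describe a binding that raises a key error when the named index does not exist. A minimal sketch of that contract follows, using an in-memory stand-in rather than the real lance dataset class. `FakeDataset`, `IndexInfo`, and their fields are assumptions made for illustration and are not part of the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexInfo:
    # Hypothetical, simplified stand-in for the metadata returned per index.
    name: str
    index_type: str


class FakeDataset:
    """Illustrative stand-in; not the real lance dataset class."""

    def __init__(self, indices: List[IndexInfo]):
        self._indices = indices

    def describe_indices(self) -> List[IndexInfo]:
        # Return descriptions for every index on the dataset.
        return list(self._indices)

    def describe_index(self, index_name: str) -> IndexInfo:
        # Look the index up by name and surface a missing index as KeyError,
        # the Python exception that PyO3's PyKeyError corresponds to.
        for info in self.describe_indices():
            if info.name == index_name:
                return info
        raise KeyError(f"Index '{index_name}' not found")


ds = FakeDataset([IndexInfo("id_idx", "BTREE"), IndexInfo("text_idx", "INVERTED")])
print(ds.describe_index("id_idx").index_type)
```

Raising instead of returning None keeps a misspelled index name from passing silently, at the cost of requiring callers to catch the exception.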
Can anyone help with what the problem is? What should I do to solve the problem?
@majin1102 @yanghua could you please help review this one?
majin1102 left a comment
Thanks for working on this. Left one comment
rust/lance-index/src/traits.rs
Outdated
    index_name: &str,
) -> Result<Option<Arc<dyn IndexDescription>>> {
    let indices = self.describe_indices(None).await?;
    Ok(indices.into_iter().find(|idx| idx.name() == index_name))
It seems we still need to load the full index metadata, likely because the underlying protobuf spec doesn’t support partial loading. I’m not sure this API is worth exposing unless we have solid use cases for which describe_indices is not suitable.
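To make this concern concrete: a lookup built as a filter over describe_indices still pays for describing every index. The sketch below uses a hypothetical catalog with a counter. All names in it are illustrative and are not Lance APIs.

```python
from typing import List, Optional


class CountingCatalog:
    """Hypothetical catalog that counts how many index descriptions it builds."""

    def __init__(self, names: List[str]):
        self._names = names
        self.descriptions_built = 0

    def describe_indices(self) -> List[str]:
        # Every call materializes a description for each index.
        self.descriptions_built += len(self._names)
        return list(self._names)

    def describe_index(self, index_name: str) -> Optional[str]:
        # Same shape as the default trait method under review:
        # describe everything, then keep the first match by name.
        return next((n for n in self.describe_indices() if n == index_name), None)


catalog = CountingCatalog(["a_idx", "b_idx", "c_idx"])
found = catalog.describe_index("b_idx")
print(found, catalog.descriptions_built)  # prints: b_idx 3
```

In this model a single-index lookup costs the same as listing all indices, so the wrapper only saves the caller the filtering step.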
We are going to push Lance's REST APIs, see: https://lance.org/format/namespace/integrations/gravitino/#option-1-native-lance-rest-support into practice. Indexes are also key components of a Lance table, so I raised this to support more metadata operations.
Regarding "It seems we still need to load the full index metadata": let me check whether we can optimize it.
> We are going to push Lance's REST APIs, see: https://lance.org/format/namespace/integrations/gravitino/#option-1-native-lance-rest-support
That's good to hear.
I’m not sure but I’m not opposed to it.
Thanks for this contribution
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
    public static final String JSON_PROPERTY_DISTANCE_TYPE = "distance_type";
    @Nullable
    private String distanceType;
Copilot
AI
Dec 24, 2025
The distanceType field in IndexDescription is never populated in the JNI implementation. While the test correctly expects it to be null for scalar indices, this field will also be null for vector indices where it should contain meaningful data (e.g., "l2", "cosine", "dot").
For vector indices, the distance type information is stored in the index details (as a protobuf Any) and would need to be extracted and set during the construction of the IndexDescription object in the JNI layer. Consider parsing this information from the index details for vector indices to make this field useful, or alternatively document that this field is not yet implemented and will always return null.
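The suggestion above can be sketched as follows: populate the distance type only when the index is a vector index, and leave it null otherwise. This is an illustration under stated assumptions, not the JNI code. The `details` dict stands in for the decoded index details, and the set of vector index type names is a hypothetical choice for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption for this sketch: index types treated as vector indices.
VECTOR_INDEX_TYPES = {"IVF_PQ", "IVF_FLAT", "IVF_HNSW_SQ"}


@dataclass
class IndexDescription:
    # Hypothetical mirror of the Java class: distance_type stays None
    # unless the index is a vector index with a known metric.
    name: str
    index_type: str
    distance_type: Optional[str] = None


def build_description(name: str, index_type: str, details: Dict[str, str]) -> IndexDescription:
    # `details` is a plain dict standing in for the decoded index details;
    # the real details are described above as a protobuf Any.
    distance = None
    if index_type in VECTOR_INDEX_TYPES:
        distance = details.get("distance_type")
    return IndexDescription(name, index_type, distance)


vector_desc = build_description("vec_idx", "IVF_PQ", {"distance_type": "l2"})
scalar_desc = build_description("id_idx", "BTREE", {})
print(vector_desc.distance_type, scalar_desc.distance_type)  # prints: l2 None
```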
This pull request introduces a new "describe index" feature across the Rust, Java, and Python APIs, allowing users to retrieve detailed metadata and statistics about a specific index by name. The change includes the addition of a new IndexDescription class in Java, updates to the JNI and Rust layers to support the new API, and corresponding tests and documentation in all three languages.

The most important changes are:

API Additions:
- Added a describeIndex(String indexName) method to the Java Dataset class, which returns a new IndexDescription object containing metadata and statistics for a specific index. This is supported by a new native JNI method and Rust FFI implementation.
- Added a describe_index(self, index_name: str) method to the Python LanceDataset class, providing similar functionality for Python users.
- Added a describe_index method to the Rust DatasetIndexExt trait, enabling retrieval of a single index's metadata without loading the full index.

New Data Structures:
- Added an IndexDescription class in Java to encapsulate index metadata (type, distance metric, indexed/unindexed row counts) with a builder pattern for construction.

Testing and Validation:

Documentation and Imports:
These changes make it much easier for users to programmatically inspect the properties and coverage of individual indices in a Lance dataset.
Fixed: #5553