-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Use optional repeated string for query metadata ids argument #2820
base: main
Are you sure you want to change the base?
[BUG] Use optional repeated string for query metadata ids argument #2820
Conversation
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @Sicheng-Pan and the rest of your teammates on Graphite |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @Sicheng-Pan and the rest of your teammates on Graphite |
@@ -81,6 +81,12 @@ def get_metadata( | |||
) -> Sequence[MetadataEmbeddingRecord]: | |||
"""Query for embedding metadata.""" | |||
|
|||
if limit is not None and limit < 0: | |||
raise ValueError("Limit cannot be negative") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer a ChromaError with a specific error code, otherwise this manifests as a 500
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a new OutOfBoundError
for this scenario
@@ -847,3 +848,508 @@ def test_0dim_embedding_validation() -> None: | |||
with pytest.raises(ValueError) as e: | |||
validate_embeddings(embds) | |||
assert "Expected each embedding in the embeddings to be a non-empty list" in str(e) | |||
|
|||
|
|||
def test_integrated(client: ClientAPI) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this new test exercising? can we add some documentation around the specific case / is it needed? Was this test breaking before, can we maybe add a more simple reproduction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a second thought, this test was too complex and was initially a generated test case. It was introduced to debug a previous PR. I'll remove it for now.
@@ -130,6 +130,10 @@ message MetadataEmbeddingRecord { | |||
UpdateMetadata metadata = 2; | |||
} | |||
|
|||
message UserIds { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment what user ids means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments
@@ -114,7 +114,7 @@ message QueryMetadataRequest { | |||
string segment_id = 1; | |||
Where where = 2; | |||
WhereDocument where_document = 3; | |||
repeated string ids = 4; | |||
optional UserIds ids = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is but is there anywhere else we expect option<vec>? from a proto? we should make the same change elsewhere then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed that GetVectorsRequest
and QueryVectorsRequest
accept a sequence of string ids, similar to QueryMetadataRequest
. In the frontend, get_vectors
and query_vectors
in grpc_sement.py
accepts Optional[Sequence[str]]
for these fields, but the caller of these methods explicitly check for the case of empty ids, and the query node is expecting Vec<String>
as inputs. I'm unsure whether we should convert these into optional fields as well.
idl/makefile
Outdated
@@ -1,5 +1,8 @@ | |||
.PHONY: proto | |||
|
|||
PROTOC_GEN_GO_BIN_PATH := $(if $(shell which protoc-gen-go),$(shell which protoc-gen-go),"${GOPATH}/bin/protoc-gen-go") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer isolated changes to their own PRs as a general practice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isolated the changes in a stacked PR
b456470
to
af1fbcf
Compare
af1fbcf
to
087bc97
Compare
Description of changes
Summarize the changes made by this PR.
ids
argument forQueryMetadataRequest
fromrepeated string
tooptional repeated string
. This matches the input type of the query node, which isOption<Vec<String>>
.Test plan
Tested with existing tests
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
All user facing APIs remain unchanged.