-
Notifications
You must be signed in to change notification settings - Fork 2k
[CHORE]: Prevent users from editing the source attached function collection metadata #6016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
Protect system-managed collection metadata from user updates Adds server-side guards around collection metadata to stop clients from setting or changing the Key Changes• Defined Affected Areas• go/pkg/sysdb/coordinator/table_catalog.go This summary was automatically generated by @propel-code-bot |
This comment has been minimized.
This comment has been minimized.
eabcfd5 to
462f142
Compare
This comment has been minimized.
This comment has been minimized.
collection metadata
462f142 to
120dbda
Compare
| def test_cannot_modify_system_metadata_key(basic_http_client: System) -> None: | ||
| """Test that users cannot modify the system-reserved metadata key on output collections""" | ||
| client = ClientCreator.from_system(basic_http_client) | ||
| client.reset() | ||
|
|
||
| # Create input collection and attach a function | ||
| input_collection = client.create_collection(name="input_collection") | ||
| input_collection.add(ids=["id1"], documents=["test"]) | ||
|
|
||
| attached_fn, created = input_collection.attach_function( | ||
| name="my_function", | ||
| function=RECORD_COUNTER_FUNCTION, | ||
| output_collection="output_collection", | ||
| params=None, | ||
| ) | ||
| assert attached_fn is not None | ||
| assert created is True | ||
|
|
||
| # Get the output collection | ||
| output_collection = client.get_collection(name="output_collection") | ||
|
|
||
| # Verify the output collection has the system metadata key | ||
| assert output_collection.metadata is not None | ||
| assert "chroma:source_attached_function_id" in output_collection.metadata | ||
|
|
||
| # Attempt to modify the system-reserved metadata key should fail | ||
| with pytest.raises( | ||
| ChromaError, match="cannot set or modify system-reserved metadata key" | ||
| ): | ||
| output_collection.modify( | ||
| metadata={"chroma:source_attached_function_id": "fake-value"} | ||
| ) | ||
|
|
||
| # Verify the metadata was not changed | ||
| output_collection = client.get_collection(name="output_collection") | ||
| assert output_collection.metadata["chroma:source_attached_function_id"] == str( | ||
| attached_fn.id | ||
| ) | ||
|
|
||
| # Clean up | ||
| input_collection.detach_function(attached_fn.name, delete_output_collection=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Logic] This is a great test for preventing direct modification of a system key. To make it more robust, it would be beneficial to also verify that system keys are preserved when modifying other metadata fields.
The current implementation in table_catalog.go might allow for the deletion of system keys if they are omitted from the metadata object in a modify call. Adding a test case for this would help catch that potential regression.
You could add a check like this after the existing pytest.raises block:
# Also attempt to modify non-system metadata, which should succeed
# and preserve the system key.
output_collection.modify(
metadata={"user_key": "user_value"}
)
# Verify the system metadata key was preserved and user key was added
output_collection = client.get_collection(name="output_collection")
assert output_collection.metadata is not None
assert "user_key" in output_collection.metadata
assert output_collection.metadata["user_key"] == "user_value"
assert "chroma:source_attached_function_id" in output_collection.metadata
assert output_collection.metadata["chroma:source_attached_function_id"] == str(
attached_fn.id
)Context for Agents
This is a great test for preventing direct modification of a system key. To make it more robust, it would be beneficial to also verify that system keys are preserved when modifying *other* metadata fields.
The current implementation in `table_catalog.go` might allow for the deletion of system keys if they are omitted from the `metadata` object in a `modify` call. Adding a test case for this would help catch that potential regression.
You could add a check like this after the existing `pytest.raises` block:
```python
# Also attempt to modify non-system metadata, which should succeed
# and preserve the system key.
output_collection.modify(
metadata={"user_key": "user_value"}
)
# Verify the system metadata key was preserved and user key was added
output_collection = client.get_collection(name="output_collection")
assert output_collection.metadata is not None
assert "user_key" in output_collection.metadata
assert output_collection.metadata["user_key"] == "user_value"
assert "chroma:source_attached_function_id" in output_collection.metadata
assert output_collection.metadata["chroma:source_attached_function_id"] == str(
attached_fn.id
)
```
File: chromadb/test/distributed/test_task_api.py
Line: 493
Description of changes
Summarize the changes made by this PR.
Test plan
Added a test in test_task_api.py
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?