Add SQL filter compilation for properties JSON#1291
Add SQL filter compilation for properties JSON#1291edwinyyyu wants to merge 2 commits intoMemMachine:mainfrom
Conversation
6ccc211 to
7715ed4
Compare
|
Need to support system-defined metadata too. Will combine with existing compile_sql_filter. |
7715ed4 to
a9e83e5
Compare
|
Discovered client tests were wrong all along -- filters silently failed before. Need to fix. |
842325b to
631cc96
Compare
cea64ef to
6d3fd90
Compare
6d3fd90 to
e1e6dd6
Compare
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
e1e6dd6 to
496802d
Compare
|
Breaking Changes should be delayed or moved to 0.4.0, IMHO. We're not strictly following SemVer and are currently using MINOR versions for breaking changes. |
|
Reviewed these changes with CoPilot. What ChangedThe breaking change is in The Core Change: Unknown Filter Fields Now Raise Instead of Silently Returning
|
| Scenario | Before | After | Action Required |
|---|---|---|---|
| Filter on user metadata field | {"time": "morning"} |
{"m.time": "morning"} |
Add m. prefix |
| Filter on unknown field shared across memories | Silently ignored | ValueError raised |
Fix or remove the filter |
NOT/AND/OR with an unknown field |
Unpredictably dropped | ValueError raised |
Fix the field name |
Filter on known system fields (created_at, tag, etc.) |
Works | Works | No change needed |
The change is externally visible to any code that builds filter_dict queries, whether through the Python SDK or the REST API's filter parameters. Any filter using a bare key for metadata (without the m. prefix) will now fail at runtime instead of silently misbehaving.
|
Given this is a breaking change, we need to ensure all downstream components are updated:
|
There was a problem hiding this comment.
Pull request overview
Adds a unified SQLAlchemy filter compiler that can target both raw JSON fields and a new type-tagged “properties JSON” encoding, and updates storage backends/tests to treat unknown filter fields as errors (breaking behavior change).
Changes:
- Introduce
properties_jsonencode/decode helpers for type-tagged JSON storage (incl. timezone-preserving datetimes). - Expand
compile_sql_filterto support"column","json", and"properties_json"field encodings and to error on unknown fields. - Update SQLAlchemy + Neo4j semantic storage and tests to use the new resolver contract and enforce unknown-field errors.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/server/src/memmachine_server/semantic_memory/storage/sqlalchemy_pgvector_semantic.py | Updates filter field resolution to return (ColumnElement, FieldEncoding) and raise on unknown fields. |
| packages/server/src/memmachine_server/semantic_memory/storage/neo4j_semantic_storage.py | Adds unknown-field validation for filters via a _KNOWN_FIELDS allowlist. |
| packages/server/src/memmachine_server/common/vector_store/vector_store.py | Adds config abstract property to VectorStoreCollection and fixes docstring return type. |
| packages/server/src/memmachine_server/common/properties_json.py | New module for type-tagged JSON property encoding/decoding with datetime timezone preservation. |
| packages/server/src/memmachine_server/common/filter/sql_filter_util.py | Reworks SQLAlchemy filter compilation with field encodings and typed JSON support; now raises on unknown fields. |
| packages/server/src/memmachine_server/common/episode_store/episode_sqlalchemy_store.py | Updates episode filter resolver to new (ColumnElement, FieldEncoding) contract and raises on unknown fields. |
| packages/server/server_tests/memmachine_server/semantic_memory/storage/test_semantic_storage.py | Updates semantic storage tests to expect errors on unknown filter fields. |
| packages/server/server_tests/memmachine_server/semantic_memory/storage/in_memory_semantic_storage.py | Aligns in-memory filter behavior to raise on unknown fields. |
| packages/server/server_tests/memmachine_server/common/test_properties_json.py | Adds unit tests for properties-json encode/decode round-trips and error cases. |
| packages/server/server_tests/memmachine_server/common/filter/test_sql_filter_util.py | Expands filter compiler tests for raw JSON and properties-json (SQLite + PostgreSQL). |
| packages/client/client_tests/test_integration_complete.py | Updates client integration tests to use m.-prefixed metadata keys in filters. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _cast_json_value( | ||
| column: ColumnElement, | ||
| value: bool | float | str, | ||
| ) -> ColumnElement: | ||
| """Cast a raw JSON path element based on the Python value type.""" | ||
| if isinstance(value, bool): | ||
| return column.as_boolean() | ||
| if isinstance(value, int): | ||
| return column.as_integer() |
There was a problem hiding this comment.
In _cast_json_value, the value parameter type annotation excludes int even though the implementation has an isinstance(value, int) branch. This makes the helper’s typing misleading and may cause type-checking issues; include int in the accepted union (and keep bool checked first to avoid bool-as-int confusion).
| if isinstance(expr, In): | ||
| if not expr.values: | ||
| return false() | ||
| first_value = expr.values[0] | ||
| type_name = PROPERTY_TYPE_TO_PROPERTY_TYPE_NAME[type(first_value)] | ||
| value_path = column[PROPERTY_VALUE_KEY] | ||
| type_check = column[PROPERTY_TYPE_KEY].as_string() == type_name | ||
| if isinstance(first_value, int): | ||
| return and_(type_check, value_path.as_integer().in_(expr.values)) | ||
| return and_(type_check, value_path.as_string().in_(expr.values)) |
There was a problem hiding this comment.
_compile_properties_json_leaf handles IN by casting everything that isn’t an int as a string, and it treats bool as int due to isinstance(True, int). This can produce incorrect SQL (e.g., boolean values cast as integers, and any future float/datetime IN values compared as strings). Consider (1) explicitly handling bool before int with .as_boolean(), and (2) validating/raising for unsupported IN element types rather than silently falling back to string casting.
| _KNOWN_FIELDS: frozenset[str] = frozenset( | ||
| { | ||
| "set_id", | ||
| "set", | ||
| "semantic_category_id", | ||
| "category_name", | ||
| "category", | ||
| "tag_id", | ||
| "tag", | ||
| "feature", | ||
| "feature_name", | ||
| "value", |
There was a problem hiding this comment.
_KNOWN_FIELDS includes several alias/synonym names (e.g. set, semantic_category_id, tag_id, feature_name) that don’t correspond to properties created on Feature nodes (which use set_id, category_name, tag, feature). Because _resolve_field_reference returns f"{alias}.{field}" for these, filters using the aliases will silently evaluate against missing properties and return no matches instead of behaving like other backends. Map these aliases to the actual Neo4j property names (or remove them and raise ValueError for unsupported aliases) to keep filter semantics consistent across storages.
The TypeScript client needs no updates, but users may need to update their filters. However, the original logic was unintuitive (not consistent with any common boolean logic system) and filters are not in any of the documentation, such that we can assume anyone dependent on the logic is depending on undefined behavior. The filter is submitted to the server as a string. More changes may be necessary to bring the client/server to a beneficial state, so I am not confident that documentation changes would be beneficial at the moment. The Python client does not even support the filter expressions, according to Claude. Only the TypeScript client supports them. |
Purpose of the change
Needed for SQLAlchemySegmentStore, SQLite VectorStore implementations.
Description
Made based on #1302 for easier rebasing, but this does not need it.
Add unified, well-tested way to encode/decode properties as SQL JSON to avoid repeated reimplementation.
Consumers of existing behavior are modified to use raw JSON encoding -- they may be moved to properties JSON in the future if appropriate. In particular, existing consumers of existing compile_sql_filter do not handle datetimes (no way of handling type collisions -- the new method uses a discriminated union).
Type of change
Breaking: filters will no longer ignore unknown keys.
It's confusing and potentially dangerous (especially with more complex filters including NOT, AND, OR) what treating unknown keys as None is supposed to mean. New behavior: If a filter is meant to be sent to multiple memories simultaneously, then all memories must support those fields.
It maybe would be okay if it obeyed SQL 3VL, but there is no case where a filter on one memory's field should affect the filter for another memory.
How Has This Been Tested?
Checklist
[Please delete options that are not relevant.]
Maintainer Checklist