
Add SQL filter compilation for properties JSON#1291

Open
edwinyyyu wants to merge 2 commits into MemMachine:main from edwinyyyu:sql_json_properties

Conversation

@edwinyyyu
Contributor

@edwinyyyu edwinyyyu commented Apr 1, 2026

Purpose of the change

Needed for the SQLAlchemySegmentStore and SQLite VectorStore implementations.

Description

Built on top of #1302 for easier rebasing, though it does not depend on it.

Adds a unified, well-tested way to encode/decode properties as SQL JSON, avoiding repeated reimplementation.

Consumers of existing behavior are modified to use raw JSON encoding; they may be moved to properties JSON in the future if appropriate. In particular, existing consumers of compile_sql_filter do not handle datetimes: raw JSON has no way to disambiguate colliding types, whereas the new method uses a discriminated union.
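As a concrete sketch, a discriminated-union encoding tags each value with its type so that, for example, a datetime and a string that happens to look like one cannot collide. The tag names (`t`, `v`) and the type set below are illustrative assumptions, not the PR's actual schema:

```python
import json
from datetime import datetime, timezone

def encode_property(value):
    """Encode a value as a type-tagged dict (hypothetical tag names)."""
    if isinstance(value, bool):  # bool before int: isinstance(True, int) is True
        return {"t": "bool", "v": value}
    if isinstance(value, int):
        return {"t": "int", "v": value}
    if isinstance(value, float):
        return {"t": "float", "v": value}
    if isinstance(value, str):
        return {"t": "str", "v": value}
    if isinstance(value, datetime):
        return {"t": "datetime", "v": value.isoformat()}
    raise TypeError(f"Unsupported property type: {type(value)!r}")

def decode_property(tagged):
    """Invert encode_property, restoring datetimes from ISO strings."""
    if tagged["t"] == "datetime":
        return datetime.fromisoformat(tagged["v"])
    return tagged["v"]

# A datetime and its ISO string survive a JSON round trip as distinct types.
dt = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
assert decode_property(json.loads(json.dumps(encode_property(dt)))) == dt
assert decode_property(encode_property(dt.isoformat())) == dt.isoformat()
```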

Type of change

Breaking: filters will no longer ignore unknown keys.

Treating unknown keys as None has confusing and potentially dangerous semantics, especially with more complex filters involving NOT, AND, and OR. New behavior: if a filter is meant to be sent to multiple memories simultaneously, then all memories must support the fields it references.

It might be acceptable if the old behavior obeyed SQL three-valued logic (3VL), but there is no case where a filter on one memory's field should affect the filter for another memory.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g., code style improvements, linting)
  • Documentation update
  • Project Maintenance (updates to build scripts, CI, etc., that do not affect the main project)
  • Security (improves security without changing functionality)

How Has This Been Tested?

  • Unit Test
  • Integration Test
  • End-to-end Test
  • Test Script (please provide)
  • Manual verification (list step-by-step instructions)

Checklist


  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

@edwinyyyu edwinyyyu force-pushed the sql_json_properties branch from 6ccc211 to 7715ed4 on April 1, 2026 22:02
@edwinyyyu edwinyyyu marked this pull request as draft April 1, 2026 22:38
@edwinyyyu
Contributor Author

edwinyyyu commented Apr 1, 2026

Need to support system-defined metadata too. Will combine with existing compile_sql_filter.

@edwinyyyu edwinyyyu force-pushed the sql_json_properties branch from 7715ed4 to a9e83e5 on April 2, 2026 00:19
@edwinyyyu edwinyyyu marked this pull request as ready for review April 2, 2026 00:20
@edwinyyyu edwinyyyu changed the title Add SQL JSON properties utilities Add SQL filter compilation for properties JSON Apr 2, 2026
@edwinyyyu
Contributor Author

Discovered client tests were wrong all along -- filters silently failed before. Need to fix.

@edwinyyyu edwinyyyu force-pushed the sql_json_properties branch from 842325b to 631cc96 on April 2, 2026 05:21
@edwinyyyu edwinyyyu modified the milestones: v0.3.5, v0.3.4 Apr 2, 2026
@edwinyyyu edwinyyyu force-pushed the sql_json_properties branch 3 times, most recently from cea64ef to 6d3fd90 on April 6, 2026 20:30
@edwinyyyu edwinyyyu requested a review from jealous April 6, 2026 22:38
@edwinyyyu edwinyyyu force-pushed the sql_json_properties branch from 6d3fd90 to e1e6dd6 on April 6, 2026 22:43
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
@edwinyyyu edwinyyyu force-pushed the sql_json_properties branch from e1e6dd6 to 496802d on April 6, 2026 23:59
@sscargal
Contributor

sscargal commented Apr 7, 2026

Breaking Changes should be delayed or moved to 0.4.0, IMHO. We're not strictly following SemVer and are currently using MINOR versions for breaking changes.

@sscargal
Contributor

sscargal commented Apr 7, 2026

Reviewed these changes with Copilot.

What Changed

The breaking change is in compile_sql_filter and all _resolve_field callbacks across the codebase.

The Core Change: Unknown Filter Fields Now Raise Instead of Silently Returning None

Before — unknown filter fields were silently ignored. The resolver returned (None, False), and compile_sql_filter would swallow it, treating a filter on an unknown field as if it didn't exist (returning None):

# resolve_field returned (None, False) for unknown fields
def _resolve_episode_field(field: str) -> tuple[Any, bool] | tuple[None, bool]:
    ...
    return None, False  # ← unknown field: silently ignored

# compile_sql_filter then treated None as "no filter"
def compile_sql_filter(expr, resolve_field) -> ColumnElement[bool] | None:
    if isinstance(expr, And):
        left = compile_sql_filter(expr.left, resolve_field)
        right = compile_sql_filter(expr.right, resolve_field)
        if left is None:   # ← if one side was unknown, just skip it
            return right
        if right is None:
            return left
        return and_(left, right)

After — unknown fields raise ValueError, and compile_sql_filter always returns a valid expression (never None):

# resolve_field now raises for unknown fields
def _resolve_episode_field(field: str) -> tuple[ColumnElement, FieldEncoding]:
    ...
    raise ValueError(f"Unknown filter field: {field!r}")  # ← explicit error

# compile_sql_filter now always returns a clause, never None
def compile_sql_filter(expr, resolve_field) -> ColumnElement[bool]:
    if isinstance(expr, And):
        return and_(
            compile_sql_filter(expr.left, resolve_field),
            compile_sql_filter(expr.right, resolve_field),
        )

Before vs. After: Concrete Code Example

Scenario: Filtering on an unknown field

# BEFORE: No error. The unknown field is silently dropped.
# If you have two memories and one doesn't know about "category",
# the filter just vanishes — you get ALL records instead of an error.

results = memory.search(
    "What is my profession?",
    filter_dict={"category": "profession"},  # ← "category" unknown to this memory
    limit=10,
)
# Silently returns everything — dangerous, especially with NOT/AND/OR!
# e.g. NOT(unknown_field = x) would have matched ALL rows

# AFTER: Raises ValueError immediately — no silent data leak.

results = memory.search(
    "What is my profession?",
    filter_dict={"category": "profession"},  # ← unknown field
    limit=10,
)
# Raises: ValueError: Unknown filter field: 'category'

The integration tests in the PR demonstrate this directly:

# BEFORE: Test verified NO error occurred
await semantic_storage.get_feature_set(
    filter_expr=_expr("missing_column IN (foo)"),
)

# AFTER: Test now asserts an error IS raised
with pytest.raises(ValueError, match="Unknown filter field"):
    await semantic_storage.get_feature_set(
        filter_expr=_expr("missing_column IN (foo)"),
    )

Does the Breaking Change Require API Changes?

Yes — callers using filter_dict with unqualified or unrecognized field names will break. The PR already includes one such API-visible fix in the integration test client:

# BEFORE: bare key
filter_dict={"time": "morning"}
filter_dict={"category": "profession"}

# AFTER: must be prefixed with "m." to target the JSON metadata column
filter_dict={"m.time": "morning"}
filter_dict={"m.category": "profession"}

The m. (or metadata.) prefix is required for user-defined metadata fields. Without it, the resolver doesn't recognize the field and now raises a ValueError instead of silently passing through.
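A minimal sketch of this strict dispatch (the real resolvers return SQLAlchemy column expressions, and the system-field set shown here is an assumed subset for illustration):

```python
SYSTEM_FIELDS = {"created_at", "tag"}  # illustrative subset of known system fields

def resolve_field(field: str) -> tuple[str, str]:
    """Return a (target, encoding) pair, raising on unrecognized fields."""
    if field in SYSTEM_FIELDS:
        return field, "column"
    for prefix in ("m.", "metadata."):
        if field.startswith(prefix):
            return field[len(prefix):], "json"
    raise ValueError(f"Unknown filter field: {field!r}")

assert resolve_field("m.time") == ("time", "json")
assert resolve_field("created_at") == ("created_at", "column")
# resolve_field("time") raises ValueError instead of being silently dropped
```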

Summary of Required API Changes for Callers

  • Filter on a user metadata field: before {"time": "morning"}, after {"m.time": "morning"}. Action: add the m. prefix.
  • Filter on an unknown field shared across memories: before, silently ignored; after, ValueError raised. Action: fix or remove the filter.
  • NOT/AND/OR with an unknown field: before, unpredictably dropped; after, ValueError raised. Action: fix the field name.
  • Filter on known system fields (created_at, tag, etc.): works before and after. No change needed.

The change is externally visible to any code that builds filter_dict queries, whether through the Python SDK or the REST API's filter parameters. Any filter using a bare key for metadata (without the m. prefix) will now fail at runtime instead of silently misbehaving.

@sscargal
Contributor

sscargal commented Apr 7, 2026

Given this is a breaking change, we need to ensure all downstream components are updated:

  • Update the Python Client SDK code (#1305)
  • Update the TypeScript Client SDK code
  • Update examples
  • Update documentation
  • Ensure all SDK docstrings contain the updated information so the auto-generated openapi.json used by the documentation is updated and accurate (#1306)
  • Server, client, examples, and documentation must be merged in the same release to avoid breaking front-end/back-end compatibility

@sscargal sscargal removed this from the v0.3.4 milestone Apr 7, 2026
@sscargal sscargal added this to the v0.4.0 milestone Apr 7, 2026

Copilot AI left a comment


Pull request overview

Adds a unified SQLAlchemy filter compiler that can target both raw JSON fields and a new type-tagged “properties JSON” encoding, and updates storage backends/tests to treat unknown filter fields as errors (breaking behavior change).

Changes:

  • Introduce properties_json encode/decode helpers for type-tagged JSON storage (incl. timezone-preserving datetimes).
  • Expand compile_sql_filter to support "column", "json", and "properties_json" field encodings and to error on unknown fields.
  • Update SQLAlchemy + Neo4j semantic storage and tests to use the new resolver contract and enforce unknown-field errors.
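The timezone preservation mentioned above can be had from Python's own ISO 8601 round trip; this is a sketch of the idea, not necessarily the PR's exact serialization:

```python
from datetime import datetime, timedelta, timezone

# isoformat()/fromisoformat() keep the UTC offset, so an aware datetime
# survives being stored as a JSON string.
tz = timezone(timedelta(hours=-7))
dt = datetime(2026, 4, 6, 15, 30, tzinfo=tz)
encoded = dt.isoformat()  # '2026-04-06T15:30:00-07:00'
decoded = datetime.fromisoformat(encoded)
assert decoded == dt
assert decoded.utcoffset() == timedelta(hours=-7)
```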

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Per-file summary:

  • packages/server/src/memmachine_server/semantic_memory/storage/sqlalchemy_pgvector_semantic.py: updates filter field resolution to return (ColumnElement, FieldEncoding) and raise on unknown fields.
  • packages/server/src/memmachine_server/semantic_memory/storage/neo4j_semantic_storage.py: adds unknown-field validation for filters via a _KNOWN_FIELDS allowlist.
  • packages/server/src/memmachine_server/common/vector_store/vector_store.py: adds a config abstract property to VectorStoreCollection and fixes a docstring return type.
  • packages/server/src/memmachine_server/common/properties_json.py: new module for type-tagged JSON property encoding/decoding with datetime timezone preservation.
  • packages/server/src/memmachine_server/common/filter/sql_filter_util.py: reworks SQLAlchemy filter compilation with field encodings and typed JSON support; now raises on unknown fields.
  • packages/server/src/memmachine_server/common/episode_store/episode_sqlalchemy_store.py: updates the episode filter resolver to the new (ColumnElement, FieldEncoding) contract and raises on unknown fields.
  • packages/server/server_tests/memmachine_server/semantic_memory/storage/test_semantic_storage.py: updates semantic storage tests to expect errors on unknown filter fields.
  • packages/server/server_tests/memmachine_server/semantic_memory/storage/in_memory_semantic_storage.py: aligns in-memory filter behavior to raise on unknown fields.
  • packages/server/server_tests/memmachine_server/common/test_properties_json.py: adds unit tests for properties-JSON encode/decode round trips and error cases.
  • packages/server/server_tests/memmachine_server/common/filter/test_sql_filter_util.py: expands filter compiler tests for raw JSON and properties JSON (SQLite + PostgreSQL).
  • packages/client/client_tests/test_integration_complete.py: updates client integration tests to use m.-prefixed metadata keys in filters.


Comment on lines +78 to +86

def _cast_json_value(
    column: ColumnElement,
    value: bool | float | str,
) -> ColumnElement:
    """Cast a raw JSON path element based on the Python value type."""
    if isinstance(value, bool):
        return column.as_boolean()
    if isinstance(value, int):
        return column.as_integer()

Copilot AI Apr 7, 2026


In _cast_json_value, the value parameter type annotation excludes int even though the implementation has an isinstance(value, int) branch. This makes the helper’s typing misleading and may cause type-checking issues; include int in the accepted union (and keep bool checked first to avoid bool-as-int confusion).
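The pitfall behind the ordering advice is easy to demonstrate in plain Python: `bool` is a subclass of `int`, so an `int` check that runs first also captures booleans. The functions below are a hypothetical illustration, not the PR's code:

```python
assert isinstance(True, int)  # bool is a subclass of int

def cast_kind(value):
    """Wrong order: the int branch captures booleans first."""
    if isinstance(value, int):
        return "integer"
    if isinstance(value, bool):  # unreachable for True/False
        return "boolean"
    return "string"

def cast_kind_fixed(value):
    """Correct order: check bool before int."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    return "string"

assert cast_kind(True) == "integer"        # misclassified
assert cast_kind_fixed(True) == "boolean"  # correct
```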

Comment on lines +149 to +158

if isinstance(expr, In):
    if not expr.values:
        return false()
    first_value = expr.values[0]
    type_name = PROPERTY_TYPE_TO_PROPERTY_TYPE_NAME[type(first_value)]
    value_path = column[PROPERTY_VALUE_KEY]
    type_check = column[PROPERTY_TYPE_KEY].as_string() == type_name
    if isinstance(first_value, int):
        return and_(type_check, value_path.as_integer().in_(expr.values))
    return and_(type_check, value_path.as_string().in_(expr.values))

Copilot AI Apr 7, 2026


_compile_properties_json_leaf handles IN by casting everything that isn’t an int as a string, and it treats bool as int due to isinstance(True, int). This can produce incorrect SQL (e.g., boolean values cast as integers, and any future float/datetime IN values compared as strings). Consider (1) explicitly handling bool before int with .as_boolean(), and (2) validating/raising for unsupported IN element types rather than silently falling back to string casting.
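The suggested fix can be sketched as a dispatch that checks bool before int and rejects unsupported element types outright. The function below only names the cast to apply; it is a hypothetical illustration, not the PR's implementation:

```python
def in_cast_for(first_value):
    """Pick the JSON cast for an IN clause, raising on unsupported types."""
    if isinstance(first_value, bool):  # must precede the int check
        return "as_boolean"
    if isinstance(first_value, int):
        return "as_integer"
    if isinstance(first_value, str):
        return "as_string"
    raise ValueError(f"Unsupported IN element type: {type(first_value).__name__}")

assert in_cast_for(True) == "as_boolean"  # no longer treated as an integer
assert in_cast_for(7) == "as_integer"
```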

Comment on lines +1105 to +1116

_KNOWN_FIELDS: frozenset[str] = frozenset(
    {
        "set_id",
        "set",
        "semantic_category_id",
        "category_name",
        "category",
        "tag_id",
        "tag",
        "feature",
        "feature_name",
        "value",

Copilot AI Apr 7, 2026


_KNOWN_FIELDS includes several alias/synonym names (e.g. set, semantic_category_id, tag_id, feature_name) that don’t correspond to properties created on Feature nodes (which use set_id, category_name, tag, feature). Because _resolve_field_reference returns f"{alias}.{field}" for these, filters using the aliases will silently evaluate against missing properties and return no matches instead of behaving like other backends. Map these aliases to the actual Neo4j property names (or remove them and raise ValueError for unsupported aliases) to keep filter semantics consistent across storages.
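One way to keep the aliases while fixing the semantics is an explicit alias table mapped to the actual node property names. The mapping below is a hypothetical sketch inferred from the comment, not verified against the Neo4j schema:

```python
# Hypothetical alias -> actual Feature-node property mapping.
_FIELD_ALIASES = {
    "set": "set_id",
    "category": "category_name",
    "tag_id": "tag",
    "feature_name": "feature",
}

def resolve_neo4j_field(field: str, alias: str = "f") -> str:
    """Map filter-field synonyms onto real node properties before qualifying."""
    canonical = _FIELD_ALIASES.get(field, field)
    return f"{alias}.{canonical}"

assert resolve_neo4j_field("feature_name") == "f.feature"
assert resolve_neo4j_field("tag") == "f.tag"
```

Alternatively, dropping the aliases and raising ValueError would keep behavior consistent with the other backends.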

@edwinyyyu
Contributor Author

edwinyyyu commented Apr 8, 2026

Given this is a breaking change, we need to ensure all downstream components are updated:

* [ ]  Update the Python Client SDK code ([Update client libraries to support strict filter field validation and metadata prefixing for PR #1291 #1305](https://github.com/MemMachine/MemMachine/issues/1305))

* [ ]  Update the Typescript Client SDK code

* [ ]  Update Examples

* [ ]  Update documentation

* [ ]  Ensure all SDK Docstrings contain the updated information so the auto-generated `openapi.json` used by the documentation is successfully updated and accurate ([Update OpenAPI documentation and docstrings for strict filter field and prefixing changes (PR #1291) #1306](https://github.com/MemMachine/MemMachine/issues/1306))

* [ ]  Server, Client, Examples, and Documentation must be merged in the same Release to avoid breaking the front-end and back-end (causing incompatibility issues)

The TypeScript client itself needs no updates, but users may need to update their filters. However, the original logic was unintuitive (inconsistent with any common boolean logic system), and filters do not appear anywhere in the documentation, so we can assume anyone depending on the old logic was depending on undefined behavior. The filter is submitted to the server as a string.

More changes may be necessary to bring the client and server to a good state, so I am not confident documentation changes would be worthwhile at the moment.

The Python client does not even support the filter expressions, according to Claude. Only the TypeScript client supports them.
