Skip to content

Add per-corpus agent memory system#1213

Open
JSv4 wants to merge 4 commits intomainfrom
claude/agent-memory-system-nYQUX
Open

Add per-corpus agent memory system#1213
JSv4 wants to merge 4 commits intomainfrom
claude/agent-memory-system-nYQUX

Conversation

@JSv4
Copy link
Copy Markdown
Collaborator

@JSv4 JSv4 commented Apr 7, 2026

Introduce a document-backed memory system that lets agents accumulate
reusable insights from conversations. Memory is stored as a first-class
markdown Document in the corpus, visible and editable by users.

Key components:

  • Corpus model: memory_enabled toggle + memory_document FK
  • Memory CRUD utilities with hybrid retrieval (full injection when small,
    keyword-scored section filtering when large)
  • Two-stage end-of-conversation curation: privacy-preserving summarisation
    then LLM-based pattern extraction via Celery task
  • Automatic memory injection into agent system prompts
  • New agent tools: get_corpus_memory and suggest_memory_update
  • GraphQL ToggleCorpusMemory mutation with privacy warning field
  • Periodic Celery beat task for idle conversation detection

https://claude.ai/code/session_01QRjEA1X1JyofxLFPhdPPmN

Introduce a document-backed memory system that lets agents accumulate
reusable insights from conversations. Memory is stored as a first-class
markdown Document in the corpus, visible and editable by users.

Key components:
- Corpus model: memory_enabled toggle + memory_document FK
- Memory CRUD utilities with hybrid retrieval (full injection when small,
  keyword-scored section filtering when large)
- Two-stage end-of-conversation curation: privacy-preserving summarisation
  then LLM-based pattern extraction via Celery task
- Automatic memory injection into agent system prompts
- New agent tools: get_corpus_memory and suggest_memory_update
- GraphQL ToggleCorpusMemory mutation with privacy warning field
- Periodic Celery beat task for idle conversation detection

https://claude.ai/code/session_01QRjEA1X1JyofxLFPhdPPmN
@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

PR Review: Per-corpus agent memory system

This is a well-structured feature with clear architectural intent. The two-stage privacy-preserving curation pipeline is a thoughtful design. Below are issues ranging from blocking bugs to minor style concerns.


Blocking / High Priority

1. asyncio.get_event_loop().run_until_complete() in Celery task (memory_tasks.py:133)

return asyncio.get_event_loop().run_until_complete(
    _curate_corpus_memory_async(conversation_id)
)

In Python 3.10+, calling get_event_loop() without an active running loop raises a DeprecationWarning and will fail in future versions. Celery workers typically don't have a running event loop. Replace with:

return asyncio.run(_curate_corpus_memory_async(conversation_id))

2. Race condition in update_memory_content (memory.py:142)

Two curation tasks firing concurrently for different conversations on the same corpus will both read current memory, merge independently, then write — the second write silently overwrites the first. With the periodic check firing every 10 minutes and potentially dispatching many conversations at once, this is a real risk. Consider using select_for_update() on the memory document row, or serialising curation tasks per corpus via a Celery chord / canvas.


3. Race condition in get_or_create_memory_document (memory.py:74)

If corpus.memory_document_id is None, two concurrent callers both pass the check and both call corpus.import_content(...). The second write will set the FK to a different document, leaving an orphaned document in the corpus with no FK pointing to it. A get_or_create-style DB-level guard (or select_for_update when checking the FK) is needed.


4. asuggest_memory_update accepts user_id as an LLM-controlled parameter (core_tools.py:2766)

The user_id is passed as a tool argument, meaning an adversarial or confused LLM could write memory attributed to any user. This is an IDOR risk — the user context should come from the agent's runtime context (same pattern used by every other write tool in the codebase), not from the tool arguments. Remove user_id from the signature and resolve the user from the tool's ToolContext/agent context.


Medium Priority

5. DRY violation: memory injection block duplicated verbatim in agent_factory.py

The 20-line memory injection block at lines 190–209 and 387–406 is identical. Extract it to a helper:

async def _inject_corpus_memory(corpus_obj, config) -> None:
    ...

This follows the project's DRY guideline and makes future changes safer.


6. MEMORY_CURATION_CHECK_INTERVAL_SECONDS constant defined but not used (base.py:683)

base.py hardcodes 600.0 instead of importing MEMORY_CURATION_CHECK_INTERVAL_SECONDS from opencontractserver/constants/agent_memory.py. This violates the project's "no magic numbers" rule and makes the constant misleading (changing the constant won't affect the actual schedule).

# base.py
from opencontractserver.constants.agent_memory import MEMORY_CURATION_CHECK_INTERVAL_SECONDS
...
"schedule": MEMORY_CURATION_CHECK_INTERVAL_SECONDS,

7. merge_curation_into_memory refinements use unconstrained str.replace (memory.py:300)

result = result.replace(old, new)

If old appears more than once in the document (e.g., a short common phrase), this replaces all occurrences, potentially corrupting unrelated entries. Use a targeted replacement (e.g., replace only the first occurrence: result.replace(old, new, 1)).


8. Prompt injection risk in curation system prompt (memory.py:339)

The full conversation_text is embedded directly into _CURATION_SYSTEM_PROMPT via .format(conversation=conversation_text, ...). A user who knows memory curation is happening could craft their conversation to escape the format string context and override the curation rules. Move conversation_text to the user prompt (where it semantically belongs), not the system prompt. This also makes the stage-1 summary → stage-2 curation pipeline cleaner.


9. check_conversations_for_curation may dispatch duplicate tasks

The query filters memory_curated=False, but a previously dispatched curate_corpus_memory task may still be in the Celery queue without having written memory_curated=True yet. When the periodic task runs again 10 minutes later, it will dispatch the same conversation again. This can cause double-curation and wasteful LLM calls. Consider a "curation in-flight" flag set before dispatching (cleared on task completion/failure), or use apply_async with a Celery lock (e.g., via django-celery-results or Redis lock).


10. Broad exception clause in ToggleCorpusMemory.mutate (corpus_mutations.py:1449)

except (Corpus.DoesNotExist, Exception):

Exception subsumes Corpus.DoesNotExist — the first clause is dead code. More importantly, swallowing all exceptions hides programming errors (e.g., a broken from_global_id call). Narrow this to the expected exceptions:

except (Corpus.DoesNotExist, ValueError):

Minor / Style

11. Memory injection query signal is the system prompt, not the user query

In agent_factory.py:196, query=config.system_prompt or "" passes the static agent instructions as the relevance signal for memory section selection. The user's actual question would be a much better signal, but that may not be available at factory-creation time. At minimum, document this limitation in a comment so future maintainers understand why injection quality degrades for large memory documents.

12. _split_memory_sections handles missing frontmatter silently but test_split_no_sections expects 1 section for plain text

"Just some text." contains no ## header, so re.split(r"(?=^## )", ...) returns the whole string as one element — that's correct, but it means the caller can't tell "one section with no header" from "one real section". The test passes today but the semantics are fragile; worth a comment in the function.

13. Memory document is a first-class corpus document visible in normal document lists

This is mentioned in the PR description as intentional ("visible and editable by users"). However, there's no filtering in place to prevent the memory document from appearing in list_documents agent tool results or showing up in semantic search. Consider whether aget_corpus_memory being the explicit read path means the document should be tagged/filtered in other contexts.

14. No maximum size enforcement on the memory document

merge_curation_into_memory appends indefinitely. After many curation runs, the document could grow beyond MEMORY_FULL_INJECTION_MAX_TOKENS and the keyword-scoring fallback would kick in permanently. Consider adding a total-size cap with an LLM-based consolidation pass, or at least logging a warning when the document exceeds the threshold.


Test Coverage

The pure-function unit tests are solid. A few gaps worth noting:

  • No test for the concurrent write race condition (even a simple test with threading.Thread would document the known limitation)
  • TestCurationEligibility.test_short_conversation_ineligible doesn't actually assert the curation task skips the conversation — it only verifies message count. Consider adding an async test that calls _curate_corpus_memory_async directly and asserts {"status": "skipped", "reason": "too_few_messages"}
  • No test for the ToggleCorpusMemory GraphQL mutation
  • No test for check_conversations_for_curation (the periodic dispatcher)

Overall this is a well-thought-out feature. The blocking items (#1, #2, #3, #4) need to be addressed before merge. The DRY (#5) and magic-number (#6) issues are also straightforward fixes.

JSv4 added 3 commits April 7, 2026 02:23
RateLimits has no MUTATIONS category. Use WRITE_LIGHT (30/m) which
is appropriate for a simple toggle mutation, and pass it via the
rate= keyword argument as the decorator expects.
The Conversation model's ChatMessage FK uses related_name="chat_messages",
not "messages" (which is the related_name from ChatMessage to
AgentConfiguration). Fixed in both the curation task and the test.
- Fix #1: Replace deprecated asyncio.get_event_loop().run_until_complete()
  with asyncio.run() in Celery task
- Fix #2: Add select_for_update on memory document row in
  update_memory_content to serialise concurrent writers
- Fix #3: Use select_for_update + transaction.atomic in
  get_or_create_memory_document to prevent orphan documents
- Fix #4: Add user_id to build_inject_params_for_context so the LLM
  cannot control which user is attributed to memory updates
- Fix #5: Extract duplicated memory injection block into
  _inject_corpus_memory helper in agent_factory.py
- Fix #6: Use MEMORY_CURATION_CHECK_INTERVAL_SECONDS constant in
  base.py Celery beat schedule instead of magic number 600.0
- Fix #7: Use str.replace(old, new, 1) for refinements to avoid
  corrupting unrelated entries with short common phrases
- Fix #8: Move conversation_text and current_memory from system prompt
  to user prompt in curation prompts to mitigate prompt injection
- Fix #9: Atomic claim via UPDATE...WHERE memory_curated=False before
  LLM calls, with release on error for retry support
- Fix #10: Narrow broad Exception clause to (DoesNotExist, ValueError,
  IndexError) in ToggleCorpusMemory mutation
- Update tests to match new prompt structure
@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

PR Review: Add per-corpus agent memory system

Overall this is a well-structured feature with good separation of concerns, proper migration handling, and solid test coverage for the pure-logic functions. A few issues warrant attention before merge.


Critical Issues

1. asyncio.run() inside a Celery task (opencontractserver/tasks/memory_tasks.py:1207)

def curate_corpus_memory(self, conversation_id: int) -> dict:
    return asyncio.run(_curate_corpus_memory_async(conversation_id))

asyncio.run() creates a new event loop. If the Celery worker uses gevent or eventlet concurrency (both common), this will deadlock or raise. The codebase already uses asgiref.sync.async_to_sync elsewhere — that handles the event loop correctly. Switch to async_to_sync(_curate_corpus_memory_async)(conversation_id).

2. Short-conversation claim is never released (memory_tasks.py:1249-1268)

# Atomically claim — sets memory_curated=True
updated = await database_sync_to_async(...)()
...
if len(messages) < MEMORY_CURATION_MIN_MESSAGES:
    return {"status": "skipped", "reason": "too_few_messages"}  # claim NOT released

The conversation is permanently marked memory_curated=True even though nothing was curated. A conversation that starts short but grows later will never be re-eligible. The message count check should happen before the atomic claim, or _release_claim() should be called in this branch.

3. select_for_update lock is dropped immediately (opencontractserver/agents/memory.py:367)

DocumentModel.objects.select_for_update().filter(pk=doc.pk).exists()
doc.txt_extract_file.save(...)  # lock is NOT held here

exists() evaluates the queryset and releases the lock. The subsequent .save() runs unlocked. This provides no protection against concurrent writers. To actually serialize writes, the lock must be held through the mutation in the same queryset or you need to hold the DB connection within the same atomic() block using select_for_update() and then call save() on the retrieved instance.


Important Issues

4. Privacy: raw conversation text is sent to an external LLM (memory_tasks.py:1312-1316)

The "privacy firewall" is implemented as an LLM system prompt instruction — there is no code-level guarantee that personal data is not forwarded. The raw conversation text (including all user messages) is sent to gpt-4o via the OpenAI API. This is a significant privacy implication that should be called out explicitly in the ToggleCorpusMemory mutation's warning text, not just "patterns may be stored."

5. Hardcoded OpenAI provider in curation task (memory_tasks.py:1298)

model_name = getattr(settings, "OPENAI_MODEL", "gpt-4o")

The rest of the codebase abstracts provider selection. Hardcoding OpenAI here means the curation task silently fails (or sends data to the wrong provider) for deployments using Anthropic or other backends. Use the same model resolution logic as UnifiedAgentFactory.

6. No user permission check in memory tools (core_tools.py:929-950, 953-1019)

aget_corpus_memory checks corpus.memory_enabled but does not verify whether the calling user has read permission on the corpus. asuggest_memory_update injects a user_id but never checks that this user has write permission on the corpus. Any agent with a valid corpus_id can read/write corpus memory regardless of permissions.

7. No limit on check_conversations_for_curation dispatch (memory_tasks.py:1415-1432)

eligible = Conversation.objects.filter(...).values_list("id", flat=True)
for conv_id in eligible:
    curate_corpus_memory.delay(conv_id)

On a large installation this could dispatch thousands of tasks in a single beat tick. Add a [:500] (or configurable) limit and order deterministically so that on the next tick the remainder is processed.


Minor Issues

8. MEMORY_DOCUMENT_SLUG_SUFFIX is defined but never used (constants/agent_memory.py:653)

Dead constant — either use it or remove it.

9. Hardcoded placeholder strings in get_memory_for_injection (memory.py:429-431)

if (
    "_No collection patterns recorded yet._" in body
    and "_No query patterns recorded yet._" in body
):

These duplicate the template strings inline. Extract them as constants (e.g., MEMORY_EMPTY_COLLECTION_PLACEHOLDER) so a template change doesn't silently break the check.

10. _split_memory_sections test false positive (test_agent_memory.py:1534)

test_split_no_sections asserts len(sections) == 1 for content that has no ## headers after stripping frontmatter. The body is "Just some text."_split_memory_sections returns ["Just some text."], which passes only because re.split on a non-matching pattern returns the entire string as one element. This is correct behaviour but the test would be more meaningful with an assertion that the single "section" has no heading.

11. Missing test coverage

  • aget_corpus_memory and asuggest_memory_update tools (their async tool behaviour, not just the underlying sync logic)
  • check_conversations_for_curation periodic task (dispatch counting, cutoff filtering)
  • ToggleCorpusMemory GraphQL mutation (permission denial, enable/disable round-trip)
  • _inject_corpus_memory in agent_factory.py (verify memory is appended to system prompt and that failure is silently swallowed)

Nits

  • _inject_corpus_memory docstring correctly notes that the system prompt is a weak relevance signal (the user's message would be better). Consider passing an optional user_message parameter through the factory and logging a debug note when falling back to the prompt.
  • merge_curation_into_memory uses str.replace(old, new, 1) for refinements. If the same text appears in the frontmatter or another section by coincidence the wrong occurrence is replaced. A regex anchored to the list item format would be more robust.
  • The _release_claim helper closes over conversation.pk fine, but if the Celery task is retried after the process crashes (not a graceful return), the claim is never released. Consider a dedicated memory_curated_at timestamp + cron cleanup instead of relying purely on the boolean.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 39.22414% with 141 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
opencontractserver/agents/memory.py 43.39% 90 Missing ⚠️
opencontractserver/llms/tools/core_tools.py 4.54% 42 Missing ⚠️
opencontractserver/llms/agents/agent_factory.py 25.00% 9 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants