feat: propagate bank_id as user field in LLM calls #425
AlexanderZaytsev wants to merge 1 commit into vectorize-io:main
Conversation
When Hindsight is deployed behind an LLM proxy, the proxy needs to know which caller triggered each LLM call for cost tracking and attribution. This adds an `llm_user` context variable (Python contextvars) that gets injected as the standard `user` field in OpenAI API requests.

The context is set in three MemoryEngine entry points:
- `retain_batch_async`: covers extraction LLM calls
- `reflect_async`: covers reflection LLM calls
- `execute_task`: covers all background worker tasks (consolidation, mental model refresh, etc.)

The `user` field is part of the OpenAI API spec and is passed through by all major providers. Proxies can read this field to attribute costs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nicoloboschi
left a comment
`user` is deprecated in the OpenAI Chat Completions API:
https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create
We should use `safety_identifier` instead.
What gateway are you using?
I'm using my own gateway (sort of like LiteLLM) where each agent has its own API key. Another idea is to make Hindsight also accept API keys via headers for each request.
I'm fine adding this header, as long as it doesn't cause any problems with other LLM providers. Since `user` is deprecated, I was wondering if `safety_identifier` was a better fit.
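The choice the reviewers are weighing could be made per provider. A minimal sketch of that idea, where `supports_safety_identifier` is a hypothetical capability flag (OpenAI's newer API documents `safety_identifier` while deprecating `user`, but other OpenAI-compatible providers may only accept `user`):

```python
def attribution_kwargs(caller: str, supports_safety_identifier: bool) -> dict:
    """Build the extra request kwargs used for cost attribution.

    `supports_safety_identifier` is a hypothetical flag, not part of
    Hindsight's actual configuration.
    """
    if supports_safety_identifier:
        # Newer OpenAI API: `user` is deprecated in favor of this field.
        return {"safety_identifier": caller}
    # Fallback for OpenAI-compatible providers that only know `user`.
    return {"user": caller}
```

The kwargs would then be merged into the request payload wherever the provider builds its API call.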
Problem
When Hindsight is deployed behind an LLM proxy (e.g. for cost tracking or rate limiting), the proxy has no way to know which memory bank triggered each LLM call. This makes it impossible to attribute LLM costs to specific banks/tenants.
Solution
Add an `llm_user` context variable (`contextvars.ContextVar`) that propagates into the OpenAI-compatible LLM provider as the standard `user` field in API requests.
Changes (2 files)
`engine/providers/openai_compatible_llm.py`
- Reads the `llm_user` context variable
- Injects the `user` field in both `call()` and `call_with_tools()` when set

`engine/memory_engine.py`
- Sets `llm_user` in `retain_batch_async()`: covers extraction LLM calls
- Sets `llm_user` in `reflect_async()`: covers reflection LLM calls
- Sets `llm_user` in `execute_task()`: covers all background worker tasks (consolidation, mental model refresh, etc.)
How it works
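A minimal sketch of the pattern (function names like `call_llm` and `retain_batch` are illustrative stand-ins, not the actual Hindsight code): the entry point sets the context variable, and the provider reads it when building the request.

```python
import contextvars

# Defaults to None, so nothing is injected unless an entry point sets it.
llm_user: contextvars.ContextVar = contextvars.ContextVar("llm_user", default=None)

def call_llm(messages, **kwargs):
    """Stand-in for the provider's call(): inject `user` only when set."""
    user = llm_user.get()
    if user is not None:
        kwargs["user"] = user  # becomes the `user` field in the API request
    return {"messages": messages, **kwargs}  # placeholder for the real API call

def retain_batch(bank_id: str, messages):
    """Stand-in entry point: tag all downstream LLM calls with the bank."""
    token = llm_user.set(f"bank:{bank_id}")
    try:
        return call_llm(messages)
    finally:
        llm_user.reset(token)  # never leak the value past the entry point
```

Note that intermediate functions between `retain_batch` and `call_llm` need no extra parameters; the value travels through the context.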
Why contextvars?
`bank_id` is available at the entry points but would need to be threaded through many intermediate functions (`_extract_facts_from_chunk`, `_consolidate_batch`, `run_reflect_agent`, etc.) to reach the LLM provider. `contextvars` provides clean propagation without touching any function signatures, and is async-safe: concurrent requests for different banks don't interfere.
No behavior change when not using a proxy
The `user` field is simply ignored by providers when no proxy is involved. The feature is zero-cost when not used: `llm_user` defaults to `None` and no field is injected.
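The async-safety claim above can be demonstrated with plain stdlib code (a standalone toy, not Hindsight's implementation): each asyncio task gets its own copy of the context, so two concurrent "banks" never see each other's value, and the default stays `None` outside them.

```python
import asyncio
import contextvars

llm_user = contextvars.ContextVar("llm_user", default=None)

async def handle_bank(bank_id: str) -> str:
    # This set() lives in the current task's copy of the context,
    # invisible to the other concurrently running bank.
    llm_user.set(f"bank:{bank_id}")
    await asyncio.sleep(0.01)  # yield control so the two tasks interleave
    return llm_user.get()  # still this task's own value

async def main():
    return await asyncio.gather(handle_bank("a"), handle_bank("b"))

results = asyncio.run(main())
```

With thread-locals or a module-level global, the interleaving above would let one bank's value clobber the other's; contextvars avoids that without any signature changes.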