feat: propagate bank_id as user field in LLM calls#425

Open
AlexanderZaytsev wants to merge 1 commit into vectorize-io:main from AlexanderZaytsev:feat/bank-id-in-llm-calls

Conversation

@AlexanderZaytsev

Problem

When Hindsight is deployed behind an LLM proxy (e.g. for cost tracking or rate limiting), the proxy has no way to know which memory bank triggered each LLM call. This makes it impossible to attribute LLM costs to specific banks/tenants.

Solution

Add an llm_user context variable (contextvars.ContextVar) that propagates into the OpenAI-compatible LLM provider as the standard user field in API requests.

Changes (2 files)

engine/providers/openai_compatible_llm.py

  • Define llm_user context variable
  • Inject as user field in both call() and call_with_tools() when set

engine/memory_engine.py

  • Set llm_user in retain_batch_async() — covers extraction LLM calls
  • Set llm_user in reflect_async() — covers reflection LLM calls
  • Set llm_user in execute_task() — covers all background worker tasks (consolidation, mental model refresh, etc.)
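A minimal sketch of both sides of the change (module and function names here are illustrative assumptions, not the PR's actual code):

```python
import contextvars
from typing import Any, Optional

# Context variable read by the LLM provider; defaults to None so that
# nothing is injected when no entry point has set it.
llm_user: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "llm_user", default=None
)

def build_request_kwargs(messages: list[dict[str, Any]], model: str) -> dict[str, Any]:
    """Assemble kwargs for an OpenAI-compatible chat completion call."""
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    user = llm_user.get()
    if user is not None:
        # Standard OpenAI "user" field; a proxy can read it for attribution.
        kwargs["user"] = user
    return kwargs

# Entry-point side: set the context var from the bank id before any LLM
# calls, and reset it on the way out so the value never leaks.
def retain_batch(bank_id: str) -> dict[str, Any]:
    token = llm_user.set(bank_id)
    try:
        return build_request_kwargs([{"role": "user", "content": "hi"}], "gpt-4o-mini")
    finally:
        llm_user.reset(token)
```

The `try`/`finally` with `reset(token)` mirrors the entry-point pattern: the variable is scoped to one request rather than left set on the context.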

How it works

HTTP request (retain/reflect)    → MemoryEngine method sets llm_user from bank_id
Background task (consolidation)  → execute_task() sets llm_user from task dict
                                   ↓
                          OpenAICompatibleLLM reads llm_user
                                   ↓
                          Injects as "user" field in API request
                                   ↓
                          Proxy reads "user" → attributes cost

Why contextvars?

bank_id is available at the entry points but would need to be threaded through many intermediate functions (_extract_facts_from_chunk, _consolidate_batch, run_reflect_agent, etc.) to reach the LLM provider. contextvars provides clean propagation without touching any function signatures, and is async-safe — concurrent requests for different banks don't interfere.
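The async-safety claim is checkable directly: each asyncio task runs in its own copy of the context, so concurrent requests for different banks each see their own `llm_user` value (variable names here are illustrative):

```python
import asyncio
import contextvars
from typing import Optional

llm_user: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "llm_user", default=None
)

async def handle_request(bank_id: str) -> Optional[str]:
    # Each task created by gather() gets its own context copy,
    # so this set() is invisible to the other tasks.
    llm_user.set(bank_id)
    await asyncio.sleep(0.01)  # yield so the tasks interleave
    return llm_user.get()      # still this task's bank_id

async def main() -> list:
    return await asyncio.gather(*(handle_request(f"bank-{i}") for i in range(3)))

results = asyncio.run(main())  # each task reads back its own value
```

A plain module-level global would fail this test: the last writer would win and all three tasks would report the same bank.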

No behavior change when not using a proxy

The user field is simply ignored by providers when no proxy is involved. The feature is zero-cost when not used — llm_user defaults to None and no field is injected.

… attribution

When Hindsight is deployed behind an LLM proxy, the proxy needs to know
which caller triggered each LLM call for cost tracking and attribution.

This adds an `llm_user` context variable (Python contextvars) that gets
injected as the standard `user` field in OpenAI API requests.

The context is set in three MemoryEngine entry points:
- retain_batch_async: covers extraction LLM calls
- reflect_async: covers reflection LLM calls
- execute_task: covers all background worker tasks (consolidation, mental
  model refresh, etc.)

The `user` field is part of the OpenAI API spec and is passed through by
all major providers. Proxies can read this field to attribute costs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@nicoloboschi nicoloboschi left a comment


`user` is deprecated in the OpenAI chat completions API:

https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create

we should use `safety_identifier`

what gateway are you using?

@AlexanderZaytsev
Author

> user is deprecated in OpenAI Completion
>
> https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create
>
> we should use safety_identifier
>
> what gateway are you using?

I'm using my own gateway (sort of like LiteLLM) where each agent has its own API key. Another idea is to make Hindsight also accept API keys via headers for each request (X-LLM-API-Key, X-Embeddings-API-Key, etc.). But there has to be a way to map all requests to each agent.

@nicoloboschi
Collaborator

> > user is deprecated in OpenAI Completion
> > https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create
> > we should use safety_identifier
> > what gateway are you using?
>
> I'm using my own gateway (sort of like LiteLLM) where each agent has its own api key. Another idea is to make hindsight also accept api keys via headers for each request (X-LLM-API-Key, X-Embeddings-API-Key, etc). But there has to be a way to map all requests to each agent.

I'm fine adding this header, as long as it doesn't cause any problems on other LLM providers. Since `user` is deprecated, I was wondering if `safety_identifier` was a better fit.
Looks like the LiteLLM gateway supports both: https://docs.litellm.ai/docs/completion/input#optional-fields
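One low-risk way to accommodate `safety_identifier` without breaking providers that only understand `user` would be to make the injected field name configurable, defaulting to the current behavior. A sketch (the `ATTRIBUTION_FIELD` setting and `inject_attribution` helper are hypothetical, not part of the PR):

```python
import contextvars
from typing import Any, Optional

llm_user: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "llm_user", default=None
)

# Hypothetical setting: which request field carries the caller identity.
# "user" keeps today's behavior; "safety_identifier" matches the newer
# OpenAI chat-completions parameter.
ATTRIBUTION_FIELD = "user"

def inject_attribution(kwargs: dict[str, Any],
                       field: str = ATTRIBUTION_FIELD) -> dict[str, Any]:
    """Add the caller identity to request kwargs under the configured field."""
    value = llm_user.get()
    if value is not None:
        kwargs[field] = value
    return kwargs
```

With this shape, switching a deployment to `safety_identifier` is a config change rather than a code change, and gateways that accept both (e.g. LiteLLM, per the link above) work either way.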
