nkanu17 commented Dec 26, 2025

Summary

Issue #105

This PR replaces the custom LLM wrapper classes in llms.py with a unified LLMClient abstraction layer backed by LiteLLM.

Changes

New:

  • agent_memory_server/llm/ - Unified LLM client package:
    • llm/client.py - LLMClient class with:
      • LLMClient.create_chat_completion() - Async chat completions
      • LLMClient.create_embedding() - Async embeddings
      • LLMClient.get_model_config() - Model metadata resolution
      • LLMClient.optimize_query() - Query optimization for vector search
    • llm/types.py - ChatCompletionResponse, EmbeddingResponse dataclasses, LLMBackend Protocol (sketched after this list)
    • llm/exceptions.py - LLMClientError, ModelValidationError, APIKeyMissingError
    • llm/__init__.py - Re-exports for clean imports
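
For orientation, the shapes in llm/types.py look roughly like this. This is a sketch reconstructed from the names above, not the actual source; field names other than content are assumptions.

from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class ChatCompletionResponse:
    content: str           # assistant text; this is the `response.content` used in the example below
    model: str = ""        # assumed field
    total_tokens: int = 0  # assumed field


@dataclass
class EmbeddingResponse:
    embeddings: list[list[float]]  # one vector per input text (assumed field name)
    model: str = ""                # assumed field
    total_tokens: int = 0          # assumed field


class LLMBackend(Protocol):
    """Contract that LLMClient delegates to; tests can inject a fake implementation."""

    async def create_chat_completion(
        self, model: str, messages: list[dict[str, Any]], **kwargs: Any
    ) -> ChatCompletionResponse: ...

    async def create_embedding(
        self, model: str, input_texts: list[str], **kwargs: Any
    ) -> EmbeddingResponse: ...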

Removed:

  • agent_memory_server/llms.py - Deleted (replaced by llm/ package)

Updated:

  • main.py - Use LLMClient.get_model_config() for startup validation
  • extraction.py - Use LLMClient.create_chat_completion()
  • summarization.py - Use LLMClient.create_chat_completion()
  • memory_strategies.py - Use LLMClient.create_chat_completion()
  • long_term_memory.py - Use LLMClient.create_embedding()
  • api.py - Use LLMClient.get_model_config()
  • All affected test files updated

Testing

  • All 486 tests pass
  • Added tests/test_llm_client.py with unit tests for the new client
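
For illustration, a unit test against the new client can swap the LiteLLM backend for a fake via the set_backend()/reset() hooks shown in the diagram below. This is a sketch assuming pytest-asyncio, the response fields sketched above, and that the package __init__ re-exports ChatCompletionResponse; it is not the actual fixture from conftest.py.

import pytest

from agent_memory_server.llm import ChatCompletionResponse, LLMClient


class FakeBackend:
    # Canned backend standing in for LiteLLM; illustrative, not the real conftest fixture.
    async def create_chat_completion(self, model, messages, **kwargs):
        return ChatCompletionResponse(content="stubbed reply", model=model)


@pytest.mark.asyncio
async def test_chat_completion_uses_injected_backend():
    LLMClient.set_backend(FakeBackend())
    try:
        response = await LLMClient.create_chat_completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "hello"}],
        )
        assert response.content == "stubbed reply"
    finally:
        LLMClient.reset()  # restore the default LiteLLM-backed behavior

Injecting a backend object rather than patching litellm internals keeps the tests coupled to the public contract instead of the underlying library.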

Example

# Before
from agent_memory_server.llms import get_model_client, ChatResponse
client = await get_model_client(model_name)
response = await client.create_chat_completion(model=model_name, prompt=messages)
text = response.choices[0].message.content

# After
from agent_memory_server.llm import LLMClient
response = await LLMClient.create_chat_completion(model=model_name, messages=messages)
text = response.content
┌─────────────────────────────────────────────────────────────────────────────┐
│                              CALL SITES                                     │
│  extraction.py | summarization.py | memory_strategies.py | api.py | main.py│
│                                                                             │
│     await LLMClient.create_chat_completion(model, messages, ...)            │
│     await LLMClient.create_embedding(model, input_texts, ...)               │
│     LLMClient.get_model_config(model_name)                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            LLMClient                                        │
│                   (Facade - Public API, Never Changes)                      │
│                                                                             │
│  create_chat_completion() → ChatCompletionResponse                          │
│  create_embedding()       → EmbeddingResponse                               │
│  get_model_config()       → ModelConfig                                     │
│  set_backend() / reset()  → For testing                                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     Internal: LiteLLM (hidden, swappable)                   │
│                                                                             │
│  acompletion() | aembedding() | get_model_info()                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
                                    (LiteLLM)
┌─────────────────────────────────────────────────────────────────────────────┐
│  OpenAI │ Anthropic │ AWS Bedrock │ Gemini │ Ollama │ Huggingface │ + more  │
└─────────────────────────────────────────────────────────────────────────────┘
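
For completeness, the other two call patterns from the diagram look roughly like this when used inside an async context. The embeddings field name and the exact get_model_config() return shape are assumptions.

from agent_memory_server.llm import LLMClient

# Embeddings: one vector per input text
emb = await LLMClient.create_embedding(
    model="text-embedding-3-small",
    input_texts=["remember the user's favorite color", "the user lives in Berlin"],
)
vectors = emb.embeddings  # assumed field; list[list[float]]

# Model metadata resolution (used for startup validation in main.py and limits in api.py)
config = LLMClient.get_model_config("gpt-4o-mini")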

Future Work

Gateway connections

  • Any sort of gateway connection can be handled by the LiteLLM/LLMClient combination; see the sketch below.
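
As a rough sketch of what that could look like: LiteLLM accepts an api_base override, so pointing a call at an OpenAI-compatible gateway might be as simple as the following. The gateway URL and key handling are placeholders, and whether LLMClient forwards these kwargs to LiteLLM is an open question.

import litellm

# Inside an async context: route a completion through a hypothetical OpenAI-compatible gateway
response = await litellm.acompletion(
    model="openai/gpt-4o-mini",                               # provider prefix selects the adapter
    messages=[{"role": "user", "content": "ping"}],
    api_base="https://llm-gateway.internal.example.com/v1",   # hypothetical gateway endpoint
    api_key="<gateway key>",                                  # placeholder; normally read from env
)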

Embeddings Consolidation

Context

During this refactoring, I identified that embeddings are NOT yet consolidated through LLMClient:

Operation            | Current Implementation                                              | Target
---------------------|---------------------------------------------------------------------|------------------------------
Chat completions     | LLMClient.create_chat_completion()                                  |
Embeddings (server)  | langchain_openai.OpenAIEmbeddings, langchain_aws.BedrockEmbeddings  | LLMClient.create_embedding()

Why Not Included in This PR

  1. LangChain interface requirement - Vector stores (langchain_redis.RedisVectorStore) require the LangChain Embeddings interface, not raw embedding functions
  2. Additional wrapper needed - Requires creating LLMClientEmbeddings class that implements LangChain's Embeddings ABC
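
For context, the interface in question boils down to two synchronous methods. This is a simplified view of langchain_core.embeddings.Embeddings; the real class also provides async aembed_* variants with default implementations.

from abc import ABC, abstractmethod

# Simplified view of langchain_core.embeddings.Embeddings
class Embeddings(ABC):
    @abstractmethod
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def embed_query(self, text: str) -> list[float]: ...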

Future Implementation

See dev_docs/embeddings_consolidation_plan.md for the full plan. Summary:

  1. Create LLMClientEmbeddings(Embeddings) wrapper in agent_memory_server/llm/embeddings.py (sketched after this list)
  2. Wrapper calls LLMClient.create_embedding() internally
  3. Update vectorstore_factory.py to use LLMClientEmbeddings instead of direct LangChain imports
  4. Remove langchain_openai.OpenAIEmbeddings and langchain_aws.BedrockEmbeddings imports
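
A rough sketch of steps 1 and 2 under the plan's assumptions; the asyncio bridging and the exact create_embedding() call shape are illustrative, not final.

import asyncio

from langchain_core.embeddings import Embeddings

from agent_memory_server.llm import LLMClient


class LLMClientEmbeddings(Embeddings):
    """LangChain Embeddings adapter that delegates to LLMClient.create_embedding()."""

    def __init__(self, model: str):
        self.model = model

    async def aembed_documents(self, texts: list[str]) -> list[list[float]]:
        response = await LLMClient.create_embedding(model=self.model, input_texts=texts)
        return response.embeddings  # assumed field on EmbeddingResponse

    async def aembed_query(self, text: str) -> list[float]:
        return (await self.aembed_documents([text]))[0]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Sync entry points are required by the Embeddings ABC; naive bridge,
        # only safe when no event loop is already running.
        return asyncio.run(self.aembed_documents(texts))

    def embed_query(self, text: str) -> list[float]:
        return asyncio.run(self.aembed_query(text))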

Decision

Defer to separate PR - Keep this PR focused on the core abstraction. Embeddings consolidation is a follow-up task that can be done independently.

- Create unified LLMClient class with static methods for all LLM operations
- Add async create_chat_completion() using litellm.acompletion()
- Add async create_embedding() using litellm.aembedding()
- Add get_model_config() with fallback chain: MODEL_CONFIGS → LiteLLM → defaults
- Add optimize_query() for vector search query optimization
- Define standardized ChatCompletionResponse and EmbeddingResponse dataclasses
- Add LLMBackend Protocol for test injection
- Add custom exception hierarchy:
  - LLMClientError (base)
  - ModelValidationError
  - APIKeyMissingError
- Remove llms.py module (replaced by llm_client.py)
- Update api.py to use LLMClient.get_model_config()
- Update extraction.py to use LLMClient.create_chat_completion()
- Update long_term_memory.py to use LLMClient
- Update memory_strategies.py to use LLMClient
- Update summarization.py to use LLMClient

BREAKING CHANGE: llms.py module removed, use llm_client.py instead
- Use LLMClient.get_model_config() for model resolution
- Add custom exception handling with ModelValidationError and APIKeyMissingError
- Validate API keys based on resolved provider
- Improve error messages with abstracted terminology
- Add test_llm_client.py with unit tests for LLMClient
- Update conftest.py with LLMClient mock fixtures
- Update test_llms.py imports for compatibility
- Update test_api.py imports
- Update test_contextual_grounding.py imports
- Update test_contextual_grounding_integration.py imports
- Update test_llm_judge_evaluation.py imports
- Update test_long_term_memory.py imports
- Update test_memory_compaction.py imports
- Update test_memory_strategies.py imports
- Update test_no_worker_mode.py imports
- Update test_query_optimization_errors.py imports
- Update test_summarization.py imports
- Update pyproject.toml with any new dependencies
- Sync uv.lock file
- Add LLMClient documentation for AI assistant context
- Update project structure references
- Exclude dev_docs directory from version control
- Keep development documentation local only

Split the monolithic llm_client.py into a proper package structure:
- llm/exceptions.py: LLMClientError, ModelValidationError, APIKeyMissingError
- llm/types.py: ChatCompletionResponse, EmbeddingResponse, LLMBackend Protocol
- llm/client.py: LLMClient class with all methods
- llm/__init__.py: Re-exports for clean imports

This improves code organization and makes it easier to:
- Find specific components (exceptions, types, client)
- Test individual modules
- Extend the package with new backends

Update all application code imports from:
  from agent_memory_server.llm_client import ...
to:
  from agent_memory_server.llm import ...

Files updated:
- api.py
- extraction.py
- long_term_memory.py
- main.py
- memory_strategies.py
- summarization.py

Update all test file imports from:
  from agent_memory_server.llm_client import ...
to:
  from agent_memory_server.llm import ...

Update mock patch paths from:
  agent_memory_server.llm_client.LLMClient
to:
  agent_memory_server.llm.client.LLMClient

Also removed duplicate mock_llm_client fixture in conftest.py.