nkanu17 commented Dec 26, 2025

Summary

Issue #105

This PR replaces the custom LLM wrapper classes in llms.py with a unified LLMClient abstraction layer backed by LiteLLM.

Changes

New:

  • agent_memory_server/llm/ - Unified LLM client package:
    • llm/client.py - LLMClient class with:
      • LLMClient.create_chat_completion() - Async chat completions
      • LLMClient.create_embedding() - Async embeddings
      • LLMClient.get_model_config() - Model metadata resolution
      • LLMClient.optimize_query() - Query optimization for vector search
    • llm/types.py - ChatCompletionResponse, EmbeddingResponse dataclasses, LLMBackend Protocol (sketched after this list)
    • llm/exceptions.py - LLMClientError, ModelValidationError, APIKeyMissingError
    • llm/__init__.py - Re-exports for clean imports
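
For orientation, the shapes in llm/types.py look roughly like this. This is a sketch reconstructed from the names above, not the actual source; field names other than content are assumptions.

from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class ChatCompletionResponse:
    content: str           # assistant text; this is the `response.content` used in the example below
    model: str = ""        # assumed field
    total_tokens: int = 0  # assumed field


@dataclass
class EmbeddingResponse:
    embeddings: list[list[float]]  # one vector per input text (assumed field name)
    model: str = ""                # assumed field
    total_tokens: int = 0          # assumed field


class LLMBackend(Protocol):
    """Contract that LLMClient delegates to; tests can inject a fake implementation."""

    async def create_chat_completion(
        self, model: str, messages: list[dict[str, Any]], **kwargs: Any
    ) -> ChatCompletionResponse: ...

    async def create_embedding(
        self, model: str, input_texts: list[str], **kwargs: Any
    ) -> EmbeddingResponse: ...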

Removed:

  • agent_memory_server/llms.py - Deleted (replaced by llm/ package)

Updated:

  • main.py - Use LLMClient.get_model_config() for startup validation
  • extraction.py - Use LLMClient.create_chat_completion()
  • summarization.py - Use LLMClient.create_chat_completion()
  • memory_strategies.py - Use LLMClient.create_chat_completion()
  • long_term_memory.py - Use LLMClient.create_embedding()
  • api.py - Use LLMClient.get_model_config()
  • All affected test files updated

Testing

  • All 486 tests pass
  • Added tests/test_llm_client.py with unit tests for the new client
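
For illustration, a unit test against the new client can swap the LiteLLM backend for a fake via the set_backend()/reset() hooks shown in the diagram below. This is a sketch assuming pytest-asyncio, the response fields sketched above, and that the package __init__ re-exports ChatCompletionResponse; it is not the actual fixture from conftest.py.

import pytest

from agent_memory_server.llm import ChatCompletionResponse, LLMClient


class FakeBackend:
    # Canned backend standing in for LiteLLM; illustrative, not the real conftest fixture.
    async def create_chat_completion(self, model, messages, **kwargs):
        return ChatCompletionResponse(content="stubbed reply", model=model)


@pytest.mark.asyncio
async def test_chat_completion_uses_injected_backend():
    LLMClient.set_backend(FakeBackend())
    try:
        response = await LLMClient.create_chat_completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "hello"}],
        )
        assert response.content == "stubbed reply"
    finally:
        LLMClient.reset()  # restore the default LiteLLM-backed behavior

Injecting a backend object rather than patching litellm internals keeps the tests coupled to the public contract instead of the underlying library.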

Example

# Before
from agent_memory_server.llms import get_model_client, ChatResponse
client = await get_model_client(model_name)
response = await client.create_chat_completion(model=model_name, prompt=messages)
text = response.choices[0].message.content

# After
from agent_memory_server.llm import LLMClient
response = await LLMClient.create_chat_completion(model=model_name, messages=messages)
text = response.content
┌─────────────────────────────────────────────────────────────────────────────┐
│                              CALL SITES                                     │
│  extraction.py | summarization.py | memory_strategies.py | api.py | main.py│
│                                                                             │
│     await LLMClient.create_chat_completion(model, messages, ...)            │
│     await LLMClient.create_embedding(model, input_texts, ...)               │
│     LLMClient.get_model_config(model_name)                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            LLMClient                                        │
│                   (Facade - Public API, Never Changes)                      │
│                                                                             │
│  create_chat_completion() → ChatCompletionResponse                          │
│  create_embedding()       → EmbeddingResponse                               │
│  get_model_config()       → ModelConfig                                     │
│  set_backend() / reset()  → For testing                                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     Internal: LiteLLM (hidden, swappable)                   │
│                                                                             │
│  acompletion() | aembedding() | get_model_info()                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
                                    (LiteLLM)
┌─────────────────────────────────────────────────────────────────────────────┐
│  OpenAI │ Anthropic │ AWS Bedrock │ Gemini │ Ollama │ Huggingface │ + more  │
└─────────────────────────────────────────────────────────────────────────────┘
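
For completeness, the other two call patterns from the diagram look roughly like this when used inside an async context. The embeddings field name and the exact get_model_config() return shape are assumptions.

from agent_memory_server.llm import LLMClient

# Embeddings: one vector per input text
emb = await LLMClient.create_embedding(
    model="text-embedding-3-small",
    input_texts=["remember the user's favorite color", "the user lives in Berlin"],
)
vectors = emb.embeddings  # assumed field; list[list[float]]

# Model metadata resolution (used for startup validation in main.py and limits in api.py)
config = LLMClient.get_model_config("gpt-4o-mini")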

Future Work

Gateway connections

  • Any sort of gateway connection can be handled by the LiteLLM/LLMClient combination; see the sketch below.
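
As a rough sketch of what that could look like: LiteLLM accepts an api_base override, so pointing a call at an OpenAI-compatible gateway might be as simple as the following. The gateway URL and key handling are placeholders, and whether LLMClient forwards these kwargs to LiteLLM is an open question.

import litellm

# Inside an async context: route a completion through a hypothetical OpenAI-compatible gateway
response = await litellm.acompletion(
    model="openai/gpt-4o-mini",                               # provider prefix selects the adapter
    messages=[{"role": "user", "content": "ping"}],
    api_base="https://llm-gateway.internal.example.com/v1",   # hypothetical gateway endpoint
    api_key="<gateway key>",                                  # placeholder; normally read from env
)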

Embeddings Consolidation

Context

During this refactoring, I identified that embeddings are NOT yet consolidated through LLMClient:

Operation            | Current Implementation                                              | Target
---------------------|---------------------------------------------------------------------|------------------------------
Chat completions     | LLMClient.create_chat_completion()                                  |
Embeddings (server)  | langchain_openai.OpenAIEmbeddings, langchain_aws.BedrockEmbeddings  | LLMClient.create_embedding()

Why Not Included in This PR

  1. LangChain interface requirement - Vector stores (langchain_redis.RedisVectorStore) require the LangChain Embeddings interface, not raw embedding functions
  2. Additional wrapper needed - Requires creating LLMClientEmbeddings class that implements LangChain's Embeddings ABC
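
For context, the interface in question boils down to two synchronous methods. This is a simplified view of langchain_core.embeddings.Embeddings; the real class also provides async aembed_* variants with default implementations.

from abc import ABC, abstractmethod

# Simplified view of langchain_core.embeddings.Embeddings
class Embeddings(ABC):
    @abstractmethod
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def embed_query(self, text: str) -> list[float]: ...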

Future Implementation

See dev_docs/embeddings_consolidation_plan.md for the full plan. Summary:

  1. Create LLMClientEmbeddings(Embeddings) wrapper in agent_memory_server/llm/embeddings.py (sketched after this list)
  2. Wrapper calls LLMClient.create_embedding() internally
  3. Update vectorstore_factory.py to use LLMClientEmbeddings instead of direct LangChain imports
  4. Remove langchain_openai.OpenAIEmbeddings and langchain_aws.BedrockEmbeddings imports
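
A rough sketch of steps 1 and 2 under the plan's assumptions; the asyncio bridging and the exact create_embedding() call shape are illustrative, not final.

import asyncio

from langchain_core.embeddings import Embeddings

from agent_memory_server.llm import LLMClient


class LLMClientEmbeddings(Embeddings):
    """LangChain Embeddings adapter that delegates to LLMClient.create_embedding()."""

    def __init__(self, model: str):
        self.model = model

    async def aembed_documents(self, texts: list[str]) -> list[list[float]]:
        response = await LLMClient.create_embedding(model=self.model, input_texts=texts)
        return response.embeddings  # assumed field on EmbeddingResponse

    async def aembed_query(self, text: str) -> list[float]:
        return (await self.aembed_documents([text]))[0]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Sync entry points are required by the Embeddings ABC; naive bridge,
        # only safe when no event loop is already running.
        return asyncio.run(self.aembed_documents(texts))

    def embed_query(self, text: str) -> list[float]:
        return asyncio.run(self.aembed_query(text))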

Decision

Defer to separate PR - Keep this PR focused on the core abstraction. Embeddings consolidation is a follow-up task that can be done independently.

- Create unified LLMClient class with static methods for all LLM operations
- Add async create_chat_completion() using litellm.acompletion()
- Add async create_embedding() using litellm.aembedding()
- Add get_model_config() with fallback chain: MODEL_CONFIGS → LiteLLM → defaults
- Add optimize_query() for vector search query optimization
- Define standardized ChatCompletionResponse and EmbeddingResponse dataclasses
- Add LLMBackend Protocol for test injection
- Add custom exception hierarchy:
  - LLMClientError (base)
  - ModelValidationError
  - APIKeyMissingError
- Remove llms.py module (replaced by llm_client.py)
- Update api.py to use LLMClient.get_model_config()
- Update extraction.py to use LLMClient.create_chat_completion()
- Update long_term_memory.py to use LLMClient
- Update memory_strategies.py to use LLMClient
- Update summarization.py to use LLMClient

BREAKING CHANGE: llms.py module removed, use llm_client.py instead
- Use LLMClient.get_model_config() for model resolution
- Add custom exception handling with ModelValidationError and APIKeyMissingError
- Validate API keys based on resolved provider
- Improve error messages with abstracted terminology
- Add test_llm_client.py with unit tests for LLMClient
- Update conftest.py with LLMClient mock fixtures
- Update test_llms.py imports for compatibility
- Update test_api.py imports
- Update test_contextual_grounding.py imports
- Update test_contextual_grounding_integration.py imports
- Update test_llm_judge_evaluation.py imports
- Update test_long_term_memory.py imports
- Update test_memory_compaction.py imports
- Update test_memory_strategies.py imports
- Update test_no_worker_mode.py imports
- Update test_query_optimization_errors.py imports
- Update test_summarization.py imports
- Update pyproject.toml with any new dependencies
- Sync uv.lock file
- Add LLMClient documentation for AI assistant context
- Update project structure references
- Exclude dev_docs directory from version control
- Keep development documentation local only

Split the monolithic llm_client.py into a proper package structure:
- llm/exceptions.py: LLMClientError, ModelValidationError, APIKeyMissingError
- llm/types.py: ChatCompletionResponse, EmbeddingResponse, LLMBackend Protocol
- llm/client.py: LLMClient class with all methods
- llm/__init__.py: Re-exports for clean imports

This improves code organization and makes it easier to:
- Find specific components (exceptions, types, client)
- Test individual modules
- Extend the package with new backends

Update all application code imports from:
  from agent_memory_server.llm_client import ...
to:
  from agent_memory_server.llm import ...

Files updated:
- api.py
- extraction.py
- long_term_memory.py
- main.py
- memory_strategies.py
- summarization.py

Update all test file imports from:
  from agent_memory_server.llm_client import ...
to:
  from agent_memory_server.llm import ...

Update mock patch paths from:
  agent_memory_server.llm_client.LLMClient
to:
  agent_memory_server.llm.client.LLMClient

Also removed duplicate mock_llm_client fixture in conftest.py.