Feat/llmclient-abstraction-with-litellm #106
Draft: nkanu17 wants to merge 13 commits into redis:main from nkanu17:feat/litellm-llmclient-abstraction
Conversation
- Create unified `LLMClient` class with static methods for all LLM operations
- Add async `create_chat_completion()` using `litellm.acompletion()`
- Add async `create_embedding()` using `litellm.aembedding()`
- Add `get_model_config()` with fallback chain: MODEL_CONFIGS → LiteLLM → defaults
- Add `optimize_query()` for vector search query optimization
- Define standardized `ChatCompletionResponse` and `EmbeddingResponse` dataclasses
- Add `LLMBackend` Protocol for test injection
- Add custom exception hierarchy:
  - `LLMClientError` (base)
  - `ModelValidationError`
  - `APIKeyMissingError`
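A minimal sketch of the surface this commit describes, assuming LiteLLM's OpenAI-style response objects; the dataclass fields and the exact response mapping are illustrative, not the PR's actual code:

```python
from dataclasses import dataclass

import litellm


@dataclass
class ChatCompletionResponse:
    # Standardized shape returned to callers, independent of the provider SDK
    content: str
    model: str
    total_tokens: int


class LLMClient:
    @staticmethod
    async def create_chat_completion(
        model: str, messages: list[dict]
    ) -> ChatCompletionResponse:
        # litellm.acompletion() normalizes every provider behind an
        # OpenAI-compatible response object
        response = await litellm.acompletion(model=model, messages=messages)
        return ChatCompletionResponse(
            content=response.choices[0].message.content or "",
            model=response.model,
            total_tokens=response.usage.total_tokens,
        )
```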
- Remove `llms.py` module (replaced by `llm_client.py`)
- Update `api.py` to use `LLMClient.get_model_config()`
- Update `extraction.py` to use `LLMClient.create_chat_completion()`
- Update `long_term_memory.py` to use `LLMClient`
- Update `memory_strategies.py` to use `LLMClient`
- Update `summarization.py` to use `LLMClient`

BREAKING CHANGE: `llms.py` module removed; use `llm_client.py` instead.
- Use `LLMClient.get_model_config()` for model resolution
- Add custom exception handling with `ModelValidationError` and `APIKeyMissingError`
- Validate API keys based on the resolved provider
- Improve error messages with abstracted terminology
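A hedged sketch of how this handling might look in `api.py` at this stage (the helper name, status codes, and FastAPI usage are assumptions, and `get_model_config()` is shown as synchronous; the import path predates the package split introduced later in this PR):

```python
from fastapi import HTTPException

from agent_memory_server.llm_client import (
    APIKeyMissingError,
    LLMClient,
    ModelValidationError,
)


def resolve_model_or_400(model_name: str):
    # Resolve model metadata, translating client errors into HTTP responses
    try:
        return LLMClient.get_model_config(model_name)
    except ModelValidationError as exc:
        # Unknown or unsupported model name supplied by the caller
        raise HTTPException(status_code=400, detail=str(exc))
    except APIKeyMissingError as exc:
        # Provider resolved, but its API key is not configured server-side
        raise HTTPException(status_code=500, detail=str(exc))
```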
- Add `test_llm_client.py` with unit tests for `LLMClient`
- Update `conftest.py` with `LLMClient` mock fixtures
- Update `test_llms.py` imports for compatibility
- Update `test_api.py` imports
- Update `test_contextual_grounding.py` imports
- Update `test_contextual_grounding_integration.py` imports
- Update `test_llm_judge_evaluation.py` imports
- Update `test_long_term_memory.py` imports
- Update `test_memory_compaction.py` imports
- Update `test_memory_strategies.py` imports
- Update `test_no_worker_mode.py` imports
- Update `test_query_optimization_errors.py` imports
- Update `test_summarization.py` imports
- Update `pyproject.toml` with new dependencies
- Sync `uv.lock` file
- Add `LLMClient` documentation for AI assistant context
- Update project structure references
- Exclude `dev_docs` directory from version control
- Keep development documentation local only
Split the monolithic `llm_client.py` into a proper package structure:

- `llm/exceptions.py`: `LLMClientError`, `ModelValidationError`, `APIKeyMissingError`
- `llm/types.py`: `ChatCompletionResponse`, `EmbeddingResponse`, `LLMBackend` Protocol
- `llm/client.py`: `LLMClient` class with all methods
- `llm/__init__.py`: Re-exports for clean imports

This improves code organization and makes it easier to:

- Find specific components (exceptions, types, client)
- Test individual modules
- Extend the package with new backends
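The re-export module might look like the sketch below; all names come from the commit message, though the exact `__all__` contents are an assumption:

```python
# agent_memory_server/llm/__init__.py
from agent_memory_server.llm.client import LLMClient
from agent_memory_server.llm.exceptions import (
    APIKeyMissingError,
    LLMClientError,
    ModelValidationError,
)
from agent_memory_server.llm.types import (
    ChatCompletionResponse,
    EmbeddingResponse,
    LLMBackend,
)

__all__ = [
    "APIKeyMissingError",
    "ChatCompletionResponse",
    "EmbeddingResponse",
    "LLMBackend",
    "LLMClient",
    "LLMClientError",
    "ModelValidationError",
]
```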
Update all application code imports from `from agent_memory_server.llm_client import ...` to `from agent_memory_server.llm import ...`. Files updated:

- `api.py`
- `extraction.py`
- `long_term_memory.py`
- `main.py`
- `memory_strategies.py`
- `summarization.py`
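For illustration, the change at each call site:

```python
# Before: import from the flat module
from agent_memory_server.llm_client import LLMClient

# After: import from the package's re-exports
from agent_memory_server.llm import LLMClient
```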
Update all test file imports from `from agent_memory_server.llm_client import ...` to `from agent_memory_server.llm import ...`. Update mock patch paths from `agent_memory_server.llm_client.LLMClient` to `agent_memory_server.llm.client.LLMClient`. Also remove the duplicate `mock_llm_client` fixture in `conftest.py`.
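A sketch of the patch-path change in a test (the test body is illustrative):

```python
from unittest.mock import AsyncMock, patch

# Before: patch via the flat module path
#   patch("agent_memory_server.llm_client.LLMClient.create_chat_completion")

# After: patch the class where it is defined, not the package re-export
with patch(
    "agent_memory_server.llm.client.LLMClient.create_chat_completion",
    new_callable=AsyncMock,
) as mock_completion:
    ...  # exercise code that calls LLMClient.create_chat_completion()
```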
Summary
Issue #105
This PR replaces the custom LLM wrapper classes in `llms.py` with a unified `LLMClient` abstraction layer backed by LiteLLM.

Changes
New:
- `agent_memory_server/llm/` - Unified LLM client package:
  - `llm/client.py` - `LLMClient` class with:
    - `LLMClient.create_chat_completion()` - Async chat completions
    - `LLMClient.create_embedding()` - Async embeddings
    - `LLMClient.get_model_config()` - Model metadata resolution
    - `LLMClient.optimize_query()` - Query optimization for vector search
  - `llm/types.py` - `ChatCompletionResponse`, `EmbeddingResponse` dataclasses, `LLMBackend` Protocol
  - `llm/exceptions.py` - `LLMClientError`, `ModelValidationError`, `APIKeyMissingError`
  - `llm/__init__.py` - Re-exports for clean imports

Removed:
- `agent_memory_server/llms.py` - Deleted (replaced by the `llm/` package)

Updated:
- `main.py` - Use `LLMClient.get_model_config()` for startup validation
- `extraction.py` - Use `LLMClient.create_chat_completion()`
- `summarization.py` - Use `LLMClient.create_chat_completion()`
- `memory_strategies.py` - Use `LLMClient.create_chat_completion()`
- `long_term_memory.py` - Use `LLMClient.create_embedding()`
- `api.py` - Use `LLMClient.get_model_config()`

Testing
- `tests/test_llm_client.py` with unit tests for the new client

Example
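The original example block did not survive extraction; here is a minimal usage sketch, assuming the signatures listed under Changes (parameter names such as `input` are guesses):

```python
import asyncio

from agent_memory_server.llm import LLMClient


async def main() -> None:
    # Async chat completion; model strings follow LiteLLM conventions
    response = await LLMClient.create_chat_completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize this session."}],
    )
    print(response.content)

    # Async embedding
    embedding = await LLMClient.create_embedding(
        model="text-embedding-3-small",
        input=["redis agent memory"],
    )

    # Model metadata via the fallback chain: MODEL_CONFIGS → LiteLLM → defaults
    config = LLMClient.get_model_config("gpt-4o-mini")


asyncio.run(main())
```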
Future Work
- Any sort of gateway connection
Embeddings Consolidation
Context
During this refactoring, I identified that embeddings are NOT yet consolidated through `LLMClient`:

- Chat completions go through `LLMClient.create_chat_completion()` ✅
- Embeddings still use `langchain_openai.OpenAIEmbeddings` and `langchain_aws.BedrockEmbeddings` directly rather than `LLMClient.create_embedding()` ❌

Why Not Included in This PR
- Vector store integrations (e.g., `langchain_redis.RedisVectorStore`) require the LangChain `Embeddings` interface, not raw embedding functions
- Consolidating them needs an `LLMClientEmbeddings` class that implements LangChain's `Embeddings` ABC

Future Implementation
See `dev_docs/embeddings_consolidation_plan.md` for the full plan. Summary:

- Add an `LLMClientEmbeddings(Embeddings)` wrapper in `agent_memory_server/llm/embeddings.py`
- Call `LLMClient.create_embedding()` internally
- Update `vectorstore_factory.py` to use `LLMClientEmbeddings` instead of direct LangChain imports
- Remove the `langchain_openai.OpenAIEmbeddings` and `langchain_aws.BedrockEmbeddings` imports

Decision
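A hedged sketch of what that wrapper could look like (the `embeddings` attribute on `EmbeddingResponse` and the naive sync-over-async bridging are assumptions; the follow-up PR may differ):

```python
import asyncio

from langchain_core.embeddings import Embeddings

from agent_memory_server.llm import LLMClient


class LLMClientEmbeddings(Embeddings):
    """Adapt LLMClient.create_embedding() to LangChain's Embeddings ABC."""

    def __init__(self, model: str):
        self.model = model

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # LangChain's interface is sync; drive the async client to completion
        response = asyncio.run(
            LLMClient.create_embedding(model=self.model, input=texts)
        )
        return response.embeddings  # assumed field on EmbeddingResponse

    def embed_query(self, text: str) -> list[float]:
        return self.embed_documents([text])[0]
```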
Defer to a separate PR: keep this PR focused on the core abstraction. Embeddings consolidation is a follow-up task that can be done independently.