🤖 An intelligent LLM-based chatbot with advanced conversation memory management. Supports multiple AI providers (OpenAI, Groq, Gemini) with automatic conversation summarization, context retrieval, and persistent memory across sessions.
- Multi-Provider Support: Seamlessly switch between OpenAI, Groq, and Gemini APIs
- Intelligent Memory Management: Automatic conversation summarization and context compression
- Session-Based Conversations: Maintain multiple independent chat sessions
- Context-Aware Responses: LLM automatically retrieves relevant past context
- Memory Search: Search through conversation history for specific topics
- Fact Extraction: Automatically extract and store key facts from conversations
- Streaming Support: Real-time streaming responses for better UX
- Export/Import: Save and restore conversation sessions
- Automatic Summarization: Condenses old messages when memory threshold is reached
- Smart Context Window: Includes relevant summaries + recent messages for LLM
- Key Facts Extraction: LLM extracts important facts from conversations
- Memory Search: Find relevant past messages by semantic similarity
- Usage Statistics: Track memory usage and conversation metrics
- Clone the repository:

```bash
git clone <repository-url>
cd memory-management-system
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Create a `.env` file with your API keys:

```
# Required: At least one API key
OPENAI_API_KEY=sk-your-openai-key
GROQ_API_KEY=gsk_your-groq-key
GEMINI_API_KEY=your-gemini-key

# Optional: Server configuration
HOST=0.0.0.0
PORT=8000
RELOAD=true
```

- Start the server:

```bash
python main.py
```

The server will start at http://localhost:8000
API documentation: http://localhost:8000/docs
Basic usage:

```python
import requests

# Start a conversation
response = requests.post("http://localhost:8000/chat", json={
    "message": "Hello! I'm working on a Python project.",
    "provider": "openai",  # or "groq", "gemini"
    "temperature": 0.7
})

print(response.json())
# {
#   "response": "Hello! I'd be happy to help...",
#   "session_id": "abc123...",
#   "timestamp": "2025-12-09T10:30:00",
#   "metadata": {...}
# }

# Use the same session_id to continue
response = requests.post("http://localhost:8000/chat", json={
    "message": "Can you remind me what we discussed?",
    "session_id": "abc123..."  # Same session
})
```

Streaming responses:

```python
response = requests.post("http://localhost:8000/chat", json={
    "message": "Tell me a story",
    "stream": True
}, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
```

Memory operations:

```python
# Get conversation summary
summary = requests.get("http://localhost:8000/memory/abc123/summary")
print(summary.json())

# Extract key facts
facts = requests.get("http://localhost:8000/memory/abc123/facts")
print(facts.json())

# Search conversation history
results = requests.post("http://localhost:8000/search/abc123", params={
    "query": "Python project",
    "top_k": 5
})
print(results.json())
```

Session management:

```python
# Create new session with custom settings
session = requests.post("http://localhost:8000/sessions/new", json={
    "provider": "groq",
    "model": "llama-3.1-70b-versatile",
    "system_prompt": "You are a helpful coding assistant."
})

# List all sessions
sessions = requests.get("http://localhost:8000/sessions")

# Delete session
requests.delete("http://localhost:8000/sessions/abc123")
```

Architecture overview:

```
┌─────────────────────────────────────────────────────────┐
│                        API Layer                        │
│    (FastAPI - REST endpoints for chat & management)     │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                     Chatbot Engine                      │
│    (Session management, conversation orchestration)     │
└─────────────┬─────────────────────────────┬─────────────┘
              │                             │
┌─────────────▼─────────────┐ ┌─────────────▼─────────────┐
│      Memory Manager       │ │        LLM Service        │
│  • Store messages         │ │  • OpenAI client          │
│  • Summarization          │ │  • Groq client            │
│  • Context retrieval      │ │  • Gemini client          │
│  • Search & facts         │ │  • Unified interface      │
└───────────────────────────┘ └───────────────────────────┘
```
Core modules:

- `app/api.py`: FastAPI endpoints for chatbot operations
- `app/chatbot.py`: Main chatbot engine with session management
- `app/memory_manager.py`: Conversation memory with LLM-based summarization
- `app/llm_service.py`: Unified interface for multiple LLM providers
- `app/prompts.py`: System prompts and templates
- `app/models.py`: Provider configurations and metadata
- `app/settings.py`: Application settings management
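
To make the layering concrete, here is a toy sketch of how these pieces compose. Class names and method signatures are illustrative only, not the actual interfaces of `app/chatbot.py`, `app/memory_manager.py`, or `app/llm_service.py`:

```python
# Toy sketch of the layering in the diagram above (hypothetical names).

class LLMService:
    """Unified provider interface (OpenAI/Groq/Gemini in the real app)."""
    def __init__(self, provider: str = "openai") -> None:
        self.provider = provider

    def complete(self, messages: list[dict]) -> str:
        # The real service would dispatch to the provider's client here.
        return f"[{self.provider}] reply to: {messages[-1]['content']}"


class MemoryManager:
    """Stores messages and builds the context sent to the LLM."""
    def __init__(self) -> None:
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        return self.messages[-10:]  # recent window only, in this toy


class ChatbotEngine:
    """Orchestrates memory and the LLM, one layer below the API."""
    def __init__(self) -> None:
        self.memory = MemoryManager()
        self.llm = LLMService()

    def chat(self, user_message: str) -> str:
        self.memory.add("user", user_message)
        reply = self.llm.complete(self.memory.context())
        self.memory.add("assistant", reply)
        return reply


print(ChatbotEngine().chat("Hello!"))
```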
**OpenAI**

- Models: `gpt-4o`, `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`
- Best for: High-quality responses, general purpose
- Required: `OPENAI_API_KEY`

**Groq**

- Models: `llama-3.1-70b-versatile`, `llama-3.1-8b-instant`, `mixtral-8x7b-32768`
- Best for: Ultra-fast inference, cost-effective
- Required: `GROQ_API_KEY`

**Gemini**

- Models: `gemini-1.5-pro`, `gemini-1.5-flash`, `gemini-1.0-pro`
- Best for: Long context, multimodal tasks
- Required: `GEMINI_API_KEY`
Edit `data/model_config.json`:

```json
{
  "default_provider": "openai",
  "memory_settings": {
    "max_messages": 50,
    "summary_threshold": 20,
    "context_window": 10
  }
}
```

**Chat & sessions**

| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Send message and get response |
| `/sessions/new` | POST | Create new chat session |
| `/sessions` | GET | List all sessions |
| `/sessions/{id}` | DELETE | Delete session |
**Memory & history**

| Endpoint | Method | Description |
|---|---|---|
| `/history/{id}` | GET | Get conversation history |
| `/history/{id}/clear` | POST | Clear conversation |
| `/memory/{id}/summary` | GET | Get memory summary |
| `/memory/{id}/facts` | GET | Extract key facts |
| `/search/{id}` | POST | Search memory |
| `/stats/{id}` | GET | Get session statistics |
**Utilities**

| Endpoint | Method | Description |
|---|---|---|
| `/export/{id}` | GET | Export conversation |
| `/import` | POST | Import conversation |
| `/provider/{id}/switch` | POST | Switch LLM provider |
| `/health` | GET | Health check |
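
The export/import and provider-switch endpoints are not demonstrated elsewhere in this README; here is a minimal sketch. The request payloads are assumptions — the authoritative schemas are at http://localhost:8000/docs:

```python
import requests

url = "http://localhost:8000"
session_id = "abc123..."  # an existing session

# Export a conversation (assumed to return a JSON blob)
exported = requests.get(f"{url}/export/{session_id}").json()

# Import it later to restore the session (payload shape assumed)
requests.post(f"{url}/import", json=exported)

# Switch the session's LLM provider mid-conversation
# (passing the provider in the JSON body is an assumption)
requests.post(f"{url}/provider/{session_id}/switch", json={"provider": "groq"})

# Health check
print(requests.get(f"{url}/health").json())
```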
Example: a coding assistant that remembers context across requests:

```python
import requests

url = "http://localhost:8000"

# Start coding session
r1 = requests.post(f"{url}/chat", json={
    "message": "I'm building a REST API with FastAPI",
    "provider": "openai"
})
session_id = r1.json()["session_id"]

# Continue - chatbot remembers context
r2 = requests.post(f"{url}/chat", json={
    "message": "How do I add authentication?",
    "session_id": session_id
})

# Later - retrieve what was discussed
summary = requests.get(f"{url}/memory/{session_id}/summary")
print(summary.json()["summary"])
```

Running multiple independent sessions:

```python
# Session 1: Work project
work = requests.post(f"{url}/sessions/new", json={
    "provider": "groq",
    "system_prompt": "You are a Python expert helping with a work project."
})
work_id = work.json()["session_id"]

# Session 2: Personal learning
learn = requests.post(f"{url}/sessions/new", json={
    "provider": "openai",
    "system_prompt": "You are a patient tutor teaching machine learning."
})
learn_id = learn.json()["session_id"]

# Chat in different contexts
requests.post(f"{url}/chat", json={
    "message": "Review this production code",
    "session_id": work_id
})
requests.post(f"{url}/chat", json={
    "message": "Explain backpropagation simply",
    "session_id": learn_id
})
```

Using the chatbot directly from Python, without the HTTP API:

```python
from app.chatbot import Chatbot

# Initialize chatbot
bot = Chatbot(provider="openai", model="gpt-4o")

# Chat
response = bot.chat("Hello! Help me with Python.")
print(response["response"])

# Continue conversation
response = bot.chat("What's a decorator?")
print(response["response"])

# Get conversation summary
summary = bot.get_memory_summary()
print(summary)

# Extract facts
facts = bot.get_key_facts()
for fact in facts:
    print(f"• {fact}")

# Export for later
data = bot.export_conversation()
# Save data to file or database

# Import later
bot2 = Chatbot()
bot2.import_conversation(data)
```

How memory management works:

- Storage: Messages stored in a deque with configurable max size
- Threshold Check: When messages exceed threshold (default: 20)
- Summarization: LLM summarizes older messages
- Compression: Summarized messages removed, summary stored
- Context Building: Recent messages + summaries sent to LLM (see the sketch after the diagram below)
Context window structure:

```
┌────────────────────────────────────────┐
│             System Prompt              │
├────────────────────────────────────────┤
│  Summary 1 (messages 1-20)             │
│  Summary 2 (messages 21-40)            │
├────────────────────────────────────────┤
│  Recent Message 1 (user)               │
│  Recent Message 2 (assistant)          │
│  Recent Message 3 (user)               │
│  ...                                   │
│  Current Message (user)                │
└────────────────────────────────────────┘
```
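
A compact sketch of this summarize-and-compress loop. The thresholds mirror the defaults above, but the names and logic are illustrative, not the actual `app/memory_manager.py` implementation:

```python
from collections import deque

MAX_MESSAGES = 50        # deque capacity
SUMMARY_THRESHOLD = 20   # summarize once this many messages accumulate
CONTEXT_WINDOW = 10      # recent messages kept verbatim in the prompt

messages: deque = deque(maxlen=MAX_MESSAGES)
summaries: list[str] = []

def summarize(old: list[dict]) -> str:
    # Stand-in for an LLM call like "Summarize these messages: ..."
    return f"[summary of {len(old)} messages]"

def add_message(role: str, content: str) -> None:
    messages.append({"role": role, "content": content})
    if len(messages) >= SUMMARY_THRESHOLD:
        # Compress everything except the recent window into one summary
        old = [messages.popleft() for _ in range(len(messages) - CONTEXT_WINDOW)]
        summaries.append(summarize(old))

def build_context(system_prompt: str) -> list[dict]:
    # System prompt + summaries + recent messages, as in the diagram above
    context = [{"role": "system", "content": system_prompt}]
    context += [{"role": "system", "content": s} for s in summaries]
    context += list(messages)
    return context
```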
Run the test suite:

```bash
pytest tests/
```

Manual testing with curl:

```bash
# Start conversation
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "provider": "openai"}'

# Continue conversation (use session_id from above)
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What did we discuss?", "session_id": "abc123..."}'

# Get history
curl http://localhost:8000/history/abc123...

# Get summary
curl http://localhost:8000/memory/abc123.../summary
```

Common issues:

**Error: OpenAI API key not found**

Solution: Ensure a `.env` file exists with a valid API key:

```bash
echo "OPENAI_API_KEY=sk-your-key" > .env
```

**ImportError: No module named 'openai'**

Solution: Install dependencies:

```bash
pip install -r requirements.txt
```

**Response doesn't include past context**

Solution: Verify the `session_id` is being passed correctly. Each session maintains separate memory.
| Provider | Speed | Cost | Context | Best For |
|---|---|---|---|---|
| Groq | ⚡️⚡️⚡️ | 💰 | 32K | Fast responses |
| GPT-3.5 | ⚡️⚡️ | 💰 | 16K | Balanced |
| GPT-4o | ⚡️ | 💰💰 | 128K | Complex tasks |
| Gemini Pro | ⚡️⚡️ | 💰💰 | 1M+ | Long context |
- Max Messages: 50 (configurable)
- Summary Threshold: 20 messages
- Compression Ratio: ~5:1 (20 messages → 1 summary)
- Context Window: Last 10 messages + summaries
- Vector embeddings for semantic search
- PostgreSQL/Redis backend for persistence
- Multi-user support with authentication
- Conversation branching and forking
- Advanced analytics and insights
- WebSocket support for real-time chat
- Integration with LangChain/LlamaIndex
- Voice input/output support
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License - see LICENSE file for details
- OpenAI for GPT models
- Groq for ultra-fast inference
- Google for Gemini models
- FastAPI for the excellent web framework
For issues or questions:

- Open an issue on GitHub
- Check the documentation at `/docs`
- Review the API reference at `/openapi.json`
Built with ❤️ using Python, FastAPI, and LLMs.

Working with the `ModelClient` directly:

```python
from app.client import ModelClient

# Initialize with a default provider
client = ModelClient(provider="openai")

# Switch providers at runtime
client.switch_provider("groq")
client.switch_provider("gemini")

# Switch models
client.switch_model("gpt-3.5-turbo")

# Check provider capabilities
if client.has_capability("vision"):
    print("Provider supports vision!")

# Inspect the active client configuration
config = client.get_client_config()
print(config)
```
### Running the API

```bash
uvicorn app.api:app --reload

curl -X POST http://localhost:8000/process \
  -H "Content-Type: application/json" \
  -d @data/sample_input.json
```

Project structure:

```
memory-management-system/
├── app/
│   ├── __init__.py
│   ├── api.py                  # FastAPI endpoints
│   ├── client.py               # Model client manager
│   ├── composer.py             # Main processing logic
│   ├── intent.py               # Intent classification
│   ├── models.py               # Model provider configs
│   ├── parser.py               # Input validation
│   └── strings.py              # String constants
├── utils/
│   ├── notes.py                # Note generation
│   └── rephrase.py             # Query rephrasing
├── data/
│   ├── model_config.json
│   ├── sample_input.json
│   ├── task_execution_example.json
│   └── memory_update_example.json
├── scripts/
│   └── model_client_examples.py
├── docs/
├── tests/
├── main.py
├── requirements.txt
└── README.md
```
See `scripts/model_client_examples.py` for comprehensive examples:

```bash
python scripts/model_client_examples.py
```

Edit `data/model_config.json` to customize the following (a sketch follows the list):
- Default provider and models
- Fallback order
- Timeout and retry settings
- Provider-specific preferences
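
Only `default_provider` and `memory_settings` appear earlier in this README; the other field names below (`fallback_order`, `timeout_seconds`, `max_retries`, `provider_preferences`) are illustrative assumptions, not the file's confirmed schema:

```json
{
  "default_provider": "openai",
  "fallback_order": ["openai", "groq", "gemini"],
  "timeout_seconds": 30,
  "max_retries": 2,
  "memory_settings": {
    "max_messages": 50,
    "summary_threshold": 20,
    "context_window": 10
  },
  "provider_preferences": {
    "groq": {"model": "llama-3.1-70b-versatile"}
  }
}
```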