Skip to content

A streamlined Memory Management System for conversational AI. It analyzes user history, generates concise context notes, rewrites queries for clarity, and classifies user intent with confidence scoring. Outputs a clean, structured JSON object suitable for chatbots and intelligent assistants.

Notifications You must be signed in to change notification settings

Rafay-AABS/memory-management-system

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Chatbot Memory Management System

πŸ€– An intelligent LLM-based chatbot with advanced conversation memory management. Supports multiple AI providers (OpenAI, Groq, Gemini) with automatic conversation summarization, context retrieval, and persistent memory across sessions.

🌟 Features

Core Capabilities

  • Multi-Provider Support: Seamlessly switch between OpenAI, Groq, and Gemini APIs
  • Intelligent Memory Management: Automatic conversation summarization and context compression
  • Session-Based Conversations: Maintain multiple independent chat sessions
  • Context-Aware Responses: LLM automatically retrieves relevant past context
  • Memory Search: Search through conversation history for specific topics
  • Fact Extraction: Automatically extract and store key facts from conversations
  • Streaming Support: Real-time streaming responses for better UX
  • Export/Import: Save and restore conversation sessions

Memory Features

  • Automatic Summarization: Condenses old messages when memory threshold is reached
  • Smart Context Window: Includes relevant summaries + recent messages for LLM
  • Key Facts Extraction: LLM extracts important facts from conversations
  • Memory Search: Find relevant past messages by semantic similarity
  • Usage Statistics: Track memory usage and conversation metrics

πŸš€ Quick Start

Installation

  1. Clone the repository:
git clone <repository-url>
cd memory-management-system
  1. Install dependencies:
pip install -r requirements.txt

Environment Setup

Create a .env file with your API keys:

# Required: At least one API key
OPENAI_API_KEY=sk-your-openai-key
GROQ_API_KEY=gsk_your-groq-key
GEMINI_API_KEY=your-gemini-key

# Optional: Server configuration
HOST=0.0.0.0
PORT=8000
RELOAD=true

Running the Server

python main.py

The server will start at http://localhost:8000

API documentation: http://localhost:8000/docs

πŸ“– API Usage

Basic Chat

import requests

# Start a conversation
response = requests.post("http://localhost:8000/chat", json={
    "message": "Hello! I'm working on a Python project.",
    "provider": "openai",  # or "groq", "gemini"
    "temperature": 0.7
})

print(response.json())
# {
#   "response": "Hello! I'd be happy to help...",
#   "session_id": "abc123...",
#   "timestamp": "2025-12-09T10:30:00",
#   "metadata": {...}
# }

Continue Conversation

# Use the same session_id to continue
response = requests.post("http://localhost:8000/chat", json={
    "message": "Can you remind me what we discussed?",
    "session_id": "abc123..."  # Same session
})

Streaming Responses

response = requests.post("http://localhost:8000/chat", json={
    "message": "Tell me a story",
    "stream": True
}, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

Memory Management

# Get conversation summary
summary = requests.get("http://localhost:8000/memory/abc123/summary")
print(summary.json())

# Extract key facts
facts = requests.get("http://localhost:8000/memory/abc123/facts")
print(facts.json())

# Search conversation history
results = requests.post("http://localhost:8000/search/abc123", params={
    "query": "Python project",
    "top_k": 5
})
print(results.json())

Session Management

# Create new session with custom settings
session = requests.post("http://localhost:8000/sessions/new", json={
    "provider": "groq",
    "model": "llama-3.1-70b-versatile",
    "system_prompt": "You are a helpful coding assistant."
})

# List all sessions
sessions = requests.get("http://localhost:8000/sessions")

# Delete session
requests.delete("http://localhost:8000/sessions/abc123")

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      API Layer                          β”‚
β”‚  (FastAPI - REST endpoints for chat & management)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  Chatbot Engine                         β”‚
β”‚  (Session management, conversation orchestration)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Memory Manager      β”‚   β”‚     LLM Service            β”‚
β”‚ β€’ Store messages      β”‚   β”‚ β€’ OpenAI client            β”‚
β”‚ β€’ Summarization       β”‚   β”‚ β€’ Groq client              β”‚
β”‚ β€’ Context retrieval   β”‚   β”‚ β€’ Gemini client            β”‚
β”‚ β€’ Search & facts      β”‚   β”‚ β€’ Unified interface        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components

  • app/api.py: FastAPI endpoints for chatbot operations
  • app/chatbot.py: Main chatbot engine with session management
  • app/memory_manager.py: Conversation memory with LLM-based summarization
  • app/llm_service.py: Unified interface for multiple LLM providers
  • app/prompts.py: System prompts and templates
  • app/models.py: Provider configurations and metadata
  • app/settings.py: Application settings management

πŸ”§ Configuration

Model Providers

OpenAI

  • Models: gpt-4o, gpt-4, gpt-4-turbo, gpt-3.5-turbo
  • Best For: High-quality responses, general purpose
  • Required: OPENAI_API_KEY

Groq

  • Models: llama-3.1-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768
  • Best For: Ultra-fast inference, cost-effective
  • Required: GROQ_API_KEY

Gemini

  • Models: gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
  • Best For: Long context, multimodal tasks
  • Required: GEMINI_API_KEY

Memory Configuration

Edit data/model_config.json:

{
  "default_provider": "openai",
  "memory_settings": {
    "max_messages": 50,
    "summary_threshold": 20,
    "context_window": 10
  }
}

πŸ“š API Reference

Chat Endpoints

Endpoint Method Description
/chat POST Send message and get response
/sessions/new POST Create new chat session
/sessions GET List all sessions
/sessions/{id} DELETE Delete session

Memory Endpoints

Endpoint Method Description
/history/{id} GET Get conversation history
/history/{id}/clear POST Clear conversation
/memory/{id}/summary GET Get memory summary
/memory/{id}/facts GET Extract key facts
/search/{id} POST Search memory
/stats/{id} GET Get session statistics

Management Endpoints

Endpoint Method Description
/export/{id} GET Export conversation
/import POST Import conversation
/provider/{id}/switch POST Switch LLM provider
/health GET Health check

πŸ’‘ Usage Examples

Example 1: Coding Assistant with Memory

import requests

url = "http://localhost:8000"

# Start coding session
r1 = requests.post(f"{url}/chat", json={
    "message": "I'm building a REST API with FastAPI",
    "provider": "openai"
})
session_id = r1.json()["session_id"]

# Continue - chatbot remembers context
r2 = requests.post(f"{url}/chat", json={
    "message": "How do I add authentication?",
    "session_id": session_id
})

# Later - retrieve what was discussed
summary = requests.get(f"{url}/memory/{session_id}/summary")
print(summary.json()["summary"])

Example 2: Multi-Session Management

# Session 1: Work project
work = requests.post(f"{url}/sessions/new", json={
    "provider": "groq",
    "system_prompt": "You are a Python expert helping with a work project."
})
work_id = work.json()["session_id"]

# Session 2: Personal learning
learn = requests.post(f"{url}/sessions/new", json={
    "provider": "openai",
    "system_prompt": "You are a patient tutor teaching machine learning."
})
learn_id = learn.json()["session_id"]

# Chat in different contexts
requests.post(f"{url}/chat", json={
    "message": "Review this production code",
    "session_id": work_id
})

requests.post(f"{url}/chat", json={
    "message": "Explain backpropagation simply",
    "session_id": learn_id
})

Example 3: Python SDK Usage

from app.chatbot import Chatbot

# Initialize chatbot
bot = Chatbot(provider="openai", model="gpt-4o")

# Chat
response = bot.chat("Hello! Help me with Python.")
print(response["response"])

# Continue conversation
response = bot.chat("What's a decorator?")
print(response["response"])

# Get conversation summary
summary = bot.get_memory_summary()
print(summary)

# Extract facts
facts = bot.get_key_facts()
for fact in facts:
    print(f"β€’ {fact}")

# Export for later
data = bot.export_conversation()
# Save data to file or database

# Import later
bot2 = Chatbot()
bot2.import_conversation(data)

πŸ” How Memory Works

Memory Lifecycle

  1. Storage: Messages stored in deque with configurable max size
  2. Threshold Check: When messages exceed threshold (default: 20)
  3. Summarization: LLM summarizes older messages
  4. Compression: Summarized messages removed, summary stored
  5. Context Building: Recent messages + summaries sent to LLM

Context Window

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  System Prompt                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Summary 1 (messages 1-20)           β”‚
β”‚  Summary 2 (messages 21-40)          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Recent Message 1 (user)             β”‚
β”‚  Recent Message 2 (assistant)        β”‚
β”‚  Recent Message 3 (user)             β”‚
β”‚  ...                                 β”‚
β”‚  Current Message (user)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ§ͺ Testing

Run Tests

pytest tests/

Manual Testing with cURL

# Start conversation
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "provider": "openai"}'

# Continue conversation (use session_id from above)
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What did we discuss?", "session_id": "abc123..."}'

# Get history
curl http://localhost:8000/history/abc123...

# Get summary
curl http://localhost:8000/memory/abc123.../summary

πŸ› Troubleshooting

API Key Issues

Error: OpenAI API key not found

Solution: Ensure .env file exists with valid API key:

echo "OPENAI_API_KEY=sk-your-key" > .env

Import Errors

ImportError: No module named 'openai'

Solution: Install dependencies:

pip install -r requirements.txt

Memory Issues

Response doesn't include past context

Solution: Verify session_id is being passed correctly. Each session maintains separate memory.

πŸ“Š Performance

Provider Comparison

Provider Speed Cost Context Best For
Groq ⚑️⚑️⚑️ πŸ’° 32K Fast responses
GPT-3.5 ⚑️⚑️ πŸ’° 16K Balanced
GPT-4o ⚑️ πŸ’°πŸ’° 128K Complex tasks
Gemini Pro ⚑️⚑️ πŸ’°πŸ’° 1M+ Long context

Memory Efficiency

  • Max Messages: 50 (configurable)
  • Summary Threshold: 20 messages
  • Compression Ratio: ~5:1 (20 messages β†’ 1 summary)
  • Context Window: Last 10 messages + summaries

πŸ›£οΈ Roadmap

  • Vector embeddings for semantic search
  • PostgreSQL/Redis backend for persistence
  • Multi-user support with authentication
  • Conversation branching and forking
  • Advanced analytics and insights
  • WebSocket support for real-time chat
  • Integration with LangChain/LlamaIndex
  • Voice input/output support

🀝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“„ License

MIT License - see LICENSE file for details

πŸ™ Acknowledgments

  • OpenAI for GPT models
  • Groq for ultra-fast inference
  • Google for Gemini models
  • FastAPI for the excellent web framework

πŸ“ž Support

For issues or questions:

  • Open an issue on GitHub
  • Check documentation at /docs
  • Review API reference at /openapi.json

Built with ❀️ using Python, FastAPI, and LLMs from app.client import ModelClient

Initialize with a provider

client = ModelClient(provider="openai")

Switch providers easily

client.switch_provider("groq") client.switch_provider("gemini")

Switch models within a provider

client.switch_model("gpt-3.5-turbo")

Check capabilities

if client.has_capability("vision"): print("Provider supports vision!")

Get configuration

config = client.get_client_config() print(config)


### Running the API

```bash
uvicorn app.api:app --reload

Example API Request

curl -X POST http://localhost:8000/process \
  -H "Content-Type: application/json" \
  -d @data/sample_input.json

Project Structure

memory-management-system/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ api.py           # FastAPI endpoints
β”‚   β”œβ”€β”€ client.py        # Model client manager
β”‚   β”œβ”€β”€ composer.py      # Main processing logic
β”‚   β”œβ”€β”€ intent.py        # Intent classification
β”‚   β”œβ”€β”€ models.py        # Model provider configs
β”‚   β”œβ”€β”€ parser.py        # Input validation
β”‚   └── strings.py       # String constants
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ notes.py         # Note generation
β”‚   └── rephrase.py      # Query rephrasing
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ model_config.json
β”‚   β”œβ”€β”€ sample_input.json
β”‚   β”œβ”€β”€ task_execution_example.json
β”‚   └── memory_update_example.json
β”œβ”€β”€ scripts/
β”‚   └── model_client_examples.py
β”œβ”€β”€ docs/
β”œβ”€β”€ tests/
β”œβ”€β”€ main.py
β”œβ”€β”€ requirements.txt
└── README.md

Model Switching Examples

See scripts/model_client_examples.py for comprehensive examples:

python scripts/model_client_examples.py

Configuration

Edit data/model_config.json to customize:

  • Default provider and models
  • Fallback order
  • Timeout and retry settings
  • Provider-specific preferences

License

MIT

About

A streamlined Memory Management System for conversational AI. It analyzes user history, generates concise context notes, rewrites queries for clarity, and classifies user intent with confidence scoring. Outputs a clean, structured JSON object suitable for chatbots and intelligent assistants.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages