- Overview
- System Architecture
- Core Components
- Memory Cognitive Architecture
- Installation & Deployment
- Configuration
- API Reference
- Memory Operations
- Search Capabilities
- Security & Monitoring
- Development Guide
- Advanced Features
AI Memory Service is an advanced cognitive memory system designed to transform AI memory from simple storage into a sophisticated architecture that evaluates, reinforces, and connects knowledge like a human brain. By combining MongoDB Atlas's vector search capabilities with AWS Bedrock's AI services, the system builds memory networks that prioritize important information, strengthen connections through repetition, and recall relevant context precisely when needed.
Key features include:
- Importance-weighted memory storage and retrieval
- Dynamic reinforcement and decay of memories
- Semantic merging of related memories
- Hybrid search combining vector and full-text capabilities
- Contextual conversation retrieval with AI-powered summarization
- Automatic importance assessment of new information
- Memory pruning based on cognitive principles
```mermaid
graph TD
    Client(Client Application) --> FastAPI[FastAPI Service]
    FastAPI --> MemSvc[Memory Service]
    FastAPI --> ConvSvc[Conversation Service]
    FastAPI --> BedSvc[Bedrock Service]
    MemSvc --> Atlas[(MongoDB Atlas)]
    ConvSvc --> Atlas
    MemSvc & ConvSvc --> BedSvc
    BedSvc --> EmbedModel[Embedding Model]
    BedSvc --> LLMModel[LLM Model]
    Atlas --> ConvColl[Conversations Collection]
    Atlas --> MemColl[Memory Nodes Collection]
    ConvColl --> ConvVecIdx[Vector Search Index]
    ConvColl --> ConvTxtIdx[Fulltext Search Index]
    MemColl --> MemVecIdx[Vector Search Index]
```
The architecture follows a service-oriented design where each component has a specific responsibility in the cognitive memory pipeline, leveraging MongoDB Atlas for advanced vector storage and AWS Bedrock for AI reasoning capabilities.
- **FastAPI Service Layer**
  - Purpose: Handles HTTP requests and orchestrates memory operations
  - Technologies: Python 3.10+, FastAPI 0.115+
  - Key endpoints: Add conversation messages, retrieve memories, search memories
- **MongoDB Atlas Integration**
  - Purpose: Provides persistent storage with vector search capabilities
  - Collections:
    - Conversations: Stores raw conversation history with embeddings
    - Memory Nodes: Stores processed memory nodes with importance ratings
  - Indexes:
    - Vector search indexes for semantic retrieval
    - Full-text search indexes for keyword retrieval
    - Importance indexes for memory prioritization
- **AWS Bedrock Service**
  - Purpose: Delivers AI capabilities for embedding and reasoning
  - Models:
    - Embedding Model: Generates vector representations (Titan)
    - LLM Model: Performs reasoning tasks (Claude)
  - Operations:
    - Embedding generation for semantic search
    - Importance assessment of new information
    - Memory summarization and merging
    - Conversation context summarization
- **Memory Service**
  - Purpose: Manages the cognitive memory operations
  - Features:
    - Memory creation with importance assessment
    - Memory reinforcement and decay
    - Related memory merging
    - Memory pruning based on importance
- **Conversation Service**
  - Purpose: Handles conversation storage and retrieval
  - Features:
    - Conversation history storage
    - Context retrieval around specific points
    - Hybrid search across conversations
    - AI-powered conversation summarization
The system implements a cognitive architecture for memory that mimics human memory processes:
```mermaid
flowchart TD
    Start([New Content]) --> Embed[Generate Embeddings]
    Embed --> CheckSim{Similar Memory Exists?}
    CheckSim -->|"Similarity > 0.85"| Reinforce[Reinforce Existing Memory]
    CheckSim -->|"Not Similar"| Assess[Assess Importance with LLM]
    Assess --> CreateSumm[Generate Summary]
    CreateSumm --> CreateNode[Create Memory Node]
    CreateNode --> CheckMerge{Mergeable Memories?}
    CheckMerge -->|"Similarity 0.7-0.85"| Merge[Merge Related Memories]
    CheckMerge -->|"Not Similar"| UpdateImportance[Update Other Memories]
    Merge --> UpdateImportance
    Reinforce --> UpdateImportance
    UpdateImportance --> CheckPrune{Memory Count > Max?}
    CheckPrune -->|Yes| Prune[Prune Least Important]
    CheckPrune -->|No| Complete([Complete])
    Prune --> Complete
```
Key cognitive processes:
- Importance Assessment: Using AI to evaluate memory significance
- Memory Reinforcement: Strengthening memories through repetition
- Memory Decay: Gradually reducing importance of unused memories
- Memory Merging: Combining related information for coherent knowledge
- Memory Pruning: Removing less important memories when capacity is reached
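The branching in the pipeline above can be captured in a small decision helper. This is a minimal sketch, assuming the thresholds shown in the diagram (0.85 for reinforcement, 0.7 for merging); the function name is illustrative, not taken from the codebase:

```python
def classify_new_content(similarity: float,
                         reinforce_threshold: float = 0.85,
                         merge_threshold: float = 0.7) -> str:
    """Decide how new content interacts with existing memories.

    Follows the pipeline diagram: similarity above 0.85 reinforces an
    existing memory, 0.7-0.85 triggers a merge, and anything lower
    creates a fresh memory node after importance assessment.
    """
    if similarity > reinforce_threshold:
        return "reinforce"
    if similarity >= merge_threshold:
        return "merge"
    return "create"
```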
- Python 3.10+
- MongoDB Atlas account with vector search capability
- AWS account with Bedrock access
- Docker (optional)
1. Clone the repository:

   ```shell
   git clone https://github.com/mongodb-partners/ai-memory.git
   cd ai-memory
   ```

2. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

3. Set up environment variables (see Configuration section).

4. Run the application:

   ```shell
   python main.py
   ```
1. Build the Docker image:

   ```shell
   docker build -t ai-memory .
   ```

2. Run the container:

   ```shell
   docker run -p 8182:8182 --env-file .env ai-memory
   ```
Configure the application using environment variables or a `.env` file:
```shell
# MongoDB Atlas Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
MONGODB_DB_NAME=ai_memory

# AWS Configuration
AWS_REGION=us-east-1
EMBEDDING_MODEL_ID=amazon.titan-embed-text-v1
LLM_MODEL_ID=us.anthropic.claude-3-7-sonnet-20250219-v1:0

# Memory System Parameters
MAX_DEPTH=5
SIMILARITY_THRESHOLD=0.7
DECAY_FACTOR=0.99
REINFORCEMENT_FACTOR=1.1

# Service Configuration
SERVICE_HOST=0.0.0.0
SERVICE_PORT=8182
DEBUG=False
```
The cognitive behavior of the memory system can be tuned through these parameters:
- MAX_DEPTH: Maximum number of memories per user (default: 5)
- SIMILARITY_THRESHOLD: Threshold for memory reinforcement (default: 0.7)
- DECAY_FACTOR: Rate at which memories fade (default: 0.99)
- REINFORCEMENT_FACTOR: Strength of memory reinforcement (default: 1.1)
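To see how these parameters interact, here is a sketch of the decay and reinforcement arithmetic. The importance cap of 10.0 is an illustrative assumption, not a value from the codebase:

```python
def effective_importance(base: float, turns_since_access: int,
                         decay_factor: float = 0.99) -> float:
    """Importance after repeated decay: one multiplicative step per turn.

    With DECAY_FACTOR=0.99, a memory unused for 100 turns retains
    roughly 37% of its original importance.
    """
    return base * decay_factor ** turns_since_access


def reinforced(importance: float, factor: float = 1.1,
               cap: float = 10.0) -> float:
    """Strengthen a memory on repetition, capped so scores stay bounded.

    The cap is an assumed 0-10 importance scale for illustration.
    """
    return min(importance * factor, cap)
```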
- **POST /conversation/**
  - Purpose: Add a message to the conversation history
  - Request Body: MessageInput model
  - Response: Confirmation message
  - Example:

    ```json
    {
      "user_id": "user123",
      "conversation_id": "conv456",
      "type": "human",
      "text": "I prefer to be contacted via email at john@example.com",
      "timestamp": "2023-06-10T14:30:00Z"
    }
    ```
- **GET /retrieve_memory/**
  - Purpose: Retrieve memory items, context, and similar memory nodes
  - Query Parameters: user_id, text
  - Response: Related conversation, conversation summary, and similar memories
  - Example URL:

    ```
    /retrieve_memory/?user_id=user123&text=contact%20preference
    ```
- **GET /health**
  - Purpose: Health check endpoint
  - Response: Status information
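A minimal client-side sketch of the two main endpoints. The helper names are illustrative; the functions only build each request as a `(method, url, ...)` tuple so the result can be handed to any HTTP client (such as `requests`), and the base URL assumes the default SERVICE_HOST and SERVICE_PORT:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8182"  # default SERVICE_HOST/SERVICE_PORT


def add_message_request(user_id, conversation_id, text,
                        msg_type="human", timestamp=None):
    """Build the POST /conversation/ call as (method, url, json_body)."""
    body = {"user_id": user_id, "conversation_id": conversation_id,
            "type": msg_type, "text": text, "timestamp": timestamp}
    return ("POST", f"{BASE}/conversation/", body)


def retrieve_memory_request(user_id, text):
    """Build the GET /retrieve_memory/ call as (method, url)."""
    query = urlencode({"user_id": user_id, "text": text})
    return ("GET", f"{BASE}/retrieve_memory/?{query}")
```

For example, `requests.post(*add_message_request(...)[1:2], json=add_message_request(...)[2])` would store a message, while the GET tuple can be fetched directly.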
Key data models:
- MessageInput: Represents a conversation message
- MemoryNode: Represents a memory node with importance scoring
- SearchRequest: Request for memory search
- ErrorResponse: Standardized error response
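As an illustration, `MessageInput` could be modeled in Pydantic roughly as follows. Field names follow the API example above; the allowed `type` values are an assumption, and the service's actual model definitions may differ:

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class MessageInput(BaseModel):
    """A single conversation message (fields inferred from the API example)."""
    user_id: str
    conversation_id: str
    type: Literal["human", "ai"]  # assumed set of message roles
    text: str
    timestamp: datetime
```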
New memories are created from significant human messages:
1. The message is converted to an embedding
2. Existing memories are checked for similarity
3. If no similar memory exists, importance is assessed
4. A summary is generated
5. The memory node is created with metadata
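The similarity check in step 2 typically amounts to a cosine comparison between the new embedding and stored ones. A pure-Python sketch for illustration; in production the service would rely on Atlas `$vectorSearch` rather than scanning nodes in memory:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0


def most_similar(query_embedding, memories):
    """Return (best_node, score) over stored nodes.

    Each node is assumed to be a dict with an 'embedding' field.
    """
    best, best_score = None, -1.0
    for mem in memories:
        score = cosine_similarity(query_embedding, mem["embedding"])
        if score > best_score:
            best, best_score = mem, score
    return best, best_score
```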
```mermaid
flowchart TD
    Query[User Query] --> Embed[Generate Query Embedding]
    Embed --> ParallelOps[Parallel Operations]
    ParallelOps --> HybridSearch[Hybrid Search]
    ParallelOps --> MemorySearch[Memory Nodes Search]
    HybridSearch --> Combine[Combine Results]
    MemorySearch --> CalcImp[Calculate Effective Importance]
    Combine & CalcImp --> Response[Build Response]
```
Memories are retrieved through a multi-stage process:
1. Query embeddings are generated
2. Hybrid search combines vector and full-text search
3. Memory nodes are searched directly
4. Context is retrieved around matching points
5. Summaries are generated for matching conversations
6. Results are combined with importance weighting
Memories evolve through:
- Reinforcement when similar content appears
- Decay when not accessed
- Merging when related information is found
- Pruning when capacity is exceeded
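Pruning can be sketched as keeping only the MAX_DEPTH most important nodes when capacity is exceeded. The tie-break on last access time is an illustrative assumption so that fresher memories survive:

```python
def prune_memories(memories, max_depth=5):
    """Keep the max_depth most important nodes (MAX_DEPTH default: 5).

    Each node is assumed to be a dict with 'importance' and
    'last_accessed' fields; ties in importance favor recent access.
    """
    return sorted(memories,
                  key=lambda m: (m["importance"], m["last_accessed"]),
                  reverse=True)[:max_depth]
```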
```mermaid
flowchart TD
    Start([Search Query]) --> Split[Split Processing]
    Split --> Text[Text Query]
    Split --> Vector[Vector Query]
    Text --> TextSearch["$search Aggregation"]
    Vector --> VectorSearch["$vectorSearch Aggregation"]
    TextSearch & VectorSearch --> Union["$unionWith Operation"]
    Union --> Group["$group by document ID"]
    Group --> Weighted["Weighted Score Combination"]
    Weighted --> Sort["Sort by hybrid_score"]
```
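The weighted score combination in the final stage might look like this sketch. The equal default weights and the assumption that both scores are already normalized to [0, 1] are illustrative choices, not the service's actual tuning:

```python
def hybrid_score(vector_score, text_score, vector_weight=0.5):
    """Combine normalized vector and text scores into one hybrid score.

    A missing score (document found by only one search method)
    contributes zero to the combination.
    """
    return (vector_weight * (vector_score or 0.0)
            + (1 - vector_weight) * (text_score or 0.0))
```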
The system combines multiple search methodologies:
- Vector search for semantic understanding
- Full-text search for keyword precision
- Score normalization across methodologies
- Weighted combination of results
- Context retrieval and summarization
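A hedged sketch of what such a hybrid aggregation pipeline could look like, built as plain Python dicts for PyMongo. The index names (`vector_index`, `text_index`), collection name, and field paths are placeholders, not the service's actual configuration:

```python
def hybrid_search_pipeline(query_text, query_vector, user_id,
                           limit=10, num_candidates=200):
    """MongoDB aggregation sketch: $vectorSearch unioned with $search.

    Index and field names are illustrative; adjust them to the
    deployment's actual Atlas indexes before use.
    """
    return [
        # Semantic leg: approximate nearest-neighbor vector search
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
            "filter": {"user_id": {"$eq": user_id}},
        }},
        {"$addFields": {"vector_score": {"$meta": "vectorSearchScore"}}},
        # Keyword leg: full-text search unioned into the same stream
        {"$unionWith": {
            "coll": "conversations",
            "pipeline": [
                {"$search": {"index": "text_index",
                             "text": {"query": query_text,
                                      "path": "text"}}},
                {"$match": {"user_id": user_id}},
                {"$limit": limit},
                {"$addFields": {"text_score": {"$meta": "searchScore"}}},
            ],
        }},
        # Deduplicate by document ID, keeping the best score per method
        {"$group": {"_id": "$_id",
                    "vector_score": {"$max": "$vector_score"},
                    "text_score": {"$max": "$text_score"}}},
    ]
```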
MongoDB Atlas vector search is configured for optimal performance:
- Vector dimension: 1536 (Titan embeddings)
- Similarity metric: Cosine similarity
- Query filter: User ID filtering
- numCandidates: 200 (tunable parameter)
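Given those parameters, an Atlas Vector Search index definition might look like the following; the `embedding` path and the filter field are illustrative assumptions:

```python
# Atlas Vector Search index definition matching the parameters above;
# field and path names are placeholders for the actual schema.
memory_vector_index = {
    "fields": [
        {"type": "vector",
         "path": "embedding",
         "numDimensions": 1536,   # Titan embedding size
         "similarity": "cosine"},
        {"type": "filter",
         "path": "user_id"},      # enables per-user query filtering
    ]
}
```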
- Use HTTPS for all API communications
- Implement authentication for API access
- Regularly rotate AWS and MongoDB credentials
- Apply least privilege principle for service users
- Consider encryption for sensitive memory content
- The system uses `logger.py` for structured logging
- Key metrics to monitor:
  - Memory creation rate and distribution
  - Average memory importance scores
  - Prune frequency and volume
  - Query latency for memory retrieval
  - AWS Bedrock API usage and costs
- Log levels can be configured based on operational needs
- Extend the appropriate service module
- Update models if necessary
- Add new API endpoints in main.py
- Update MongoDB indexes if needed
- Document changes and update tests
- Test memory creation with various importance levels
- Verify memory reinforcement and decay behavior
- Benchmark hybrid search performance
- Test with different memory parameters
- Use type hints and descriptive variable names
- Document all functions with docstrings
- Use the logger for all significant operations
- Handle exceptions appropriately
- Consider backward compatibility for API changes
The system can be extended to support hierarchical memory structures with parent-child relationships between memories.
The importance evaluation can be made more sophisticated by considering:
- User feedback on memory relevance
- Time-based relevance decay
- Domain-specific importance metrics
- User behavior patterns
Future versions can incorporate:
- Image embeddings and memory
- Voice pattern memory
- Document and structured data memory
- Cross-modal associative memory
Advanced implementations can support:
- Federated memory across services
- Privacy-preserving memory operations
- Selective forgetting capabilities
- Memory encryption for sensitive data
This documentation provides a comprehensive overview of the AI-Memory-Service's cognitive architecture. For specific implementation details, refer to the comments and docstrings within the codebase.