A semantic knowledge management system that combines graph-based knowledge representation with vector embeddings for information storage, retrieval, and synthesis.
Memory Engine is an experimental knowledge management system that transforms unstructured text into a structured, searchable knowledge graph. It combines graph databases with vector embeddings to create a foundation for applications that can understand, connect, and reason about information.
This is a personal open-source project developed for learning and research purposes. No guarantees are made regarding reliability, security, or suitability for production use. Use at your own risk.
This project is currently in active development (v0.5.0 - Orchestrator Integration) and should be considered experimental.
Our goal is to create a truly open and accessible knowledge management system that works with:
- Any AI model: Commercial APIs (OpenAI, Anthropic, Google) and local models (Ollama, Hugging Face)
- Any deployment: From laptop development to distributed production systems
- Any data: Text, documents, structured data, and multimedia content
We aim to eliminate dependency on paid APIs by providing full support for local model execution, making advanced knowledge management accessible to everyone.
Input: Unstructured text, documents, or data
Output: Structured knowledge with automatic relationships and semantic search capabilities
- Knowledge Ingestion: Feed in text/documents → the engine extracts entities, facts, and relationships → stores them in the graph database
- Knowledge Retrieval: Query in natural language → the engine searches semantically → returns relevant information with context
- Automatic Processing: The engine handles complexity internally: relationship discovery, quality assessment, versioning, and optimization
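The ingestion flow above can be sketched as a minimal pipeline. This is purely illustrative: `extract_entities` and `ingest` are hypothetical stand-ins, not the engine's real API, and the regex stands in for LLM-based extraction.

```python
import re

def extract_entities(text):
    # Naive stand-in for LLM-based extraction: treat capitalized
    # phrases as entities.
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

def ingest(graph, text):
    # Add each entity as a node, then link every co-occurring pair,
    # mimicking automatic relationship discovery.
    entities = extract_entities(text)
    for entity in entities:
        graph.setdefault(entity, set())
    for a in entities:
        for b in entities:
            if a != b:
                graph[a].add(b)

graph = {}
ingest(graph, "Machine Learning is a subset of Artificial Intelligence")
print(sorted(graph))  # entities become nodes, co-occurrence becomes edges
```

The real engine replaces the regex with LLM-driven extraction and persists the result in the configured graph backend.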
- Multi-LLM Support: 5 different LLM providers (Gemini, OpenAI, Anthropic, Ollama, HuggingFace)
- LLM Independence: Fallback chains and circuit breaker pattern for resilience
- Local Operation: Complete offline capabilities with Ollama and HuggingFace Transformers
- Automatic Relationship Discovery: Detects and creates relationships between knowledge entities
- Advanced Caching: Multi-level caching with TTL, memory limits, and intelligent invalidation
- Connection Pooling: Health monitoring and configurable pool management
- Query Optimization: Prepared statements and batch processing for high throughput
- Memory Management: Garbage collection optimization and automatic resource cleanup
- Health Monitoring: Comprehensive system health checks and service monitoring
- CLI Tools: Complete command-line interface for all management operations
- Migration Tools: Backend migration utilities with multiple strategies
- Backup & Restore: Automated backup with compression and retention policies
- Plugin Architecture: Custom storage backends, LLM providers, and embedding providers
- Data Export/Import: Multiple formats (JSON, CSV, XML, GraphML, Cypher, Gremlin, RDF)
- Metrics Collection: Prometheus-compatible metrics with counters, gauges, histograms
- Semantic Search: Multi-provider vector embeddings with modular vector stores
- Modular Storage: Choose from JanusGraph, SQLite, or JSON backends
- Quality Enhancement: Automated quality assessment and contradiction resolution
- Version Control: Complete change tracking and rollback capabilities
- Basic Security Features: Authentication, RBAC, encryption, and audit logging (educational purposes)
- Privacy Controls: Fine-grained knowledge privacy levels and access control
- Flexible Integration: MCP (Module Communication Protocol) interface for external systems
- Agent Support: Google ADK integration for conversational knowledge interactions
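The fallback chain and circuit breaker mentioned under LLM Independence can be illustrated with a simplified sketch. The names here (`CircuitBreaker`, `call_with_fallback`) are hypothetical and much simpler than the engine's actual implementation:

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive failures, so the provider is skipped."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, success):
        # Any success resets the count; a failure increments it.
        self.failures = 0 if success else self.failures + 1

def call_with_fallback(providers, prompt):
    # providers: list of (name, callable, CircuitBreaker) tried in order.
    for name, call, breaker in providers:
        if breaker.open:
            continue  # skip providers whose breaker has tripped
        try:
            result = call(prompt)
            breaker.record(success=True)
            return name, result
        except Exception:
            breaker.record(success=False)
    raise RuntimeError("All providers failed or unavailable")
```

In practice you would register your preferred providers (for example Gemini, then OpenAI, then a local Ollama model) in order, so a commercial API outage degrades gracefully to local inference.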
- Python 3.8+
- Docker & Docker Compose (optional, for JanusGraph/Milvus)
- At least one LLM provider API key:
- Google Gemini API key
- OpenAI API key
- Anthropic API key
- Or use local models with Ollama or HuggingFace (no API key needed)
```bash
# Clone the repository
git clone https://github.com/Celebr4tion/memory-engine.git
cd memory-engine

# Run automated setup
./scripts/setup.sh
```
The setup script will:
- Check Python version compatibility
- Create virtual environment
- Install dependencies
- Create configuration template
- Set up development tools
```bash
# Edit the .env file created by setup.
# Set API keys for your preferred LLM providers (at least one is required).
GOOGLE_API_KEY="your-gemini-api-key"        # For Gemini
OPENAI_API_KEY="your-openai-api-key"        # For OpenAI GPT
ANTHROPIC_API_KEY="your-anthropic-api-key"  # For Claude
HUGGINGFACE_API_KEY="your-hf-api-key"       # For the HuggingFace API (optional)

# Optional: set the environment (defaults to development)
ENVIRONMENT="development"
```
For production storage backends:
```bash
# Start JanusGraph and Milvus (optional, for production storage)
cd docker
docker-compose up -d

# Wait for services to initialize (2-3 minutes)
docker-compose logs -f
```
For development, you can use lightweight storage backends (SQLite/JSON) that don't require external services.
```python
from memory_core.core.knowledge_engine import KnowledgeEngine
from memory_core.model.knowledge_node import KnowledgeNode

# Initialize the system
engine = KnowledgeEngine()
engine.connect()

# Create knowledge from text
node = KnowledgeNode(
    content="Machine learning is a subset of artificial intelligence",
    source="AI Textbook",
    rating_truthfulness=0.9,
)

# Save it to the knowledge graph
node_id = engine.save_node(node)
print(f"Created knowledge node: {node_id}")

# Retrieve and explore
retrieved = engine.get_node(node_id)
print(f"Content: {retrieved.content}")
```
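Under the hood, semantic retrieval compares embedding vectors, and cosine similarity is the usual measure. A plain-Python sketch (the real engine delegates this to its vector store and uses embeddings with hundreds of dimensions, not the toy 3-dimensional vectors below):

```python
import math

def cosine_similarity(a, b):
    # Ranges from -1 (opposite direction) to 1 (identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": related concepts point in similar directions.
ml = [0.9, 0.1, 0.3]
ai = [0.8, 0.2, 0.4]
cooking = [0.1, 0.9, 0.0]
print(cosine_similarity(ml, ai) > cosine_similarity(ml, cooking))  # True
```

This is why a natural-language query can match knowledge that shares no exact keywords with it: closeness in embedding space stands in for closeness in meaning.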
Memory Engine includes a comprehensive CLI for production management:
```bash
# Initialize a new Memory Engine instance
memory-engine init --backend=sqlite --embedding=sentence_transformers

# Check system health
memory-engine health-check --detailed

# Migrate between storage backends
memory-engine migrate --from=sqlite --to=janusgraph --verify

# Export knowledge graph data
memory-engine export --format=json --output=backup.json --include-metadata

# Import data from various formats
memory-engine import --file=data.json --merge-duplicates

# Create system backups
memory-engine backup --strategy=full --compression=gzip

# Restore from backup
memory-engine restore --backup=backup_12345 --clear-existing

# Manage plugins
memory-engine plugins list --type=storage
memory-engine plugins install custom-backend

# Configuration management
memory-engine config show --section=storage
memory-engine config set storage.backend janusgraph
memory-engine config validate

# System status
memory-engine status
memory-engine version

# Orchestrator integration (v0.5.0+)
# Start streaming MCP operations
memory-engine mcp stream-query --query="knowledge about AI" --batch-size=50

# Manage the event system
memory-engine events list --status=pending
memory-engine events replay --from-timestamp=1234567890

# Module registry management
memory-engine modules list --capabilities
memory-engine modules register my-custom-module

# Advanced GraphQL-like queries
memory-engine query build --type=nodes --filter="content contains 'AI'" --limit=10
memory-engine query execute --query-file=complex_query.json
```
| Document | Description |
|---|---|
| Setup Guide | Complete installation and configuration instructions |
| Configuration | Basic configuration and environment setup |
| Advanced Configuration | Advanced configuration system |
| Architecture | System architecture and component interactions |
| Project Structure | Detailed project organization and structure |
| API Reference | Complete API documentation, including the MCP interface |
| Security Framework | Authentication, RBAC, encryption, and privacy controls |
| Troubleshooting | Common issues and solutions |
Explore practical examples in the `examples/` directory:
- Basic Usage: Core operations and workflows
- Knowledge Extraction: Text processing and knowledge extraction
- MCP Integration: Using the Module Communication Protocol
- Security Framework: Authentication, RBAC, encryption, and privacy controls
- Advanced Queries: Complex querying and analytics
- Knowledge Synthesis: Question answering and insight discovery
```bash
# Ensure infrastructure is running
cd docker && docker-compose up -d

# Run the basic usage example
python examples/basic_usage.py

# Run the knowledge extraction demo
python examples/knowledge_extraction.py

# Test the MCP interface
python examples/mcp_client_example.py

# Try the configuration system
python examples/config_example.py
```
Memory Engine includes a comprehensive test suite organized by type:
```bash
# Run all tests
./scripts/test.sh all

# Run only unit tests (fast, no external dependencies)
./scripts/test.sh unit

# Run integration tests (requires JanusGraph and Milvus)
./scripts/test.sh integration

# Run tests with a coverage report
./scripts/test.sh coverage

# Run a specific test file
./scripts/test.sh --file config_manager
```
Test organization:
- Unit Tests (`tests/unit/`): Fast, isolated tests
- Integration Tests (`tests/integration/`): Tests requiring external services
- Component Tests (`tests/`): End-to-end component testing
Memory Engine uses a layered architecture:
```
┌─────────────────────────────────────────────────────────────────┐
│                        Application Layer                        │
├─────────────────┬─────────────────┬─────────────────┬───────────┤
│   Python API    │  MCP Interface  │ Knowledge Agent │ REST API  │
├─────────────────┴─────────────────┴─────────────────┴───────────┤
│                      Knowledge Engine Core                      │
├─────────────────┬─────────────────┬─────────────────┬───────────┤
│    Knowledge    │  Relationship   │   Versioning    │  Rating   │
│   Processing    │   Extraction    │     Manager     │  System   │
├─────────────────┼─────────────────┼─────────────────┼───────────┤
│   Graph Store   │  Vector Store   │    Embedding    │  LLM API  │
│  (JanusGraph)   │    (Milvus)     │     Manager     │ (Gemini)  │
└─────────────────┴─────────────────┴─────────────────┴───────────┘
```
- Modular Graph Storage: Multiple backend options (JanusGraph, SQLite, JSON file)
- Vector Database (Milvus): Enables semantic similarity search
- Embedding System: Generates and manages vector representations
- Processing Pipeline: Extracts and structures knowledge from text
- Versioning System: Tracks changes and enables rollbacks
- MCP Interface: Standardized API for external integration
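The versioning component's change tracking can be sketched in a few lines. This is a deliberately simplified, hypothetical model; the real versioning system persists revisions in the graph store rather than in memory:

```python
class VersionedNode:
    """Keeps every revision of a node's content so changes can be rolled back."""

    def __init__(self, content):
        self.history = [content]

    @property
    def content(self):
        # The current content is always the latest revision.
        return self.history[-1]

    def update(self, new_content):
        self.history.append(new_content)

    def rollback(self):
        # Drop the latest revision, but never roll back past the first one.
        if len(self.history) > 1:
            self.history.pop()
        return self.content

node = VersionedNode("ML is part of AI")
node.update("Machine learning is a subset of AI")
node.rollback()
print(node.content)  # back to the original revision
```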
Choose the storage backend that fits your deployment needs:
- JanusGraph: Production-grade distributed graph database
- SQLite: Single-user deployments with SQL capabilities
- JSON File: Development and testing with human-readable storage
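To give a sense of what a pluggable backend involves, here is a toy file-backed store in the spirit of the JSON option. It is illustrative only: the class and method names are assumptions, not the actual storage backend interface.

```python
import json
import os
import tempfile

class JsonFileStore:
    """Minimal file-backed node store: human-readable, handy for development."""

    def __init__(self, path):
        self.path = path
        self.nodes = {}
        if os.path.exists(path):
            with open(path) as f:
                self.nodes = json.load(f)

    def save_node(self, node_id, data):
        self.nodes[node_id] = data
        with open(self.path, "w") as f:
            json.dump(self.nodes, f, indent=2)  # human-readable on disk

    def get_node(self, node_id):
        return self.nodes.get(node_id)

path = os.path.join(tempfile.gettempdir(), "memory_engine_demo.json")
store = JsonFileStore(path)
store.save_node("n1", {"content": "hello"})
print(JsonFileStore(path).get_node("n1"))  # survives a reload from disk
```

The same interface idea (save/get by node id) is what lets backends as different as JanusGraph and a flat JSON file sit behind one engine.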
| Component | Technology | Purpose |
|---|---|---|
| Graph Storage | JanusGraph / SQLite / JSON | Knowledge relationships |
| Vector Database | Milvus / ChromaDB / NumPy | Similarity search |
| LLM Providers | Gemini / OpenAI / Anthropic / Ollama / HuggingFace | Knowledge extraction |
| Embedding Providers | Gemini / OpenAI / Sentence Transformers / Ollama | Vector generation |
| Agent Framework | Google ADK | Conversational interfaces |
| Web Framework | FastAPI | REST API endpoints |
| Language | Python 3.8+ | Core implementation |
```bash
# Unit tests only
pytest tests/ -k "not integration" -v

# All tests (requires infrastructure)
pytest tests/ -v

# With coverage
pytest tests/ --cov=memory_core --cov-report=html
```
```bash
# Install development dependencies
pip install pytest pytest-cov black isort mypy

# Format code
black memory_core/ tests/
isort memory_core/ tests/

# Type checking
mypy memory_core/

# Pre-commit hooks
pip install pre-commit
pre-commit install
```
Performance characteristics will vary depending on your hardware, data complexity, and configuration. We recommend testing with your specific use case and data to establish realistic benchmarks.
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes with tests
- Ensure all tests pass (`pytest`)
- Format the code (`black . && isort .`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Code Quality: All code must pass linting and type checking
- Testing: Maintain >90% test coverage
- Documentation: Update docs for any API changes
- Performance: Benchmark performance-critical changes
This project is licensed under the Hippocratic License 3.0, an ethical source license that promotes responsible use of software while protecting human rights and environmental sustainability.
- Documentation: Check the `docs/` directory
- Issues: Report bugs or request features via GitHub Issues
- Discussions: Join conversations in GitHub Discussions
- Troubleshooting: See the troubleshooting guide
- Contributing: See CONTRIBUTING.md for guidelines
- Code of Conduct: Please read our CODE_OF_CONDUCT.md
- Security: Report security issues via SECURITY.md
- Development Status: Alpha version; breaking changes expected
- Documentation: Basic setup and usage guides available
- Testing: Core functionality tested, expanding coverage
- Stability: Experimental; not recommended for production use yet
Memory Engine: transforming information into intelligence