A production-ready, AI-powered recommendation system built on a microservices architecture. The system provides real-time model training, integration with multiple data sources, and a comprehensive REST API for adding intelligent recommendation features to any application.
VRecommendation is a complete recommendation engine that combines machine learning models with a modern web interface. It supports collaborative filtering, content-based filtering, and hybrid recommendation algorithms with real-time data processing from multiple sources including databases, CSV files, REST APIs, and Kafka message queues.
The system consists of three main services:
- API Server (Go/Fiber): High-performance gateway handling authentication, request routing, and caching
- AI Server (Python/FastAPI): Machine learning engine managing model training, data processing, and predictions
- Frontend (React/TypeScript): Modern dashboard for system management and monitoring
Supporting infrastructure:
- Redis: Caching layer and session management
- Kafka: Real-time event streaming and data ingestion
- Zookeeper: Kafka coordination service
- Prometheus: Metrics collection and monitoring
- Kafka UI: Web interface for Kafka cluster management
All services run in Docker containers orchestrated with Docker Compose.
Machine learning:

- Support for SVD, matrix factorization, and collaborative filtering algorithms
- Incremental learning from streaming data
- Automatic model retraining on configurable schedules
- Model versioning and metadata tracking
- Real-time prediction API
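
For orientation, here is a minimal, self-contained sketch of the SVD approach these models build on, using scikit-learn's TruncatedSVD with the same hyperparameter names that appear in the model definition later in this README. It illustrates the idea only and is not the AI server's internal implementation:

```python
# Illustrative SVD recommendation sketch (not the server's internals).
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy user-item rating matrix: rows are users, columns are items.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 0.0, 5.0, 4.0],
])

svd = TruncatedSVD(n_components=2, algorithm="randomized", n_iter=10, random_state=42)
user_factors = svd.fit_transform(ratings)  # shape: (n_users, n_components)
item_factors = svd.components_             # shape: (n_components, n_items)

# Reconstructed scores approximate unobserved ratings; rank items per user.
scores = user_factors @ item_factors
print("Ranked items for user 0:", np.argsort(scores[0])[::-1])
```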
Data sources:

- CSV file processing with batch reading (see the sketch after this list)
- SQL database connections (MySQL, PostgreSQL)
- NoSQL database support (MongoDB)
- REST API data ingestion
- Kafka message queue consumer
- Automatic data transformation and validation
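
As a rough illustration of what the CSV path does (the real logic lives in the AI server's data chef service), here is a pandas sketch that applies the same rename_columns mapping format used in the API examples below; the helper name is hypothetical:

```python
# Hypothetical sketch of CSV ingestion with column renaming and validation.
import pandas as pd

def load_interactions(path: str, rename_columns: str) -> pd.DataFrame:
    # Parse "old:new,old:new" (e.g. "userId:user_id,itemId:item_id,rating:rating").
    mapping = dict(pair.split(":") for pair in rename_columns.split(",") if pair)
    df = pd.read_csv(path).rename(columns=mapping)
    # Basic validation: the interaction columns must be present.
    missing = {"user_id", "item_id", "rating"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df
```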
Dashboard:

- Web-based administration dashboard
- Real-time metrics and monitoring
- Scheduled task management
- Model lifecycle management
- Data pipeline configuration
API and infrastructure:

- JWT-based authentication (see the client sketch after this list)
- Google OAuth integration
- Redis caching for fast responses
- Connection pooling and resource management
- Prometheus metrics collection
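
As a sketch of how a client might authenticate against the gateway (the login payload and the token field name are assumptions, not the documented contract):

```python
# Hypothetical client flow: log in, then call an endpoint with a Bearer token.
import requests

BASE = "http://localhost:2030/api/v1"

# Assumed credential fields; adjust to the actual login contract.
resp = requests.post(f"{BASE}/auth/login", json={"username": "admin", "password": "secret"})
token = resp.json().get("token")  # assumed response field

status = requests.get(f"{BASE}/auth/status", headers={"Authorization": f"Bearer {token}"})
print(status.status_code, status.text)
```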
Prerequisites:

- Docker Engine 20.10 or higher
- Docker Compose 2.0 or higher
- 4GB RAM minimum (8GB recommended)
- 10GB available disk space
Optional for local development:
- Go 1.21 or higher
- Python 3.11 or higher
- Node.js 18 or higher
Clone the repository:

```bash
git clone https://github.com/yourusername/VRecommendation.git
cd VRecommendation
```

Copy the example environment file and configure your settings:

```bash
cp example-env .env
```

Edit the .env file with your configuration. Key variables:
```env
# JWT Secret (REQUIRED - change this in production)
JWT_SECRET_KEY=your-secure-secret-key-here

# API Server Configuration
API_SERVER_HOST=0.0.0.0
API_SERVER_PORT=2030

# AI Server Configuration
AI_SERVER_HOST=0.0.0.0
AI_SERVER_PORT=9999

# Frontend Configuration
FRONTEND_PORT=5173
VITE_API_SERVER_URL=http://localhost:2030
VITE_AI_SERVER_URL=http://localhost:9999

# Redis Configuration
REDIS_HOST=redis
REDIS_PORT=6379

# Kafka Configuration
KAFKA_BOOTSTRAP_SERVERS=kafka:9093
KAFKA_PORT=9092
```

Start all services:

```bash
docker-compose up -d
```

This will start all services in the background. First-time startup may take several minutes to download images and build containers.
Check that all services are running:
```bash
docker-compose ps
```

All services should show status as "Up". Test the endpoints:
```bash
# Test API Server
curl http://localhost:2030/api/v1/ping

# Test AI Server
curl http://localhost:9999/api/v1/health
```

Expected responses indicate healthy services; a scripted version of this check is sketched after the URL list below.

Once running, the services are available at:
- Frontend Dashboard: http://localhost:5173
- API Server: http://localhost:2030
- AI Server: http://localhost:9999
- Prometheus Metrics: http://localhost:9090
- Kafka UI: http://localhost:8080
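
If you prefer to script the check, here is a small sketch that polls both health endpoints (assumes the default ports above and the requests package):

```python
# Poll the API and AI servers and report reachability (assumes default ports).
import requests

SERVICES = {
    "API Server": "http://localhost:2030/api/v1/ping",
    "AI Server": "http://localhost:9999/api/v1/health",
}

for name, url in SERVICES.items():
    try:
        r = requests.get(url, timeout=5)
        print(f"{name}: HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```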
The repository is organized as follows:

```
VRecommendation/
├── backend/
│ ├── ai_server/ # Python FastAPI ML service
│ │ ├── src/
│ │ │ └── ai_server/
│ │ │ ├── handlers/ # Business logic handlers
│ │ │ ├── models/ # ML model implementations
│ │ │ ├── routers/ # API route definitions
│ │ │ ├── services/ # Core services
│ │ │ ├── tasks/ # Background tasks
│ │ │ └── utils/ # Utility functions
│ │ ├── config/ # Configuration files
│ │ ├── models/ # Trained model files
│ │ ├── tasks/ # Task definitions (JSON)
│ │ ├── data/ # Data storage
│ │ ├── Dockerfile
│ │ └── pyproject.toml
│ │
│ └── api_server/ # Go Fiber gateway service
│ ├── app/
│ │ ├── controllers/ # HTTP handlers
│ │ ├── middleware/ # Custom middleware
│ │ └── models/ # Data models
│ ├── internal/
│ │ ├── auth/ # Authentication logic
│ │ ├── cache/ # Redis operations
│ │ ├── initialize/ # App initialization
│ │ └── proxy/ # AI server proxy
│ ├── pkg/ # Reusable packages
│ ├── config/ # Configuration files
│ ├── Dockerfile
│ ├── go.mod
│ └── main.go
│
├── frontend/
│ └── project/ # React TypeScript application
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── pages/ # Page components
│ │ ├── services/ # API services
│ │ ├── contexts/ # React contexts
│ │ └── hooks/ # Custom hooks
│ ├── public/ # Static assets
│ ├── Dockerfile
│ └── package.json
│
├── tests/
│ ├── demo-website/ # Demo e-commerce site
│ ├── kafka-server/ # Standalone Kafka for testing
│ └── test-data/ # Test datasets
│
├── scripts/ # Utility scripts
├── docs/ # Additional documentation
├── diagrams/ # Architecture diagrams
├── docker-compose.yml # Main orchestration file
├── Makefile # Build automation
└── README.md # This file
```
The system uses a single .env file in the project root. Key configuration sections:
Authentication:

```env
JWT_SECRET_KEY=your-secret-key        # JWT signing secret
SESSION_SECRET=your-session-secret    # Session encryption key
JWT_EXPIRE_MINUTES=1440               # Token expiration (24 hours)
```

Service hosts and ports:

```env
API_SERVER_HOST=0.0.0.0
API_SERVER_PORT=2030
AI_SERVER_HOST=0.0.0.0
AI_SERVER_PORT=9999
FRONTEND_PORT=5173
```

Database connections:

```env
# MySQL
MYSQL_HOST=your-mysql-host
MYSQL_PORT=3306
MYSQL_USER=your-username
MYSQL_PASSWORD=your-password
MYSQL_DATABASE=your-database
# MongoDB
MONGODB_HOST=your-mongodb-host
MONGODB_PORT=27017
MONGODB_USERNAME=your-username
MONGODB_PASSWORD=your-password
```

Kafka:

```env
KAFKA_BOOTSTRAP_SERVERS=kafka:9093    # Internal broker address
KAFKA_PORT=9092                       # External broker port
KAFKA_GROUP_ID=vrecom_consumer_group  # Consumer group ID
```
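
To illustrate the internal vs. external address split, here is a consumer sketch using the kafka-python package (an assumption; the AI server's own consumer may differ). From the host use localhost:9092; from inside the Compose network use kafka:9093:

```python
# Consume interaction events from the host (pip install kafka-python).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "interactions",                      # topic name used in the examples below
    bootstrap_servers="localhost:9092",  # external port; containers use kafka:9093
    group_id="vrecom_consumer_group",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    print(message.value)
```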
Each service has additional configuration files:

- AI Server:
  - backend/ai_server/config/local.yaml: Main configuration
  - backend/ai_server/config/restaurant_data.yaml: Data source definitions
- API Server:
  - backend/api_server/config/config.yaml: Server settings
  - cors.yaml: CORS configuration
- Frontend:
  - frontend/project/.env: API endpoint URLs and feature flags
To build your first model, start by creating a data chef that connects to your data source:
```bash
# Using a CSV file
curl -X POST http://localhost:9999/api/v1/create_data_chef_from_csv \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_interactions",
    "path": "/app/data/interactions.csv",
    "rename_columns": "userId:user_id,itemId:item_id,rating:rating"
  }'

# Using a SQL database
curl -X POST http://localhost:9999/api/v1/create_data_chef_from_sql \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_interactions",
    "query": "SELECT user_id, item_id, rating FROM interactions",
    "rename_columns": ""
  }'

# Using a Kafka topic
# Configure in backend/ai_server/config/restaurant_data.yaml:
#   my_kafka_data:
#     type: messaging_queue
#     brokers: kafka:9093
#     topic: interactions
#     group_id: my_consumer_group
```

Next, define the model in a JSON file in backend/ai_server/models/:
```json
{
  "model_name": "My Recommendation Model",
  "model_id": "my_model",
  "type": "svd",
  "algorithm": "svd",
  "hyperparameters": {
    "n_components": 50,
    "algorithm": "randomized",
    "n_iter": 10,
    "random_state": 42
  },
  "message": "Production recommendation model"
}
```

Then schedule training with a task file in backend/ai_server/tasks/:
```json
{
  "task_name": "my_model",
  "model_id": "my_model",
  "interactions_data_chef_id": "my_interactions",
  "item_features_data_chef_id": null,
  "user_features_data_chef_id": null,
  "interval": 3600
}
```

The interval is in seconds; the model will automatically retrain at this interval.
Restart the AI server to load the new task:

```bash
docker-compose restart ai_server
```

The model will begin training according to the schedule.
curl "http://localhost:2030/api/v1/recommend?user_id=user123&model_id=my_model&n=10"Response format:
{
  "user_id": "user123",
  "model_id": "my_model",
  "predictions": {
    "user123": [
      {"item_id": "item456", "score": 4.8},
      {"item_id": "item789", "score": 4.6},
      {"item_id": "item321", "score": 4.3}
    ]
  },
  "n_recommendations": 10,
  "status": "completed",
  "datetime": "2024-11-23T10:30:00Z"
}
```
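
A minimal Python client for this endpoint could look like the following sketch (field names follow the response format above; ports are the defaults):

```python
# Fetch recommendations for a user and print item scores.
import requests

params = {"user_id": "user123", "model_id": "my_model", "n": 10}
resp = requests.get("http://localhost:2030/api/v1/recommend", params=params, timeout=10)
resp.raise_for_status()

for rec in resp.json()["predictions"]["user123"]:
    print(rec["item_id"], rec["score"])
```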
To stream interaction events, send them to the Kafka topic with the console producer:

```bash
# Send interaction events
echo '{"user_id": "user1", "item_id": "item1", "rating": 5.0}
{"user_id": "user2", "item_id": "item2", "rating": 4.5}
{"user_id": "user3", "item_id": "item3", "rating": 4.0}' | \
docker exec -i vrecom_kafka kafka-console-producer \
--bootstrap-server localhost:9092 \
--topic interactions
```

Consume messages to verify ingestion:

```bash
docker exec vrecom_kafka kafka-console-consumer \
--bootstrap-server localhost:9092 \
--topic interactions \
--from-beginning \
--max-messages 10
```

Check consumer group status:

```bash
docker exec vrecom_kafka kafka-consumer-groups \
--bootstrap-server localhost:9092 \
--group your_consumer_group \
--describe
```

This shows the current offset, lag, and consumption status.
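
For programmatic ingestion, here is a producer sketch using the kafka-python package (an assumption; any Kafka client works) against the external broker port:

```python
# Send interaction events from Python (pip install kafka-python).
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("interactions", {"user_id": "user1", "item_id": "item1", "rating": 5.0})
producer.flush()  # block until the event is delivered
```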
View logs:

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f ai_server
docker-compose logs -f api_server

# Last N lines
docker-compose logs --tail=100 ai_server
```

Restart services:

```bash
# Restart specific service
docker-compose restart ai_server
# Restart all services
docker-compose restart
# Stop and start (full restart)
docker-compose down
docker-compose up -d
```

Check status and resources:

```bash
# Check service status
docker-compose ps
# Check resource usage
docker stats
# View metrics in Prometheus
# Open http://localhost:9090
```

Manage the Redis cache:

```bash
# Flush all Redis cache
docker exec vrecom_redis redis-cli FLUSHDB
# Clear specific keys
docker exec vrecom_redis redis-cli DEL "key_pattern"
```

API Server endpoints (port 2030):

```
GET /api/v1/ping
```

Response: {"message": "pong", "timestamp": "2024-11-23T10:00:00Z"}

```
GET /api/v1/recommend?user_id={userId}&model_id={modelId}&n={count}
```

Parameters:

- user_id: User identifier (required)
- model_id: Model to use for recommendations (required)
- n: Number of recommendations (default: 5)

Authentication endpoints:

```
POST /api/v1/auth/login
POST /api/v1/auth/logout
GET /api/v1/auth/status
```
AI Server endpoints (port 9999):

```
GET /api/v1/health
GET /api/v1/list_models
POST /api/v1/create_model
DELETE /api/v1/delete_model?model_id={modelId}
GET /api/v1/get_model?model_id={modelId}
GET /api/v1/list_data_chefs
POST /api/v1/create_data_chef_from_csv
POST /api/v1/create_data_chef_from_sql
POST /api/v1/create_data_chef_from_nosql
POST /api/v1/create_data_chef_from_api
DELETE /api/v1/delete_data_chef?name={chefName}
GET /api/v1/list_tasks
POST /api/v1/add_model_task
DELETE /api/v1/remove_model_task?task_name={taskName}
GET /api/v1/get_scheduler_status
POST /api/v1/stop_scheduler
POST /api/v1/restart_scheduler
POST /api/v1/recommend
```

Request body:

```json
{
  "user_id": "user123",
  "model_id": "my_model",
  "n": 10
}
```
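
For example, calling the POST endpoint directly on the AI server (a sketch; assumes the default port 9999):

```python
# POST variant of the recommendation request, sent to the AI server.
import requests

payload = {"user_id": "user123", "model_id": "my_model", "n": 10}
resp = requests.post("http://localhost:9999/api/v1/recommend", json=payload, timeout=10)
print(resp.json())
```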
To run services locally without Docker:

For the AI Server (Python):

```bash
cd backend/ai_server
poetry install
poetry run server
```

For the API Server (Go):

```bash
cd backend/api_server
go mod download
go run main.go
```

For the Frontend:

```bash
cd frontend/project
npm install
npm run dev
```

Run the tests:

```bash
# AI Server
cd backend/ai_server
poetry run pytest tests/ -v

# API Server
cd backend/api_server
go test ./... -v

# Frontend
cd frontend/project
npm test
```

Code quality checks for the AI Server (Python):

```bash
# Format code
poetry run black src/
# Lint code
poetry run flake8 src/
# Type checking
poetry run mypy src/
```

For the API Server (Go):

```bash
# Format code
go fmt ./...
# Lint code
golangci-lint run
# Vet code
go vet ./...
```

For the Frontend (TypeScript):

```bash
# Lint code
npm run lint
# Type checking
npm run type-check
# Format code
npm run format
```

To add a new recommendation model:

- Create a model class in backend/ai_server/src/ai_server/models/:
```python
from ai_server.models.base_model import BaseRecommendationModel

class MyNewModel(BaseRecommendationModel):
    def __init__(self, **hyperparameters):
        super().__init__(**hyperparameters)

    def fit(self, interactions_df, **kwargs):
        # Training implementation
        pass

    def predict(self, user_id, n=5):
        # Prediction implementation
        pass
```
- Register it in backend/ai_server/src/ai_server/services/model_service.py
- Add tests in backend/ai_server/tests/
- Update documentation
To add a new data source:

- Create a data chef handler in backend/ai_server/src/ai_server/services/data_chef_service.py:
```python
def _cook_my_source(param1, param2):
    # Data fetching logic
    for record in data_source:
        yield record
```
- Add it to the _cook_raw_data_source() switch statement
- Create an API endpoint in the routers
- Add a configuration example
Before deploying to production:
- Change all default secrets in .env
- Use strong JWT secret keys (see the snippet after this list)
- Configure proper CORS origins
- Enable HTTPS with valid SSL certificates
- Set up firewall rules
- Use environment-specific configurations
- Enable rate limiting
- Configure log rotation
- Set up automated backups
- Enable monitoring and alerting
- Review and remove debug endpoints
- Use secrets management (e.g., Vault, AWS Secrets Manager)
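
One simple way to generate a strong secret for JWT_SECRET_KEY (a suggestion, not a project requirement):

```python
# Print a cryptographically strong random secret.
import secrets
print(secrets.token_urlsafe(64))
```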
Build optimized production images:
```bash
docker-compose -f docker-compose.prod.yml build
docker-compose -f docker-compose.prod.yml up -d
```

Recommended minimum resources:
- API Server: 512MB RAM, 0.5 CPU
- AI Server: 2GB RAM, 1 CPU
- Frontend: 256MB RAM, 0.25 CPU
- Redis: 512MB RAM, 0.25 CPU
- Kafka: 1GB RAM, 0.5 CPU
Adjust in docker-compose.yml:
```yaml
services:
  ai_server:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 1G
```

Configure Prometheus targets in prometheus.yml:
```yaml
scrape_configs:
  - job_name: 'ai_server'
    static_configs:
      - targets: ['ai_server:9999']
  - job_name: 'api_server'
    static_configs:
      - targets: ['api_server:2030']
```

Set up Grafana dashboards for visualization (optional).
```bash
# Backup Redis data
docker exec vrecom_redis redis-cli SAVE
docker cp vrecom_redis:/data/dump.rdb ./backups/redis/

# Backup Kafka data
docker exec vrecom_kafka tar -czf /tmp/kafka-backup.tar.gz /var/lib/kafka/data
docker cp vrecom_kafka:/tmp/kafka-backup.tar.gz ./backups/kafka/

# Backup trained models
docker cp vrecom_ai_server:/app/models ./backups/models/
```

Automate with cron jobs or use volume backups.
Check Docker and Docker Compose versions:
```bash
docker --version          # Should be 20.10+
docker-compose --version  # Should be 2.0+
```

Check for port conflicts:
```bash
# Linux/Mac
lsof -i :2030
lsof -i :9999
lsof -i :5173

# Windows
netstat -ano | findstr :2030
```

View service logs for errors:
```bash
docker-compose logs
```

Verify Kafka is running and accessible:
```bash
docker exec vrecom_kafka kafka-broker-api-versions \
--bootstrap-server localhost:9092
```

Check consumer group status:
```bash
docker exec vrecom_kafka kafka-consumer-groups \
--bootstrap-server localhost:9092 \
--list
```

Reset consumer offsets if needed:
```bash
docker exec vrecom_kafka kafka-consumer-groups \
--bootstrap-server localhost:9092 \
--group your_group_id \
--reset-offsets \
--to-earliest \
--topic your_topic \
--execute
```

Check that the data chef is configured correctly:
```bash
curl http://localhost:9999/api/v1/list_data_chefs
```

Verify data source connectivity from the AI server container:
```bash
docker exec vrecom_ai_server python -c "
from ai_server.services.data_chef_service import DataChefService
service = DataChefService()
print(service.list_data_chefs())
"
```

Check the task scheduler status:
```bash
curl http://localhost:9999/api/v1/get_scheduler_status
```

View AI server logs for training errors:
```bash
docker-compose logs -f ai_server | grep -i "error\|exception"
```

Verify the model exists and is trained:
```bash
curl http://localhost:9999/api/v1/list_models
```

Check whether the user exists in the training data:
```bash
# View model metadata
docker exec vrecom_ai_server cat models/your_model_metadata.json
```

Clear the recommendation cache:
```bash
docker exec vrecom_redis redis-cli KEYS "recommend:*"
docker exec vrecom_redis redis-cli DEL "recommend:user123:my_model:10"
```

Monitor container resources:
```bash
docker stats
```

Reduce batch sizes in the training configuration.
Clear unused data:

```bash
# Clear Redis cache
docker exec vrecom_redis redis-cli FLUSHDB

# Prune Docker system
docker system prune -a
```

Enable debug logging by setting in .env:
```env
DEBUG=true
LOG_LEVEL=debug
```

Restart services:

```bash
docker-compose restart
```

View detailed logs:

```bash
docker-compose logs -f --tail=100
```

General checks:

- Check service logs: docker-compose logs <service_name>
- Review configuration files for typos or missing values
- Verify network connectivity between containers
- Check disk space: df -h
- Review Docker resources: docker system df
If issues persist:
- Check the GitHub issues page
- Review the documentation in the docs/ directory
- Enable debug mode and collect logs
The system uses Redis for multiple caching layers:
- API Response Cache: Frequently requested recommendations
- Model Metadata Cache: Model information and configurations
- Session Cache: User session data
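
For example, inspecting cached recommendations with the redis-py package (a sketch; the key pattern follows the recommend:* convention shown in the troubleshooting section):

```python
# List recommendation cache keys and their TTLs (pip install redis).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# scan_iter avoids blocking the server the way KEYS can on large keyspaces.
for key in r.scan_iter("recommend:*"):
    print(key, "TTL:", r.ttl(key))
```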
Configure cache TTLs in backend/api_server/internal/cache/:

```go
const (
    RecommendationCacheTTL = 3600 // 1 hour
    ModelMetadataCacheTTL  = 7200 // 2 hours
)
```

Configure connection pool sizes in .env:
```env
# API Server
REDIS_POOL_SIZE=20
REDIS_MAX_IDLE=10

# AI Server (adjust in code)
SQL_POOL_SIZE=10
SQL_MAX_OVERFLOW=20
```

Adjust hyperparameters for faster training:
```json
{
  "hyperparameters": {
    "n_components": 20,
    "n_iter": 5,
    "batch_size": 512
  }
}
```

Lower values train faster but may reduce accuracy.
Set appropriate limits in docker-compose.yml:
```yaml
services:
  ai_server:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
```

We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch: git checkout -b feature/new-feature
- Make your changes following the code style guidelines
- Add tests for new functionality
- Update documentation as needed
- Commit your changes: git commit -m 'Add new feature'
- Push to the branch: git push origin feature/new-feature
- Submit a pull request
- Python: Follow PEP 8, use Black formatter
- Go: Follow Go standard style, use gofmt
- TypeScript: Follow Airbnb style guide, use ESLint
- Write meaningful commit messages
- Add comments for complex logic
- Update tests and documentation
This project is licensed under the terms specified in the LICENSE.txt file.
For questions, issues, or feature requests:
- Open an issue on GitHub
- Check the existing documentation in the docs/ directory
- Review the API documentation at each service's /docs endpoint
- Consult the troubleshooting section above
Built with open-source technologies:
- FastAPI for Python backend
- Fiber for Go backend
- React and TypeScript for frontend
- Docker for containerization
- Kafka for message streaming
- Redis for caching
- Prometheus for monitoring