A lightweight, cost-effective AutoML platform built on AWS serverless architecture. Upload CSV files, automatically detect problem types, and train/predict machine learning models with just a few clicks.
| Workflow | Main | Dev |
|---|---|---|
| CI Terraform | | |
| Deploy Infrastructure | | |
| Deploy Lambda API | | |
| Deploy Training Container | | |
| Deploy Frontend | | |
- Smart Problem Detection: Automatically classifies tasks as regression or classification based on data characteristics
- Automated EDA: Generates comprehensive exploratory data analysis reports
- Model Training: Uses FLAML for efficient AutoML with auto-calculated time budgets (see the sketch after this list)
- Training History: Track all your experiments with DynamoDB
- Cost-Effective: ~$3-25/month ($0 when idle) vs. ~$36-171/month for SageMaker endpoints
- Portable Models: Download trained models (.pkl and .onnx) for local use with Docker
- Serverless Model Inference: Deploy models and make predictions via Lambda (no SageMaker needed!)
- Model Comparison: Side-by-side comparison of multiple training runs
- Dark Mode: System preference detection with manual toggle
- ONNX Export: Cross-platform model deployment format
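Under the hood, the Model Training feature drives FLAML's AutoML API. Here is a minimal sketch of such a run, assuming a local CSV with a "target" column and a 120-second budget (both illustrative; the platform derives the budget from dataset size):

```python
# Minimal sketch of a FLAML AutoML run with an explicit time budget.
# The file name, target column, and 120s budget are illustrative assumptions.
import pickle

import pandas as pd
from flaml import AutoML

df = pd.read_csv("train.csv")
X, y = df.drop(columns=["target"]), df["target"]

automl = AutoML()
automl.fit(X, y, task="classification", time_budget=120)  # budget in seconds

print(automl.best_estimator, automl.best_loss)

# Persist the fitted model, mirroring the platform's .pkl download
with open("model.pkl", "wb") as f:
    pickle.dump(automl, f)
```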
Note: Screenshots are organized by problem type. The examples below show a classification workflow. Regression screenshots with metrics like R², RMSE, and MAE are available in the screenshots/regression/ folder.
- Configure Training - Select target column with auto problem detection. Shows unique value counts per column and automatic classification/regression detection.
- Training Progress - Real-time training status monitoring. Live updates showing current training phase and elapsed time.
- Results - Model Metrics - Classification metrics dashboard. Displays Accuracy, F1 Score, Precision, and Recall (or R², RMSE, MAE for regression).
- Model Deployment - Prediction Playground - Test your model interactively. Serverless Lambda inference with real-time predictions and probability scores.
- Training Report - Feature Importance - Downloadable HTML report with interactive charts. Bar chart showing which features contributed most to the model's predictions.
- EDA Report - Comprehensive exploratory data analysis. Automated data quality analysis with warnings, correlations, and distributions.
41 total screenshots are available in the screenshots folder:
- Common (7): Compare models, time budget, jobs history, download/usage guides
- Classification (20): Complete classification workflow with EDA & training reports
- Regression (14): Complete regression workflow with EDA & training reports
Screenshots are organized by problem type. See screenshots/README.md for the complete catalog.
Text version:

```
User → AWS Amplify (Frontend - Next.js SSR)
        ↓
API Gateway → Lambda (FastAPI - No containers, direct code)
        ↓
DynamoDB + S3 (Metadata & Files)
        ↓
AWS Batch → Fargate Spot (Training - Docker container)
```
Why containers only for training?
- Backend API: Direct Lambda deployment (5MB code)
- Training: Requires Docker due to 265MB of ML dependencies (FLAML, scikit-learn, XGBoost) and jobs longer than 15 min (see the sketch below)
- See ARCHITECTURE_DECISIONS.md for detailed analysis
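To make the split concrete, handing a training job from the Lambda-hosted API to AWS Batch is essentially a single boto3 call. The sketch below is hedged: the queue name, job definition, and environment variables are hypothetical, not the project's actual identifiers.

```python
# Hypothetical sketch: a Lambda-hosted API submitting a training job to
# AWS Batch (Fargate Spot). Queue, definition, and env names are illustrative.
import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="automl-train-example",
    jobQueue="automl-training-queue",      # hypothetical queue name
    jobDefinition="automl-training-job",   # hypothetical job definition
    containerOverrides={
        "environment": [
            {"name": "DATASET_S3_URI", "value": "s3://my-bucket/datasets/data.csv"},
            {"name": "TARGET_COLUMN", "value": "target"},
        ]
    },
)
print(response["jobId"])  # poll Batch (or DynamoDB job records) for status
```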
- AWS Account
- AWS CLI v2 configured
- Terraform >= 1.9
- Docker installed
- Node.js 20+ (for frontend)
- Python 3.11+
```bash
# 1. Clone the repository
git clone https://github.com/cristofima/AWS-AutoML-Lite.git
cd AWS-AutoML-Lite

# 2. Deploy the infrastructure
cd infrastructure/terraform
terraform init
terraform apply  # See QUICKSTART.md for complete instructions

# 3. Build and push the training container
ECR_URL=$(terraform output -raw ecr_repository_url)
cd ../../backend/training
docker build -t automl-training:latest .
docker tag automl-training:latest $ECR_URL:latest
docker push $ECR_URL:latest

# 4. Get the API URL
cd ../../infrastructure/terraform
terraform output api_gateway_url
```

Full instructions: See QUICKSTART.md
- QUICKSTART.md - Complete deployment guide
- PROJECT_REFERENCE.md - Technical documentation
- ROADMAP.md - Product roadmap & future features
- SETUP_CICD.md - CI/CD with GitHub Actions
- ARCHITECTURE_DECISIONS.md - Container usage rationale
- LESSONS_LEARNED.md - Challenges, solutions & best practices
- FRONTEND_DEPLOYMENT_ANALYSIS.md - Frontend deployment decision analysis
- CONTRIBUTING.md - Contribution guidelines
- CHANGELOG.md - Version history
Based on moderate usage (20 training jobs/month):
| Service | Monthly Cost |
|---|---|
| AWS Amplify (Frontend) | $0-15 (Free Tier eligible) |
| Lambda + API Gateway | $1-2 |
| AWS Batch (Fargate Spot) | $1-5 |
| S3 + DynamoDB | $1-3 |
| Total | ~$3-25/month |
> Note: Why $0-15 for Amplify? Most side projects stay within the AWS Free Tier ($0). The $15 upper bound is a conservative estimate for projects that exceed Free Tier limits (1,000 build minutes/month) or serve higher traffic that needs more SSR compute (Lambda) resources.
```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with values from: terraform output

# 2. Start Backend API
docker-compose up

# 3. Start Frontend (separate terminal)
cd frontend
cp .env.local.example .env.local
# Edit .env.local with API URL
pnpm install && pnpm dev
```

Without Docker:

```bash
# Backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\Activate.ps1
pip install -r requirements.txt
uvicorn api.main:app --reload

# Frontend (separate terminal)
cd frontend
pnpm install && pnpm dev
```

- Upload a CSV file
- Select your target column (UI shows unique values and auto-detects problem type)
- Optionally configure time budget (auto-calculated based on dataset size if left empty)
- Wait for training to complete
- Download your model and view metrics
| Feature | Description |
|---|---|
| Problem Type Detection | Automatically detects Classification vs Regression using smart heuristics (sketched below) |
| Smart Classification | Integer-like values with ≤10 unique values → Classification |
| Smart Regression | Float values with decimals (35.5, 40.2) → Regression (even with low unique count) |
| Auto Time Budget | Based on dataset size: <1K rows → 2 min, 1K-10K → 5 min, 10K-50K → 10 min, >50K → 20 min |
| Column Statistics | Shows unique values count for each column to help with target selection |
| ID Detection | Automatically excludes identifier columns (order_id, customer_id, etc.) |
| ONNX Export | Cross-platform model format for deployment in any language |
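The detection and budget heuristics in the table translate naturally into a few lines of Python. This is a simplified sketch of the rules as stated above, not the project's exact implementation:

```python
# Simplified sketch of the problem-type and time-budget heuristics described
# in the table above; thresholds mirror the README, not the exact source code.
import pandas as pd

def detect_problem_type(target: pd.Series) -> str:
    values = target.dropna()
    if not pd.api.types.is_numeric_dtype(values):
        return "classification"
    # Float values with real decimals -> regression, even with few unique values
    if pd.api.types.is_float_dtype(values) and (values % 1 != 0).any():
        return "regression"
    # Integer-like values with <= 10 unique values -> classification
    if values.nunique() <= 10:
        return "classification"
    return "regression"

def auto_time_budget(n_rows: int) -> int:
    """Training time budget in minutes, based on row count."""
    if n_rows < 1_000:
        return 2
    if n_rows < 10_000:
        return 5
    if n_rows < 50_000:
        return 10
    return 20
```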
After downloading your model (.pkl file), use Docker for predictions:
```bash
# Build prediction container (one time)
docker build -f scripts/Dockerfile.predict -t automl-predict .

# Show model info and required features
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl --info

# Generate sample input JSON (auto-detects features from model)
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl -g /data/sample_input.json

# Edit sample_input.json with your values, then predict
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl --json /data/sample_input.json

# Batch predictions from CSV
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl -i /data/test.csv -o /data/predictions.csv
```

See scripts/README.md for detailed documentation.
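If you'd rather skip the container, both downloaded artifacts load directly in Python. A minimal sketch, assuming the .pkl exposes a scikit-learn-style predict() and the .onnx graph takes a single float32 feature tensor (inspect your own model's inputs, e.g. with --info above, to confirm):

```python
# Sketch: run the downloaded model artifacts without the prediction container.
# Assumes a scikit-learn-style .pkl and a single float32 input tensor for the
# .onnx export; check your model's actual inputs before relying on this.
import pickle

import numpy as np
import onnxruntime as ort
import pandas as pd

rows = pd.read_csv("test.csv")  # same feature columns as training, minus target

# Option 1: the pickled model
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
print(model.predict(rows))

# Option 2: the ONNX export (portable across languages and runtimes)
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: rows.to_numpy(dtype=np.float32)})
print(outputs[0])
```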
| Component | README | Description |
|---|---|---|
| Backend | backend/README.md | API development & Docker Compose |
| Frontend | frontend/README.md | Next.js setup & pages |
| Training | backend/training/ | ML training container |
| Terraform | infrastructure/terraform/README.md | Infrastructure as Code |
| Scripts | scripts/README.md | Local training, predictions & diagram generation |
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Cristopher Coronado - AWS Community Builder
- GitHub: @cristofima
- Built with FastAPI, FLAML, and Next.js
- Inspired by SageMaker Autopilot
- Part of AWS Community Builder program
Status: ✅ MVP Complete (Backend ✅ | Training ✅ | Frontend ✅)
