A lightweight, cost-effective AutoML platform built on AWS serverless architecture. Upload CSV files, automatically detect problem types, and train machine learning models with just a few clicks.

cristofima/AWS-AutoML-Lite

AWS AutoML Lite

Terraform Python Node.js Next.js AWS License

A lightweight, cost-effective AutoML platform built on AWS serverless architecture. Upload CSV files, automatically detect problem types, train machine learning models, and generate predictions with just a few clicks.

🔄 CI/CD Status

| Workflow | Main | Dev |
|----------|------|-----|
| CI Terraform | CI | CI |
| Deploy Infrastructure | Deploy | Deploy |
| Deploy Lambda API | Deploy | Deploy |
| Deploy Training Container | Deploy | Deploy |
| Deploy Frontend | Deploy | Deploy |

🚀 Features

  • Smart Problem Detection: Automatically classifies tasks as regression or classification based on data characteristics
  • Automated EDA: Generates comprehensive exploratory data analysis reports
  • Model Training: Uses FLAML for efficient AutoML with auto-calculated time budgets
  • Training History: Track all your experiments with DynamoDB
  • Cost-Effective: ~$3-25/month ($0 when idle) vs ~$36-171/month for SageMaker endpoints.
  • Portable Models: Download trained models (.pkl and .onnx) for local use with Docker

✨ New in v1.1.0

  • Serverless Model Inference: Deploy models and make predictions via Lambda (no SageMaker needed!)
  • Model Comparison: Side-by-side comparison of multiple training runs
  • Dark Mode: System preference detection with manual toggle
  • ONNX Export: Cross-platform model deployment format

📸 Screenshots

Note: Screenshots are organized by problem type. The examples below show a classification workflow. Regression screenshots with metrics like R², RMSE, and MAE are available in the screenshots/regression/ folder.

Configure Training - Select target column with auto problem detection

[Screenshot: Configure - Target Selection]
Shows unique value counts per column and automatic classification/regression detection.

Training Progress - Real-time training status monitoring

[Screenshot: Training - Running]
Live updates showing current training phase and elapsed time.

Results - Model Metrics - Classification metrics dashboard

[Screenshot: Results - Metrics]
Displays Accuracy, F1 Score, Precision, and Recall (or R², RMSE, MAE for regression).

Model Deployment - Prediction Playground - Test your model interactively

[Screenshot: Results - Predictions]
Serverless Lambda inference with real-time predictions and probability scores.

Training Report - Feature Importance - Downloadable HTML report with interactive charts

[Screenshot: Training Report - Feature Importance]
Bar chart showing which features contributed most to the model's predictions.

EDA Report - Comprehensive exploratory data analysis

[Screenshot: EDA Report - Overview]
Automated data quality analysis with warnings, correlations, and distributions.

๐Ÿ“ 41 total screenshots available in the screenshots folder:

  • Common (7): Compare models, time budget, jobs history, download/usage guides
  • Classification (20): Complete classification workflow with EDA & training reports
  • Regression (14): Complete regression workflow with EDA & training reports

See screenshots/README.md for the complete catalog.

๐Ÿ—๏ธ Architecture

AWS AutoML Lite Architecture

Text version
User โ†’ AWS Amplify (Frontend - Next.js SSR)
         โ†“
    API Gateway โ†’ Lambda (FastAPI - No containers, direct code)
         โ†“
    DynamoDB + S3 (Metadata & Files)
         โ†“
    AWS Batch โ†’ Fargate Spot (Training - Docker container)

Why containers only for training?

  • Backend API: Direct Lambda deployment (5MB code)
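  • Training: Requires Docker because the ML dependencies (FLAML, scikit-learn, XGBoost) total 265MB, above Lambda's 250MB unzipped package limit, and training jobs can run longer than Lambda's 15-minute timeout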
  • See ARCHITECTURE_DECISIONS.md for detailed analysis
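
The split above can be read as a simple rule of thumb. The following is a minimal sketch, not code from this repository: the two constants are the published AWS Lambda limits for zip deployments, while the function name `needs_container` and the 30-second and 20-minute runtimes in the examples are assumptions chosen for illustration.

```python
# Published AWS Lambda limits for zip-based deployments.
LAMBDA_MAX_UNZIPPED_MB = 250      # deployment package size, unzipped, including layers
LAMBDA_MAX_TIMEOUT_S = 15 * 60    # maximum function timeout in seconds


def needs_container(package_mb: float, max_runtime_s: float) -> bool:
    """Return True when a workload cannot run as a zip-deployed Lambda function.

    Hypothetical helper for illustration; the project's actual reasoning is
    documented in ARCHITECTURE_DECISIONS.md.
    """
    too_large = package_mb > LAMBDA_MAX_UNZIPPED_MB
    too_slow = max_runtime_s > LAMBDA_MAX_TIMEOUT_S
    return too_large or too_slow


# Backend API: about 5 MB of code, short requests (30 s is an assumed upper bound).
api_needs_container = needs_container(package_mb=5, max_runtime_s=30)

# Training: about 265 MB of dependencies, assumed 20-minute time budget.
training_needs_container = needs_container(package_mb=265, max_runtime_s=20 * 60)

print(api_needs_container, training_needs_container)
```

Either condition alone is enough to rule out a zip-deployed function, which is why training runs on AWS Batch while the API stays on Lambda.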

📋 Prerequisites

  • AWS Account
  • AWS CLI v2 configured
  • Terraform >= 1.9
  • Docker installed
  • Node.js 20+ (for frontend)
  • Python 3.11+

🚀 Quick Start

1. Clone the repository

git clone https://github.com/cristofima/AWS-AutoML-Lite.git
cd AWS-AutoML-Lite

2. Deploy Infrastructure

cd infrastructure/terraform
terraform init
terraform apply

3. Build and Push Training Container

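# See QUICKSTART.md for complete instructions
ECR_URL=$(terraform output -raw ecr_repository_url)
# Authenticate Docker with ECR before pushing. The registry host is the part of
# the repository URL before the first "/". Add --region to the first command if
# the repository is not in your default AWS CLI region.
aws ecr get-login-password | docker login --username AWS --password-stdin "${ECR_URL%%/*}"
cd ../../backend/training
docker build -t automl-training:latest .
docker tag automl-training:latest "$ECR_URL:latest"
docker push "$ECR_URL:latest"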

4. Get Your API URL

cd ../../infrastructure/terraform
terraform output api_gateway_url

📖 Full instructions: See QUICKSTART.md

📖 Documentation

💰 Cost Estimation

Based on moderate usage (20 training jobs/month):

| Service | Monthly Cost |
|---------|--------------|
| AWS Amplify (Frontend) | $0-15 (Free Tier eligible) |
| Lambda + API Gateway | $1-2 |
| AWS Batch (Fargate Spot) | $1-5 |
| S3 + DynamoDB | $1-3 |
| Total | ~$3-25/month |
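
As a quick check, the total is just the sum of the low and high ends of each row. A minimal sketch, with the ranges copied from the table above and the helper name `total_range` chosen for illustration:

```python
# Monthly cost ranges in USD (low, high), copied from the table above.
COSTS = {
    "AWS Amplify (Frontend)": (0, 15),
    "Lambda + API Gateway": (1, 2),
    "AWS Batch (Fargate Spot)": (1, 5),
    "S3 + DynamoDB": (1, 3),
}


def total_range(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every service's cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


print(total_range(COSTS))  # (3, 25)
```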

Note

Why $0-15 for Amplify? Most side projects will stay within the AWS Free Tier ($0). The $15 estimate covers conservative usage for projects that exceed Free Tier limits (1,000 build minutes/month) or have higher traffic requiring more SSR compute (Lambda) resources.

🧪 Local Development

Using Docker Compose (Recommended)

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with values from: terraform output

# 2. Start Backend API
docker-compose up

# 3. Start Frontend (separate terminal)
cd frontend
cp .env.local.example .env.local
# Edit .env.local with API URL
pnpm install && pnpm dev

Without Docker

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\Activate.ps1
pip install -r requirements.txt
uvicorn api.main:app --reload

# Frontend (separate terminal)
cd frontend
pnpm install && pnpm dev

๐Ÿ“ Usage

  1. Upload a CSV file
  2. Select your target column (UI shows unique values and auto-detects problem type)
  3. Optionally configure time budget (auto-calculated based on dataset size if left empty)
  4. Wait for training to complete
  5. Download your model and view metrics
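
The auto-calculated budget in step 3 follows the row-count thresholds listed under Smart Features. A minimal sketch of that lookup, assuming the function name `auto_time_budget_minutes` and assuming that a dataset sitting exactly on a boundary (10,000 or 50,000 rows) falls into the lower tier, which the documented ranges leave open:

```python
def auto_time_budget_minutes(n_rows: int) -> int:
    """Map a dataset's row count to a training time budget in minutes.

    Thresholds come from the Smart Features table; the handling of exact
    boundary values is an assumption, not confirmed project behavior.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows < 1_000:
        return 2
    if n_rows <= 10_000:
        return 5
    if n_rows <= 50_000:
        return 10
    return 20


for rows in (500, 5_000, 25_000, 200_000):
    print(rows, "rows ->", auto_time_budget_minutes(rows), "min")
```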

Smart Features

| Feature | Description |
|---------|-------------|
| Problem Type Detection | Automatically detects Classification vs Regression using smart heuristics |
| Smart Classification | Integer-like values with ≤10 unique values → Classification |
| Smart Regression | Float values with decimals (35.5, 40.2) → Regression (even with low unique count) |
| Auto Time Budget | Based on dataset size: <1K rows → 2 min, 1K-10K → 5 min, 10K-50K → 10 min, >50K → 20 min |
| Column Statistics | Shows unique values count for each column to help with target selection |
| ID Detection | Automatically excludes identifier columns (order_id, customer_id, etc.) |
| ONNX Export | Cross-platform model format for deployment in any language |
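
The two detection rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: the function name `detect_problem_type` is hypothetical, and the table does not say what happens to non-numeric targets or to integer targets with more than 10 unique values, so those two branches are guesses labeled as such in the code.

```python
def detect_problem_type(values: list, max_classes: int = 10) -> str:
    """Guess 'classification' or 'regression' from non-missing target values."""
    if not values:
        raise ValueError("values must not be empty")

    # Assumption: non-numeric targets (e.g. string labels) are classification.
    if any(not isinstance(v, (int, float)) for v in values):
        return "classification"

    # Documented rule: any value with a fractional part means regression,
    # even when there are few unique values.
    if any(not float(v).is_integer() for v in values):
        return "regression"

    # Documented rule: integer-like values with <= 10 unique values.
    if len(set(values)) <= max_classes:
        return "classification"

    # Assumption: integer-like values with many unique values are regression.
    return "regression"


print(detect_problem_type([0, 1, 1, 0]))        # classification
print(detect_problem_type([35.5, 40.2, 35.5]))  # regression
```

Checking for fractional values before counting unique values is what lets a column such as 35.5, 40.2 be treated as regression despite its low unique count.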

🔮 Using Your Trained Model

After downloading your model (.pkl file), use Docker for predictions:

# Build prediction container (one time)
docker build -f scripts/Dockerfile.predict -t automl-predict .

# Show model info and required features
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl --info

# Generate sample input JSON (auto-detects features from model)
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl -g /data/sample_input.json

# Edit sample_input.json with your values, then predict
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl --json /data/sample_input.json

# Batch predictions from CSV
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl -i /data/test.csv -o /data/predictions.csv

See scripts/README.md for detailed documentation.

๐Ÿ“ Component Documentation

Component README Description
Backend backend/README.md API development & Docker Compose
Frontend frontend/README.md Next.js setup & pages
Training backend/training/ ML training container
Terraform infrastructure/terraform/README.md Infrastructure as Code
Scripts scripts/README.md Local training, predictions & diagram generation

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Cristopher Coronado - AWS Community Builder

๐Ÿ™ Acknowledgments

  • Built with FastAPI, FLAML, and Next.js
  • Inspired by SageMaker Autopilot
  • Part of AWS Community Builder program

Status: ✅ MVP Complete (Backend ✅ | Training ✅ | Frontend ✅)
