
Wine Classifier - MLOps Demo

Scikit-learn wine classification with a modern MLOps pipeline featuring MLflow tracking, Ray for distributed training and serving, hyperparameter optimization, and production-ready deployment patterns.
Explore OpenCloudHub »


πŸ“‘ Table of Contents
  1. About
  2. Features
  3. Getting Started
  4. Usage
  5. Project Structure
  6. MLOps Pipeline
  7. Contributing
  8. License
  9. Contact

🍷 About

This repository demonstrates a complete MLOps pipeline for wine classification using scikit-learn and the UCI Wine dataset. It showcases production-ready machine learning practices including experiment tracking, hyperparameter optimization, model registration, and containerized deployment.
Ray is used for distributed training and scalable model serving.

Key Technologies:

  • ML Framework: Scikit-learn (Logistic Regression)
  • Distributed Training & Serving: Ray
  • Experiment Tracking: MLflow
  • Hyperparameter Optimization: Optuna
  • Containerization: Docker
  • Dependency Management: UV
  • Development: DevContainers for consistent environments

✨ Features

  • πŸ”¬ Experiment Tracking: MLflow integration with model registry
  • 🎯 Hyperparameter Tuning: Automated optimization using Optuna
  • 🐳 Containerized Training: Docker-based training environment
  • ⚑ Distributed Training & Serving: Ray for scalable workflows
  • πŸ“Š Model Evaluation: Comprehensive metrics and visualization
  • πŸš€ CI/CD Ready: GitHub Actions workflows for automated training
  • πŸ“ MLflow Projects: Standardized, reproducible ML workflows
  • πŸ”„ Model Registration: Threshold-based automatic model promotion
  • πŸ§ͺ Development Environment: VS Code DevContainer setup

πŸš€ Getting Started

Prerequisites

  • Docker and Docker Compose
  • VS Code with DevContainers extension (recommended)
  • MLflow tracking server (for remote tracking)
  • Ray (for distributed training/serving)

Local Development

  1. Clone the repository

    git clone https://github.com/opencloudhub/ai-ml-sklearn.git
    cd ai-ml-sklearn
  2. Open in DevContainer (Recommended)

    code .
    # VS Code will prompt to reopen in container
  3. Or setup locally

    # Install UV
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Install dependencies
    uv sync --dev

MLflow Tracking Server

Start MLflow locally (accessible from Docker containers):

mlflow server --host 0.0.0.0 --port 8081
export MLFLOW_TRACKING_URI=http://0.0.0.0:8081
export MLFLOW_EXPERIMENT_NAME=wine-quality
export MLFLOW_TRACKING_INSECURE_TLS=true
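
With the server running and these variables exported, runs logged from the training scripts appear in the wine-quality experiment. The following is an illustrative sketch of such a run (not the actual train.py); it assumes only the scikit-learn wine dataset and the environment variables above:

import os

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Picks up MLFLOW_TRACKING_URI / MLFLOW_EXPERIMENT_NAME from the environment
mlflow.set_experiment(os.environ.get("MLFLOW_EXPERIMENT_NAME", "wine-quality"))

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 100, "solver": "lbfgs"}
    model = LogisticRegression(**params).fit(X_train, y_train)
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")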

Ray Development Workflow

1. Start a Local Ray Cluster

ray start --head

2. Training Workflows

Submit Ray jobs for training and hyperparameter optimization:

RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python src/training/train.py
RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python src/training/optimize_hyperparameters.py
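
Submitted jobs can be monitored from the Ray dashboard at http://127.0.0.1:8265 (the same address used for RAY_ADDRESS above) or followed from the terminal with ray job logs <submission-id>.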

3. Model Serving with Ray Serve

The serving application looks up the model registered as prod.wine-classifier via its @champion alias, so make sure a model version has been promoted with that alias first (a sketch of assigning it follows the command below). To run the model serving application locally:

serve run --working-dir /workspace/project src.serving.wine_classifier:deployment
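
If no version carries the @champion alias yet, it can be assigned with the MLflow client. This is a minimal sketch that assumes a version of prod.wine-classifier is already registered; the version number is a placeholder:

from mlflow import MlflowClient

client = MlflowClient()
# Point the @champion alias at an existing registered version ("1" is a placeholder)
client.set_registered_model_alias(name="prod.wine-classifier", alias="champion", version="1")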

πŸ’» Usage

Training

python src/training/train.py --C 1.0 --max_iter 100 --solver lbfgs

Hyperparameter Optimization

python src/training/optimize_hyperparameters.py --n_trials 50 --test_size 0.2
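
The CLI flags map onto an Optuna study. The actual script does more (MLflow logging, Ray integration), but the core search loop is roughly the following sketch; the search-space bounds are illustrative:

import optuna
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space for the LogisticRegression hyperparameters
    C = trial.suggest_float("C", 1e-3, 10.0, log=True)
    max_iter = trial.suggest_int("max_iter", 100, 1000)
    model = LogisticRegression(C=C, max_iter=max_iter, solver="lbfgs")
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best params:", study.best_params, "best accuracy:", study.best_value)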

Local Model Serving

serve run --working-dir /workspace/project src.serving.wine_classifier:deployment

To test the model, run:

python tests/test_wine_classifier.py

You can also browse the Swagger documentation of the serving application at http://localhost:8000/docs
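
An ad-hoc request can also be sent directly to the running service. The route and payload below are placeholders, not the deployment's confirmed contract; check http://localhost:8000/docs for the actual schema:

import requests

# Placeholder payload: the 13 numeric features of a UCI Wine sample.
sample = {"features": [13.2, 2.77, 2.51, 18.5, 98.0, 2.2, 1.28, 0.26, 1.56, 7.1, 0.61, 1.33, 820.0]}
response = requests.post("http://localhost:8000/predict", json=sample)
print(response.status_code, response.json())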


πŸ“ Project Structure

ai-ml-sklearn/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ training/                       # Training and optimization scripts
β”‚   β”‚   β”œβ”€β”€ train.py
β”‚   β”‚   β”œβ”€β”€ optimize_hyperparameters.py
β”‚   β”‚   └── evaluate.py
β”‚   β”œβ”€β”€ serving/                        # Model serving (Ray Serve/FastAPI)
β”‚   β”‚   └── wine_classifier.py
β”‚   └── _utils/                         # Shared utilities
β”‚       β”œβ”€β”€ get_or_create_experiment.py
β”‚       β”œβ”€β”€ logging_callback.py
β”‚       └── logging_config.py
β”œβ”€β”€ tests/                              # Unit tests
β”œβ”€β”€ .devcontainer/                      # VS Code DevContainer config
β”œβ”€β”€ .github/workflows/                  # CI/CD workflows
β”œβ”€β”€ Dockerfile                          # Multi-stage container build
β”œβ”€β”€ MLproject                           # MLflow project definition
β”œβ”€β”€ pyproject.toml                      # Project dependencies and config
└── uv.lock                             # Dependency lock file

πŸ”„ MLOps Pipeline

  1. Development & Experimentation

    • Local development in DevContainers
    • Jupyter notebooks for data exploration
    • MLflow experiment tracking
  2. Training & Optimization

    • Distributed training and hyperparameter tuning with Ray and Optuna
    • Model evaluation and metrics logging
    • Threshold-based model registration (see the sketch after this list)
  3. Model Registry

    • Automatic promotion to staging registry
    • Model versioning and lineage tracking
    • Performance comparison and rollback capability
  4. Deployment

    • Ray Serve for scalable, production-ready model serving
    • (Planned) KServe integration and GitOps-based deployment automation
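
The registration step in stage 2 gates promotion on a metric threshold. Below is a minimal sketch of that idea, assuming the training run logs a test_accuracy metric; the threshold value and registry name are illustrative:

import mlflow

ACCURACY_THRESHOLD = 0.90          # illustrative promotion threshold
run_id = "<run-id-from-training>"  # placeholder for the run produced in stage 2

run = mlflow.get_run(run_id)
if run.data.metrics.get("test_accuracy", 0.0) >= ACCURACY_THRESHOLD:
    # Register the run's model; the @champion alias can then be assigned
    # as shown in the serving section above.
    version = mlflow.register_model(f"runs:/{run_id}/model", "prod.wine-classifier")
    print(f"Registered model version {version.version}")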

πŸ‘₯ Contributing

Contributions are welcome! This project follows OpenCloudHub's contribution standards.

Please see our Contributing Guidelines and Code of Conduct for more details.


πŸ“„ License

Distributed under the Apache 2.0 License. See LICENSE for more information.


πŸ“¬ Contact

Organization Link: https://github.com/OpenCloudHub

Project Link: https://github.com/opencloudhub/ai-ml-sklearn


πŸ™ Acknowledgements

  • UCI Wine Dataset - The dataset used for classification
  • MLflow - ML lifecycle management
  • Optuna - Hyperparameter optimization framework
  • Ray - Distributed computing and serving
  • UV - Fast Python package manager

(back to top)
