Scikit-learn wine classification with a modern MLOps pipeline featuring MLflow tracking, Ray for distributed training and serving, hyperparameter optimization, and production-ready deployment patterns.
Explore OpenCloudHub »
Table of Contents
This repository demonstrates a complete MLOps pipeline for wine classification using scikit-learn and the UCI Wine dataset. It showcases production-ready machine learning practices including experiment tracking, hyperparameter optimization, model registration, and containerized deployment.
Ray is used for distributed training and scalable model serving.
Key Technologies:
- ML Framework: Scikit-learn (Logistic Regression)
- Distributed Training & Serving: Ray
- Experiment Tracking: MLflow
- Hyperparameter Optimization: Optuna
- Containerization: Docker
- Dependency Management: UV
- Development: DevContainers for consistent environments
- Experiment Tracking: MLflow integration with model registry
- Hyperparameter Tuning: Automated optimization using Optuna
- Containerized Training: Docker-based training environment
- Distributed Training & Serving: Ray for scalable workflows
- Model Evaluation: Comprehensive metrics and visualization
- CI/CD Ready: GitHub Actions workflows for automated training
- MLflow Projects: Standardized, reproducible ML workflows
- Model Registration: Threshold-based automatic model promotion
- Development Environment: VS Code DevContainer setup
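Threshold-based promotion can be sketched as a simple gate on the evaluation metrics. The function name and the 0.90 accuracy cutoff below are illustrative assumptions, not the repository's actual values:

```python
# Hypothetical sketch of the threshold gate behind automatic model
# registration; the 0.90 accuracy cutoff is an assumed value.
ACCURACY_THRESHOLD = 0.90

def should_register(metrics: dict) -> bool:
    """Promote only models whose held-out accuracy clears the threshold."""
    return metrics.get("accuracy", 0.0) >= ACCURACY_THRESHOLD

print(should_register({"accuracy": 0.95}))  # a passing candidate -> True
```

A model that passes the gate would then be registered in the MLflow model registry and given an alias for serving.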
- Docker and Docker Compose
- VS Code with DevContainers extension (recommended)
- MLflow tracking server (for remote tracking)
- Ray (for distributed training/serving)
- Clone the repository
  git clone https://github.com/opencloudhub/ai-ml-sklearn.git
  cd ai-ml-sklearn
- Open in DevContainer (recommended)
  code .  # VS Code will prompt to reopen in container
- Or set up locally
  # Install UV
  curl -LsSf https://astral.sh/uv/install.sh | sh
  # Install dependencies
  uv sync --dev
Start MLflow locally (accessible from Docker containers):
mlflow server --host 0.0.0.0 --port 8081
export MLFLOW_TRACKING_URI=http://0.0.0.0:8081
export MLFLOW_EXPERIMENT_NAME=wine-quality
export MLFLOW_TRACKING_INSECURE_TLS=true
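The training scripts pick these settings up from the environment; a minimal sketch of that lookup, with assumed fallback defaults matching the exports above:

```python
import os

# Resolve tracking settings from the environment (defaults are assumptions;
# MLflow itself also honors MLFLOW_TRACKING_URI when it is set).
tracking_uri = os.environ.get("MLFLOW_TRACKING_URI", "http://0.0.0.0:8081")
experiment_name = os.environ.get("MLFLOW_EXPERIMENT_NAME", "wine-quality")
# mlflow.set_tracking_uri(tracking_uri)
# mlflow.set_experiment(experiment_name)
```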
ray start --head
Submit Ray jobs for training and hyperparameter optimization:
RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python src/training/train.py
RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python src/training/optimize_hyperparameters.py
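The optimization job wraps model training in an Optuna-style objective. The search space below is an assumption about what optimize_hyperparameters.py tunes; the `trial` argument only needs Optuna's `suggest_*` methods:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def objective(trial) -> float:
    """Optuna-style objective: sample hyperparameters, return CV accuracy.

    The parameter ranges here are illustrative assumptions.
    """
    c = trial.suggest_float("C", 1e-3, 1e2, log=True)
    solver = trial.suggest_categorical("solver", ["lbfgs", "liblinear"])
    X, y = load_wine(return_X_y=True)
    model = LogisticRegression(C=c, solver=solver, max_iter=1000)
    return cross_val_score(model, X, y, cv=3).mean()
```

In the actual pipeline, each trial's score would additionally be reported to MLflow via the callback in src/_utils/logging_callback.py.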
Make sure you have promoted a model to prod.wine-classifier with the @champion alias, since the service looks it up by that alias. To run the model serving application locally:
serve run --working-dir /workspace/project src.serving.wine_classifier:deployment
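The deployment resolves the champion by registry alias using MLflow's `models:/<name>@<alias>` URI scheme. A minimal sketch of the URI it would load (names taken from the note above; the actual load requires a reachable MLflow server):

```python
# Build the registry URI the serving code would resolve; loading is shown
# commented out because it needs a running MLflow tracking server.
MODEL_NAME = "prod.wine-classifier"
ALIAS = "champion"

model_uri = f"models:/{MODEL_NAME}@{ALIAS}"
print(model_uri)  # models:/prod.wine-classifier@champion
# import mlflow.pyfunc
# model = mlflow.pyfunc.load_model(model_uri)
```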
python src/training/train.py --C 1.0 --max_iter 100 --solver lbfgs
python src/training/optimize_hyperparameters.py --n_trials 50 --test_size 0.2
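The train.py flags map directly onto scikit-learn's LogisticRegression. A minimal standalone sketch of the training step with those defaults (the split seed is an assumption, and the real script also logs to MLflow):

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Mirror the CLI defaults shown above: --C 1.0 --max_iter 100 --solver lbfgs,
# --test_size 0.2. random_state=42 is an assumed seed for reproducibility.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(C=1.0, max_iter=100, solver="lbfgs")
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```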
serve run --working-dir /workspace/project src.serving.wine_classifier:deployment
To test the model, run:
python tests/test_wine_classifier.py
You can also visit the Swagger documentation for the application at http://localhost:8000/docs
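A request body can be assembled from the dataset's 13 feature values. The `features` key below is an assumption about the request schema, not the repository's contract; check the Swagger UI for the actual route and payload shape:

```python
import json

# Hypothetical payload for the serving endpoint: the 13 UCI Wine features
# for one sample. Key name and route are assumptions.
sample = {"features": [13.2, 1.78, 2.14, 11.2, 100.0, 2.65, 2.76,
                       0.26, 1.28, 4.38, 1.05, 3.40, 1050.0]}
body = json.dumps(sample)
# curl -X POST http://localhost:8000/<route> \
#   -H 'Content-Type: application/json' -d "$body"
```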
ai-ml-sklearn/
├── src/
│   ├── training/                # Training and optimization scripts
│   │   ├── train.py
│   │   ├── optimize_hyperparameters.py
│   │   └── evaluate.py
│   ├── serving/                 # Model serving (Ray Serve/FastAPI)
│   │   └── wine_classifier.py
│   └── _utils/                  # Shared utilities
│       ├── get_or_create_experiment.py
│       ├── logging_callback.py
│       └── logging_config.py
├── tests/                       # Unit tests
├── .devcontainer/               # VS Code DevContainer config
├── .github/workflows/           # CI/CD workflows
├── Dockerfile                   # Multi-stage container build
├── MLproject                    # MLflow project definition
├── pyproject.toml               # Project dependencies and config
└── uv.lock                      # Dependency lock file
- Development & Experimentation
  - Local development in DevContainers
  - Jupyter notebooks for data exploration
  - MLflow experiment tracking
- Training & Optimization
  - Distributed training and hyperparameter tuning with Ray and Optuna
  - Model evaluation and metrics logging
  - Threshold-based model registration
- Model Registry
  - Automatic promotion to staging registry
  - Model versioning and lineage tracking
  - Performance comparison and rollback capability
- Deployment
  - Ray Serve for scalable, production-ready model serving
  - (Planned) KServe integration and GitOps-based deployment automation
Contributions are welcome! This project follows OpenCloudHub's contribution standards.
Please see our Contributing Guidelines and Code of Conduct for more details.
Distributed under the Apache 2.0 License. See LICENSE for more information.
Organization Link: https://github.com/OpenCloudHub
Project Link: https://github.com/opencloudhub/ai-ml-sklearn
- UCI Wine Dataset - The dataset used for classification
- MLflow - ML lifecycle management
- Optuna - Hyperparameter optimization framework
- Ray - Distributed computing and serving
- UV - Fast Python package manager