This project sets up a robust local machine learning experimentation platform using Docker Compose, featuring:
- MLflow for experiment tracking and artifact versioning
- Prefect for orchestrating statistical tests and training pipelines (see the flow sketch after this list)
- PostgreSQL as the metadata and data warehouse backend
- MinIO as the S3-compatible artifact store
- pgAdmin for easy database management
- Auto-initialized SQL schemas for a structured DWH
- Custom client container to run training scripts inside Docker
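The repo does not ship a flow definition, but a minimal Prefect flow wrapping the training script might look like the sketch below. This is hypothetical, not part of the repo tree, and assumes Prefect 2.x is installed in the client image:

```python
# flows/train_flow.py — hypothetical file, not part of the repo tree below
import subprocess

from prefect import flow, task

@task(retries=2)
def run_training() -> None:
    # Invoke the training script shipped in mlscripts/
    subprocess.run(["python", "mlscripts/train.py"], check=True)

@flow(name="training-pipeline")
def training_pipeline() -> None:
    run_training()

if __name__ == "__main__":
    training_pipeline()
```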
Project structure:

```
ML-POSTGRES/
│
├── 📂 data/                  # SQL files to initialize the data warehouse
│   ├── schemas.sql
│   ├── stored-procedures.sql
│   ├── tables.sql
│   └── titanic.sql
│
├── 📂 mlflow/                # MLflow tracking server container
│   ├── Dockerfile.mlflow     # Custom Dockerfile for the MLflow server
│   └── Dockerfile.client     # Custom Dockerfile for the MLflow client
│
├── 📂 mlscripts/             # Model training scripts to mount/run in clients
│   └── train.py
│
├── 📂 pgadmin4/              # pgAdmin pre-config files
│   └── servers.json          # Pre-defined server list
│
├── 📄 .env                   # Environment variables for Docker Compose
├── 📄 docker-compose.yml     # Main Docker orchestration file
└── 📄 Makefile               # (Optional) Automation commands
```
MLflow tracking server:
- Backend store URI: PostgreSQL (`mlflow-db`)
- Artifact store: MinIO (`s3://mlflow-artifacts`)
- Web UI: http://localhost:5000
PostgreSQL databases:
- `dwh`: general-purpose data warehouse, initialized from `./data/*.sql`
- `mlflow-db`: MLflow's backend store for run metadata
- `prefect-db`: Prefect's backend store for metadata (Prefect defaults to SQLite; here it uses PostgreSQL)
MinIO:
- S3-compatible object store for model artifacts
- Web UI: http://localhost:9001
- The `mlflow-artifacts` bucket is auto-created and public
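To sanity-check the artifact store from the host, a short boto3 script like the one below should list the bucket. This is a sketch, not part of the repo: it assumes boto3 is installed, the MinIO API port 9000 is published to the host, and the `admin`/`password` credentials from the table at the end of this README.

```python
# check_minio.py — hypothetical helper script (not in the repo tree)
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # MinIO API port; 9001 is the web console
    aws_access_key_id="admin",
    aws_secret_access_key="password",
)

# The auto-created mlflow-artifacts bucket should appear here
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```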
pgAdmin:
- Web UI: http://localhost:5050
- Pre-configured access to both `dwh` and `mlflow-db`
The PostgreSQL warehouse follows a multi-layered architecture, inspired by modern lakehouse design:
- Bronze: Raw ingested data (e.g., CSV dumps, source system captures)
- Silver: Cleansed, normalized, and joined datasets
- Gold: High-value, curated datasets used in downstream tasks
| Schema | Purpose | 
|---|---|
| GOLD_ML | ML-ready feature tables used in training/inference pipelines | 
| GOLD_BI | Business-optimized aggregates used in dashboards and BI tools | 
This platform primarily uses `GOLD_ML` for experimentation workflows in MLflow.
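For example, a training script could pull features straight from `GOLD_ML` into pandas. The sketch below is illustrative: the table name `titanic_features`, the `dwh` database on `localhost:5432`, and the `admin`/`admin` credentials are assumptions based on the setup described here, and it requires `sqlalchemy` plus a PostgreSQL driver such as `psycopg2`.

```python
# Hypothetical: load an ML-ready feature table from the GOLD_ML schema
import pandas as pd
from sqlalchemy import create_engine

# Assumes the warehouse is published on localhost:5432 as database "dwh"
engine = create_engine("postgresql://admin:admin@localhost:5432/dwh")

# "titanic_features" is an illustrative table name, not part of the repo
features = pd.read_sql('SELECT * FROM "GOLD_ML".titanic_features', engine)
print(features.head())
```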
Create a `.env` file with the following:

```env
POSTGRES_USER=admin
POSTGRES_PASSWORD=admin
POSTGRES_DB=mldb
PGADMIN_DEFAULT_EMAIL=admin@example.com
PGADMIN_DEFAULT_PASSWORD=admin
MLFLOW_TRACKING_URI=http://mlflow:5000
MLFLOW_BACKEND_STORE_URI=postgresql://mlflow:mlflow123@mlflow-db:5432/mlflowdb
MLFLOW_ARTIFACT_ROOT=s3://mlflow-artifacts
MLFLOW_S3_ENDPOINT_URL=http://minio:9000
AWS_ACCESS_KEY_ID=admin
AWS_SECRET_ACCESS_KEY=password
```

Then build and start the stack:

```bash
docker-compose up --build -d
```

Once the containers are up:
- MLflow UI → http://localhost:5000
- pgAdmin → http://localhost:5050
- MinIO → http://localhost:9001
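A quick way to confirm the UIs are reachable is to probe them from the host. The sketch below assumes the port mappings above and the `requests` package; the MLflow tracking server exposes a `/health` endpoint, while the other two are checked via their login pages.

```python
# Readiness probe for the three web UIs (run on the host)
import requests

endpoints = {
    "MLflow": "http://localhost:5000/health",
    "pgAdmin": "http://localhost:5050",
    "MinIO console": "http://localhost:9001",
}
for name, url in endpoints.items():
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")
```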
Sample training script:

```python
# mlscripts/train.py
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Point the client at the tracking server on the Compose network
mlflow.set_tracking_uri("http://mlflow:5000")
mlflow.set_experiment(experiment_id="0")

# Autologging captures params, metrics, and the model artifact
mlflow.autolog()

data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)

model = RandomForestRegressor(n_estimators=100, max_depth=6)
model.fit(X_train, y_train)
```

Then build and run the client:
```bash
docker build -t mlflow-client -f Dockerfile.client .
docker run --network=ml-postgres_backend mlflow-client
```

To tear the stack down (including volumes):

```bash
docker-compose down -v
```

Notes:
- If you update the schema SQL files, restart PostgreSQL or mount fresh volumes so they are re-applied.
- Logs are available via `docker logs <container-name>`.
- All containers share a single `backend` network for internal name resolution.
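To confirm that client runs actually landed in the tracking server, you can query it from the host. A minimal check, assuming port 5000 is published:

```python
# List runs logged to the Default experiment (id "0")
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

runs = mlflow.search_runs(experiment_ids=["0"])
print(runs[["run_id", "status", "start_time"]])
```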
Default credentials:

| Service | Username | Password |
|---|---|---|
| MinIO | admin | password | 
| pgAdmin | admin@example.com | admin | 
| PostgreSQL | admin | admin | 
| MLflow Backend | mlflow | mlflow123 |