zenml-io/zenml-workshop-mlops

ZenML Pipeline Conversion Workshop

Convert traditional ML notebooks into production-ready ZenML pipelines! This workshop teaches you how to transform messy, unstructured ML code into clean, reproducible, and scalable MLOps workflows.

🎯 Workshop Objectives

By the end of this workshop, you will:

  • Understand the problems with traditional notebook-based ML workflows
  • Learn ZenML fundamentals including steps, pipelines, and artifacts
  • Convert messy ML code into clean, structured ZenML pipelines
  • Implement both training and inference pipelines with proper separation of concerns
  • Experience the benefits of MLOps best practices including versioning, tracking, and reproducibility

๐Ÿ“ Workshop Structure

workshop-scaffold/
├── 📊 data/
│   ├── customer_churn.csv          # Sample dataset (generated)
│   └── generate_sample_data.py     # Script to create dataset
├── 📓 workshop_notebook.ipynb      # Traditional ML workflow (BEFORE)
├── 🔧 training_pipeline_scaffold.py    # Training pipeline template (TODO)
├── 🔮 inference_pipeline_scaffold.py   # Inference pipeline template (TODO)
├── 📦 requirements.txt             # All necessary dependencies
└── 📖 README.md                    # This file

🚀 Getting Started

1. Environment Setup

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install uv
uv pip install -r requirements.txt

# Initialize ZenML
zenml init
zenml login
zenml integration install gcp github -y --uv
zenml stack set zenml-workshop-local-stack

2. Explore the Traditional Workflow

Open and run workshop_notebook.ipynb to see a typical data scientist's workflow:

# Start Jupyter
jupyter notebook workshop_notebook.ipynb

🔴 Notice the Problems:

  • Hardcoded file paths
  • Mixed concerns in single cells
  • Poor model versioning (model_final_v2_actually_final_BEST.pkl)
  • Manual preprocessing steps
  • No experiment tracking
  • Difficult to reproduce

📚 Workshop Activities

Activity 1: Analyze the Traditional Workflow (10 minutes)

Run through workshop_notebook.ipynb and identify:

  • What could go wrong in production?
  • How hard would it be to collaborate on this code?
  • What happens when you need to retrain the model?

Activity 2: Convert to ZenML Training Pipeline (30 minutes)

Work on training_pipeline_scaffold.py:

Activity 3: Create your production stack (10 minutes)

Now that your pipeline has run locally, let's try running it on a remote orchestrator. To do this, you need to create a new stack:

Go for the manual stack creation option.

Pick the orchestrator, artifact store, image builder, and container registry.

Finally go into your Terminal and set this new stack with:

zenml stack set <STACK NAME>

Now run your training pipeline again, and see what happens.

Activity 4: Create a Run Template

Congratulations, you have run your pipeline in the remote environment. You can now create what is called a Run Template in ZenML; you can read more about them in the ZenML documentation.

To create a run template, head over to the dashboard and find the training pipeline that you just ran on your production stack.

Now just click the "New Template" button at the top, give it a name, and create it.

Congrats, you now have a Run Template. You can tie this into larger workflows by calling the ZenML API, or go through the dashboard, change the configuration as needed, and run the template.

First navigate to the Run Templates section in your project, open your template, and click on Run Template.

  1. You can now adjust the step parameters
  2. ... and run it

Activity 5: Convert to ZenML Inference Pipeline (25 minutes)

Work on inference_pipeline_scaffold.py:

Activity 6: Promote model and make Inference Pipeline dependent on it

As you can see, the inference pipeline currently always picks the latest trained model. In a production setting we do not want this: we want to remain in control of which model is used.

In order to promote your model, head over to the dashboard, find your model version of choice, and promote it to Production.

Now go ahead and change the model version for the inference pipeline.

And voilà, you will now only use the chosen model version for inference, until you choose to promote another one.

Activity 7: Compare and Reflect (10 minutes)

Run both approaches and discuss:

  • What's better about the ZenML version?
  • How does artifact tracking work?
  • What would deployment look like?

๐Ÿ† Key Learning Points

Traditional ML Workflow Problems

| Problem | Example | Impact |
|---|---|---|
| Hardcoded Paths | `pd.read_csv('data/file.csv')` | Breaks when files move |
| Mixed Concerns | Training + evaluation in one cell | Hard to debug/modify |
| Poor Versioning | `model_final_v2_BEST.pkl` | Can't track what changed |
| Manual Steps | Copy-paste preprocessing | Inconsistent between train/inference |
| No Tracking | Print statements for metrics | Can't compare experiments |

ZenML Solutions

| ZenML Feature | Benefit |
|---|---|
| Steps | Single responsibility |
| Pipelines | Clear workflow DAG |
| Artifacts | Automatic versioning |
| Type Hints | Better lineage tracking |
| Caching | Skip unchanged steps |

๐Ÿ” Expected Outputs

After completing the workshop:

✅ Working Training Pipeline

  • Clean, modular steps
  • Automatic artifact storage
  • Experiment tracking
  • Reproducible runs

✅ Working Inference Pipeline

  • Consistent preprocessing
  • Model loading from registry
  • Batch prediction capability
  • Timestamped outputs

✅ Better ML Practices

  • Version control friendly code
  • Easy collaboration
  • Production deployment ready
  • Monitoring capabilities

🎓 Solutions

If you get stuck, check the solutions:

  • training_pipeline_complete.py - Fully implemented training pipeline
  • inference_pipeline_complete.py - Fully implemented inference pipeline

🚀 Running the Solutions

# Run the complete training pipeline
python training_pipeline_complete.py

# Run the complete inference pipeline  
python inference_pipeline_complete.py

# View your pipeline runs
zenml pipeline runs list

# Explore artifacts
zenml artifact list

🔧 ZenML Commands Reference

# Initialize ZenML
zenml init

# Login to ZenML
zenml login

# Set ZenML Stack
zenml stack set zenml-workshop-stack

# View pipelines
zenml pipeline list

# View pipeline runs
zenml pipeline runs list

# View artifacts
zenml artifact list

# View models
zenml model list

# Start ZenML dashboard
zenml up

📈 Next Steps After Workshop

  1. Add More Steps: Data validation, feature engineering, model comparison
  2. Integrate MLflow: For enhanced experiment tracking
  3. Add Deployment: Using ZenML's deployment capabilities
  4. Set Up Monitoring: Track model performance over time
  5. Cloud Integration: Deploy to AWS, GCP, or Azure
  6. Team Collaboration: Share pipelines and artifacts

💡 Production Considerations

For Real-World Usage:

Data Validation

import pandas as pd
from zenml import step

@step
def validate_data(df: pd.DataFrame) -> pd.DataFrame:
    # Check schema, data quality, and distributions before training
    assert not df.empty, "Input data is empty"
    return df

Model Monitoring

import pandas as pd
from zenml import step

@step
def monitor_predictions(predictions: pd.DataFrame) -> dict:
    # Track prediction distributions, detect drift
    monitoring_metrics = {"n_predictions": len(predictions)}
    return monitoring_metrics

A/B Testing

from sklearn.base import BaseEstimator
from zenml import step

@step
def compare_models(model_a: BaseEstimator, model_b: BaseEstimator) -> BaseEstimator:
    # Statistical comparison, champion/challenger selection
    best_model = model_a  # placeholder: promote the better performer
    return best_model

🆘 Troubleshooting

Common Issues:

  1. Import Errors: Make sure you've installed all requirements
  2. File Not Found: Run the data generation script first
  3. ZenML Not Initialized: Run zenml init
  4. Permission Errors: Check file permissions in working directory

Getting Help:

🎉 Workshop Completion

Congratulations! You've learned how to:

✅ Identify problems in traditional ML workflows
✅ Structure ML code using ZenML steps and pipelines
✅ Implement artifact versioning and experiment tracking
✅ Create production-ready training and inference pipelines
✅ Experience the benefits of MLOps best practices

Keep learning: Try applying these concepts to your own ML projects!

About

Repo as starting off point for in-person workshops.
