Convert traditional ML notebooks into production-ready ZenML pipelines! This workshop teaches you how to transform messy, unstructured ML code into clean, reproducible, and scalable MLOps workflows.
By the end of this workshop, you will:
- Understand the problems with traditional notebook-based ML workflows
- Learn ZenML fundamentals including steps, pipelines, and artifacts
- Convert messy ML code into clean, structured ZenML pipelines
- Implement both training and inference pipelines with proper separation of concerns
- Experience the benefits of MLOps best practices including versioning, tracking, and reproducibility
```
workshop-scaffold/
├── data/
│   ├── customer_churn.csv            # Sample dataset (generated)
│   └── generate_sample_data.py       # Script to create dataset
├── workshop_notebook.ipynb           # Traditional ML workflow (BEFORE)
├── training_pipeline_scaffold.py     # Training pipeline template (TODO)
├── inference_pipeline_scaffold.py    # Inference pipeline template (TODO)
├── requirements.txt                  # All necessary dependencies
└── README.md                         # This file
```
```bash
# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install uv
uv pip install -r requirements.txt

# Initialize ZenML
zenml init
zenml login
zenml integration install gcp github -y --uv
zenml stack set zenml-workshop-local-stack
```
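If `data/customer_churn.csv` is not there yet, generate it first (assumed invocation, based on the project layout above):

```bash
# Generate the sample dataset before running the notebook
python data/generate_sample_data.py
```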
Open and run `workshop_notebook.ipynb` to see a typical data scientist's workflow:

```bash
# Start Jupyter
jupyter notebook workshop_notebook.ipynb
```
🔴 Notice the Problems:
- Hardcoded file paths
- Mixed concerns in single cells
- Poor model versioning (`model_final_v2_actually_final_BEST.pkl`)
- Manual preprocessing steps
- No experiment tracking
- Difficult to reproduce
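To make these concrete, here is a condensed, hypothetical version of the kind of cell you'll find (an illustration of the anti-pattern, not the literal notebook code; the `churned` column name is assumed):

```python
# Everything-in-one-cell anti-pattern (illustrative, not the actual notebook)
import pickle
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/customer_churn.csv")         # hardcoded path
df = df.dropna()                                    # manual, undocumented preprocessing
X, y = df.drop(columns=["churned"]), df["churned"]  # "churned" is an assumed column
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier().fit(X_train, y_train)  # training...
print("accuracy:", model.score(X_test, y_test))         # ...and evaluation, same cell
with open("model_final_v2_actually_final_BEST.pkl", "wb") as f:
    pickle.dump(model, f)                               # "versioning"
```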
Run through `workshop_notebook.ipynb` and identify:
- What could go wrong in production?
- How hard would it be to collaborate on this code?
- What happens when you need to retrain the model?
Work on `training_pipeline_scaffold.py`:
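If you want a reference point while filling in the TODOs, a minimal ZenML training pipeline looks roughly like this (a sketch; the step split, the `churned` column, and the file path are assumptions, not necessarily what the scaffold expects):

```python
import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier
from zenml import pipeline, step

@step
def load_data() -> pd.DataFrame:
    # Single responsibility: the output is stored and versioned as an artifact.
    return pd.read_csv("data/customer_churn.csv")

@step
def train_model(df: pd.DataFrame) -> ClassifierMixin:
    X, y = df.drop(columns=["churned"]), df["churned"]  # assumed target column
    return RandomForestClassifier().fit(X, y)

@pipeline
def training_pipeline():
    # Passing outputs between steps defines the DAG.
    train_model(load_data())

if __name__ == "__main__":
    training_pipeline()
```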
Now that your pipeline has run locally, let's run it on a remote orchestrator. To do this, you need to create a new stack:

- In the dashboard, go for the manual stack creation flow.
- Pick the orchestrator, artifact store, image builder, and container registry.
- Finally, go to your terminal and set this new stack:

```bash
zenml stack set <STACK NAME>
```
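If you prefer the CLI over the dashboard, a stack can also be registered from the terminal once its components exist; the names below are placeholders for components you have already registered:

```bash
# Assemble a stack from previously registered components (placeholder names)
zenml stack register my-remote-stack \
    -o my-orchestrator \
    -a my-artifact-store \
    -i my-image-builder \
    -c my-container-registry
```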
Now run your training pipeline again, and see what happens.
Congratulations, you have run your pipeline in the remote environment. You can now create what is called a Run Template in ZenML; you can read more about them in the ZenML documentation.

To create a run template, head over to the dashboard and find the training pipeline that you just ran on your production stack. Then click the "New Template" button at the top, give it a name, and create it.

Congrats, you now have a Run Template. You can tie it into larger workflows by calling the ZenML API (see the sketch after the walkthrough below). You can also go through the dashboard, change the configuration as needed, and run the template.
First navigate to the Run Templates section in your project ...
... open your template ...
... and click on Run Template
- You can now adjust the step parameters
- ... and run it
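As a sketch of the API route: recent ZenML versions expose a `trigger_pipeline` call on the client that runs a pipeline from its latest runnable template. Check the docs for the version you have installed, and swap in your own pipeline name:

```python
from zenml.client import Client

# Trigger a run from the latest runnable template of this pipeline.
# "training_pipeline" is an assumed name -- use the one from your dashboard.
Client().trigger_pipeline("training_pipeline")
```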
Work on `inference_pipeline_scaffold.py`:
As you can see, the inference pipeline currently always picks the latest trained model. In a production setting we do not want this: we want to stay in control of which model is used.

To promote your model, head over to the dashboard, find your model version of choice, and promote it to Production. Then change the model version used by the inference pipeline.

And voilà: the pipeline will now use only the chosen model version for inference, until you choose to promote another one.
Run both approaches and discuss:
- What's better about the ZenML version?
- How does artifact tracking work?
- What would deployment look like?
| Problem | Example | Impact |
|---|---|---|
| Hardcoded Paths | `pd.read_csv('data/file.csv')` | Breaks when files move |
| Mixed Concerns | Training + evaluation in one cell | Hard to debug/modify |
| Poor Versioning | `model_final_v2_BEST.pkl` | Can't track what changed |
| Manual Steps | Copy-paste preprocessing | Inconsistent between train/inference |
| No Tracking | Print statements for metrics | Can't compare experiments |
| ZenML Feature | Benefit |
|---|---|
| Steps | Single responsibility |
| Pipelines | Clear workflow DAG |
| Artifacts | Automatic versioning |
| Type Hints | Better lineage tracking |
| Caching | Skip unchanged steps |
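To make the last two rows concrete, here's a small sketch of a typed step: ZenML derives artifact names and lineage from the annotations, and with caching enabled an unchanged step with unchanged inputs is skipped on re-runs (the names here are illustrative):

```python
from typing import Tuple

import pandas as pd
from typing_extensions import Annotated
from zenml import step

@step(enable_cache=True)  # skipped on re-runs if code and inputs are unchanged
def split_data(
    df: pd.DataFrame,
) -> Tuple[
    Annotated[pd.DataFrame, "train_set"],  # annotation names become artifact names
    Annotated[pd.DataFrame, "test_set"],
]:
    train = df.sample(frac=0.8, random_state=42)
    test = df.drop(train.index)
    return train, test
```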
After completing the workshop:

**Training pipeline:**
- Clean, modular steps
- Automatic artifact storage
- Experiment tracking
- Reproducible runs

**Inference pipeline:**
- Consistent preprocessing
- Model loading from registry
- Batch prediction capability
- Timestamped outputs

**Overall:**
- Version control friendly code
- Easy collaboration
- Production deployment ready
- Monitoring capabilities
If you get stuck, check the solutions:

- `training_pipeline_complete.py` - Fully implemented training pipeline
- `inference_pipeline_complete.py` - Fully implemented inference pipeline
```bash
# Run the complete training pipeline
python training_pipeline_complete.py

# Run the complete inference pipeline
python inference_pipeline_complete.py

# View your pipeline runs
zenml pipeline runs list

# Explore artifacts
zenml artifact list
```
```bash
# Initialize ZenML
zenml init

# Login to ZenML
zenml login

# Set ZenML Stack
zenml stack set zenml-workshop-stack

# View pipelines
zenml pipeline list

# View pipeline runs
zenml pipeline runs list

# View artifacts
zenml artifact list

# View models
zenml model list

# Start ZenML dashboard
zenml up
```
- Add More Steps: Data validation, feature engineering, model comparison
- Integrate MLflow: For enhanced experiment tracking
- Add Deployment: Using ZenML's deployment capabilities
- Set Up Monitoring: Track model performance over time
- Cloud Integration: Deploy to AWS, GCP, or Azure
- Team Collaboration: Share pipelines and artifacts
**Data Validation**

```python
import pandas as pd
from zenml import step

@step
def validate_data(df: pd.DataFrame) -> pd.DataFrame:
    # Check schema, data quality, distributions before training
    return df
```
**Model Monitoring**

```python
import pandas as pd
from zenml import step

@step
def monitor_predictions(predictions: pd.DataFrame) -> dict:
    # Track prediction distributions, detect drift
    monitoring_metrics: dict = {}  # e.g. distribution stats, drift scores
    return monitoring_metrics
```
**A/B Testing**

```python
from sklearn.base import ClassifierMixin
from zenml import step

@step
def compare_models(model_a: ClassifierMixin, model_b: ClassifierMixin) -> ClassifierMixin:
    # Statistical comparison, champion/challenger
    best_model = model_a  # placeholder: pick the winner via a real metric comparison
    return best_model
```
Common Issues:
- Import Errors: Make sure you've installed all requirements
- File Not Found: Run the data generation script first
- ZenML Not Initialized: Run `zenml init`
- Permission Errors: Check file permissions in the working directory
Getting Help:
- ZenML Documentation: https://docs.zenml.io/
- ZenML Slack: https://zenml.io/slack-invite
- GitHub Issues: https://github.com/zenml-io/zenml/issues
Congratulations! You've learned how to:
✅ Identify problems in traditional ML workflows
✅ Structure ML code using ZenML steps and pipelines
✅ Implement artifact versioning and experiment tracking
✅ Create production-ready training and inference pipelines
✅ Experience the benefits of MLOps best practices
Keep learning: Try applying these concepts to your own ML projects!