Skip to content

ml-cube/ml3-drift

Repository files navigation

ml3-drift

Easy-to-embed drift detection

Build GitHub PyPi PyPi Downloads

ml3-drift is an open source AI library that provides seamless integration of drift detection algorithms into existing Machine Learning and AI frameworks. The purpose is to simplify the implementation process and enable developers to easily incorporate drift detection into their pipelines.

✅ Supported Frameworks

These are the frameworks we currently support. We will add much more in the future! Let us know if you are interested in a specific framework!

Framework How Example
scikit-learn scikit-learn Provides a scikit-learn compatible drift detector that integrates easily into existing scikit-learn pipelines. Mixed data monitoring
huggingface transformers (by huggingface) A minimal wrapper for the Pipeline object that looks like a Pipeline, behaves like a Pipeline but also monitors the output of the wrapped Pipeline.. Works with any feature extraction pipeline, both images and text. Text data monitoring

🛠️ Usage

ml3-drift components are designed to be easily integrated into your existing code. You should be able to use them with minimal changes to your code.

Here is a simple example with scikit-learn:

import logging

import numpy as np
from ml3_drift.sklearn.univariate.ks import KSDriftDetector
from sklearn.tree import DecisionTreeRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from ml3_drift.callbacks.base import logger_callback
from functools import partial

logger = logging.getLogger(__name__)

# Define your pipeline as usual, but also add a drift detector.
# The detector accepts a list of functions to be called when a drift is detected.
# The first argument of the function is a dataclass containing some information
# about the drift (check it out in ml3_drift/callbacks/models.py).
drift_detector = KSDriftDetector(
    callbacks=[
        partial(
            logger_callback,
            logger=logger,
            level=logging.CRITICAL,
        )
    ]
)

pipeline = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        ("monitoring", drift_detector),
        (
            "model",
            DecisionTreeRegressor(),
        ),
    ]
)

# When fitting the pipeline, the drift detector will
# save the training data as reference data.
# No effect on the model training.
pipeline = pipeline.fit(X_train, y_train)

# When making predictions, the drift detector will
# check if the incoming data is similar to the reference data
# and execute the callback you specified if a drift is detected.
predictions = pipeline.predict(X_test)

The example callback we provided will simply log a message when a drift is detected. For instance:

Drift detected on feature at index 0 by drift detector KSDriftDetector.
 p-value = 2.2027963703339932e-07
 Threshold = 0.005

You can find other examples in the examples folder. For more information, please refer to the documentation.

📦 Installation

ml3-drift is available on PyPI and supports Python versions from 3.10 to 3.13, included.

The integration with the different frameworks are managed through extra dependencies. The plain ml3-drift package comes without any dependency, which means that you need to specify the framework you want to use when installing the package. Otherwise, if you are just experimenting, you can install the package with all the available extras.

You can use pip:

pip install ml3-drift[all] # install all the dependencies
pip install ml3-drift[sklearn] # install only sklearn dependency
pip install ml3-drift[huggingface] # install huggingface dependency

or uv

uv add ml3-drift --all-extras # install all the dependencies
uv add ml3-drift --extra sklearn # install only sklearn dependency
uv add ml3-drift --extra huggingface # install only huggingface dependency

❓ What is drift detection? Why do we need it?

Machine Learning algorithms rely on the assumption that the data used during training comes from the same distribution as the data seen in production.

However, this assumption rarely holds true in the real world, where conditions are dynamic and constantly evolving. These distributional changes, if not addressed properly, can lead to a decline in model performance. This, in turn, can result in inaccurate predictions or estimations, potentially harming the business.

Drift Detection, often referred to as Monitoring, is the process of continuously tracking the performance of a model and the distribution of the data it is operating on. The objective is to quickly detect any changes in data distribution or behavior, so that corrective actions can be taken in a timely manner.

😅 Yet another drift detection library?

Not really. While there are many great open source drift detection libraries out there (nannyml, river, evidently just to name a few), we observed a lack of standardization in the API and misalignments with common ML interfaces. Our goal is to offer known drift detection algorithms behind a single unified API, tailored for relevant ML and AI frameworks such as scikit-learn and huggingface. Hopefully, this won't be the 15th competing standard 😉.

🚀 Contributing

We welcome contributions to ml3-drift! Since we are at a very early stage, we are looking forward to feedbacks, ideas and bug reports. Feel free to open an issue if you have any questions or suggestions.

Local Development

These are the steps you need to follow to set up your local development environment.

We use uv as package manager and just as command runner. Once you have both installed, you can clone the repository and run the following command to set up your development environment:

just dev-sync

The previous command will install all optional dependencies. If you want to install only one of them, run:

just dev-sync-extra extra-to-install
# for instance, just dev-sync-extra sklearn

Make sure you install the pre-commit hooks by running:

just install-hooks

To format your code, lint it and run tests, you can use the following command:

just validate

Notice that tests are run according to the installed libraries. If you don't have scikit-learn installed, all tests related to it will be skipped.

📜 License

This project is licensed under the terms of the Apache License Version 2.0. For more details, please refer to the LICENSE file. All contributions to this project will be distributed under the same license.

👥 Authors

This project was originally developed at ML cube and has been open-sourced to benefit the ML community, from which we deeply welcome contributions.

While ml3-drift provides easy to use and integrated drift detection algorithms, companies requiring enterprise-grade monitoring, advanced analytics and insights capabilities might be interested in trying out our product, the ML cube Platform.

The ML cube Platform (website, docs) is a comprehensive end-to-end ModelOps framework that helps you trust your AI models and GenAI applications by providing several functionalities, such as data and model monitoring, drift root cause analysis, performance-safe model retraining and LLM security. It can both be used during the development phase of your models and in production, to ensure that your models are performing as expected and quickly detect and understand any issues that may arise.

If you'd like to learn more about our product or wonder how we can help you with your AI projects, visit our websites or contact us at info@mlcube.com.