Merged
76 changes: 19 additions & 57 deletions README.md
@@ -21,36 +21,33 @@

<p align="center">Frouros is a Python library for drift detection in Machine Learning problems.</p>

Frouros provides a combination of classical and more recent algorithms for drift detection, covering both the supervised and unsupervised settings, as well as some semi-supervised algorithms. The library pursues two main objectives: 1. to integrate easily into a machine learning model development pipeline built with the [scikit-learn](https://github.com/scikit-learn/scikit-learn) library; 2. to unify in a single library concept drift detection and
adaptation (traditionally researched and applied to evolving data streams and incremental learning) with change detection in covariate distributions (also known as data shift, related to statistical two-sample testing and to methods that measure the distance between distributions).
Frouros provides a combination of classical and more recent algorithms for detecting both concept and data drift.

## Quickstart

As a quick and easy example, we can draw samples from two bivariate normal distributions and use an unsupervised method like MMD (Maximum Mean Discrepancy). This method checks whether the generated samples come from the same distribution; if they come from different distributions, there is data drift.
As a quick and easy example, we can draw samples from two normal distributions and use a data drift detector such as the Kolmogorov-Smirnov test. This test checks whether two samples come from the same distribution; if they come from different distributions, there is data drift.

```python
from sklearn.gaussian_process.kernels import RBF
import numpy as np
from frouros.unsupervised.distance_based import MMD
from frouros.data_drift.batch import KSTest

np.random.seed(31)
# X samples from a normal distribution with mean = [1. 1.] and cov = [[2. 0.][0. 2.]]
x_mean = np.ones(2)
x_cov = 2*np.eye(2)
# Y samples from a normal distribution with mean = [0. 0.] and cov = [[2. 1.][1. 2.]]
y_mean = np.zeros(2)
y_cov = np.eye(2) + 1
# X samples from a normal distribution with mean=2 and std=2
x_mean = 2
x_std = 2
# Y samples from a normal distribution with mean=1 and std=2
y_mean = 1
y_std = 2

num_samples = 200
X_ref = np.random.multivariate_normal(x_mean, x_cov, num_samples)
X_test = np.random.multivariate_normal(y_mean, y_cov, num_samples)
num_samples = 10000
X_ref = np.random.normal(x_mean, x_std, num_samples)
X_test = np.random.normal(y_mean, y_std, num_samples)

alpha = 0.01 # significance level for the hypothesis test

detector = MMD(num_permutations=1000, kernel=RBF(length_scale=1.0), random_state=31)
detector = KSTest()
detector.fit(X=X_ref)
detector.transform(X=X_test)
mmd, p_value = detector.distance
statistic, p_value = detector.compare(X=X_test)

p_value < alpha
# Output: True. Drift detected, so we reject H0: the two samples come from different distributions.
@@ -65,10 +62,7 @@ Frouros supports Python 3.8, 3.9 and 3.10 versions. It can be installed via pip:
```bash
pip install frouros
```
There is also the option to use [PyTorch](https://github.com/pytorch/pytorch) models with the help of [skorch](https://github.com/skorch-dev/skorch):
```bash
pip install frouros[pytorch]
```

The latest changes on the main branch can be installed via:
```bash
pip install git+https://github.com/IFCA/frouros.git
@@ -89,9 +83,9 @@ The currently supported methods are listed in the following table. They are divi
<tbody>
<tr>
<td rowspan="12">
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/supervised/base.py">
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/concept_drift/base.py">
<div style="height:100%;width:100%">
Supervised
Concept drift
</div>
</a>
</td>
@@ -219,43 +213,11 @@ The currently supported methods are listed in the following table. They are divi
</td>
</tr>
</tr>
<tr>
<td rowspan="2">
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/semi_supervised/base.py">
<div style="height:100%;width:100%">
Semi-supervised
</div>
</a>
</td>
<td rowspan="2">
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/semi_supervised/margin_density_based/base.py">
<div style="height:100%;width:100%">
Margin Density Based
</div>
</a>
</td>
<td>
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/semi_supervised/margin_density_based/md3.py">
<div style="height:100%;width:100%">
MD3-SVM
</div>
</a>
</td>
<tr>
<td>
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/supervised/margin_density_based/md3.py">
<div style="height:100%;width:100%">
MD3-RS
</div>
</a>
</td>
</tr>
</tr>
<tr>
<td rowspan="10">
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/unsupervised/base.py">
<a href="https://github.com/jaime-cespedes-sisniega/frouros/blob/main/frouros/data_drift/base.py">
<div style="height:100%;width:100%">
Unsupervised
Data drift
</div>
</a>
</td>
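The Quickstart above relies on `KSTest`, which the README describes as a Kolmogorov-Smirnov data drift detector. As a hedged sketch of what a two-sample KS test computes, independent of Frouros's actual implementation, the NumPy-only code below takes the statistic as the largest gap between the two empirical CDFs and approximates its p-value with the asymptotic Kolmogorov distribution. The function names `ks_statistic`, `kolmogorov_sf` and `ks_two_sample` are hypothetical, and the asymptotic p-value with a small-sample correction is an assumption; SciPy's `scipy.stats.ks_2samp`, for example, offers both exact and asymptotic p-values.

```python
import numpy as np


def ks_statistic(x: np.ndarray, y: np.ndarray) -> float:
    """Largest absolute gap between the empirical CDFs of x and y."""
    x = np.sort(x)
    y = np.sort(y)
    pooled = np.concatenate([x, y])
    # Empirical CDF of each sample evaluated at every pooled point.
    cdf_x = np.searchsorted(x, pooled, side="right") / x.size
    cdf_y = np.searchsorted(y, pooled, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def kolmogorov_sf(lam: float, terms: int = 100) -> float:
    """Asymptotic Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # Q(lam) equals 1 to within ~1e-12 here, and the series converges slowly.
        return 1.0
    k = np.arange(1, terms + 1)
    total = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))
    return float(np.clip(total, 0.0, 1.0))


def ks_two_sample(x: np.ndarray, y: np.ndarray) -> tuple:
    """Return (statistic, approximate p-value) for the two-sample KS test."""
    statistic = ks_statistic(x, y)
    n_eff = x.size * y.size / (x.size + y.size)
    sqrt_n = np.sqrt(n_eff)
    # Small-sample correction to the scaling of the statistic (an approximation).
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * statistic
    return statistic, kolmogorov_sf(lam)


# Same setup as the Quickstart: N(2, 2) reference vs. N(1, 2) test data.
rng = np.random.default_rng(31)
X_ref = rng.normal(loc=2, scale=2, size=10000)
X_test = rng.normal(loc=1, scale=2, size=10000)

alpha = 0.01  # significance level for the hypothesis test
statistic, p_value = ks_two_sample(X_ref, X_test)
print(statistic, p_value < alpha)
```

With a mean shift of half a standard deviation and 10000 samples per side, the statistic lands near 0.2 and the p-value is far below `alpha`, so H0 is rejected just as in the Quickstart.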
35 changes: 0 additions & 35 deletions frouros/common/update.py

This file was deleted.

59 changes: 59 additions & 0 deletions frouros/concept_drift/__init__.py
@@ -0,0 +1,59 @@
"""Concept drift detection methods init."""

from .cusum_based import (
    CUSUM,
    CUSUMConfig,
    GeometricMovingAverage,
    GeometricMovingAverageConfig,
    PageHinkley,
    PageHinkleyConfig,
)
from .ddm_based import (
    DDM,
    DDMConfig,
    ECDDWT,
    ECDDWTConfig,
    EDDM,
    EDDMConfig,
    HDDMA,
    HDDMAConfig,
    HDDMW,
    HDDMWConfig,
    RDDM,
    RDDMConfig,
    STEPD,
    STEPDConfig,
)
from .window_based import (
    ADWIN,
    ADWINConfig,
    KSWIN,
    KSWINConfig,
)

__all__ = [
    "ADWIN",
    "ADWINConfig",
    "CUSUM",
    "CUSUMConfig",
    "DDM",
    "DDMConfig",
    "ECDDWT",
    "ECDDWTConfig",
    "EDDM",
    "EDDMConfig",
    "GeometricMovingAverage",
    "GeometricMovingAverageConfig",
    "HDDMA",
    "HDDMAConfig",
    "HDDMW",
    "HDDMWConfig",
    "KSWIN",
    "KSWINConfig",
    "PageHinkley",
    "PageHinkleyConfig",
    "RDDM",
    "RDDMConfig",
    "STEPD",
    "STEPDConfig",
]
118 changes: 118 additions & 0 deletions frouros/concept_drift/base.py
@@ -0,0 +1,118 @@
"""Concept drift base module."""

import abc
from typing import (  # noqa: TYP001
    Dict,
    Union,
)


class ConceptDriftBaseConfig(abc.ABC):
    """Abstract class representing a concept drift configuration class."""

    def __init__(
        self,
        min_num_instances: int,
    ) -> None:
        """Init method.

        :param min_num_instances: minimum number of instances
            to start looking for changes
        :type min_num_instances: int
        """
        self.min_num_instances = min_num_instances

    @property
    def min_num_instances(self) -> int:
        """Minimum number of instances property.

        :return: minimum number of instances to start looking for changes
        :rtype: int
        """
        return self._min_num_instances

    @min_num_instances.setter
    def min_num_instances(self, value: int) -> None:
        """Minimum number of instances setter.

        :param value: value to be set
        :type value: int
        """
        self._min_num_instances = value


class ConceptDriftBase(abc.ABC):
    """Abstract class representing a concept drift detector."""

    def __init__(
        self,
        config: ConceptDriftBaseConfig,
    ) -> None:
        """Init method.

        :param config: configuration parameters
        :type config: ConceptDriftBaseConfig
        """
        self.config = config
        self.num_instances = 0

    @property
    def config(self) -> ConceptDriftBaseConfig:
        """Config property.

        :return: configuration parameters of the detector
        :rtype: ConceptDriftBaseConfig
        """
        return self._config

    @config.setter
    def config(self, value: ConceptDriftBaseConfig) -> None:
        """Config setter.

        :param value: value to be set
        :type value: ConceptDriftBaseConfig
        :raises TypeError: Type error exception
        """
        if not isinstance(value, ConceptDriftBaseConfig):
            raise TypeError("value must be of type ConceptDriftBaseConfig.")
        self._config = value

    @property
    def num_instances(self) -> int:
        """Number of instances counter property.

        :return: Number of instances counter value
        :rtype: int
        """
        return self._num_instances

    @num_instances.setter
    def num_instances(self, value: int) -> None:
        """Number of instances counter setter.

        :param value: value to be set
        :type value: int
        :raises ValueError: Value error exception
        """
        if value < 0:
            raise ValueError("num_instances must be greater than or equal to 0.")
        self._num_instances = value

    def reset(self, *args, **kwargs) -> None:
        """Reset method."""

    @property
    def status(self) -> Dict[str, bool]:
        """Status property.

        :return: status dict
        :rtype: Dict[str, bool]
        """

    @abc.abstractmethod
    def update(self, value: Union[int, float]) -> None:
        """Abstract update method.

        :param value: value to update detector
        :type value: Union[int, float]
        """
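To illustrate the contract `ConceptDriftBase` sets up (a type-checked config, a non-negative `num_instances` counter, and an abstract `update`), here is a self-contained sketch that mirrors a trimmed copy of those two base classes and plugs in a toy detector. `ErrorRateThresholdConfig`, `ErrorRateThreshold` and its `threshold` parameter are hypothetical, chosen only for illustration; they are not Frouros detectors, and the real base classes carry more (properties, `reset`, `status`) than this copy.

```python
import abc
from typing import Dict, Union


class ConceptDriftBaseConfig(abc.ABC):
    """Trimmed mirror of the config base: only min_num_instances."""

    def __init__(self, min_num_instances: int) -> None:
        self.min_num_instances = min_num_instances


class ConceptDriftBase(abc.ABC):
    """Trimmed mirror of the detector base shown in the diff."""

    def __init__(self, config: ConceptDriftBaseConfig) -> None:
        # The module above performs this check in a config setter.
        if not isinstance(config, ConceptDriftBaseConfig):
            raise TypeError("value must be of type ConceptDriftBaseConfig.")
        self.config = config
        self.num_instances = 0

    @property
    def num_instances(self) -> int:
        return self._num_instances

    @num_instances.setter
    def num_instances(self, value: int) -> None:
        if value < 0:
            raise ValueError("num_instances must be greater than or equal to 0.")
        self._num_instances = value

    @abc.abstractmethod
    def update(self, value: Union[int, float]) -> None:
        """Consume one value from the stream (e.g. 1 = misclassified)."""


class ErrorRateThresholdConfig(ConceptDriftBaseConfig):
    """Hypothetical config: flag drift when the error rate exceeds threshold."""

    def __init__(self, threshold: float, min_num_instances: int = 30) -> None:
        super().__init__(min_num_instances=min_num_instances)
        self.threshold = threshold


class ErrorRateThreshold(ConceptDriftBase):
    """Hypothetical toy detector built on the base contract (not a Frouros method)."""

    def __init__(self, config: ErrorRateThresholdConfig) -> None:
        super().__init__(config=config)
        self.num_errors = 0
        self.drift = False

    def update(self, value: Union[int, float]) -> None:
        self.num_instances += 1
        self.num_errors += int(value)
        # Only start looking for changes once min_num_instances values were seen.
        if self.num_instances >= self.config.min_num_instances:
            error_rate = self.num_errors / self.num_instances
            self.drift = error_rate > self.config.threshold

    @property
    def status(self) -> Dict[str, bool]:
        return {"drift": self.drift}


detector = ErrorRateThreshold(config=ErrorRateThresholdConfig(threshold=0.3))
for error in [0] * 40 + [1] * 40:
    detector.update(value=error)
print(detector.num_instances, detector.status)  # 80 {'drift': True}
```

Keeping `update` abstract forces every concrete detector to define how it consumes the stream, while the shared base owns validation and instance counting.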
@@ -1,4 +1,4 @@
"""Supervised CUSUM based detection methods' init."""
"""Concept drift CUSUM based detection methods' init."""

from .cusum import CUSUM, CUSUMConfig
from .geometric_moving_average import (
@@ -1,18 +1,16 @@
"""Supervised CUSUM based base module."""
"""Concept drift CUSUM based base module."""

import abc
from typing import ( # noqa: TYP001
    Dict,
    Union,
)

from sklearn.base import BaseEstimator # type: ignore

from frouros.supervised.base import SupervisedBaseEstimator, SupervisedBaseConfig
from frouros.concept_drift.base import ConceptDriftBase, ConceptDriftBaseConfig
from frouros.utils.stats import Mean


class CUSUMBaseConfig(SupervisedBaseConfig):
class CUSUMBaseConfig(ConceptDriftBaseConfig):
    """Class representing a CUSUM based configuration class."""

    def __init__(
@@ -125,22 +123,19 @@ def alpha(self, value: float) -> None:
        self._alpha = value


class CUSUMBaseEstimator(SupervisedBaseEstimator):
class CUSUMBase(ConceptDriftBase):
    """CUSUM based algorithm class."""

    def __init__(
        self,
        estimator: BaseEstimator,
        config: CUSUMBaseConfig,
    ) -> None:
        """Init method.

        :param estimator: sklearn estimator
        :type estimator: BaseEstimator
        :param config: configuration parameters
        :type config: CUSUMBaseConfig
        """
        super().__init__(estimator=estimator, config=config)
        super().__init__(config=config)
        self.mean_error_rate = Mean()
        self.sum_ = 0.0
        self.drift = False
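`CUSUMBase` keeps a running mean of the error rate (`mean_error_rate`), a cumulative sum (`sum_`) and a `drift` flag. As a hedged, self-contained sketch of how those three pieces typically combine in a one-sided CUSUM test, the class below accumulates how far each incoming error lies above the running mean, clips the sum at zero, and signals drift once it crosses a threshold. The class name `CUSUMSketch`, the parameter names `delta` (slack per step) and `lambda_` (alarm threshold), their defaults and the exact update rule are assumptions, not taken from the diff.

```python
class CUSUMSketch:
    """Hypothetical one-sided CUSUM over a stream of 0/1 errors."""

    def __init__(
        self,
        delta: float = 0.005,  # slack subtracted at every step (assumed name)
        lambda_: float = 50.0,  # alarm threshold on the cumulative sum (assumed name)
        min_num_instances: int = 30,
    ) -> None:
        self.delta = delta
        self.lambda_ = lambda_
        self.min_num_instances = min_num_instances
        self.reset()

    def reset(self) -> None:
        self.num_instances = 0
        self.mean_error_rate = 0.0  # plain float standing in for Mean()
        self.sum_ = 0.0
        self.drift = False

    def update(self, value: float) -> bool:
        self.num_instances += 1
        # Incremental mean: m_t = m_{t-1} + (x_t - m_{t-1}) / t
        self.mean_error_rate += (value - self.mean_error_rate) / self.num_instances
        # Accumulate positive deviations; clipping at 0 discards stable periods.
        self.sum_ = max(0.0, self.sum_ + value - self.mean_error_rate - self.delta)
        if self.num_instances >= self.min_num_instances:
            self.drift = self.sum_ > self.lambda_
        return self.drift


# Hypothetical stream: a 10% error rate for 1000 steps, then every prediction fails.
stream = [1 if i % 10 == 0 else 0 for i in range(1000)] + [1] * 200

detector = CUSUMSketch()
first_alarm = None
for t, error in enumerate(stream):
    if detector.update(error) and first_alarm is None:
        first_alarm = t
print(first_alarm)
```

Before the change, each isolated error pushes the sum up by less than 1 and the following correct predictions pull it back to 0, so the threshold is never approached; after the change the sum grows by almost 1 per step and the alarm fires a few dozen steps later.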