6 changes: 0 additions & 6 deletions docs/source/concepts.md
@@ -30,12 +30,6 @@ The different types of changes that are considered as a form of drift can be categorized
${P_{[0, t]}(X) \neq P_{[t+1, \infty)}(X)}$. [Data drift methods](#data-drift) are designed to try to detect this type of drift. Unlike with *concept drift*, the presence of *data drift* does not guarantee that the model's performance is being affected, but it is highly probable that it is. We have renamed *dataset shift* {cite}`rabanser2019failing` to *data drift*
in order to maintain consistency with the *concept drift* definition. These *data drift* methods can also be used to detect *label drift*, also known as *prior probability shift* {cite}`storkey2009training`, where the label distribution ${P(Y)}$ is the one that changes over time, in such a way that ${P_{[0, t]}(Y) \neq P_{[t+1, \infty)}(Y)}$.

## Why do I need to use a drift detector?

One of the main mistakes when deploying a machine learning model for consumption is to assume that the data used for inference will come from the same distribution as the data on which the model was trained, i.e. that the data will be stationary. It may also be the case that the data used at inference time is still similar to that used for training, but the concept of what was learned in the first instance has changed over time, making the model obsolete in terms of performance.

Drift detectors make it possible to monitor model performance or feature distributions in order to detect significant deviations that can cause model performance decay. By using them it is possible to know when it is necessary to replace the current model with a new one trained on more recent data.

## Verification latency or delay

According to {cite}`dos2016fast`, verification latency is defined as the period between a model's prediction and the availability of the ground-truth label (in the case of a classification problem) or the target value (in the case of a regression problem).
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -112,6 +112,7 @@
]
myst_url_schemes = ("http", "https", "mailto")
myst_heading_anchors = 3
myst_all_links_external = True

# MyST-NB configuration
nb_execution_timeout = 480
85 changes: 85 additions & 0 deletions docs/source/faq.md
@@ -0,0 +1,85 @@
# FAQ

Here we will try to answer some of the most common questions about drift detection and the Frouros library.

## What is the difference between *concept drift* and *data drift*?

Concept drift refers to changes in the underlying concept being modeled, such as changes in the relationship between
the input features and the target variable. It corresponds to a change in the conditional probability $P(y|X)$, which
can occur with or without a change in $P(X)$. Data drift, on the other hand, refers to changes in the distribution of
the input features $P(X)$ over time. It focuses on detecting when the incoming data no longer resembles the data the
model was trained on, potentially leading to decreased performance or reliability.
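
The distinction is easy to reproduce on synthetic data. The following minimal NumPy sketch (illustrative only, not part
of Frouros) builds one stream where only $P(X)$ shifts and another where only $P(y|X)$ changes: pure data drift moves
the feature distribution without hurting a model trained on the original concept, while concept drift degrades the
model even though the feature distribution looks unchanged.

```python
import numpy as np

rng = np.random.default_rng(seed=31)
n = 10_000

# Reference period: X ~ N(0, 1) and the concept is y = 1 if x > 0.
x_ref = rng.normal(loc=0.0, scale=1.0, size=n)

# Data drift: P(X) shifts (the mean moves) but the concept is unchanged.
x_data_drift = rng.normal(loc=1.5, scale=1.0, size=n)

# Concept drift: P(X) is unchanged but P(y|X) flips (the labeling rule inverts).
x_concept_drift = rng.normal(loc=0.0, scale=1.0, size=n)

def concept_old(x):  # rule in force at training time
    return (x > 0).astype(int)

def concept_new(x):  # rule after the concept changed
    return (x <= 0).astype(int)

streams = {
    "reference": (x_ref, concept_old(x_ref)),
    "data drift": (x_data_drift, concept_old(x_data_drift)),
    "concept drift": (x_concept_drift, concept_new(x_concept_drift)),
}

# A model frozen on the reference concept keeps full accuracy under pure
# data drift, but fails once the concept itself changes.
for name, (x, y) in streams.items():
    accuracy = (concept_old(x) == y).mean()
    print(f"{name:>13}: mean(X) = {x.mean():+.2f}, accuracy = {accuracy:.2f}")
```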

## What is the difference between *out-of-distribution* detection and *data drift* detection?

Out-of-distribution detection focuses on identifying samples that fall outside the training distribution, and is often
used to detect anomalies or novel data. It aims to detect instances that differ significantly from the data the model
was trained on. Data drift detection, on the other hand, is concerned with identifying shifts or changes in the
distribution of the data over time.
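
To make the contrast concrete: out-of-distribution detection typically scores individual samples, whereas data drift
detection compares whole distributions. In the minimal NumPy/SciPy sketch below (illustrative, not Frouros-specific), a
mild mean shift raises almost no per-sample flags, yet it is unmistakable when the two samples are compared as
distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=31)
x_ref = rng.normal(loc=0.0, scale=1.0, size=5_000)
x_new = rng.normal(loc=0.3, scale=1.0, size=5_000)  # mildly shifted stream

# OOD view: flag individual samples that lie far from the reference distribution.
z_scores = np.abs((x_new - x_ref.mean()) / x_ref.std())
print(f"per-sample OOD flags (|z| > 3): {(z_scores > 3).sum()} of {x_new.size}")

# Data drift view: compare the two samples as whole distributions.
statistic, p_value = stats.ks_2samp(x_ref, x_new)
print(f"two-sample KS test: statistic={statistic:.3f}, p-value={p_value:.2e}")
```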

## How can I detect *concept drift* without having access to the ground truth labels at inference time?

In cases where ground truth labels are not available at inference time, or the verification latency is high, it may not
be possible to detect concept drift directly. In such cases, alternative techniques, such as data drift detection, can
be used to monitor changes in the feature distributions and identify potential drift. By monitoring the feature
distributions, it may be possible to detect when the incoming data no longer resembles the data the model was trained
on, even in the absence of ground truth labels.
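
As a sketch of that label-free workflow, the snippet below monitors a single feature with the Kolmogorov-Smirnov data
drift detector. It assumes the `KSTest` detector and the `fit`/`compare` interface used in the library's data drift
examples, including the `p_value` field of the returned result; check the API reference for the exact names in your
version.

```python
import numpy as np
from frouros.detectors.data_drift import KSTest

rng = np.random.default_rng(seed=31)
x_ref = rng.normal(loc=0.0, scale=1.0, size=10_000)  # feature values at training time
x_new = rng.normal(loc=0.8, scale=1.0, size=1_000)   # feature values seen at inference

# KSTest works on a single feature: fit one detector per monitored feature.
detector = KSTest()
_ = detector.fit(X=x_ref)

result, _ = detector.compare(X=x_new)
alpha = 0.01
if result.p_value <= alpha:  # assumed result field, as in the library's examples
    print(f"Data drift detected without any labels (p-value={result.p_value:.2e})")
```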

## Why do I need to use a *drift* detector?

One of the main mistakes when deploying a machine learning model for consumption is to assume that the data used for
inference will come from the same distribution as the data on which the model was trained, i.e., that the data will be
stationary. It may also be the case that the data used at inference time is still similar to that used for training,
but the concept of what was learned in the first instance has changed over time, making the model obsolete in terms of
performance.

Drift detectors make it possible to monitor model performance or feature distributions to detect significant deviations
that can cause model performance decay. By using them, it is possible to know when it is necessary to replace the
current model with a new one trained on more recent data.

## Is *model drift* the same as *concept drift*?

Model drift is a term used to describe the degradation of a model's performance over time. This can be caused by a
variety of factors, including concept drift, data drift, or other issues such as model aging. Concept drift, on the
other hand, refers specifically to changes in the underlying concept being modeled, such as changes in the relationship
between the input features and the target variable. While concept drift can lead to model drift, model drift can also be
caused by other factors and may not always be directly related to changes in the underlying concept.

## What actions should I take if *drift* is detected in my model?

If drift is detected in your model, it is important to take action to address the underlying cause of the drift.
This may involve retraining the model on more recent data, updating the model's features or architecture, or taking
other steps to ensure that the model remains accurate and reliable. In some cases, it may also be necessary to
re-evaluate the model's performance and consider whether it is still suitable for its intended use case.
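
In automated pipelines, the retraining step is often wired directly to the detector. The sketch below is
framework-agnostic: `model` follows the scikit-learn `fit`/`predict` convention, and the detector's `update`, `drift`,
and `reset` members are assumed placeholders rather than the signature of any specific Frouros detector.

```python
from collections import deque

recent = deque(maxlen=5_000)  # sliding window of the most recent labeled samples

def process_sample(model, detector, x, y):
    """Score one labeled sample; retrain on the recent window when drift fires."""
    error = int(model.predict([x])[0] != y)
    recent.append((x, y))
    detector.update(value=error)     # assumed detector interface
    if detector.drift:               # assumed drift flag on the detector
        X, Y = zip(*recent)
        model.fit(list(X), list(Y))  # replace the model with one trained on recent data
        detector.reset()             # assumed reset; start monitoring the new model afresh
```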

## Can Frouros be integrated with popular machine learning frameworks such as TensorFlow or PyTorch?

Yes, Frouros is designed to be compatible with any machine learning framework, such as TensorFlow or PyTorch. It is
framework-agnostic and can be used with any machine learning model or pipeline.

For instance, we provide an [example](./examples/data_drift/MMD_advance.html) that shows how to integrate Frouros with a PyTorch model to detect data
drift for a computer vision use case. In addition, there is an [example](./examples/concept_drift/DDM_advance.html) that shows how to integrate Frouros with
scikit-learn to detect concept drift in a streaming manner.
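
The streaming scikit-learn integration looks roughly like the sketch below. It assumes the `DDM` and `DDMConfig`
classes and the `update`/`status` interface shown in the project's quickstart; the linked example remains the
authoritative version.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from frouros.detectors.concept_drift import DDM, DDMConfig

X, y = load_iris(return_X_y=True)
X_train, X_stream, y_train, y_stream = train_test_split(
    X, y, test_size=0.5, random_state=31
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

config = DDMConfig(warning_level=2.0, drift_level=3.0, min_num_instances=25)
detector = DDM(config=config)

# Feed the detector one 0/1 error signal per prediction. On an i.i.d. stream
# like this one, no drift is expected to be flagged.
for i, (x_i, y_i) in enumerate(zip(X_stream, y_stream)):
    error = int(model.predict(x_i.reshape(1, -1))[0] != y_i)
    _ = detector.update(value=error)
    if detector.status["drift"]:  # assumed status field; see the linked example
        print(f"Concept drift detected at sample {i}")
        break
```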

## How frequently should I run *drift* detection checks in my machine learning pipeline?

The frequency of drift detection checks will depend on the specific use case and the nature of the data being
processed. In general, it is a good practice to run drift detection checks regularly, such as after each batch of
data or at regular intervals, to ensure that any drift is detected and addressed in a timely manner.
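
For batch-scored pipelines, a common pattern is to run one check per incoming batch against a fixed reference window. A
minimal sketch, reusing the hypothetical `compare` interface and `p_value` field assumed above:

```python
def drifting_batches(detector, batches, alpha=0.01):
    """Yield the indices of incoming batches whose drift test fires."""
    for i, batch in enumerate(batches):
        result, _ = detector.compare(X=batch)  # one check per batch
        if result.p_value <= alpha:
            yield i
```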

## What are some common causes of *drift* in machine learning models?

Drift in machine learning models can be caused by a variety of factors, including changes in the underlying concept
being modeled, changes in the distribution of the input features, changes in the relationship between the input
features and the target variable, and other issues such as model aging or degradation. It is important to monitor
models for drift and take action to address any detected drift to maintain model accuracy and reliability.

## How can I contribute to the development of Frouros or report issues?

The [contribute section](./contribute.html#how-to-contribute) provides information on how to contribute to the development of Frouros,
including guidelines for reporting issues, submitting feature requests, and contributing code or documentation.

## Does Frouros provide visualization tools for *drift* detection results?

Frouros does not currently provide built-in visualization tools for drift detection results, but there are plans to
include them in future releases.
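
In the meantime, detector outputs are plain numbers and can be plotted with standard tools. Below is a minimal
matplotlib sketch that plots one p-value per batch against a significance threshold; the values are simulated, not
produced by any Frouros detector.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=31)
# Simulated per-batch p-values: no drift for 30 batches, then drift.
p_values = np.concatenate(
    [rng.uniform(0.2, 1.0, size=30), rng.uniform(1e-6, 0.01, size=20)]
)
alpha = 0.01

plt.plot(p_values, marker="o", label="per-batch p-value")
plt.axhline(alpha, color="red", linestyle="--", label=f"alpha = {alpha}")
plt.xlabel("batch")
plt.ylabel("p-value (log scale)")
plt.yscale("log")
plt.legend()
plt.show()
```
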
6 changes: 5 additions & 1 deletion docs/source/index.md
@@ -5,7 +5,10 @@
:end-before: ⚡️ Quickstart
```

In order to start using `frouros`, we highly recommend to check {doc}`concepts <concepts>` section to get a quick idea of what `frouros` is capable of, and what it is not yet capable of. Subsequently, we recommend taking a look at the {doc}`examples <examples>` section since it is the best way to start using `frouros`.
In order to start using `frouros`, we highly recommend checking the {doc}`concepts <concepts>` and
{doc}`FAQ <faq>` sections to get a quick idea of what `frouros` is capable of, and what it is not yet capable
of. Subsequently, we recommend taking a look at the {doc}`examples <examples>` section, since it is the best way to
start using `frouros`.

Read {doc}`installation <installation>` instructions to start using `frouros`.

@@ -26,4 +29,5 @@ concepts
api_reference
examples
contribute
faq
```