> *You can make pigs fly* [Kolter & Madry, 2018]
`skwdro` is a Python package that offers WDRO versions of a large range of estimators, either by extending `scikit-learn` estimators or by providing a wrapper for `pytorch` modules.

Have a look at the `skwdro` documentation!

(Saw a figure at one of our presentations that is not in the doc, and want to see the code? Take a look at our experiments repo!)
First install `hatch` and clone the repository. In the root folder, `make shell` gives you an interactive shell in the correct environment, and `make test` runs the tests (it can be launched from both an interactive shell and a normal shell). `make reset_env` removes the installed environments (useful in case of trouble).
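Concretely, a typical development setup might look like the following sketch (the repository URL is an assumption here; use the project's actual GitHub address):

```bash
pip install hatch                                  # project manager used by skwdro
git clone https://github.com/iutzeler/skwdro.git   # assumed repository URL
cd skwdro
make shell   # interactive shell in the dev environment
make test    # run the test suite
```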
Run the following command to get the latest version of the package:

```bash
pip install -U skwdro
```

For `uv` users:

```bash
uv pip install skwdro
```
It is also available via conda and the like (mamba, etc.) and can be installed using, for instance:

```bash
conda install flvincen::skwdro
```
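Whichever route you take, a quick import is enough to confirm the installation:

```bash
python -c "import skwdro; print('skwdro imported successfully')"
```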
Robust estimators from `skwdro` can be used as drop-in replacements for `scikit-learn` estimators (they actually inherit from the `scikit-learn` estimator and classifier classes). `skwdro` provides robust estimators for standard problems such as linear regression or logistic regression. `LinearRegression` from `skwdro.linear_models` is a robust version of `LinearRegression` from `scikit-learn` and can be used in the same way. The only difference is that an uncertainty radius `rho` is now required.
We assume that we are given `X_train` of shape `(n_train, n_features)` and `y_train` of shape `(n_train,)` as training data, and `X_test` of shape `(n_test, n_features)` as test data.
```python
import numpy as np

from sklearn.linear_model import LinearRegression as ERMRegression
from skwdro.linear_models import LinearRegression as DRORegression

# Some toy linear problem: e.g. additive noise and a level shift at test time
rng = np.random.RandomState(666)
X_train = rng.randn(10, 1)
X_test = rng.randn(5, 1) + .5
y_train = 2. * X_train.flatten() + .01 * rng.randn(10)
y_test = 2. * X_test.flatten() + .1 * rng.randn(5)

# Uncertainty radius
rho = 0.1

# Fit the models
erm_model = ERMRegression()
robust_model = DRORegression(rho=rho)
erm_model.fit(X_train, y_train)
robust_model.fit(X_train, y_train)

# Predict the target values
y_pred_erm = erm_model.predict(X_test)
y_pred_dro = robust_model.predict(X_test)
```
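Since the estimators follow the `scikit-learn` API, the usual tooling applies; for instance, a quick comparison on the shifted test data using the standard `score` method inherited from `scikit-learn` (this snippet assumes the toy data above):

```python
# R^2 scores on the shifted test data
print("ERM R^2:", erm_model.score(X_test, y_test))
print("DRO R^2:", robust_model.score(X_test, y_test))
```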
You can refer to the documentation to explore the list of `skwdro`'s ready-made estimators.

Didn't find an estimator that suits you? You can compose your own using the `pytorch` interface: it allows more flexibility, custom models, and custom optimizers.

Assume now that the data is given as a dataloader `train_loader`.
```python
import torch as pt
import torch.nn as nn
import torch.optim as optim

from skwdro.torch import robustify

# Toy data
n_features = 3
X = pt.randn(32, n_features)
y = X @ pt.rand(n_features, 1) + 1.
train_loader = pt.utils.data.DataLoader(
    pt.utils.data.TensorDataset(X, y),
    batch_size=4
)

# Uncertainty radius
rho = pt.tensor(.1)

# Define the model
model = nn.Linear(n_features, 1)

# Define the loss function
loss_fn = nn.MSELoss(reduction='none')

# Define a sample batch for initialization
sample_batch_x, sample_batch_y = X[:16, :], y[:16, :]

# Robust loss
robust_loss = robustify(loss_fn, model, rho, sample_batch_x, sample_batch_y)

# Define the optimizer
optimizer = optim.AdamW(model.parameters(), lr=.1)

# Training loop
for epoch in range(100):
    avg_loss = 0.
    robust_loss.get_initial_guess_at_dual(X, y)
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        loss = robust_loss(batch_x, batch_y)
        loss.backward()
        optimizer.step()
        avg_loss += loss.detach().item()
    print(f"=== Loss (epoch {epoch}): {avg_loss / len(train_loader)}")
```
You will find a detailed description of how to `robustify` modules in the documentation.
`skwdro` is the result of a research project. It is licensed under the BSD 3-Clause license. You are free to use it, and if you do so, please cite:
```bibtex
@article{vincent2024skwdro,
  title={skwdro: a library for Wasserstein distributionally robust machine learning},
  author={Vincent, Florian and Azizian, Wa{\"\i}ss and Iutzeler, Franck and Malick, J{\'e}r{\^o}me},
  journal={arXiv preprint arXiv:2410.21231},
  year={2024}
}
```