Same, same but different ...
samesame implements classifier two-sample tests (CTSTs) and, as a bonus extension, a noninferiority
test (NIT). These tests are either missing from existing libraries or implemented there with significant tradeoffs (looking at you, sample-splitting).
samesame is versatile, extensible, lightweight, powerful, and agnostic to your inference strategy,
so long as it is valid (e.g. cross-fitting or sample splitting).
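As a rough illustration of one valid inference strategy, the sketch below uses cross-fitting to obtain out-of-sample classifier scores: each observation is scored by a model that never saw it during fitting. It relies only on scikit-learn and numpy. The simulated data, the choice of logistic regression, and all variable names are assumptions made for illustration and are not part of samesame.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Two hypothetical samples with 3 features; the second has a shifted mean.
x_train = rng.normal(loc=0.0, size=(300, 3))
x_test = rng.normal(loc=0.5, size=(300, 3))

# Pool the samples and label membership: 0 = train, 1 = test.
features = np.vstack([x_train, x_test])
membership = np.concatenate([np.zeros(300, dtype=int), np.ones(300, dtype=int)])

# Cross-fitting: every observation gets a predicted probability from a
# model fitted on the other folds, so no score is in-sample.
scores = cross_val_predict(
    LogisticRegression(),
    features,
    membership,
    cv=5,
    method="predict_proba",
)[:, 1]

# An AUC near 0.5 means the classifier cannot tell the samples apart.
auc = roc_auc_score(membership, scores)
print(f"{auc=:.3f}")
```

Scores obtained this way can be split by membership, for instance `scores[membership == 0]` and `scores[membership == 1]`, to form two samples of scores.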
samesame is for those who need statistical tests for:
- Data validation - Verify that data distributions meet expectations
- Model performance monitoring - Detect performance degradation over time
- Drift detection - Identify dataset shifts between training and production
- Statistical process control - Monitor system behavior and quality
- Covariate balance - Assess balance in observational studies
A motivating example is available from the related R package dsos, which provides some of the same functionality.
To install, run the following command:
python -m pip install samesame
This example demonstrates the key distinction between tests of equal distribution and noninferiority tests, a critical difference for avoiding false alarms in production systems.
Simulate outlier scores to test for no adverse shift:
from samesame.ctst import CTST
from samesame.nit import DSOS
from sklearn.metrics import roc_auc_score
import numpy as np
n_size = 600
rng = np.random.default_rng(123_456)
os_train = rng.normal(size=n_size)
os_test = rng.normal(size=n_size)
null_ctst = CTST.from_samples(os_train, os_test, metric=roc_auc_score)
null_dsos = DSOS.from_samples(os_train, os_test)
Test of equal distribution (CTST): Rejects the null of equal distributions
print(f"{null_ctst.pvalue=:.4f}")
# null_ctst.pvalue=0.0358
Noninferiority test (DSOS): Fails to reject the null of no adverse shift
print(f"{null_dsos.pvalue=:.4f}")
# null_dsos.pvalue=0.9500
Key insight: While the test sample (os_test) has a statistically different distribution from the training sample (os_train), it does not contain disproportionally more outliers. This distinction is exactly what samesame highlights: many practitioners conflate "different distribution" with "problematic shift," but samesame helps you distinguish between the two.
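To make the idea of a test of equal distribution more concrete, here is a minimal numpy-only sketch of a two-sided permutation test on the AUC. It is not samesame's implementation, and how samesame builds its null distribution is not shown in this README. The helper names `auc_from_samples` and `permutation_pvalue` are hypothetical, and the two-sided permutation scheme is simply one common choice.

```python
import numpy as np


def auc_from_samples(first, second):
    """AUC for separating `second` (label 1) from `first` (label 0)."""
    pooled = np.concatenate([first, second])
    # Ranks start at 1; ties are ignored, which is fine for continuous scores.
    ranks = np.empty(pooled.size)
    ranks[np.argsort(pooled)] = np.arange(1, pooled.size + 1)
    rank_sum = ranks[first.size:].sum()
    # Mann-Whitney U statistic, rescaled to the [0, 1] range of the AUC.
    u_stat = rank_sum - second.size * (second.size + 1) / 2
    return u_stat / (first.size * second.size)


def permutation_pvalue(first, second, n_permutations=2000, seed=0):
    """Return the observed AUC, the permutation null, and a two-sided p-value."""
    rng = np.random.default_rng(seed)
    observed = auc_from_samples(first, second)
    pooled = np.concatenate([first, second])
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffling the pooled scores breaks any link to sample membership.
        shuffled = rng.permutation(pooled)
        null[i] = auc_from_samples(shuffled[:first.size], shuffled[first.size:])
    # Two-sided: how often is a permuted AUC at least as far from 0.5?
    extreme = np.abs(null - 0.5) >= np.abs(observed - 0.5)
    pvalue = (1 + extreme.sum()) / (1 + n_permutations)
    return observed, null, pvalue


rng = np.random.default_rng(42)
same = permutation_pvalue(rng.normal(size=200), rng.normal(size=200))
shifted = permutation_pvalue(rng.normal(size=200), rng.normal(loc=1.0, size=200))
print(f"p-value, same distribution: {same[2]:.4f}")
print(f"p-value, shifted mean:      {shifted[2]:.4f}")
```

Because the p-value counts deviations from 0.5 in both directions, this kind of test flags any detectable difference between the samples, whether or not that difference is harmful.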
Below, you will find an overview of common modules in samesame.
| Function | Module |
|---|---|
| Bayesian inference | samesame.bayes |
| Classifier two-sample tests (CTSTs) | samesame.ctst |
| Noninferiority tests (NITs) | samesame.nit |
When the method is a statistical test, samesame stores the results of
potentially expensive computations in attributes. These
attributes, when available, can be accessed as follows.
| Attribute | Description |
|---|---|
| .statistic | The test statistic for the hypothesis. |
| .null | The null distribution for the hypothesis. |
| .pvalue | The p-value for the hypothesis. |
| .posterior | The posterior distribution for the hypothesis. |
| .bayes_factor | The Bayes factor for the hypothesis. |
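The compute-once-then-store pattern behind such attributes can be sketched with the standard library's `functools.cached_property`. This is only an illustration of the design idea, assuming a toy permutation test for a difference in means. The class `MeanShiftTest` is hypothetical and does not reflect how samesame is implemented internally.

```python
from functools import cached_property
import random
import statistics


class MeanShiftTest:
    """Hypothetical permutation test for a difference in means."""

    def __init__(self, first, second, n_permutations=1000, seed=0):
        self.first = list(first)
        self.second = list(second)
        self.n_permutations = n_permutations
        self.seed = seed

    @staticmethod
    def _mean_difference(first, second):
        return statistics.fmean(second) - statistics.fmean(first)

    @cached_property
    def statistic(self):
        # Cheap, but cached for consistency with the other attributes.
        return self._mean_difference(self.first, self.second)

    @cached_property
    def null(self):
        # Expensive step: computed on first access, then stored on the instance.
        rng = random.Random(self.seed)
        pooled = self.first + self.second
        split = len(self.first)
        draws = []
        for _ in range(self.n_permutations):
            rng.shuffle(pooled)
            draws.append(self._mean_difference(pooled[:split], pooled[split:]))
        return draws

    @cached_property
    def pvalue(self):
        # Two-sided p-value; reuses the cached null distribution.
        extreme = sum(abs(draw) >= abs(self.statistic) for draw in self.null)
        return (1 + extreme) / (1 + self.n_permutations)


result = MeanShiftTest([0.1, 0.4, 0.3, 0.2], [1.1, 1.3, 0.9, 1.2])
print(result.pvalue)               # triggers the permutations once
print(result.null is result.null)  # True: the stored list is reused
```

Accessing `result.pvalue` a second time returns the stored value without rerunning the permutations, which is the benefit of keeping expensive results in attributes.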
To get started, please see the examples in the docs.
samesame has minimal dependencies beyond the Python standard library, making it a lightweight addition to most machine learning projects. It is built on top of, and fully compatible with, scikit-learn and numpy.