vathymut/samesame

samesame

Python PyPI - Downloads Static Badge License: LGPLv3 UAI 2022 uv Ruff

Same, same but different ...

samesame implements classifier two-sample tests (CTSTs) and, as a bonus extension, a noninferiority test (NIT). These tests are either missing from existing libraries or implemented there with significant tradeoffs (looking at you, sample-splitting).

samesame is versatile, extensible, lightweight, powerful, and agnostic to your inference strategy, so long as that strategy is valid (e.g. cross-fitting or sample splitting).

Motivation

samesame is for those who need statistical tests for:

  • Data validation - Verify that data distributions meet expectations
  • Model performance monitoring - Detect performance degradation over time
  • Drift detection - Identify dataset shifts between training and production
  • Statistical process control - Monitor system behavior and quality
  • Covariate balance - Assess balance in observational studies

A motivating example is available from the related R package dsos, which provides some of the same functionality.

Installation

To install, run the following command:

python -m pip install samesame

Quick Start

This example demonstrates the key distinction between tests of equal distribution and noninferiority tests—a critical difference for avoiding false alarms in production systems.

Simulate outlier scores to test for no adverse shift:

from samesame.ctst import CTST
from samesame.nit import DSOS
from sklearn.metrics import roc_auc_score
import numpy as np

n_size = 600
rng = np.random.default_rng(123_456)
os_train = rng.normal(size=n_size)
os_test = rng.normal(size=n_size)
null_ctst = CTST.from_samples(os_train, os_test, metric=roc_auc_score)
null_dsos = DSOS.from_samples(os_train, os_test)

Test of equal distribution (CTST): Rejects the null of equal distributions

print(f"{null_ctst.pvalue=:.4f}")
# null_ctst.pvalue=0.0358

Noninferiority test (DSOS): Fails to reject the null of no adverse shift

print(f"{null_dsos.pvalue=:.4f}")
# null_dsos.pvalue=0.9500

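Key insight: The CTST flags a statistically significant difference between the test sample (os_test) and the training sample (os_train) at the 5% level, yet the DSOS test finds no evidence that the test sample contains disproportionately more outliers. Both samples here are drawn from the same standard normal distribution, so the CTST rejection is a chance finding, exactly the kind of false alarm mentioned above. This distinction is what samesame highlights: many practitioners conflate "different distribution" with "problematic shift," and samesame helps you tell the two apart.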
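To make the idea behind a test of equal distribution more concrete, here is a minimal sketch of a permutation test on the AUC of two sets of scores, using only numpy. It is an illustration under stated assumptions, not samesame's implementation: the helper names auc_from_scores and permutation_pvalue are hypothetical, and the two-sided statistic |AUC - 0.5| is a choice made for this example.

```python
import numpy as np


def auc_from_scores(scores_ref, scores_new):
    """AUC for separating the new sample from the reference sample.

    Equals P(new > ref) + 0.5 * P(new == ref) over all cross-sample pairs.
    """
    diff = scores_new[:, None] - scores_ref[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def permutation_pvalue(scores_ref, scores_new, n_permutations=500, seed=0):
    """Two-sided permutation p-value for H0: both samples share a distribution.

    Hypothetical helper for illustration only; not part of samesame.
    """
    rng = np.random.default_rng(seed)
    observed = abs(auc_from_scores(scores_ref, scores_new) - 0.5)
    pooled = np.concatenate([scores_ref, scores_new])
    n_ref = len(scores_ref)
    exceed = 0
    for _ in range(n_permutations):
        # Under H0 the sample labels are exchangeable, so shuffle them.
        shuffled = rng.permutation(pooled)
        stat = abs(auc_from_scores(shuffled[:n_ref], shuffled[n_ref:]) - 0.5)
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_permutations)


rng = np.random.default_rng(0)
# Same distribution: any rejection would be a false alarm.
p_same = permutation_pvalue(rng.normal(size=200), rng.normal(size=200))
# Mean shifted by one standard deviation: a real difference.
p_shifted = permutation_pvalue(rng.normal(size=200), rng.normal(loc=1.0, size=200))
print(f"{p_same=:.4f}, {p_shifted=:.4f}")
```

A test like this only asks whether the two samples differ at all. It says nothing about whether the difference is harmful, which is the question a noninferiority test such as DSOS addresses.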

Usage

Functionality

Below, you will find an overview of common modules in samesame.

| Function | Module |
| --- | --- |
| Bayesian inference | samesame.bayes |
| Classifier two-sample tests (CTSTs) | samesame.ctst |
| Noninferiority tests (NITs) | samesame.nit |

Attributes

When the method is a statistical test, samesame stores the results of potentially expensive computations in attributes. These attributes, when available, can be accessed as follows.

| Attribute | Description |
| --- | --- |
| .statistic | The test statistic for the hypothesis. |
| .null | The null distribution for the hypothesis. |
| .pvalue | The p-value for the hypothesis. |
| .posterior | The posterior distribution for the hypothesis. |
| .bayes_factor | The Bayes factor for the hypothesis. |
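The pattern of storing expensive results in attributes can be sketched with functools.cached_property from the standard library. This is an assumption about how such caching could work in general, not a description of samesame's internals: the class LazyPermutationTest, its mean-difference statistic, and the null_evaluations counter are all hypothetical and exist only for this illustration.

```python
from functools import cached_property
import random


class LazyPermutationTest:
    """Hypothetical example class; not part of samesame."""

    def __init__(self, first, second, n_permutations=200, seed=0):
        self.first = list(first)
        self.second = list(second)
        self.n_permutations = n_permutations
        self.seed = seed
        self.null_evaluations = 0  # counts how often the null is simulated

    @staticmethod
    def _mean_difference(first, second):
        return sum(second) / len(second) - sum(first) / len(first)

    @cached_property
    def statistic(self):
        return self._mean_difference(self.first, self.second)

    @cached_property
    def null(self):
        # The expensive step: runs once, then the result is reused.
        self.null_evaluations += 1
        rng = random.Random(self.seed)
        pooled = self.first + self.second
        n_first = len(self.first)
        draws = []
        for _ in range(self.n_permutations):
            rng.shuffle(pooled)
            draws.append(self._mean_difference(pooled[:n_first], pooled[n_first:]))
        return draws

    @cached_property
    def pvalue(self):
        extreme = sum(abs(d) >= abs(self.statistic) for d in self.null)
        return (1 + extreme) / (1 + self.n_permutations)


test = LazyPermutationTest([0, 1, 2, 3], [10, 11, 12, 13])
first_pvalue = test.pvalue   # triggers the permutation null once
second_pvalue = test.pvalue  # served from the cache
null_draws = test.null       # also cached; no new simulation
```

With this design, repeated reads of .pvalue or .null cost nothing after the first access, which matters when the null distribution comes from many permutations or refits.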

Examples

To get started, please see the examples in the docs.

Dependencies

samesame has minimal dependencies beyond the Python standard library, making it a lightweight addition to most machine learning projects. It is built on top of, and fully compatible with, scikit-learn and numpy.