CyrilJl/BatchStats

BatchStats

BatchStats computes statistics on data that arrives in batches, so you can stream or process large datasets without loading everything into memory. Feed batches with update_batch, then call the object to get the final result.
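The update-then-call pattern works because many statistics can be merged from per-batch summaries. The sketch below is a hypothetical `RunningVar` class, not part of BatchStats, and BatchStats' internals may differ. It assumes batches are 2-D NumPy arrays with samples along axis 0 and applies the pairwise mean/variance merge of Chan et al. Each batch is folded into a running count, mean and sum of squared deviations, so only one batch is held in memory at a time.

```python
import numpy as np


class RunningVar:
    """Hypothetical illustration (not BatchStats code): column-wise variance
    merged batch by batch with Chan et al.'s pairwise update."""

    def __init__(self):
        self.n = 0        # number of rows seen so far
        self.mean = None  # running column means
        self.m2 = None    # running sum of squared deviations from the mean

    def update_batch(self, batch):
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            return
        # Summaries of the new batch alone
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        if self.n == 0:
            self.n, self.mean, self.m2 = n_b, mean_b, m2_b
            return
        # Merge the running summaries with the batch summaries
        n = self.n + n_b
        delta = mean_b - self.mean
        self.mean = self.mean + delta * (n_b / n)
        self.m2 = self.m2 + m2_b + delta**2 * (self.n * n_b / n)
        self.n = n

    def __call__(self, ddof=0):
        # ddof=0 matches numpy.var's default
        return self.m2 / (self.n - ddof)


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 10))
rv = RunningVar()
for chunk in np.array_split(data, 7):
    rv.update_batch(chunk)
print(np.allclose(rv(), data.var(axis=0)))  # True
```

Merging centred sums of squares, instead of accumulating raw sums of x and x squared, avoids the cancellation error that the naive formula suffers when the mean is large relative to the spread.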

Installation

pip install batchstats

Or with conda/mamba:

conda install -c conda-forge batchstats

Quick Start

import numpy as np
from batchstats import BatchMean, BatchVar

data_stream = (np.random.randn(100, 10) for _ in range(10))

batch_mean = BatchMean()
batch_var = BatchVar()

for batch in data_stream:
    batch_mean.update_batch(batch)
    batch_var.update_batch(batch)

mean = batch_mean()
variance = batch_var()

print(f"Mean shape: {mean.shape}")
print(f"Variance shape: {variance.shape}")

Available Statistics

  • BatchSum / BatchNanSum
  • BatchWeightedSum
  • BatchMean / BatchNanMean
  • BatchWeightedMean
  • BatchMin / BatchNanMin
  • BatchMax / BatchNanMax
  • BatchPeakToPeak / BatchNanPeakToPeak
  • BatchVar
  • BatchStd
  • BatchCov
  • BatchCorr
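The NaN-aware variants in the list differ from the plain ones only in how each batch is reduced before merging. The following hypothetical `RunningNanStats` class is a sketch of that idea and is not how BatchStats is implemented. It assumes 2-D NumPy batches with samples along axis 0. It keeps a NaN-ignoring sum and a per-column count of non-NaN values, which together give the mean, and merges minima and maxima with `np.fmin`/`np.fmax`, which ignore a NaN when the other operand is a number. Peak-to-peak is then simply max minus min.

```python
import numpy as np


class RunningNanStats:
    """Hypothetical sketch (not BatchStats code): NaN-aware running sum,
    mean, min, max and peak-to-peak over column-wise batches."""

    def __init__(self):
        self.total = None  # column sums, NaN ignored
        self.count = None  # number of non-NaN entries per column
        self.min = None
        self.max = None

    def update_batch(self, batch):
        batch = np.asarray(batch, dtype=float)
        total = np.nansum(batch, axis=0)
        count = np.sum(~np.isnan(batch), axis=0)
        # fmin/fmax skip NaN unless every value in the column is NaN
        bmin = np.fmin.reduce(batch, axis=0)
        bmax = np.fmax.reduce(batch, axis=0)
        if self.total is None:
            self.total, self.count, self.min, self.max = total, count, bmin, bmax
        else:
            self.total = self.total + total
            self.count = self.count + count
            self.min = np.fmin(self.min, bmin)
            self.max = np.fmax(self.max, bmax)

    def mean(self):
        # Columns with no valid values yield NaN instead of a warning
        with np.errstate(invalid="ignore", divide="ignore"):
            return self.total / self.count

    def ptp(self):
        return self.max - self.min


rng = np.random.default_rng(1)
data = rng.normal(size=(500, 4))
data[rng.random(data.shape) < 0.1] = np.nan  # inject ~10% missing values
stats = RunningNanStats()
for chunk in np.array_split(data, 5):
    stats.update_batch(chunk)
print(np.allclose(stats.mean(), np.nanmean(data, axis=0)))  # True
```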

Docs: https://batchstats.readthedocs.io

About

Python package for efficient, online statistical computations on streaming or large-scale data
