krkawzq/biosparse

🧬 biosparse

Sparse matrices. Reimagined for biology.

Up to 1,000-10,000x faster than scipy on sparse nonlinear ops. 10-100x faster than scanpy on HVG selection and statistical tests.
Zero-cost slicing. Numba-native. Production-ready.


Why biosparse?

biosparse is built on three pillars:

1️⃣ Biology-First Sparse Matrices

A custom sparse matrix format designed for how biologists actually work:

  • Zero-cost slicing & stacking - Subset genes/cells without copying data
  • scipy/numpy compatible - from_scipy(), to_scipy(), works with your existing code
  • Memory efficient - Views instead of copies, reduced memory footprint

from biosparse import CSRF64
import scipy.sparse as sp

# Example input: any scipy CSR matrix with float64 values
scipy_mat = sp.random(5000, 2000, density=0.01, format="csr", dtype="float64")

# From scipy (zero-copy available)
csr = CSRF64.from_scipy(scipy_mat, copy=False)

# Zero-cost operations
subset = csr[1000:2000, :]             # No data copy
stacked = CSRF64.vstack([csr, csr])    # Efficient concatenation

# Back to scipy when needed
scipy_mat = csr.to_scipy()
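
To make "zero-cost slicing" concrete, here is a minimal NumPy-only sketch of why a contiguous row slice of a CSR matrix needs no data copy: the slice only has to remember a start and stop offset into the parent's `indptr`. This is not biosparse's implementation and says nothing about its internals; `CsrRowView` and `row_slice` are hypothetical names invented for this illustration.

```python
import numpy as np


class CsrRowView:
    """Hypothetical illustration: a contiguous row range of a CSR matrix.

    The view shares the parent's data/indices/indptr buffers and stores
    only the row offsets, so creating it copies no matrix values.
    """

    def __init__(self, data, indices, indptr, start, stop, ncols):
        self.data = data
        self.indices = indices
        self.indptr = indptr
        self.start = start
        self.stop = stop
        self.ncols = ncols

    @property
    def shape(self):
        return (self.stop - self.start, self.ncols)

    def row(self, i):
        # Row i of the view is row (start + i) of the parent.
        lo = self.indptr[self.start + i]
        hi = self.indptr[self.start + i + 1]
        # Basic NumPy slices are views, not copies.
        return self.data[lo:hi], self.indices[lo:hi]


def row_slice(data, indices, indptr, ncols, start, stop):
    """Return a view over rows [start, stop) without copying any buffer."""
    return CsrRowView(data, indices, indptr, start, stop, ncols)


# CSR encoding of the 3x4 matrix
# [[1, 0, 2, 0],
#  [0, 0, 0, 3],
#  [4, 5, 0, 0]]
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 3, 0, 1])
indptr = np.array([0, 2, 3, 5])

view = row_slice(data, indices, indptr, ncols=4, start=1, stop=3)
vals, cols = view.row(1)  # last row of the parent matrix
```

Column slicing and non-contiguous row selection need more bookkeeping than this sketch shows; how biosparse handles those cases is not covered here.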

2️⃣ High-Performance Kernels

Battle-tested algorithms built on our sparse matrix, compiled with Numba JIT:

| Algorithm | vs scipy | vs scanpy |
| --- | --- | --- |
| Sparse nonlinear ops | 1,000 - 10,000x | - |
| HVG selection | - | 10 - 100x |
| Mann-Whitney U | - | 10 - 100x |
| t-test | - | 10 - 100x |

Speedup scales with core count
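
As background on what a sparse nonlinear op can exploit: any function with f(0) = 0, such as log1p, only needs to touch the stored values, because implicit zeros map to zero. The NumPy-only sketch below checks this against a dense computation. It illustrates the general idea and is an assumption about the kind of operation meant here, not a description of biosparse's kernels; `log1p_stored` and `to_dense` are hypothetical helper names.

```python
import numpy as np


def log1p_stored(data):
    """Apply log1p to the stored CSR values only.

    Implicit zeros need no work because log1p(0) == 0, so the sparsity
    pattern (indices, indptr) is unchanged.
    """
    return np.log1p(data)


def to_dense(data, indices, indptr, ncols):
    """Expand CSR arrays to a dense 2-D array (for checking only)."""
    nrows = len(indptr) - 1
    out = np.zeros((nrows, ncols))
    for r in range(nrows):
        lo, hi = indptr[r], indptr[r + 1]
        out[r, indices[lo:hi]] = data[lo:hi]
    return out


# CSR encoding of a 3x4 matrix with 5 stored values
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 3, 0, 1])
indptr = np.array([0, 2, 3, 5])

sparse_result = to_dense(log1p_stored(data), indices, indptr, ncols=4)
dense_result = np.log1p(to_dense(data, indices, indptr, ncols=4))
```

The sparse path does work proportional to the number of stored values, while the dense path touches every entry.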

Supported:

  • HVG: Seurat, Seurat V3, Cell Ranger, Pearson residuals
  • Stats: Mann-Whitney U, Welch's t-test, Student's t-test, MMD
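
For orientation, Welch's t statistic for one gene can be computed from the stored non-zeros alone, since implicit zeros add nothing to the sum or the sum of squares and only count toward the group size. The sketch below is a NumPy-only illustration under that assumption, with rows as genes and columns as cells. It is not biosparse's kernel, and `group_moments` and `welch_t` are hypothetical names.

```python
import numpy as np


def group_moments(values, cols, in_group, n_group):
    """Mean and unbiased variance of one group for a single sparse row.

    values, cols: stored non-zeros of the row and their column indices.
    in_group: boolean mask over all columns (cells).
    n_group: number of cells in the group, implicit zeros included.
    """
    v = values[in_group[cols]]
    s = v.sum()
    ss = (v * v).sum()
    mean = s / n_group
    var = (ss - n_group * mean * mean) / (n_group - 1)
    return mean, var


def welch_t(values, cols, mask):
    """Welch's t statistic comparing cells in `mask` against the rest."""
    n1 = int(mask.sum())
    n2 = mask.size - n1
    m1, v1 = group_moments(values, cols, mask, n1)
    m2, v2 = group_moments(values, cols, ~mask, n2)
    return (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)


# One gene across 8 cells; dense form is [0, 3, 0, 5, 1, 0, 0, 2]
values = np.array([3.0, 5.0, 1.0, 2.0])
cols = np.array([1, 3, 4, 7])
mask = np.array([True, True, True, True, False, False, False, False])

t_sparse = welch_t(values, cols, mask)

# Dense reference for comparison
dense = np.zeros(8)
dense[cols] = values
a, b = dense[mask], dense[~mask]
t_dense = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
```

The sum-of-squares variance formula is compact but can lose precision when the mean is large relative to the spread; a production kernel may prefer a more stable formulation.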

3️⃣ Numba Optimization Toolkit

The secret sauce: tools that make Numba JIT outperform hand-written C++.

from numba import prange
from biosparse.optim import parallel_jit, assume, vectorize, likely

@parallel_jit
def my_kernel(csr):
    assume(csr.nrows > 0)  # Enable compiler optimizations

    # Rows are independent, so they can be processed in parallel
    for row in prange(csr.nrows):
        values, indices = csr.row_to_numpy(row)

        vectorize(8)  # SIMD hint
        for v in values:
            if likely(v > 0):  # Branch prediction
                pass  # per-element work goes here

Includes:

  • LLVM intrinsics: assume, likely, unlikely, prefetch
  • Loop hints: vectorize, unroll, interleave, distribute
  • Complete tutorial - 7 chapters from basics to expert

Quick Start

pip install biosparse

import scanpy as sc
from biosparse import CSRF64
from biosparse.kernel import hvg

# Load your data
adata = sc.read_h5ad("data.h5ad")

# Convert (zero-copy); adata.X is cells x genes, so .T gives genes x cells
csr = CSRF64.from_scipy(adata.X.T)

# HVG selection (10-100x faster than scanpy)
indices, mask, *_ = hvg.hvg_seurat_v3(csr, n_top_genes=2000)

# Use with scanpy
adata.var['highly_variable'] = mask.astype(bool)

Documentation

| Resource | Description |
| --- | --- |
| Tutorial | 7-chapter guide: from basics to outperforming C++ |
| Sparse API | CSR/CSC matrix reference |
| Kernels | HVG, MWU, t-test documentation |
| Optimization | LLVM intrinsics & loop hints |

License

MIT


Sparse. Fast. Biological.
