🧬 biosparse

Sparse matrices. Reimagined for biology.

1,000-10,000x faster than scipy on sparse nonlinear ops. 10-100x faster than scanpy on HVG selection and statistical tests.
Zero-cost slicing. Numba-native. Production-ready.

Why biosparse?

biosparse is built on three pillars:

1️⃣ Biology-First Sparse Matrices

A custom sparse matrix format designed for how biologists actually work:

Zero-cost slicing & stacking - Subset genes/cells without copying data
scipy/numpy compatible - from_scipy(), to_scipy(), works with your existing code
Memory efficient - Views instead of copies, reduced memory footprint

import scipy.sparse as sp
from biosparse import CSRF64

# Example input: any scipy CSR matrix with float64 values
scipy_mat = sp.random(5000, 200, density=0.01, format="csr", dtype="float64")

# From scipy (zero-copy available)
csr = CSRF64.from_scipy(scipy_mat, copy=False)

# Zero-cost operations
subset = csr[1000:2000, :]              # No data copy
stacked = CSRF64.vstack([csr, subset])  # Efficient concatenation

# Back to scipy when needed
scipy_mat = csr.to_scipy()
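
To illustrate why row slicing of a CSR matrix can avoid copying, here is a minimal standard-library sketch. It is not biosparse's implementation: the CSRView class and its row_slice and row methods are hypothetical names, and memoryview stands in for whatever view mechanism the library actually uses. The idea is that a row range needs only a window into the row-offset array, while the value and column-index buffers stay shared with the parent.

```python
from array import array


class CSRView:
    """Hypothetical minimal CSR container whose row slices share buffers."""

    def __init__(self, data, indices, indptr, ncols):
        self.data = data        # memoryview of float64 values
        self.indices = indices  # memoryview of int64 column indices
        self.indptr = indptr    # memoryview of int64 row offsets (nrows + 1 entries)
        self.ncols = ncols

    @property
    def nrows(self):
        return len(self.indptr) - 1

    def row_slice(self, start, stop):
        # Slicing a memoryview yields another view, so nothing is copied.
        # The offsets stay absolute, which is why data and indices are
        # passed through untouched.
        return CSRView(self.data, self.indices,
                       self.indptr[start:stop + 1], self.ncols)

    def row(self, i):
        lo, hi = self.indptr[i], self.indptr[i + 1]
        return self.data[lo:hi].tolist(), self.indices[lo:hi].tolist()


# A 3 x 4 matrix:
#   row 0: (col 0, 1.0), (col 2, 2.0)
#   row 1: (col 1, 3.0)
#   row 2: (col 0, 4.0), (col 3, 5.0)
full = CSRView(
    data=memoryview(array("d", [1.0, 2.0, 3.0, 4.0, 5.0])),
    indices=memoryview(array("q", [0, 2, 1, 0, 3])),
    indptr=memoryview(array("q", [0, 2, 3, 5])),
    ncols=4,
)

sub = full.row_slice(1, 3)  # rows 1 and 2, sharing every buffer with `full`
```

Because sub shares its buffers with full, a write through one is visible through the other. That is the usual trade-off of view-based slicing: cheap subsetting, but a copy is needed before mutating a subset independently.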

2️⃣ High-Performance Kernels

Battle-tested algorithms built on our sparse matrix, compiled with Numba JIT:

Algorithm	Speedup vs scipy	Speedup vs scanpy
Sparse nonlinear ops	1,000 - 10,000x	-
HVG selection	-	10 - 100x
Mann-Whitney U	-	10 - 100x
t-test	-	10 - 100x

Speedup scales with core count

Supported:

HVG: Seurat, Seurat V3, Cell Ranger, Pearson residuals
Stats: Mann-Whitney U, Welch's t-test, Student's t-test, MMD
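
As an illustration of how a statistic such as Welch's t-test can work directly on sparse data, the sketch below computes it from the non-zero values of each group plus the group sizes, treating all remaining entries as implicit zeros. This is a standard-library sketch under that assumption, not biosparse's kernel; sparse_moments and welch_t are hypothetical helper names.

```python
import math


def sparse_moments(nonzero_values, n_total):
    """Mean and sample variance of a group of n_total observations.

    Only the non-zero values are passed in; the other entries are
    implicit zeros and contribute nothing to the sums.
    Assumes n_total >= 2.
    """
    total = sum(nonzero_values)
    total_sq = sum(v * v for v in nonzero_values)
    mean = total / n_total
    var = (total_sq - n_total * mean * mean) / (n_total - 1)
    return mean, var


def welch_t(nonzero_a, n_a, nonzero_b, n_b):
    """Welch's t statistic for two groups with unequal variances.

    Assumes at least one group has non-zero variance.
    """
    mean_a, var_a = sparse_moments(nonzero_a, n_a)
    mean_b, var_b = sparse_moments(nonzero_b, n_b)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)


# Group A: 4 cells, expression [2, 4, 0, 0]; group B: 3 cells, expression [1, 0, 0]
t_stat = welch_t([2.0, 4.0], 4, [1.0], 3)
```

The cost per gene depends on the number of non-zero entries rather than the number of cells, which is one reason sparse-aware statistics can be much cheaper than densifying first.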

3️⃣ Numba Optimization Toolkit

The secret sauce: tools that make Numba JIT outperform hand-written C++.

from numba import prange
from biosparse.optim import parallel_jit, assume, vectorize, likely

@parallel_jit
def my_kernel(csr):
    assume(csr.nrows > 0)  # Enable compiler optimizations

    for row in prange(csr.nrows):  # Rows are processed in parallel
        values, indices = csr.row_to_numpy(row)

        vectorize(8)  # SIMD hint
        for v in values:
            if likely(v > 0):  # Branch prediction
                pass  # Per-element work goes here

Includes:

LLVM intrinsics: assume, likely, unlikely, prefetch
Loop hints: vectorize, unroll, interleave, distribute
Complete tutorial - 7 chapters from basics to expert

Quick Start

pip install biosparse

import scanpy as sc
from biosparse import CSRF64
from biosparse.kernel import hvg

# Load your data
adata = sc.read_h5ad("data.h5ad")

# Convert (zero-copy); adata.X is cells x genes, so transpose to genes x cells
csr = CSRF64.from_scipy(adata.X.T)

# HVG selection, 10-100x faster than scanpy
indices, mask, *_ = hvg.hvg_seurat_v3(csr, n_top_genes=2000)

# Use with scanpy
adata.var['highly_variable'] = mask.astype(bool)

Documentation

Resource	Description
Tutorial	7-chapter guide: from basics to outperforming C++
Sparse API	CSR/CSC matrix reference
Kernels	HVG, MWU, t-test documentation
Optimization	LLVM intrinsics & loop hints

License

MIT

Sparse. Fast. Biological.

krkawzq/biosparse
