
Flow matching primitives (ndarray-first) with semidiscrete FM + RFM experiments

License: Apache-2.0 and MIT (see LICENSE-APACHE and LICENSE-MIT).

arclabs561/flowmatch


flowmatch

Flow matching as a small, backend-agnostic Rust library primitive.

This crate exists to provide a minimal, readable reference implementation of a few flow-matching variants that are useful as building blocks in larger pipelines (and as a testbed for evaluation).

This crate currently focuses on a semidiscrete flow matching setup:

  • Discrete target support: a finite set of prototypes $y_j$ with weights $b_j$.
  • Semidiscrete conditioning / assignment: uses wass::semidiscrete potentials plus hard assignment to pick an index $j$.
  • Flow matching regression: trains a conditional vector field $v_\theta(x, t; y_j)$ against a simple linear-path target.
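A minimal sketch of how one training pair could be built, assuming squared-Euclidean cost, a Laguerre-cell hard assignment $j^*(x_0) = \arg\min_j \|x_0 - y_j\|^2 - \psi_j$ with semidiscrete potentials $\psi_j$, and the standard straight-line interpolant $x_t = (1-t)\,x_0 + t\,y_j$ whose regression target is $y_j - x_0$. The function names (`assign`, `linear_path`, `fm_loss`) and the sign convention on the potentials are hypothetical; this is not the `wass` or `flowmatch` API, and plain `Vec<f64>` stands in for ndarray.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical hard assignment: index of the Laguerre cell containing `x`,
/// i.e. argmin_j ||x - y_j||^2 - psi_j (sign convention is an assumption).
fn assign(x: &[f64], protos: &[Vec<f64>], psi: &[f64]) -> usize {
    let mut best = 0;
    let mut best_score = f64::INFINITY;
    for (j, (y, p)) in protos.iter().zip(psi).enumerate() {
        let score = sq_dist(x, y) - p;
        if score < best_score {
            best_score = score;
            best = j;
        }
    }
    best
}

/// Linear-path point x_t = (1 - t) x0 + t y and its constant target velocity y - x0.
fn linear_path(x0: &[f64], y: &[f64], t: f64) -> (Vec<f64>, Vec<f64>) {
    let xt: Vec<f64> = x0.iter().zip(y).map(|(a, b)| (1.0 - t) * a + t * b).collect();
    let target: Vec<f64> = x0.iter().zip(y).map(|(a, b)| b - a).collect();
    (xt, target)
}

/// Per-sample flow-matching regression loss ||v_pred - target||^2.
fn fm_loss(v_pred: &[f64], target: &[f64]) -> f64 {
    sq_dist(v_pred, target)
}

fn main() {
    let protos = vec![vec![0.0, 0.0], vec![4.0, 0.0]];
    let psi = vec![0.0, 0.0]; // zero potentials reduce to nearest-prototype assignment
    let x0 = vec![1.0, 1.0];
    let j = assign(&x0, &protos, &psi);
    let (xt, target) = linear_path(&x0, &protos[j], 0.5);
    // A zero prediction stands in for v_theta(x_t, t; y_j).
    let loss = fm_loss(&[0.0, 0.0], &target);
    println!("j = {j}, x_t = {xt:?}, target = {target:?}, loss = {loss}");
}
```

Raising $\psi_j$ enlarges cell $j$, which is how the potentials steer mass toward prototypes with larger weights $b_j$.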

Code entrypoints:

  • flowmatch::sd_fm::train_sd_fm_semidiscrete_linear
  • flowmatch::sd_fm::TrainedSdFm::{sample,sample_with_x0}
  • flowmatch::linear::LinearCondField (an intentionally boring baseline)

Related primitives implemented in this crate:

  • flowmatch::ode: fixed-step ODE samplers (Euler, Heun)
  • flowmatch::rfm: coupling helpers for rectified / OT-based pairing
  • flowmatch::simplex: simplex validation + Dirichlet sampling (simplex-based “discrete FM” scaffolding)
  • flowmatch::discrete_ctmc: CTMC generator validation + a minimal probability evolution step
  • flowmatch::non_euclidean: geodesic interpolant scaffolding (currently includes only Euclidean baseline)
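A minimal sketch of what fixed-step Euler and Heun samplers do when integrating a field $dx/dt = v(x, t)$ from $t = 0$ to $t = 1$, with the field passed as a closure. The names and signatures are illustrative assumptions, not the `flowmatch::ode` API; the comments note the number of field evaluations (NFE) each sampler spends.

```rust
/// Fixed-step Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
/// Spends exactly `steps` field evaluations (NFE = steps).
fn euler<F>(x0: &[f64], steps: usize, v: F) -> Vec<f64>
where
    F: Fn(&[f64], f64) -> Vec<f64>,
{
    let dt = 1.0 / steps as f64;
    let mut x = x0.to_vec();
    for k in 0..steps {
        let t = k as f64 * dt;
        let vel = v(&x, t);
        for (xi, vi) in x.iter_mut().zip(&vel) {
            *xi += dt * vi;
        }
    }
    x
}

/// Fixed-step Heun (explicit trapezoidal) integration; NFE = 2 * steps.
fn heun<F>(x0: &[f64], steps: usize, v: F) -> Vec<f64>
where
    F: Fn(&[f64], f64) -> Vec<f64>,
{
    let dt = 1.0 / steps as f64;
    let mut x = x0.to_vec();
    for k in 0..steps {
        let t = k as f64 * dt;
        let v1 = v(&x, t);
        // Euler predictor.
        let pred: Vec<f64> = x.iter().zip(&v1).map(|(xi, vi)| xi + dt * vi).collect();
        let v2 = v(&pred, t + dt);
        // Trapezoidal corrector: average the slopes at both ends of the step.
        for ((xi, a), b) in x.iter_mut().zip(&v1).zip(&v2) {
            *xi += 0.5 * dt * (a + b);
        }
    }
    x
}

fn main() {
    // Toy field v(x, t) = t; starting from x0 = 0 the exact value at t = 1 is 0.5.
    let field = |x: &[f64], t: f64| vec![t; x.len()];
    println!("euler, 4 steps: {:?}", euler(&[0.0], 4, field));
    println!("heun, 2 steps:  {:?}", heun(&[0.0], 2, field));
}
```

Both runs spend 4 field evaluations; on this toy field Heun is exact while Euler undershoots, which is the kind of tradeoff the NFE examples measure on learned fields.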

Related (adjacent meaning of “distribution matching”):

  • decipher/: symbolic distribution matching for deciphering classical text ciphers (letter-frequency scoring, etc.). See canon/topics/distribution-matching.md.

Best starting points (examples)

  • Semidiscrete FM baseline: sd_fm_semidiscrete_linear (end-to-end, intentionally simple).
  • RFM on real geodata: rfm_usgs_full_pipeline_report (flowmatch + tier + jin; includes metrics + timings).
  • Cluster-mass evaluation: rfm_usgs_earthquakes_cluster_mass (structure-aware scoring; tier-evals feature).

References (why this crate is called flowmatch)

These are the conceptual anchors for the objective + design space:

  • Lipman et al., Flow Matching for Generative Modeling (arXiv:2210.02747).
  • Lipman et al., Flow Matching Guide and Code (arXiv:2412.06264).

Also useful as an applications-oriented map (especially for discrete / non-Euclidean variants):

  • Li et al., Flow Matching Meets Biology and Life Science: A Survey (arXiv:2507.17731, 2025).
    Curated resources: Awesome list

Extensions (related objectives)

  • Chen & Lipman, Flow Matching on General Geometries (arXiv:2302.03660): Riemannian FM.
  • Dao et al., Flow Matching in Latent Space (arXiv:2307.08698): latent FM and guidance.
  • Gat et al., Discrete Flow Matching (NeurIPS 2024): discrete state spaces.
  • Klein et al., Equivariant Flow Matching (NeurIPS 2023): symmetry/equivariance constraints.

Running the demos

cargo run -p flowmatch --example sd_fm_semidiscrete_linear

RFM minibatch OT demo:

cargo run -p flowmatch --example rfm_minibatch_ot_linear

RFM demo on token embeddings + TF-IDF-ish weights:

cargo run -p flowmatch --example rfm_textish_tokens

RFM demo on real USGS earthquake locations (sphere-ish geodata):

cargo run -p flowmatch --example rfm_usgs_earthquakes_sphere

RFM demo on real USGS earthquake locations, evaluated via cluster-mass structure (uses tier):

cargo run -p flowmatch --example rfm_usgs_earthquakes_cluster_mass --features tier-evals

Full engine composition demo (flowmatch + tier + jin): kNN graph → Leiden communities:

cargo run -p flowmatch --example rfm_usgs_knn_leiden --features tier-evals

Full pipeline report (all metrics + timings, including deterministic exact-kNN Leiden and optional HNSW-kNN):

cargo run -p flowmatch --example rfm_usgs_full_pipeline_report --features tier-evals

NFE/steps curve (NFE = number of vector-field evaluations; paper-style “few-step” evaluation):

cargo run -p flowmatch --example rfm_usgs_nfe_curve

Solver NFE tradeoff (Euler vs Heun under equal evaluation budgets):

cargo run -p flowmatch --example rfm_usgs_solver_nfe_tradeoff
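Assuming a standard Heun step with two field evaluations per step (some implementations reuse or skip an evaluation, so the example's own accounting is authoritative), an equal-NFE comparison on $t \in [0, 1]$ with $K$ steps of size $h = 1/K$ works out as:

```latex
\begin{aligned}
&\text{Euler:} && x_{k+1} = x_k + h\,v_\theta(x_k, t_k), && \text{NFE} = K,\\
&\text{Heun:}  && \tilde{x}_{k+1} = x_k + h\,v_\theta(x_k, t_k),\\
&              && x_{k+1} = x_k + \tfrac{h}{2}\big(v_\theta(x_k, t_k) + v_\theta(\tilde{x}_{k+1}, t_{k+1})\big), && \text{NFE} = 2K,\\
&\text{budget } N: && K_{\text{Euler}} = N,\quad K_{\text{Heun}} = N/2,\quad h = 1/K.
\end{aligned}
```

For example, a budget of N = 8 buys 8 Euler steps (h = 1/8) or 4 Heun steps (h = 1/4); Heun's second-order global error $O(h^2)$, versus Euler's $O(h)$, has to compensate for the coarser step.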

Protein torsions NFE/steps curve (seed-averaged, scored by Ramachandran JS divergence):

cargo run -p flowmatch --example rfm_torsions_nfe_curve

Minibatch OT outlier forcing + partial pairing mitigation:

cargo run -p flowmatch --example rfm_minibatch_outlier_partial

Controls:

  • FLOWMATCH_PAIRING=partial_rowwise selects RfmMinibatchPairing::PartialRowwise
  • FLOWMATCH_PAIRING=sinkhorn_selective uses Sinkhorn, then selective matching
  • FLOWMATCH_PAIRING_PARTIAL_KEEP_FRAC=0.8 sets the fraction of rows that are forced one-to-one
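A minimal sketch of the two pairing ideas these controls trade between: row-wise nearest pairing (each source row takes its closest target, so two rows can collide) and Sinkhorn entropic OT between uniform minibatches, read out with a row-wise argmax. The function names, uniform marginals, squared-Euclidean cost, and `eps`/iteration values are assumptions for illustration; this does not reproduce `RfmMinibatchPairing` or the crate's selective and partial logic.

```rust
/// Entropic OT plan between two uniform minibatches via Sinkhorn iterations.
/// `cost[i][j]` is the pairwise cost; `eps` is the entropic regularization.
fn sinkhorn(cost: &[Vec<f64>], eps: f64, iters: usize) -> Vec<Vec<f64>> {
    let n = cost.len();
    let m = cost[0].len();
    let (a, b) = (1.0 / n as f64, 1.0 / m as f64); // uniform marginals
    let k: Vec<Vec<f64>> = cost
        .iter()
        .map(|row| row.iter().map(|c| (-c / eps).exp()).collect())
        .collect();
    let (mut u, mut v) = (vec![1.0; n], vec![1.0; m]);
    for _ in 0..iters {
        // Alternately rescale rows and columns to match the marginals.
        for i in 0..n {
            let s: f64 = (0..m).map(|j| k[i][j] * v[j]).sum();
            u[i] = a / s;
        }
        for j in 0..m {
            let s: f64 = (0..n).map(|i| k[i][j] * u[i]).sum();
            v[j] = b / s;
        }
    }
    (0..n).map(|i| (0..m).map(|j| u[i] * k[i][j] * v[j]).collect()).collect()
}

/// Row-wise readout: each row picks the column with the largest score.
/// Not one-to-one: several rows may pick the same column.
fn rowwise_argmax(scores: &[Vec<f64>]) -> Vec<usize> {
    scores
        .iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .fold((0, f64::NEG_INFINITY), |best, (j, &s)| if s > best.1 { (j, s) } else { best })
                .0
        })
        .collect()
}

fn main() {
    // 1D toy minibatches with squared-Euclidean cost.
    let x0 = [0.0_f64, 1.0];
    let x1 = [0.5_f64, 10.0];
    let cost: Vec<Vec<f64>> = x0.iter().map(|a| x1.iter().map(|b| (a - b) * (a - b)).collect()).collect();
    // Nearest pairing is the argmax of the negated cost; both rows pick target 0 here.
    let neg: Vec<Vec<f64>> = cost.iter().map(|r| r.iter().map(|c| -c).collect()).collect();
    println!("rowwise nearest: {:?}", rowwise_argmax(&neg));
    // Sinkhorn honors the uniform marginals, so here the two rows get different targets.
    let plan = sinkhorn(&cost, 1.0, 100);
    println!("sinkhorn argmax:  {:?}", rowwise_argmax(&plan));
}
```

The toy shows the tension behind the outlier example: enforcing marginals pushes the second row onto the far target at x = 10, which is exactly the forced pairing that partial schemes try to relax.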

Speed knobs for the full pipeline report:

# Default (highest quality): Sinkhorn pairing at every step; no environment variables needed.

# Faster: reuse Sinkhorn pairing for 4 SGD steps (usually ~4× faster coupling).
FLOWMATCH_PAIRING_EVERY=4 cargo run -p flowmatch --example rfm_usgs_full_pipeline_report

# Fastest: no Sinkhorn at all (row-wise nearest pairing).
FLOWMATCH_PAIRING=rowwise cargo run -p flowmatch --example rfm_usgs_full_pipeline_report

# U-shaped timestep sampling (more weight near t=0 and t=1)
FLOWMATCH_T_SCHEDULE=ushaped cargo run -p flowmatch --example rfm_usgs_full_pipeline_report
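One common U-shaped choice is the arcsine law Beta(1/2, 1/2), sampled by inverse CDF. This is a minimal sketch under that assumption; the distribution actually behind `FLOWMATCH_T_SCHEDULE=ushaped` may differ, and `ushaped_t` is a hypothetical name.

```rust
use std::f64::consts::FRAC_PI_2;

/// Map a uniform draw u in [0, 1] to t ~ Beta(1/2, 1/2) (the arcsine law),
/// whose density 1 / (pi * sqrt(t (1 - t))) piles up near t = 0 and t = 1.
/// Inverse-CDF method: F(t) = (2 / pi) * asin(sqrt(t)), so t = sin^2(pi u / 2).
fn ushaped_t(u: f64) -> f64 {
    let s = (FRAC_PI_2 * u).sin();
    s * s
}

fn main() {
    // Evenly spaced stand-ins for uniform draws, to show where the schedule concentrates.
    let n = 10;
    let ts: Vec<f64> = (0..n).map(|i| ushaped_t((i as f64 + 0.5) / n as f64)).collect();
    let near_ends = ts.iter().filter(|&&t| t < 0.1 || t > 0.9).count();
    println!("t samples: {ts:?}");
    println!("{near_ends} of {n} samples fall in [0, 0.1) or (0.9, 1]");
}
```

Ten evenly spaced inputs put 4 samples in the outer 20% of [0, 1], twice what uniform sampling would give.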

RFM demo on real protein φ/ψ torsions (a torus-shaped domain, scored via Ramachandran JS divergence):

cargo run -p flowmatch --example rfm_protein_torsions_1bpi
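A minimal sketch of this kind of score: bin (φ, ψ) pairs on a periodic grid over [-π, π)², so angles that differ by 2π share a bin, then compare the normalized histograms with the Jensen-Shannon divergence. The bin count, the natural-log convention, and the function names are assumptions, not the example's actual settings.

```rust
use std::f64::consts::PI;

/// Normalized 2D histogram of (phi, psi) angles (radians) on a `bins x bins` torus grid.
/// Angles are wrapped first, so 3.2 and 3.2 - 2*pi land in the same bin.
fn ramachandran_hist(angles: &[(f64, f64)], bins: usize) -> Vec<f64> {
    let mut h = vec![0.0; bins * bins];
    let bin = |a: f64| {
        let wrapped = (a + PI).rem_euclid(2.0 * PI); // in [0, 2*pi), up to rounding
        ((wrapped / (2.0 * PI) * bins as f64) as usize).min(bins - 1)
    };
    for &(phi, psi) in angles {
        h[bin(phi) * bins + bin(psi)] += 1.0;
    }
    let total: f64 = h.iter().sum();
    if total > 0.0 {
        for x in &mut h {
            *x /= total;
        }
    }
    h
}

/// Jensen-Shannon divergence with natural logs (so bounded above by ln 2).
fn js_divergence(p: &[f64], q: &[f64]) -> f64 {
    let mut js = 0.0;
    for (&a, &b) in p.iter().zip(q) {
        let m = 0.5 * (a + b);
        if a > 0.0 {
            js += 0.5 * a * (a / m).ln();
        }
        if b > 0.0 {
            js += 0.5 * b * (b / m).ln();
        }
    }
    js
}

fn main() {
    // Tiny toy sets of (phi, psi) angles in radians.
    let real = vec![(-1.1, -0.8), (-1.2, -0.7), (-2.2, 2.4)];
    let generated = vec![(-1.0, -0.75), (-2.1, 2.5), (1.0, 0.8)];
    let (p, q) = (ramachandran_hist(&real, 36), ramachandran_hist(&generated, 36));
    println!("Ramachandran JS = {:.4}", js_divergence(&p, &q));
}
```

Wrapping before binning is what makes this a torus metric: a flat histogram would split a cluster sitting at ±π into two distant bins.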

Timing breakdown (“poor man's profiling”) showing where the time goes (sampling vs Sinkhorn vs SGD):

cargo run -p flowmatch --example profile_breakdown_usgs
cargo run -p flowmatch --example profile_breakdown_torsions
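A minimal sketch of phase-level wall-clock timing with `std::time::Instant`, the kind of breakdown these examples print. The `PhaseTimer` type and phase names are hypothetical, and the closures are stand-ins rather than real sampling, Sinkhorn, or SGD work.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Accumulates wall-clock time per named phase.
#[derive(Default)]
struct PhaseTimer {
    totals: BTreeMap<&'static str, Duration>,
}

impl PhaseTimer {
    /// Run `f`, add its elapsed time to `phase`, and pass its result through.
    fn time<T>(&mut self, phase: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        *self.totals.entry(phase).or_default() += start.elapsed();
        out
    }

    /// Print each phase's total time and its share of the overall time.
    fn report(&self) {
        let total: Duration = self.totals.values().sum();
        for (phase, d) in &self.totals {
            let pct = 100.0 * d.as_secs_f64() / total.as_secs_f64().max(1e-12);
            println!("{phase:>10}: {:>8.3} ms ({pct:5.1}%)", d.as_secs_f64() * 1e3);
        }
    }
}

fn main() {
    let mut timer = PhaseTimer::default();
    for _ in 0..3 {
        // Stand-ins for the sampling / pairing / SGD phases of one training step.
        let xs: Vec<f64> = timer.time("sampling", || (0..10_000).map(|i| (i as f64).sin()).collect());
        let s: f64 = timer.time("sinkhorn", || xs.iter().map(|x| x * x).sum());
        timer.time("sgd", || std::hint::black_box(s.sqrt()));
    }
    timer.report();
}
```

Accumulating per phase across iterations, rather than timing single calls, keeps timer overhead small relative to the work being measured.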

Running tests

cargo test -p flowmatch

Burn backend (opt-in)

flowmatch is ndarray-only by default, but it now includes an opt-in Burn-backed Euclidean FM module behind the burn feature (see flowmatch::burn_euclidean / flowmatch::burn_sd_fm).

cargo test -p flowmatch --features burn

Run the Burn-backed toy examples:

cargo run -p flowmatch --example burn_sd_fm_semidiscrete_linear --features burn
cargo run -p flowmatch --example burn_rfm_minibatch_ot_linear --features burn
