
Add PyTorch benchmark #23

PyTorch-Focused Benchmarking for Image-Based Profiling File Format

Context

We already maintain baseline benchmarks for the file format itself
(e.g., storage size, raw read/write throughput, sequential I/O).
This effort should focus only on PyTorch-facing performance:
how the format behaves when accessed via torch.utils.data.Dataset
and DataLoader, and how that affects real training or inference workloads.

The intent is to benchmark what users actually experience when using
the format in PyTorch-based image profiling pipelines.


Goals

Design and implement a benchmark suite that answers:

  1. How fast and stable is Dataset.__getitem__ under realistic access patterns?
  2. How does performance scale with common PyTorch DataLoader settings?
  3. Does the format reduce data-loading stalls in end-to-end model workflows?

Non-goals

  • Repeating generic file-format benchmarks (on-disk size, raw I/O MB/s)
  • Evaluating or optimizing model accuracy
  • GPU kernel or model architecture benchmarking

Benchmark Scope

Track 1 — Dataset / __getitem__ Microbenchmark

Evaluate the behavior of the Dataset implementation itself; a runnable sketch follows at the end of this track.

Access patterns

  • Random object-level access (e.g., random object IDs)
  • Grouped access (e.g., all objects from a site, well, or contiguous range)
  • Optional: paired reads per sample (two views, contrastive-style workflows)

Metrics

  • __getitem__ latency: p50 / p95 / p99
  • Samples per second in a tight loop
  • Warm-up vs steady-state behavior

Configuration dimensions

  • Crop size and shape
  • Channel selection
  • Output dtype
  • Decode path
  • Transform tier:
    • none (I/O ceiling)
    • light (normalize, resize)
    • typical (domain-relevant light augmentation)

Output

  • Structured, machine-readable results (JSON or CSV)
  • One row per run, including configuration and environment metadata
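
A minimal sketch of what this microbenchmark could look like. `SyntheticCropDataset` is a hypothetical stand-in for the format's real Dataset class so the timing harness is runnable on its own; indices, crop sizes, and warm-up counts are illustrative only.

```python
import json
import time

import numpy as np
import torch
from torch.utils.data import Dataset


class SyntheticCropDataset(Dataset):
    """Stand-in for the format's Dataset; returns random single-object crops."""

    def __init__(self, n_objects: int = 5000, crop: int = 128, channels: int = 5):
        self.n_objects, self.crop, self.channels = n_objects, crop, channels

    def __len__(self) -> int:
        return self.n_objects

    def __getitem__(self, idx: int) -> torch.Tensor:
        # The real implementation would decode this object's crop from the file format.
        return torch.rand(self.channels, self.crop, self.crop)


def bench_getitem(dataset: Dataset, indices, warmup: int = 50) -> dict:
    """Time __getitem__ over the given indices; report p50/p95/p99 latency."""
    for idx in indices[:warmup]:  # warm-up: caches, lazy initialization
        _ = dataset[idx]

    latencies = []
    start = time.perf_counter()
    for idx in indices[warmup:]:
        t0 = time.perf_counter()
        _ = dataset[idx]
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    lat = np.array(latencies)
    return {
        "n_samples": len(lat),
        "samples_per_sec": len(lat) / elapsed,
        "latency_p50_ms": float(np.percentile(lat, 50) * 1e3),
        "latency_p95_ms": float(np.percentile(lat, 95) * 1e3),
        "latency_p99_ms": float(np.percentile(lat, 99) * 1e3),
    }


if __name__ == "__main__":
    dataset = SyntheticCropDataset()
    # Random object-level access; grouped access would iterate site/well ranges instead.
    g = torch.Generator().manual_seed(0)
    indices = torch.randperm(len(dataset), generator=g)[:2000].tolist()
    print(json.dumps(bench_getitem(dataset, indices), indent=2))
```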

Track 2 — PyTorch DataLoader Throughput

Measure performance at the DataLoader output boundary; see the sketch at the end of this track.

Parameters to explore

  • num_workers
  • batch_size
  • pin_memory
  • persistent_workers
  • prefetch_factor (when applicable)

Metrics

  • Samples per second
  • Batch time distribution (p50 / p95)
  • First-batch latency (worker startup overhead)

Modes

  • I/O-only transforms
  • Typical profiling transforms
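
A sketch of the DataLoader sweep, again using a synthetic `TensorDataset` stand-in rather than the real format-backed Dataset; the parameter grid and batch counts are illustrative, not a fixed matrix.

```python
import time

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def bench_dataloader(dataset, batch_size: int, num_workers: int,
                     pin_memory: bool = False, persistent_workers: bool = False,
                     max_batches: int = 200) -> dict:
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        persistent_workers=persistent_workers and num_workers > 0,
        shuffle=True,
    )
    it = iter(loader)

    # First-batch latency captures worker startup overhead.
    t0 = time.perf_counter()
    next(it)
    first_batch_s = time.perf_counter() - t0

    batch_times = []
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        batch_times.append(time.perf_counter() - t0)

    bt = np.array(batch_times)
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "first_batch_s": round(first_batch_s, 4),
        "samples_per_sec": len(bt) * batch_size / bt.sum(),
        "batch_time_p50_ms": float(np.percentile(bt, 50) * 1e3),
        "batch_time_p95_ms": float(np.percentile(bt, 95) * 1e3),
    }


if __name__ == "__main__":
    # Synthetic stand-in: 4096 five-channel 128x128 crops with dummy labels.
    data = TensorDataset(torch.rand(4096, 5, 128, 128), torch.zeros(4096))
    for num_workers in (0, 2, 4):
        for batch_size in (16, 64):
            print(bench_dataloader(data, batch_size, num_workers))
```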

Track 3 — End-to-End Fixed-Step Model Loop

Evaluate the format in a realistic PyTorch workload; a sketch of the fixed-step loop follows this track's metric list.

Workloads

  • Embedding extraction (forward pass only), or
  • Small, standard training loop using a simple reference model

Controls

  • Fixed number of steps (not epochs)
  • Fixed random seed
  • Fixed input shape and model
  • Optional AMP (must be consistent across runs)

Metrics

  • Step time (p50 / p95)
  • Images per second
  • Fraction of time waiting on data (input stall proxy)
  • Optional: GPU utilization (best-effort)
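
A sketch of the fixed-step loop, assuming a tiny reference model and synthetic data as stand-ins (AMP omitted for brevity); the input-stall proxy is measured as the fraction of wall time spent waiting on `next(it)`.

```python
import time

import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def bench_fixed_steps(model: nn.Module, loader: DataLoader,
                      num_steps: int = 100, device: str = "cpu") -> dict:
    """Forward-only fixed-step loop reporting step time and an input-stall proxy."""
    model = model.to(device).eval()
    step_times, data_wait = [], 0.0
    it = iter(loader)

    total_start = time.perf_counter()
    with torch.no_grad():
        for _ in range(num_steps):
            t0 = time.perf_counter()
            images, _ = next(it)          # time spent waiting on data
            data_wait += time.perf_counter() - t0

            images = images.to(device, non_blocking=True)
            _ = model(images)             # embedding extraction: forward pass only
            if device == "cuda":
                torch.cuda.synchronize()
            step_times.append(time.perf_counter() - t0)
    total = time.perf_counter() - total_start

    st = np.array(step_times)
    return {
        "step_time_p50_ms": float(np.percentile(st, 50) * 1e3),
        "step_time_p95_ms": float(np.percentile(st, 95) * 1e3),
        "images_per_sec": num_steps * loader.batch_size / total,
        "data_wait_fraction": data_wait / total,
    }


if __name__ == "__main__":
    torch.manual_seed(0)  # fixed seed control
    # Synthetic stand-in data, fixed input shape, and a tiny reference model.
    data = TensorDataset(torch.rand(4096, 5, 128, 128), torch.zeros(4096))
    loader = DataLoader(data, batch_size=32, num_workers=2, shuffle=True)
    model = nn.Sequential(nn.Conv2d(5, 16, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(bench_fixed_steps(model, loader, num_steps=100, device=device))
```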

Dataset & Sampling Expectations

  • Access patterns should reflect image-based profiling use cases:
    • object-level crops
    • grouped site/well reads
  • Sampling should be deterministic when seeded
  • Support both:
    • small subsets for CI or smoke testing
    • full datasets for real benchmarking
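
A sketch of seeded, deterministic index generation for the two access patterns; `groups` is a hypothetical well/site-to-object-index mapping standing in for whatever grouping metadata the format exposes.

```python
import torch


def random_object_indices(n_objects: int, n_samples: int, seed: int) -> list:
    """Seeded random object-level access pattern."""
    g = torch.Generator().manual_seed(seed)
    return torch.randperm(n_objects, generator=g)[:n_samples].tolist()


def grouped_indices(groups: dict, seed: int) -> list:
    """Seeded grouped access: shuffle group order, read each group contiguously."""
    g = torch.Generator().manual_seed(seed)
    keys = list(groups)
    order = torch.randperm(len(keys), generator=g).tolist()
    return [idx for i in order for idx in groups[keys[i]]]


# Example: objects grouped by well; a CI subset could simply truncate the result.
wells = {"A01": [0, 1, 2], "A02": [3, 4], "B01": [5, 6, 7, 8]}
assert grouped_indices(wells, seed=0) == grouped_indices(wells, seed=0)  # deterministic
```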

Reproducibility & Reporting

Each benchmark run should capture:

  • Configuration parameters used
  • Random seed
  • Timestamp
  • Software versions (PyTorch, CUDA, Python)
  • Hardware summary (CPU, RAM, GPU if applicable)
  • Storage type if detectable

Results should be easy to aggregate and plot across runs.
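
A sketch of per-run metadata capture along these lines; field names are illustrative rather than a fixed schema.

```python
import json
import platform
import sys
from datetime import datetime, timezone

import torch


def run_metadata(config: dict, seed: int) -> dict:
    """Environment and configuration metadata recorded alongside each benchmark row."""
    cuda_available = torch.cuda.is_available()
    return {
        "config": config,
        "seed": seed,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "pytorch": torch.__version__,
        "cuda": torch.version.cuda if cuda_available else None,
        "cpu": platform.processor() or platform.machine(),
        "gpu": torch.cuda.get_device_name(0) if cuda_available else None,
    }


if __name__ == "__main__":
    # Each benchmark result row merges its metrics with this metadata.
    metrics = {"samples_per_sec": 1234.5}  # placeholder metrics from a run
    row = {**run_metadata({"batch_size": 64, "num_workers": 4}, seed=0), **metrics}
    print(json.dumps(row, indent=2))
```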


Acceptance Criteria

  • Benchmarks produce structured, machine-readable output
  • Results clearly separate:
    • Dataset-level costs
    • DataLoader-level scaling effects
    • End-to-end training/inference behavior
  • Documentation explains:
    • how to run benchmarks
    • how to interpret reported metrics
  • A minimal configuration exists for quick execution in CI or local testing
  • A summary plot is generated, comparable to the plots already produced for the existing format benchmarks
  • Benchmark code is organized and placed in the repo consistently with the existing benchmark files

Notes & Risks

  • Transform cost must be clearly separated from I/O cost
  • Warm-up effects should be measured and reported
  • Random-access performance may be affected by OS caching;
    first-pass vs steady-state behavior should be distinguished where possible
