PyTorch-Focused Benchmarking for Image-Based Profiling File Format
Context
We already maintain baseline benchmarks for the file format itself
(e.g., storage size, raw read/write throughput, sequential I/O).
This effort should focus only on PyTorch-facing performance:
how the format behaves when accessed via torch.utils.data.Dataset
and DataLoader, and how that affects real training or inference workloads.
The intent is to benchmark what users actually experience when using
the format in PyTorch-based image profiling pipelines.
Goals
Design and implement a benchmark suite that answers:
- How fast and stable is Dataset.__getitem__ under realistic access patterns?
- How does performance scale with common PyTorch DataLoader settings?
- Does the format reduce data-loading stalls in end-to-end model workflows?
Non-goals
- Repeating generic file-format benchmarks (on-disk size, raw I/O MB/s)
- Evaluating or optimizing model accuracy
- GPU kernel or model architecture benchmarking
Benchmark Scope
Track 1 — Dataset / __getitem__ Microbenchmark
Evaluate the behavior of the Dataset implementation itself.
Access patterns
- Random object-level access (e.g., random object IDs)
- Grouped access (e.g., all objects from a site, well, or contiguous range)
- Optional: paired reads per sample (two views, contrastive-style workflows)
Metrics
- __getitem__ latency: p50 / p95 / p99
- Samples per second in a tight loop
- Warm-up vs steady-state behavior (see the timing sketch below)
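A minimal timing harness along these lines could separate warm-up from steady state. This is a sketch, not a final API; dataset and indices stand in for the real Dataset implementation and an access-pattern-specific index list:

```python
import time
import numpy as np

def bench_getitem(dataset, indices, warmup=50):
    """Time Dataset.__getitem__ over a fixed index sequence.

    dataset and indices are placeholders for the real Dataset
    implementation and an access-pattern-specific index list.
    """
    # Warm-up pass: absorbs one-time costs (file handles, caches, lazy init).
    for i in indices[:warmup]:
        dataset[i]

    latencies = []
    t_loop = time.perf_counter()
    for i in indices[warmup:]:
        t0 = time.perf_counter()
        dataset[i]
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - t_loop

    lat = np.asarray(latencies)
    return {
        "p50_ms": float(np.percentile(lat, 50) * 1e3),
        "p95_ms": float(np.percentile(lat, 95) * 1e3),
        "p99_ms": float(np.percentile(lat, 99) * 1e3),
        "samples_per_s": len(lat) / elapsed,
    }
```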
Configuration dimensions
- Crop size and shape
- Channel selection
- Output dtype
- Decode path
- Transform tier:
- none (I/O ceiling)
- light (normalize, resize)
- typical (domain-relevant light augmentation)
Output
- Structured, machine-readable results (JSON or CSV)
- One row per run, including configuration and environment metadata
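For example, each run could be appended as a single JSON line, which keeps results trivial to aggregate later. write_result_row is a hypothetical helper; the dict contents are up to the implementation:

```python
import json

def write_result_row(path, config, metrics, env):
    """Append one benchmark run as one JSON line: configuration,
    measured metrics, and environment metadata as flat dicts."""
    with open(path, "a") as f:
        f.write(json.dumps({**config, **metrics, **env}) + "\n")
```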
Track 2 — PyTorch DataLoader Throughput
Measure performance at the DataLoader output boundary.
Parameters to explore
- num_workers
- batch_size
- pin_memory
- persistent_workers
- prefetch_factor (when applicable)
Metrics
- Samples per second
- Batch time distribution (p50 / p95)
- First-batch latency (worker startup overhead)
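A possible harness for one DataLoader configuration, plus an illustrative sweep; dataset again stands in for the real Dataset, and the parameter values are examples, not recommendations:

```python
import itertools
import time
import numpy as np
from torch.utils.data import DataLoader

def bench_loader(dataset, n_batches=100, **loader_kwargs):
    """Measure throughput and batch-time distribution at the DataLoader
    output boundary for a single configuration."""
    loader = DataLoader(dataset, **loader_kwargs)
    it = iter(loader)

    # First-batch latency captures worker startup overhead.
    t0 = time.perf_counter()
    next(it)
    first_batch_s = time.perf_counter() - t0

    batch_times, n_samples = [], 0
    t_prev = time.perf_counter()
    for batch in itertools.islice(it, n_batches):
        batch_times.append(time.perf_counter() - t_prev)
        n_samples += len(batch)  # assumes sized batches; adapt to the real sample type
        t_prev = time.perf_counter()

    bt = np.asarray(batch_times)
    return {
        "first_batch_s": first_batch_s,
        "samples_per_s": n_samples / bt.sum(),
        "batch_p50_ms": float(np.percentile(bt, 50) * 1e3),
        "batch_p95_ms": float(np.percentile(bt, 95) * 1e3),
    }

# Illustrative sweep; `dataset` is a placeholder for the real Dataset.
for num_workers in (0, 2, 4, 8):
    stats = bench_loader(
        dataset,
        batch_size=64,
        num_workers=num_workers,
        pin_memory=True,
        persistent_workers=num_workers > 0,                     # requires workers
        **({"prefetch_factor": 4} if num_workers > 0 else {}),  # likewise
    )
```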
Modes
- I/O-only transforms
- Typical profiling transforms
Track 3 — End-to-End Fixed-Step Model Loop
Evaluate the format in a realistic PyTorch workload.
Workloads
- Embedding extraction (forward pass only), or
- Small, standard training loop using a simple reference model
Controls
- Fixed number of steps (not epochs)
- Fixed random seed
- Fixed input shape and model
- Optional AMP (must be consistent across runs)
Metrics
- Step time (p50 / p95)
- Images per second
- Fraction of time waiting on data (input stall proxy)
- Optional: GPU utilization (best-effort)
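One shape a forward-only fixed-step loop could take, with time spent inside next() serving as the input-stall proxy; model, loader, and the batch layout are all assumptions about the final harness:

```python
import time
import numpy as np
import torch

def bench_fixed_steps(model, loader, n_steps=200, device="cuda"):
    """Forward-pass-only loop over a fixed number of steps; reports step-time
    percentiles and the fraction of wall time spent waiting on data."""
    model = model.to(device).eval()
    it = iter(loader)  # loader must yield at least n_steps batches
    step_times, data_times, n_images = [], [], 0
    with torch.inference_mode():
        for _ in range(n_steps):
            t0 = time.perf_counter()
            images = next(it)              # assumes batches are single image tensors
            t1 = time.perf_counter()
            model(images.to(device, non_blocking=True))
            if device == "cuda":
                torch.cuda.synchronize()   # include queued GPU work in the step time
            t2 = time.perf_counter()
            data_times.append(t1 - t0)
            step_times.append(t2 - t0)
            n_images += images.shape[0]
    steps = np.asarray(step_times)
    return {
        "step_p50_ms": float(np.percentile(steps, 50) * 1e3),
        "step_p95_ms": float(np.percentile(steps, 95) * 1e3),
        "images_per_s": n_images / steps.sum(),
        "data_wait_frac": float(np.sum(data_times) / steps.sum()),
    }
```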
Dataset & Sampling Expectations
- Access patterns should reflect image-based profiling use cases:
- object-level crops
- grouped site/well reads
- Sampling should be deterministic when seeded
- Support both:
- small subsets for CI or smoke testing
- full datasets for real benchmarking
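Deterministic sampling with subset support might be as simple as the following; the function name and signature are illustrative:

```python
import torch

def make_access_order(n_objects, seed, subset=None):
    """Deterministic shuffled index order: the same seed always yields the
    same order, and `subset` truncates it for CI / smoke-test runs."""
    g = torch.Generator().manual_seed(seed)
    order = torch.randperm(n_objects, generator=g).tolist()
    return order if subset is None else order[:subset]
```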
Reproducibility & Reporting
Each benchmark run should capture:
- Configuration parameters used
- Random seed
- Timestamp
- Software versions (PyTorch, CUDA, Python)
- Hardware summary (CPU, RAM, GPU if applicable)
- Storage type if detectable
Results should be easy to aggregate and plot across runs.
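A best-effort capture of this metadata using only the standard library and torch might look like this; the RAM probe is Linux-specific, and storage-type detection is omitted here because it is platform-dependent:

```python
import datetime
import os
import platform
import sys
import torch

def environment_metadata():
    """Best-effort run metadata for the result row."""
    meta = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        "cuda": torch.version.cuda,            # None on CPU-only builds
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
    }
    if torch.cuda.is_available():
        meta["gpu"] = torch.cuda.get_device_name(0)
    try:
        # Total physical RAM; os.sysconf is Unix-only and these names are Linux-ish.
        meta["ram_bytes"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        pass
    return meta
```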
Acceptance Criteria
- Benchmarks produce structured, machine-readable output
- Results clearly separate:
- Dataset-level costs
- DataLoader-level scaling effects
- End-to-end training/inference behavior
- Documentation explains:
- how to run benchmarks
- how to interpret reported metrics
- A minimal configuration exists for quick execution in CI or local testing
- A summary plot is generated in the same style as the existing benchmark plots
- Benchmark code follows the structure and placement conventions of the existing benchmark files in the repo
Notes & Risks
- Transform cost must be clearly separated from I/O cost
- Warm-up effects should be measured and reported
- Random-access performance may be affected by OS caching;
first-pass vs steady-state behavior should be distinguished where possible
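Short of dropping OS caches (which requires root), running the same index sequence twice and reporting both passes makes the first-pass vs steady-state distinction visible; reusing the bench_getitem sketch from Track 1:

```python
# Pass 1 is only as cold as the current OS page cache allows; pass 2 is warm.
cold = bench_getitem(dataset, indices, warmup=0)
warm = bench_getitem(dataset, indices, warmup=0)
```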