
Conversation


Copilot AI commented Feb 4, 2026

The PyTorch benchmark measures end-to-end performance but doesn't separate numpy array loading from tensor conversion overhead. This makes it difficult to identify whether I/O or conversion is the bottleneck.

Changes

New Dataset class: OMEArrowDatasetNumpy

  • Returns np.ndarray instead of torch.Tensor
  • Enables isolated measurement of numpy loading time
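A minimal sketch of what such a numpy-returning dataset could look like; class, path, and column names here are illustrative rather than the actual OMEArrowDatasetNumpy implementation:

  import numpy as np
  import pyarrow.parquet as pq
  from torch.utils.data import Dataset


  class NumpyCropDataset(Dataset):
      """Illustrative dataset that stops at np.ndarray instead of torch.Tensor."""

      def __init__(self, parquet_path: str, image_column: str = "image"):
          # Load the whole table up front; a real implementation may stream or memory-map.
          self.table = pq.read_table(parquet_path)
          self.image_column = image_column

      def __len__(self) -> int:
          return self.table.num_rows

      def __getitem__(self, idx: int) -> np.ndarray:
          # Return the raw numpy array so conversion cost can be timed separately.
          value = self.table.column(self.image_column)[idx].as_py()
          return np.asarray(value, dtype=np.float32)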

New benchmark: benchmark_numpy_vs_torch()
Measures three operations separately:

  • Numpy array loading (Dataset → numpy)
  • Tensor conversion (torch.from_numpy().float())
  • Total time (Dataset → torch)

Metrics tracked:

  • p50/p95/p99 latencies for each operation
  • Conversion overhead as percentage of total time
  • Throughput (samples/sec) for numpy vs torch
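A rough sketch of how the timing split and these metrics could be computed (function and key names are illustrative, not the benchmark's actual API):

  import time
  import numpy as np
  import torch


  def time_numpy_vs_torch(dataset, indices):
      """Time numpy loading and tensor conversion separately for each sample."""
      numpy_times, convert_times = [], []
      for idx in indices:
          t0 = time.perf_counter()
          arr = dataset[idx]                      # Dataset -> numpy
          t1 = time.perf_counter()
          tensor = torch.from_numpy(arr).float()  # numpy -> torch
          t2 = time.perf_counter()
          numpy_times.append(t1 - t0)
          convert_times.append(t2 - t1)
      total = np.array(numpy_times) + np.array(convert_times)
      return {
          "numpy_p50_ms": float(np.percentile(numpy_times, 50) * 1e3),
          "convert_p50_ms": float(np.percentile(convert_times, 50) * 1e3),
          "overhead_pct": float(np.sum(convert_times) / np.sum(total) * 100),
          "torch_throughput": float(len(indices) / np.sum(total)),
      }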

Integration:

  • Runs as Track 4 alongside existing tracks
  • Results: data/pytorch_benchmark_track4.parquet
  • Plots: Time breakdown and conversion overhead percentage

Example output

[Track 4] format=Parquet
  Run 1/3:
    Numpy:      p50=0.042ms, throughput=23809.5 samples/s
    Conversion: p50=0.008ms, overhead=16.0%
    Torch:      p50=0.050ms, throughput=20000.0 samples/s

Expected conversion overhead: 10-20% for table formats, higher for small images (20-40%), lower for large images (5-10%).

When overhead >30%: Consider zero-copy operations, batched conversions, or staying in torch throughout the pipeline.

When overhead <10%: Focus on I/O optimization (storage, caching, workers) rather than conversion.
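For context, torch.from_numpy already shares memory with the source array; the copy usually comes from the dtype cast. A hedged sketch of two ways to shrink that cost, assuming float32 storage or batch-level conversion fits the pipeline:

  import numpy as np
  import torch

  # 1. Keep data in float32 so torch.from_numpy stays zero-copy and no cast is needed.
  arr = np.zeros((1, 128, 128), dtype=np.float32)
  tensor = torch.from_numpy(arr)          # shares memory with arr, no copy

  # 2. Convert once per batch rather than once per sample to amortize per-call overhead.
  batch = np.zeros((32, 1, 128, 128), dtype=np.uint8)
  batch_tensor = torch.from_numpy(batch).float().div_(255.0)   # one cast for the whole batch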

Original prompt

This section details the original issue you should resolve

<issue_title>Add pytorch benchmark</issue_title>
<issue_description># PyTorch-Focused Benchmarking for Image-Based Profiling File Format

Context

We already maintain baseline benchmarks for the file format itself
(e.g., storage size, raw read/write throughput, sequential I/O).
This effort should focus only on PyTorch-facing performance:
how the format behaves when accessed via torch.utils.data.Dataset
and DataLoader, and how that affects real training or inference workloads.

The intent is to benchmark what users actually experience when using
the format in PyTorch-based image profiling pipelines.


Goals

Design and implement a benchmark suite that answers:

  1. How fast and stable is Dataset.__getitem__ under realistic access patterns?
  2. How does performance scale with common PyTorch DataLoader settings?
  3. Does the format reduce data-loading stalls in end-to-end model workflows?

Non-goals

  • Repeating generic file-format benchmarks (on-disk size, raw I/O MB/s)
  • Evaluating or optimizing model accuracy
  • GPU kernel or model architecture benchmarking

Benchmark Scope

Track 1 — Dataset / __getitem__ Microbenchmark

Evaluate the behavior of the Dataset implementation itself.

Access patterns

  • Random object-level access (e.g., random object IDs)
  • Grouped access (e.g., all objects from a site, well, or contiguous range)
  • Optional: paired reads per sample (two views, contrastive-style workflows)

Metrics

  • __getitem__ latency: p50 / p95 / p99
  • Samples per second in a tight loop
  • Warm-up vs steady-state behavior

Configuration dimensions

  • Crop size and shape
  • Channel selection
  • Output dtype
  • Decode path
  • Transform tier:
    • none (I/O ceiling)
    • light (normalize, resize)
    • typical (domain-relevant light augmentation)

Output

  • Structured, machine-readable results (JSON or CSV)
  • One row per run, including configuration and environment metadata
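A minimal sketch of the kind of loop Track 1 implies, with a warm-up split, percentile reporting, and one JSON row per run (all names are placeholders, not the repo's actual code):

  import json
  import time
  import numpy as np


  def benchmark_getitem(dataset, indices, warmup=50):
      """Time __getitem__ in a tight loop, separating warm-up from steady state."""
      latencies = []
      for idx in indices:
          t0 = time.perf_counter()
          _ = dataset[idx]
          latencies.append(time.perf_counter() - t0)
      steady = np.array(latencies[warmup:])
      row = {
          "n_samples": len(indices),
          "warmup_samples": warmup,
          "p50_ms": float(np.percentile(steady, 50) * 1e3),
          "p95_ms": float(np.percentile(steady, 95) * 1e3),
          "p99_ms": float(np.percentile(steady, 99) * 1e3),
          "samples_per_sec": float(len(steady) / steady.sum()),
      }
      print(json.dumps(row))
      return row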

Track 2 — PyTorch DataLoader Throughput

Measure performance at the DataLoader output boundary.

Parameters to explore

  • num_workers
  • batch_size
  • pin_memory
  • persistent_workers
  • prefetch_factor (when applicable)

Metrics

  • Samples per second
  • Batch time distribution (p50 / p95)
  • First-batch latency (worker startup overhead)

Modes

  • I/O-only transforms
  • Typical profiling transforms
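One possible shape of the Track 2 sweep; the parameter grid, step count, and result fields are assumptions for illustration only:

  import time
  from itertools import product
  from torch.utils.data import DataLoader


  def sweep_dataloader(dataset, steps=100):
      """Measure samples/sec and first-batch latency across DataLoader settings."""
      results = []
      for num_workers, batch_size in product([0, 2, 4, 8], [16, 64]):
          loader = DataLoader(
              dataset,
              batch_size=batch_size,
              num_workers=num_workers,
              pin_memory=True,
              persistent_workers=num_workers > 0,
          )
          it = iter(loader)
          t0 = time.perf_counter()
          next(it)                               # first batch includes worker startup
          first_batch_s = time.perf_counter() - t0
          t1 = time.perf_counter()
          n = 0
          for _ in range(steps - 1):
              batch = next(it, None)
              if batch is None:
                  break
              n += batch_size
          elapsed = time.perf_counter() - t1
          results.append({
              "num_workers": num_workers,
              "batch_size": batch_size,
              "first_batch_s": first_batch_s,
              "samples_per_sec": n / elapsed if elapsed > 0 else 0.0,
          })
      return results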

Track 3 — End-to-End Fixed-Step Model Loop

Evaluate the format in a realistic PyTorch workload.

Workloads

  • Embedding extraction (forward pass only), or
  • Small, standard training loop using a simple reference model

Controls

  • Fixed number of steps (not epochs)
  • Fixed random seed
  • Fixed input shape and model
  • Optional AMP (must be consistent across runs)

Metrics

  • Step time (p50 / p95)
  • Images per second
  • Fraction of time waiting on data (input stall proxy)
  • Optional: GPU utilization (best-effort)
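A sketch of how the input-stall proxy in Track 3 could be measured; the model and loader are assumed to exist, batches are assumed to be (image, label) pairs, and the split is simply "time waiting on the loader" versus "time in the forward pass":

  import time
  import torch


  def run_fixed_steps(model, loader, n_steps=200, device="cpu"):
      """Run a fixed number of forward passes and report the data-wait fraction."""
      model.eval()
      data_time, step_time = 0.0, 0.0
      it = iter(loader)
      with torch.no_grad():
          for _ in range(n_steps):
              t0 = time.perf_counter()
              try:
                  images, _ = next(it)             # time spent waiting on data
              except StopIteration:
                  it = iter(loader)
                  images, _ = next(it)
              t1 = time.perf_counter()
              _ = model(images.to(device))         # forward pass only (embedding extraction)
              t2 = time.perf_counter()
              data_time += t1 - t0
              step_time += t2 - t1
      total = data_time + step_time
      return {"data_wait_fraction": data_time / total,
              "images_per_sec": n_steps * loader.batch_size / total}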

Dataset & Sampling Expectations

  • Access patterns should reflect image-based profiling use cases:
    • object-level crops
    • grouped site/well reads
  • Sampling should be deterministic when seeded
  • Support both:
    • small subsets for CI or smoke testing
    • full datasets for real benchmarking
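Deterministic-when-seeded sampling can lean on a seeded torch.Generator, for example (a sketch with a stand-in dataset, not the project's actual sampler):

  import torch
  from torch.utils.data import DataLoader, RandomSampler, TensorDataset

  dataset = TensorDataset(torch.arange(100))                 # stand-in dataset
  generator = torch.Generator().manual_seed(42)              # fixed seed -> reproducible order
  sampler = RandomSampler(dataset, generator=generator)      # same permutation on every run
  loader = DataLoader(dataset, batch_size=8, sampler=sampler)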

Reproducibility & Reporting

Each benchmark run should capture:

  • Configuration parameters used
  • Random seed
  • Timestamp
  • Software versions (PyTorch, CUDA, Python)
  • Hardware summary (CPU, RAM, GPU if applicable)
  • Storage type if detectable

Results should be easy to aggregate and plot across runs.
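Capturing that metadata can be as simple as the following hedged sketch; the exact fields the repo records may differ:

  import json
  import platform
  import sys
  from datetime import datetime, timezone

  import torch


  def environment_metadata(seed: int, config: dict) -> dict:
      """Collect run configuration plus software/hardware context for one result row."""
      return {
          "config": config,
          "seed": seed,
          "timestamp": datetime.now(timezone.utc).isoformat(),
          "python": sys.version.split()[0],
          "torch": torch.__version__,
          "cuda": torch.version.cuda,                        # None on CPU-only builds
          "cpu": platform.processor(),
          "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
      }


  print(json.dumps(environment_metadata(seed=0, config={"batch_size": 32})))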


Acceptance Criteria

  • Benchmarks produce structured, machine-readable output
  • Results clearly separate:
    • Dataset-level costs
    • DataLoader-level scaling effects
    • End-to-end training/inference behavior
  • Documentation explains:
    • how to run benchmarks
    • how to interpret reported metrics
  • A minimal configuration exists for quick execution in CI or local testing
  • A plot is generated comparable to those already produced for existing benchmarks
  • The code follows the format and placement conventions of comparable files in the repo

Notes & Risks

  • Transform cost must be clearly separated from I/O cost
  • Warm-up effects should be measured and reported
  • Random-access performance may be affected by OS caching;
    first-pass vs steady-state behavior should be distinguished where possible
    </issue_description>

<agent_instructions>Please work on this in alignment with other work which has occurred in this project.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits February 4, 2026 21:35
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add PyTorch benchmarking for image-based profiling" to "Add PyTorch DataLoader benchmarks for OME-Arrow format" on Feb 4, 2026
Copilot AI requested a review from d33bs February 4, 2026 21:50
Copilot AI and others added 3 commits February 4, 2026 22:10
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Copilot AI changed the title from "Add PyTorch DataLoader benchmarks for OME-Arrow format" to "Add missing formats and improve label readability in PyTorch benchmark" on Feb 4, 2026
Copilot AI and others added 2 commits February 4, 2026 22:27
…mats are generated

Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Copilot AI changed the title from "Add missing formats and improve label readability in PyTorch benchmark" to "Optimize DataLoader performance and fix missing format data generation" on Feb 4, 2026
Copilot AI and others added 4 commits February 6, 2026 21:07
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Co-authored-by: d33bs <3738008+d33bs@users.noreply.github.com>
Copilot AI changed the title from "Optimize DataLoader performance and fix missing format data generation" to "Add Track 4: Numpy vs Torch tensor conversion benchmark" on Feb 6, 2026
d33bs (Member) commented Feb 6, 2026

This was an experiment with GitHub Copilot. Closing this for now; a more careful follow-up will come later.

d33bs closed this Feb 6, 2026