Re-verify performance claims with zenbench #9

@lilith

Summary

Several performance claims in TRADEOFFS.md and README.md are based on criterion benchmarks that have since been replaced with zenbench (PR #8). A fresh benchmark run shows that some of these claims no longer hold, most likely because of code layout effects in the original criterion harness that zenbench's interleaved measurement avoids.

Claims that need correction

TRADEOFFS.md:

  • "StopToken(Stopper) at 2.57µs beats generic impl Stop at 3.41µs" — now reversed (impl_stop 3.1µs, stoptoken 4.1µs in hot_loop_stopper)
  • "For Stopper, impl Stop is the slowest path" — now the fastest
  • "Don't recommend impl Stop for hot inner functions" — performance justification doesn't hold; recommendation may still be fine on ergonomic grounds

README.md:

  • "25% faster than generic for Stopper" — not reproducible; StopToken is slightly slower in micro hot loops

Claims that hold

  • WithTimeout ~16ns (confirmed 16.5ns)
  • Type Overview table timings (all confirmed)
  • may_stop().then_some() matches StopToken for Unstoppable (confirmed, actually understated — dyn_may_stop is the fastest path)
  • Codec-realistic benchmarks: all variants within 2% (confirmed at 17.5–17.9 GiB/s)
  • DebouncedTimeout 10x faster than WithTimeout (confirmed: 595M vs 58M checks/s)
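
As a sanity check on the throughput figures above, checks/s can be converted to ns per check and the ratio computed directly. This is only a sketch of the arithmetic; the helper names `ns_per_check` and `speedup` are made up for illustration and are not part of the benchmark code.

```rust
/// Convert a throughput in checks per second into nanoseconds per check.
/// (Hypothetical helper, for illustration only.)
fn ns_per_check(checks_per_sec: f64) -> f64 {
    1e9 / checks_per_sec
}

/// Ratio of two throughputs: how many times faster `fast` is than `slow`.
/// (Hypothetical helper, for illustration only.)
fn speedup(fast: f64, slow: f64) -> f64 {
    fast / slow
}

fn main() {
    // Figures quoted in the list above.
    let debounced = 595e6; // DebouncedTimeout, checks/s
    let with_timeout = 58e6; // WithTimeout, checks/s

    println!("DebouncedTimeout: {:.2} ns/check", ns_per_check(debounced));
    println!("WithTimeout:      {:.2} ns/check", ns_per_check(with_timeout));
    println!("speedup:          {:.1}x", speedup(debounced, with_timeout));
}
```

58M checks/s works out to roughly 17 ns per check, in the same ballpark as the ~16ns WithTimeout figure, and the ratio comes out slightly above 10x.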

Root cause

The old criterion benchmark measured each variant in its own function with criterion_group! / criterion_main!. Different functions land at different instruction addresses, causing code layout bias (Mytkowicz et al., ASPLOS 2009). The stop_check_zen benchmark already accounted for this by routing all variants through a single #[inline(never)] fn decode(&dyn Stop), and its results (all within noise for real codec work) were always correct.
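
The single-entry-point approach can be sketched roughly as follows. This is not the actual stop_check_zen code: the `Stop` trait here is a local stand-in with an assumed `should_stop` method, and `NeverStops`, `FlagStop`, and the checksum body of `decode` are hypothetical placeholders for the real stop types and codec work. The point is only that every variant executes the same machine code at the same address, so layout differences between variants cannot bias the comparison.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Local stand-in for the crate's trait (assumption: the real signature may differ).
trait Stop {
    fn should_stop(&self) -> bool;
}

// Hypothetical variant that never requests a stop.
struct NeverStops;
impl Stop for NeverStops {
    fn should_stop(&self) -> bool {
        false
    }
}

// Hypothetical variant backed by an atomic flag.
struct FlagStop(AtomicBool);
impl Stop for FlagStop {
    fn should_stop(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

// One non-inlined function shared by all variants: the loop body lives at a
// single address regardless of which Stop implementation is passed in.
#[inline(never)]
fn decode(stop: &dyn Stop, data: &[u8]) -> Option<u64> {
    let mut sum = 0u64;
    for chunk in data.chunks(64) {
        // Check for cancellation once per 64-byte chunk.
        if stop.should_stop() {
            return None;
        }
        // Placeholder "codec work": sum the bytes of the chunk.
        for &b in chunk {
            sum += u64::from(b);
        }
    }
    Some(sum)
}

fn main() {
    let data = vec![1u8; 4096];
    let flag = FlagStop(AtomicBool::new(false));

    // Both variants go through the same `decode` body via dynamic dispatch.
    println!("never: {:?}", decode(&NeverStops, &data));
    println!("flag:  {:?}", decode(&flag, &data));
}
```

A harness would time calls to `decode` with each variant in turn; only the vtable pointer differs between them, not the code being measured.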

The micro hot-loop benchmarks (hot_loop_stopper, hot_loop_unstoppable) remain inherently layout-sensitive because each variant is a separate closure, so the relative ordering of variants may flip between runs. The key takeaway is unchanged: for real codec workloads, the dispatch path doesn't matter.

Action items

  • Update TRADEOFFS.md to remove specific µs numbers from hot-loop claims and focus on the layout-immune codec findings
  • Update README.md to remove the "25% faster" claim; replace with "within noise for real workloads"
  • Consider whether the stop_check (ported) benchmark adds value beyond stop_check_zen, or should be removed to avoid generating misleading micro-numbers
