Summary
Several performance claims in TRADEOFFS.md and README.md were based on criterion benchmarks that have since been replaced with zenbench (PR #8). A fresh benchmark run shows that some claims no longer hold, likely because of code layout effects in the original criterion harness that zenbench's interleaved measurement avoids.
Claims that need correction
TRADEOFFS.md:
- "StopToken(Stopper) at 2.57µs beats generic impl Stop at 3.41µs": now reversed (impl_stop 3.1µs, stoptoken 4.1µs in hot_loop_stopper)
- "For Stopper, impl Stop is the slowest path" — now the fastest
- "Don't recommend impl Stop for hot inner functions" — performance justification doesn't hold; recommendation may still be fine on ergonomic grounds
README.md:
- "25% faster than generic for Stopper" — not reproducible; StopToken is slightly slower in micro hot loops
Claims that hold
- WithTimeout ~16ns (confirmed 16.5ns)
- Type Overview table timings (all confirmed)
- may_stop().then_some() matches StopToken for Unstoppable (confirmed, actually understated — dyn_may_stop is the fastest path)
- Codec-realistic benchmarks: all variants within 2% (confirmed at 17.5–17.9 GiB/s)
- DebouncedTimeout 10x faster than WithTimeout (confirmed: 595M vs 58M checks/s)
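As a quick sanity check on the throughput figures in the last item, checks/s can be converted to nanoseconds per check as 1e9 / rate. The helper name `ns_per_check` is made up for this illustration; the two rates are the ones quoted above.

```rust
// Convert a throughput in checks per second to nanoseconds per check.
// `ns_per_check` is a hypothetical helper, not part of the crate.
fn ns_per_check(checks_per_sec: f64) -> f64 {
    1e9 / checks_per_sec
}

fn main() {
    let with_timeout = ns_per_check(58e6); // WithTimeout: 58M checks/s
    let debounced = ns_per_check(595e6); // DebouncedTimeout: 595M checks/s
    println!("WithTimeout:      {:.1} ns/check", with_timeout);
    println!("DebouncedTimeout: {:.2} ns/check", debounced);
    println!("speedup:          {:.1}x", with_timeout / debounced);
}
```

This gives about 17.2 ns per WithTimeout check, which is in the same range as the ~16ns figure, and a ratio of about 10.3x.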
Root cause
The old criterion benchmark measured each variant in its own function with criterion_group! / criterion_main!. Different functions land at different instruction addresses, causing code layout bias (Mytkowicz et al., ASPLOS 2009). The stop_check_zen benchmark already accounted for this by routing all variants through a single #[inline(never)] fn decode(&dyn Stop), and its results (all within noise for real codec work) were always correct.
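The single-entry-point pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual stop_check_zen code: the `Stop` trait here is a simplified stand-in with one assumed method (`should_stop`), and `Never`, `Flag`, and the body of `decode` are hypothetical.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, Ordering};

// Simplified stand-in for the crate's Stop trait (assumption: the real
// trait has a different, richer interface).
trait Stop {
    fn should_stop(&self) -> bool;
}

// Hypothetical variant that never stops.
struct Never;
impl Stop for Never {
    fn should_stop(&self) -> bool {
        false
    }
}

// Hypothetical variant backed by an atomic flag.
struct Flag(AtomicBool);
impl Stop for Flag {
    fn should_stop(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

// One non-inlined entry point: every variant runs the same machine code
// at the same address, so only the dynamic dispatch target differs.
#[inline(never)]
fn decode(stop: &dyn Stop, data: &[u8]) -> Option<u64> {
    let mut sum = 0u64;
    for chunk in data.chunks(64) {
        // Check for cancellation once per chunk, as a codec loop might.
        if stop.should_stop() {
            return None;
        }
        for &b in chunk {
            sum = sum.wrapping_add(u64::from(b));
        }
    }
    Some(sum)
}

fn main() {
    let data = vec![1u8; 4096];
    let never = Never;
    let flag = Flag(AtomicBool::new(false));
    let stops: [&dyn Stop; 2] = [&never, &flag];
    for stop in stops {
        // black_box keeps the optimizer from specializing on the inputs.
        let out = decode(black_box(stop), black_box(data.as_slice()));
        println!("{:?}", out);
    }
}
```

Because the loop body is shared, any timing difference between variants comes from the `should_stop` implementations rather than from where the surrounding code happened to be placed.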
The micro hot-loop benchmarks (hot_loop_stopper, hot_loop_unstoppable) still have inherent layout sensitivity because each variant is a separate closure. The relative ordering between runs may flip. The key takeaway remains: for real codec workloads, the dispatch path doesn't matter.
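For illustration, interleaved (round-robin) timing in general looks something like the sketch below. It is not zenbench's implementation or API; `interleaved`, `variant_a`, and `variant_b` are hypothetical names, and the workloads are placeholders for the per-variant hot loops.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Placeholder workloads standing in for the per-variant hot loops
// (hypothetical bodies, for illustration only).
fn variant_a() -> u64 {
    (0..10_000u64).fold(0, |acc, x| acc.wrapping_add(black_box(x)))
}

fn variant_b() -> u64 {
    (0..10_000u64).fold(0, |acc, x| acc ^ black_box(x))
}

// Time the variants round-robin (A, B, A, B, ...) so that drift over the
// run is shared by all variants instead of landing on whichever ran last.
fn interleaved(variants: &[(&str, fn() -> u64)], rounds: usize) -> Vec<Duration> {
    let mut totals = vec![Duration::ZERO; variants.len()];
    for _ in 0..rounds {
        for (i, &(_, f)) in variants.iter().enumerate() {
            let start = Instant::now();
            black_box(f());
            totals[i] += start.elapsed();
        }
    }
    totals
}

fn main() {
    let variants = [
        ("variant_a", variant_a as fn() -> u64),
        ("variant_b", variant_b as fn() -> u64),
    ];
    let rounds = 1_000;
    let totals = interleaved(&variants, rounds);
    for ((name, _), total) in variants.iter().zip(&totals) {
        println!("{}: {:?} per round", name, *total / rounds as u32);
    }
}
```

Each variant here is still a separate function at its own address, so a harness of this shape remains subject to the layout sensitivity described above; small differences between variants should be read with that in mind.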
Action items
- Decide whether the stop_check (ported) benchmark adds value beyond stop_check_zen, or should be removed to avoid generating misleading micro-numbers