@floatdrop
Contributor

std.http.HeadParser is a good baseline to bench against, but I have no idea why hyperfine shows these results:

Benchmark 1: ./hparse/zig-out/bin/hparse
  Time (mean ± σ):      2.015 s ±  0.126 s    [User: 1.944 s, System: 0.068 s]
  Range (min … max):    1.848 s …  2.221 s    10 runs

Benchmark 2: ./picohttpparser/picohttpparser
  Time (mean ± σ):      1.252 s ±  0.016 s    [User: 1.215 s, System: 0.035 s]
  Range (min … max):    1.228 s …  1.271 s    10 runs

Benchmark 3: ./bench-httparse/target/release/bench-httparse
  Time (mean ± σ):     888.5 ms ± 161.1 ms    [User: 816.0 ms, System: 23.2 ms]
  Range (min … max):   821.5 ms … 1346.1 ms    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (1.346 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 4: ./headparser/zig-out/bin/headparser
  Time (mean ± σ):     660.0 ms ±  14.8 ms    [User: 639.3 ms, System: 19.7 ms]
  Range (min … max):   630.8 ms … 679.7 ms    10 runs

Summary
  ./headparser/zig-out/bin/headparser ran
    1.35 ± 0.25 times faster than ./bench-httparse/target/release/bench-httparse
    1.90 ± 0.05 times faster than ./picohttpparser/picohttpparser
    3.05 ± 0.20 times faster than ./hparse/zig-out/bin/hparse

Maybe it is because I'm running on an M3 and some vectorization features are not available on my CPU.
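For readers checking the Summary figures: they appear to be plain ratios of the mean times, with the ± term coming from first-order error propagation over the two standard deviations. The sketch below is an assumption about how the numbers were derived, not a quote of the tool's source, although it does reproduce the printed values. The function name `relative_speed` is made up for illustration.

```python
import math

# (mean, stddev) in seconds, copied from the benchmark output above.
results = {
    "hparse": (2.015, 0.126),
    "picohttpparser": (1.252, 0.016),
    "bench-httparse": (0.8885, 0.1611),
    "headparser": (0.6600, 0.0148),
}

def relative_speed(slow, fast):
    """Ratio of two means, with first-order propagation of uncorrelated errors."""
    (mean_slow, sd_slow), (mean_fast, sd_fast) = slow, fast
    ratio = mean_slow / mean_fast
    sigma = ratio * math.sqrt((sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2)
    return ratio, sigma

# The fastest command is the one with the smallest mean.
fastest = min(results, key=lambda name: results[name][0])

for name, stats in results.items():
    if name == fastest:
        continue
    ratio, sigma = relative_speed(stats, results[fastest])
    print(f"{fastest} ran {ratio:.2f} ± {sigma:.2f} times faster than {name}")
```

The wide ± 0.25 on the bench-httparse line comes almost entirely from its own 161.1 ms standard deviation, which is inflated by the slow first run that the warning mentions.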

@nikneym
Owner

nikneym commented Oct 18, 2025

Interesting find, the normal benchmarks also perform worse on Apple Silicon. I was using an AMD64 Linux machine while developing/testing the module.

Let's keep this PR open; I'll try to figure out why it performs worse on such platforms. (#3)

@floatdrop
Contributor Author

I finally obtained a proper Linux machine:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 117
model name      : AMD Ryzen 7 8745HS w/ Radeon 780M Graphics
stepping        : 2
microcode       : 0xa705208
cpu MHz         : 4826.095
cache size      : 1024 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso ibpb_no_ret spectre_v2_user tsa vmscape
bogomips        : 7585.60
TLB size        : 3584 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14] [15]

But the benchmark results are still in favor of std.http.HeadParser:

Benchmark 1 (44 runs): ./picohttpparser/picohttpparser
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.14s  ± 48.2ms    1.06s  … 1.28s           0 ( 0%)        0%
  peak_rss           1.24MB ± 33.9KB    1.11MB … 1.33MB         10 (23%)        0%
  cpu_cycles         5.54G  ±  238M     5.17G  … 6.20G           0 ( 0%)        0%
  instructions       34.7G  ±  102      34.7G  … 34.7G           0 ( 0%)        0%
  cache_references    107K  ± 20.1K     73.0K  …  196K           6 (14%)        0%
  cache_misses       3.95K  ±  347      3.58K  … 5.50K           4 ( 9%)        0%
  branch_misses      15.8K  ± 1.38K     14.2K  … 21.4K           6 (14%)        0%
Benchmark 2 (38 runs): ./bench-httparse/target/release/bench-httparse
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.33s  ± 16.5ms    1.30s  … 1.38s           0 ( 0%)        💩+ 16.6% ±  1.4%
  peak_rss           2.12MB ± 91.1KB    1.92MB … 2.25MB          0 ( 0%)        💩+ 70.4% ±  2.4%
  cpu_cycles         6.44G  ± 79.8M     6.34G  … 6.65G           0 ( 0%)        💩+ 16.3% ±  1.5%
  instructions       25.2G  ±  284      25.2G  … 25.2G           0 ( 0%)        ⚡- 27.4% ±  0.0%
  cache_references    209K  ± 14.3K      186K  …  244K           1 ( 3%)        💩+ 94.6% ±  7.2%
  cache_misses       7.00K  ±  622      6.20K  … 8.68K           0 ( 0%)        💩+ 77.2% ±  5.5%
  branch_misses      18.0K  ±  545      17.3K  … 20.2K           2 ( 5%)        💩+ 13.9% ±  3.0%
Benchmark 3 (50 runs): ./hparse/zig-out/bin/hparse
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.01s  ± 9.75ms    1.01s  … 1.04s           6 (12%)        ⚡- 11.3% ±  1.2%
  peak_rss            811KB ±    0       811KB …  811KB          0 ( 0%)        ⚡- 34.8% ±  0.8%
  cpu_cycles         4.91G  ± 1.68M     4.91G  … 4.91G           3 ( 6%)        ⚡- 11.4% ±  1.2%
  instructions       8.28G  ± 28.7      8.28G  … 8.28G           8 (16%)        ⚡- 76.1% ±  0.0%
  cache_references   39.4K  ± 10.0K     32.6K  … 91.5K           6 (12%)        ⚡- 63.2% ±  6.0%
  cache_misses        285   ±  260        84   … 1.82K           4 ( 8%)        ⚡- 92.8% ±  3.2%
  branch_misses      4.80K  ± 1.29K     3.97K  … 11.0K           3 ( 6%)        ⚡- 69.6% ±  3.5%
Benchmark 4 (79 runs): ./headparser/zig-out/bin/headparser
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           641ms ± 5.60ms     636ms …  666ms          5 ( 6%)        ⚡- 43.8% ±  0.9%
  peak_rss            811KB ±    0       811KB …  811KB          0 ( 0%)        ⚡- 34.8% ±  0.6%
  cpu_cycles         3.10G  ± 7.38M     3.09G  … 3.14G           5 ( 6%)        ⚡- 44.0% ±  0.9%
  instructions       19.7G  ± 22.1      19.7G  … 19.7G           8 (10%)        ⚡- 43.2% ±  0.0%
  cache_references   40.9K  ± 14.7K     31.0K  …  110K          10 (13%)        ⚡- 61.9% ±  5.8%
  cache_misses        260   ±  146        55   … 1.04K           3 ( 4%)        ⚡- 93.4% ±  2.2%
  branch_misses      2.67K  ±  973      2.03K  … 7.57K           8 (10%)        ⚡- 83.1% ±  2.6%
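One way to read the table above is to derive instructions per cycle (IPC) and the wall-time delta against the first benchmark from the printed means. This is a rough sketch over rounded figures, so the deltas can differ from the table by about 0.1 percentage points. The IPC values are derived here, not reported in the thread, and the helper names `ipc` and `delta_percent` are illustrative.

```python
# (wall_time in seconds, cpu_cycles, instructions): means copied from the table above.
runs = {
    "picohttpparser": (1.14, 5.54e9, 34.7e9),
    "bench-httparse": (1.33, 6.44e9, 25.2e9),
    "hparse": (1.01, 4.91e9, 8.28e9),
    "headparser": (0.641, 3.10e9, 19.7e9),
}

def ipc(cycles, instructions):
    """Average instructions retired per CPU cycle."""
    return instructions / cycles

def delta_percent(value, baseline):
    """Signed percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0

# The table's delta column is relative to Benchmark 1 (picohttpparser).
baseline_wall = runs["picohttpparser"][0]

for name, (wall, cycles, instructions) in runs.items():
    print(
        f"{name:16s} IPC {ipc(cycles, instructions):5.2f}  "
        f"wall delta {delta_percent(wall, baseline_wall):+6.1f}%"
    )
```

On these numbers hparse executes the fewest instructions of the four but also has the lowest IPC, while headparser executes more than twice as many instructions in fewer cycles.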

@nikneym
Owner

nikneym commented Oct 24, 2025

I think the comparison between HTTP parser libraries and HeadParser isn't totally fair; HeadParser seems to only parse the start line (or request line). It doesn't parse or validate headers, since that is the task of HeaderIterator.

It's also worth noting that HeaderIterator is not spec-compliant; it seems to just find delimiters without doing any validation. The other parser libraries used in the benchmark do validate and try to follow the spec (though somewhat loosely, since websites do whatever they prefer).
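To make the distinction concrete, here is a minimal conceptual sketch in Python of the two kinds of work being compared: scanning for the delimiter that ends the head, versus splitting and validating the request line and header fields. Neither function mirrors the Zig standard library or hparse; `find_head_end` and `parse_head` are hypothetical names, and the validation is deliberately simplified.

```python
import re

# Characters allowed in a method or header name (the "tchar" set from RFC 9110).
TOKEN = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def find_head_end(data: bytes) -> int:
    """Delimiter scan only: length of the head including the blank line, or -1."""
    end = data.find(b"\r\n\r\n")
    return -1 if end < 0 else end + 4

def parse_head(data: bytes):
    """Split and validate the request line and every header field."""
    end = find_head_end(data)
    if end < 0:
        raise ValueError("incomplete head")
    lines = data[: end - 4].split(b"\r\n")

    # Request line: method SP target SP version.
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not TOKEN.fullmatch(parts[0]) or not parts[2].startswith(b"HTTP/"):
        raise ValueError("malformed request line")
    method, target, version = (p.decode("latin-1") for p in parts)

    # Header fields: name ":" optional whitespace, value, optional whitespace.
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if not sep or not TOKEN.fullmatch(name):
            raise ValueError("malformed header field")
        headers.append((name.decode("latin-1"), value.strip(b" \t").decode("latin-1")))
    return method, target, version, headers
```

A request with an invalid header name such as `Bad Header: x` passes `find_head_end` unchanged but is rejected by `parse_head`, which is why timing a delimiter scan against validating parsers is not a like-for-like comparison.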

@floatdrop
Contributor Author

I think the comparison between HTTP parser libraries and HeadParser isn't totally fair;

You're right, it isn't. I don't know how I missed that HeadParser just counts bytes and doesn't parse the actual data. std.http.Server.Request.Head.parse is the right thing to benchmark against, and it is twice as slow as hparse. Mystery solved! (I just can't read sources)

@nikneym nikneym merged commit 6a57754 into nikneym:main Oct 24, 2025
6 checks passed