http: optimize header map implementation for improved performance #39252

agrawroh · 2025-04-28T16:41:48Z

Description:

This PR introduces an optimized implementation of the HeaderMap interface that performs better on common header operations. It stores headers in a vector and uses a hash map for lookup. In the benchmarks below, this implementation shows the following improvements:

Roughly 1.2x to 1.5x faster header insertions
Roughly 2.6x to 3.7x faster header lookups
Roughly 2.8x to 4.2x faster header iteration

This optimization will benefit all Envoy deployments that handle significant HTTP traffic, particularly those with large numbers of headers or high request rates. The improvements in lookup and iteration performance will reduce CPU usage in header processing code paths.
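
To make the "vector plus hash map" layout concrete, here is a minimal sketch of the idea. It is not the class added by this PR: the name `SketchHeaderMap` and its methods are hypothetical, keys are assumed to be already lower-cased, and removal is left out because it would require re-indexing.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration only; not Envoy's HeaderMap API.
class SketchHeaderMap {
public:
  // Appends a header to the vector and records its position under the key.
  void add(std::string key, std::string value) {
    index_[key].push_back(entries_.size());
    entries_.emplace_back(std::move(key), std::move(value));
  }

  // Returns the first value stored for `key`, or std::nullopt if absent.
  // The returned view is only valid until the next call to add().
  std::optional<std::string_view> get(const std::string& key) const {
    auto it = index_.find(key);
    if (it == index_.end() || it->second.empty()) {
      return std::nullopt;
    }
    return std::string_view(entries_[it->second.front()].second);
  }

  // Visits headers in insertion order by walking the contiguous vector.
  template <typename Fn> void iterate(Fn&& fn) const {
    for (const auto& [key, value] : entries_) {
      fn(key, value);
    }
  }

  std::size_t size() const { return entries_.size(); }

private:
  // Insertion-ordered storage; iteration touches contiguous memory.
  std::vector<std::pair<std::string, std::string>> entries_;
  // Key -> positions in entries_, so repeated headers are supported.
  std::unordered_map<std::string, std::vector<std::size_t>> index_;
};
```

In this sketch a lookup is one hash probe instead of a scan, and iteration is a linear walk over a vector, which is the kind of trade-off the description refers to. The cost is extra bookkeeping on every insert.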

Benchmark

Benchmarks were run on both macOS and Linux using the checked-in benchmark suite.

bazel run -c opt //test/benchmark:header_map_benchmark_test

Benchmark Results

Linux

Run on (32 X 3499.53 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 1280 KiB (x16)
  L3 Unified 55296 KiB (x1)
Load Average: 10.41, 17.06, 16.77
------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------
BM_HeaderMapImplInsert/10/10                 793 ns          793 ns       881801 items_per_second=12.6078M/s
BM_HeaderMapImplInsert/50/20                4327 ns         4327 ns       162107 items_per_second=11.556M/s
BM_HeaderMapImplInsert/100/50               9559 ns         9559 ns        73359 items_per_second=10.4614M/s
BM_HeaderMapOptimizedInsert/10/10            678 ns          678 ns      1037087 items_per_second=14.7545M/s
BM_HeaderMapOptimizedInsert/50/20           3528 ns         3528 ns       194380 items_per_second=14.1736M/s
BM_HeaderMapOptimizedInsert/100/50          8098 ns         8098 ns        85731 items_per_second=12.3495M/s
BM_HeaderMapImplLookup/10/10                 355 ns          355 ns      1973543 items_per_second=28.194M/s
BM_HeaderMapImplLookup/50/20                1909 ns         1909 ns       366985 items_per_second=26.1928M/s
BM_HeaderMapImplLookup/100/50               3914 ns         3914 ns       178833 items_per_second=25.5496M/s
BM_HeaderMapOptimizedLookup/10/10            136 ns          136 ns      5100593 items_per_second=73.6141M/s
BM_HeaderMapOptimizedLookup/50/20            719 ns          719 ns       939043 items_per_second=69.5108M/s
BM_HeaderMapOptimizedLookup/100/50          1423 ns         1423 ns       494690 items_per_second=70.2956M/s
BM_HeaderMapImplIteration/10/10             24.2 ns         24.2 ns     29162575 items_per_second=413.807M/s
BM_HeaderMapImplIteration/50/20             87.6 ns         87.6 ns      7993415 items_per_second=570.628M/s
BM_HeaderMapImplIteration/100/50             160 ns          160 ns      4395914 items_per_second=623.138M/s
BM_HeaderMapOptimizedIteration/10/10        5.76 ns         5.76 ns    122007983 items_per_second=1.73746G/s
BM_HeaderMapOptimizedIteration/50/20        28.3 ns         28.3 ns     24778009 items_per_second=1.76795G/s
BM_HeaderMapOptimizedIteration/100/50       56.2 ns         56.2 ns     12446980 items_per_second=1.77809G/s

NOTE: For the Time and CPU columns, lower is better; for items_per_second, higher is better.

macOS

Run on (12 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x12)
Load Average: 13.55, 9.45, 7.33
------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------
BM_HeaderMapImplInsert/10/10                 934 ns          933 ns       741706 items_per_second=10.7156M/s
BM_HeaderMapImplInsert/50/20                4058 ns         4057 ns       168786 items_per_second=12.3242M/s
BM_HeaderMapImplInsert/100/50               7996 ns         7996 ns        87223 items_per_second=12.5057M/s
BM_HeaderMapOptimizedInsert/10/10            634 ns          634 ns      1091669 items_per_second=15.7718M/s
BM_HeaderMapOptimizedInsert/50/20           3272 ns         3261 ns       215173 items_per_second=15.334M/s
BM_HeaderMapOptimizedInsert/100/50          6628 ns         6624 ns       104356 items_per_second=15.0962M/s
BM_HeaderMapImplLookup/10/10                 256 ns          256 ns      2726993 items_per_second=39.1321M/s
BM_HeaderMapImplLookup/50/20                1315 ns         1315 ns       539204 items_per_second=38.015M/s
BM_HeaderMapImplLookup/100/50               2749 ns         2748 ns       259331 items_per_second=36.392M/s
BM_HeaderMapOptimizedLookup/10/10           68.7 ns         68.6 ns     10155672 items_per_second=145.786M/s
BM_HeaderMapOptimizedLookup/50/20            355 ns          355 ns      1815089 items_per_second=140.964M/s
BM_HeaderMapOptimizedLookup/100/50           748 ns          748 ns       927435 items_per_second=133.734M/s
BM_HeaderMapImplIteration/10/10             12.3 ns         12.3 ns     55999552 items_per_second=812.345M/s
BM_HeaderMapImplIteration/50/20             53.9 ns         53.9 ns     12935893 items_per_second=928.35M/s
BM_HeaderMapImplIteration/100/50             101 ns          101 ns      6858711 items_per_second=987.991M/s
BM_HeaderMapOptimizedIteration/10/10        3.55 ns         3.55 ns    196841537 items_per_second=2.81886G/s
BM_HeaderMapOptimizedIteration/50/20        15.5 ns         15.5 ns     44665646 items_per_second=3.23397G/s
BM_HeaderMapOptimizedIteration/100/50       30.4 ns         30.4 ns     22931722 items_per_second=3.28603G/s

NOTE: For the Time and CPU columns, lower is better; for items_per_second, higher is better.
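
The speedup for each benchmark can be derived from the tables above as baseline time divided by optimized time. The helper below is a small sketch of that arithmetic; the `Row` struct and function names are made up for illustration, and the numbers are copied from the Time column of the two tables.

```cpp
#include <cstdio>
#include <vector>

// One Impl/Optimized pair from the tables above (times in nanoseconds).
struct Row {
  const char* name;
  double baseline_ns;   // BM_HeaderMapImpl*
  double optimized_ns;  // BM_HeaderMapOptimized*
};

// Speedup factor: how many times faster the optimized run is.
inline double speedup(const Row& r) { return r.baseline_ns / r.optimized_ns; }

// Linux results, copied from the Time column.
inline std::vector<Row> linuxRows() {
  return {
      {"Insert/10/10", 793.0, 678.0},    {"Insert/50/20", 4327.0, 3528.0},
      {"Insert/100/50", 9559.0, 8098.0}, {"Lookup/10/10", 355.0, 136.0},
      {"Lookup/50/20", 1909.0, 719.0},   {"Lookup/100/50", 3914.0, 1423.0},
      {"Iteration/10/10", 24.2, 5.76},   {"Iteration/50/20", 87.6, 28.3},
      {"Iteration/100/50", 160.0, 56.2},
  };
}

// macOS results, copied from the Time column.
inline std::vector<Row> macosRows() {
  return {
      {"Insert/10/10", 934.0, 634.0},    {"Insert/50/20", 4058.0, 3272.0},
      {"Insert/100/50", 7996.0, 6628.0}, {"Lookup/10/10", 256.0, 68.7},
      {"Lookup/50/20", 1315.0, 355.0},   {"Lookup/100/50", 2749.0, 748.0},
      {"Iteration/10/10", 12.3, 3.55},   {"Iteration/50/20", 53.9, 15.5},
      {"Iteration/100/50", 101.0, 30.4},
  };
}

// Prints one line per benchmark pair with its speedup factor.
inline void printSpeedups(const char* platform, const std::vector<Row>& rows) {
  std::printf("%s\n", platform);
  for (const Row& r : rows) {
    std::printf("  %-18s %.2fx\n", r.name, speedup(r));
  }
}
```

Calling `printSpeedups("Linux", linuxRows())` and `printSpeedups("macOS", macosRows())` lists the per-case ratios. Across both platforms these work out to roughly 1.2x to 1.5x for insertion, 2.6x to 3.7x for lookup, and 2.8x to 4.2x for iteration.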

Commit Message: http: optimize header map implementation for improved performance
Risk Level: Low
Testing: Unit tests and benchmarks added
Docs Changes: N/A
Release Notes: Added
Platform Specific: No

repokitteh-read-only · 2025-04-28T16:41:54Z

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #39252 was opened by agrawroh.

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>

ggreenway · 2025-04-28T23:22:50Z

I'd like to see a performance comparison that mimics real-world access patterns. Specifically, compare heavy use of core headers like :path and :authority, plus the other headers that Envoy must access for routing and processing. In the current implementation there are two classes of headers, and all of the core-protocol headers are in the "fast" path, so I'd like to see a comparison of that case.

Also, I'd like to see results in a full Envoy end-to-end test to see how much of a difference this change makes in overall performance.
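
For readers unfamiliar with the "two classes of headers" mentioned above, the general idea can be pictured with a small sketch. This is not Envoy's actual HeaderMapImpl: the `CoreHeader` enum, the `TwoTierHeaders` class, and the choice of which headers count as core are all hypothetical, chosen only to show a fixed-slot fast path next to a general slow path.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical set of "core" headers that each get a dedicated slot.
enum class CoreHeader : std::size_t { Path, Authority, Method, Count };

class TwoTierHeaders {
public:
  // Fast path: core headers live in a fixed array indexed by the enum,
  // so reads and writes need no hashing and no scan.
  void setCore(CoreHeader h, std::string value) {
    core_[static_cast<std::size_t>(h)] = std::move(value);
  }
  const std::optional<std::string>& core(CoreHeader h) const {
    return core_[static_cast<std::size_t>(h)];
  }

  // Slow path: every other header is appended and found by linear scan.
  void addOther(std::string key, std::string value) {
    others_.emplace_back(std::move(key), std::move(value));
  }
  // The returned view is only valid until the next call to addOther().
  std::optional<std::string_view> other(std::string_view key) const {
    for (const auto& [k, v] : others_) {
      if (k == key) {
        return std::string_view(v);
      }
    }
    return std::nullopt;
  }

private:
  std::array<std::optional<std::string>,
             static_cast<std::size_t>(CoreHeader::Count)> core_{};
  std::vector<std::pair<std::string, std::string>> others_;
};
```

Under a layout like this, a workload dominated by `:path` and `:authority` reads never touches the slow path. A benchmark that only exercises arbitrary header names could therefore overstate the benefit of a hash-indexed replacement.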

agrawroh force-pushed the perf-hdr branch 2 times, most recently from 77ef6ec to 0bdfe7b on April 28, 2025 17:18

http: optimize header map implementation for improved performance

9408c0d

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>

agrawroh force-pushed the perf-hdr branch from 0bdfe7b to 9408c0d on April 28, 2025 22:30
