contrib: implement Peak EWMA load balancing policy #40653

Merged
tonya11en merged 8 commits into envoyproxy:main from rroblak:rroblak/peak-ewma-squashed
Oct 24, 2025

@rroblak
Contributor

@rroblak rroblak commented Aug 8, 2025

Commit Message

Adds a Peak EWMA (Exponentially Weighted Moving Average) load balancing policy that uses the Power of Two Choices (P2C) algorithm with real-time RTT measurements for latency-aware request routing.

Key components:

  • Load balancer: envoy.load_balancing_policies.peak_ewma
  • HTTP filter: envoy.filters.http.peak_ewma for RTT measurement
  • Configuration: decay_time, aggregation_interval, max_samples_per_host, default_rtt, penalty_value

The implementation uses lock-free atomic ring buffers for RTT sample collection and a host-attached storage pattern. It draws from Finagle's Peak EWMA algorithm while avoiding locks, and is patterned after Envoy's existing client-side WRR load balancing implementation for main/worker thread coordination.
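As a rough illustration of the lock-free sample collection described above, here is a minimal sketch of a fixed-capacity ring buffer that worker threads can write to without taking a lock. The class name `RttRingBuffer` and its methods are hypothetical and are not taken from the PR. The sketch also ignores one subtlety a real implementation has to handle: a reader can observe a slot whose ticket was claimed but whose value has not been stored yet.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical sample buffer for one upstream host (illustrative names only).
template <std::size_t Capacity>
class RttRingBuffer {
public:
  // May be called concurrently from any thread. fetch_add hands each writer
  // a unique, monotonically increasing ticket, so writers never block.
  void record(double rtt_ms) {
    const std::uint64_t ticket = write_count_.fetch_add(1, std::memory_order_relaxed);
    slots_[ticket % Capacity].store(rtt_ms, std::memory_order_release);
  }

  // Total number of samples ever recorded. Older samples are overwritten
  // once more than Capacity samples have been written.
  std::uint64_t writeCount() const { return write_count_.load(std::memory_order_acquire); }

  // Reads the slot that the given ticket maps to.
  double at(std::uint64_t ticket) const {
    return slots_[ticket % Capacity].load(std::memory_order_acquire);
  }

private:
  std::array<std::atomic<double>, Capacity> slots_{};
  std::atomic<std::uint64_t> write_count_{0};
};
```

Using a monotonically increasing ticket rather than a wrapped index keeps the reader-side arithmetic simple, at the cost of a modulo on every access.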

Fixes #20907

Additional Description

This PR implements a new contrib load balancing policy based on the Peak EWMA (Exponentially Weighted Moving Average) algorithm, which provides latency-aware request routing using real-time RTT measurements.

This addresses the feature request in #20907 for a Peak EWMA load balancer implementation.
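For readers unfamiliar with the algorithm, the following sketch shows the update rule as Finagle's Peak EWMA defines it: a sample above the current average replaces it immediately (the "peak" part), while lower samples are blended in with a weight that decays exponentially with elapsed time. The struct and field names are illustrative assumptions, and the PR's own implementation may differ in detail.

```cpp
#include <algorithm>
#include <cmath>

// Illustrative Peak EWMA state for one host (names are assumptions).
struct PeakEwma {
  double ewma_ms;        // Current smoothed RTT estimate.
  double decay_time_ms;  // Time constant controlling how fast old data fades.

  // Folds in one RTT sample observed elapsed_ms after the previous update.
  void observe(double rtt_ms, double elapsed_ms) {
    if (rtt_ms > ewma_ms) {
      // Peak behavior: react to latency spikes immediately.
      ewma_ms = rtt_ms;
      return;
    }
    // Otherwise decay toward the new sample; w approaches 0 as time passes.
    const double w = std::exp(-std::max(elapsed_ms, 0.0) / decay_time_ms);
    ewma_ms = ewma_ms * w + rtt_ms * (1.0 - w);
  }
};
```

The asymmetry is deliberate: the estimate rises instantly when a host slows down but recovers only gradually, which biases the balancer away from hosts that have recently misbehaved.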

Performance Validation

Benchmark results demonstrate Peak EWMA's effectiveness at avoiding slow servers.

Test setup: 10 clients, 10 upstream servers (1 server 10x slower than others):

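| Algorithm     | Average | P50  | P75   | P90    | P95    | P99    | Min  | Max    | Std Dev |
|---------------|---------|------|-------|--------|--------|--------|------|--------|---------|
| round_robin   | 19.08   | 8.46 | 10.01 | 101.79 | 103.39 | 105.58 | 5.79 | 108.02 | 29.8    |
| least_request | 11.00   | 8.92 | 10.71 | 13.10  | 15.38  | 103.00 | 5.98 | 109.35 | 11.72   |
| random        | 18.66   | 8.47 | 10.64 | 65.51  | 103.28 | 106.18 | 5.95 | 112.42 | 28.67   |
| peak_ewma     | 10.16   | 9.66 | 11.13 | 13.29  | 14.52  | 18.56  | 6.31 | 26.16  | 2.45    |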

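Peak EWMA shows dramatically lower tail latency than the existing Envoy load balancing algorithms: its P99 is 18.56, versus 103.00 to 106.18 for the others, and its standard deviation is 2.45, versus 11.72 to 29.8.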

Risk Level

Medium

Testing

  • Unit Tests: Comprehensive coverage for all components (Cost, Observability, HostData, Config)
  • Integration Tests: End-to-end load balancing behavior with latency simulation
  • HTTP Filter Tests: RTT measurement and sample recording functionality
  • Manual Testing: Verified that 100% of traffic was routed to fast servers rather than slow servers

Docs Changes

  • Added comprehensive API documentation in docs/root/api-v3/config/contrib/load_balancing_policies/peak_ewma/peak_ewma.rst
  • Documented all 5 config parameters with detailed explanations and examples
  • Added statistics section following standard Envoy format
  • Proto files include extensive inline documentation

Release Notes

Added to contrib extensions metadata. This is a new contrib extension, so it requires no changes to the main release notes.

Platform Specific Features

N/A

Runtime Guard

N/A - This is a new contrib extension that must be explicitly configured

Issues

Fixes #20907

Deprecated

N/A

API Changes

This adds new contrib API surfaces:

  • envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma - Load balancer configuration
  • envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig - HTTP filter configuration

Both are marked as work_in_progress following contrib extension patterns. No changes to existing APIs.

@repokitteh-read-only

Hi @rroblak, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

@jukie
Contributor

jukie commented Aug 9, 2025

This looks really good, thanks for working on this! Will it be possible to support things like localityLbConfig/zoneAwareLbConfig or slow start mode? I'm not suggesting that needs to be included here (also not a maintainer) but I'm curious if any of the core logic here would restrict that.

@rroblak rroblak force-pushed the rroblak/peak-ewma-squashed branch from 07e1f68 to 3e5e9d4 Compare August 9, 2025 05:32
@mathetake mathetake requested a review from tonya11en August 11, 2025 19:59
@frittentheke
Contributor

> This looks really good, thanks for working on this! Will it be possible to support things like localityLbConfig/zoneAwareLbConfig or slow start mode? I'm not suggesting that needs to be included here (also not a maintainer) but I'm curious if any of the core logic here would restrict that.

I was just about to ask about this aspect as well. See also my comment on istio/istio#35102.

Since Peak EWMA is about keeping latency low, whether or not to cross a zone boundary becomes a real question.
Cross-zone traffic is potentially more expensive, so there could be a trade-off to make (or not).

@rroblak
Contributor Author

rroblak commented Aug 19, 2025

Thanks for the great questions, @jukie and @frittentheke!

localityLbConfig/zoneAwareLbConfig

This implementation should be compatible with localityLbConfig/zoneAwareLbConfig. They could act as a pre-filter to select a host set, similar to how the current implementation uses host health filtering to only consider healthy hosts. Then the Peak EWMA policy would P2C on that subset to choose the fastest host.

That said, I haven't looked at the object/data model for localityLbConfig/zoneAwareLbConfig so I'd need to dig through to be 100% sure.

Needless to say, however, locality/zone-awareness highlights one of the strengths, and the elegance, of Peak EWMA in comparison to load balancing algorithms that don't consider request latency as an input: given an undifferentiated pool of upstream hosts, Peak EWMA will dynamically weight traffic toward the upstream hosts with the lowest RTT, which in practice is the local zone first, followed by increasingly distant zones.

In my experience running this across data centers over the past few years, Peak EWMA simplifies configuration and also mitigates partial-failure issues where an upstream host's RTT increases substantially but not enough to cause the host to be marked unhealthy. In this scenario, Peak EWMA will significantly reduce traffic to the affected upstream host (even if it's in the local zone), whereas fixed configurations will continue to send requests to such a host.
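The "pre-filter, then P2C" flow described above can be sketched as follows. This is only an illustration under stated assumptions: `HostView` and `pickP2c` are hypothetical names, the health flag stands in for whatever pre-filter (health, locality, zone) selects the candidate set, and the sketch always draws two distinct candidates, which may differ from how Envoy's own P2C code samples.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical per-host snapshot: eligibility after pre-filtering, plus the
// host's current Peak EWMA cost (lower is better).
struct HostView {
  bool healthy;
  double cost;
};

// Returns the index of the chosen host, or hosts.size() if none is eligible.
template <typename Rng>
std::size_t pickP2c(const std::vector<HostView>& hosts, Rng& rng) {
  // Pre-filter: only eligible hosts take part in the selection.
  std::vector<std::size_t> eligible;
  for (std::size_t i = 0; i < hosts.size(); ++i) {
    if (hosts[i].healthy) {
      eligible.push_back(i);
    }
  }
  if (eligible.empty()) {
    return hosts.size();
  }
  if (eligible.size() == 1) {
    return eligible[0];
  }
  // Power of Two Choices: draw two distinct candidates uniformly at random.
  std::uniform_int_distribution<std::size_t> first(0, eligible.size() - 1);
  std::uniform_int_distribution<std::size_t> second(0, eligible.size() - 2);
  const std::size_t a = first(rng);
  std::size_t b = second(rng);
  if (b >= a) {
    ++b;  // Skip over a so the two candidates differ.
  }
  // Keep the candidate with the lower cost.
  return hosts[eligible[a]].cost <= hosts[eligible[b]].cost ? eligible[a] : eligible[b];
}
```

Because only two hosts are compared per request, the selection cost stays constant regardless of cluster size, while a consistently slow host loses almost every comparison it takes part in.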

Slow Start

Regarding slow start, in a way it's already implemented via the default_rtt and penalty_value parameters. Those ensure that new hosts only receive a trickle of requests until more data is gathered on their respective request latencies. This again highlights Peak EWMA's elegance in contrast to a fixed slow start config.

If we wanted to incorporate the common LB config slow start params, we'd need to think a bit about how to do that, since they would overlap with the default_rtt and penalty_value params I defined. It's likely possible, but it's not immediately obvious to me how to do it.

Hope that helps! Let me know your thoughts.

@tonya11en
Member

@rroblak CI won't run because of the DCO check failing. You can follow the instructions here to fix it and let the baseline tests run.

@tonya11en
Member

/wait

@rroblak rroblak force-pushed the rroblak/peak-ewma-squashed branch 9 times, most recently from 2a81114 to f53dd24 Compare September 9, 2025 19:07
Signed-off-by: Ryan Oblak <rroblak@gmail.com>
@nezdolik
Member

friendly ping @tonya11en

@tonya11en
Member

I'm waiting for an end-user to chime in on the original issue before reviewing. We need an end-user sponsor that is willing to use this.

@KBaichoo
Contributor

/wait

Pending an end-user of this extension

@tonya11en
Member

Alright, we have an end-user (#20907 (comment)). I'll start combing through this, just give me until EOW if you don't mind.

@tonya11en tonya11en added the contrib PRs for contrig label Oct 6, 2025
Member

@tonya11en tonya11en left a comment

This is an enormous PR, so it'll take a couple passes for me to parse all of it. I made a few comments.

// RTT sample recorded successfully
}
} else {
// Host missing Peak EWMA data - should not happen after initialization
Member

You should probably detect whether it's happening after initialization and at least emit a warning log. Also, explain the circumstances in which this would happen.

*upstream_timing.first_upstream_rx_byte_received_ -
*upstream_timing.first_upstream_tx_byte_sent_);

// Record RTT sample in host-attached atomic storage
Member

Terminate your comments with a period, please.
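For context, the fragment under review derives the RTT by subtracting two optional upstream timestamps. A self-contained sketch of that measurement might look like the following. `UpstreamTimingSketch` and `measureRtt` are hypothetical stand-ins that only mirror the two field names visible in the diff, and `std::optional` with `std::chrono::steady_clock` is used here in place of Envoy's own types.

```cpp
#include <chrono>
#include <optional>

using MonotonicTime = std::chrono::steady_clock::time_point;

// Stand-in for the timing struct seen in the diff (field names copied from it).
struct UpstreamTimingSketch {
  std::optional<MonotonicTime> first_upstream_tx_byte_sent_;
  std::optional<MonotonicTime> first_upstream_rx_byte_received_;
};

// Returns the time between the first request byte sent and the first response
// byte received, or nullopt if either timestamp is missing or out of order.
std::optional<std::chrono::milliseconds> measureRtt(const UpstreamTimingSketch& timing) {
  if (!timing.first_upstream_tx_byte_sent_ || !timing.first_upstream_rx_byte_received_) {
    return std::nullopt;
  }
  const MonotonicTime tx = *timing.first_upstream_tx_byte_sent_;
  const MonotonicTime rx = *timing.first_upstream_rx_byte_received_;
  if (rx < tx) {
    return std::nullopt;
  }
  return std::chrono::duration_cast<std::chrono::milliseconds>(rx - tx);
}
```

Checking both optionals before dereferencing matters because a request that fails before any response byte arrives has no receive timestamp to subtract.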

namespace LoadBalancingPolicies {
namespace PeakEwma {

double Cost::compute(double rtt_ewma_ms, double active_requests, double default_rtt_ms) const {
Member

Assert the params are non-negative.
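One way the requested precondition checks could look is sketched below. The function body is an assumption: it uses a Finagle-style cost (RTT multiplied by in-flight requests plus one) and falls back to the default RTT when no EWMA is available yet, which may not match the PR's actual formula. Standard `assert` is used to keep the sketch self-contained, whereas Envoy code would normally use its `ASSERT` macro.

```cpp
#include <cassert>

// Free-function stand-in for Cost::compute (body is an assumed formula).
double computeCost(double rtt_ewma_ms, double active_requests, double default_rtt_ms) {
  // Preconditions requested in review: no parameter may be negative.
  assert(rtt_ewma_ms >= 0.0);
  assert(active_requests >= 0.0);
  assert(default_rtt_ms >= 0.0);

  // Assumption: an EWMA of zero means "no samples yet", so use the default.
  const double rtt_ms = rtt_ewma_ms > 0.0 ? rtt_ewma_ms : default_rtt_ms;
  // The +1 accounts for the request about to be sent, so idle hosts still
  // have a non-zero cost proportional to their latency.
  return rtt_ms * (active_requests + 1.0);
}
```

Under this assumed formula, a host with a 10 ms EWMA and 2 requests in flight costs 30, while a host with no samples and a 5 ms default costs 5.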

Comment on lines +10 to +17
    : cost_stat_(scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".cost",
                                       Stats::Gauge::ImportMode::NeverImport)),
      ewma_rtt_stat_(
          scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".ewma_rtt_ms",
                                Stats::Gauge::ImportMode::NeverImport)),
      active_requests_stat_(
          scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".active_requests",
                                Stats::Gauge::ImportMode::NeverImport)),
Member

Just FYI, this will cause a cardinality explosion by including the host addresses in the metric name. If folks use this LB policy without knowing this, it'll potentially bring down their TSDB or cause them to run out of quota with their metrics vendor.

However, this is going into contrib so I'm not going to hold the PR up over it. If you're ok with this, then just be sure to mention it in the docs or make it opt-in.
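To make the cardinality concern concrete, the sketch below enumerates the gauge name suffixes that the pattern in the diff would produce: three gauges per host address. The helper `perHostGaugeNames` is hypothetical and only reproduces the string concatenation visible above; any scope prefix or name sanitization that Envoy applies is not modeled.

```cpp
#include <string>
#include <vector>

// Builds the per-host gauge names implied by the diff's naming pattern:
// "peak_ewma.<address>.<metric>" for each of the three metrics.
std::vector<std::string> perHostGaugeNames(const std::vector<std::string>& host_addresses) {
  static const char* const kMetrics[] = {"cost", "ewma_rtt_ms", "active_requests"};
  std::vector<std::string> names;
  names.reserve(host_addresses.size() * 3);
  for (const std::string& address : host_addresses) {
    for (const char* metric : kMetrics) {
      names.push_back("peak_ewma." + address + "." + metric);
    }
  }
  return names;
}
```

The count grows linearly with the number of upstream hosts, so a cluster with 1,000 endpoints yields 3,000 distinct time series per scope, and host churn adds new series over time. That is the growth the review comment warns about.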

Comment on lines +32 to +36
void Observability::report(
    const absl::flat_hash_map<Upstream::HostConstSharedPtr,
                              std::unique_ptr<GlobalHostStats>>& /* all_host_stats */) {
  // Stats are published during aggregation - this is a placeholder for consistency
}
Member

Can you explain why you need this?

Contributor Author

@rroblak rroblak Oct 18, 2025

Good catch. That was vestigial from previous development iterations. I removed it and refactored the code that is used into peak_ewma_lb.cc.

Comment on lines +260 to +261
// Write index wrapped around
return (max_samples - last_processed) + current_write;
Member

Is this case for when the current_write variable overflows?


// Process all new samples in chronological order
size_t processed_index = last_processed;
for (size_t i = 0; i < num_new_samples; ++i) {
Member

How do you handle num_new_samples being larger than the max samples? I'm either missing something or this loop will process some samples more than once under high load.
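One way to address both this question and the wrap-around question above is to track monotonically increasing counters instead of wrapped indices, and to clamp the number of readable samples to the buffer capacity. The sketch below is an illustration of that idea only. `NewSampleRange` and `newSamples` are hypothetical names, and this is not the code the PR ended up with.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Describes which samples the aggregator should read on this pass.
struct NewSampleRange {
  std::uint64_t first_ticket;  // Oldest ticket still present in the buffer.
  std::uint64_t count;         // Number of samples to process.
  std::uint64_t dropped;       // Samples overwritten before they were read.
};

// last_processed and write_count are total counts since startup, not indices
// into the buffer, so no special case is needed when the index wraps.
NewSampleRange newSamples(std::uint64_t last_processed, std::uint64_t write_count,
                          std::size_t capacity) {
  const std::uint64_t pending = write_count - last_processed;
  // At most capacity samples still exist; anything older was overwritten.
  const std::uint64_t readable = std::min<std::uint64_t>(pending, capacity);
  return {write_count - readable, readable, pending - readable};
}
```

For example, with a capacity of 4, `newSamples(0, 10, 4)` reports 4 readable samples starting at ticket 6 and 6 dropped, so no slot is processed twice even when writers outpace the aggregator.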

Comment thread CODEOWNERS
Comment on lines +433 to +434
/contrib/peak_ewma/filters/http/ @rroblak @UNOWNED
/contrib/peak_ewma/load_balancing_policies/ @rroblak @UNOWNED
Member

You may as well just own /contrib/peak_ewma.

@tonya11en
Member

/wait

@tonya11en
Member

CI can't run until the DCO check succeeds. I'll approve once it's all green.

@tonya11en
Member

/wait

Signed-off-by: Ryan Oblak <rroblak@gmail.com>
Signed-off-by: Ryan Oblak <rroblak@gmail.com>
…rver avoidance

Signed-off-by: Ryan Oblak <rroblak@gmail.com>
Signed-off-by: Ryan Oblak <rroblak@gmail.com>
Signed-off-by: Ryan Oblak <rroblak@gmail.com>
Signed-off-by: Ryan Oblak <rroblak@gmail.com>
@rroblak rroblak force-pushed the rroblak/peak-ewma-squashed branch from 3814720 to 0378af9 Compare October 20, 2025 18:16
@tonya11en tonya11en enabled auto-merge (squash) October 24, 2025 19:29
@tonya11en tonya11en merged commit 4364cb5 into envoyproxy:main Oct 24, 2025
25 checks passed
grnmeira pushed a commit to grnmeira/envoy that referenced this pull request Mar 20, 2026
Signed-off-by: Ryan Oblak <rroblak@gmail.com>
Signed-off-by: Gustavo <grnmeira@gmail.com>
Labels

contrib PRs for contrig

Development

Successfully merging this pull request may close these issues.

Peak EWMA load balancing

6 participants