
Add EWMA and load biasing crates for failure-aware P2C balancing #4537

Open
unleashed wants to merge 21 commits into main from amr/load-biaser



@unleashed
Member

Today the proxy's P2C load balancer uses Tower's PeakEwma, which tracks
only round-trip time. An endpoint returning fast 503s or 429s looks
"fast" to PeakEwma, so P2C keeps routing traffic to it. This is exactly the
opposite of what operators want.

This PR adds the building blocks to make P2C failure-aware, but does not
wire anything into the proxy stack yet, to keep the review scope manageable.
Follow-up PRs will use these building blocks to activate this code
and implement related features in the circuit breaker.

Here are the main components:

  • linkerd-ewma. A standalone EWMA crate that supports non-mutating
    time-projected reads and dual-metric tracking (RTT + penalty) under a
    single lock. Tower's internal RttEstimate is private, mutates on read,
    and cannot support the penalty dimension.

  • retry_after module in linkerd-http-classify. Parsers for HTTP
    Retry-After (delay-seconds and HTTP-date per RFC 7231) and gRPC
    grpc-retry-pushback-ms (per gRPC A6 spec), so the load biaser and the
    upcoming circuit breaker can honor server backoff hints.

  • linkerd-load-biaser. A Tower Service wrapper implementing
    tower::load::Load that tracks per-endpoint RTT via EWMA and injects
    temporary load penalties on failure responses (HTTP 429/503/5xx, gRPC
    RESOURCE_EXHAUSTED/UNAVAILABLE). When a Retry-After hint is present the
    penalty is amplified to remain meaningful through the server-requested
    backoff window. The load metric is max(rtt * (pending + 1), penalty),
    giving P2C the ability to steer traffic away from unhealthy endpoints while
    preserving the same behavior as PeakEwma when all of them are healthy.
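A minimal Rust sketch of the load metric described above, assuming the penalty decays exponentially toward zero over a fixed window. `Metrics`, `Endpoint`, `penalize`, and `load` are hypothetical names, not the linkerd-load-biaser API; the sketch only shows that RTT and penalty can share one lock and that a healthy endpoint (zero penalty) yields exactly `rtt * (pending + 1)`.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Both metrics sit behind one lock so a load read sees a consistent pair.
struct Metrics {
    rtt_ns: f64,           // peak-EWMA latency estimate, in nanoseconds
    penalty_ns: f64,       // injected failure penalty, in the same units
    penalized_at: Instant, // when the penalty was last raised
}

impl Metrics {
    /// Penalty projected to `now` (assumed exponential decay toward zero).
    fn penalty_at(&self, now: Instant, decay_secs: f64) -> f64 {
        let idle = now.saturating_duration_since(self.penalized_at).as_secs_f64();
        self.penalty_ns * (-idle / decay_secs).exp()
    }
}

/// Hypothetical per-endpoint wrapper state.
struct Endpoint {
    metrics: Mutex<Metrics>,
    decay_secs: f64, // assumed decay window for the penalty
}

impl Endpoint {
    fn new(rtt: Duration, decay: Duration) -> Self {
        let metrics = Metrics {
            rtt_ns: rtt.as_nanos() as f64,
            penalty_ns: 0.0,
            penalized_at: Instant::now(),
        };
        Self { metrics: Mutex::new(metrics), decay_secs: decay.as_secs_f64() }
    }

    /// Raise the penalty to at least `penalty_ns` (peak semantics).
    fn penalize(&self, penalty_ns: f64, now: Instant) {
        let mut m = self.metrics.lock().unwrap();
        m.penalty_ns = m.penalty_at(now, self.decay_secs).max(penalty_ns);
        m.penalized_at = now;
    }

    /// max(rtt * (pending + 1), penalty); reading never mutates the state.
    fn load(&self, pending: usize, now: Instant) -> f64 {
        let m = self.metrics.lock().unwrap();
        let rtt_load = m.rtt_ns * (pending as f64 + 1.0);
        rtt_load.max(m.penalty_at(now, self.decay_secs))
    }
}
```

Taking `max` rather than a sum means a recent failure dominates until it decays below the RTT-based cost, after which the endpoint competes on latency again.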

unleashed added 5 commits May 21, 2026 20:25
Introduce linkerd-ewma, a general-purpose exponentially-weighted moving
average crate. The crate provides five public methods on an Ewma struct:
new (initializes with INFINITY sentinel), get (returns stored value),
add (blends a new sample using exponential decay), add_peak (replaces
stored value when the new sample exceeds it), and add_rate (derives a
rate from the inverse of the elapsed interval and feeds it through add).

This is added despite tower's PeakEwma because its use is not limited
to middleware-based RTT computation. We specifically plan to use this
implementation for a load biasing feature and a success-rate circuit
breaker policy, neither of which would be possible with PeakEwma.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
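The blending rule above can be sketched as follows. This assumes the common peak-EWMA formulation, where the stored value is weighted by exp(-elapsed / decay); the method names follow the commit message, but the bodies are illustrative guesses rather than the crate's code.

```rust
use std::time::{Duration, Instant};

/// Illustrative EWMA; not the linkerd-ewma source.
pub struct Ewma {
    value: f64,
    updated: Instant,
    decay_secs: f64,
}

impl Ewma {
    /// Starts at the INFINITY sentinel until the first real sample arrives.
    pub fn new(decay: Duration, now: Instant) -> Self {
        Self { value: f64::INFINITY, updated: now, decay_secs: decay.as_secs_f64() }
    }

    pub fn get(&self) -> f64 {
        self.value
    }

    /// Blend `sample` in, weighting the old value by exp(-elapsed / decay).
    /// The INFINITY sentinel is replaced outright.
    pub fn add(&mut self, sample: f64, now: Instant) {
        if self.value.is_infinite() {
            self.value = sample;
        } else {
            let elapsed = now.saturating_duration_since(self.updated).as_secs_f64();
            let w = (-elapsed / self.decay_secs).exp();
            self.value = self.value * w + sample * (1.0 - w);
        }
        self.updated = now;
    }

    /// Jump straight to `sample` when it exceeds the stored value; otherwise
    /// blend normally (which also handles the sentinel via `add`).
    pub fn add_peak(&mut self, sample: f64, now: Instant) {
        if sample > self.value {
            self.value = sample;
            self.updated = now;
        } else {
            self.add(sample, now);
        }
    }

    /// Feed the instantaneous rate 1 / elapsed (events per second).
    pub fn add_rate(&mut self, now: Instant) {
        let elapsed = now.saturating_duration_since(self.updated).as_secs_f64();
        if elapsed > 0.0 {
            self.add(1.0 / elapsed, now);
        }
    }
}
```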
Extend linkerd-ewma with the API surface needed for success-rate circuit
breaking. A MIN_DECAY constant (1 ms) is now applied in both constructors
so that a zero-duration decay never produces division-by-zero or NaN
results in downstream arithmetic.

New methods: new_with_value sets an explicit initial sample instead of the
INFINITY sentinel, reset overwrites both value and timestamp for breaker
recovery, and get_at projects the stored value forward through exponential
decay without mutating internal state.

Also add_peak is now decay-aware: it projects the stored value to the
candidate timestamp before deciding whether to replace it, and it
unconditionally replaces INFINITY so that the first real sample always
takes effect even at the construction timestamp.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
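A sketch of the non-mutating projection and the decay-aware `add_peak`, assuming the stored value decays toward zero (equivalent to blending in a zero sample). `MIN_DECAY`, `new_with_value`, `reset`, and `get_at` are named in the commit; their signatures here, and the `weight` helper, are assumptions.

```rust
use std::time::{Duration, Instant};

/// Floor on the decay window so a zero decay never divides by zero.
const MIN_DECAY: Duration = Duration::from_millis(1);

pub struct Ewma {
    value: f64,
    updated: Instant,
    decay_secs: f64,
}

impl Ewma {
    pub fn new_with_value(value: f64, decay: Duration, now: Instant) -> Self {
        let decay = decay.max(MIN_DECAY);
        Self { value, updated: now, decay_secs: decay.as_secs_f64() }
    }

    /// exp(-elapsed / decay) relative to the last update.
    fn weight(&self, now: Instant) -> f64 {
        let elapsed = now.saturating_duration_since(self.updated).as_secs_f64();
        (-elapsed / self.decay_secs).exp()
    }

    /// Project the stored value to `now` without touching internal state.
    pub fn get_at(&self, now: Instant) -> f64 {
        if self.value.is_infinite() {
            return self.value; // avoid INFINITY * 0.0 = NaN
        }
        self.value * self.weight(now)
    }

    /// Overwrite value and timestamp (e.g. on breaker recovery).
    pub fn reset(&mut self, value: f64, now: Instant) {
        self.value = value;
        self.updated = now;
    }

    /// Decay-aware peak: compare against the value projected to `now`;
    /// the INFINITY sentinel is always replaced by the first real sample.
    pub fn add_peak(&mut self, sample: f64, now: Instant) {
        if self.value.is_infinite() || sample > self.get_at(now) {
            self.reset(sample, now);
        } else {
            let w = self.weight(now);
            self.value = self.value * w + sample * (1.0 - w);
            self.updated = now;
        }
    }
}
```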
Add a retry_after module to linkerd-http-classify with shared parsing
functions for extracting backoff hints from HTTP and gRPC responses.

parse_retry_after handles 429/503 responses with both delay-seconds and
HTTP-date formats per RFC 7231, capping the returned duration at a
caller-specified maximum. parse_grpc_retry_pushback reads the
grpc-retry-pushback-ms header per the gRPC A6 spec, rejecting negative
values and capping positive ones.

We use the httpdate crate for the actual RFC 7231 HTTP-date parsing.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
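The delay-seconds and pushback rules can be sketched with the standard library alone. The function names below are hypothetical, they take raw header strings rather than an `http::HeaderMap`, and the HTTP-date branch (handled by httpdate in this PR) is omitted, so a date value simply yields `None` here.

```rust
use std::time::Duration;

/// Retry-After delay-seconds, honored only on 429/503 and capped at `max`.
/// (Hypothetical helper; the HTTP-date form is not handled in this sketch.)
fn retry_after_secs(status: u16, value: &str, max: Duration) -> Option<Duration> {
    if status != 429 && status != 503 {
        return None;
    }
    let secs: u64 = value.trim().parse().ok()?;
    Some(Duration::from_secs(secs).min(max))
}

/// grpc-retry-pushback-ms: negative values are rejected, positive ones capped.
fn grpc_retry_pushback(value: &str, max: Duration) -> Option<Duration> {
    let ms: i64 = value.trim().parse().ok()?;
    if ms < 0 {
        return None;
    }
    Some(Duration::from_millis(ms as u64).min(max))
}
```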
…re penalties

Introduce the linkerd-load-biaser crate, which wraps any tower::Service to
provide per-endpoint load metrics for P2C balancing. The crate tracks request
latency via EWMA and injects penalties when failure responses are detected,
steering traffic away from unhealthy endpoints.

Penalty injection covers HTTP 429/503/5xx and gRPC RESOURCE_EXHAUSTED/UNAVAILABLE
trailers-only responses. Streaming gRPC failures are not covered, since only
response headers are available at this point. For responses that carry a
backoff hint (Retry-After on HTTP 429/503, or grpc-retry-pushback-ms on gRPC
trailers-only errors), the penalty is amplified so that the EWMA value remains
meaningful through the server-requested backoff window. The amplification is
clamped so that it cannot overflow to infinity and permanently disable the endpoint.

The load metric is computed as `max(rtt * (pending + 1), penalty)`, where
`rtt` is the peak-EWMA latency, and `pending` is the number of in-flight
requests. This is returned via tower::load::Load for direct P2C
integration.

The load biaser is disabled by default, preserving RTT-only behavior
(PeakEwma equivalent), unless explicitly activated.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
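The commit does not spell out the amplification formula. One way to satisfy "remains meaningful through the backoff window", assuming the penalty decays as `P * exp(-t / decay)`, is to start from `P * exp(hint / decay)`, which decays back to `P` exactly when the hint expires. `amplified_penalty` is a hypothetical helper illustrating that choice and the clamp.

```rust
use std::time::Duration;

/// Hypothetical: amplify `base` so that, after decaying for `hint` under an
/// exp(-t / decay) model, it is back at `base`; clamp to `max_penalty`.
fn amplified_penalty(base: f64, hint: Duration, decay: Duration, max_penalty: f64) -> f64 {
    // Guard against a zero decay window (mirrors the MIN_DECAY idea).
    let ratio = hint.as_secs_f64() / decay.as_secs_f64().max(1e-3);
    // exp() overflows to +inf for long hints; the clamp keeps it finite.
    let amplified = base * ratio.exp();
    amplified.min(max_penalty)
}
```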
These cover the complete load biasing lifecycle, including penalty
injection, hint parsing, cancellation safety via PinnedDrop, and
backwards-compatible behavior when disabled (i.e. RTT-only behavior
equivalent to PeakEwma).

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
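Cancellation safety here means that dropping an in-flight response future must still release its pending-request slot. A minimal std-only sketch of that pattern, assuming a shared atomic counter: the PR relies on `PinnedDrop` (presumably from pin-project) on a pinned future, whereas `PendingGuard` below is a hypothetical stand-in using a plain `Drop` impl.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts one in-flight request for as long as it is alive.
struct PendingGuard(Arc<AtomicUsize>);

impl PendingGuard {
    fn new(pending: Arc<AtomicUsize>) -> Self {
        pending.fetch_add(1, Ordering::AcqRel);
        Self(pending)
    }
}

impl Drop for PendingGuard {
    // Runs on completion *and* on cancellation (the owning future is dropped),
    // so the pending count used by the load metric can never leak.
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::AcqRel);
    }
}
```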
@unleashed unleashed requested a review from cratelyn May 21, 2026 18:36
@unleashed unleashed requested a review from a team as a code owner May 21, 2026 18:36
@raykroeker left a comment

@unleashed Thanks for the documentation. It really helps understand the intent.
+100

Comment thread linkerd/http/classify/Cargo.toml Outdated
Comment thread linkerd/ewma/src/lib.rs
Comment thread linkerd/ewma/Cargo.toml Outdated
publish = { workspace = true }

[dependencies]
tokio = { version = "1", features = ["time"] }
Member

maybe default-features = false would be nice here too, since we're not setting up a runtime or anything else in this crate. time interfaces seem like all we're using here!

#[test]
fn parse_grpc_pushback_positive() {
    let mut headers = HeaderMap::new();
    headers.insert("grpc-retry-pushback-ms", HeaderValue::from_static("5000"));
Member

should this and the tests below use the GRPC_RETRY_PUSHBACK_MS? tests above use http::header::RETRY_AFTER, so that'd be consistent.

Comment thread linkerd/load-biaser/Cargo.toml Outdated
Comment thread linkerd/load-biaser/src/lib.rs
cratelyn previously approved these changes May 22, 2026
Member

@cratelyn left a comment

thanks for breaking these additions out into a standalone pull request, separate from the changes we'll be making in our proxy stack(s). that really helped expedite review of this.

Comment thread linkerd/load-biaser/src/lib.rs Outdated
Comment thread linkerd/load-biaser/src/lib.rs Outdated
/// via `rate_limit_hint(max)`, so different callers (e.g. load biaser vs
/// circuit breaker) can use different maximums from the same cached value.
#[derive(Clone, Copy, Debug)]
pub struct CachedRateLimitHint(pub Duration);
Member

does this need to be pub?

the uncapped value, along with the fact that this is intended for use with rate_limit_hint, makes me wonder if a constructor pub fn new could work for creating these, while preventing accidents with an uncapped duration in the future.

Member Author

Good suggestion, I'll adapt it to use a constructor.

unleashed and others added 2 commits May 26, 2026 13:40
Co-authored-by: katelyn martin <kate@buoyant.io>
Co-authored-by: katelyn martin <kate@buoyant.io>
unleashed added 8 commits May 26, 2026 20:52
…_rate_limit_hint

The _max parameter was accepted for API symmetry with rate_limit_hint(max) but
was intentionally unused: the method always caches the uncapped raw value so each
consumer can apply its own cap via rate_limit_hint(max). Remove the parameter
for now; we probably won't need it, and it can be reintroduced if we do.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
…or and accessor

Make the inner Duration field private and provide CachedRateLimitHint::new() for
construction and duration_capped(max) for reads. This prevents consumers from
bypassing the per-caller cap that rate_limit_hint(max) enforces, since the cached
value is intentionally uncapped.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
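A sketch of the resulting shape, using the `new()` and `duration_capped(max)` names from the commit; the exact signatures are assumed. Because the field is private, code outside the defining module can only observe the hint through a caller-supplied cap.

```rust
use std::time::Duration;

/// Uncapped backoff hint; readers must always supply their own maximum.
#[derive(Clone, Copy, Debug)]
pub struct CachedRateLimitHint(Duration);

impl CachedRateLimitHint {
    pub fn new(raw: Duration) -> Self {
        Self(raw)
    }

    /// The cached hint, capped at this caller's `max`.
    pub fn duration_capped(&self, max: Duration) -> Duration {
        self.0.min(max)
    }
}
```

This lets the load biaser and the circuit breaker read the same cached value with different caps, without either one seeing the raw duration.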
Explain why a standalone EWMA crate exists instead of using Tower's
RttEstimate: it is private, mutates on read, and cannot support the
penalty dimension that failure-aware load balancing requires.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The crate only uses tokio::time, so disable the default feature set to
avoid pulling unnecessary features into the dependency declaration.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The cancellation test uses tokio::sync::oneshot which requires the sync
feature. This compiled only because workspace feature unification pulled
it in from other crates.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Replace raw string literals with the module-level constant for
consistency with how HTTP tests use http::header::RETRY_AFTER.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Consistent with Ewma::new which already has this attribute.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
@cratelyn cratelyn dismissed their stale review May 26, 2026 20:12

the shape of this looks good, but i want to hold off on merging it until we have consensus about load biasing and changes to the control plane.

unleashed added 5 commits May 28, 2026 11:46
Inspect the grpc-status header only on HTTP 200 responses whose
content-type starts with application/grpc. Without this a non-gRPC
upstream that happens to include a grpc-status header would be
considered a gRPC failure and penalized by the load biaser.

The same check is applied to grpc-retry-pushback-ms parsing in
the ResponseFailureHint trait implementation.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
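A sketch of the gate, assuming the HTTP status, content-type, and grpc-status values have already been extracted as plain values; `grpc_status_if_grpc` is a hypothetical helper, not the crate's function.

```rust
/// Only treat grpc-status as meaningful on an HTTP 200 whose content-type
/// starts with "application/grpc"; anything else is not a gRPC failure.
fn grpc_status_if_grpc(
    status: u16,
    content_type: Option<&str>,
    grpc_status: Option<&str>,
) -> Option<u32> {
    let is_grpc = status == 200
        && content_type.map_or(false, |ct| ct.starts_with("application/grpc"));
    if !is_grpc {
        return None;
    }
    grpc_status?.trim().parse().ok()
}
```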
Up until now we mapped every non-zero gRPC status code to
FailureHint::InternalError, penalizing client errors like CANCELLED,
INVALID_ARGUMENT, NOT_FOUND, etc. These don't indicate server
health issues and should not steer traffic away from the endpoint.

Restrict penalty injection to server-side error codes that indicate
endpoint problems: UNKNOWN (2), DEADLINE_EXCEEDED (4), INTERNAL (13),
and DATA_LOSS (15), alongside the existing RESOURCE_EXHAUSTED (8)
and UNAVAILABLE (14) statuses.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
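The resulting filter might look like this; `is_penalized_grpc_code` is a hypothetical name, and the numeric values are the standard gRPC status codes listed above.

```rust
/// True only for gRPC status codes that point at an endpoint-side problem.
fn is_penalized_grpc_code(code: u32) -> bool {
    matches!(
        code,
        2      // UNKNOWN
        | 4    // DEADLINE_EXCEEDED
        | 8    // RESOURCE_EXHAUSTED
        | 13   // INTERNAL
        | 14   // UNAVAILABLE
        | 15   // DATA_LOSS
    )
}
```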
Ensure only those gRPC status codes indicating server-side errors
inject penalties.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Verify that consecutive 429 responses at 1s intervals keep the
penalty at the configured level, confirming the EWMA peak resets
the decayed value rather than accumulating.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
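The property under test can be reproduced with a small decay-aware peak, assuming exponential decay toward zero and timestamps in plain seconds (both assumptions, as is the `Penalty` type). Each 429 arrives after the previous penalty has partially decayed, so the peak restores the configured level instead of stacking on top of the remainder.

```rust
/// Hypothetical penalty tracker; smaller samples are ignored in this sketch.
struct Penalty {
    value: f64,
    at_secs: f64,
    decay_secs: f64,
}

impl Penalty {
    /// Stored penalty projected to `now_secs`.
    fn projected(&self, now_secs: f64) -> f64 {
        let idle = (now_secs - self.at_secs).max(0.0);
        self.value * (-idle / self.decay_secs).exp()
    }

    /// Replace the value only if the sample meets or exceeds the projection.
    fn add_peak(&mut self, sample: f64, now_secs: f64) {
        if sample >= self.projected(now_secs) {
            self.value = sample;
            self.at_secs = now_secs;
        }
    }
}
```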
Add a `last_update()` getter that returns the timestamp of the most
recent EWMA update. Callers that need to detect staleness (i.e. idle
periods after which the EWMA has decayed to the point that a single sample
dominates) can compare it against the current time and, for example,
require more samples before making decisions.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
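One way a caller might use such a getter, sketched with a hypothetical `SampleGate` that tracks the last update time itself: after an idle gap longer than `max_idle`, it requires `min_samples` fresh samples before reporting that the average can be trusted.

```rust
use std::time::{Duration, Instant};

struct SampleGate {
    last_update: Option<Instant>,
    fresh_samples: u32,
    max_idle: Duration,
    min_samples: u32,
}

impl SampleGate {
    fn new(max_idle: Duration, min_samples: u32) -> Self {
        Self { last_update: None, fresh_samples: 0, max_idle, min_samples }
    }

    fn record(&mut self, now: Instant) {
        // Compare against the previous update, as a last_update() getter allows.
        if let Some(last) = self.last_update {
            if now.saturating_duration_since(last) > self.max_idle {
                self.fresh_samples = 0; // stale average: start counting again
            }
        }
        self.fresh_samples += 1;
        self.last_update = Some(now);
    }

    fn ready(&self) -> bool {
        self.fresh_samples >= self.min_samples
    }
}
```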
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
