grpc_field_extraction: add value_type: STRING to fix ratelimit filter integration#45948
Draft
yusofg2 wants to merge 2 commits into
Draft
grpc_field_extraction: add value_type: STRING to fix ratelimit filter integration#45948yusofg2 wants to merge 2 commits into
yusofg2 wants to merge 2 commits into
Conversation
… integration The extraction filter always writes dynamic metadata as a ListValue, even for singular scalar fields. The ratelimit filter's MetaDataAction calls .string_value() on whatever Value it finds; calling .string_value() on a ListValue returns "" (wrong oneof arm), causing populateDescriptor() to return false and silently disabling rate limiting for every gRPC request. Add a ValueType enum to RequestFieldValueDisposition (LIST = default, STRING = opt-in). When STRING is set, the filter promotes list_value().values(0) to a StringValue before writing to metadata, making the value consumable by the ratelimit filter's metadata descriptor action without a Lua bridge. LIST remains the default for backward compatibility. STRING is only correct when every segment of the field path is non-repeated; the proto comment documents the repeated-field undercounting hazard explicitly. Signed-off-by: Yusof Ganji <yganji@salesforce.com>
… integration The extraction filter always writes dynamic metadata as a ListValue, even for singular scalar fields. The ratelimit filter's MetaDataAction calls .string_value() on whatever Value it finds; calling .string_value() on a ListValue returns "" (wrong oneof arm), causing populateDescriptor() to return false and silently disabling rate limiting for every gRPC request. Add a ValueType enum to RequestFieldValueDisposition (LIST = default, STRING = opt-in). When STRING is set, the filter promotes list_value().values(0) to a StringValue before writing to metadata, making the value consumable by the ratelimit filter's metadata descriptor action without a Lua bridge. LIST remains the default for backward compatibility. STRING is only correct when every segment of the field path is non-repeated; the proto comment documents the repeated-field undercounting hazard explicitly. Signed-off-by: Yusof Ganji <yganji@salesforce.com>
|
Hi @yusofg2, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Commit Message:
The grpc extraction filter always writes dynamic metadata as a ListValue, even for singular scalar fields. The ratelimit filter's MetaDataAction calls .string_value() on whatever Value it finds; calling .string_value() on a ListValue returns "" (wrong oneof arm), causing populateDescriptor() to return false and silently disabling rate limiting for every gRPC request.
Add a ValueType enum to RequestFieldValueDisposition (LIST = default, STRING = opt-in). When STRING is set, the filter promotes list_value().values(0) to a StringValue before writing to metadata, making the value consumable by the ratelimit filter's metadata descriptor action without a Lua bridge.
LIST remains the default for backward compatibility. STRING is only correct when every segment of the field path is non-repeated; the proto comment documents the repeated-field undercounting hazard explicitly.
Co-Authored-By: Claude Sonnet 4.6 (1M context) noreply@anthropic.com
Additional Description:
gRPC Field Extraction ↔ Rate Limit Filter Incompatibility
envoy.filters.http.grpc_field_extractionandenvoy.filters.http.ratelimitareincompatible by default. When a gRPC field is extracted and the ratelimit filter
is configured to use it as a descriptor, rate limiting is silently disabled for every
request — the ratelimit service is never called.
This is not a configuration error. It is a type mismatch baked into both filters.
Root Cause
What
grpc_field_extractionwritesThe extraction filter uses the
proto_field_extractionlibrary'sExtractValue(),which always returns a
Protobuf::Valuewithkind = kListValueon success —even for singular scalar fields. For a field
tenant_id: "acme"the metadata entrywritten to
envoy.filters.http.grpc_field_extractionis:{ "tenant_id": ["acme"] } // ListValue wrapping a single StringValue elementWhat
ratelimitreadsMetaDataAction::populateDescriptorinsource/common/router/router_ratelimit.cc:241:metadataValue(...).string_value()calls the protobufstring_value()accessoron a
ListValue. ProtobufValueis aoneof— calling the wrong arm returns thezero value, which for strings is
"".The failure chain
grpc_field_extractionwritesListValue { values: ["acme"] }to dynamic metadataMetaDataAction::populateDescriptorcalls.string_value()on theListValue""(wrongoneofarm)metadata_string_valueis""default_value_check — also empty (no default configured)skip_if_absent_which isfalseby defaultpopulateDescriptorreturnsfalse— the entire rate limit check is abortedThe outcome is worse than "rate limiting doesn't fire." With
skip_if_absent: false(the default), rate limiting is completely disabled for every affected request.
With
skip_if_absent: truethe descriptor is silently skipped, which is less badbut still means the field has no effect on rate limiting.
Options Considered
Option A —
value_type: STRINGingrpc_field_extraction(implemented in this PR)Add a
value_typeenum field toRequestFieldValueDispositionin the filter'sproto. When set to
STRING, the filter promotes the firstListValueelement toa
StringValuebefore writing to metadata. The ratelimit filter then reads aStringValueand.string_value()returns the correct value.Config:
What gets written to metadata:
{ "tenant_id": "acme" } // StringValue — ratelimit reads this correctlyCompatibility matrix:
LIST(default)STRING(opt-in)ratelimitmetadata descriptorext_authzforwardingrbacpolicy matching%DYNAMIC_METADATA%Limitations:
STRINGmode takes onlyvalues(0), silently discarding all subsequent elements.This is correct for singular scalar fields (
tenant_id,key.display_name) butcauses rate-limit undercounting for repeated fields: a request touching N values
is charged only against the first value's bucket.
value_type: STRINGis correct if and only if every segment of thefield path is singular (non-repeated) in the proto definition.
ext_authz) and a string (forratelimit) are neededfrom the same field, a Lua bridge is required — see Option C.
Code locations:
api/envoy/extensions/filters/http/grpc_field_extraction/v3/config.protosource/extensions/filters/http/grpc_field_extraction/extractor_impl.ccsource/extensions/filters/http/grpc_field_extraction/extractor_impl.htest/extensions/filters/http/grpc_field_extraction/filter_test.ccOption B — Fix
MetaDataActionin the ratelimit filter to unwrapListValueModify
router_ratelimit.cc:242to checkkind_case()and unwrapvalues(0)whenthe value is a
ListValue:Would it break anything?
In practice, no. Any existing consumer hitting a
ListValuefrom a metadata fieldis already receiving
""and their rate limiting is already silently broken. Thereis nothing to regress.
Problems with this approach:
metadata: { descriptor_key: tenant_id }sees no signal that list-unwrappingis happening. The extraction filter has the proto context; the ratelimit filter
does not.
extraction semantics. The extraction filter is the right place to own value-type
decisions because it knows whether the field is singular or repeated.
values(0)silently drops the rest, same as Option A, but now without even a config signal
to warn the operator.
Option C — Ratelimit filter fan-out over
ListValueelementsA more ambitious extension: when
MetaDataActionsees aListValue, emit oneRateLimitDescriptorper element. For a request withtenant_ids: ["a", "b", "c"],three descriptors would be submitted, charging each bucket.
This would solve the repeated-field undercounting problem that both A and B
cannot address. But:
the ratelimit service wire protocol (lyft/ratelimit expects a specific number of
descriptors per call).
(deduplicate hits, etc.).
Verdict: Maybe a long-term solution for repeated-field rate limiting, but a
different and much larger feature. Out of scope for the current spike.
Option D — Lua bridge for dual-consumer use case
When the same extracted field must be consumed by both the ratelimit filter
(needs
StringValue) and another filter (needs the fullListValue):value_type: LIST(full list preserved).grpc_field_extractionthat reads the list and writesa separate string key for the ratelimit filter:
The ratelimit descriptor then points at namespace
ratelimit.extractedinstead ofenvoy.filters.http.grpc_field_extraction.When this is needed: Only when the same field must feed both ratelimit
(string) and ext_authz / RBAC / access logs (full list). For pure GRLS pipelines,
value_type: STRING(Option A) is sufficient.Field Path Compatibility Reference
STRING?tenant_idkey.display_namekey.namesupported_types.stringrepeated_supported_types.stringkeys.namekeysis repeatedkey.tagsSafe rule:
value_type: STRINGis correct iff every segment of the dot-separatedpath is declared
singular(notrepeated) in the proto definition.Risk Level: Low
Testing:
To validate the fix end-to-end, a local test environment was assembled with the patched Envoy binary running as a Docker container fronting a gRPC echo service, with a Lyft ratelimit (GRLS) instance backed by Redis. A test script
drives gRPC requests through the full filter chain — grpc_field_extraction (with value_type: STRING) followed by the ratelimit filter's metadata descriptor action — with no Lua filter present. The script verifies that early requests
succeed and that once the configured limit is reached the ratelimit filter correctly rejects subsequent requests, confirming that the extracted field value reached GRLS and was acted on. A secondary check queries Redis directly to
confirm GRLS wrote a counter key for the descriptor, ruling out any false-positive from failure_mode_deny: false silently passing requests through. As a negative control, the same test was run against an unpatched Envoy with the
default LIST extraction mode, where all requests succeed, GRLS receives no ShouldRateLimit calls, and Redis remains empty — confirming the incompatibility is the default behavior and the fix is what enables the integration.
Docs Changes: N/A
Release Notes: Added changelog fragment.
Platform Specific Features: None