Skip to content

grpc_field_extraction: add value_type: STRING to fix ratelimit filter integration#45948

Draft
yusofg2 wants to merge 2 commits into
envoyproxy:mainfrom
yusofg2:grpc-extraction2string
Draft

grpc_field_extraction: add value_type: STRING to fix ratelimit filter integration#45948
yusofg2 wants to merge 2 commits into
envoyproxy:mainfrom
yusofg2:grpc-extraction2string

Conversation

@yusofg2

@yusofg2 yusofg2 commented Jul 2, 2026

Copy link
Copy Markdown

Commit Message:
The grpc extraction filter always writes dynamic metadata as a ListValue, even for singular scalar fields. The ratelimit filter's MetaDataAction calls .string_value() on whatever Value it finds; calling .string_value() on a ListValue returns "" (wrong oneof arm), causing populateDescriptor() to return false and silently disabling rate limiting for every gRPC request.

Add a ValueType enum to RequestFieldValueDisposition (LIST = default, STRING = opt-in). When STRING is set, the filter promotes list_value().values(0) to a StringValue before writing to metadata, making the value consumable by the ratelimit filter's metadata descriptor action without a Lua bridge.

LIST remains the default for backward compatibility. STRING is only correct when every segment of the field path is non-repeated; the proto comment documents the repeated-field undercounting hazard explicitly.

Co-Authored-By: Claude Sonnet 4.6 (1M context) noreply@anthropic.com

Additional Description:

gRPC Field Extraction ↔ Rate Limit Filter Incompatibility

envoy.filters.http.grpc_field_extraction and envoy.filters.http.ratelimit are
incompatible by default. When a gRPC field is extracted and the ratelimit filter
is configured to use it as a descriptor, rate limiting is silently disabled for every
request — the ratelimit service is never called.

This is not a configuration error. It is a type mismatch baked into both filters.


Root Cause

What grpc_field_extraction writes

The extraction filter uses the proto_field_extraction library's ExtractValue(),
which always returns a Protobuf::Value with kind = kListValue on success —
even for singular scalar fields. For a field tenant_id: "acme" the metadata entry
written to envoy.filters.http.grpc_field_extraction is:

{ "tenant_id": ["acme"] }   // ListValue wrapping a single StringValue element

What ratelimit reads

MetaDataAction::populateDescriptor in
source/common/router/router_ratelimit.cc:241:

const std::string metadata_string_value =
    Envoy::Config::Metadata::metadataValue(metadata_source, metadata_key_).string_value();

if (!metadata_string_value.empty()) {
    descriptor_entry = {descriptor_key_, metadata_string_value};
    return true;
} else if (metadata_string_value.empty() && !default_value_.empty()) {
    descriptor_entry = {descriptor_key_, default_value_};
    return true;
}
return skip_if_absent_;   // defaults to false

metadataValue(...).string_value() calls the protobuf string_value() accessor
on a ListValue. Protobuf Value is a oneof — calling the wrong arm returns the
zero value, which for strings is "".

The failure chain

Step What happens
1 grpc_field_extraction writes ListValue { values: ["acme"] } to dynamic metadata
2 MetaDataAction::populateDescriptor calls .string_value() on the ListValue
3 Protobuf returns "" (wrong oneof arm)
4 metadata_string_value is ""
5 Falls through to default_value_ check — also empty (no default configured)
6 Returns skip_if_absent_ which is false by default
7 populateDescriptor returns false — the entire rate limit check is aborted
8 The ratelimit service is never called; the request passes through without limit

The outcome is worse than "rate limiting doesn't fire." With skip_if_absent: false
(the default), rate limiting is completely disabled for every affected request.
With skip_if_absent: true the descriptor is silently skipped, which is less bad
but still means the field has no effect on rate limiting.


Options Considered

Option A — value_type: STRING in grpc_field_extraction (implemented in this PR)

Add a value_type enum field to RequestFieldValueDisposition in the filter's
proto. When set to STRING, the filter promotes the first ListValue element to
a StringValue before writing to metadata. The ratelimit filter then reads a
StringValue and .string_value() returns the correct value.

Config:

request_field_extractions:
  tenant_id:
    value_type: STRING    # opt-in; LIST is the default

What gets written to metadata:

{ "tenant_id": "acme" }   // StringValue — ratelimit reads this correctly

Compatibility matrix:

Consumer LIST (default) STRING (opt-in)
ratelimit metadata descriptor ❌ silently broken ✅ works
ext_authz forwarding
rbac policy matching
Lua scripts
Access logs %DYNAMIC_METADATA% JSON array bare string
WASM extensions

Limitations:

  • STRING mode takes only values(0), silently discarding all subsequent elements.
    This is correct for singular scalar fields (tenant_id, key.display_name) but
    causes rate-limit undercounting for repeated fields: a request touching N values
    is charged only against the first value's bucket.
  • The safe rule: value_type: STRING is correct if and only if every segment of the
    field path is singular (non-repeated) in the proto definition.
  • If both the full list (for ext_authz) and a string (for ratelimit) are needed
    from the same field, a Lua bridge is required — see Option C.

Code locations:

  • Proto definition: api/envoy/extensions/filters/http/grpc_field_extraction/v3/config.proto
  • Implementation: source/extensions/filters/http/grpc_field_extraction/extractor_impl.cc
  • Header (parallel map): source/extensions/filters/http/grpc_field_extraction/extractor_impl.h
  • Tests: test/extensions/filters/http/grpc_field_extraction/filter_test.cc

Option B — Fix MetaDataAction in the ratelimit filter to unwrap ListValue

Modify router_ratelimit.cc:242 to check kind_case() and unwrap values(0) when
the value is a ListValue:

// hypothetical change
const Protobuf::Value& raw = Envoy::Config::Metadata::metadataValue(...);
const std::string metadata_string_value =
    raw.kind_case() == Protobuf::Value::kListValue && raw.list_value().values_size() > 0
        ? raw.list_value().values(0).string_value()
        : raw.string_value();

Would it break anything?
In practice, no. Any existing consumer hitting a ListValue from a metadata field
is already receiving "" and their rate limiting is already silently broken. There
is nothing to regress.

Problems with this approach:

  • Implicit behavior change. An operator reading
    metadata: { descriptor_key: tenant_id } sees no signal that list-unwrapping
    is happening. The extraction filter has the proto context; the ratelimit filter
    does not.
  • Mixes concerns. The ratelimit filter becomes implicitly aware of gRPC
    extraction semantics. The extraction filter is the right place to own value-type
    decisions because it knows whether the field is singular or repeated.
  • Still has the repeated-field undercounting problem — taking values(0)
    silently drops the rest, same as Option A, but now without even a config signal
    to warn the operator.

Option C — Ratelimit filter fan-out over ListValue elements

A more ambitious extension: when MetaDataAction sees a ListValue, emit one
RateLimitDescriptor per element. For a request with tenant_ids: ["a", "b", "c"],
three descriptors would be submitted, charging each bucket.

This would solve the repeated-field undercounting problem that both A and B
cannot address. But:

  • It changes the number of descriptors emitted per request — a breaking change to
    the ratelimit service wire protocol (lyft/ratelimit expects a specific number of
    descriptors per call).
  • The ratelimit service itself would need changes to handle fan-out correctly
    (deduplicate hits, etc.).
  • Much larger scope; not achievable within the extraction filter.

Verdict: Maybe a long-term solution for repeated-field rate limiting, but a
different and much larger feature. Out of scope for the current spike.


Option D — Lua bridge for dual-consumer use case

When the same extracted field must be consumed by both the ratelimit filter
(needs StringValue) and another filter (needs the full ListValue):

  1. Extract with value_type: LIST (full list preserved).
  2. Insert a Lua filter after grpc_field_extraction that reads the list and writes
    a separate string key for the ratelimit filter:
local meta = request_handle:streamInfo():dynamicMetadata()
local ns = "envoy.filters.http.grpc_field_extraction"
local extracted = meta:get(ns)
if extracted and extracted["tenant_id"] then
  local list = extracted["tenant_id"]
  if #list > 0 then
    request_handle:streamInfo():dynamicMetadata():set(
      "ratelimit.extracted", "tenant_id", list[1])
  end
end

The ratelimit descriptor then points at namespace ratelimit.extracted instead of
envoy.filters.http.grpc_field_extraction.

When this is needed: Only when the same field must feed both ratelimit
(string) and ext_authz / RBAC / access logs (full list). For pure GRLS pipelines,
value_type: STRING (Option A) is sufficient.


Field Path Compatibility Reference

Field path Path is safe for STRING? Notes
tenant_id Top-level singular scalar
key.display_name All segments singular
key.name All segments singular
supported_types.string Singular nested message, singular terminal field
repeated_supported_types.string Terminal field is repeated
keys.name Intermediate keys is repeated
key.tags Terminal field is repeated

Safe rule: value_type: STRING is correct iff every segment of the dot-separated
path is declared singular (not repeated) in the proto definition.


Risk Level: Low

Testing:
To validate the fix end-to-end, a local test environment was assembled with the patched Envoy binary running as a Docker container fronting a gRPC echo service, with a Lyft ratelimit (GRLS) instance backed by Redis. A test script
drives gRPC requests through the full filter chain — grpc_field_extraction (with value_type: STRING) followed by the ratelimit filter's metadata descriptor action — with no Lua filter present. The script verifies that early requests
succeed and that once the configured limit is reached the ratelimit filter correctly rejects subsequent requests, confirming that the extracted field value reached GRLS and was acted on. A secondary check queries Redis directly to
confirm GRLS wrote a counter key for the descriptor, ruling out any false-positive from failure_mode_deny: false silently passing requests through. As a negative control, the same test was run against an unpatched Envoy with the
default LIST extraction mode, where all requests succeed, GRLS receives no ShouldRateLimit calls, and Redis remains empty — confirming the incompatibility is the default behavior and the fix is what enables the integration.

Docs Changes: N/A
Release Notes: Added changelog fragment.
Platform Specific Features: None

yusofg2 added 2 commits July 2, 2026 11:53
… integration

The extraction filter always writes dynamic metadata as a ListValue, even
for singular scalar fields. The ratelimit filter's MetaDataAction calls
.string_value() on whatever Value it finds; calling .string_value() on a
ListValue returns "" (wrong oneof arm), causing populateDescriptor() to
return false and silently disabling rate limiting for every gRPC request.

Add a ValueType enum to RequestFieldValueDisposition (LIST = default,
STRING = opt-in). When STRING is set, the filter promotes
list_value().values(0) to a StringValue before writing to metadata,
making the value consumable by the ratelimit filter's metadata descriptor
action without a Lua bridge.

LIST remains the default for backward compatibility. STRING is only
correct when every segment of the field path is non-repeated; the proto
comment documents the repeated-field undercounting hazard explicitly.

Signed-off-by: Yusof Ganji <yganji@salesforce.com>
… integration

The extraction filter always writes dynamic metadata as a ListValue, even
for singular scalar fields. The ratelimit filter's MetaDataAction calls
.string_value() on whatever Value it finds; calling .string_value() on a
ListValue returns "" (wrong oneof arm), causing populateDescriptor() to
return false and silently disabling rate limiting for every gRPC request.

Add a ValueType enum to RequestFieldValueDisposition (LIST = default,
STRING = opt-in). When STRING is set, the filter promotes
list_value().values(0) to a StringValue before writing to metadata,
making the value consumable by the ratelimit filter's metadata descriptor
action without a Lua bridge.

LIST remains the default for backward compatibility. STRING is only
correct when every segment of the field path is non-repeated; the proto
comment documents the repeated-field undercounting hazard explicitly.

Signed-off-by: Yusof Ganji <yganji@salesforce.com>
@yusofg2 yusofg2 requested a deployment to external-contributors July 2, 2026 19:47 — with GitHub Actions Waiting
@repokitteh-read-only

Copy link
Copy Markdown

Hi @yusofg2, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #45948 was opened by yusofg2.

see: more, trace.

@repokitteh-read-only

Copy link
Copy Markdown

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #45948 was opened by yusofg2.

see: more, trace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant