Commit 7a0a1ed

update bullet list formatting.
1 parent 061846d commit 7a0a1ed

1 file changed

+16
-13
lines changed

site-src/guides/implementers.md

Lines changed: 16 additions & 13 deletions
@@ -53,7 +53,6 @@ spec:
5353
extensionRef:
5454
name: vllm-llama3-8b-instruct-epp
5555
```
56-
mkdocs.yml
5756
There are mainly two options for how to treat the Inference Pool in your controller.
5857

5958
**Option 1: Shadow Service Creation**
@@ -103,40 +102,44 @@ The EPP communicates the chosen endpoint to the proxy via the `x-gateway-destina
103102
To conform with the Inference Extensions API, Gateway data planes must implement the [Endpoint Picker Protocol](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/004-endpoint-picker-protocol).
104103

105104
At a high level, the protocol consists of metadata key/value pairs exchanged between the data plane and extensions containing relevant endpoint selection information:
105+
106106
- From extension to data plane: the metadata contains the selected endpoints.
107107
- From data plane to extension: the metadata contains an optional subset of endpoints that the extension should pick from.
108108

109109
The key requirements for implementing the GIE protocol are as follows:
110+
110111
- Relies on the [ext_proc (External Processing)](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) protocol as the foundation for exchanging HTTP stream payload and metadata throughout the various HTTP lifecycle events; several key details:
111-
- ext_proc relies on gRPC (bidirectional streaming) as the transport protocol
112-
- ext_proc supports several processing modes, including buffered and streaming options for payload exchange
113-
- ext_proc supports structured metadata passed as part of requests and responses for each processing stage
112+
- ext_proc relies on gRPC (bidirectional streaming) as the transport protocol
113+
- ext_proc supports several processing modes, including buffered and streaming options for payload exchange
114+
- ext_proc supports structured metadata passed as part of requests and responses for each processing stage
114115
- The Inference Extension protocol exchanges data between proxy and extension servers as metadata — either via HTTP headers or the structured fields in the ext_proc messages — using well defined names and values:
115-
- **x-gateway-destination-endpoint**
116-
- Informs the proxy of the selected (primary) endpoint along with fallback endpoints for retries (if needed).
117-
- Sent by the extension service to the data plane as [ProcessingResponse](https://github.com/envoyproxy/envoy/blob/v1.34.2/api/envoy/service/ext_proc/v3/external_processor.proto) metadata in response to HTTP request stage events.
118-
- **x-gateway-destination-endpoint-subset (optional)**
119-
- Contains the subset of endpoints the extension should pick from.
120-
- Sent by the data plane to the extension service as [ProcessingRequest](https://github.com/envoyproxy/envoy/blob/v1.34.2/api/envoy/service/ext_proc/v3/external_processor.proto) metadata during HTTP request stage events
116+
- **x-gateway-destination-endpoint**
117+
- Informs the proxy of the selected (primary) endpoint along with fallback endpoints for retries (if needed).
118+
- Sent by the extension service to the data plane as [ProcessingResponse](https://github.com/envoyproxy/envoy/blob/v1.34.2/api/envoy/service/ext_proc/v3/external_processor.proto) metadata in response to HTTP request stage events.
119+
- **x-gateway-destination-endpoint-subset (optional)**
120+
- Contains the subset of endpoints the extension should pick from.
121+
- Sent by the data plane to the extension service as [ProcessingRequest](https://github.com/envoyproxy/envoy/blob/v1.34.2/api/envoy/service/ext_proc/v3/external_processor.proto) metadata during HTTP request stage events
121122

122123
#### External Processing Protocol
123124

124125
ext_proc is a mature protocol, implemented by Envoy to support communication with external processing services. It has gained adoption across several types of use cases:
126+
125127
- [Google Cloud Load Balancer and CDN Service Extensions](https://cloud.google.com/service-extensions/docs/overview)
126-
- Supports generic “service callouts” not restricted to genAI serving or AI use cases; e.g., mutation of cache keys for caching.
128+
- Supports generic “service callouts” not restricted to genAI serving or AI use cases; e.g., mutation of cache keys for caching.
127129
- [Alibaba Cloud](https://www.alibabacloud.com/help/en/asm/user-guide/use-envoy-external-processing-for-custom-processing-of-requests)
128130
- GenAI serving
129131
- [AIBrix](https://aibrix.readthedocs.io/latest/features/gateway-plugins.html)
130-
- Enables inference optimized routing for the Gateway in Bytedance’s genAI inference infrastructure.
132+
- Enables inference optimized routing for the Gateway in Bytedance’s genAI inference infrastructure.
131133
- [Envoy AI Gateway](https://aigateway.envoyproxy.io/docs/concepts/architecture/data-plane)
132-
- Enables AI model based routing, request transformations and upstream authn.
134+
- Enables AI model based routing, request transformations and upstream authn.
133135
- [Atlassian Guard](https://www.atlassian.com/software/guard)
134136

135137
Supporting this broad range of extension capabilities (including for inference, as evidenced above) requires hooks into all HTTP stream (i.e., request and response) lifecycle events as well as the corresponding headers, trailers and payload. This is the core value proposition for ext_proc, along with configurable options (such as for buffering and streaming modes) that enable its use across a variety of deployment scenarios and networking topologies.
136138

137139
#### Native Implementations
138140

139141
Several native implementations can be used as references:
142+
140143
- A fully featured [reference implementation](https://github.com/envoyproxy/envoy/tree/main/source/extensions/filters/http/ext_proc) (C++) can be found in the Envoy GitHub repository.
141144
- A second implementation (Rust, non-Envoy) is available in [Agent Gateway](https://github.com/agentgateway/agentgateway/blob/v0.5.2/crates/proxy/src/ext_proc.rs).
142145

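Note on the changed section: the metadata exchange described in the re-indented bullets (an optional `x-gateway-destination-endpoint-subset` from the data plane, answered by `x-gateway-destination-endpoint` from the extension) can be sketched as follows. This is only an illustration, not the EPP implementation. The function names `pick_endpoints` and `destination_metadata`, the `max_fallbacks` parameter, the `ip:port` endpoint strings, and the comma-separated encoding of primary plus fallback endpoints are all assumptions made for the example; the Endpoint Picker Protocol proposal defines the actual wire format.

```python
from typing import Dict, List, Optional, Sequence

# Metadata keys named in the guide.
DESTINATION_KEY = "x-gateway-destination-endpoint"
SUBSET_KEY = "x-gateway-destination-endpoint-subset"


def pick_endpoints(
    candidates: Sequence[str],
    subset: Optional[Sequence[str]] = None,
    max_fallbacks: int = 2,  # hypothetical knob, not part of the protocol
) -> List[str]:
    """Return the primary endpoint followed by fallbacks.

    If the data plane supplied a subset, only endpoints in that subset
    are eligible. Candidate order is preserved, so the first eligible
    candidate is treated as the primary.
    """
    eligible = list(candidates)
    if subset is not None:
        allowed = set(subset)
        eligible = [ep for ep in eligible if ep in allowed]
    if not eligible:
        raise ValueError("no eligible endpoints")
    return eligible[: 1 + max_fallbacks]


def destination_metadata(endpoints: Sequence[str]) -> Dict[str, str]:
    """Build the metadata the extension would return to the data plane.

    Assumption: primary and fallbacks are joined with commas.
    """
    return {DESTINATION_KEY: ",".join(endpoints)}
```

A data plane that sends a subset of two endpoints would get back only those two, in the extension's preference order: `destination_metadata(pick_endpoints(pool, subset=[...]))`.
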