Commit 90f7f03

ObservabilityPolicy Enhancement Proposal (nginx#1795)
Problem: We want a design for Observability-related configuration settings, such as tracing, to be applied at the HTTPRoute level. Solution: Add enhancement proposal introducing `ObservabilityPolicy`.
1 parent 528051f commit 90f7f03

1 file changed

+316
-0
lines changed

docs/proposals/observability.md

@@ -0,0 +1,316 @@
# Enhancement Proposal-1778: Observability Policy

- Issue: https://github.com/nginxinc/nginx-gateway-fabric/issues/1778
- Status: Implementable

## Summary

This Enhancement Proposal introduces the `ObservabilityPolicy` API, which allows Application Developers to define settings related to tracing, metrics, or logging at the HTTPRoute level.

## Goals

- Define the Observability policy.
- Define an API for the Observability policy.

## Non-Goals

- Provide implementation details for implementing the Observability policy.
## Introduction

### Observability Policy

The Observability Policy contains settings to configure NGINX to expose information through tracing, metrics, and/or logging. This is a Direct Policy that is attached to an HTTPRoute by an Application Developer. It works in conjunction with an [NginxProxy](gateway-settings.md) configuration that contains higher-level settings to enable Observability at this lower level. The [NginxProxy](gateway-settings.md) configuration is managed by a Cluster Operator.

Since this policy is attached to an HTTPRoute, the Observability settings should apply only to the relevant `location` contexts of the NGINX config for that route.

To begin, the Observability Policy will include the following NGINX directives (focusing on OpenTelemetry tracing):

- [`otel_trace`](https://nginx.org/en/docs/ngx_otel_module.html#otel_trace): enable tracing and set the sampling rate.
- [`otel_trace_context`](https://nginx.org/en/docs/ngx_otel_module.html#otel_trace_context): extract, inject, propagate, or ignore the trace context headers.
- [`otel_span_name`](https://nginx.org/en/docs/ngx_otel_module.html#otel_span_name): set the span name.
- [`otel_span_attr`](https://nginx.org/en/docs/ngx_otel_module.html#otel_span_attr): add custom span attributes. These will be merged with any set at the global level in the `NginxProxy` config.

Tracing will be disabled by default. The Application Developer will be able to use this Policy to enable and configure tracing for their routes. This Policy will only be applied if the OTel endpoint has been set by the Cluster Operator on the [NginxProxy](gateway-settings.md).

Ratio and parent-based tracing should be supported as shown in the [nginx-otel examples](https://github.com/nginxinc/nginx-otel?tab=readme-ov-file#examples).

In the future, this config will be extended to support other functionality, such as that defined in the [NGINX Extensions Proposal](nginx-extensions.md#observability).
## API, Customer Driven Interfaces, and User Experience

The `ObservabilityPolicy` API is a CRD that is a part of the `gateway.nginx.org` Group. It is a namespaced resource that will reference an HTTPRoute as its target.

### Go

Below is the Golang API for the `ObservabilityPolicy` API:
```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	gatewayv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

// ObservabilityPolicy is a Direct Policy that configures observability settings for an HTTPRoute.
type ObservabilityPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec defines the desired state of the ObservabilityPolicy.
	Spec ObservabilityPolicySpec `json:"spec"`

	// Status defines the state of the ObservabilityPolicy.
	Status gatewayv1alpha2.PolicyStatus `json:"status,omitempty"`
}

// ObservabilityPolicySpec defines the desired state of the ObservabilityPolicy.
type ObservabilityPolicySpec struct {
	// TargetRef identifies an API object to apply the policy to.
	// Object must be in the same namespace as the policy.
	// Support: HTTPRoute
	TargetRef gatewayv1alpha2.PolicyTargetReference `json:"targetRef"`

	// Tracing allows for enabling and configuring tracing.
	//
	// +optional
	Tracing *Tracing `json:"tracing,omitempty"`
}

// Tracing allows for enabling and configuring OpenTelemetry tracing.
type Tracing struct {
	// Ratio is the percentage of traffic that should be sampled. Integer from 0 to 100.
	// By default, 100% of HTTP requests are traced. Not applicable for parent-based tracing.
	//
	// +optional
	Ratio *int32 `json:"ratio,omitempty"`

	// Context specifies how to propagate traceparent/tracestate headers. Defaults to 'ignore'.
	//
	// +optional
	Context *TraceContext `json:"context,omitempty"`

	// SpanName defines the name of the OTel span. Defaults to the name of the location for a request.
	// If specified, applies to all locations that are created for a route.
	//
	// +optional
	SpanName *string `json:"spanName,omitempty"`

	// SpanAttributes are custom key/value attributes that are added to each span.
	//
	// +optional
	SpanAttributes map[string]string `json:"spanAttributes,omitempty"`

	// Enable defines if tracing is enabled, disabled, or parent-based.
	Enable TraceType `json:"enable"`
}

// TraceType defines if tracing is enabled, disabled, or parent-based.
type TraceType string

const (
	// TraceTypeOn enables tracing.
	TraceTypeOn TraceType = "on"

	// TraceTypeOff disables tracing.
	TraceTypeOff TraceType = "off"

	// TraceTypeParent enables tracing and only records spans if the parent span was sampled.
	TraceTypeParent TraceType = "parent"
)

// TraceContext specifies how to propagate traceparent/tracestate headers.
type TraceContext string

const (
	// TraceContextExtract uses an existing trace context from the request, so that the identifiers
	// of a trace and the parent span are inherited from the incoming request.
	TraceContextExtract TraceContext = "extract"

	// TraceContextInject adds a new context to the request, overwriting existing headers, if any.
	TraceContextInject TraceContext = "inject"

	// TraceContextPropagate updates the existing context (combines extract and inject).
	TraceContextPropagate TraceContext = "propagate"

	// TraceContextIgnore skips context headers processing.
	TraceContextIgnore TraceContext = "ignore"
)
```
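The mapping from the `Tracing` fields to the NGINX directives listed in the Introduction could look roughly like the sketch below. It is only an illustration, not the proposed implementation: `tracingDirectives`, its `ratioVar` parameter, and the `$ratio_sampler` variable name are hypothetical, and the types are trimmed local copies of the API above. The sketch also assumes that a `split_clients` block (as in the nginx-otel examples) is rendered elsewhere to set the ratio variable to `on` or `off`, and that all values have already been validated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Trimmed local copies of the API types so the sketch compiles on its own.
type TraceType string
type TraceContext string

const (
	TraceTypeOn     TraceType = "on"
	TraceTypeOff    TraceType = "off"
	TraceTypeParent TraceType = "parent"
)

type Tracing struct {
	Ratio          *int32
	Context        *TraceContext
	SpanName       *string
	SpanAttributes map[string]string
	Enable         TraceType
}

// tracingDirectives is a hypothetical helper that returns the directives for
// one location block. ratioVar is an assumed NGINX variable that a
// split_clients block defined elsewhere sets to "on" or "off".
func tracingDirectives(t Tracing, ratioVar string) []string {
	var out []string
	switch t.Enable {
	case TraceTypeOn:
		if t.Ratio != nil && *t.Ratio < 100 {
			out = append(out, fmt.Sprintf("otel_trace %s;", ratioVar))
		} else {
			out = append(out, "otel_trace on;")
		}
	case TraceTypeParent:
		// Sample only when the parent span was sampled.
		out = append(out, "otel_trace $otel_parent_sampled;")
	default:
		// "off" (or an unknown value) disables tracing; nothing else is emitted.
		return []string{"otel_trace off;"}
	}
	if t.Context != nil {
		out = append(out, fmt.Sprintf("otel_trace_context %s;", *t.Context))
	}
	if t.SpanName != nil {
		out = append(out, fmt.Sprintf("otel_span_name %s;", *t.SpanName))
	}
	// Sort keys so the generated config is deterministic.
	keys := make([]string, 0, len(t.SpanAttributes))
	for k := range t.SpanAttributes {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		out = append(out, fmt.Sprintf("otel_span_attr %s %s;", k, t.SpanAttributes[k]))
	}
	return out
}

func main() {
	ratio := int32(10)
	ctx := TraceContext("inject")
	name := "example-span"
	t := Tracing{
		Enable:         TraceTypeOn,
		Ratio:          &ratio,
		Context:        &ctx,
		SpanName:       &name,
		SpanAttributes: map[string]string{"attribute1": "value1", "attribute2": "value2"},
	}
	fmt.Println(strings.Join(tracingDirectives(t, "$ratio_sampler"), "\n"))
}
```

Merging these route-level span attributes with the global ones from `NginxProxy` is deliberately left out, since the proposal does not specify precedence.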
### YAML

Below is an example YAML version of an `ObservabilityPolicy`:
```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: ObservabilityPolicy
metadata:
  name: example-observability-policy
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example-route
    sectionName: example-section
  tracing:
    ratio: 10
    context: inject
    spanName: example-span
    spanAttributes:
      attribute1: value1
      attribute2: value2
    enable: "on"
status:
  ancestors:
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: example-gateway
      namespace: default
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
      message: Policy is accepted
```
and the HTTPRoute it is attached to:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
  namespace: default
spec:
  parentRefs:
  - name: example-gateway
  hostnames:
  - "*.example.com"
  rules:
  - name: example-section # assumes named route rules; rule-level attachment is future work (see Attachment)
    backendRefs:
    - name: example-service # illustrative backend, not part of the proposal
      port: 80
status:
  parents:
  - parentRef:
      name: example-gateway
    conditions:
    ...
    - type: gateway.nginx.org/ObservabilityPolicyAffected # new condition
      status: "True"
      reason: PolicyAffected
      message: Object affected by an ObservabilityPolicy.
```
### Status

#### CRD Label

According to the [Policy and Metaresources GEP](https://gateway-api.sigs.k8s.io/geps/gep-713/), the `ObservabilityPolicy` CRD must have the `gateway.networking.k8s.io/policy: direct` label to specify that it is a direct policy.
This label will help with discoverability and will be used by the planned Gateway API Policy [kubectl plugin](https://gateway-api.sigs.k8s.io/geps/gep-713/#kubectl-plugin-or-command-line-tool).

#### Conditions/Policy Ancestor Status

According to the [Policy and Metaresources GEP](https://gateway-api.sigs.k8s.io/geps/gep-713/), the `ObservabilityPolicy` CRD must include a `status` stanza with a slice of Conditions.

The `Accepted` Condition must be populated on the `ObservabilityPolicy` CRD using the reasons defined in the [PolicyCondition API](https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1alpha2/policy_types.go). If these reasons are not sufficient, we can add implementation-specific reasons.

One reason for a policy not being `Accepted` is that the `NginxProxy` configuration is not set, which is a requirement for the `ObservabilityPolicy` to work. This case will use a custom reason, `NginxProxyConfigNotSet`.

The Condition stanza may need to be namespaced using the `controllerName` if more than one controller could reconcile the Policy.

In the updated version of the [Policy and Metaresources GEP](https://github.com/kubernetes-sigs/gateway-api/pull/2813/files), which is still under review, the `PolicyAncestorStatus` applies to Direct Policies.
[`PolicyAncestorStatus`](https://github.com/kubernetes-sigs/gateway-api/blob/f1758d1bc233d78a3e1e6cfba34336526655d03d/apis/v1alpha2/policy_types.go#L156) contains a list of ancestor resources (usually Gateways) that are associated with the policy, and the status of the policy for each ancestor.
This status provides a view of the resources the policy is affecting. It is beneficial for policies implemented by multiple controllers (e.g., BackendTLSPolicy) or that attach to resources with different capabilities.
#### Setting Status on Objects Affected by a Policy

In the Policy and Metaresources GEP, there's a provisional status described [here](https://gateway-api.sigs.k8s.io/geps/gep-713/#standard-status-condition-on-policy-affected-objects) that involves adding a Condition or annotation to all objects affected by a Policy.

This solution gives the object owners some knowledge that their object is affected by a policy but minimizes status updates by limiting them to when the affected object starts or stops being affected by a policy.
Even though this status is provisional, implementing it now will help with discoverability and allow us to give feedback on the solution.

Implementing this involves defining a new Condition type and reason:
```go
package conditions

import (
	gatewayv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

const (
	// ObservabilityPolicyAffected is the Condition type set on objects affected by an ObservabilityPolicy.
	ObservabilityPolicyAffected gatewayv1alpha2.PolicyConditionType = "gateway.nginx.org/ObservabilityPolicyAffected"
	// PolicyAffectedReason is the reason used with the ObservabilityPolicyAffected Condition.
	PolicyAffectedReason gatewayv1alpha2.PolicyConditionReason = "PolicyAffected"
)
```
NGINX Gateway Fabric must set this Condition on all HTTPRoutes affected by an `ObservabilityPolicy`.
Below is an example of what this Condition may look like:
```yaml
Conditions:
  Type:                  gateway.nginx.org/ObservabilityPolicyAffected
  Message:               Object affected by an ObservabilityPolicy.
  Observed Generation:   1
  Reason:                PolicyAffected
  Status:                True
```
Some additional rules:

- This Condition should be added when the affected object starts being affected by an `ObservabilityPolicy`.
- When the last `ObservabilityPolicy` affecting that object is removed, the Condition should be removed.
- The Observed Generation is the generation of the affected object, not the generation of the `ObservabilityPolicy`.
## Attachment

An `ObservabilityPolicy` can be attached to an HTTPRoute.

The policy will only take effect if an [NginxProxy](gateway-settings.md) configuration has been linked to the GatewayClass. Otherwise, the `ObservabilityPolicy` should not be `Accepted`.

Future: Attached to an HTTPRoute rule, using a [sectionName](https://gateway-api.sigs.k8s.io/geps/gep-713/#apply-policies-to-sections-of-a-resource).
### Creating the Effective Policy in NGINX Config

To determine how to reliably and consistently create the effective policy in NGINX config, we need to apply the policies for each attachment scenario to the three NGINX mappings described [here](/docs/developer/mapping.md).

The following examples use the `ClientSettingsPolicy`, but the rules are the same for the `ObservabilityPolicy`.

A. Distinct Hostname:
![example-a2](/docs/images/client-settings/example-a2.png)

B. Same Hostname:
![example-b2](/docs/images/client-settings/example-b2.png)

C. Internal Redirect:
![example-c2](/docs/images/client-settings/example-c2.png)

For this attachment scenario, specifying the directives in the _final_ location blocks generated from the HTTPRoute with the policy attached achieves the effective policy. _Final_ means the location that ultimately handles the request.
## Use Cases

- As an Application Developer, I want to enable observability -- such as tracing -- for traffic flowing to my application, so I can easily debug issues or understand the use of my application.

## Testing

- Unit tests
- Functional tests that verify the attachment of the CRD to a Route, and that NGINX behaves properly based on the configuration. This includes verifying tracing works as expected.
## Security Considerations

Validating all fields in the `ObservabilityPolicy` is critical to ensuring that the NGINX config generated by NGINX Gateway Fabric is correct and secure.

All fields in the `ObservabilityPolicy` will be validated with OpenAPI Schema. If the OpenAPI Schema validation rules are not sufficient, we will use [CEL](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-rules).

RBAC via the Kubernetes API server will ensure that only authorized users can update the CRD.
## Alternatives

- Combine with OTel settings in `NginxProxy` for one OTel Policy: Rather than splitting tracing across two Policies, we could create a single tracing Policy. The issue with this approach is that some tracing settings -- such as exporter endpoint -- should be restricted to Cluster Operators, while settings like attributes should be available to Application Developers. If we combine these settings, RBAC will not be sufficient to restrict access across the settings. We will have to disallow certain fields based on the resource the Policy is attached to. This is a bad user experience.
- Inherited Policy: An Inherited Policy would be useful if there is a use case for the Cluster Operator enforcing or defaulting the OTel tracing settings included in this policy.
## References

- [NGINX Extensions Enhancement Proposal](nginx-extensions.md)
- [Policy and Metaresources GEP](https://gateway-api.sigs.k8s.io/geps/gep-713/)
- [Kubernetes API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md)
