[PodSecurity] Update monitoring proposal

kubernetes · k8s-ci-robot · Oct 5, 2021 · Sep 23, 2021 · Sep 23, 2021 · Oct 5, 2021
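
As a reading aid for the proposal below, here is a minimal sketch of which of the three proposed metrics a single request would increment. It is not the admission controller's implementation. The `Outcome` type, the `record` function, the simplified label strings, and the order of the checks (skip, then error, then exemption, then evaluation) are all hypothetical and are only inferred from the prose in the diff.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional

# Metric names as given in the proposal.
EVALUATIONS = "pod_security_evaluations_total"
EXEMPTIONS = "pod_security_exemptions_total"
ERRORS = "pod_security_errors_total"


@dataclass
class Outcome:
    """Hypothetical summary of how one admission request was handled."""
    skipped: bool = False         # ignored resource, or out-of-scope update
    error: Optional[str] = None   # None, "fatal", or "nonfatal"
    exempt: bool = False
    # mode -> True if the pod violates the policy configured for that mode
    violations: Dict[str, bool] = field(default_factory=dict)


def record(counters: Counter, outcome: Outcome) -> None:
    """Increment the counters one request would touch (assumed ordering)."""
    if outcome.skipped:
        return  # ignored and out-of-scope requests count towards nothing
    if outcome.error is not None:
        fatal = outcome.error == "fatal"
        counters[(ERRORS, f"fatal={str(fatal).lower()}")] += 1
        if fatal:
            return  # evaluation was impossible, so no evaluation is counted
    if outcome.exempt:
        counters[(EXEMPTIONS,)] += 1
        return  # exempt requests are not evaluated
    # enforce is recorded for every evaluated request, as allow or deny
    decision = "deny" if outcome.violations.get("enforce") else "allow"
    counters[(EVALUATIONS, "enforce", decision)] += 1
    # warn and audit are only recorded when they find a violation
    for mode in ("warn", "audit"):
        if outcome.violations.get(mode):
            counters[(EVALUATIONS, mode, "deny")] += 1
```

Under these assumptions, a request that passes `enforce` but violates the `warn` and `audit` levels increments `pod_security_evaluations_total` three times, while an exempt request increments only `pod_security_exemptions_total`.
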
commit c85d1e9abb9e4d812b91b391cffc0d1ee81ecb6b
diff --git a/keps/sig-auth/2579-psp-replacement/README.md b/keps/sig-auth/2579-psp-replacement/README.md
@@ -598,31 +598,67 @@ coverage of unit tests.
 
 ### Monitoring
 
-A single metric will be added to track policy evaluations against pods and [templated pods].
-[Namespace evaluations](#namespace-policy-update-warnings) are not counted.
+Three metrics will be introduced:
 
 ```
 pod_security_evaluations_total
 ```
 
+This metric will be added to track policy evaluations against pods and [templated pods].
+[Namespace evaluations](#namespace-policy-update-warnings) are not counted.
+The metric will only be incremented when the policy check is actually performed. In other words,
+this metric will not be incremented in any of the following cases:
+
+- Ignored resource types, subresources, or workload resources without a pod template
+- Update requests that are out of scope (see [Updates](#updates) above)
+- Exempt requests
+- Errors that make policy evaluation impossible
+
 The metric will use the following labels:
 
-1. `decision {exempt, allow, deny, error}` - The policy decision. Error is reserved for panics or
-   other errors in policy evaluation. Update requests that are out of scope (see [Updates](#updates)
-   above) are not counted.
+1. `decision {allow, deny}` - The policy decision. `allow` is only recorded with `enforce` mode.
 3. `policy_level {privileged, baseline, restricted}` - The policy level that the request was
    evaluated against.
 4. `policy_version {v1.X, v1.Y, latest, future}` - The policy version that was used for the evaluation.
    Explicit versions less than or equal to the build of the API server or webhook are recorded in the form `v1.x` (e.g. `v1.22`).
    Explicit versions greater than the build of the API server or webhook (which are evaluated as `latest`) are recorded as `future`.
    Explicit use of the `latest` version or implicit use by omitting a version or specifying an unparseable version will be recorded as `latest`.
 5. `mode {enforce, warn, audit}` - The type of evaluation mode being recorded. Note that a single
-   request can increment this metric 3 times, once for each mode. If this admission controller is
-   enabled, every every create request and in-scope update request will at least increment the
-   `enforce` total. Privileged evaluations for warn and audit modes are not counted.
+   request can increment this metric 3 times, once for each mode. `audit` and `warn` mode metrics
+   are only incremented for violations. If this admission controller is enabled, every
+   evaluated request will at least increment the `enforce` total.
 6. `request_operation {create, update}` - The operation of the request being checked.
 7. `resource {pod, controller}` - Whether the request object is a Pod, or a [templated
    pod](#podtemplate-resources) resource.
+8. `subresource {ephemeralcontainers}` - The subresource, when relevant and in scope.
+
+```
+pod_security_exemptions_total
+```
+
+This metric will be added to track requests that are considered exempt. Ignored resources and
+out-of-scope requests do not count towards the total. Errors encountered before the exemption logic will
+not be counted as exempt.
+
+The metric will use the following labels, which are defined as for `pod_security_evaluations_total` above.
+
+1. `request_operation {create, update}`
+2. `resource {pod, controller}`
+3. `subresource {ephemeralcontainers}`
+
+```
+pod_security_errors_total
+```
+
+This metric will be added to track errors encountered during request evaluation.
+
+The metric will use the following labels. Labels other than `fatal` are defined as for `pod_security_evaluations_total` above.
+
+1. `fatal {true, false}` - Whether the error prevented evaluation (short-circuit deny). If
+   `fatal=false` then the latest restricted profile may be used to evaluate the pod.
+2. `request_operation {create, update}`
+3. `resource {pod, controller}`
+4. `subresource {ephemeralcontainers}`
 
 ### Audit Annotations