[PodSecurity] Update monitoring proposal #2990

Merged on Oct 5, 2021 (3 commits)

[PodSecurity] Update monitoring proposal
tallclair committed Sep 23, 2021
commit c85d1e9abb9e4d812b91b391cffc0d1ee81ecb6b
52 changes: 44 additions & 8 deletions keps/sig-auth/2579-psp-replacement/README.md
@@ -598,31 +598,67 @@ coverage of unit tests.

### Monitoring

A single metric will be added to track policy evaluations against pods and [templated pods].
[Namespace evaluations](#namespace-policy-update-warnings) are not counted.
Three metrics will be introduced:

```
pod_security_evaluations_total
```

This metric will be added to track policy evaluations against pods and [templated pods].
[Namespace evaluations](#namespace-policy-update-warnings) are not counted.
The metric will only be incremented when the policy check is actually performed. In other words,
this metric will not be incremented in any of the following cases:

- Ignored resource types, subresources, or workload resources without a pod template
- Update requests that are out of scope (see [Updates](#updates) above)
- Exempt requests
- Errors that make policy evaluation impossible
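
The exclusions listed above amount to a guard in front of the counter. A minimal Python sketch, assuming a hypothetical `Request` record whose boolean fields stand in for the checks the admission controller performs (none of these names come from the KEP):

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of an admission request (field names are illustrative)."""
    ignored_resource: bool = False     # ignored type, subresource, or no pod template
    out_of_scope_update: bool = False  # update that does not require re-evaluation
    exempt: bool = False               # request matched an exemption
    evaluation_error: bool = False     # error made policy evaluation impossible


def counts_as_evaluation(req: Request) -> bool:
    """Return True only when the policy check is actually performed."""
    return not (
        req.ignored_resource
        or req.out_of_scope_update
        or req.exempt
        or req.evaluation_error
    )
```

Under this reading, `pod_security_evaluations_total` would be incremented only for requests where `counts_as_evaluation` returns `True`.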

The metric will use the following labels:

1. `decision {exempt, allow, deny, error}` - The policy decision. Error is reserved for panics or
other errors in policy evaluation. Update requests that are out of scope (see [Updates](#updates)
above) are not counted.
1. `decision {allow, deny}` - The policy decision. `allow` is only recorded with `enforce` mode.
3. `policy_level {privileged, baseline, restricted}` - The policy level that the request was
evaluated against.
4. `policy_version {v1.X, v1.Y, latest, future}` - The policy version that was used for the evaluation.
Explicit versions less than or equal to the build of the API server or webhook are recorded in the form `v1.x` (e.g. `v1.22`).
Explicit versions greater than the build of the API server or webhook (which are evaluated as `latest`) are recorded as `future`.
Explicit use of the `latest` version or implicit use by omitting a version or specifying an unparseable version will be recorded as `latest`.
5. `mode {enforce, warn, audit}` - The type of evaluation mode being recorded. Note that a single
request can increment this metric 3 times, once for each mode. If this admission controller is
enabled, every create request and in-scope update request will at least increment the
`enforce` total. Privileged evaluations for warn and audit modes are not counted.
request can increment this metric 3 times, once for each mode. `audit` and `warn` mode metrics
are only incremented for violations. If this admission controller is enabled, every
evaluated request will at least increment the `enforce` total.
Comment on lines +627 to +629
Member

if someone wanted to figure out the proportion of allowed/denied audit or warn requests, they'd now have to compare the number of denied audit or warn requests to the total number of mode=enforce requests, right? that could be ok, but is non-obvious

Member Author

Yeah. We could have a separate metric for tracking total evaluations, but that seems unnecessary. I agree it's non-obvious, but maybe it's something we can just add to the playbook...

6. `request_operation {create, update}` - The operation of the request being checked.
7. `resource {pod, controller}` - Whether the request object is a Pod, or a [templated
pod](#podtemplate-resources) resource.
8. `subresource {ephemeralcontainers}` - The subresource, when relevant & in scope.
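
The `policy_version` and `mode` rules in the label list can be sketched in Python. This is an illustration under stated assumptions, not the proposed implementation: the function names are hypothetical, the build version is reduced to its minor number, and a `collections.Counter` keyed by `(mode, decision)` stands in for the real metric.

```python
import re
from collections import Counter

# Explicit versions are assumed to look like "v1.22".
_EXPLICIT_VERSION = re.compile(r"v1\.(\d+)")


def policy_version_label(requested, build_minor):
    """Map a requested policy version to a `policy_version` label value."""
    match = _EXPLICIT_VERSION.fullmatch(requested or "")
    if match is None:
        # "latest", an omitted version, or an unparseable version.
        return "latest"
    minor = int(match.group(1))
    # Versions newer than the build are evaluated as latest but recorded as future.
    return "future" if minor > build_minor else f"v1.{minor}"


def record_modes(counter, enforce_allowed, warn_violation, audit_violation):
    """Increment the stand-in counter for one evaluated request."""
    # Every evaluated request increments the enforce total.
    counter[("enforce", "allow" if enforce_allowed else "deny")] += 1
    # warn and audit are only incremented for violations.
    if warn_violation:
        counter[("warn", "deny")] += 1
    if audit_violation:
        counter[("audit", "deny")] += 1


def violation_ratio(counter, mode):
    """Share of evaluated requests that violated the given warn or audit policy."""
    total = counter[("enforce", "allow")] + counter[("enforce", "deny")]
    return counter[(mode, "deny")] / total if total else 0.0
```

`violation_ratio` reflects the point raised in the review thread: because `warn` and `audit` record violations only, the denominator has to come from the `enforce` total.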

```
pod_security_exemptions_total
```

This metric will be added to track requests that are considered exempt. Ignored resources and out of
scope requests do not count towards the total. Errors encountered before the exemption logic will
not be counted as exempt.

The metric will use the following labels. Their definitions match the label definitions above.

1. `request_operation {create, update}`
2. `resource {pod, controller}`
3. `subresource {ephemeralcontainers}`

```
pod_security_errors_total
```

This metric will be added to track errors encountered during request evaluation.

The metric will use the following labels. Except for `fatal`, their definitions match the label definitions above.

1. `fatal {true, false}` - Whether the error prevented evaluation (short-circuit deny). If
`fatal=false` then the latest restricted profile may be used to evaluate the pod.
2. `request_operation {create, update}`
3. `resource {pod, controller}`
4. `subresource {ephemeralcontainers}`
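
One way to read the `fatal` label is sketched below in Python. The names are hypothetical, a `collections.Counter` stands in for `pod_security_errors_total`, and the returned strings are only illustrative tags for the next step: a fatal error short-circuits to a deny, while a non-fatal error may fall back to evaluating against the latest restricted profile.

```python
from collections import Counter

# Stand-in for pod_security_errors_total, keyed by
# (fatal, request_operation, resource, subresource).
errors_total = Counter()


def record_error(fatal, request_operation, resource, subresource=""):
    """Count an error and return an illustrative tag for what happens next."""
    key = ("true" if fatal else "false", request_operation, resource, subresource)
    errors_total[key] += 1
    if fatal:
        # Evaluation is impossible, so the request is denied outright.
        return "deny"
    # Assumption: evaluation continues, possibly against the latest restricted profile.
    return "evaluate-latest-restricted"
```

Keeping `fatal` as a label lets a single counter separate errors that blocked requests from errors that only changed which policy was applied.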

### Audit Annotations
