Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify custom resource state metrics API using CEL #2059

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 198 additions & 0 deletions docs/design/simplify-custom-resource-metrics-api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
# Kube-State-Metrics - Simplify Custom Resource State Metrics API Proposal


---

Author: Catherine Fang (CatherineF-dev@), Han Kang (logicalhan@)

Date: 7. May 2023

Target release: v

---


## Glossary
- CR: custom resource, similar to an instance of a class
- CRD: custom resource definition, similar to a class

## Problem Statement

### Background
Current [Custom Resource State Metrics](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md#multiple-metricskitchen-sink) supports 8+ operations to extract metric value and labels from custom resource.
- each
- path
- labelFromKey
- labelsFromPath
- valueFrom
- commonLabels
- labelsFromPath
- *.
- ...

### Problem
1. Custom resource metrics API isn't scalable and it's a little hard to maintain.
1.1 The maintaining work is O(8) and there are several bugs around these 8 operations. For example, Crash on nonexistent metric paths in custom resources (#1992).
1.2 More additional operations might be added to satisfy other needs.
2. Custom resource metrics API with existing 8 operations is not complete, which means some cases aren't covered. For example, it doesn't support querying number of CRs under one CRD.

## Goal

- Simplify 8 operations into one operation to reduce maintaining work.
- A complete API, so that can support more cases. For example, querying number of CRs under one CRD.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure what a complete API means here? AFAICT the proposal targets redefining a subset of configuration constructs, specifically ones that work with path resolutions. Is that not the case?


## Proposal

Use common expression language ([cel](https://kubernetes.io/docs/reference/using-api/cel/)) to extract fields from custom resource as metric labels or metric value.


```
kind: CustomResourceStateMetricsV2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kind: CustomResourceStateMetricsV2
kind: CustomResourceStateMetrics
apiVersion: v2

I'd propose to use the common versioning keys :-) We could even provide conversion from the old to the new version if we want

spec:
resources:
- groupVersionKind:
group: myteam.io
kind: "Foo"
version: "v1"
mode: for_loop # or merged
metrics:
- name: "ready_count"
help: "Number Foo Bars ready"
values: x.cel_selection_1 // [2, 4]
labels:
- x.cel_selection_2 // [{"cr_name": "bar"}], it will be copied into 2 same items
- x.cel_selection_3 // [{active": 1}, {"active": 3}]
- x.cel_selection_4 // [{"name": "type-a"}, {"name": "type-b"}]
```

Mode has two options:
- for_loop: it assigns x to each CR.
- merged: it assigns x to the merged CR of all CRs under one CRD. x := {"cr_name_foo": cr1, "cr_name_bar": cr2, ...}. It can count number of CRs under one CRD.

In this example (mode: for_loop), x is one CR under CRD (myteam.io/v1 Foo).
Assume it has N CRs under this CRD, it will generate these metrics:
- ready_count{cr_name=cr_1, active=1, name=type-a} = 2
- ready_count{cr_name=cr_1, active=3, name=type-b} = 4
- ...
- ready_count{cr_name=cr_n, active=2, name=type-c} = 5
- ready_count{cr_name=cr_n, active=3, name=type-d} = 6

### Mapping between existing operations and CEL

| Existing operation | CEL |
| :--- | :--- |
| path: [status, sub] \n labelFromKey: type | x.status.sub.map(y, {"name": y}) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document what's x here? Is this a representation for all CRs that come under the specified GVK?

| path: [status, sub] \n valueFrom: [ready] | x.status.sub.map(y, x.status.sub[y].ready) |
| commonLabels: \n custom_metric: "yes" | [{ "custom_metric":"yes" }] |
| labelsFromPath: "*": [metadata, labels] | [x.metadata.labels] |
| labelsFromPath \n foo: [metadata, labels, foo] | [{'name': x.metadata.name}] |

## Example
### CR
```
kind: Foo
apiVersion: myteam.io/vl
metadata:
annotations:
bar: baz
qux: quxx
labels:
foo: bar
name: foo
spec:
version: v1.2.3
order:
- id: 1
value: true
- id: 3
value: false
replicas: 1
status:
phase: Pending
active:
type-a: 1
type-b: 3
conditions:
- name: a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have a configuration example using CEL that loops over this list and generates the following metrics?

ready_count{phase="Pending", condition="a"}  45
ready_count{phase="Pending", condition="b"}  66

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add on to what Manuel asked, would it be possible to reproduce this example using CEL?

value: 45
- name: b
value: 66
sub:
type-a:
active: 1
ready: 2
type-b:
active: 3
ready: 4
uptime: 43.21
```

### Existing API - CustomResourceStateMetrics
```
kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: myteam.io
kind: "Foo"
version: "v1"
# labels can be added to all metrics from a resource
commonLabels:
crd_type: "foo"
labelsFromPath:
name: [metadata, name]
metrics:
- name: "ready_count"
help: "Number Foo Bars ready"
each:
type: Gauge
gauge:
# targeting an object or array will produce a metric for each element
# labelsFromPath and value are relative to this path
path: [status, sub]

# if path targets an object, the object key will be used as label value
# This is not supported for StateSet type as all values will be truthy, which is redundant.
labelFromKey: type
# label values can be resolved specific to this path
labelsFromPath:
active: [active]
# The actual field to use as metric value. Should be a number, boolean or RFC3339 timestamp string.
valueFrom: [ready]
commonLabels:
custom_metric: "yes"
labelsFromPath:
# whole objects may be copied into labels by prefixing with "*"
# *anything will be copied into labels, with the highest sorted * strings first
"*": [metadata, labels]
"**": [metadata, annotations]

# or specific fields may be copied. these fields will always override values from *s
name: [metadata, name]
foo: [metadata, labels, foo]
```

### Proposed API - CustomResourceStateMetricsV2
```
kind: CustomResourceStateMetricsV2
spec:
resources:
- groupVersionKind:
group: myteam.io
kind: "Foo"
version: "v1"
mode: for_loop # or merged
metrics:
- name: "ready_count"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is the metrics type defined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume we'll still need fields with hardcoded values, eg., type.

Copy link
Member

@rexagod rexagod Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting here that we still have non-path-based constructs, like so:

  • type (used to check OpenMetrics conformance, and if value field is applicable)
  • path (relative paths when used with labelsFrom*, these can be CEL expressions with the following subsections)
    • non-relative
    • relative
    • multi-relative (relative to different paths) (KSM does not support this right now)
  • commonLabels
  • list

Something we might want to persue under the configuration revamp is to keep such constructs to a minimum (and try to replace most of these using CEL, wherever possible), as we may find ourselves implementing custom logic and introducing complexity as such constructs grow over time.

help: "Number Foo Bars ready"
values: x.status.sub.map(y, x.status.sub[y].ready) # a cel query. jq '[.status.sub[].ready]', valueFrom: [ready] // [2,4]
labels:
- x.status.sub.map(y, {"name": y}) # a cel query. jq '[ .status.sub | keys | .[] | {name: .}]', labelFromKey: type // [{"name": "type-a"}, {"name": "type-b"}]
- [{ "custom_metric":"yes" }] # a cel query. jq '[{ custom_metric:"yes" }]', custom_metric: "yes" // [{custom_metric="yes"}]
- [x.metadata.labels] # a cel query. jq '[.metadata.labels]', "*": [metadata, labels] // [{"foo": "bar"}]
- [x.metadata.annotations] # a cel query. jq '[.metadata.annotations]', "**": [metadata, annotations] // [{"bar": "baz","qux": "quxx"}]
- [{'name': x.metadata.name}] # a cel query. jq '[{ name: .metadata.name }]', name: [metadata, name] // [{"name": "foo"}]
- [{'foo': x.metadata.labels.foo}] # a cel query. jq '[{ foo: .metadata.labels.foo }]' # foo: [metadata, labels, foo] // [{foo": "bar"}]
- [x.status.sub.map(y, {"active": x.status.sub[y].active})] # a cel query. jq '[.status.sub[].active | {active: .}]',labelsFromPath: active: [active] // [{active": 1}, {"active": 3}]
```