Simplify custom resource state metrics API

kubernetes · May 7, 2023 · 0e9c7c1
1 parent eb45f33
commit 0e9c7c1
Showing 1 changed file with 142 additions and 0 deletions.
diff --git a/docs/design/simplify-custom-resource-metrics-api.md b/docs/design/simplify-custom-resource-metrics-api.md
@@ -0,0 +1,142 @@
+# Kube-State-Metrics - Simplify Custom Resource State Metrics API Proposal
+
+
+---
+
+Authors: Catherine Fang (CatherineF-dev@), Han Kang (logicalhan@)
+
+Date: 7. May 2023
+
+Target release: v
+
+---
+
+
+## Glossary
+- CR: custom resource, similar to an instance of a class
+- CRD: custom resource definition, similar to a class
+
+## Problem Statement
+
+### Background
+The current [Custom Resource State Metrics](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md#multiple-metricskitchen-sink) API supports 8+ operations for extracting metric values and labels from a custom resource:
+- each
+- path
+- labelFromKey
+- labelsFromPath
+- valueFrom
+- commonLabels
+- labelsFromPath
+- *.
+- ...
+
+### Problem
+1. The custom resource metrics API doesn't scale and is hard to maintain.
+  1.1 Maintenance effort grows with the number of operations (8 today), and there are several bugs around these 8 operations. For example, a crash on nonexistent metric paths in custom resources (#1992).
+  1.2 More operations might be added to satisfy other needs.
+2. The custom resource metrics API with its existing 8 operations is incomplete: some cases aren't covered. For example, it doesn't support querying the number of CRs under one CRD.
+
+## Goal
+
+- Simplify the 8 operations into one operation to reduce maintenance work.
+- Provide a complete API that supports more cases, for example querying the number of CRs under one CRD.
+
+## Proposal
+
+Use the Common Expression Language ([CEL](https://kubernetes.io/docs/reference/using-api/cel/)) to extract fields from a custom resource as metric labels or metric values.
+
+
+```
+kind: CustomResourceStateMetricsV2
+spec:
+  resources:
+    - groupVersionKind:
+        group: myteam.io
+        kind: "Foo"
+        version: "v1"
+      mode: for_loop # or merged
+      metrics:
+        - name: "ready_count"
+          help: "Number Foo Bars ready"
+          values: x.cel_selection_1 # [2, 4]
+          labels:
+          - x.cel_selection_2 # [{"cr_name": "bar"}], a single item is copied to both values
+          - x.cel_selection_3 # [{"active": 1}, {"active": 3}]
+          - x.cel_selection_4 # [{"name": "type-a"}, {"name": "type-b"}]
+```
+
+Mode has two options:
+- for_loop: x is bound to each CR in turn.
+- merged: x is bound to a single map that merges all CRs under one CRD, keyed by CR name: x := {"cr_name_foo": cr1, "cr_name_bar": cr2, ...}. This makes it possible to count the number of CRs under one CRD.
+
+In this example (mode: for_loop), x is one CR under the CRD (myteam.io/v1 Foo).
+Assuming there are N CRs under this CRD, it generates these metrics:
+- ready_count{cr_name=cr_1, active=1, name=type-a} = 2
+- ready_count{cr_name=cr_1, active=3, name=type-b} = 4
+- ...
+- ready_count{cr_name=cr_n, active=2, name=type-c} = 5
+- ready_count{cr_name=cr_n, active=3, name=type-d} = 6
+
+## Example
+### CR
+```
+kind: Foo
+apiVersion: myteam.io/v1
+metadata:
+    annotations:
+        bar: baz
+        qux: quxx
+    labels:
+        foo: bar
+    name: foo
+spec:
+    version: v1.2.3
+    order:
+        - id: 1
+          value: true
+        - id: 3
+          value: false
+    replicas: 1
+status:
+    phase: Pending
+    active:
+        type-a: 1
+        type-b: 3
+    conditions:
+        - name: a
+          value: 45
+        - name: b
+          value: 66
+    sub:
+        type-a:
+            active: 1
+            ready: 2
+        type-b:
+            active: 3
+            ready: 4
+    uptime: 43.21
+```
+### CustomResourceStateMetricsV2
+```
+kind: CustomResourceStateMetricsV2
+spec:
+  resources:
+    - groupVersionKind:
+        group: myteam.io
+        kind: "Foo"
+        version: "v1"
+      mode: for_loop # or merged
+      metrics:
+        - name: "ready_count"
+          help: "Number Foo Bars ready"
+          values: x.status.sub.map(y, x.status.sub[y].ready) # a cel query. jq '[.status.sub[].ready]', valueFrom: [ready] // [2,4]
+          labels:
+          - x.status.sub.map(y, {"name": y}) # a cel query. jq '[ .status.sub | keys | .[] | {name: .}]', labelFromKey: type // [{"name": "type-a"}, {"name": "type-b"}]
+          - [{ "custom_metric":"yes" }] # a cel query. jq '[{ custom_metric:"yes" }]', custom_metric: "yes" // [{"custom_metric": "yes"}]
+          - [x.metadata.labels] # a cel query. jq '[.metadata.labels]', "*": [metadata, labels] // [{"foo": "bar"}]
+          - [x.metadata.annotations] # a cel query. jq '[.metadata.annotations]', "**": [metadata, annotations] // [{"bar": "baz","qux": "quxx"}]
+          - [{'name': x.metadata.name}] # a cel query. jq '[{ name: .metadata.name }]', name: [metadata, name] // [{"name": "foo"}]
+          - [{'foo': x.metadata.labels.foo}] # a cel query. jq '[{ foo: .metadata.labels.foo }]', foo: [metadata, labels, foo] // [{"foo": "bar"}]
+          - x.status.sub.map(y, {"active": x.status.sub[y].active}) # a cel query. jq '[.status.sub[].active | {active: .}]', labelsFromPath: active: [active] // [{"active": 1}, {"active": 3}]
+```
+
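
As an illustration of the semantics proposed in the diff above, the two modes and the way label lists line up with values can be sketched in plain Python. This is only a sketch under stated assumptions: it is not kube-state-metrics code, it does not evaluate real CEL (Python lambdas stand in for the CEL queries), and the names `broadcast`, `evaluate`, and `run` are hypothetical. It also assumes that a one-item label list is repeated for every value and that label sets at the same index are merged, which is how the `ready_count` example reads.

```python
from typing import Any, Callable

CR = dict[str, Any]
LabelSets = list[dict[str, Any]]


def broadcast(label_sets: LabelSets, n: int) -> LabelSets:
    """Repeat a one-item label list n times; otherwise require exactly n items."""
    if len(label_sets) == 1:
        return label_sets * n
    if len(label_sets) != n:
        raise ValueError(f"expected 1 or {n} label sets, got {len(label_sets)}")
    return label_sets


def evaluate(x: Any,
             values_fn: Callable[[Any], list],
             label_fns: list[Callable[[Any], LabelSets]]) -> list[tuple[dict, Any]]:
    """Evaluate the stand-in queries against x and pair each value with its labels."""
    values = values_fn(x)
    columns = [broadcast(fn(x), len(values)) for fn in label_fns]
    samples = []
    for i, value in enumerate(values):
        labels: dict[str, Any] = {}
        for column in columns:
            labels.update(column[i])  # merge the i-th label set of every query
        samples.append((labels, value))
    return samples


def run(crs: list[CR], mode: str, values_fn, label_fns) -> list[tuple[dict, Any]]:
    if mode == "for_loop":
        # x is bound to each CR in turn.
        return [s for cr in crs for s in evaluate(cr, values_fn, label_fns)]
    if mode == "merged":
        # x is bound to one map of all CRs, keyed by CR name.
        merged = {cr["metadata"]["name"]: cr for cr in crs}
        return evaluate(merged, values_fn, label_fns)
    raise ValueError(f"unknown mode: {mode}")


# A trimmed copy of the Foo CR from the example.
foo = {
    "metadata": {"name": "foo"},
    "status": {"sub": {"type-a": {"active": 1, "ready": 2},
                       "type-b": {"active": 3, "ready": 4}}},
}

ready_count = run(
    [foo], "for_loop",
    values_fn=lambda x: [v["ready"] for v in x["status"]["sub"].values()],
    label_fns=[
        lambda x: [{"cr_name": x["metadata"]["name"]}],
        lambda x: [{"active": v["active"]} for v in x["status"]["sub"].values()],
        lambda x: [{"name": k} for k in x["status"]["sub"]],
    ],
)

# merged mode: counting the CRs under one CRD is just the size of the merged map.
cr_count = run([foo], "merged", values_fn=lambda x: [len(x)], label_fns=[])
```

With the single `foo` CR, `ready_count` holds two samples (one per entry in `status.sub`) and `cr_count` holds one unlabeled sample whose value is the number of CRs, the case the existing operations cannot express.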