KEP 4595: CEL for CRD AdditionalPrinterColumns #4602

sreeram-venkitesh · 2024-04-27T18:38:57Z

One-line PR description: Provide a more powerful way to express human readable printer columns exposed by CRDs than simple JSON paths.

Issue link: https://kep.k8s.io/4595

Other comments:

sftim

It's a nice idea. Thanks for suggesting it.

Here's some feedback on the draft.

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/kep.yaml

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

k8s-triage-robot · 2024-09-01T11:28:28Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sreeram-venkitesh · 2024-09-01T16:37:26Z

/remove-lifecycle stale

k8s-triage-robot · 2024-11-30T17:13:14Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-12-30T17:45:59Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle rotten
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sreeram-venkitesh · 2025-01-02T13:09:57Z

/remove-lifecycle rotten

k8s-triage-robot · 2025-04-02T14:04:17Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sreeram-venkitesh · 2025-04-12T18:38:05Z

/remove-lifecycle stale

Priyankasaggu11929 · 2025-05-27T07:25:23Z

@cici37 @jpbetz - we have refreshed the KEP documentation with the current content. There're a few TODO blocks - we will push them by this week itself, but overall it's complete for first review.

Could you please add the lead-opted-in label for this enhancement to be tracked for 1.34 cycle. Thanks!

jpbetz · 2025-05-28T17:32:01Z

/milestone v1.34
/label lead-opted-in

sreeram-venkitesh · 2025-05-28T18:14:09Z

Thank you @jpbetz! I think we added the lead-opted-in in the PR instead of the KEP issue though.

jpbetz · 2025-05-28T18:18:46Z

Thank you @jpbetz! I think we added the lead-opted-in in the PR instead of the KEP issue though.

Oops. I've tagged the issue now too.

sreeram-venkitesh · 2025-05-30T17:41:58Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

+When running the benchmark tests individually, we observed that CEL was consistently ~20-50x slower than JSONPath. 
+
+Based on these current findings, the end users should not find a noticeable difference in the performance when working with CEL for additionalPrinterColumns.
+
+<!--
+
+Hence, the end users might start seeing a non-negligible increase in the total time and CPU usage for apiserver when they are working with a large number of Custom Resources. 
+
+-->


@jpbetz, @cici37: Our KEP PR is ready for review!

We've done a performance analysis by running benchmark tests to measure the time taken for both CEL and JSONPath additionalPrinterColumns for the following:

Time taken for TableConvertor.New(), which has JSONPath parsing and CEL compilation

Time taken for convertor.ConvertToTable(), which has FindResults() and PrintResults()

Based on the numbers we are seeing, we assume that overall compilation times for CEL based additionalPrinterColumns might become noticeably longer than JSONPath if there are a large number of CEL based additionalPrinterColumns. We haven't ran our benchmark tests with CRDs with a large number of columns or large number of custom resources to confirm what the impact would be. What can be considered as a normal number of custom resources and additionalPrinterColumns? Also are 1000s of custom resources being in use common? (As this might impact the performance of FindResults and PrintResults)

Also, could you help us with coming up with a value for perCallLimit for the CEL compilation here? We've compared the time of compilation and evaluation for both CEL and JSONPath but we couldn't conclusively decide a value for the perCallLimit.

Thanks in advance!

The cel expressions can be compiled once when the CRD is first registered (to validate them), and then again when the CRD is loaded.

When serving read requests, the already compiled cel expressions can then be used.

For benchmarks, I only care about the read request latency.

I think the next step is to:

Benchmark an expensive JSON Path additionalPrinterColumns operation (just the part that finds a value using the JSON Path library)

Assume CEL operations take about 50ns each, divide to figure out how many CEL operations we could do in no more time than the JSON Path case would have taken

Sanity benchmark of some CEL evaluations (not compilations) to make sure our latency numbers are good

I don't think the cel compilation time would be a big concern in this feature either. The concern is focused on using cel in read path VS the existing JSON path and if there would be a significant latency added.

Yup, we're compiling the expression first at CRD validation and then again inside TableConvertor.New(), which I believe is happening as part of CRD loading.

We have ran benchmarks for the FindResults() and PrintResults() calls, which would give us the read request latencies for both CEL and JSONPath. We'll update the conclusion of the benchmarking in the KEP to highlight this. Both are in the μs range with CEL being around 15 times higher (0.8μs vs 13.5μs).

Thanks for the data!
Regarding with the limit, we could always begin with a more strict limit/smaller number and loose it later if it's absolutely necessary. For the benchmark result, looks like on average cel is ~20-50 times slower on provided expression. Do we have a estimation on the latency it's added with different numbers of expressions? Just wanna mention that user would be able to write more expensive expression with CEL after it's introduced. Since it would be only run while converting to table, I think 10ms could be a start point. However, we may want to limit the number of expressions, estimation cost and overall cost there as well based on the analysis.

Do we have a estimation on the latency it's added with different numbers of expressions?

@cici37 – do you mean to benchmark a scenario where there are multiple additional printer columns defined using CEL expressions, and record how much latency it adds?

At the moment, no, we don't have these benchmark stats, but we can add these additional scenarios and share data, but confirming if I understood the requirement correctly. Thanks!

because printer columns runs on every item in a list, and there can be thousands of items in a list response, 20-50x performance costs do not seem reasonable to me

Thanks for pointing out! The large list read is common and I agree that having the latency added over 20 times does not make sense. If we add timeout and fallback to not showing columns with cel expression when it takes too long, would it worth considering?

failure modes of non-deterministic list responses seems pretty questionable...

I'd expect to prereq getting CEL performance at least close to parity with jsonPath cost for this

BenTheElder · 2025-06-04T19:01:43Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

+Considering the following 10 iterations of the `Benchmark_CEL` and `Benchmark_JSONPath` tests:
+
+```
+  CEL Benchmark Results (10 runs):


We should be clear that this also includes compiling CEL, which shouldn't happen in the hot loop in production.

I see the data below that splits these out, we should be clear about that.

We should be clear that this also includes compiling CEL, which shouldn't happen in the hot loop in production.

Ack. will clarify above in the further commits.

BenTheElder · 2025-06-04T19:03:11Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

+-->
+
+<!-- - <test>: <link to test coverage> -->
+We will test all cases in integration test and unit test. If needed, we can add e2e tests before beta graduation. We are planning to extend the existing [e2e tests for CRDs](https://github.com/kubernetes/kubernetes/blob/3df3b83226530bda69ffcb7b4450026139b2cd11/test/e2e/apimachinery/custom_resource_definition.go).


SGTM, this should be fully testable with integration and unit, but we might as well extend the existing CRD tests while we're at it.

BenTheElder · 2025-06-04T19:06:03Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

+
+This feature will not impact rollouts or already-running workloads.
+
+###### What specific metrics should inform a rollback?


Possibly api-server load? On the off-chance this is misbehaving and increases resource usage, since that's pretty much the only plausible failure mode here.

Ack. Will update the above PRR question with a pointer to api-server load metrics. Thanks!

@BenTheElder – 6cf3aa1 address the metrics related PRR question. Please review. Thanks!

BenTheElder · 2025-06-04T19:13:09Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

+The following table provides an average performance analysis across CEL and JSONPath based additionalPrinterColumns:
+
+
+|                                      | CEL (Benchmark_CEL)                                                                                                   | JSONPath (Benchmark_JSONPath)                                                                                   |


Instead of post-processing the data, it might be better to have separate benchmarks for compile versus runtime.

Also, what hardware / architecture was this run on?

Instead of post-processing the data, it might be better to have separate benchmarks for compile versus runtime.

Ack. We are working on benchmarking more expensive use cases per above thread's discussion.
Will address adding separate compilation and runtime benchmark stats as part of the same update.

Also, what hardware / architecture was this run on?

The current benchmarks stats are from an arm64 machine (Apple M3 Pro with 12 cores, 18 GB RAM).

We also have access to an x86_64 machine (11th Gen Intel Core i7-11800H × 16 cores, 48 GB RAM) if further stats needed.

jpbetz · 2025-06-05T18:28:19Z

/approve
For PRR for alpha
For beta we will need to fully nail down the CEL cost limits, but for alpha, that's not a blocking PRR concern. Please continue the benchmark discussions though.

@cici37 Leaving LGTM with you on this one. Once you're satisfied with the details please tag to merge for 1.34. I'm OK with the high level.

Priyankasaggu11929 · 2025-06-09T13:53:50Z

(I was out the whole of last week for family reasons. Just getting back to work today)

Thank you so much everyone, for helping with the review.
We are working on addressing the remaining items, and will update the PR soon.

Priyankasaggu11929 · 2025-06-18T04:22:04Z

Folks, thank you all for the discussion during the last SIG API Machinery call (June 11, 2025).

@cici37 @jpbetz @deads2k – following up, since the KEP documentation can't be merged as implementable in its current state until we get improved benchmark results for CEL (equivalent or at most 2x to the JSONPath cost).
Are there any further improvements we can make to the KEP documentation to merge as provisional for the time being while we experiment optimizing the performance of our PoC further? Thanks!

jpbetz · 2025-07-10T15:31:41Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/README.md

+#### Alpha
+
+- Feature implemented behind a feature flag
+- Initial benchmarks to compare performance of JSONPath with CEL columns and set an appropriate CEL cost


Update this to reflect the multiplier agreed on with the SIG (was it 2.5x ?)

jpbetz · 2025-07-10T15:32:53Z

keps/sig-api-machinery/4595-cel-crd-additionalprintercolumns/kep.yaml

+  - "@jpbetz"
+creation-date: "2024-04-26"
+last-updated: "v1.34"
+status: implementable


We COULD merge this as provisional. If we do that, the best thing to do would be to include or link to the analysis done so far so this KEP serves as a good reference for the performance challenges encountered so far.

to include or link to the analysis done

do you mean that we include links to the following gists, containing the benchmark test definition and results?

https://gist.github.com/sreeram-venkitesh/f4aff1ae7957a5a3b9c6c53e869b7403

https://gist.github.com/Priyankasaggu11929/43cc9ece4d6215ee4cfe0d1523a919d6

Or a link to a fork of k/k which contains the changes from our PoC of this KEP?

@jpbetz, we have addressed all open review feedback comments in the latest 2 commits, for merging the KEP in provisional state.

Please review. Thanks!

jpbetz · 2025-08-01T15:04:57Z

keps/prod-readiness/sig-api-machinery/4595.yaml

@@ -0,0 +1,3 @@
+kep-number: 4595
+alpha:
+  approver: "@jpbetz"


Let's remove this PRR approval file from this PR since we're in provisional status.

addressed in recent commit.

jpbetz · 2025-08-04T19:23:05Z

/approve
/lgtm
As provisional

k8s-ci-robot · 2025-08-04T19:23:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, sreeram-venkitesh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/sig-api-machinery/OWNERS~~ [jpbetz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Apr 27, 2024

k8s-ci-robot requested review from apelisse and sttts April 27, 2024 18:39

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 27, 2024

sreeram-venkitesh mentioned this pull request Apr 27, 2024

CEL for CRD AdditionalPrinterColumns #4595

Open

4 tasks

sftim reviewed May 7, 2024

View reviewed changes

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 3, 2024

joelanford mentioned this pull request Jul 8, 2024

✨ [WIP] Add support for kubectl printer columns operator-framework/operator-controller#1019

Closed

4 tasks

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 30, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 30, 2024

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 2, 2025

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2025

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2025

sreeram-venkitesh added 4 commits May 26, 2025 18:38

Added basic KEP template for KEP 4595

d3bc62c

Review comments from Tim

7c60004

Added design details, PRR questionnaire

252fa57

Cleaned up toc

d5cb5bd

Priyankasaggu11929 added 3 commits May 26, 2025 19:22

add sections: proposal, user stories, caveats, risks and mitigations

320a389

update reviewers, approvers, latest-milestone and stage

97fabfb

add PRR file

1af582b

Priyankasaggu11929 force-pushed the 4595-cel-crd-additionalprintercolumns branch from 4643db3 to 1af582b Compare May 26, 2025 14:00

Added output for code snippets, example for risks

31ba7b1

k8s-ci-robot added this to the v1.34 milestone May 28, 2025

k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label May 28, 2025

Priyankasaggu11929 and others added 2 commits May 30, 2025 19:11

add benchmark performance analysis

df7d0be

Updated toc

cd58ad7

sreeram-venkitesh commented May 30, 2025

View reviewed changes

BenTheElder reviewed Jun 4, 2025

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2025

Clarified metrics to inform rollback

6cf3aa1

jpbetz reviewed Jul 10, 2025

View reviewed changes

sreeram-venkitesh added 2 commits August 1, 2025 13:58

Update performance analysis section

976bc7d

Update KEP status to provisional

81d1a34

jpbetz reviewed Aug 1, 2025

View reviewed changes

remove PRR file

61dff38

k8s-ci-robot assigned jpbetz Aug 4, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2025

k8s-ci-robot merged commit 3f84dc1 into kubernetes:master Aug 4, 2025
4 checks passed


		This feature will not impact rollouts or already-running workloads.

		###### What specific metrics should inform a rollback?

		The following table provides an average performance analysis across CEL and JSONPath based additionalPrinterColumns:


		\| \| CEL (Benchmark_CEL) \| JSONPath (Benchmark_JSONPath) \|

KEP 4595: CEL for CRD AdditionalPrinterColumns #4602

KEP 4595: CEL for CRD AdditionalPrinterColumns #4602

Conversation

sreeram-venkitesh commented Apr 27, 2024

Uh oh!

sftim left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

k8s-triage-robot commented Sep 1, 2024

Uh oh!

sreeram-venkitesh commented Sep 1, 2024

Uh oh!

k8s-triage-robot commented Nov 30, 2024

Uh oh!

k8s-triage-robot commented Dec 30, 2024

Uh oh!

sreeram-venkitesh commented Jan 2, 2025

Uh oh!

k8s-triage-robot commented Apr 2, 2025

Uh oh!

sreeram-venkitesh commented Apr 12, 2025

Uh oh!

Priyankasaggu11929 commented May 27, 2025

Uh oh!

jpbetz commented May 28, 2025

Uh oh!

sreeram-venkitesh commented May 28, 2025

Uh oh!

jpbetz commented May 28, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpbetz May 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpbetz commented Jun 5, 2025

Uh oh!

Priyankasaggu11929 commented Jun 9, 2025

Uh oh!

Priyankasaggu11929 commented Jun 18, 2025

jpbetz May 31, 2025 •

edited

Loading