Skip to content

Conversation

michaelasp
Copy link
Contributor

  • One-line PR description: Adding new KEP for comparable resource version
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 28, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 28, 2025
@michaelasp
Copy link
Contributor Author

/cc @liggitt
/cc @jpbetz


## Alternatives

N/A
Copy link

@lmktfy lmktfy Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have alternatives, and I'm about to suggest one. There may be more.

We have a rich discovery API. I think we can use that, drawing inspiration from Rust and its concept of traits.

Rust traits are like interfaces, but can also be kind of zero sized. For example, the PartialEq trait from Rust's standard library means that you can use = to check for equality. The Eq trait (which requires also implementing PartialEq) makes a promise that all values can be compared against other values. Some nullable types only implement PartialEq. There are also traits to say if some, or all, values can be compared against each other: PartialOrd and Ord.

(Eq and Ord are just marker traits; they don't add any API surface, they are just making a stronger promise than their supertrait).

We can mark our APIs with something similar to traits and declare, per API, whether resourceVersion comparisons are meaningful for sorting
Doing it like this means we can have easy soft adoption; well, relatively easy.

Every in-tree API would implement resourceVersion ordering and can mark itself as such. A metrics adapter that doesn't implement it would not need to worry about clients making the wrong assumption.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add something like traits, I think we could mark our stable APIs as stable, too. It wouldn't be much extra code.

There might be some other traits-like things we can think up, such as marking if an API kind uses the .metadata.generation / .status.observedGeneration pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing that as an alternative is fine, but I think it's reasonable to proceed without needing opt-in comparability indicators in discovery.

There are three categories of APIs

1. Servers where resourceVersion is a comparable integer

  • built-in kube types served by kube-apiserver
  • built-in kube types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • CRD-backed types served by kube-apiserver
  • CRD-backed types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • extension API servers built with k8s.io/apiserver using normal CRUD storage
  • extension API servers built with other technologies but following kube-apiserver behavior

2. Servers where resourceVersion is not an integer

For servers like this, a local well-formedness check when comparing two resourceVersion strings triggering a comparison error (as proposed in the function signature) would be enough to guard against misinterpretation, without needing to augment discovery and make the client check discovery.

All the examples of unusual aggregated servers I'm aware of fall in this category, or have other server-side correctness issues that already mishandle resource versions in ways inconsistent with client-expected behavior:

3. Servers where resourceVersion is an integer, but is not comparable

Obviously this is possible, but do we have any actual known examples of real servers that do this?

I would really want to push extension API server authors to live in category 1 or 2. Either make your integer resourceVersions comparable like everyone expects (preferred), or make them clearly not integers if that's not possible.

Unless there's evidence that there are actually servers like this in use that won't adjust to be in category 1 or 2, I'd push back on the complexity of adding something like "integerResourceVersionIsComparable: false" to discovery per-type and making clients check it.


## Summary

Resource version is currently defined as an opaque string from the view of a client, with the only operation that is supported being equality comparisons. This differs from the internal apiserver implementation, where it is clearly defined as a monotonically incrementing integer. There are increasing requirements being required from clients consuming object metadata, where stronger comparisons than just equality are required.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The apiserver" is not the only implementation of .metadata, though: you can also integrate your control plane with an extension API server via the aggregation layer.


### Goals

The goals for this KEP are fairly straightforward, firstly we will expose a utility function that clients can use on the resource version to check comparisons between resource versions. This will take the opaque resource version string and return a boolean and an error if it occurs. Along with that we will update the documentation to specify that a ResourceVersion must be a monotonically increasing integer.
Copy link

@lmktfy lmktfy Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This risks implying that all APIs support this comparison. I'd prefer to make that promise on an API-by-API basis. See #5505 (comment) for the specifics of what I have in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that not-monotonic RV is ok for an API without Watch like Metrics API, however the advantage of having monotonic RV for informers is huge. Being able to do consistent reads, monitor staleness is deal breaker that will make non-compliant mostly API useless and force them to implement it.

I would also point out that it would force us to maintain two different informer implementations based on whether API has monotonic RV, because fully utilizing monotonic RV will motivate non trivial code changes.


## Alternatives

N/A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing that as an alternative is fine, but I think it's reasonable to proceed without needing opt-in comparability indicators in discovery.

There are three categories of APIs

1. Servers where resourceVersion is a comparable integer

  • built-in kube types served by kube-apiserver
  • built-in kube types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • CRD-backed types served by kube-apiserver
  • CRD-backed types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • extension API servers built with k8s.io/apiserver using normal CRUD storage
  • extension API servers built with other technologies but following kube-apiserver behavior

2. Servers where resourceVersion is not an integer

For servers like this, a local well-formedness check when comparing two resourceVersion strings triggering a comparison error (as proposed in the function signature) would be enough to guard against misinterpretation, without needing to augment discovery and make the client check discovery.

All the examples of unusual aggregated servers I'm aware of fall in this category, or have other server-side correctness issues that already mishandle resource versions in ways inconsistent with client-expected behavior:

3. Servers where resourceVersion is an integer, but is not comparable

Obviously this is possible, but do we have any actual known examples of real servers that do this?

I would really want to push extension API server authors to live in category 1 or 2. Either make your integer resourceVersions comparable like everyone expects (preferred), or make them clearly not integers if that's not possible.

Unless there's evidence that there are actually servers like this in use that won't adjust to be in category 1 or 2, I'd push back on the complexity of adding something like "integerResourceVersionIsComparable: false" to discovery per-type and making clients check it.

Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did another sweep, this is looking good

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Sep 3, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Sep 3, 2025
Copy link
Member

@stlaz stlaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does merging this KEP effectively turn the InformerResourceVersion feature gate GA?

The feature gate is not very well documented but as far as I understand, it was in place just to wait until ResourceVersions are comparable to make sure the invocation of cache.Controller{}.LastSyncResourceVersion() returns a sensible result.

cc @nilekhc

@liggitt
Copy link
Member

liggitt commented Sep 4, 2025

Does merging this KEP effectively turn the InformerResourceVersion feature gate GA?

Um.... not quite? (I didn't actually know about that gate). This KEP defines how client-side comparisons should happen, but those comparisons can return errors... I don't see any comparison or error-handling path in the spots where the InformerResourceVersion gate is used (or where the underlying LastSyncResourceVersion() is set or retrieved). InformerResourceVersion would have to be updated to use the defined comparison approach, and handle encountered errors properly, I think.

@liggitt
Copy link
Member

liggitt commented Sep 4, 2025

I'm pretty happy with the current state. I'd like @deads2k and @jpbetz to give it a pass.

/assign
/assign @deads2k @jpbetz

@michaelasp
Copy link
Contributor Author

Does merging this KEP effectively turn the InformerResourceVersion feature gate GA?

Um.... not quite? (I didn't actually know about that gate). This KEP defines how client-side comparisons should happen, but those comparisons can return errors... I don't see any comparison or error-handling path in the spots where the InformerResourceVersion gate is used (or where the underlying LastSyncResourceVersion() is set or retrieved). InformerResourceVersion would have to be updated to use the defined comparison approach, and handle encountered errors properly, I think.

I think these are two separate things, although I may be misunderstanding the feature gate. All I think that InformerResourceVersion does is return the last seen resource version of the informer, not much else. Since the dummy informer returns an empty string when this is called and we still need the last resource version the feature gate was added. I think it's unrelated to any comparison logic.

client as well, particularly the ability to consume the resource version as an
integer, and the ability to compare resource versions to each other for more
than equality. Clients can use the new semantics in order to determine the
relative order of two different resource versions for the same type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: type -> kind

Copy link
Member

@liggitt liggitt Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, probably resource, actually

(if a Wardle kind was served under two resource endpoints like wardles and clusterwardles, a resourceVersion would only be comparable within the stream of one of those resources)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched the wording to resource stream, I think that makes the most sense? Lmk what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"resource" has a specific definition, I'd probably stick with that instead of "resource stream"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sg, updated

@liggitt
Copy link
Member

liggitt commented Sep 23, 2025

lgtm

@alvaroaleman
Copy link
Member

This KEP is mostly about extending the promise of the existing implementation rather than any change to it. Would it be appropriate to start relying on this as described in Helper Function as soon as this KEP merges, including for older Kubernetes APIServer versions?

@liggitt
Copy link
Member

liggitt commented Sep 23, 2025

This KEP is mostly about extending the promise of the existing implementation rather than any change to it. Would it be appropriate to start relying on this as described in Helper Function as soon as this KEP merges, including for older Kubernetes APIServer versions?

I would expect so. The addition of the conformance test in 1.35 will make it official, but the described approach is valid to use against any historical kube-apiserver as well.

@jpbetz
Copy link
Contributor

jpbetz commented Sep 23, 2025

/approve
/lgtm
Let's merge and iterate as needed.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, michaelasp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 23, 2025
@k8s-ci-robot k8s-ci-robot merged commit cebf304 into kubernetes:master Sep 23, 2025
3 of 4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Sep 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.