Skip to content

Comments

🐛 Cache reader: Wait for cache sync when ReaderFailOnMissingInformer is true#3425

Merged
k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
troy0820:troy0820/cache-bug
Jan 25, 2026
Merged

🐛 Cache reader: Wait for cache sync when ReaderFailOnMissingInformer is true#3425
k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
troy0820:troy0820/cache-bug

Conversation

@troy0820
Copy link
Member

Resolves #3424

/assign @alvaroaleman

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 21, 2026
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 21, 2026
if !ok {
return false, nil, &ErrResourceNotCached{GVK: gvk}
}
if !cache.Informer.HasSynced() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might still differ too much from the behavior in Informers.Get

Maybe ic.readerFailOnMissingInformer should be passed into Informers.Get and handled there to make it easier to keep both branches in sync.

@alvaroaleman WDYT?

Copy link
Member

@alvaroaleman alvaroaleman Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should use the same code to determine and do the syncing regardless of if readerFailOnMissingInformer is set or not, that it was added outside of that was basically the main mistake back then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should use the same code to determine and do the syncing regardless of if readerFailOnMissingInformer is set or not, that it was added outside of that was basically the main mistake back then.

Wouldn't that change the test that is with it?

		Describe("as a Reader", func() {
			Context("with structured objects", func() {
				It("should not be able to list objects that haven't been watched previously", func(ctx SpecContext) {
					By("listing all services in the cluster")
					listObj := &corev1.ServiceList{}
					Expect(errors.As(informerCache.List(ctx, listObj), &errNotCached)).To(BeTrue())
				})

I am trying to understand the behavior we want because if we don't care whether this is set or not:

  1. Should we fail if this is set?
  2. Should we add to the informerMap and then wait for sync, then fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the above test is for a cache that has ReaderFailOnMissingInformer it needs to keep passing, yes. The only difference in Get between that option being set and not should be that we error out if the informer is not in the map already

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, that's how I understood it after I re-read the issue and the setting. I am dealing with a context cancelled test failure at the moment, that I am hunting down.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 22, 2026
i, started, ok := ip.Peek(gvk, obj)
if !ok {
if readerFailOnMissingInformer {
return false, nil, fmt.Errorf("failed to get informer for %v: informer not found", gvk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong, this here needs to return ErrResourceNotCached. That in turn requires to move that to internal, something like this will do that without breaking the export:

+++ b/pkg/cache/informer_cache.go
@@ -51,16 +51,7 @@ var _ error = (*ErrCacheNotStarted)(nil)
 // ErrResourceNotCached indicates that the resource type
 // the client asked the cache for is not cached, i.e. the
 // corresponding informer does not exist yet.
-type ErrResourceNotCached struct {
-       GVK schema.GroupVersionKind
-}
-
-// Error returns the error
-func (r ErrResourceNotCached) Error() string {
-       return fmt.Sprintf("%s is not cached", r.GVK.String())
-}
-
-var _ error = (*ErrResourceNotCached)(nil)
+type ErrResourceNotCached = internal.ErrResourceNotCached

return started, cache, nil
started, cache, err := ic.Informers.Get(ctx, gvk, obj, ic.readerFailOnMissingInformer, &internal.GetOptions{})
if err != nil {
return false, nil, &ErrResourceNotCached{GVK: gvk}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here needs to just return the error and not overwrite it with ErrResourceNotCached, as there is a range of other errors that Informers.Get may return

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 25, 2026
return i.Informer, nil
}

func (ic *informerCache) getInformerForKind(ctx context.Context, gvk schema.GroupVersionKind, obj runtime.Object) (bool, *internal.Cache, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not very obvious from the code but this is the only place where we need to pass ic.readerFailOnMissingInformer, in the two others we need to pass false.
This is because this here is only used by the cache reader whereas GetInformer and GetInformerForKind are methods on the cache. If those error out when readerFailOnMissingInformer is true, it becomes impossible to construct an informer at all

Signed-off-by: Troy Connor <troy0820@users.noreply.github.com>
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jan 25, 2026

@troy0820: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-controller-runtime-apidiff 8a2b6fe link false /test pull-controller-runtime-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@alvaroaleman alvaroaleman changed the title 🐛 Check to see if informer is synced and started before returning cache 🐛 Cache reader: Wait for cache sync when ReaderFailOnMissingInformer is true Jan 25, 2026
Copy link
Member

@alvaroaleman alvaroaleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 25, 2026
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

DetailsGit tree hash: 04a492b074bb478257b0a0557e2adc0ea2fbe882

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, troy0820

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 25, 2026
@alvaroaleman
Copy link
Member

/cherrypick release-0.23
/cherrypick release-0.22

@k8s-infra-cherrypick-robot

@alvaroaleman: once the present PR merges, I will cherry-pick it on top of release-0.22, release-0.23 in new PRs and assign them to you.

Details

In response to this:

/cherrypick release-0.23
/cherrypick release-0.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot merged commit 82cc073 into kubernetes-sigs:main Jan 25, 2026
13 of 14 checks passed
@k8s-infra-cherrypick-robot

@alvaroaleman: new pull request created: #3432

Details

In response to this:

/cherrypick release-0.23
/cherrypick release-0.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot

@alvaroaleman: new pull request created: #3433

Details

In response to this:

/cherrypick release-0.23
/cherrypick release-0.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReaderFailOnMissingInformer causes the sync check on the cache to be short-circuited

5 participants