🐛 Quiet context.Canceled errors during shutdown #2745

cbandy · 2024-03-30T18:39:11Z

Runnable implementations that return ctx.Err() cause a spurious "error received after stop" log message.

Fixes #1927

k8s-ci-robot · 2024-03-30T18:39:21Z

Hi @cbandy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alvaroaleman

Could you add a test, please?
/ok-to-test

cbandy · 2024-03-30T23:10:59Z

@alvaroaleman Thanks for taking a look. Test added in an amended commit.

alvaroaleman · 2024-03-30T23:30:26Z

/retest

alvaroaleman · 2024-03-30T23:51:11Z

pkg/manager/manager_test.go

+
+				logs := []string{}
+				options.Logger = funcr.NewJSON(func(object string) {
+					logs = append(logs, object)


This is a data race, which is why the tests are failing. You need to protect logs with a mutex or something similar.
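
A rough sketch of what protecting the slice could look like, using only the standard library. The logCapture type and captureConcurrently function are hypothetical names for illustration; the only thing taken from the diff is that the logger callback receives a string.

```go
package main

import (
	"fmt"
	"sync"
)

// logCapture (hypothetical helper) collects log lines from concurrent writers.
type logCapture struct {
	mu    sync.Mutex
	lines []string
}

// Write appends one line under the lock. Its signature matches a
// func(string) callback like the one passed to funcr.NewJSON in the diff.
func (c *logCapture) Write(line string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lines = append(c.lines, line)
}

// Lines returns a copy so the caller can assert on it without holding the lock.
func (c *logCapture) Lines() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.lines...)
}

// captureConcurrently writes n lines from n goroutines and returns
// everything that was captured once all writers have finished.
func captureConcurrently(n int) []string {
	var c logCapture
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Write(fmt.Sprintf(`{"msg":"line %d"}`, i))
		}(i)
	}
	wg.Wait()
	return c.Lines()
}

func main() {
	fmt.Println(len(captureConcurrently(100)))
}
```

Note that the mutex only serializes the writes and reads. It does not guarantee that every writer has finished before the assertion runs, which is the second race discussed in this thread.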

A mutex here solved a race among append calls, but it revealed a race between this line and the assertion on L1091 during the LeaderElection suite. My guess is that this goroutine is to blame:

controller-runtime/pkg/manager/internal.go, lines 423 to 427 in 2136860:

	// Start the leader election and all required runnables.
	{
		ctx, cancel := context.WithCancel(context.Background())
		cm.leaderElectionCancel = cancel
		go func() {

😞 I added a time.Sleep for that suite for now.

So the idea is that, with leader election, engageStopProcedure is currently still writing to the logger while the test is asserting on the logs?

My guess would be because the RunnableFunc is not a LeaderElectionRunnable. So it won't wait for leaderelection to close start

Okay reproduced locally. ^^ doesn't make sense as everything that doesn't implement LeaderElectionRunnable is treated like it needs leader election

I think the reason why it only occurs with leader election is because in that case we get one more log message:

(the other log messages are all produced "synchronously" within engageStopProcedure)

@cbandy @alvaroaleman If that makes sense to you both, let's add a godoc comment before the sleep and I think we're good

🤔 The fact that Start aggregates its err with stopErr makes me think it expects engageStopProcedure to return the outcome of stopping. Why does it inject a channel at all?

Should engageStopProcedure wait for the errChan-draining goroutine to finish?

> 🤔 The fact that Start aggregates its err with stopErr makes me think it expects engageStopProcedure to return the outcome of stopping. Why does it inject a channel at all?

Not sure. I think the goal of the "case err := <-cm.errChan:" branch in engageStopProcedure is mostly to avoid deadlocks when a send on errChan would otherwise block.

> Should engageStopProcedure wait for the errChan-draining goroutine to finish?

With the way it currently works it can't because this defer (defer close(stopComplete)) in Start is called too late.

I guess we could wait in the Start func for the errChan-draining goroutine to finish as well.

It would be a bit nicer, but I'm not sure I want to introduce even more complexity into all of this. It's already pretty hard to reason about.
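
To make the trade-off concrete, here is a standalone sketch of the "wait for the draining goroutine" idea. It is not the manager's implementation: stopAndDrain and runDemo are hypothetical, the channel is assumed to be unbuffered, and a WaitGroup stands in for "all runnables have returned".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// stopAndDrain drains errCh in a goroutine so that senders never block,
// then waits for that goroutine before returning what it collected.
func stopAndDrain(errCh <-chan error, senders *sync.WaitGroup) []error {
	var unexpected []error
	stopComplete := make(chan struct{})
	drained := make(chan struct{})

	go func() {
		defer close(drained)
		for {
			select {
			case err := <-errCh:
				if !errors.Is(err, context.Canceled) {
					unexpected = append(unexpected, err)
				}
			case <-stopComplete:
				return
			}
		}
	}()

	senders.Wait()      // every sender has handed its error to the drainer
	close(stopComplete) // tell the drainer to stop
	<-drained           // without this wait, reading unexpected would race
	return unexpected
}

// runDemo starts two senders, one returning context.Canceled and one
// returning a real error, and reports what survives the filter.
func runDemo() []error {
	errCh := make(chan error) // unbuffered: a send completes only once received
	var senders sync.WaitGroup
	for _, err := range []error{context.Canceled, errors.New("disk full")} {
		senders.Add(1)
		go func(err error) {
			defer senders.Done()
			errCh <- err
		}(err)
	}
	return stopAndDrain(errCh, &senders)
}

func main() {
	fmt.Println(runDemo())
}
```

The extra drained channel is the added complexity being weighed here: it removes the need for a sleep in the test, at the cost of one more synchronization point during shutdown.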

vincepri · 2024-04-02T14:25:18Z

pkg/manager/internal.go

-			case err, ok := <-cm.errChan:
-				if ok {
+			case err := <-cm.errChan:
+				if !errors.Is(err, context.Canceled) {


Why not keep checking the ok value here?

I don't see any calls to close errChan, so ok is always true (today).

Also, in my experience, reading from a closed channel in select is for some kind of control flow (break or return or so), or the channel is assigned nil so that it is not selected again.

> I don't see any calls to close errChan, so ok is always true (today).

Sounds okay to me. I guess if we ever start closing the channel we have to check all usages of the channel anyway.
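
For context on the closed-channel point, a small generic sketch (unrelated to the manager's channels; collect is a hypothetical function) of the usual pattern: a receive from a closed channel returns immediately with ok == false, so the case is disabled by setting the channel variable to nil.

```go
package main

import "fmt"

// collect receives from both channels until both have been closed.
func collect(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // a nil channel is never ready, so this case is skipped from now on
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a, b := make(chan int, 2), make(chan int, 1)
	a <- 1
	a <- 2
	b <- 3
	close(a)
	close(b)
	fmt.Println(len(collect(a, b)))
}
```

Without the nil assignment, the closed channel's case would be ready on every iteration and the loop would spin. That is why an ok check in a select is normally paired with some control flow, as noted above.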

k8s-triage-robot · 2024-07-16T12:37:34Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

cbandy · 2024-08-04T23:11:51Z

Rebased with recommended changes. PTAL.

alvaroaleman

Thanks! And sorry this ended up taking so long

k8s-ci-robot · 2024-08-04T23:22:36Z

LGTM label has been added.

Git tree hash: 00e7e5cbef412bab8bf37d3bc3f98327a20c6934

k8s-ci-robot · 2024-08-04T23:22:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, cbandy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [alvaroaleman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sbueringer · 2024-08-05T15:51:43Z

/lgtm

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 30, 2024

k8s-ci-robot requested review from joelanford and varshaprasad96 March 30, 2024 18:39

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 30, 2024

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 30, 2024

alvaroaleman reviewed Mar 30, 2024

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 30, 2024

cbandy force-pushed the quiet-canceled branch from b7e8ceb to fc73db2 Compare March 30, 2024 23:09

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 30, 2024

alvaroaleman reviewed Mar 30, 2024

cbandy force-pushed the quiet-canceled branch from fc73db2 to 732bd60 Compare April 2, 2024 02:21

vincepri reviewed Apr 2, 2024

sbueringer reviewed Apr 15, 2024

pkg/manager/internal.go

sbueringer reviewed Apr 15, 2024

pkg/manager/manager_test.go

sbueringer reviewed Apr 16, 2024

pkg/manager/manager_test.go

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2024

alvaroaleman reviewed Aug 4, 2024

pkg/manager/manager_test.go (outdated)

Quiet context.Canceled errors during shutdown

5af1f3e

Runnable implementations that return ctx.Err() cause a spurious "error received after stop" log message.

cbandy force-pushed the quiet-canceled branch from 732bd60 to 5af1f3e Compare August 4, 2024 23:07

alvaroaleman approved these changes Aug 4, 2024

k8s-ci-robot assigned alvaroaleman Aug 4, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2024

k8s-ci-robot merged commit 89b5dee into kubernetes-sigs:main Aug 4, 2024
7 checks passed

cbandy deleted the quiet-canceled branch August 4, 2024 23:43
