-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
context: ease debugging of where a context was canceled? #26356
Comments
Sorry, but there isn't enough information here to say anything useful. Please show us your code, or tell us how to recreate the problem. Thanks. |
The issue is that when we call GetCalendars for example, it will instantly return "context canceled". When I use an empty context with no timeout, everything works as expected. This all began failing in the past couple of days since updates were made in the /x/net package |
I think the behavior you're seeing is expected. I don't see a Go bug.
Your code has:
The "When I use an empty context with no timeout, everything works as expected. This all began failing in the past couple of days since updates were made in the /x/net package." You are describing to things that might have changed in your code (no longer using empty context, changes in x/net). If my theory above is correct: if you undo the x/net changes (to only change one variable) you'll see that any non-empty context will still have the problem you're seeing. |
I've seen this sort of issue crop up several times now. I wonder if Perhaps something along the lines of: package context
// A DoneReasoner describes the reason a Context is done.
// The Context implementations returned by this package implement DoneReasoner.
type DoneReasoner interface {
DoneReason() string
}
// DoneReason returns a human-readable description of the reason that ctx.Done() is closed,
// or the empty string if ctx.Done() is not closed.
func DoneReason(ctx context.Context) string {
select {
case <-ctx.Done():
default:
return ""
}
if r, ok := ctx.(DoneReasoner); ok {
return r.DoneReason()
}
return ctx.Err().Error()
} |
@meirf I think there was a misunderstanding in how I described the problem. This code has always been using two contexts like I described but only in the past two days since we pulled updates to the vendor package did it start failing. There were not any changes whatsoever related to any of this code for several months besides pulling in the package update. Only after we pulled in the update and everything started failing, we switched to contexts without timeouts (still multiple) and started seeing things succeed. If your explanation is correct, the defer cancel was not previously working as expected However, I believe your explanation is incorrect because the error message is returned when GetCalendars is called, not on client retrieval. We always receive a calendarClient back, so calendar.New is being called correctly |
Do you happen to have a record of which revision of the context package you were using previously, and which one you updated to? |
Which vendored packages did you update? (Just |
I've narrowed down the precise commit where things began to fail, and there are more vendor updates here than I'd previously thought. It looks like both packages (x/net and x/oauth2) were updated and that the previous version was decently old such that it may be very difficult to narrow anything down, but here are a couple of samples from the revisions: "checksumSHA1": "dr5+PfIRzXeN+l1VG+s0lea9qz8=", => "checksumSHA1": "GtamqiJoL7PGHsN454AoffBFMa8=", "checksumSHA1": "hyK05cmzm+vPH1OO+F1AkvES3sw=", => "checksumSHA1": "xaHFy3C2djwUXtiURpX3/ruY8IA=", Also, I re-tested the above code by leaving everything the same and completely removing the second context.WithTimeout inside of GetCalendars such that only the client has a timeout configured, and I see the same error |
https://golang.org/cl/45370 is in that range and loosely matches @meirf's diagnosis. It looks like your vendor update uncovered a latent bug within your code. |
I've confirmed that commenting out the Is the expectation that we exercise the client we retrieve to its fullest extent only to call Can we rely on GC and just not bother holding onto the cancel func? |
I think you have it backwards. Get rid of the context in newClient. |
@fraenkel I'm fine with whichever, but doesn't putting it on the client prevent me from needing to duplicate this context code across all calls? |
@matthewceravolo, please don't retitle the bug. We have a convention for how we use titles. It still reads above like this is user error and you got lucky in the past, so I'd like to keep this bug about thinking about ways to make this debugging easier for other users making similar errors. |
@bcmills I wouldn't use type assertion to introspect a Context because they are wrapped so often. If you want to attach a Done reason to a Context, I'd use a Context value. This is overhead, though, so I'd make it opt-in in somehow. Perhaps provide a context.WithDebug method that adds a Context value for storing debugging info. The code in WithTimeout and WithCancel can check for this debug value to decide whether to walk the call stack and record any information there. The context debug value needs to be installed at the top of the context stack to be seen by all derived contexts, so I'd enable it in the request handler (say) with a parameter like ?debugcontext=1. |
Good point. A
Ironically, |
I would store a *debug in the context to flatten the lookup, leaving it nil
when debugging is disabled. Associating that debug value with an
unexported context key allows the debug info to propagate through Context
implementations outside the context package.
…On Tue, Jul 17, 2018 at 3:52 PM Bryan C. Mills ***@***.***> wrote:
I wouldn't use type assertion to introspect a Context because they are
wrapped so often.
Good point. A Context.Value call with an exported (but opaque) key to
retrieve the reason code should be fine.
If you want to attach a Done reason to a Context, I'd use a Context value.
This is overhead, though, so I'd make it opt-in in somehow. Perhaps provide
a context.WithDebug method that adds a Context value for storing debugging
info. The code in WithTimeout and WithCancel can check for this debug value
to decide whether to walk the call stack and record any information there.
Ironically, Context.Value is O(N) with the depth of Context nesting,
whereas runtime.Caller(1) is O(1): the mere act of checking whether
debugging is enabled may be significantly more costly than storing the
debug information unconditionally.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#26356 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AJSK3TSfBWY21reU6GLskizNeC3fM3mEks5uHkBugaJpZM4VNsJ2>
.
|
@andreimatei Others have suggested similar hooks, and IIRC we even prototyped some. That might be worthwhile, but IMO debugging context cancelation is a common enough problem that I'd like to make it better for everyone by default, if we can do so with acceptable cost. |
Perhaps this deserves its own issue. I'd certainly be for it. |
A given For example, a server might construct a |
Based on the feedback from several of you, I'll prototype an alternative
that allows passing a value in via the cancel function.
…On Fri, Jan 7, 2022 at 2:33 PM Bryan C. Mills ***@***.***> wrote:
Capturing the stack trace at the cancel() call is likely to be
uninteresting: usually cancel is called either via timer.AfterFunc (in the
case of a deadline) or via defer, which may make it harder to debug if
there are multiple defers in the same function.
A given WithContext call site may have many possible cancel call sites,
and while it's true that the cancel call is often just a deferred
function, in some of the most interesting cases it is decidedly nontrivial.
For example, a server might construct a context.WithCancel for a
background operation and store the cancel function in a map, and only
invoke that cancel function when either the background operation
completes or an RPC comes in to explicitly cancel it. In such a case it
seems very important to distinguish between “canceled because the operation
completed” and “canceled by RPC”, but the WithCancel call site would not
distinguish them at all.
—
Reply to this email directly, view it on GitHub
<#26356 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACKIVXOCQENNWK5TYHTYNDTUU45W7ANCNFSM4FJWYJ3A>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I had missed the developments around #28728 and their implications. Those changes seem to let us avoid the extra goroutine, which is great! That is, as long as we don't want to override the
I'm excited to see what comes out of it! FWIW, here's what we're trying within CockroachDB: a colleague's PR and my commit on top of it. |
I've updated my CL with a new version that prototypes adding *Cause variants of WithCancel, Deadline, and Timeout. I need to write more tests, particularly for cases where there are multiple causes in a context chain, and with mixed cancelation orders. |
This is wonderful! Unless I'm missing something, your |
@andreimatei I need to write some tests with chains of contexts and multiple causes. Hopefully that will help us see whether the returned causes are sensible or surprising. I'm explicitly propagating causes from parents to children in propagateCancel and cancelCtx.cancel—the same places where the context's error gets propagated. Glad to hear this API is directionally correct. If others agree, we'll need to take this up for proposal review. |
@andreimatei I added some tests: |
I added test for cause vs. causeless cancels and more orderings. I also renamed things so the test is easier to follow. I think the behavior is pretty clear now: the first cancel cause sticks for all children, even if it's a nil cause. |
I don't think so. The cases I'm thinking of are about a custom implementation of context that overrides the Here's a small program that demonstrates what I'm talking about.
The output of |
Ah, thanks for the reproduction case. I would state the expectation in the reverse: Cause(ctx) is only valid (defined) if ctx.Err() != nil (which is the same as when ctx.Done() is closed). If ctx.Err() is nil, then Cause(ctx) is undefined. I can add this to the spec in the CL. We could strengthen the spec for Cause to mirror what we say for Err:
So we could add (and implement) "If ctx.Err returns nil, the Cause(ctx) returns nil." |
In my opinion, we should aim to relegate Another thing I wanted to bring up is making sure that custom ctx impls can participate in (*) If I'd have my druthers, we would make |
@andreimatei, the Go compatibility policy does not allow new methods to be added to exported interfaces. So there is no plausible way for The best we could perhaps do is retrofit a top-level |
Regarding compatibility, my goal with this CL is to avoid breaking anything, including existing implementations of the Context interface. Therefore I'm not adding Cause() to the Context interface. My goal with Cause is not to replace Err, but to allow people to supplement Err with additional information. I think you're saying that you'd like to check Cause(ctx) != nil instead of ctx.Err() != nil. If so, then indeed we'd need to make sure they are nil / not-nil at the same times. We'd probably also want to make Cause(ctx) == ctx.Err() when the user hasn't provided a cause. But this then makes it difficult to determine whether ctx.Done was closed due to cancelation or deadline exceeded (the user would need to encode this into their cause). Regarding implementing Cause for custom contexts, users can do this easily by wrapping a context created using WithCancelCause. I'd like to keep things simple until there's a demonstrated need to support anything fancier. |
That's what I was thinking.
Right. But isn't that a natural thing to do, since the user has to pass an error to
When people implement custom contexts, I think they primarily do so for efficiency reasons. In such cases, forcing them to wrap a stdlib ctx defeats the purpose. There's also a more philosophical contradiction here, in my opinion. I should say that I can nitpick and form semi-informed opinions on proposals, but ultimately I'll gladly take most improvements around the cause, compared to nothing. |
All good points, @andreimatei , thank you. |
Please take a look at #51365 and see if it addresses your use cases. Thanks! |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go version go1.10 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?GOARCH="amd64"
GOBIN=""
GOCACHE="/home/matthew/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/matthew/work"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build356255000=/tmp/go-build -gno-record-gcc-switches"
What did you do?
used context.WithTimeout() to make requests to google calendar api and outlook calendar api
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
What did you expect to see?
Making requests using contexts with timeouts should cancel when the timeout is reached
What did you see instead?
Contexts with timeouts are instantly failing with "context canceled" even though the timeout is set to time.Minute. The error goes away if I remove the timeout context and use one without any limit. It also seems to be transient to some extent
The text was updated successfully, but these errors were encountered: