runtime: close of a channel with pending sends incorrectly flagged by race detector #30372
Similar to #27769
It's a valid transaction race. There is no guarantee the sends complete before the close, which would result in a panic. The race detector actually helped you here.
@as The code with the data race is in Go's runtime, not in my code. Where does the spec declare that this code generates a data race? As stated in my bug report, I understand that this panics, since that is what the spec says, and I'm fine with that. I'm not fine with it also being reported as a data race, because the spec doesn't say that closing a channel with pending sends is undefined behavior (which would be enough of an escape hatch to consider the implementation conforming).
Thank you for this report @kjk! Indeed, as @go101 noted, it is a duplicate of #27769, which is still open while we decide how to document this. As @as has noted, it is a valid transaction race, but as you too have found, there is no documentation in the spec about this. Let's move this to that issue: it is also actionable, but more discussion has gone on there. I'll close this one in favor of #27769, but please feel free to reopen this one.
I don't have permission to reopen the bug; @odeke-em, could you please reopen it? This is not a dup of #27769. The discussion in that bug operates under the assumption that this: […] is an invalid Go program, that the race detector is rightfully detecting a data race here, and that it's just a matter of documenting that. I've read the discussion and I didn't find a justification for that position. Treating close of a channel with pending sends as invalid Go code would certainly violate my assumptions, given that the raison d'être of channels is to allow thread-safe exchange of data between goroutines. I claim that the above is valid Go code and that there's a bug in Go's implementation of close/send or in the implementation of the race detector. Maybe I'm wrong, but I would like someone involved in the spec/compiler/race detector to opine on that. Without that, closing this bug seems premature.
Furthermore, #27070 describes a similar race between close and len/cap, which was deemed severe enough that the fix was backported to 1.11.
I can't quite match up the line numbers in […]. That suggests the intent was reads/writes of […]. Furthermore […]. Presumably always using atomic operations for […].
There should be two checks: one fast path outside the critical section and another one inside. I'm not sure where there's suddenly an atomic being used in […].
Reopening this since @odeke-em said "feel free to reopen this one" and the issue's author has requested it. |
@as Not sure how to parse this response. A write to […]. As far as I can tell, if that happens on different threads, it's a data race. Here's […]
See https://golang.org/ref/mem#tmp_7. The happens-before relationship for channels is only defined between corresponding sends and receives, and between a […]. Data races are defined by the memory model, not by what the concrete channel implementation in a given version of the Go runtime happens to do: reads and writes to […].
Sorry for pressing this, but I don't feel this was fully addressed and would appreciate clarification on 3 points.

1. What this implies is: closing a channel with pending sends is incorrect Go code? If yes, how is one supposed to safely close such a channel? If there is no safe way, it implies that this is an invalid concurrency pattern in Go: I launch goroutines to send, they block because no one is reading yet, and I want to terminate the communication by closing the channel. This is the essence of what my program is doing. The implication is that if I ever want to close a channel, I'm not supposed to do a send on that channel on another goroutine.

2. A race between […]

3. A data race in the implementation of channel send and receive is relevant, because: a) it's a bug in itself […]

Am I wrong that racing on […]?
Yes.
Wait for the senders to complete. (For your original example, that might look something like https://play.golang.org/p/4LtjIUA0tla.)
Correct, that pattern is invalid. You can terminate communication by closing a channel on which the goroutines are receiving, but not one on which they are sending. Note that in your original example the goroutines are not terminated: any that remain either leak or panic. (See https://play.golang.org/p/2Tjo2a3Q3iN and https://play.golang.org/p/sHi6fxSgEwv respectively.)
Since a send concurrent with a […]. In contrast, […].
Yes. In addition to what Brian said, consider the following two operations happening concurrently: a send `c <- x` and a `close(c)`.
If the send happens first, both operations complete fine, and the channel reader sees x, then end-of-channel. If the close happens first, the send panics.
See Brian's answer.
Consider the following two operations happening concurrently: `len(c)` and `close(c)`.
We get the same final result no matter which order those two operations happen in. So it's unlikely to be a bug in the user's program. (Digression: this is kind of a weird case, because presumably there is a receiver around somewhere, and the receive and the len race with each other. So it's only in odd cases that #27070 matters.)
The runtime has all sorts of data races in it. The runtime doesn't depend on the Go memory model, but on the memory model of the underlying hardware. The particular case of reading […]: I'm pretty sure there isn't a bug there, but I share your squeamishness. Sometimes we do squeamish things to go fast...
I believe the justification for fixing #27070 was: […]
As to closing the channel: in my case, there is no guarantee that readers will drain the channel, hence I would deadlock waiting for sends to complete. For context: it's a NoSQL/JSON document database driver where a client can ask to be notified about future changes in the database, for example: […]
Those changes are async by nature (they might or might not happen at some point in the future). It seemed to me that the idiomatic way to solve that in Go would be to return a channel with those changes. My driver code has no control over when / if the client will read those notifications. For that reason, I do each send on a goroutine, so as not to block the thread that talks to the database. When the client calls […], I'm fine both with the panic that results from a send on a closed channel and with the indeterminism of how many notifications will end up read by the client.

My point here is that this seems like a reasonable thing to do, yet it is seemingly impossible to do without a data race (which I wouldn't care about, because it is seemingly benign, but it does cause tests to fail). The "wait for readers to drain the channel" solution is not applicable here. And sure, I could re-architect this to e.g. use a callback function or implement a thread-safe queue of messages, but a channel seemed like the most elegant solution.
The usual approach would be a synchronous callback function or explicit iterator. Channels in exported APIs are rarely the most elegant solution. (For more detail, see the first section of Rethinking Classical Concurrency Patterns.) |
The only reason to close a channel is to tell the receiver that there is nothing left to receive. Therefore, closing a channel should always be thought of as a send operation on a channel, and only the sender should close a channel. When you have multiple senders, they must coordinate when to close the channel. Note also that sending on a closed channel causes a run-time panic. The most obvious way to see that your original code has a race is to remove the call to […].
I feel like those justifications are mixing up different kinds of races. This bug report is about a data race as detected by the race detector: the implementation of channel send and receive in the runtime does reads and writes to the same memory location, and the race detector flags that because the CPU considers it to produce an undefined result. The race you're talking about is a logical race related to ordering: not being able to say whether "a" happens before "b" or "b" happens before "a". That might or might not be a bug in my code, depending on whether I rely on the order; in my particular case I do not, so it's not a bug.

While I appreciate you taking the time to offer guidance on how to structure the code, the snippet I provided is a minimized test case to demonstrate a data race in the runtime. As I described elsewhere, in my real code I can't coordinate senders because they are blocked on readers that might never read. It might be bad code, but the issue it surfaced is whether a close is safe for concurrent use. I understand that this generates a panic because the spec says so. I expect it, I handle it, and it doesn't make the program incorrect as per the Go spec. I concede that the spec allows it: […]
By not including […]. But it also requires synchronization between […].

To sum up: the spec requires synchronization between […]. Additionally, a race between close and len was fixed recently even though, as far as I can tell, the spec treats them the same.
(I assume you mean send and close, not send and receive?) The runtime arranges to have […]
The problem is that for almost everyone else, it is a bug. It is a known gotcha for inexperienced Go programmers: their program works fine while testing, but then in production the scheduler starts doing something differently and their program panics. We're very much on board with the fact that the race detector helps us detect this issue more reliably in testing.
If they are blocked on channel reads, then you can use a […]. You're right that the spec is somewhat muddied on where exactly a race should be reported and where it shouldn't. But we definitely want a race between send and close to be reported. Not because of the spec, but because reporting that race helps real users fix real bugs.
I do agree closing a channel with pending sends is […]

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan string)
	for i := 0; i < 256; i++ {
		go func(n int) {
			defer func() {
				recover()
			}()
			ch <- fmt.Sprintf("string %d", n)
		}(i)
	}
	time.Sleep(time.Second)
	close(ch)
}
```

And I still feel this issue is a dup of #27769.
**What version of Go are you using (`go version`)?**

[…]

**Does this issue reproduce with the latest release?**

Yes.

**What operating system and processor architecture are you using (`go env`)?**

[…]

**What did you do?**

Save this to e.g. `t.go`: […] and run `go run -race t.go`.

**What did you see instead?**

I see a data race reported: […]

**What did you expect to see?**

I didn't expect a data race between `close(ch)` and a pending `ch <- v` send. The spec (https://golang.org/ref/spec#Channel_types) is curiously mum on the subject. It says: […]

By not mentioning `close`, it implies that one has to synchronize `close` and sends. But I wouldn't know how to synchronize those. My sends have already been dispatched, and closing the channel seems like a valid (and only) way to ensure those sending goroutines don't stick forever (assuming I recover the resulting panic). The data race seems harmless, as I'm done with this channel, but I run my tests with `-race` detection enabled and this makes the test fail.