Data race on BatchSpanProcessor #4249
Comments
Could you provide a reproduction example?
@dmathieu the problem here is the approach the SDK takes for handling the close/send of the queue channel. It doesn't guarantee that nothing sends to the channel after it is closed; instead it relies on recovering from the resulting panic. That's a valid data race as documented here:
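For illustration, here is a minimal standalone sketch of that pattern (not the SDK's actual code): one goroutine closes a channel while another may still send to it, with recover() used to swallow the resulting panic. The panic is only the visible symptom; the unsynchronized close and send are the race itself.

```go
package main

import "fmt"

func main() {
	queue := make(chan int, 1)

	// "Consumer" side shutting down: closes the channel.
	go func() {
		close(queue)
	}()

	// "Producer" side: may send after the close. recover() hides the
	// panic, but the close and the send are still unsynchronized.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("recovered:", r) // "send on closed channel"
			}
		}()
		queue <- 1 // panics whenever the close wins the race
	}()
}
```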
It doesn't repro consistently because of another race condition in the SDK. My suggestion here would be to not close the queue channel at all, then combine the selects above so the producer simply stops adding to the queue after shutdown (see the sketch below).
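A rough sketch of that suggestion, using hypothetical names (processor, queue, stopped, enqueue; not the SDK's real API): the queue channel is never closed, and the producer checks a separate shutdown signal combined with the send in a single select instead of relying on panic/recover.

```go
package sketch

// processor and its fields are hypothetical names for illustration only.
type processor struct {
	queue   chan int      // never closed
	stopped chan struct{} // closed exactly once when shutdown begins
}

// enqueue refuses new items once shutdown has started.
func (p *processor) enqueue(v int) bool {
	// Fast path: if shutdown already started, drop the item.
	select {
	case <-p.stopped:
		return false
	default:
	}
	// Otherwise send, but bail out if shutdown starts while blocked.
	select {
	case p.queue <- v:
		return true
	case <-p.stopped:
		return false
	}
}
```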
Combining the selects wouldn't be a reliable solution either: Go doesn't evaluate select cases in the order they are written; when more than one case is ready, it picks one at random. See https://medium.com/a-journey-with-go/go-ordering-in-select-statements-fd0ff80fd8d6
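A small self-contained demo of that behavior: when both cases are ready at the same time, the runtime chooses one at random, so over many iterations both branches run in roughly equal proportion.

```go
package main

import "fmt"

func main() {
	a := make(chan struct{}, 1)
	b := make(chan struct{}, 1)
	counts := map[string]int{}

	for i := 0; i < 1000; i++ {
		a <- struct{}{}
		b <- struct{}{}
		// Both cases are ready, so the runtime picks one at random.
		select {
		case <-a:
			counts["a"]++
			<-b // drain the other before the next iteration
		case <-b:
			counts["b"]++
			<-a
		}
	}
	fmt.Println(counts) // roughly an even split between "a" and "b"
}
```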
I quickly looked at the code and it uses channels improperly. The "consumer" goroutine is closing the channel:
opentelemetry-go/sdk/trace/batch_span_processor.go, lines 321 to 350 in fd5284f
Moreover, the "producer" goroutine has some hacks to prevent panics when it sends to a closed channel:
opentelemetry-go/sdk/trace/batch_span_processor.go, lines 361 to 395 in fd5284f
There may be more problems, but it looks clear that the synchronization is not implemented properly.
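One conventional way to structure this, sketched with hypothetical names (safeProcessor, stop; not a proposal for the SDK's exact API): signal shutdown through a dedicated channel, track producers with a sync.WaitGroup, and close the queue (if it needs closing at all) only after every producer has returned, so no send can overlap the close.

```go
package sketch

import "sync"

// safeProcessor and its methods are hypothetical, for illustration only.
type safeProcessor struct {
	queue     chan int
	stop      chan struct{}
	producers sync.WaitGroup
}

func (p *safeProcessor) producer(items []int) {
	defer p.producers.Done()
	for _, v := range items {
		select {
		case p.queue <- v:
		case <-p.stop:
			return // shutdown requested: stop sending
		}
	}
}

func (p *safeProcessor) shutdown() {
	close(p.stop)      // ask producers to stop sending
	p.producers.Wait() // no goroutine can send past this point
	close(p.queue)     // closing is now safe; the consumer drains and exits
}
```

In this sketch the caller would do p.producers.Add(1) before starting each producer goroutine; the key property is that close(p.queue) happens strictly after Wait(), so it cannot race with any send.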
We're seeing the data race below for SDK 1.14.0: