Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

les/vflux/client, p2p/nodestate: fix data races #24058

Merged
merged 1 commit into from
Dec 14, 2021

Conversation

zsfelfoldi
Copy link
Contributor

This PR fixes some data races in the server pool tests and also one in the NodeStateMachine.
Fixes #23848

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea whether the semantics are ok, but I'm not able to find any race in the tests any longer with this PR, so that's good.
One nitpick, though

@@ -54,6 +54,8 @@ type ServerPoolTest struct {
db ethdb.KeyValueStore
clock *mclock.Simulated
quit chan chan struct{}
stopping bool
queryWg *sync.WaitGroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use a sync.WaitGroup instead of a pointer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove the pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back and added comments to explain why it needs to be recreated each time and how the waitgroup is used safely :)

@fjl
Copy link
Contributor

fjl commented Dec 6, 2021

I think the changes in les/vflux/client tests violate WaitGroup semantics, and will lead to occasional crashes in tests.

We could ignore this and merge anyway, but then the tests will subtly wrong in a different and less-detectable way. Also, the fix applied here is super ugly. To make a meaningful improvement, it should at least be documented what the scope of the lock is. Does it apply to everything in ServerPoolTest? From the code, it looks like only testNodes and specifically testNodes[x].connected require synchronization.

@zsfelfoldi
Copy link
Contributor Author

Now I think it's mergeable

@fjl fjl merged commit fc01a7c into ethereum:master Dec 14, 2021
@fjl fjl added this to the 1.10.14 milestone Dec 14, 2021
@fjl fjl removed the status:triage label Dec 14, 2021
JacekGlen pushed a commit to JacekGlen/go-ethereum that referenced this pull request May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some tests fail with data races
4 participants