Fix re-entrant GetOrHandshake issues #1044
approved with a small comment
lighthouse.go
    if lh.l.Level >= logrus.DebugLevel {
        lh.l.WithField("vpnIp", ip).Debug("Lighthouse query buffer was full, dropping request")
    }
I wonder if this should be higher than debug, since without debug logs on it would be hard to tell this is happening and that you need to increase the buffer.
I think it might be better to just make this a blocking write to a buffered channel
I might have misread it, but I think the implementation may block when the channel is full.
@wadey got a deadlock with v1.8.0 and was able to pull the stack trace. HandshakeManager.GetOrHandshake can be re-entrant via HandshakeManager.StartHandshake through a call to hm.lightHouse.QueryServer(). Aside from the double read lock on the main hostmap not being great, the ConnectionManager goroutine had fired between the 1st and 2nd calls to HandshakeManager.GetOrHandshake and was waiting on a write lock for the main hostmap while blocking any future read locks.

This is fixed by adding a channel and handling the actual lighthouse queries in a goroutine. This should also speed up the hot path when many handshakes are occurring.
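
For readers following the fix, here is a minimal sketch of the pattern described above, not the actual Nebula code. The type `manager`, its fields, and the methods `getOrHandshake` and `queryWorker` are hypothetical names chosen for illustration. The caller only enqueues the IP while it holds the read lock; a separate goroutine takes its own read lock later, so the lock is never acquired twice on the same call path.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for the handshake manager and lighthouse.
type manager struct {
	hostmap sync.RWMutex  // stands in for the main hostmap lock
	queries chan string   // buffered queue of IPs to query
	done    chan struct{} // closed when the worker exits
	sent    []string      // queries the worker handled; read only after done
}

func newManager(buffer int) *manager {
	m := &manager{queries: make(chan string, buffer), done: make(chan struct{})}
	go m.queryWorker()
	return m
}

// queryWorker handles queries outside of any caller's lock. It takes its own
// read lock, so a waiting writer can delay it but cannot deadlock a caller.
func (m *manager) queryWorker() {
	defer close(m.done)
	for ip := range m.queries {
		m.hostmap.RLock()
		m.sent = append(m.sent, ip)
		m.hostmap.RUnlock()
	}
}

// getOrHandshake holds the read lock once and only enqueues the query.
// It reports false when the buffer is full and the request was dropped.
func (m *manager) getOrHandshake(ip string) bool {
	m.hostmap.RLock()
	defer m.hostmap.RUnlock()
	select {
	case m.queries <- ip:
		return true
	default:
		return false
	}
}

// shutdown stops the worker and returns everything it handled.
func (m *manager) shutdown() []string {
	close(m.queries)
	<-m.done
	return m.sent
}

func main() {
	m := newManager(4)
	m.getOrHandshake("10.0.0.1")
	m.getOrHandshake("10.0.0.2")
	fmt.Println("handled:", m.shutdown())
}
```

The nested-lock hazard comes from `sync.RWMutex` itself: once a writer is waiting in `Lock`, new `RLock` calls block, so a goroutine that already holds a read lock and asks for another can wait forever.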
There is also a case where a tunnel that is being tested and is using a relay takes a double read lock in ConnectionManager. This is fixed by turning the test packet into a traffic decision result and handling it outside of the read lock.
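
As a rough illustration of "decide under the lock, act outside it", the sketch below returns a decision value from the locked section and sends the test packet only after the lock is released. The names `decision`, `sendTestPacket`, `connManager`, `makeDecision`, and `check` are assumptions for this example and are not taken from the Nebula source.

```go
package main

import (
	"fmt"
	"sync"
)

// decision is a hypothetical result type computed while the lock is held.
type decision int

const (
	doNothing decision = iota
	sendTestPacket
)

// connManager is a hypothetical stand-in for the connection manager.
type connManager struct {
	hostmap   sync.RWMutex
	needsTest map[string]bool // tunnels that should be probed
}

// makeDecision only reads state under the read lock and returns a value.
// It performs no I/O and calls nothing that could lock the hostmap again.
func (c *connManager) makeDecision(ip string) decision {
	c.hostmap.RLock()
	defer c.hostmap.RUnlock()
	if c.needsTest[ip] {
		return sendTestPacket
	}
	return doNothing
}

// check acts on the decision after makeDecision has released the lock, so
// send is free to take the hostmap lock itself (for example via a relay).
func (c *connManager) check(ip string, send func(ip string)) decision {
	d := c.makeDecision(ip)
	if d == sendTestPacket {
		send(ip)
	}
	return d
}

func main() {
	c := &connManager{needsTest: map[string]bool{"10.0.0.1": true}}
	d := c.check("10.0.0.1", func(ip string) {
		// Taking the lock here is safe because check no longer holds it.
		c.hostmap.RLock()
		defer c.hostmap.RUnlock()
		fmt.Println("sending test packet to", ip)
	})
	fmt.Println("decision was sendTestPacket:", d == sendTestPacket)
}
```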
My primary concern is in handling the QueryServer writes on a nonblocking buffered channel. I think we will want to block when full, but I am leaving it as nonblocking for now for review.
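
To make the trade-off concrete, here is a small sketch of the two send styles under discussion. Both helpers, `trySend` and `sendOrStop`, are hypothetical and only illustrate the channel semantics; the `stop` channel is an added assumption so that a blocking send can still be abandoned at shutdown.

```go
package main

import "fmt"

// trySend is the nonblocking variant: when the buffer is full the request is
// dropped and false is returned, so the hot path never stalls.
func trySend(ch chan<- string, ip string) bool {
	select {
	case ch <- ip:
		return true
	default:
		return false
	}
}

// sendOrStop is the blocking variant: it waits for room in the buffer and
// gives up only when stop is closed. No request is silently dropped, but the
// caller stalls whenever the consumer falls behind.
func sendOrStop(ch chan<- string, ip string, stop <-chan struct{}) bool {
	select {
	case ch <- ip:
		return true
	case <-stop:
		return false
	}
}

func main() {
	ch := make(chan string, 1)
	stop := make(chan struct{})

	fmt.Println("first nonblocking send accepted:", trySend(ch, "10.0.0.1"))
	fmt.Println("second nonblocking send accepted:", trySend(ch, "10.0.0.2"))

	<-ch // the consumer frees one slot
	fmt.Println("blocking send accepted:", sendOrStop(ch, "10.0.0.3", stop))
}
```

With the nonblocking form, a dropped query is visible only through logging, which is why the log level raised in the review matters; with the blocking form, backpressure reaches the caller instead.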