Authority active net to use ArcSwap #2391
Have a look at the initial draft for restartable active services in #2388. I think each service should get a consistent set of Arcs when it starts and use those until the service restarts?
pub struct ActiveAuthority<A> {
    // The local authority state
    pub state: Arc<AuthorityState>,
    // The network interfaces to other authorities
-    pub net: Arc<AuthorityAggregator<A>>,
+    pub net: ArcSwap<AuthorityAggregator<A>>,
I do not think we should do that: it encourages each access to do a load(), potentially leading to different parts of the code using different nets / state / health. Instead, we can have a number of ArcSwaps to the inner Arcs, and upon restart we make an ActiveAuthority with a consistent set of Arcs and use that for the lifetime of the service?
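The snapshot idea in the comment above can be sketched as follows. This is a minimal, std-only illustration and not the PR's code: `Swappable` is a hypothetical stand-in for `ArcSwap` built on `RwLock<Arc<T>>`, and `Net` and `ServiceSnapshot` are hypothetical placeholders for `AuthorityAggregator` and a running service. It only shows that a service holding a plain `Arc` taken at start-up keeps seeing one consistent value while the shared slot is swapped.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for AuthorityAggregator: only the epoch matters here.
#[derive(Debug)]
pub struct Net {
    pub epoch: u64,
}

// Std-only stand-in for ArcSwap<T> (an assumption for this sketch):
// load() hands out the current Arc, store() replaces it for later callers.
pub struct Swappable<T>(RwLock<Arc<T>>);

impl<T> Swappable<T> {
    pub fn new(value: T) -> Self {
        Swappable(RwLock::new(Arc::new(value)))
    }

    pub fn load(&self) -> Arc<T> {
        let guard = self.0.read().unwrap();
        Arc::clone(&*guard)
    }

    pub fn store(&self, value: T) {
        *self.0.write().unwrap() = Arc::new(value);
    }
}

// A service instance holds a plain Arc taken once at start-up,
// so nothing inside the service ever calls load() again.
pub struct ServiceSnapshot {
    pub net: Arc<Net>,
}

impl ServiceSnapshot {
    pub fn start(shared: &Swappable<Net>) -> Self {
        ServiceSnapshot { net: shared.load() }
    }
}

fn main() {
    let shared = Swappable::new(Net { epoch: 0 });
    let service = ServiceSnapshot::start(&shared);

    // A reconfiguration swaps the shared value while the service is running.
    shared.store(Net { epoch: 1 });

    // The running service still sees the value it started with,
    // while a fresh load (for example after a restart) sees the new one.
    println!("service epoch: {}", service.net.epoch);
    println!("fresh load epoch: {}", shared.load().epoch);
}
```

The design choice illustrated here is that the swap point and the read point are separated: only service start-up reads the shared slot, so every task within one service run works against the same committee.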
What we could do is, instead of passing around ActiveAuthority in all the active functions, pass around net. Then we don't have to restart.
Another general question I have is this: why are the active processes unsafe with potential committee changes? Don't they need to deal with potential Byzantine validators anyway?
Passive processes should be able to handle a swap, but active ones are going to need some sort of notification in order to do internal bookkeeping.
So really I think we're going to need a reconfig notification to go out on a broadcast channel alongside these ArcSwaps.
Not to mention that we cannot rely on ArcSwap in the future if we end up having multiple processes.
We simply have to store the LifecycleSignalSender struct and signal a Restart or Exit, and the services will restart (which is when they should reload the config etc.) or exit.
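A rough sketch of the restart-or-exit flow described above, using only the standard library. The `LifecycleSignal` enum, `run_service`, and the `load_epoch` callback are hypothetical names chosen for illustration; the actual `LifecycleSignalSender` from #2388 may look quite different. The point is only that configuration is reloaded at the start of every run and nowhere else.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

// Hypothetical signal type; the real type in #2388 may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LifecycleSignal {
    Restart,
    Exit,
}

// Runs the service loop. Each (re)start reloads configuration through
// `load_epoch`, which receives the number of completed runs so far.
// Returns the epochs observed by the successive runs.
pub fn run_service(
    signals: Receiver<LifecycleSignal>,
    load_epoch: impl Fn(usize) -> u64,
) -> Vec<u64> {
    let mut seen = Vec::new();
    loop {
        // Reload the configuration once, at the start of this run.
        let epoch = load_epoch(seen.len());
        seen.push(epoch);

        // A real service would do its work here; this sketch just blocks
        // until it is told to restart or exit. A closed channel means exit.
        match signals.recv() {
            Ok(LifecycleSignal::Restart) => continue,
            Ok(LifecycleSignal::Exit) | Err(_) => return seen,
        }
    }
}

fn main() {
    let (sender, receiver) = channel();
    let handle = thread::spawn(move || run_service(receiver, |run| run as u64));

    sender.send(LifecycleSignal::Restart).unwrap();
    sender.send(LifecycleSignal::Exit).unwrap();

    let epochs = handle.join().unwrap();
    println!("epochs seen across runs: {:?}", epochs);
}
```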
> Another general question I have is this: why are the active processes unsafe with potential committee changes? Don't they need to deal with potential Byzantine validators anyway?
Yeah, but they assume that they know the correct committee and that 2/3 of it is correct. So if the committee changes halfway through some task, and half the state follows the old committee while the other half follows the new one, unknown things may happen.
That's a reasonable assumption for the checkpoint process, but I assume the gossip process doesn't make that assumption?
Reminder for myself: we should probably also add some safety guards on the epoch number around all validator requests. That is, regardless of how we do this, we should be able to handle the case where you are in epoch X, talking to a committee member who thinks it's epoch X + 1.
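One possible shape for such a guard, as a sketch only: `EpochId`, `EpochCheck`, and `check_epoch` are hypothetical names, not existing types in the codebase. The idea is that every request handler compares its local epoch with the peer's claimed epoch and handles a mismatch explicitly instead of silently proceeding.

```rust
use std::cmp::Ordering;

// Hypothetical alias for this sketch.
pub type EpochId = u64;

// Outcome of comparing the local epoch with the epoch a peer reports.
#[derive(Debug, PartialEq)]
pub enum EpochCheck {
    Same,
    // The peer has already moved on; the local side may need to reconfigure.
    PeerAhead { local: EpochId, peer: EpochId },
    // The peer is lagging; its response should not be trusted for this epoch.
    PeerBehind { local: EpochId, peer: EpochId },
}

pub fn check_epoch(local: EpochId, peer: EpochId) -> EpochCheck {
    match peer.cmp(&local) {
        Ordering::Equal => EpochCheck::Same,
        Ordering::Greater => EpochCheck::PeerAhead { local, peer },
        Ordering::Less => EpochCheck::PeerBehind { local, peer },
    }
}

fn main() {
    // Local node is in epoch 7 and talks to a peer that reports epoch 8.
    match check_epoch(7, 8) {
        EpochCheck::Same => println!("epochs match, proceed"),
        EpochCheck::PeerAhead { local, peer } => {
            println!("peer is in epoch {peer}, we are in {local}: pause and reconfigure")
        }
        EpochCheck::PeerBehind { local, peer } => {
            println!("peer is in epoch {peer}, we are in {local}: ignore or retry later")
        }
    }
}
```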
> I assume the gossip process doesn't have that assumption?
It kind of does: for example, it computes how long to wait for responses before trying to reconnect on the basis of the size and stake distribution of the committee, so as not to spam everyone with connection requests before the network is up.
Bottom line, to summarize my position on the use of ArcSwaps: yes, let's have an ArcSwap wrap the contents of the active authority. BUT we read and clone its contents at the start of the active service, and rely on an active service restart to read a new version. Then we use the cloned Arc consistently for the lifetime of the service.
Yes, I agree.
Ok, cool. Tell me how you want me to handle #2388. One option is to use it to restart on epoch boundaries, but we can also just use it for clean shutdown.
Do we want to use it here though? At epoch boundaries, most of the gossip peers will likely remain the same. It seems inefficient to drain all the existing peers and recreate new ones if most of them will be the same anyway.
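To make the efficiency concern concrete, here is a small sketch of the alternative to a full drain: diff the old and new peer sets and only touch the peers that actually changed. `PeerDiff` and `diff_peers` are hypothetical names, and peers are identified by plain strings purely for illustration; this is not how the gossip process is implemented in the PR.

```rust
use std::collections::BTreeSet;

// Which gossip peers to leave alone, shut down, or newly connect to
// when the committee changes. Names are sorted because BTreeSet is ordered.
#[derive(Debug, PartialEq)]
pub struct PeerDiff {
    pub keep: Vec<String>,
    pub stop: Vec<String>,
    pub start: Vec<String>,
}

pub fn diff_peers(old: &BTreeSet<String>, new: &BTreeSet<String>) -> PeerDiff {
    PeerDiff {
        // Peers in both committees keep their existing connections.
        keep: old.intersection(new).cloned().collect(),
        // Peers that left the committee are drained.
        stop: old.difference(new).cloned().collect(),
        // Peers that joined the committee get fresh connections.
        start: new.difference(old).cloned().collect(),
    }
}

fn main() {
    let old: BTreeSet<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    let new: BTreeSet<String> = ["b", "c", "d"].iter().map(|s| s.to_string()).collect();

    let diff = diff_peers(&old, &new);
    println!("keep: {:?}, stop: {:?}, start: {:?}", diff.keep, diff.stop, diff.start);
}
```

The trade-off raised in the thread still applies: keeping connections across the boundary means those tasks must themselves cope with the committee change, whereas a full restart gives each task a single consistent committee.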
Make the net field of the active authority an ArcSwap. This will make it easy to swap in a new value later.