Authority active net to use ArcSwap #2391

Merged
merged 1 commit into from
Jun 6, 2022

Contributor

@lxfind lxfind commented Jun 2, 2022

Make the net field of authority active an ArcSwap. This will make it easy to swap in a new value later.

@lxfind lxfind requested review from gdanezis, bmwill and asonnino June 2, 2022 14:41
Collaborator

@gdanezis gdanezis left a comment

Have a look at the initial draft for active services that restart in #2388. I think at the start of each service we should get a consistent set of Arcs, and use these until the service restarts?

 pub struct ActiveAuthority<A> {
     // The local authority state
     pub state: Arc<AuthorityState>,
     // The network interfaces to other authorities
-    pub net: Arc<AuthorityAggregator<A>>,
+    pub net: ArcSwap<AuthorityAggregator<A>>,
Collaborator

I do not think we should do that: it encourages a load() on potentially every access, which can lead to different parts of the code using different nets / state / health. Instead, we can keep a number of ArcSwaps to the inner Arcs, and upon restart build an ActiveAuthority from a consistent set of Arcs and use that for the lifetime of the service?
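
A minimal sketch of the snapshot-at-start pattern suggested in this comment, under stated assumptions. Swappable is a std-only stand-in for arc_swap::ArcSwap: the method names from_pointee, load_full, and store follow that crate, but the implementation here is just an RwLock around an Arc. Net and Service are hypothetical placeholders, not the types touched by this PR.

```rust
use std::sync::{Arc, RwLock};

// Std-only stand-in for arc_swap::ArcSwap (an assumption made to keep the
// sketch dependency-free; the real crate is lock-free).
pub struct Swappable<T>(RwLock<Arc<T>>);

impl<T> Swappable<T> {
    pub fn from_pointee(value: T) -> Self {
        Swappable(RwLock::new(Arc::new(value)))
    }

    // Returns a clone of the current Arc (only bumps a reference count).
    pub fn load_full(&self) -> Arc<T> {
        let guard = self.0.read().unwrap();
        Arc::clone(&*guard)
    }

    // Publishes a new value; Arcs handed out earlier are unaffected.
    pub fn store(&self, value: Arc<T>) {
        *self.0.write().unwrap() = value;
    }
}

// Hypothetical placeholder for AuthorityAggregator<A>.
pub struct Net {
    pub epoch: u64,
}

// A service that snapshots the shared value once, at start-up, and then
// uses that snapshot for its whole lifetime.
pub struct Service {
    net: Arc<Net>,
}

impl Service {
    pub fn start(shared: &Swappable<Net>) -> Self {
        Service { net: shared.load_full() }
    }

    pub fn epoch(&self) -> u64 {
        self.net.epoch
    }
}

// Returns the epoch seen by a service started before the swap and by one
// started after it.
pub fn snapshot_demo() -> (u64, u64) {
    let shared = Swappable::from_pointee(Net { epoch: 1 });
    let running = Service::start(&shared);
    shared.store(Arc::new(Net { epoch: 2 }));
    let restarted = Service::start(&shared);
    (running.epoch(), restarted.epoch())
}
```

The running service keeps seeing the value it loaded at start-up, while a service started after the store sees the new one. Calling load_full on every access would instead let two reads within the same task observe different values.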

Contributor Author

@lxfind lxfind Jun 2, 2022

What we could do is, instead of passing around ActiveAuthority in all the active functions, pass around net. Then we don't have to restart.
Another general question I have is this: why are the active processes unsafe with potential committee changes? Don't they need to deal with potential Byzantine validators anyway?

Contributor

Passive processes should be able to handle a swap, but active ones are going to need some sort of notification in order to do internal bookkeeping.

Contributor

So really I think we're going to need to have a reconfig notification go out on a broadcast channel, in addition to these ArcSwaps.

Contributor

Not to mention that we cannot rely on ArcSwap in the future if we end up having multiple processes.

Collaborator

We simply have to store the LifecycleSignalSender struct and signal a Restart or Exit; the services will then restart (which is when they should reload the config etc.) or exit.
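
A rough sketch of the restart-or-exit loop described in this comment, assuming a std mpsc channel in place of whatever LifecycleSignalSender wraps in #2388. The LifecycleSignal enum, run_service, and the load_config callback are all hypothetical names introduced for illustration.

```rust
use std::sync::mpsc::{channel, Receiver};

// Hypothetical signal type; the real one is defined in #2388 and may differ.
pub enum LifecycleSignal {
    Restart,
    Exit,
}

// Runs a service until it is told to exit. `load_config` is called exactly
// once per (re)start, so each run works from one consistent snapshot.
// Returns the snapshots used, in order, so the behaviour can be inspected.
pub fn run_service<F>(signals: Receiver<LifecycleSignal>, mut load_config: F) -> Vec<u64>
where
    F: FnMut() -> u64,
{
    let mut snapshots = Vec::new();
    loop {
        let config = load_config();
        snapshots.push(config);
        // A real service would do its work here, reading only `config`.
        match signals.recv() {
            Ok(LifecycleSignal::Restart) => continue,
            // Exit on an explicit signal or when every sender is dropped.
            Ok(LifecycleSignal::Exit) | Err(_) => return snapshots,
        }
    }
}

// Sends one Restart followed by one Exit and reports which configs were used.
pub fn lifecycle_demo() -> Vec<u64> {
    let (tx, rx) = channel();
    tx.send(LifecycleSignal::Restart).unwrap();
    tx.send(LifecycleSignal::Exit).unwrap();
    let mut epoch = 0u64;
    run_service(rx, move || {
        epoch += 1;
        epoch
    })
}
```

The config is read only at the top of the loop, so a reconfiguration takes effect at a well-defined point (the restart) rather than in the middle of a task.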

Collaborator

> Another general question I have is this: why are the active processes unsafe with potential committee changes? Don't they need to deal with potential Byzantine validators anyway?

Yeah, but they assume that they know the correct committee and that 2/3 of it is correct. So if the committee changes halfway through some task, with half the state following the old committee and the other half the new one, unknown things may happen.
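
To make the concern concrete, here is a small illustrative sketch, not taken from the codebase. It assumes a stake-weighted quorum threshold of floor(2 * total / 3) + 1; the Committee type, its fields, and the member names are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical committee: an epoch number plus stake per member name.
pub struct Committee {
    pub epoch: u64,
    pub stakes: HashMap<String, u64>,
}

impl Committee {
    // Builds a committee in which every member holds one unit of stake.
    pub fn equal_stake(epoch: u64, members: &[&str]) -> Self {
        Committee {
            epoch,
            stakes: members.iter().map(|m| (m.to_string(), 1)).collect(),
        }
    }

    pub fn total_stake(&self) -> u64 {
        self.stakes.values().sum()
    }

    // Assumed threshold: the smallest stake strictly greater than 2/3.
    pub fn quorum_threshold(&self) -> u64 {
        2 * self.total_stake() / 3 + 1
    }

    // Sums the stake of the signers this committee knows about.
    // Assumes `signers` contains no duplicates.
    pub fn has_quorum(&self, signers: &[&str]) -> bool {
        let stake: u64 = signers.iter().filter_map(|s| self.stakes.get(*s)).sum();
        stake >= self.quorum_threshold()
    }
}

// The same set of signatures judged against the old and the new committee.
pub fn mixed_epoch_demo() -> (bool, bool) {
    let old = Committee::equal_stake(1, &["a", "b", "c", "d"]);
    let new = Committee::equal_stake(2, &["a", "b", "e", "f"]);
    let signers = ["a", "b", "c"];
    (old.has_quorum(&signers), new.has_quorum(&signers))
}
```

Signatures collected from a, b, and c form a quorum under the old committee but not under the new one, so a task that gathers votes with one committee and checks them with another can reach the wrong conclusion.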

Contributor Author

That's a reasonable assumption for the checkpoint process, but I assume the gossip process doesn't have that assumption?

Contributor Author

Reminder for myself: we should probably also add some safety guards on the epoch number around all validator requests. That is, regardless of how we do this, we should be able to handle the case where you are in epoch X, talking to a committee member who thinks it's epoch X + 1.
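
One possible shape for such a guard, as a hedged sketch: compare the local epoch with the epoch reported by the peer before acting on a response. EpochCheck and check_epoch are hypothetical names, and what to do in each case (retry, trigger reconfiguration, drop the response) is left open here.

```rust
use std::cmp::Ordering;

// Outcome of comparing our epoch with the epoch a peer reports.
#[derive(Debug, PartialEq)]
pub enum EpochCheck {
    Same,
    // The peer has already moved on, e.g. we are in X and it is in X + 1.
    PeerAhead { local: u64, peer: u64 },
    // The peer has not caught up with us yet.
    PeerBehind { local: u64, peer: u64 },
}

// Classifies a response by epoch so the caller can handle a mismatch
// explicitly instead of mixing state from two epochs.
pub fn check_epoch(local: u64, peer: u64) -> EpochCheck {
    match peer.cmp(&local) {
        Ordering::Equal => EpochCheck::Same,
        Ordering::Greater => EpochCheck::PeerAhead { local, peer },
        Ordering::Less => EpochCheck::PeerBehind { local, peer },
    }
}
```

Returning a distinct variant for each direction lets the caller treat "the peer is ahead" (we may need to reconfigure) differently from "the peer is behind" (the peer may simply be slow).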

Collaborator

> I assume the gossip process doesn't have that assumption?

It kind of does: for example, it computes how long to wait for responses before trying to reconnect based on the size and stake distribution of the committee, so as not to spam everyone with connection requests before the network is up.

@lxfind lxfind marked this pull request as ready for review June 2, 2022 19:35
@lxfind lxfind force-pushed the reconfig-remove-committee-checkpointstore branch from b2dbed1 to 95079a1 Compare June 2, 2022 19:39
@lxfind lxfind force-pushed the epoch-authority-active-net-use-arcswap branch from dd22e4e to f19aa38 Compare June 2, 2022 19:41
@lxfind lxfind force-pushed the reconfig-remove-committee-checkpointstore branch from 95079a1 to 201bc4e Compare June 3, 2022 02:37
Base automatically changed from reconfig-remove-committee-checkpointstore to main June 3, 2022 02:48
@lxfind lxfind force-pushed the epoch-authority-active-net-use-arcswap branch 2 times, most recently from f64ffbe to a831624 Compare June 3, 2022 02:50
Collaborator

gdanezis commented Jun 3, 2022

Bottom line to summarize my position on the use of ArcSwaps:

Yes, let's have an ArcSwap wrap the contents of the active authority. BUT we read + clone this into a local Arc at the start of the active service, and rely on active service restart to read a new version. Then we use the cloned Arc consistently for the lifetime of the service.

Contributor Author

lxfind commented Jun 3, 2022

> Bottom line to summarize my position on the use of ArcSwaps:
>
> Yes, let's have an ArcSwap wrap the contents of the active authority. BUT we read + clone this into a local Arc at the start of the active service, and rely on active service restart to read a new version. Then we use the cloned Arc consistently for the lifetime of the service.

Yes I agree.

Collaborator

@gdanezis gdanezis left a comment

Ok, cool -- tell me how you want me to handle #2388. One option is to use it for restart on epoch boundary. But we can also just use it for clean shutdown.

Contributor Author

lxfind commented Jun 6, 2022

> Ok, cool -- tell me how you want me to handle #2388. One option is to use it for restart on epoch boundary. But we can also just use it for clean shutdown.

Do we want to use it here though? At epoch boundaries, most of the gossip peers will likely remain the same. It seems inefficient to drain all the existing peers and recreate them if most of them will be the same anyway.
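
The alternative hinted at here could look roughly like the following: compute the difference between the old and new peer sets and touch only the peers that changed. This is a sketch under assumptions, not what the PR implements. PeerDiff, diff_peers, and peer_set are hypothetical names, and peers are identified by plain strings for simplicity.

```rust
use std::collections::BTreeSet;

// Which connections to keep, open, and close at an epoch boundary.
pub struct PeerDiff {
    pub keep: Vec<String>,
    pub connect: Vec<String>,
    pub disconnect: Vec<String>,
}

// Convenience constructor for a sorted set of peer names.
pub fn peer_set(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|n| n.to_string()).collect()
}

// Compares the old and new committees' peers. BTreeSet keeps the output
// in sorted order, which makes the result deterministic.
pub fn diff_peers(old: &BTreeSet<String>, new: &BTreeSet<String>) -> PeerDiff {
    PeerDiff {
        keep: old.intersection(new).cloned().collect(),
        connect: new.difference(old).cloned().collect(),
        disconnect: old.difference(new).cloned().collect(),
    }
}
```

With old peers a, b, c and new peers b, c, d, only a is disconnected and only d is connected; b and c keep their existing connections. Whether the kept connections can safely carry state across the epoch change is the open question raised earlier in the thread.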

@lxfind lxfind force-pushed the epoch-authority-active-net-use-arcswap branch from a831624 to ffc4813 Compare June 6, 2022 16:15
@lxfind lxfind enabled auto-merge (squash) June 6, 2022 16:21
@lxfind lxfind disabled auto-merge June 6, 2022 16:21
@lxfind lxfind merged commit 192cbc7 into main Jun 6, 2022
@lxfind lxfind deleted the epoch-authority-active-net-use-arcswap branch June 6, 2022 16:44
stella3d pushed a commit that referenced this pull request Jun 7, 2022
3 participants