Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consensus Adapter #1449

Merged
merged 9 commits into from
Apr 20, 2022
Merged

Consensus Adapter #1449

merged 9 commits into from
Apr 20, 2022

Conversation

asonnino
Copy link
Contributor

@asonnino asonnino commented Apr 19, 2022

What this PR does

This PR introduces the Consensus Adapter module who bridges the Sui with consensus. It works in the following way:

  • Client submit a certificate referencing shared objects to Sui
  • The consensus adapter checks the certificate and forwards it to the consensus node
  • The consensus node sequences the certificate, the AuthorityState assigns locks to the shared objects (not in this PR)
  • The consensus adapter notifies the Authority Server
  • The authority Server attemps to execute the certificate (if all locks are assigned correctly) and replies to the client

What this PR does not do

  • The top level SuiCommand module needs to spawn the consensus node (blocked until Narwhal is open sourced)
  • The Authority state needs to implement the trait ExecutionState and run a Narwhal client (blocked until Narhwal is open sourced)

@asonnino asonnino changed the title Consensus-adapter Consensus Adapter Apr 19, 2022
@asonnino asonnino added Priority: High Very important task, not blocking but potentially delaying milestones or limiting our offering Status: Needs Review PR is ready for review sui-node Type: Major Feature Major new functionality or integration, which has a significant impact on the network labels Apr 19, 2022
@asonnino asonnino self-assigned this Apr 19, 2022
@asonnino asonnino added this to the DevNet milestone Apr 19, 2022
@gdanezis
Copy link
Collaborator

Make me a favour on the checkpointing side, which is also going to use the the consensus:

  • Do add a description to the PR otherwise no one knows that they are reviewing.
  • Define an enum for the messages that are sequenced. Right now there will be one, containing a certificate, but down the line there will be info for checkpointing too.

@asonnino asonnino requested a review from gdanezis April 19, 2022 17:07
@asonnino
Copy link
Contributor Author

asonnino commented Apr 19, 2022

Make me a favour on the checkpointing side, which is also going to use the the consensus:

  • Do add a description to the PR otherwise no one knows that they are reviewing.

You have been too fast :) I was writing the description of the PR (but I am a slow writer)

  • Define an enum for the messages that are sequenced. Right now there will be one, containing a certificate, but down the line there will be info for checkpointing too.

Good idea fixing this

@codecov
Copy link

codecov bot commented Apr 19, 2022

Codecov Report

Merging #1449 (923995b) into main (c1e46f0) will increase coverage by 0.00%.
The diff coverage is 87.09%.

❗ Current head 923995b differs from pull request most recent head 195b4af. Consider uploading reports for the commit 195b4af to get more accurate results

@@           Coverage Diff            @@
##             main    #1449    +/-   ##
========================================
  Coverage   82.19%   82.19%            
========================================
  Files         104      107     +3     
  Lines       21145    21385   +240     
========================================
+ Hits        17380    17578   +198     
- Misses       3765     3807    +42     
Impacted Files Coverage Δ
sui/src/benchmark/load_generator.rs 0.00% <0.00%> (ø)
sui_core/src/lib.rs 100.00% <ø> (ø)
sui_types/src/error.rs 50.00% <ø> (ø)
test_utils/src/lib.rs 38.46% <33.33%> (-61.54%) ⬇️
sui_core/src/authority_server.rs 82.23% <47.36%> (-2.98%) ⬇️
test_utils/src/network.rs 91.66% <91.66%> (ø)
sui_core/src/consensus_adapter.rs 93.15% <93.15%> (ø)
sui/src/sui_commands.rs 77.07% <94.73%> (+1.15%) ⬆️
sui_core/src/unit_tests/consensus_adapter_tests.rs 98.50% <98.50%> (ø)
sui/src/config.rs 83.54% <100.00%> (+0.99%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4fa31b4...195b4af. Read the comment docs.

@asonnino
Copy link
Contributor Author

asonnino commented Apr 19, 2022

Make me a favour on the checkpointing side, which is also going to use the the consensus:

  • Define an enum for the messages that are sequenced. Right now there will be one, containing a certificate, but down the line there will be info for checkpointing too.

Fixed

@asonnino asonnino linked an issue Apr 19, 2022 that may be closed by this pull request
@asonnino asonnino requested a review from huitseeker April 20, 2022 13:52
@todd-mystenlabs
Copy link
Contributor

@lxfind - I want to get this PR landed. Who is the best to review it?

@asonnino asonnino requested a review from lxfind April 20, 2022 18:22
@lxfind lxfind requested a review from bmwill April 20, 2022 18:22
@lxfind
Copy link
Contributor

lxfind commented Apr 20, 2022

@lxfind - I want to get this PR landed. Who is the best to review it?

I recommend @bmwill to take a look given this PR is primarily about network setup.

@bmwill
Copy link
Contributor

bmwill commented Apr 20, 2022

Do we know what we want the interface between consensus and sui to be? As in we expect this to simply be a point to point link between narwhal and sui? Maybe we can reuse some of the grpc work that's being worked on for this interface as well?

@asonnino
Copy link
Contributor Author

asonnino commented Apr 20, 2022

Do we know what we want the interface between consensus and sui to be? As in we expect this to simply be a point to point link between narwhal and sui? Maybe we can reuse some of the grpc work that's being worked on for this interface as well?

Yes eventually we may use the grpc interface. The challenge is that the Narwhal interface also needs to work for Celo and Somelier (not only Sui), so I am not sure this can be done before dev net

The current interface is simple: Sui submits serialized certificates to Narwhal (who see them as bytes). Consensus sequences those certificates and assigns a lock to each shared object. Sui (or the client) then pick up the certificate and execute it (on the Sui side).

Copy link
Contributor

@bmwill bmwill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given our timeline we can get this in once the tests are passing (they seemed to timeout?). I think we'll want to spend a bit more time looking at our interfaces but we can do that at a future time.

@asonnino asonnino enabled auto-merge (squash) April 20, 2022 18:53
@asonnino
Copy link
Contributor Author

Given our timeline we can get this in once the tests are passing (they seemed to timeout?). I think we'll want to spend a bit more time looking at our interfaces but we can do that at a future time.

Tests work fine on my local machine, debugging the CI now

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is going to run out of fds (not to mention ports) fast.

Comment on lines +95 to +104
let consensus_address = if let Some(val) = json.get("consensus_address") {
SocketAddr::deserialize(val).map_err(serde::de::Error::custom)?
} else {
let port = PORT_ALLOCATOR
.lock()
.map_err(serde::de::Error::custom)?
.next_port()
.ok_or_else(|| serde::de::Error::custom("No available port."))?;
format!("127.0.0.1:{port}").parse().unwrap()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up in regard to #1469 : what we're really managing here is a bind address (for lack of a way to determine the public address), and we should probably use variable names that reflect this. Dropping a log line here or in make_authority would also help.

cc. @bmwill for an ubiquitous pain point of the current implementation: we bind where we (think we) are reachable, and in absence of that public address, bind to localhost 😅


Some(output) = self.rx_consensus_output.recv() => {
// Notify the caller that the transaction has been sequenced.
let (outcome, transaction) = output;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You could bind this pattern two lines earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #1467

Comment on lines +89 to +90
if let Some(repliers) = self.pending.get_mut(&digest) {
if let Some(replier) = repliers.pop_front() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Some(repliers) = self.pending.get_mut(&digest) {
if let Some(replier) = repliers.pop_front() {
if let Some(replier) = self.pending.get_mut(&digest).and_then(|r| r.pop_front()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #1467

// Send certificate to consensus
let serialized = serialize_consensus_transaction(certificate);
let bytes = Bytes::from(serialized.clone());
// TODO [issue #1452]: We are re-creating a connection every time. This is wasteful but does not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this (#1452) is not so much that you're wasteful, it's that you're going to run against an fd limit pretty fast.
An immediate workaround is the tricks akin to 43e9190 but a better fix is just to have an ad-hoc connection pool using a library like bb8, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple other ways to approach this:

  1. Use a real RPC like framework that just handles these things for us (e.g. gRPC like we're doing for our other narwhal integrations)
  2. store an Option<TcpStream> and if its none do the reconnection otherwise use the existing connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially went for option 2 (store an Option<TcpStream>) but this function cannot take a mutable reference to self (for reasons linked to authority state). but what francois mentions is a problem I will try to fix it in a separate PR

@asonnino asonnino merged commit cf53ad5 into main Apr 20, 2022
@asonnino asonnino deleted the consensus-adapter branch April 20, 2022 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: High Very important task, not blocking but potentially delaying milestones or limiting our offering Status: Needs Review PR is ready for review sui-node Type: Major Feature Major new functionality or integration, which has a significant impact on the network
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spawn consensus adapter
6 participants