Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[narwhal] Add CertificateV2 #13777

Merged
merged 4 commits into from
Oct 10, 2023
Merged

[narwhal] Add CertificateV2 #13777

merged 4 commits into from
Oct 10, 2023

Conversation

arun-koshy
Copy link
Contributor

Added new version for NW Certificate which Includes new AggregateSignatureState. This holds AggregateSignatureBytes but with the added layer to specify if the signature was verified via a leader, verified directly, unverified or unsigned. This will be used to take advantage of the certificate chain that is formed via the DAG by only verifying the leaders of the certificate chain when they are fetched from validators during catchup.

This is gated by a protocol feature flag so this PR should have no effect yet. Still testing the followup PR which will include the usage of CertificateV2, but sharing here to provide more context for these changes.

@vercel
Copy link

vercel bot commented Sep 14, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
mysten-ui ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 10, 2023 6:45pm
sui-typescript-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 10, 2023 6:45pm
3 Ignored Deployments
Name Status Preview Comments Updated (UTC)
explorer ⬜️ Ignored (Inspect) Visit Preview Oct 10, 2023 6:45pm
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Oct 10, 2023 6:45pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Oct 10, 2023 6:45pm

}

/// Verifies the validity of the certificate.
/// TODO: Output a different type, similar to Sui VerifiedCertificate.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwtian was thinking about removing this TODO as will have this distinction with AggregateSignatureState. Lmk what you think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this TODO. I was hoping using an enum to represent signature and verification state would make migration easier. But there seem to be other issues. In particular, I think we still should not make verify() take a &mut self. How about:

  • From verify() and also sanitize(), return a Certificate where the verification state is VerifiedDirectly for CertificateV2
  • Add a verify_indirectly() function, that takes in a child certificate, checks the child is in a verified state and this certificate is one of its parent
    ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, verify() and sanitize() will take self by value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code so that the cert will not be modified in place and create a new cert. What is the reason why this is better than modifying the certificate in place? Is it to protect against data races due to concurrent access of the referenced certs?

I will leave out adding verify_indirectly() for now, I verify certs indirectly in the follow up PR where this method does not seem needed. But after you review that PR we can revisit if this is necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it is very hard to migrate to returning a different Certificate type from sanitize() and verify(), but we can use a similar semantics for these functions. Usually functions named sanitize() and verify() should not mutate the underlying data.

Comment on lines 1235 to 1253
VerifiedViaLeader(AggregateSignatureBytes),
VerifiedDirectly(AggregateSignatureBytes),
Unverified(AggregateSignatureBytes),
Unsigned(AggregateSignatureBytes),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add some comments to explain what every state means here

Copy link
Contributor

@akichidis akichidis Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I see the AggregateSignatureState actually representing the certificate's verification state, rather than the aggregate signature it self. So maybe (just a proposal) something like VerificationState / CertificateVerificationState would be more appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided on AggregateSignatureVerificationState, let me know what you think

@akichidis
Copy link
Contributor

akichidis commented Sep 18, 2023

From a first pass this looks great @arun-koshy (referring to the follow up PR as well) ! Do you have any metrics that show how much we verify/save compared to before from your tests? Just to confirm the expectation /theory here

narwhal/primary/src/synchronizer.rs Outdated Show resolved Hide resolved
narwhal/types/benches/verify_certificate.rs Outdated Show resolved Hide resolved
}

/// This function requires that certificate was verified against given committee
pub fn signed_authorities(&self, committee: &Committee) -> Vec<PublicKey> {
match self {
Certificate::V1(certificate) => certificate.signed_authorities(committee),
Certificate::V2(certificate) => certificate.signed_authorities(committee),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think method implementation like

match self {
    Certificate::V1(certificate) => certificate.signed_authorities(committee),
    Certificate::V2(certificate) => certificate.signed_authorities(committee),
}

Can be avoided by having signed_authorities() in CertificateAPI, and implementing the method separately for both V1 and V2. Is this feasible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding your suggestion, but I do have signed_authorities() in CertificateAPI. I need the match arm within the methods of Certificate to dispatch calls to the correct variants methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of signed_authorities() look different then the signature here, so I'm not sure how it is working right now. But I believe you don't need to implement signed_authorities() in impl Certificate with a match statement, as long as the method is implement properly for V1 and V2. The match statement will be generated by the #[enum_dispatch(CertificateAPI)]. We can follow up in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the confusion now, there are two methods with the same name but slightly different signatures. One is just a getter on the signed_authorities bitmap field this is part that the enum_dispatch will take care of. The other is the method that is part of the Certificate impl which is going from the bitmap to the public keys using the committee that is passed in.

narwhal/types/src/primary.rs Outdated Show resolved Hide resolved
narwhal/types/src/primary.rs Show resolved Hide resolved
narwhal/types/src/primary.rs Outdated Show resolved Hide resolved
}

/// Verifies the validity of the certificate.
/// TODO: Output a different type, similar to Sui VerifiedCertificate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this TODO. I was hoping using an enum to represent signature and verification state would make migration easier. But there seem to be other issues. In particular, I think we still should not make verify() take a &mut self. How about:

  • From verify() and also sanitize(), return a Certificate where the verification state is VerifiedDirectly for CertificateV2
  • Add a verify_indirectly() function, that takes in a child certificate, checks the child is in a verified state and this certificate is one of its parent
    ?

async fn process_certificate_internal(
&self,
certificate: Certificate,
mut certificate: Certificate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mut necessary here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just rebind the value in certificate if sanitize is true but it seemed clearer to indicate to the caller that this method may modify the passed in certificate and the comment above the method explains in which case this will happen.

@@ -745,12 +752,13 @@ impl Synchronizer {
let (sender, receiver) = oneshot::channel();
self.inner
.tx_certificate_acceptor
.send((certificate, sender, early_suspend))
.send((certificate.clone(), sender, early_suspend))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be useful to assert that the certificate is in a verified state at the receiving end of rx_certificate_acceptor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good suggestion, I will make sure to do this on the followup PR where we use CertificateV2

}

/// This function requires that certificate was verified against given committee
pub fn signed_authorities(&self, committee: &Committee) -> Vec<PublicKey> {
match self {
Certificate::V1(certificate) => certificate.signed_authorities(committee),
Certificate::V2(certificate) => certificate.signed_authorities(committee),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of signed_authorities() look different then the signature here, so I'm not sure how it is working right now. But I believe you don't need to implement signed_authorities() in impl Certificate with a match statement, as long as the method is implement properly for V1 and V2. The match statement will be generated by the #[enum_dispatch(CertificateAPI)]. We can follow up in another PR.

}

/// Verifies the validity of the certificate.
/// TODO: Output a different type, similar to Sui VerifiedCertificate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it is very hard to migrate to returning a different Certificate type from sanitize() and verify(), but we can use a similar semantics for these functions. Usually functions named sanitize() and verify() should not mutate the underlying data.

@vercel vercel bot temporarily deployed to Preview – mysten-ui October 10, 2023 18:45 Inactive
@arun-koshy arun-koshy enabled auto-merge (squash) October 10, 2023 18:59
@arun-koshy arun-koshy disabled auto-merge October 10, 2023 18:59
@arun-koshy arun-koshy merged commit 6ab618a into main Oct 10, 2023
@arun-koshy arun-koshy deleted the ak/certificate_v2 branch October 10, 2023 20:08
arun-koshy added a commit that referenced this pull request Oct 12, 2023
## Description 

[PR#13777](#13777) introduces new
`SignatureVerificationState` that can now be used in certificate
fetching to only verify the tip of certificate chains and ensure that
the verification state is reflected in storage. Doing so should save us
time in signature verification for a node that is trying to catchup.

## Test Plan 

Added unit tests. Catchup tests in private-testnet.

## Results 

#### NW Catchup Rate @ [200 TPS w/ 2 hours of
downtime](https://metrics.sui.io/d/ORCQSHfVk/nw-catchup-dashboard?var-Environment=8Xt1pVoVk&var-network=private-testnet&var-validator=ams-ptn-val-00&var-validator=ams-ptn-val-09&orgId=1&from=1696385905867&to=1696388154869)
![Screenshot 2023-10-10 at 4 25 53
PM](https://github.com/MystenLabs/sui/assets/97870774/ae71c43d-ec24-4d56-85f5-0703c9794ef8)


#### NW Catchup Rate @ [5K TPS w/ 1 hour of
downtime](https://metrics.sui.io/d/ORCQSHfVk/nw-catchup-dashboard?var-Environment=8Xt1pVoVk&var-network=private-testnet&var-validator=ams-ptn-val-00&var-validator=ams-ptn-val-02&var-validator=ams-ptn-val-09&var-validator=del-ptn-val-08&var-validator=sjc-ptn2-val-00&var-validator=ams-ptn-val-03&orgId=1&from=1696643114770&to=1696651260983)
![Screenshot 2023-10-10 at 4 26 24
PM](https://github.com/MystenLabs/sui/assets/97870774/1e5531eb-9152-4334-a715-a1d6136dd804)

## Known Issues to be investigated/fixed in follow up PRs

#### Narwhal catchup only hits full potential after state sync completes
![Screenshot 2023-10-10 at 4 31 21
PM](https://github.com/MystenLabs/sui/assets/97870774/23f25ffe-0998-40c1-af14-ab6e864b8b8d)

#### Execution bottleneck at high TPS preventing higher NW catchup rate
![Screenshot 2023-10-10 at 4 33 39
PM](https://github.com/MystenLabs/sui/assets/97870774/aa222598-4570-42a0-85ad-c2f46c77aba2)


---
If your changes are not user-facing and not a breaking change, you can
skip the following section. Otherwise, please indicate what changed, and
then add to the Release Notes section as highlighted during the release
process.

### Type of Change (Check all that apply)

- [X] protocol change
- [ ] user-visible impact
- [ ] breaking change for a client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade
binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

### Release notes

Protocol upgrade to version 28 which will enable the use of
CertificateV2 in narwhal that will be used to speed up processing of
certificates during certificate fetching/catchup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants