Skip to content

Comments

Add QBFT manager#77

Merged
jking-aus merged 13 commits intosigp:unstablefrom
dknopik:split-qbft
Jan 14, 2025
Merged

Add QBFT manager#77
jking-aus merged 13 commits intosigp:unstablefrom
dknopik:split-qbft

Conversation

@dknopik
Copy link
Member

@dknopik dknopik commented Dec 20, 2024

Refactor the current qbft crate to be sync and general purpose and create qbft_manager which integrates into the client architecture such as the processor. This module is going to be used by the validator store (to trigger QBFT instances) and the network (to pass on network messages related to QBFT).

There is still quite a bit to do to clean this up, but this initial version should be able to get us going.

Depends on #76.

@dknopik dknopik changed the title [WIP] Split qbft Add QBFT manager Jan 3, 2025
@dknopik dknopik marked this pull request as ready for review January 3, 2025 09:16
@dknopik dknopik requested a review from jking-aus January 3, 2025 09:16
@dknopik dknopik mentioned this pull request Jan 3, 2025
@dknopik dknopik added the ready-for-review This PR is ready to be reviewed label Jan 7, 2025
Copy link
Member

@jking-aus jking-aus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent work mate -- tonnes of great additions, in particular I like the idea of splitting out the instance manager from the consensus mechanism. Probably two or three PRs of work here tbh

*operator_id
== *committee
.get(((round.get() - Round::default().get()) + *instance_height) % committee.len())
.expect("slice bounds kept by modulo length")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this panic if committee.len = 0? assume there is a sanity check somewhere to prevent this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - yes, modulo/remainder panics if the len is 0. And no, I don't think this is currently checked 😅 - I will extend the validation done by the config builder.

@dknopik dknopik added ready-for-review This PR is ready to be reviewed and removed ready-for-review This PR is ready to be reviewed labels Jan 8, 2025
@dknopik dknopik mentioned this pull request Jan 8, 2025
// the fields better

#[derive(Clone, Debug)]
pub struct SignedSsvMessage<E: EthSpec> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was just a rough draft for my understanding as the comment says - we should probably switch to yours :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are better things in your code, like using an Enum for the raw data. We could keep both for now, but think about a way of merging them and where it should be in the code. Should we use the same type for the data parsed from the network and the one used in application logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use them in application code as well. In some cases, we rely on them being the same as networking messages - for example, PREPARE and COMMIT messages refer to the full data by hash, so we need the messages to be identical to the format used by go-ssv so that the hashes agree.

I propose we both merge our versions and then align them in an own PR. Intuitively, I would put them in ssv_types, but we can figure that out then :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the full_data is passed as raw bytes to the application and the decoding is done there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the network stack should verify that full_data is actually a valid object, so that illegal messages with junk data are not further gossiped. But I am not sure if that is actually done by go-ssv...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, I think we will (unfortunately) need some kind of type separation - we can only decode as soon as we know the data version (i.e. fork) that is specified in the message itself

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to create a valid object with junk data anyway. I believe the more comprehensive validation in Gossipsub is done by the application.

Copy link
Member

@diegomrsantos diegomrsantos Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub pr: usize,
pub committee_size: usize,
pub committee_members: HashSet<OperatorId>,
pub committee_members: IndexSet<OperatorId>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should those fields be private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, IMO users should be able to use the Config directly instead of using the builder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but then we should probably move the validation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, IMO users should be able to use the Config directly instead of using the builder

In this way, I don't see the point of the builder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Builder is for ease of use when default values are desired. As the defaults are dependent on other values, a Default impl does not suffice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this builder pattern I mentioned is to facilitate the creation of an immutable config instance making it clearer what fields are required and optional, as well as centralizing the validation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fair point 👍

/// The quorum size required for the committee to reach consensus
pub fn quorum_size(&self) -> usize {
self.quorum_size
pub fn commmittee_members(&self) -> &IndexSet<OperatorId> {
Copy link
Member

@diegomrsantos diegomrsantos Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add this to the beginning of this impl

    /// Private constructor so it can only be built by our `ConfigBuilder`.
    fn from_builder(builder: &ConfigBuilder<F>) -> Self {
        Self {
            operator_id: builder.operator_id,
            instance_height: builder.instance_height,
            committee_members: builder.committee_members.clone(),
            round: builder.round,
            round_time: builder.round_time,
            max_rounds: builder.max_rounds,
            quorum_size: builder.quorum_size,
            leader_fn: builder.leader_fn.clone(),
        }
    }

};

(in_sender, out_receiver, instance)
qbft.start_round();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the constructor is doing more than just initializing data structures - start_round() is changing state and sending messages. It might be cleaner and easier to work with if construction and execution were kept separate. This way, the caller can control when the QBFT process starts, and it could also make testing simpler. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not calling start_round leaves the instance in an invalid state, inviting misuse. I'd prefer to leave it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the constructor create an instance in an invalid state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry, got confused, you are right! should be fine without, even if messages come in without calling it. so yeah, we can change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid comment but let's merge this in first

Copy link
Member

@diegomrsantos diegomrsantos Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth creating an issue to address that?

type RoundChangeMap<D> = HashMap<OperatorId, Option<ConsensusData<ValidatedData<D>>>>;
type RoundChangeMap<D> = HashMap<OperatorId, Option<ConsensusData<D>>>;

pub trait Data: Debug + Clone {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub trait Hashable: Debug + Clone {
    type Digest: Debug + Clone + Eq + Hash;
    
    fn digest(&self) -> Self::Digest;
}

What do you think about it? The motivation is to make the trait name more descriptive and avoid using Hash twice - it seems like a recursive definition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named it Data intentionally, as it does more than hashing through the bounds on Debug and Clone.

I don't have a strong opinion on Digest vs Hash, so feel free to change it.

dknopik and others added 3 commits January 13, 2025 12:18
Co-authored-by: diegomrsantos <diegomrsantos@gmail.com>
Co-authored-by: diegomrsantos <diegomrsantos@gmail.com>
Comment on lines +156 to +187
pub fn operator_id(&self) -> OperatorId {
self.operator_id
}

pub fn committee_members(&self) -> &IndexSet<OperatorId> {
&self.committee_members
}

pub fn instance_height(&self) -> &InstanceHeight {
&self.instance_height
}
}

impl<F: LeaderFunction + Clone> ConfigBuilder<F> {
pub fn operator_id(&mut self, operator_id: OperatorId) -> &mut Self {
self.config.operator_id = operator_id;
pub fn round(&self) -> Round {
self.round
}

pub fn round_time(&self) -> Duration {
self.round_time
}

pub fn quorum_size(&self) -> usize {
self.quorum_size
}

pub fn max_rounds(&self) -> usize {
self.max_rounds
}

pub fn leader_fn(&self) -> &F {
&self.leader_fn
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need those?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, at least the test need to access the builder data (now that it is impossible to update the Config, see c742d1d)

I am unsure if we now ended up with the most ergonomic/clean setup, but I think it's fine for now. We can revisit it later.

Copy link
Member

@jking-aus jking-aus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work daniel - lots of good discussion as well

@jking-aus jking-aus merged commit 9389ccd into sigp:unstable Jan 14, 2025
9 checks passed
};
let operator_id = self.config.operator_id();
(self.send_message)(Message::Propose(operator_id, consensus_data.clone()));
self.received_propose(operator_id, consensus_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we calling the corresponding receive after each send?

@Zacholme7 Zacholme7 mentioned this pull request Jan 14, 2025
@dknopik dknopik deleted the split-qbft branch February 5, 2025 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-review This PR is ready to be reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants