Discussion: Onboarding of new validateers #273

clangenb · 2021-06-22T13:26:59Z

With the addition of the direct invocation API, a worker's state can't be constructed from on-chain data only. Hence, when a new worker onboards, it needs to fetch data from other workers. Two obvious solutions exist.

1. A worker fetches the current state, which consists of both the chain_relay_db and the STF's state, from another worker. This would be the simple onboarding, which leverages the trusted nature of TEEs.
2. A worker syncs layer 1 blocks in the chain relay and, whenever it observes a sidechain block confirmation extrinsic, fetches the corresponding sidechain block from another worker. (A worker could also fetch sidechain blocks in batches beforehand.)
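
To make the two options concrete, here is a toy sketch in Python, not the worker's actual code. It assumes the STF state can be modelled as a plain dict and a sidechain block as a numbered set of key/value updates; the chain_relay_db is left out. `Worker`, `export_snapshot`, `import_block` and the two `onboard_*` functions are hypothetical names. The only point is that both paths end in the same state.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Toy worker: an STF state plus the number of the last imported sidechain block."""
    state: dict = field(default_factory=dict)
    last_block: int = 0

    def import_block(self, number, updates):
        # Blocks must be applied in order and without gaps.
        if number != self.last_block + 1:
            raise ValueError(f"expected block {self.last_block + 1}, got {number}")
        self.state.update(updates)
        self.last_block = number

    def export_snapshot(self):
        # Option 1: hand out a copy of the full current state.
        return self.last_block, copy.deepcopy(self.state)


def onboard_from_snapshot(peer):
    """Option 1: trust the peer and take over its current state."""
    last_block, state = peer.export_snapshot()
    return Worker(state=state, last_block=last_block)


def onboard_from_blocks(blocks):
    """Option 2: replay every sidechain block, starting from an empty state."""
    worker = Worker()
    for number, updates in blocks:
        worker.import_block(number, updates)
    return worker
```

Option 1 costs one transfer regardless of history length, while option 2 grows with the number of blocks, which is the trade-off discussed in the replies.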

Eventually, we should implement both ways:

1. Because it is simple and essentially provides all the trust regardless.
2. Because we always need to be able to reconstruct the history (even if it is only for auditability).

brenzi · 2021-06-23T09:39:07Z

@haerdib has already implemented state snapshotting AFAIR. This way new workers have to request the latest state from an active validator (I suggest not to use the term worker for sidechain validators. Maybe validaTEErs? 8-P )

I believe it should be your (1) only, not (2). Scanning the chain takes very long, and it isn't necessary in our threat model.

clangenb · 2021-06-23T09:46:24Z

Yes, I agree. Came to the same conclusion yesterday after some thinking.

brenzi · 2021-09-07T14:13:20Z

connected to #345

brenzi · 2021-09-21T08:32:39Z

@mullefel can you please review this task and think about edge cases to tackle?

clangenb · 2021-09-23T09:14:39Z

Here we briefly discussed two different syncs depending on state recency:

Storing and pruning sidechain blocks in worker #214

murerfel · 2021-09-23T10:22:14Z

Please let me know if I got something wrong, or missed an important part. Much appreciated 🙏

Feature Requirements

Based on the proposed solution (1), I identified some feature requirements:

State syncing using a similar channel, or even the same, as sharing of state encryption key (i.e. RPC)
As a newly spawned validateer:
- determine whether there is a discrepancy in state and an updated state needs to be fetched. This is done using the sidechain (?)
- determine from which 'mature' validateer we fetch the state. If validateers registered in the pallet teerex are guaranteed to be 'mature', i.e. up-to-date, we can choose randomly from those?
- As discovered in the edge cases, we might need multiple ways of syncing, to mitigate some of the latency issues. So it might make sense to be able to:
  1. sync state directly, fetching a full state
  2. sync state incrementally from sidechain blocks
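
A minimal sketch of how a validateer might decide between the two sync modes. It assumes that a peer can report its latest sidechain block number and the oldest block it still stores; neither is confirmed by this thread. `SyncMode` and `choose_sync_mode` are hypothetical names.

```python
from enum import Enum


class SyncMode(Enum):
    NONE = "none"                # already up to date
    INCREMENTAL = "incremental"  # import the missing sidechain blocks
    FULL = "full"                # fetch a complete state snapshot


def choose_sync_mode(local_block, peer_block, oldest_stored_block):
    """Pick a sync mode from three block numbers.

    local_block: last sidechain block applied locally (0 for a fresh validateer)
    peer_block: latest sidechain block known to the peer
    oldest_stored_block: oldest sidechain block the peer can still serve
    """
    if local_block >= peer_block:
        return SyncMode.NONE
    # Incremental sync needs every block from local_block + 1 onwards.
    if local_block + 1 >= oldest_stored_block:
        return SyncMode.INCREMENTAL
    return SyncMode.FULL
```

For example, `choose_sync_mode(90, 100, 50)` selects the incremental path, while a fresh validateer at block 0 facing the same peer has to fetch a full state because blocks 1 to 49 are no longer stored.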

Edge Cases

Scenario multiple new validateers spawned at once

setup: We have 1 up-to-date validateer and we launch multiple new validateers at once.
We need to make sure that a) these new validateers sync from the correct up-to-date validateer (and not from one of the other newly spawned ones) and b) multiple concurrent requests for state updates are processed consistently and don't deadlock.
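
One way to approach part b), sketched in Python rather than in the worker's own code: the serving validateer guards its state with a single lock and only holds it while copying. With one lock there is no lock ordering that could deadlock, and every snapshot pairs a block number with the state at exactly that block. `StateProvider` and its methods are hypothetical names.

```python
import copy
import threading


class StateProvider:
    """Toy up-to-date validateer that serves snapshots to many requesters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._block = 0
        self._state = {}

    def apply_block(self, updates):
        # State and block number change together, under the lock.
        with self._lock:
            self._state.update(updates)
            self._block += 1

    def snapshot(self):
        # Hold the lock only while copying, so the block number and the
        # state always belong together. Sending the copy to the requester
        # happens outside the lock and cannot stall other requests.
        with self._lock:
            return self._block, copy.deepcopy(self._state)
```

The cost of this choice is one full copy per request; for a large state a copy-on-write or file-based snapshot would likely be preferable.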

Scenario validateer is offline for a short period

starting point: Two validateers V1 and V2 are running and are up-to-date. Validateer V2 goes offline for a short while and comes back online, during which the state in V1 changes. At the same time as V2 comes online again, a new validateer V3 is spawned that needs to sync its state.
Ensure that the newly spawned validateer V3 syncs the correct state. Either:
- Guarantee that V3 syncs from the unimpaired V1, because we have some way of knowing that only V1, but not V2, has the latest state.
- Multi-stage sync, where we sync from V2, then notice we are missing some state, and sync again, possibly from V1.
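
A minimal sketch of the first option, assuming each validateer can report its latest sidechain block number and whether it is itself still onboarding. The thread does not say how this information would be obtained, so `PeerInfo` and `pick_sync_source` are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfo:
    name: str
    latest_block: int  # latest sidechain block the peer reports having applied
    synced: bool       # False while the peer is itself still onboarding


def pick_sync_source(peers):
    """Return the synced peer with the highest block number, or None."""
    candidates = [p for p in peers if p.synced]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.latest_block)
```

With V1 at block 120, V2 at block 100 after its downtime, and another fresh validateer at block 0, this picks V1. It also covers the earlier scenario, since newly spawned validateers are filtered out.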

Scenario onboarding latency

setup: We have an up-to-date validateer V1 from which a newly spawned validateer V2 syncs. The state to sync is large, so it takes enough time that in the meantime new state updates are applied to V1. So at the end of the sync, V2 is still not fully up-to-date.
Similar to the scenario above, we probably need multi-stage syncing, to ensure we get to a fully up-to-date state in V2 (similar issue to Storing and pruning sidechain blocks in worker #214)
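
The multi-stage idea can be sketched as a loop: take a full snapshot first, then keep importing the blocks that were produced in the meantime until nothing is missing. This is a toy model in Python under the assumption that the peer still stores all blocks since the snapshot; `Peer`, `multi_stage_sync` and the `on_round` test hook are hypothetical.

```python
import copy


class Peer:
    """Toy up-to-date validateer that keeps producing blocks."""

    def __init__(self):
        self.state = {}
        self.blocks = []  # blocks[i] holds the updates of block i + 1

    def produce_block(self, updates):
        self.state.update(updates)
        self.blocks.append(updates)

    def latest_block(self):
        return len(self.blocks)

    def fetch_snapshot(self):
        return self.latest_block(), copy.deepcopy(self.state)

    def fetch_blocks(self, first, last):
        # Block numbers are 1-based and the range is inclusive.
        return self.blocks[first - 1:last]


def multi_stage_sync(peer, on_round=lambda: None, max_rounds=10):
    """Stage 1: full snapshot. Stage 2: import missed blocks until caught up."""
    synced_block, state = peer.fetch_snapshot()
    for _ in range(max_rounds):
        on_round()  # hook that lets a test simulate blocks arriving mid-sync
        target = peer.latest_block()
        if synced_block == target:
            return synced_block, state
        for updates in peer.fetch_blocks(synced_block + 1, target):
            state.update(updates)
        synced_block = target
    raise TimeoutError("peer produces blocks faster than we can import them")
```

The loop terminates as long as importing a batch of blocks is faster than producing it, which is why the sketch bounds the number of rounds instead of looping forever.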

haerdib · 2021-09-23T13:42:38Z

Continuing my work of commenting on everything: I think the feature requirements sound about correct, but @clangenb is the expert here :)
Regarding the edge cases: I don't think we need such strict requirements. As long as the fetched state is not too old, the "missed" updates can be refetched by requesting stored sidechain blocks. So we have somewhat of a "wider" time window.

As an example, in the scenario "validateer is offline for a short period": in case V3 gets its state from V2 (which is outdated, but not very) and notices that it's missing some blocks, it can just request these missing blocks from V1. This should not require any additional implementation, because every validateer needs to be able to re-request sidechain blocks in case it missed some for whatever reason.

What we need to keep in mind is that sidechain blocks will be deleted after some (configurable) time. So if the state syncing latency gets too high, we might really end up in a scenario where a validateer syncs for some time and, only after it has finally finished syncing, notices that its state is too outdated and it cannot refetch the missing blocks.
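
A back-of-the-envelope check for this risk, written as a small Python sketch. The inputs (state size, transfer throughput, block time, retention time) are all assumptions to be replaced with measured values, and the function names are hypothetical.

```python
import math


def blocks_missed_during_sync(state_size_bytes, throughput_bytes_per_s, block_time_s):
    """Number of sidechain blocks produced while the snapshot is transferred."""
    transfer_time_s = state_size_bytes / throughput_bytes_per_s
    return math.ceil(transfer_time_s / block_time_s)


def can_catch_up(state_size_bytes, throughput_bytes_per_s, retention_s):
    """True if the transfer finishes before the first missed block is pruned."""
    transfer_time_s = state_size_bytes / throughput_bytes_per_s
    return transfer_time_s <= retention_s
```

With made-up numbers, a 600 MB state at 10 MB/s takes 60 s to transfer. At a 6 s block time that is 10 missed blocks, which is harmless with a one-hour retention but fatal if blocks were pruned after 30 s.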

brenzi · 2021-10-05T07:46:14Z

Please split all these learnings and decisions into tasks and schedule them. All in one epic

clangenb mentioned this issue Jun 22, 2021

sidechain block broadcasting logic #244

Closed

haerdib mentioned this issue Aug 25, 2021

snapshotting of sgx externalities #243

Closed

brenzi changed the title ~~Onboarding of new workers~~ Onboarding of new validateers Sep 7, 2021

brenzi assigned murerfel Sep 7, 2021

murerfel added the Epic label Oct 7, 2021

clangenb mentioned this issue Nov 11, 2021

Multivalidateer MVP #505

Closed


clangenb mentioned this issue Dec 7, 2021

Sync parentchain block import with sidechain block production #541

Merged

clangenb changed the title ~~Onboarding of new validateers~~ Discussion: Onboarding of new validateers Jan 10, 2022

murerfel mentioned this issue May 10, 2022

Integrate docker-compose setup into CI integration test #735

Closed
