Discussion: Onboarding of new validateers #273

Open
clangenb opened this issue Jun 22, 2021 · 8 comments

@clangenb
Contributor

With the addition of the direct invocation API, a worker's state can no longer be constructed from on-chain data alone. Hence, when a new worker onboards, it needs to fetch data from other workers. Two obvious solutions exist.

  1. A worker fetches the current state from another worker. This state consists of both the chain_relay_db and the STF's state. This would be the simple onboarding, which leverages the trusted nature of TEEs.
  2. A worker syncs layer 1 blocks in the chain relay and, whenever it observes a sidechain block confirmation extrinsic, fetches the corresponding sidechain block from another worker. (A worker could also fetch sidechain blocks in batches beforehand.)

Eventually, we should implement both ways:

  1. Because it is simple and essentially provides all the trust regardless.
  2. Because we always need to be able to reconstruct the history (even if it is only for auditability).
@brenzi
Collaborator

brenzi commented Jun 23, 2021

@haerdib has already implemented state snapshotting, AFAIR. This way, new workers have to request the latest state from an active validator. (I suggest not using the term worker for sidechain validators. Maybe validaTEErs? 8-P)

I believe it should be your (1) only, not (2). Scanning the chain takes very long, and it isn't necessary in our threat model.

@clangenb
Contributor Author

Yes, I agree. Came to the same conclusion yesterday after some thinking.

@brenzi brenzi changed the title from "Onboarding of new workers" to "Onboarding of new validateers" on Sep 7, 2021
@brenzi
Collaborator

brenzi commented Sep 7, 2021

connected to #345

@brenzi
Collaborator

brenzi commented Sep 21, 2021

@mullefel can you please review this task and think about edge cases to tackle?

@clangenb
Contributor Author

Here we briefly discussed two different syncs depending on state recency:

@murerfel
Contributor

Please let me know if I got something wrong, or missed an important part. Much appreciated 🙏

Feature Requirements

Based on the proposed solution (1), I identified some feature requirements:

  • State syncing using a similar channel as the sharing of the state encryption key, or even the same one (i.e. RPC)
  • As a newly spawned validateer:
    • determine whether there is a discrepancy in state and an updated state needs to be fetched. This is done using the sidechain (?)
    • determine from which 'mature' validateer to fetch the state. If validateers registered in the teerex pallet are guaranteed to be 'mature', i.e. up-to-date, we can choose randomly from those?
    • As discovered in the edge cases, we might need multiple ways of syncing to mitigate some of the latency issues. So it might make sense to be able to:
      1. sync state directly, fetching a full state
      2. sync state incrementally from sidechain blocks

Edge Cases

Scenario multiple new validateers spawned at once

  • setup: We have 1 up-to-date validateer and we launch multiple new validateers at once.
  • We need to make sure that a) these new validateers sync from the correct, up-to-date validateer (and not from one of the other newly spawned ones) and b) multiple concurrent requests for state updates are processed consistently and don't deadlock.
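
For point b), here is a minimal sketch of one way to serve concurrent state requests consistently. It assumes the serving validateer keeps an immutable snapshot behind a read-write lock. `SnapshotProvider`, `StateSnapshot`, and their fields are hypothetical names for illustration and not part of the worker's API.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical immutable snapshot of the state at a given sidechain block.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StateSnapshot {
    pub block_number: u64,
    pub state: Vec<u8>,
}

/// Hands out the latest snapshot to any number of concurrent requesters.
pub struct SnapshotProvider {
    current: RwLock<Arc<StateSnapshot>>,
}

impl SnapshotProvider {
    pub fn new(initial: StateSnapshot) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Readers hold the lock only long enough to clone the `Arc`,
    /// so a slow state transfer never blocks state updates.
    pub fn latest(&self) -> Arc<StateSnapshot> {
        let guard = self.current.read().expect("snapshot lock poisoned");
        Arc::clone(&*guard)
    }

    /// Swaps in a new snapshot; requesters that already hold the old
    /// `Arc` keep a consistent view of the older state.
    pub fn publish(&self, next: StateSnapshot) {
        let mut guard = self.current.write().expect("snapshot lock poisoned");
        *guard = Arc::new(next);
    }
}
```

Because each requester receives its own `Arc` to an immutable snapshot, only one short-lived lock is involved, which avoids lock-ordering deadlocks in this simplified setting.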

Scenario validateer is offline for a short period

  • starting point: Two validateers V1 and V2 are running and are up-to-date. Validateer V2 goes offline for a short while and comes back online, during which the state in V1 changes. At the same time as V2 comes online again, a new validateer V3 is spawned that needs to sync its state.
  • Ensure that newly spawned validateer V3 syncs correct state. Either:
    • Guarantee that V3 syncs from the unimpaired V1, because we have some way of knowing that only V1 but not V2 has the latest state.
    • Multi-stage sync, where we sync from V2, but then notice we are missing some state, and then sync again, possibly from V1
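
The first option could be sketched as follows, assuming every validateer can report whether it is still syncing and which sidechain block it last imported. `PeerStatus` and `select_sync_source` are hypothetical names, and picking the peer with the highest block is only one possible policy (the requirements above also mention choosing randomly among mature validateers).

```rust
/// Hypothetical status a validateer reports about itself.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PeerStatus {
    pub name: String,
    /// True while the peer is itself still onboarding.
    pub is_syncing: bool,
    /// Number of the last sidechain block the peer has imported.
    pub latest_block: u64,
}

/// Picks the peer to sync from: ignores peers that are still syncing
/// and prefers the one with the most recent sidechain block.
pub fn select_sync_source(peers: &[PeerStatus]) -> Option<&PeerStatus> {
    peers
        .iter()
        .filter(|p| !p.is_syncing)
        .max_by_key(|p| p.latest_block)
}
```

In the scenario above, V1 would report a higher `latest_block` than the returning V2, so V3 would select V1. If no mature peer is available, the function returns `None` and the caller has to retry later.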

Scenario onboarding latency

  • setup: We have an up-to-date validateer V1 from which a newly spawned validateer V2 syncs. The state to sync is large, so it takes enough time that in the meantime new state updates are applied to V1. So at the end of the sync, V2 is still not fully up-to-date.
  • Similar to the scenario above, we probably need multi-stage syncing, to ensure we get to a fully up-to-date state in V2 (similar issue to Storing and pruning sidechain blocks in worker #214)
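
A minimal sketch of such multi-stage syncing, assuming the new validateer can ask its source for the latest block number and for a range of blocks. The `SyncSource` trait, `catch_up`, and the toy `AdvancingSource` are hypothetical and exist only for illustration. Block numbers stand in for full sidechain blocks.

```rust
use std::cell::Cell;

/// Hypothetical view of the validateer we sync from.
pub trait SyncSource {
    fn latest_block(&self) -> u64;
    /// Returns the blocks `from..=to`, represented here by their numbers.
    fn fetch_blocks(&self, from: u64, to: u64) -> Vec<u64>;
}

/// Repeats "ask for the head, fetch what is missing" until the local
/// state has caught up, or gives up after `max_rounds` rounds.
/// Returns `Ok(block)` when caught up, `Err(block)` with the last
/// imported block otherwise.
pub fn catch_up<S: SyncSource>(source: &S, mut local_block: u64, max_rounds: usize) -> Result<u64, u64> {
    for _ in 0..max_rounds {
        let target = source.latest_block();
        if local_block >= target {
            return Ok(local_block);
        }
        for block in source.fetch_blocks(local_block + 1, target) {
            // Only import blocks that directly extend the local state.
            if block == local_block + 1 {
                local_block = block;
            }
        }
    }
    if local_block >= source.latest_block() { Ok(local_block) } else { Err(local_block) }
}

/// Toy source whose head advances by one block per query, up to `cap`.
pub struct AdvancingSource {
    pub head: Cell<u64>,
    pub cap: u64,
}

impl SyncSource for AdvancingSource {
    fn latest_block(&self) -> u64 {
        let head = self.head.get();
        if head < self.cap {
            self.head.set(head + 1);
        }
        head
    }

    fn fetch_blocks(&self, from: u64, to: u64) -> Vec<u64> {
        (from..=to).collect()
    }
}
```

Each round is shorter than the previous one as long as the source produces blocks more slowly than the new validateer imports them. The `max_rounds` bound keeps the loop from running forever if that assumption does not hold.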

@haerdib
Contributor

haerdib commented Sep 23, 2021

Continuing my work of commenting on everything: I think the feature requirements sound about correct, but @clangenb is the expert here :)
Regarding the edge cases: I don't think we need such strict requirements. As long as the fetched state is not too old, the "missed" updates can be refetched by requesting stored sidechain blocks. So we have a somewhat "wider" time window.

As an example, in the scenario "validateer is offline for a short period": if V3 gets its state from V2 (which is outdated, but not very) and notices that it's missing some blocks, it can just request these missing blocks from V1. This should not require any additional implementation, because every validateer needs to be able to re-request sidechain blocks in case it missed some for whatever reason.

What we need to keep in mind is that sidechain blocks will be deleted after some (configurable) time. So if the state syncing latency gets too high, we might really end up in a scenario where a validateer syncs for some time and, only after it has finally finished syncing, notices that its state is too outdated and it cannot refetch the missing blocks.
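
The decision described here can be sketched as a small function, assuming a validateer knows its last imported block, the latest block, and the oldest block that peers still store. `SyncPlan`, `plan_sync`, and the parameter names are hypothetical.

```rust
/// How a validateer could bring its state up to date.
#[derive(Debug, PartialEq, Eq)]
pub enum SyncPlan {
    /// Local state already matches the latest sidechain block.
    UpToDate,
    /// Refetch the missing sidechain blocks `from..=to`.
    Incremental { from: u64, to: u64 },
    /// The gap reaches into pruned blocks: fetch a full state again.
    FullState,
}

/// `local_block` is `None` for a validateer without any state.
/// `oldest_available_block` is the oldest block not yet pruned by peers.
pub fn plan_sync(local_block: Option<u64>, latest_block: u64, oldest_available_block: u64) -> SyncPlan {
    match local_block {
        None => SyncPlan::FullState,
        Some(local) if local >= latest_block => SyncPlan::UpToDate,
        // The first missing block must still be stored by some peer.
        Some(local) if local + 1 >= oldest_available_block => SyncPlan::Incremental {
            from: local + 1,
            to: latest_block,
        },
        Some(_) => SyncPlan::FullState,
    }
}
```

With this shape, the "wider" time window is simply the distance between `oldest_available_block` and `latest_block`, which depends on the configured pruning time.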

@brenzi
Collaborator

brenzi commented Oct 5, 2021

Please split all these learnings and decisions into tasks and schedule them, all in one epic.

@murerfel murerfel added the Epic label Oct 7, 2021
@clangenb clangenb mentioned this issue Nov 11, 2021
9 tasks
@clangenb clangenb changed the title from "Onboarding of new validateers" to "Discussion: Onboarding of new validateers" on Jan 10, 2022