Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix provisioner issue for on-demand/coretime #3141

Closed
eskimor opened this issue Jan 30, 2024 · 3 comments · Fixed by #3233
Closed

Fix provisioner issue for on-demand/coretime #3141

eskimor opened this issue Jan 30, 2024 · 3 comments · Fixed by #3233
Assignees

Comments

@eskimor
Copy link
Member

eskimor commented Jan 30, 2024

Provisioner code does not seem to be fully on-demand ready.

// TODO: doesn't work for on-demand parachains. We lean hard on the

@antonva antonva self-assigned this Jan 31, 2024
@antonva
Copy link
Contributor

antonva commented Jan 31, 2024

This should be fine and is generally the same as #664,
which is that next_up_on_available() used to use the CoreIndex to index into the paras::parachains() vec directly to map to a ParaId.

Ever since we refactored this out of the scheduler and into the AssignmentProvider interface we no longer have this assumption.

@alindima
Copy link
Contributor

alindima commented Feb 1, 2024

Having a deeper look at the code and speaking with @antonva about this, we came to the following conclusion:

Assume core A is occupied by Para1, the next_up_on_available is Para2 and the bitfields for making the candidate of Para1 have been recorded on-chain.
The next block producer will call the provisioner, which will request from prospective-parachains the next backable candidate of Para2, which has the Para1 candidate as parent. There's obviously no such candidate so the core will transition to scheduled.
Para2 will be able to get a candidate in the next iteration.

So it's a slight inefficiency that is made less likely due to the core affinities.

I suggest we fix it by modifying the provisioner to check if the newly scheduled para is different than the previous one and if it is, request a backable candidate with the required_path of the new para instead (taking into account a possible parent candidate for Para 2 present on another core)

@alindima
Copy link
Contributor

I found a very similar problem in collation-generation: #3327

github-merge-queue bot pushed a commit that referenced this issue Mar 1, 2024
#3130

builds on top of #3160

Processes the availability cores and builds a record of how many
candidates it should request from prospective-parachains and their
predecessors.
Tries to supply as many candidates as the runtime can back. Note that
the runtime changes to back multiple candidates per para are not yet
done, but this paves the way for it.

The following backing/inclusion policy is assumed:
1. the runtime will never back candidates of the same para which don't
form a chain with the already backed candidates. Even if the others are
still pending availability. We're optimistic that they won't time out
and we don't want to back parachain forks (as the complexity would be
huge).
2. if a candidate is timed out of the core before being included, all of
its successors occupying a core will be evicted.
3. only the candidates which are made available and form a chain
starting from the on-chain para head may be included/enacted and cleared
from the cores. In other words, if para head is at A and the cores are
occupied by B->C->D, and B and D are made available, only B will be
included and its core cleared. C and D will remain on the cores awaiting
for C to be made available or timed out. As point (2) above already
says, if C is timed out, D will also be dropped.
4. The runtime will deduplicate candidates which form a cycle. For
example if the provisioner supplies candidates A->B->A, the runtime will
only back A (as the state output will be the same)

Note that if a candidate is timed out, we don't guarantee that in the
next relay chain block the block author will be able to fill all of the
timed out cores of the para. That increases complexity by a lot.
Instead, the provisioner will supply N candidates where N is the number
of candidates timed out, but doesn't include their successors which will
be also deleted by the runtime. This'll be backfilled in the next relay
chain block.

Adjacent changes:
- Also fixes: #3141
- For non prospective-parachains, don't supply multiple candidates per
para (we can't have elastic scaling without prospective parachains
enabled). paras_inherent should already sanitise this input but it's
more efficient this way.

Note: all of these changes are backwards-compatible with the
non-elastic-scaling scenario (one core per para).
skunert pushed a commit to skunert/polkadot-sdk that referenced this issue Mar 4, 2024
…ch#3233)

paritytech#3130

builds on top of paritytech#3160

Processes the availability cores and builds a record of how many
candidates it should request from prospective-parachains and their
predecessors.
Tries to supply as many candidates as the runtime can back. Note that
the runtime changes to back multiple candidates per para are not yet
done, but this paves the way for it.

The following backing/inclusion policy is assumed:
1. the runtime will never back candidates of the same para which don't
form a chain with the already backed candidates. Even if the others are
still pending availability. We're optimistic that they won't time out
and we don't want to back parachain forks (as the complexity would be
huge).
2. if a candidate is timed out of the core before being included, all of
its successors occupying a core will be evicted.
3. only the candidates which are made available and form a chain
starting from the on-chain para head may be included/enacted and cleared
from the cores. In other words, if para head is at A and the cores are
occupied by B->C->D, and B and D are made available, only B will be
included and its core cleared. C and D will remain on the cores awaiting
for C to be made available or timed out. As point (2) above already
says, if C is timed out, D will also be dropped.
4. The runtime will deduplicate candidates which form a cycle. For
example if the provisioner supplies candidates A->B->A, the runtime will
only back A (as the state output will be the same)

Note that if a candidate is timed out, we don't guarantee that in the
next relay chain block the block author will be able to fill all of the
timed out cores of the para. That increases complexity by a lot.
Instead, the provisioner will supply N candidates where N is the number
of candidates timed out, but doesn't include their successors which will
be also deleted by the runtime. This'll be backfilled in the next relay
chain block.

Adjacent changes:
- Also fixes: paritytech#3141
- For non prospective-parachains, don't supply multiple candidates per
para (we can't have elastic scaling without prospective parachains
enabled). paras_inherent should already sanitise this input but it's
more efficient this way.

Note: all of these changes are backwards-compatible with the
non-elastic-scaling scenario (one core per para).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Completed
Development

Successfully merging a pull request may close this issue.

3 participants