Relay chain coretime assigner does not support more assignments than fit in a single XCM message (currently 28) #6102

Open
seadanda opened this issue Oct 17, 2024 · 1 comment
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. C2-good-first-issue A task for a first time contributor to become familiar with the Polkadot-SDK. D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. I2-bug The node fails to follow expected behavior. T8-polkadot This PR/Issue is related to/affects the Polkadot network.

Comments

@seadanda
Contributor

The system allows interlacing right down to the single-block level (80 assignments per timeslice, each with a CoreMask with one bit set).
The problem is that this produces a call that doesn't actually fit in an XCM message (at most 28 assignments fit in a single XCM).

We can easily chunk that on the Coretime Chain side and send it over as three messages (80 assignments at up to 28 per message). However, with the current design that means calling assign_core multiple times on the relay for a given timeslice, which is disallowed by design due to some assumptions made by the scheduler.

Mitigation in the meantime:
28 assignments is the limit, but 27 assignments that don't add up to a complete mask will be rejected due to the requirement for a full mask on the relay. Therefore we take the first 27 and append an Idle assignment, taking it to 28.
This will make anybody who interlaces more than 27 times lose some assignments, but it's better than the current system, which just drops the entire core's assignments because the message is too big. Once this is missed, it's gone from the workplan and is a total mess. Far preferable to truncate and assign everything we can until we can drop some assumptions in the scheduler on the relay.

Mitigation for the Polkadot launch: polkadot-fellows/runtimes#434
Testnets mitigation: #6022
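
A minimal sketch of the truncate-and-pad mitigation described above, not the code from the linked PRs. The `CoreAssignment` enum here is a simplified stand-in, `truncate_and_pad` is a hypothetical helper name, and the test values assume each single-bit mask maps to 57600 / 80 = 720 parts:

```rust
/// Simplified stand-in for the relay's `CoreAssignment` (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoreAssignment {
    Idle,
    Pool,
    Task(u32),
}

/// A fully scheduled core is 57600 parts, matching `PartsOf57600`.
pub const FULL_CORE: u32 = 57_600;
/// Limit stated in this issue: at most 28 assignments fit in one XCM message.
pub const MAX_ASSIGNMENTS_PER_XCM: usize = 28;

/// Sum of the parts currently scheduled.
fn scheduled_parts(assignments: &[(CoreAssignment, u16)]) -> u32 {
    assignments.iter().map(|(_, parts)| u32::from(*parts)).sum()
}

/// Hypothetical helper: keep as many assignments as fit in one message and
/// fill whatever is left of the core with a single `Idle` entry.
pub fn truncate_and_pad(
    mut assignments: Vec<(CoreAssignment, u16)>,
) -> Vec<(CoreAssignment, u16)> {
    // Too long, or exactly at the limit but still underscheduled:
    // drop down to 27 entries so the Idle padding has a slot.
    let too_long = assignments.len() > MAX_ASSIGNMENTS_PER_XCM;
    let full_but_short = assignments.len() == MAX_ASSIGNMENTS_PER_XCM
        && scheduled_parts(&assignments) < FULL_CORE;
    if too_long || full_but_short {
        assignments.truncate(MAX_ASSIGNMENTS_PER_XCM - 1);
    }

    // The relay requires the parts to add up to a full core.
    let missing = FULL_CORE.saturating_sub(scheduled_parts(&assignments));
    if missing > 0 {
        // `missing` is at most 57600, so it always fits in a u16.
        assignments.push((CoreAssignment::Idle, missing as u16));
    }
    assignments
}
```

With 80 single-block assignments as input, this keeps the first 27 and appends one `Idle` entry covering the rest of the core, so the result is 28 entries that sum to a full mask.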

@seadanda seadanda added I2-bug The node fails to follow expected behavior. T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Oct 17, 2024
@seadanda
Contributor Author

Just copying the initial idea here that I had when this first popped up:
The likelihood of somebody interlacing down to 27 assignments is very low, so maybe something like assigning each chunk one block later than the previous could be a fix that maintains some of the assumptions in the implementation. There would be a potential short outage for the workloads that get the second or third chunk, but by timeslice 2 of the region they're all running as intended. The first chunk is still assigned at begin, so only the workloads in the later chunks would see that delay.

assign_core has the signature

	pub fn assign_core(
		core_idx: CoreIndex,
		begin: BlockNumberFor<T>,
		assignments: Vec<(CoreAssignment, PartsOf57600)>,
		end_hint: Option<BlockNumberFor<T>>,
	) -> Result<(), DispatchError> {

and we just need to never call it more than once for the same core_idx and begin combination.
So for 80 assignments:
0..27 assigned on the begin
27..55 assigned on begin+1
55..80 assigned on begin+2
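
A rough sketch of the staggering idea above: split the assignments into chunks and give each chunk its own begin, so no two calls share a core and begin pair. `chunk_with_staggered_begin` is a hypothetical helper, `u32` stands in for `BlockNumberFor<T>`, and the chunk size is a parameter because the exact split points are a design choice:

```rust
/// Hypothetical helper: pair each chunk with its own `begin` block so that
/// `assign_core` is never called twice for the same (core, begin).
///
/// `chunk_size` must be non-zero, since `slice::chunks` panics on 0.
pub fn chunk_with_staggered_begin<A: Clone>(
    begin: u32,
    assignments: &[A],
    chunk_size: usize,
) -> Vec<(u32, Vec<A>)> {
    assignments
        .chunks(chunk_size)
        .enumerate()
        // Chunk 0 goes on `begin`, chunk 1 on `begin + 1`, and so on.
        .map(|(i, chunk)| (begin + i as u32, chunk.to_vec()))
        .collect()
}
```

Each returned pair would become one message from the Coretime Chain and one `assign_core` call on the relay. This only works once the relay accepts calls that do not cover a full core on their own.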

But we still need to change the relay logic to drop the requirement for each assign_core call to contain a fully scheduled core. As part of that we'd need to add logic in there to pad an underscheduled core with Idle, then when a further underscheduled assignment comes in within a timeslice (for example) it should try to remove the Idle padding, "append" the new parts and recompute the padding again.
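
To make the pad / strip / re-pad idea concrete, here is a small sketch on plain vectors. It is not the relay's scheduler code: `CoreAssignment` is a simplified stand-in, `pad_with_idle` and `append_chunk` are hypothetical names, and the sketch assumes the padding is always the last entry:

```rust
/// Simplified stand-in for the relay's `CoreAssignment` (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoreAssignment {
    Idle,
    Pool,
    Task(u32),
}

/// A fully scheduled core is 57600 parts, matching `PartsOf57600`.
pub const FULL_CORE: u32 = 57_600;

fn scheduled_parts(assignments: &[(CoreAssignment, u16)]) -> u32 {
    assignments.iter().map(|(_, parts)| u32::from(*parts)).sum()
}

/// Pad an underscheduled core with a single trailing `Idle` entry.
pub fn pad_with_idle(assignments: &mut Vec<(CoreAssignment, u16)>) {
    let missing = FULL_CORE.saturating_sub(scheduled_parts(assignments));
    if missing > 0 {
        // `missing` is at most 57600, so it always fits in a u16.
        assignments.push((CoreAssignment::Idle, missing as u16));
    }
}

/// Merge a further chunk into an already padded core: strip the trailing
/// `Idle`, append the new parts, then recompute the padding.
/// `existing` is left untouched if the result would be overscheduled.
pub fn append_chunk(
    existing: &mut Vec<(CoreAssignment, u16)>,
    chunk: Vec<(CoreAssignment, u16)>,
) -> Result<(), &'static str> {
    let mut merged = existing.clone();
    // A trailing Idle is treated as padding. A genuine trailing Idle is
    // absorbed into the recomputed padding, which schedules the same thing.
    if matches!(merged.last(), Some((CoreAssignment::Idle, _))) {
        merged.pop();
    }
    merged.extend(chunk);
    if scheduled_parts(&merged) > FULL_CORE {
        return Err("core would be overscheduled");
    }
    pad_with_idle(&mut merged);
    *existing = merged;
    Ok(())
}
```

After every call the stored vector sums to a full core, so the invariant the scheduler relies on holds between chunks even though no single chunk covers the whole core.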

@bkchr bkchr added D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. C2-good-first-issue A task for a first time contributor to become familiar with the Polkadot-SDK. labels Oct 17, 2024
@seadanda seadanda added the C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. label Oct 17, 2024