-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[embassy-executor]: Upstream "Earliest Deadline First" Scheduler #4035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
CC @hawkw for thoughts |
284187b
to
8510b48
Compare
https://crates.io/crates/cordyceps/0.3.3 is now published, I'll get this PR updated to use the published crate tomorrow. |
b718466
to
46a975b
Compare
Updated with released |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me, seems like a nice improvement with the cordyceps usage as well.
Some concerns:
|
Appreciate the reviews!
Yep, this could be addressed in a future PR, right now we've only been using this on targets with atomics. How we do this might depend on what we decide to do with Cordyceps.
I'd like to try and make the case for keeping the cordyceps dependency! In it's favor:
We certainly COULD "hard-fork" cordyceps, but by the time that we've brought over the datastructures and tests, we will have essentially vendored a significant part of the crate. I could also narrowly "backport" the changes needed for DRS on top of the hand-rolled impl that embassy used, but that would be disappointing, IMO, for the testing and shared usage reasons mentioned above.
I'd definitely look at the Linked trait used in cordyceps, IMO it is still a bit of an unsafe API, but DOES do a good job of being the basics you need to abstract over intrusive structures. This commonality allowed TaskHeaders to be moved between
The reason that it is done that way is that the TransferStack is atomically shared (with interrupts!), while the local It's maybe possible to do "search and extract" without a mutex, but it would definitely be "spicier" to reason about. Especially without miri and loom testing.
It comes from an internal proposal document, though I don't think there's any attachment to the name. We could rename it if you prefer. There's a chance for confusion if we name it something that doesn't work exactly the same, but I'm not too stressed about it. I'll ask internally if there's prior art for the chosen name. Either way, I'll resolve the merge conflict so it's clean! |
2d5003e
to
f914341
Compare
Rebased the PR, and changed the name to "Earliest Deadline First" scheduler. I also added an example using the scheduler and Happy to update if there's a better way though! |
The current basic demo gives this as output, by the way:
|
I'm not sure I understand the end goal, is it to test that the example works, or that the EDF works on nrf? For the latter, I think adding a test in tests/nrf/ with the appropriate
This should work I think, you add a feature in tests/nrf/Cargo.toml named 'edf' which enables it on the embassy-executor dep, and add the As for the other comments on the PR, I'm fine with the cordyceps deps, and actually think it's good to not roll our own (To me, it seems even better to reuse something outside for the additional 'hardening' you get, you can reduce risk by pinning the version). |
Hmm, probably not, mostly just ensuring there is some basic regression testing for the feature, in case something happens in future refactoring. I'll think about it a bit and do a follow-up PR if it makes sense.
Agreed :) I'd like to get an ok (at at least a "meh, whatever") from @Dirbaio since he raised objections before I actually hit merge tho. |
I would prefer everything to land together. Reasons:
re cordyceps: it's not about its quality (it's great), it's about the complexity and amount of code embassy-executor contributors and maintainers have to deal with. Status quo is a straightforward implementation of a transfer stack with two atomic ops. Also, using cordyceps won't magically make the executor correct. For example, one key requirement is you mustn't enqueue tasks that are already enqueued. This is ensured by If we want to catch these bugs we'd have to add Loom testing to the whole executor, and if we do that then the fact that Cordyceps has loom testing of the data structures in isolation doesn't help that much. We already do have miri testing btw.
This is a very good point. It should be doable though. When a task is enqueued it's "exclusively owned" by the executor thread until the RUN_QUEUED bit is cleared, so it should be fine to iterate and pop. You only have to worry about contention in the queue head I think. The reason I think "iterate queue to choose which task to poll" is better is because it makes it easier to decouple the scheduler from the runqueue. A "scheduler" could be a thing that gets to iterate the runqueue and emits a verdict of which task to poll next. The default scheduler would decide to poll the first task its sees, the deadline scheduler would iterate to find the earliest deadline and decide to poll that. |
This was why I re-implemented the existing atomic runqueue on top of the TransferStack, which is how I originally interpreted your request to de-duplicate. We're would be in the same state if this PR is merged as we are today, there are two runqueues, one atomic, and one non-atomic. The current code is (I believe) a proof of concept we are already at the point where you can implement different schedulers with the same runqueue - both "regular" and "edf" atomic variants share the same data structure, constructor, and enqueue method, only the dequeue method is different.
Performance wise, I think they will be equivalent, pop sort pop (cordyceps) is basically I worry about edge cases and complexity when it comes to the search + pop. We'd need to:
Basically, we would need to reimplement this code: With the added complexity that those two initial head special cases are ALSO now CAS loops instead of just regular pointer ops, and we might have to start over or do a more complicated removal if HEAD was moved at all while we do the first two steps. In general: I think trying to search in-place is going to be MUCH more complex for little gain, performance or API clarity wise. I'd probably push back on this request for the EDF scheduler. Regarding the "one true waitqueue", would you be interested in a CS-capable version of If not, there are two alternatives I can think of: The first would be to "inline" the functionality in embassy-executor, keeping two separate runqueues, and inventing a new api, essentially identical to cordyceps' TransferQueue, to allow abstraction of the single runqueue for different scheduler implementations. The second would be to rework the So, overall:
I'm more than happy to implement a CS-based version of cordyceps::TransferStack to allow for the simplification of the RunQueue interface, if having cordyceps isn't a hard no from you. Otherwise I'll pursue re-implementing SortedList manually to match the previous atomic runqueue. |
This PR is a first attempt to allow building of `cordyceps` on non-CAS enabled targets, such as `thumbv6m-none-eabi`/Cortex-M0+/rp2040. Doing this as a first feature, prior to implementing a non-atomic and `critical_section` based version of TransferStack, in service of embassy-rs/embassy#4035. Rather than making a default-on "CAS" feature, we instead gate on `cfg(target_has_atomic = "ptr")`, which was [stabilized in 1.60](rust-lang/rust#93824), which currently has the meaning of "this target has CAS". This automatically gates-out `TransferStack` and `MpscQueue`, while the "mutable" collections can still be used.
* Start hacking in cordyceps This adds a third kind of runqueue, for now it should work the same as the current "atomics" runqueue, but uses a cordyceps TransferStack instead of the existing home-rolled linked list. * Clean up, use new cordyceps feature * A bit more cleanup * Update docs to be more clear
This implements a minimal version of Deadline Rank Scheduling, as well as ways to access and set Deadlines. This still needs some UX improvements, but is likely Enough for testing.
ab70a97
to
8765aee
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
@Dirbaio this is no longer blocked by hawkw/mycelium#537. |
Noting that @Dirbaio pointed out that there is a perf hit currently for non-CAS targets using the Mutex'd List. Repro case is here: #4374 Results:
IMO the largest impact of this is that using the I'll take a look at this shortly, it may be possible to refactor the code to rely on the provided CS token, and removing the use of the |
This allows the scheduler to better collaborate with existing critical sections
RP2040 on this branch w/o edf enabled:
RP2040 on this branch w/ EDF feature enabled
This updates the perf numbers to:
Therefore c2f61e4 reduces the baseline overhead from +78 cycles to +22 cycles. I'm guessing these extra +22 cycles are due to my use of a |
fwiw with
I'm not certain that is a sound change tho. |
9feeca5 makes the change from RefCell to UnsafeCell described above, and also adds SAFETY comments that explain why this is sound. These interfaces require a critical section, which guarantees freedom from concurrent access, and the fact that these methods only provide "interior" access and do not recurse, means that there is no possibility of re-entrant access. Therefore, with a critical section mutex and the current API, the requirements for exclusive access and creation of an &mut reference to the contained stack is sound in my belief. This means that the net perf impact to this is only 6 additional cycles in the bench case. |
pub(crate) unsafe fn enqueue(&self, task: TaskRef, _tok: super::state::Token) -> bool { | ||
self.stack.push_was_empty( | ||
task, | ||
#[cfg(not(target_has_atomic = "ptr"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dirbaio this might be "too cute", let me know if you'd prefer something like:
#[cfg(target_has_atomic = "ptr")]
self.stack.push_was_empty(task);
#[cfg(not(target_has_atomic = "ptr"))]
self.stack.push_was_empty(task, _tok);
instead.
Change deadline to use internal atomics
This PR builds on top of #4033, and shouldn't be merged until that PR is.This PR also requires hawkw/mycelium#520 to be merged and released before this PR can be merged.This PR adds the
"Deadline Rank Scheduler""Earliest Deadline First Scheduler", a cooperative (non-preemptive) scheduling algorithm that takes into account the deadline of a task when deciding which tasks in the WAITING state to poll first.This PR also re-implements the
run_queue_atomics
implementation to use thecordyceps::TransferStack
instead of the previously home-rolled atomic linked list/stack, in order to reduce the duplicated code.