-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel: Add an "identifier" to AppId #1566
Conversation
d0c944e
to
756d1b9
Compare
kernel/src/sched.rs
Outdated
/// will be returned. Otherwise the closure will executed and passed a | ||
/// reference to the process. | ||
crate fn process_map_or<F, R>(&self, default: R, process_index: usize, closure: F) -> R | ||
/// not exist (i.e. it is `None` in the `processes` array, or has stopped, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that if a process has been stopped this will return default
? From looking through it seems that a stopped process still has a valid and unchanged AppId, and this function does not seem to call anything that checks if a process has been stopped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok good catch! This raises a case I hadn't really considered...what should happen if an app is in the StoppedFaulted
state (or perhaps an Ended
state if one were to exist)? The process is still there, and its the same process so its identifier probably shouldn't change, and certain capsules may want to be able to see that it exists (like process console). However, other capsules should be able to know it is no longer active (and will never be again). Perhaps process_map_or()
should not change, but users of process_map_or()
may need to check that the process is still active. In particular grant.enter() shouldn't succeed on an ended process.
In any case I will update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment.
Added a commit to address the issue of what happens with a "stopped" process, in particular a process that cannot be executed again. I call this an "inactive" process, and this can happen if the process crashes and the kernel does not restart it, or (hypothetically) if the process exits itself and the kernel does not free its The issue in this case is to ensure that capsules, or anything that holds an Instead, I went through all of the functions in the The canonical example of this is when accessing a grant. Capsules access a specific grant by calling |
70d4a4c
to
b046c51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, and is a great step towards better handling restarted apps.
That said, a few thoughts I have:
-
Whatever solution we choose for assigning the "identifier" field, we will probably need to be careful to avoid reuse of identifiers.
-
It seems like this could potentially be implemented without storing the
index
field at all. This would require restructuring IPC to store tuples of (callbacks/identifiers) and (AppSlices/identifiers), and replacing the few places where we directly index into the process array with searches for the identifier. Given that no board currently supports more than 20 processes, this is not a huge overhead, though I realize this could be a limiting design decision if that were to change. I think the main overhead is that you would have to iterate through the list on every call to.enter()
. The upside is that the resulting design is less confusing and entirely decouples processes from how the kernel stores them.
kernel/src/callback.rs
Outdated
} | ||
|
||
impl PartialEq for AppId { | ||
fn eq(&self, other: &AppId) -> bool { | ||
self.idx == other.idx | ||
self.index == other.index && self.identifier == other.identifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just compare identifier
?
Semantically, I think an app reference is a reference to the same app whether or not that app is alive or dead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the current implementation it is probably more correct to compare both fields. The reason is that using an appid with a valid identifier but the wrong index will not work, but using one with both fields correct will. It would be a little strange to have AppIdA == AppIdB
, but then using AppIdA
fails while AppIdB
works.
At one point I did implement the logic for an AppId to automatically update its index, but since processes cannot currently move that seemed like unnecessary complexity right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not sure I'm convinced. The opposite argument here is that the .id()
method is provided to
retrieve a simple identifier for the process that can be communicated over a UART bus or syscall interface. This call is guaranteed to return a suitable identifier for the
AppId
So now we have a situation where it's possible that AppIdA.id() == AppIdB.id()
but AppIdA != AppIdB
.
I feel like the identifier is the identifier of an app, and the index is just an internal optimization that saves an otherwise expensive lookup.
What situation would arise where the identifiers match but the indexes don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What situation would arise where the identifiers match but the indexes don't?
The kernel moves the application to a different index in the processes array. But, there isn't any functionality to do this, so it's a case that doesn't matter.
The fact that processes cannot move slots in the array (since there is currently no code that will do that) can support either implementation of eq
. However, since just using identifier
would be required if there ever was a reason to move processes around in the array, that implementation makes more sense. I will update.
/// originally referred to, even if that app no longer exists. For example, | ||
/// the app may have restarted, or may have been ended or removed by the | ||
/// kernel. Therefore, calling `id()` is _not_ a valid way to check | ||
/// that an application still exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about global uniqueness of identifiers -- since we're proposing this as a mechanism that labels a process possibly off-core (here tockloader is given as example, but one would also imagine using as an RPC identifier), I think part of the future design discussion for the actual identifiers should include whether we have an entropy source of some kind in the kernel to try to ensure that identifiers are unique even across reboots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] whether we have an entropy source of some kind in the kernel to try to ensure that identifiers are unique even across reboots
Note that this would require more bits than usize
from a cryptographic perspective, if you really want to avoid collision in all realistic scenarios (even considering that embedded CPUs are "slow" enough that one can only generate so many identifiers per second). A collision within a set of random 32-bit identifiers requires only around 2^16 identifiers.
I don't have much context on the use cases, so I guess it depends whether uniqueness is a nice to have property for e.g. debugging, or if a collision is to be avoided at all costs.
Yes. My current simple implementation is just to keep counting up, and hope we don't hit 2^32. But we can discuss this more on the next PR.
Yes, definitely, and my original implementation did just that in two different forms. It definitely complicates IPC, and I tried to store tuples of <identifier, callback/slice>. While conceptually it is pretty straightforward, the rust code gets much more complicated because there is a lot more checking required, with more |
I came across a pretty obscure piece of code affected by this change while looking into modifying the ProcessConsole. Currently, the ProcessConsole lists process in the form:
where the PID is not actually the index of the process in the process array, but the index of the iterator through the processes array (as output by This is unchanged by this PR, but probably it would make more sense to have the ProcessConsole print the actual identifier of the process now that we have them. |
0dfa83a
to
8d4cdf5
Compare
I've tested this with various apps on Hail, including the tutorials/05 IPC apps, the Hail test app, and other collections of apps. I also tried these five: PID Name Quanta Syscalls Dropped Callbacks Restarts State
00 hello_loop 0 359 0 0 Yielded
01 button_print 0 7 0 0 Yielded
02 adc_continuous 0 675 0 0 Yielded
03 rot13_client 0 829 0 0 Yielded
04 org.tockos.ex.rot13 0 131 0 0 Yielded and all 5 seem to work. |
kernel/src/callback.rs
Outdated
} | ||
|
||
impl PartialEq for AppId { | ||
fn eq(&self, other: &AppId) -> bool { | ||
self.idx == other.idx | ||
self.index == other.index && self.identifier == other.identifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not sure I'm convinced. The opposite argument here is that the .id()
method is provided to
retrieve a simple identifier for the process that can be communicated over a UART bus or syscall interface. This call is guaranteed to return a suitable identifier for the
AppId
So now we have a situation where it's possible that AppIdA.id() == AppIdB.id()
but AppIdA != AppIdB
.
I feel like the identifier is the identifier of an app, and the index is just an internal optimization that saves an otherwise expensive lookup.
What situation would arise where the identifiers match but the indexes don't?
kernel/src/process.rs
Outdated
@@ -769,6 +829,11 @@ impl<C: Chip> ProcessType for Process<'a, C> { | |||
} | |||
|
|||
fn setup_mpu(&self) { | |||
// Do not modify an inactive process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the proliferation of these:
- Are all the functions with these checks actually getting called on stopped processes?
- Should that be a bug rather than a silent ignore?
- How do we make sure we don't miss these in future process management stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Are all the functions with these checks actually getting called on stopped processes?
Well the one that initiated this change is, notably if a capsule holds a copy of an appid for a stopped process and then uses it to enter its grant region. The others are not as most are called as a result of the process being scheduled, which it won't since it is stopped.
* Should that be a bug rather than a silent ignore?
In the case of entering a grant region it returns an error to the capsule.
For the others, maybe? It adds complexity in the common case. Maybe the better approach (at least for now) is to make the documentation clear that calling the functions without return codes is only valid when the process is active.
* How do we make sure we don't miss these in future process management stuff?
I'm not sure. Generally, functions with return values need to handle this, as those return values likely are passed (eventually) outside of the kernel crate. Functions without return values tend to be only called inside the kernel.
This is also complicated a little by code like the process console which does obtain references to ProcessType
objects. But that is really a separate issue.
AppId.idx used to refer to the index of the app in the process array. This enforces a very strong association between the two, which hurts our ability to change AppId in the future to handle apps that restart, are moved, or are removed. This removes that direct association in code outside of sched.rs. Since at this point AppId.idx() is still the index of the process in the array, we go ahead and keep using it. But code that does not manage the processes array should not use it as a direct index.
The IPC structure was based on application IDs starting at 0 and being able to correspond to array indices. This adds a layer of indirection.
No more correlation between app id and index in the array.
This change allows AppId to both: - Use an index in the array to ensure fast access to a process since it is stored at a particular location in the array. - And express that the AppId is no longer valid by returning `None`. So far the index is always considered value (hence the return is always `Some(self.idx)` but eventually we will be able to check if the index is valid.
Allows the kernel struct more access to the AppId to make lookups more optimized. Also adds documentation.
If a process is in an inactive state (like it has faulted and the kernel stopped it) then many of the operations in the `ProcessType` trait should not be functional and instead should return errors. This makes those functions return errors or explicitly say that they are not valid. This is needed to help signal to capsules or other users of a process that the process is not active and will not run again. For example, if a capsule tries to call `.enter()` on a grant with an inactive process, the `Kernel` object in sched.rs will find the matching process, but trying to access one of its grant pointers will fail since it is not valid to setup or access a grant on an inactive process. This error will then get back to the capsule.
don't stop at processes which dont have the grant setup
Since index is an internal bookkeeping item we do not use it to compare AppIds. Note: this works because processes cannot move in the processes array. If we ever want to move processes we will need to update AppId to match.
Note that several functions can only be called on a active process. These functions are only called from the kernel scheduler, and will not currently be called unless the kernel can schedule the process.
8d4cdf5
to
21c1f85
Compare
I rebased, and updated the AppId equality check, and undid some of the changes to the ProcessType trait and instead added a little more documentation for some methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors d+
I think this looks good to me at this point. The inactive process stuff is probably still imperfect, but definitely best-of-our-current-world
✌️ bradjc can now approve this pull request. To approve and merge a pull request, simply reply with |
Seems reasonable to add a last-call tag for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still in favor of changing the body of process_each_capability
to:
for process in self.processes.iter() {
match process {
Some(p) => {
closure(p.appid().id(), *p);
}
None => {}
}
}
so that the PID printed by the process console would be correct, but either way LGTM.
bors r+ |
Build succeeded |
1588: kernel: give processes unique identifiers r=alevy a=bradjc ### Pull Request Overview This builds on top of PR #1566 and makes app identifiers for processes unique by incrementing a counter every time a new process needs an identifier. Since this requires #1566 there are many commits here, but the relevant one (and where you should look) is here: a2020b6 This is part of #1082. ### Testing Strategy todo ### TODO or Help Wanted This is a redo of a prior PR, and one question that came up is should the counter be 64 bit in case a process restarts continuously. Changing from a `usize` to `u64` was not as straightforward as I thought, so I did not include that change here. Should we do something about the counter wrapping around? Or should boards use one of the restart policies to avoid restarting an app that many times? ### Documentation Updated - [x] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [x] Ran `make formatall`. 1639: doc: wg: OpenTitan: Update charter with note about libtock-rs r=alevy a=bradjc Since robust RISC-V support for libtock-rs is essential to the OpenTitan effort, the working group has agreed it should be added to the charter in the same way that rv32i arch support is. ### Testing Strategy n/a ### TODO or Help Wanted Any comments or feedback? ### Documentation Updated - [x] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [x] Ran `make formatall`. Co-authored-by: Brad Campbell <bradjc5@gmail.com>
1609: ADC Capsule: More work on supporting restarts. r=alevy a=bradjc ### Pull Request Overview This pull request continues to update the ADC capsule to allow applications to restart while using the ADC driver and for the system to continue working afterwards. Highlights of changes: - Handle `InactiveApp` error when checking if a grant region is valid. - Check on `sample_ready()` and `samples_ready()` if the app is still valid, and properly handle it if not. - Check on every command syscall if the app that has exclusive control of the adc driver still exists. - Call stop sampling if a callback from the adc peripheral driver occurs and the app is no longer valid. Note, if/when the ADC capsule is virtualized then this will no longer be an issue. ### Testing Strategy I ran the adc and adc_continuous apps numerous times and used the process console to cause them to fault in various stages of their operation. I can now no longer get the ADC capsule into a state where apps only error. ### TODO or Help Wanted I tested this with a combination of my other PRs #1565 #1566 #1588. Until those are merged this PR/commit alone will not work. ### Documentation Updated - [x] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [x] Ran `make formatall`. Co-authored-by: Brad Campbell <bradjc5@gmail.com>
Pull Request Overview
Currently a capsule can store an
AppId
to refer to a process, and there is not really a mechanism for ensuring that thatAppId
is still valid. Currently in Tock we are OK with this since we only create processes once, they don't restart, and we don't move them around in the processes array. However, as process management becomes more sophisticated, the idea that an AppId is always valid will no longer be true. SinceAppId
only stores the index of the relevant process in that array, if a process restarts, or if a process is removed and then a new one is added in the same slot, then the storedAppId
will appear valid, even though it actually refers to a different process.To enable verifying that
AppId
s are still valid, this PR adds a new field (identifier
) to theAppId
object. This allowsAppId
s to be used as before (i.e. they are still copyable), but now operations usingAppId
can verify that the identifier in the AppId matches the identifier assigned to the process at a particular index.AppId retains the
index
field which does correspond to the index in the processes array (at least while the AppId is valid). This allows us to retain direct lookups, as well as simplifies the IPC implementation.Changes to AppId API
The existing
.idx()
function has been removed, and.index()
and.id()
have been added..index()
checks that the AppId is still valid before returning the index in the array of the corresponding app..id()
just returns the identifier, which can be passed to userspace or over a UART channel, for example.These changes resulted in a couple changes to capsules which need
usize
identifiers for apps, and now must call.id()
.Changes to kernel APIs
The major change in the kernel (beyond AppId) is that functions which used to take an index now just take an
AppId
. Mostly this is a minor change, except forkernel.process_map_or()
. This function runs a closure on a specific app if it is exists, and uses the index and identifier to verify the app is still valid.Otherwise, IPC had to be updated quite a bit to handle the changes. I also took the opportunity to update how grant iters are implemented since this PR reduces the reliance on an app's identifier being its index in the process array.
How does this fix the capsule problem?
A capsule can still store an AppId, and that AppId can still become invalid. However, the main use of an AppId between a capsule and the kernel is to call
grant.enter(appid)
. Without this PR, if appid refers to an old process, and a new process has started in the same array index,grant.enter(appid)
would succeed (after initializing a new grant region for the new process). Now, whengrant.enter()
calls.kernel.process_map_or(appid)
, an error will be returned which will bubble back to the capsule.What does this PR not do?
This PR just improves the mechanism for detecting app changes, but does not change the policy on how app identifiers are assigned. Identifiers remain equivalent to their index in the array.
I iterated on the design in this PR several times, and the changes became substantial without also changing how app identifiers are assigned.
This is part of #1082.
Testing Strategy
This pull request was tested by running the IPC apps. More testing is needed.
TODO or Help Wanted
n/a
Documentation Updated
/docs
, or no updates are required.Formatting
make formatall
.