-
Notifications
You must be signed in to change notification settings - Fork 147
A series of syscall-related cleanups #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- We don't use it. - We're not charging for the copy/memory. - Tracing is better for this kind of thing.
We'll be adding more error numbers as we add additional syscalls. Adding new error numbers to the system shouldn't be a breaking change. NOTE: adding/changing an error number on a specific syscall is a different matter. But if we add a new syscall with a new error number, that's not breaking anything.
| /// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves | ||
| /// include documentation on _which_ syscall errors they can be expected to return, and what they | ||
| /// mean in the context of the syscall. | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to do this, but it's easier to make this "extensible" now than it is to do that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? If we were wrong about this decision so early on in the lifetime of the FVM, it's going to be much harder to backtrack later and make it exhaustive, than to go the other way around. IMO this wasn't necessary now, but I don't have a strong opinion because it does seem right for this to be non-exhaustive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the non_exhaustive later would just mean that users can now exhaustively match, it won't affect existing code.
| /// ChainEpoch, Entropy from the ticket chain. | ||
| fn get_chain_randomness( | ||
| &self, | ||
| pers: DomainSeparationTag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not consensus critical, but it has been on my mind for a while.
- Removes unnecessary error cases from the kernel.
- Paves the way for user actors (that won't care about specific tags).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although does not change the ABI of syscalls, so we could've done this at any time.
This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'm happy to break it into a second PR if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).
I've removed the type from the FVM entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, not a problem.
|
|
||
| /// Returns whether the supplied code_cid belongs to a known built-in actor type. | ||
| fn resolve_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>; | ||
| fn get_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to revert this. We had two get and two resolve functions in this module, and I was trying to define a pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. I think I had proposed get_ originally but for some reason we agreed on resolve_.
| let actor_id = context | ||
| .kernel | ||
| .resolve_address(&addr)? | ||
| .ok_or_else(|| syscall_error!(NotFound; "actor not found"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns an error on resolution failure, not a -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that the resolve_address kernel method is incapable of returning NotFound now or ever for a circumstance other than the address not existing. Otherwise we will have ambiguity. I think that's a fair assumption to make, but it might be worth to document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could break-up the NotFound error, but I'm not sure if it's worth it.
I think that's a fair assumption to make, but it might be worth to document it.
The SDK documents the meanings of all errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, just commenting that we're masking a potential NotFound from the kernel, but that it currently can't produce it. Definitely an uber-nit, not important.
| let typ = RegisteredSealProof::from(proof_type); | ||
| if let RegisteredSealProof::Invalid(invalid) = typ { | ||
| return Err(syscall_error!(IllegalArgument; "invalid proof type {}", invalid).into()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing a todo
| pub type BlockId = u32; | ||
|
|
||
| const FIRST_ID: BlockId = 1; | ||
| const MAX_BLOCKS: u32 = i32::MAX as u32; // TODO: Limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set a limit in M2.
| .len() | ||
| .try_into() | ||
| .map_err(|_| BlockError::TooManyBlocks)?; | ||
| id += FIRST_ID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could overflow (technically...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, how? We check if we're full above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to the old code. That is, we could successfully convert the length into a u32 (max), then overflow when we try to add 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, GitHub diff view...
| if self.is_full() { | ||
| return Err(BlockPutError::TooManyBlocks); | ||
| } | ||
| if block.codec != DAG_CBOR { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing the TODO, we only put cbor.
| pub enum BlockError { | ||
| #[error("block {0} is unreachable")] | ||
| Unreachable(Box<Cid>), | ||
| pub enum BlockPutError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to just use syscall errors, but technically this isn't a part of the syscall interface.
raulk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main pushback is the change in signature of the invoke entrypoint for actors.
| /// Look up the code ID at an actor address. | ||
| fn get_actor_code_cid(&self, addr: &Address) -> Result<Option<Cid>>; | ||
| /// Look up the code CID of an actor. | ||
| fn get_actor_code_cid(&self, id: ActorID) -> Result<Option<Cid>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to adapt the built-in actors runtime too. I don't think we ever get the actor CID from a non-ID address, so it should be straighforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already adapted the SDK to per-resolve the address, if necessary. And yes, we only ever get the actor CID for the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: no actor changes necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already been getting the built-ins changed to use IDs even tho the syscall was address
| /// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves | ||
| /// include documentation on _which_ syscall errors they can be expected to return, and what they | ||
| /// mean in the context of the syscall. | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? If we were wrong about this decision so early on in the lifetime of the FVM, it's going to be much harder to backtrack later and make it exhaustive, than to go the other way around. IMO this wasn't necessary now, but I don't have a strong opinion because it does seem right for this to be non-exhaustive.
| /// ChainEpoch, Entropy from the ticket chain. | ||
| fn get_chain_randomness( | ||
| &self, | ||
| pers: DomainSeparationTag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although does not change the ABI of syscalls, so we could've done this at any time.
This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).
|
|
||
| /// Returns whether the supplied code_cid belongs to a known built-in actor type. | ||
| fn resolve_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>; | ||
| fn get_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. I think I had proposed get_ originally but for some reason we agreed on resolve_.
| .len() | ||
| .try_into() | ||
| .map_err(|_| BlockError::TooManyBlocks)?; | ||
| id += FIRST_ID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, how? We check if we're full above.
|
|
||
| fn block_read(&mut self, id: BlockId, offset: u32, buf: &mut [u8]) -> Result<u32> { | ||
| let data = self.blocks.get(id).or_illegal_argument()?.data(); | ||
| fn block_read(&mut self, id: BlockId, offset: u32, buf: &mut [u8]) -> Result<i32> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this change renders #551 irrelevant now.
|
|
||
| /// The maximum supported CID size. (SPEC_AUDIT) | ||
| pub const MAX_CID_LEN: usize = 100; | ||
| pub const MAX_CID_LEN: usize = 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we're doing this now, although we don't have a clear case for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I didn't realize I had actually made this change. We don't have a clear case and I'm happy to bring it back down to 100.
| pub struct Block { | ||
| codec: u64, | ||
| data: Box<[u8]>, | ||
| data: Rc<Box<[u8]>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is intentional. Otherwise, converting from a vec/box would mean we'd need to copy the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: we can avoid the extra allocation in most cases... with a custom library that I haven't published yet because it's a bit scary.
|
|
||
| /// The maximum supported CID size. (SPEC_AUDIT) | ||
| pub const MAX_CID_LEN: usize = 100; | ||
| pub const MAX_CID_LEN: usize = 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I didn't realize I had actually made this change. We don't have a clear case and I'm happy to bring it back down to 100.
| /// ChainEpoch, Entropy from the ticket chain. | ||
| fn get_chain_randomness( | ||
| &self, | ||
| pers: DomainSeparationTag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'm happy to break it into a second PR if you'd like.
| .len() | ||
| .try_into() | ||
| .map_err(|_| BlockError::TooManyBlocks)?; | ||
| id += FIRST_ID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to the old code. That is, we could successfully convert the length into a u32 (max), then overflow when we try to add 1.
| /// Look up the code ID at an actor address. | ||
| fn get_actor_code_cid(&self, addr: &Address) -> Result<Option<Cid>>; | ||
| /// Look up the code CID of an actor. | ||
| fn get_actor_code_cid(&self, id: ActorID) -> Result<Option<Cid>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: no actor changes necessary.
| let actor_id = context | ||
| .kernel | ||
| .resolve_address(&addr)? | ||
| .ok_or_else(|| syscall_error!(NotFound; "actor not found"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could break-up the NotFound error, but I'm not sure if it's worth it.
I think that's a fair assumption to make, but it might be worth to document it.
The SDK documents the meanings of all errors.
| let obuf = context.memory.try_slice_mut(obuf_off, obuf_len)?; | ||
|
|
||
| // Then make sure we can actually put the return result somewhere before we do anything else. | ||
| const EXPECTED_LEN: u32 = fvm_shared::address::PAYLOAD_HASH_LEN as u32 + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree.
| let out = self.try_slice_mut(offset, len)?; | ||
|
|
||
| let mut buf = Cursor::new([0u8; MAX_CID_LEN]); | ||
| // At the moment, all CIDs are gauranteed to fit in 100 bytes (statically) because the max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert that change.
| /// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves | ||
| /// include documentation on _which_ syscall errors they can be expected to return, and what they | ||
| /// mean in the context of the syscall. | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the non_exhaustive later would just mean that users can now exhaustively match, it won't affect existing code.
raulk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this cleanup is great! 💅
The _actors_ care about them, but we don't. This: 1. Reduces the coupling between actors & the FVM. 2. Generalizes these methods for user actors.
- Renames `resolve_builtin_actor_type` to `get_actor_builtin_type`.
There's no real resolution here.
- Make `resolve_address` fail with NotFound if the target actor doesn't
exist.
- Make `get_actor_code_cid` take an actor id, not an address.
- We don't charge for address resolution here.
- The address is almost always resolved anyways (i.e., the sender).
- Make CID return logic consistent:
- Fail with `ErrorNumber::BufferTooSmall` if it doesn't fit into the
output buffer.
- Always return the size of the CID as the return value.
- Never write partial CIDs only to fail later.
- Always check that we can write the return result before we execute the
syscall.
- More generally, be pedantic about checking for errors up-front. If a
syscall fails, it must fail _without_ side effects.
1. Fix/simplify error checks on put. 1. The overflow checks were wrong. 2. We didn't check for CBOR. 2. Refactor errors and implement conversions. This will make the next patch easier.
Previously, it would return the amount of data read. Unfortunately, this made it impossible to tell if we were at the end of the block. Now, we return the offset (negative or positive) from the end of the block, to the end of the user provided buffer.
This combines two changes that are somewhat interlinked. Really, they're just hard to factor out into two different patches without a bunch of work. Motivation: 1. Remove copying costs from send/return. 2. Avoid calling "syscall" kernel methods in the call manager. 3. I _really_ hated that `Kernel::block_get` method. 4. Remove an extra call on send to "stat" the returned object. ---- 1. In the kernel, the `send` function now takes/returns a block ID. This moves all the parameter-relates logic into the kernel itself. Otherwise, the syscall has to make three kernel calls, all of which charge gas. 2. In the call manager, instead of passing return values & parameters as raw bytes, we pass them as `Option<kernel::Block>` where `kernel::Block` is a reference counted IPLD block (with a codec). This, let's us pass them around without copying, and lets us be very clear about codecs, empty parameters, etc. 3. In the syscalls, return the block length/codec from send to avoid a second syscall _just_ to look those up.
This change-set includes 6 patches that should be reviewed (mostly) independently. They're bundled because they build on each other (at least to an extent).
Please see the commit messages for detailed descriptions. I don't feel too strongly about most of these changes, so I'm happy to punt anything we feel we can do later and/or just isn't necessary.