A series of syscall-related cleanups #555

Stebalien · 2022-05-13T03:39:49Z

This change-set includes 6 patches that should be reviewed (mostly) independently. They're bundled because they build on each other (at least to an extent).

Please see the commit messages for detailed descriptions. I don't feel too strongly about most of these changes, so I'm happy to punt anything we feel we can do later and/or just isn't necessary.

- We don't use it.
- We're not charging for the copy/memory.
- Tracing is better for this kind of thing.

We'll be adding more error numbers as we add additional syscalls. Adding new error numbers to the system shouldn't be a breaking change. NOTE: adding/changing an error number on a specific syscall is a different matter. But if we add a new syscall with a new error number, that's not breaking anything.

Stebalien · 2022-05-13T03:42:19Z

shared/src/error/mod.rs

 /// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves
 /// include documentation on _which_ syscall errors they can be expected to return, and what they
 /// mean in the context of the syscall.
+#[non_exhaustive]


We don't have to do this, but it's easier to make this "extensible" now than it is to do that later.

Is it? If we were wrong about this decision so early on in the lifetime of the FVM, it's going to be much harder to backtrack later and make it exhaustive, than to go the other way around. IMO this wasn't necessary now, but I don't have a strong opinion because it does seem right for this to be non-exhaustive.

Removing the non_exhaustive later would just mean that users can now exhaustively match, it won't affect existing code.
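A minimal sketch of what `#[non_exhaustive]` asks of callers, using a cut-down, hypothetical `ErrorNumber` with three variants (the real enum has more). In any crate other than the defining one, a `match` on it must include a wildcard arm, which is why adding a new error number is not a breaking change. Inside the defining crate, as in this single-file example, the attribute does not affect matching, so the wildcard arm below only mirrors what a downstream crate is forced to write.

```rust
/// Cut-down, hypothetical stand-in for the FVM's `ErrorNumber`.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorNumber {
    IllegalArgument,
    NotFound,
    BufferTooSmall,
}

/// Describes an error number; the `_` arm is mandatory in downstream crates.
#[allow(unreachable_patterns)]
pub fn describe(err: ErrorNumber) -> &'static str {
    match err {
        ErrorNumber::IllegalArgument => "illegal argument",
        ErrorNumber::NotFound => "not found",
        ErrorNumber::BufferTooSmall => "buffer too small",
        // Covers error numbers added in future versions.
        _ => "unknown error",
    }
}
```

Dropping the attribute later only lifts that requirement: existing wildcard arms keep compiling and at most trigger an `unreachable_patterns` warning.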

Stebalien · 2022-05-13T03:43:33Z

fvm/src/externs/mod.rs

    /// ChainEpoch, Entropy from the ticket chain.
    fn get_chain_randomness(
        &self,
-        pers: DomainSeparationTag,


This is not consensus critical, but it has been on my mind for a while.

Removes unnecessary error cases from the kernel.

Paves the way for user actors (that won't care about specific tags).

LGTM, although this doesn't change the syscall ABI, so we could have done it at any time.

This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).

Yep. I'm happy to break it into a second PR if you'd like.

This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).

I've removed the type from the FVM entirely.

Nah, not a problem.
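A sketch of the resulting split, using a simplified, hypothetical `Externs` trait (not the actual FVM signature): the FVM side treats the personalization tag as a plain integer (`i64` here), and only the actors runtime keeps a named `DomainSeparationTag` enum, converting it at the call boundary. The variants, their discriminant values, and the toy randomness are illustrative only.

```rust
/// Simplified, hypothetical externs: the tag is just a number here.
pub trait Externs {
    fn get_chain_randomness(&self, pers: i64, round: i64, entropy: &[u8]) -> [u8; 32];
}

/// Actors-side enum (hypothetical subset; discriminants are illustrative).
#[derive(Debug, Clone, Copy)]
#[repr(i64)]
pub enum DomainSeparationTag {
    TicketProduction = 1,
    SealRandomness = 5,
}

/// Toy externs that embeds its inputs so different tags give different output.
pub struct FakeExterns;

impl Externs for FakeExterns {
    fn get_chain_randomness(&self, pers: i64, round: i64, entropy: &[u8]) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[..8].copy_from_slice(&pers.to_be_bytes());
        out[8..16].copy_from_slice(&round.to_be_bytes());
        for (i, b) in entropy.iter().take(16).enumerate() {
            out[16 + i] = *b;
        }
        out
    }
}

/// Actors-runtime helper: the named tag becomes a number at the boundary.
pub fn chain_randomness<E: Externs>(
    externs: &E,
    tag: DomainSeparationTag,
    round: i64,
    entropy: &[u8],
) -> [u8; 32] {
    externs.get_chain_randomness(tag as i64, round, entropy)
}
```

A user actor can call the externs with any tag value, including ones the built-in actors never define.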

Stebalien · 2022-05-13T03:45:51Z

fvm/src/kernel/mod.rs


    /// Returns whether the supplied code_cid belongs to a known built-in actor type.
-    fn resolve_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;
+    fn get_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;


I'm happy to revert this. We had two get and two resolve functions in this module, and I was trying to define a pattern.

This is fine. I think I had proposed get_ originally but for some reason we agreed on resolve_.

Stebalien · 2022-05-13T03:46:14Z

fvm/src/syscalls/actor.rs

+    let actor_id = context
+        .kernel
+        .resolve_address(&addr)?
+        .ok_or_else(|| syscall_error!(NotFound; "actor not found"))?;


Returns an error on resolution failure, not a -1.

This assumes that the resolve_address kernel method is incapable of returning NotFound, now or ever, for any circumstance other than the address not existing. Otherwise we will have ambiguity. I think that's a fair assumption to make, but it might be worth documenting.

We could break-up the NotFound error, but I'm not sure if it's worth it.

I think that's a fair assumption to make, but it might be worth documenting.

The SDK documents the meanings of all errors.

I mean, just commenting that we're masking a potential NotFound from the kernel, but that it currently can't produce it. Definitely an uber-nit, not important.
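A sketch of the pattern in the `syscalls/actor.rs` diff, with a hypothetical in-memory kernel and string addresses: the kernel reports a missing actor as `Ok(None)`, and only the syscall layer turns that into `NotFound`. A `NotFound` returned by the kernel itself would pass through `?` unchanged and look identical to the caller, which is the ambiguity raised above.

```rust
use std::collections::HashMap;

pub type ActorID = u64;

#[derive(Debug, PartialEq)]
pub enum SyscallError {
    NotFound(&'static str),
    IllegalArgument(&'static str),
}

/// Hypothetical kernel: maps textual addresses to actor IDs.
pub struct Kernel {
    pub addresses: HashMap<String, ActorID>,
}

impl Kernel {
    /// Ok(None) means "no such actor"; Err is reserved for other failures.
    pub fn resolve_address(&self, addr: &str) -> Result<Option<ActorID>, SyscallError> {
        if addr.is_empty() {
            return Err(SyscallError::IllegalArgument("empty address"));
        }
        Ok(self.addresses.get(addr).copied())
    }
}

/// Syscall-layer wrapper: a missing actor becomes an explicit NotFound
/// error instead of a sentinel such as -1.
pub fn resolve_address_syscall(kernel: &Kernel, addr: &str) -> Result<ActorID, SyscallError> {
    kernel
        .resolve_address(addr)?
        .ok_or_else(|| SyscallError::NotFound("actor not found"))
}
```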

Stebalien · 2022-05-13T03:48:35Z

fvm/src/syscalls/crypto.rs

+    let typ = RegisteredSealProof::from(proof_type);
+    if let RegisteredSealProof::Invalid(invalid) = typ {
+        return Err(syscall_error!(IllegalArgument; "invalid proof type {}", invalid).into());
+    }


fixing a todo
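A sketch of the validation in the diff, assuming a hypothetical two-variant `RegisteredSealProof` with made-up numeric codes: `From<i64>` captures unknown codes in an `Invalid` variant, and the caller rejects them with an illegal-argument error before doing any work.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RegisteredSealProof {
    StackedDrg2KiBV1,
    StackedDrg32GiBV1,
    /// Any unrecognized code, preserved for error reporting.
    Invalid(i64),
}

impl From<i64> for RegisteredSealProof {
    fn from(code: i64) -> Self {
        // Codes here are illustrative, not the real registry values.
        match code {
            0 => RegisteredSealProof::StackedDrg2KiBV1,
            3 => RegisteredSealProof::StackedDrg32GiBV1,
            other => RegisteredSealProof::Invalid(other),
        }
    }
}

/// Rejects unknown proof types up front, mirroring the diff.
pub fn check_proof_type(proof_type: i64) -> Result<RegisteredSealProof, String> {
    let typ = RegisteredSealProof::from(proof_type);
    if let RegisteredSealProof::Invalid(invalid) = typ {
        return Err(format!("illegal argument: invalid proof type {}", invalid));
    }
    Ok(typ)
}
```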

Stebalien · 2022-05-13T03:50:05Z

fvm/src/kernel/blocks.rs

 pub type BlockId = u32;

 const FIRST_ID: BlockId = 1;
+const MAX_BLOCKS: u32 = i32::MAX as u32; // TODO: Limit


We can set a limit in M2.

Stebalien · 2022-05-13T03:50:18Z

fvm/src/kernel/blocks.rs

-            .len()
-            .try_into()
-            .map_err(|_| BlockError::TooManyBlocks)?;
-        id += FIRST_ID;


This could overflow (technically...).

Hm, how? We check if we're full above.

I'm referring to the old code. That is, we could successfully convert the length into a u32 (max), then overflow when we try to add 1.

Ahh, GitHub diff view...
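To make the old overflow concrete, a minimal sketch with hypothetical function names. `try_from` succeeds for a length of exactly `u32::MAX`, but adding `FIRST_ID` then overflows, which `checked_add` reports as `None` where the old `id += FIRST_ID` would panic in debug builds or wrap in release builds. The new shape, assuming `is_full` means "length has reached `MAX_BLOCKS`", rejects the put first, so the addition can never overflow.

```rust
use std::convert::TryFrom;

const FIRST_ID: u32 = 1;
const MAX_BLOCKS: u32 = i32::MAX as u32;

/// Old shape: convert the length, then add FIRST_ID.
/// Returns None where the old code would have overflowed.
pub fn old_next_id(len: usize) -> Option<u32> {
    let id = u32::try_from(len).ok()?;
    id.checked_add(FIRST_ID)
}

/// New shape: refuse once full, so `len + FIRST_ID <= MAX_BLOCKS`.
pub fn new_next_id(len: usize) -> Option<u32> {
    if len >= MAX_BLOCKS as usize {
        return None;
    }
    Some(len as u32 + FIRST_ID)
}
```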

Stebalien · 2022-05-13T03:50:39Z

fvm/src/kernel/blocks.rs

+        if self.is_full() {
+            return Err(BlockPutError::TooManyBlocks);
+        }
+        if block.codec != DAG_CBOR {


Fixing the TODO: we only put CBOR.
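A sketch of the reordered `put` checks, assuming a simplified `Vec`-backed registry with a configurable limit (the diff uses `i32::MAX`; a tiny limit keeps the example testable). Both checks run before anything is stored, so a rejected put leaves the registry unchanged. `0x71` is the DAG-CBOR multicodec code; the error type and registry shape are illustrative.

```rust
pub type BlockId = u32;

const FIRST_ID: BlockId = 1;
/// Multicodec code for DAG-CBOR.
const DAG_CBOR: u64 = 0x71;

#[derive(Debug, PartialEq)]
pub enum BlockPutError {
    TooManyBlocks,
    InvalidCodec(u64),
}

pub struct Block {
    pub codec: u64,
    pub data: Box<[u8]>,
}

pub struct BlockRegistry {
    blocks: Vec<Block>,
    max_blocks: u32,
}

impl BlockRegistry {
    pub fn new(max_blocks: u32) -> Self {
        BlockRegistry { blocks: Vec::new(), max_blocks }
    }

    pub fn is_full(&self) -> bool {
        self.blocks.len() >= self.max_blocks as usize
    }

    pub fn len(&self) -> usize {
        self.blocks.len()
    }

    /// Validate first, then store: a failed put has no side effects.
    pub fn put(&mut self, block: Block) -> Result<BlockId, BlockPutError> {
        if self.is_full() {
            return Err(BlockPutError::TooManyBlocks);
        }
        if block.codec != DAG_CBOR {
            return Err(BlockPutError::InvalidCodec(block.codec));
        }
        // len < max_blocks <= u32::MAX, so len + FIRST_ID fits in a u32.
        let id = self.blocks.len() as BlockId + FIRST_ID;
        self.blocks.push(block);
        Ok(id)
    }
}
```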

Stebalien · 2022-05-13T03:51:20Z

fvm/src/kernel/blocks.rs

-pub enum BlockError {
-    #[error("block {0} is unreachable")]
-    Unreachable(Box<Cid>),
+pub enum BlockPutError {


I'm tempted to just use syscall errors, but technically this isn't a part of the syscall interface.
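If the registry keeps its own error type, the conversions mentioned in the commit message could look like this sketch: a `From` impl maps each internal case to a syscall error, so kernel code can keep using `?`. The `ErrorNumber` subset, the `registry_put` stub, and the chosen mapping (`TooManyBlocks` to `LimitExceeded`, a bad codec to `IllegalCodec`) are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum BlockPutError {
    TooManyBlocks,
    InvalidCodec(u64),
}

/// Hypothetical subset of syscall error numbers.
#[derive(Debug, PartialEq)]
pub enum ErrorNumber {
    LimitExceeded,
    IllegalCodec,
}

#[derive(Debug, PartialEq)]
pub struct SyscallError {
    pub number: ErrorNumber,
    pub message: String,
}

impl From<BlockPutError> for SyscallError {
    fn from(e: BlockPutError) -> Self {
        match e {
            BlockPutError::TooManyBlocks => SyscallError {
                number: ErrorNumber::LimitExceeded,
                message: "too many blocks".to_string(),
            },
            BlockPutError::InvalidCodec(codec) => SyscallError {
                number: ErrorNumber::IllegalCodec,
                message: format!("codec {:#x} not allowed", codec),
            },
        }
    }
}

/// Hypothetical registry stub that only accepts DAG-CBOR (0x71).
fn registry_put(codec: u64) -> Result<u32, BlockPutError> {
    if codec != 0x71 {
        return Err(BlockPutError::InvalidCodec(codec));
    }
    Ok(1)
}

/// Kernel-side caller: `?` applies the From conversion automatically.
pub fn block_create(codec: u64) -> Result<u32, SyscallError> {
    let id = registry_put(codec)?;
    Ok(id)
}
```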

raulk

Main pushback is the change in signature of the invoke entrypoint for actors.

raulk · 2022-05-13T11:00:58Z

fvm/src/kernel/mod.rs

-    /// Look up the code ID at an actor address.
-    fn get_actor_code_cid(&self, addr: &Address) -> Result<Option<Cid>>;
+    /// Look up the code CID of an actor.
+    fn get_actor_code_cid(&self, id: ActorID) -> Result<Option<Cid>>;


We will need to adapt the built-in actors runtime too. I don't think we ever get the actor CID from a non-ID address, so it should be straightforward.

I've already adapted the SDK to pre-resolve the address, if necessary. And yes, we only ever get the actor CID for the caller.

TL;DR: no actor changes necessary.

I've already been changing the built-ins to use IDs, even though the syscall took an address.
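A sketch of the SDK-side adaptation described above, with hypothetical stand-ins for the two syscalls and placeholder strings instead of real CIDs: a non-ID address is resolved first, then the code CID is looked up by actor ID. Callers that already hold an ID, such as the caller's own, skip resolution entirely.

```rust
use std::collections::HashMap;

pub type ActorID = u64;

pub enum Address {
    Id(ActorID),
    Other(String),
}

/// Hypothetical view of the state exposed through syscalls.
pub struct Syscalls {
    pub addresses: HashMap<String, ActorID>,
    pub code_cids: HashMap<ActorID, String>,
}

impl Syscalls {
    pub fn resolve_address(&self, addr: &str) -> Option<ActorID> {
        self.addresses.get(addr).copied()
    }

    /// ID-based lookup, mirroring the new kernel signature.
    pub fn get_actor_code_cid(&self, id: ActorID) -> Option<String> {
        self.code_cids.get(&id).cloned()
    }
}

/// SDK helper: pre-resolve only when the address is not already an ID.
pub fn actor_code_cid(sys: &Syscalls, addr: &Address) -> Option<String> {
    let id = match addr {
        Address::Id(id) => *id,
        Address::Other(a) => sys.resolve_address(a)?,
    };
    sys.get_actor_code_cid(id)
}
```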

raulk · 2022-05-13T11:05:52Z

shared/src/error/mod.rs

 /// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves
 /// include documentation on _which_ syscall errors they can be expected to return, and what they
 /// mean in the context of the syscall.
+#[non_exhaustive]


Is it? If we were wrong about this decision so early on in the lifetime of the FVM, it's going to be much harder to backtrack later and make it exhaustive, than to go the other way around. IMO this wasn't necessary now, but I don't have a strong opinion because it does seem right for this to be non-exhaustive.

raulk · 2022-05-13T11:07:37Z

fvm/src/externs/mod.rs

    /// ChainEpoch, Entropy from the ticket chain.
    fn get_chain_randomness(
        &self,
-        pers: DomainSeparationTag,


LGTM, although this doesn't change the syscall ABI, so we could have done it at any time.

This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).

raulk · 2022-05-13T11:08:54Z

fvm/src/kernel/mod.rs


    /// Returns whether the supplied code_cid belongs to a known built-in actor type.
-    fn resolve_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;
+    fn get_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;


This is fine. I think I had proposed get_ originally but for some reason we agreed on resolve_.

raulk · 2022-05-13T11:45:44Z

fvm/src/kernel/blocks.rs

-            .len()
-            .try_into()
-            .map_err(|_| BlockError::TooManyBlocks)?;
-        id += FIRST_ID;


Hm, how? We check if we're full above.

raulk · 2022-05-13T11:54:11Z

fvm/src/kernel/default.rs


-    fn block_read(&mut self, id: BlockId, offset: u32, buf: &mut [u8]) -> Result<u32> {
-        let data = self.blocks.get(id).or_illegal_argument()?.data();
+    fn block_read(&mut self, id: BlockId, offset: u32, buf: &mut [u8]) -> Result<i32> {


Unfortunately this change renders #551 irrelevant now.

raulk · 2022-05-13T12:28:23Z

fvm/src/syscalls/mod.rs


 /// The maximum supported CID size. (SPEC_AUDIT)
-pub const MAX_CID_LEN: usize = 100;
+pub const MAX_CID_LEN: usize = 256;


I guess we're doing this now, although we don't have a clear case for it?

Ah, sorry, I didn't realize I had actually made this change. We don't have a clear case and I'm happy to bring it back down to 100.

Stebalien · 2022-05-13T03:55:52Z

fvm/src/kernel/blocks.rs

 pub struct Block {
    codec: u64,
-    data: Box<[u8]>,
+    data: Rc<Box<[u8]>>,


Yes, this is intentional. Otherwise, converting from a vec/box would mean we'd need to copy the block.

NOTE: we can avoid the extra allocation in most cases... with a custom library that I haven't published yet because it's a bit scary.
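A small standard-library demonstration of the trade-off: `Rc::new` on an existing `Box<[u8]>` moves only the box's pointer, so the bytes stay where they are, whereas converting to `Rc<[u8]>` must allocate a new buffer (the reference counts and the bytes live in one allocation) and copy the contents. The price of the first form is the extra `Rc` allocation and one more pointer hop, presumably the allocation the NOTE refers to.

```rust
use std::rc::Rc;

/// True if wrapping the box in an Rc kept the same underlying bytes.
pub fn rc_box_keeps_buffer(data: Vec<u8>) -> bool {
    let boxed: Box<[u8]> = data.into_boxed_slice();
    let before = boxed.as_ptr();
    // Moves the Box (a pointer and length) into the Rc; no byte copy.
    let shared: Rc<Box<[u8]>> = Rc::new(boxed);
    shared.as_ptr() == before
}

/// True if converting to Rc<[u8]> kept the same underlying bytes.
pub fn rc_slice_keeps_buffer(data: Vec<u8>) -> bool {
    let boxed: Box<[u8]> = data.into_boxed_slice();
    let before = boxed.as_ptr();
    // Allocates a new counts-plus-bytes buffer and copies the contents.
    let shared: Rc<[u8]> = Rc::from(boxed);
    shared.as_ptr() == before
}
```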

Stebalien · 2022-05-13T15:10:52Z

fvm/src/syscalls/mod.rs


 /// The maximum supported CID size. (SPEC_AUDIT)
-pub const MAX_CID_LEN: usize = 100;
+pub const MAX_CID_LEN: usize = 256;


Ah, sorry, I didn't realize I had actually made this change. We don't have a clear case and I'm happy to bring it back down to 100.

Stebalien · 2022-05-13T15:12:12Z

fvm/src/externs/mod.rs

    /// ChainEpoch, Entropy from the ticket chain.
    fn get_chain_randomness(
        &self,
-        pers: DomainSeparationTag,


Yep. I'm happy to break it into a second PR if you'd like.

Stebalien · 2022-05-13T15:12:59Z

fvm/src/kernel/blocks.rs

-            .len()
-            .try_into()
-            .map_err(|_| BlockError::TooManyBlocks)?;
-        id += FIRST_ID;


I'm referring to the old code. That is, we could successfully convert the length into a u32 (max), then overflow when we try to add 1.

Stebalien · 2022-05-13T15:14:06Z

fvm/src/kernel/mod.rs

-    /// Look up the code ID at an actor address.
-    fn get_actor_code_cid(&self, addr: &Address) -> Result<Option<Cid>>;
+    /// Look up the code CID of an actor.
+    fn get_actor_code_cid(&self, id: ActorID) -> Result<Option<Cid>>;


TL;DR: no actor changes necessary.

Stebalien · 2022-05-13T15:15:23Z

fvm/src/syscalls/actor.rs

+    let actor_id = context
+        .kernel
+        .resolve_address(&addr)?
+        .ok_or_else(|| syscall_error!(NotFound; "actor not found"))?;


We could break-up the NotFound error, but I'm not sure if it's worth it.

I think that's a fair assumption to make, but it might be worth documenting.

The SDK documents the meanings of all errors.

Stebalien · 2022-05-13T15:15:51Z

fvm/src/syscalls/actor.rs

+    let obuf = context.memory.try_slice_mut(obuf_off, obuf_len)?;
+
+    // Then make sure we can actually put the return result somewhere before we do anything else.
+    const EXPECTED_LEN: u32 = fvm_shared::address::PAYLOAD_HASH_LEN as u32 + 1;


Yeah, I agree.
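The ordering rule in this diff (check the output buffer before anything observable happens) can be sketched as follows. Only `EXPECTED_LEN` comes from the diff; the syscall name, the nonce standing in for a side effect, the protocol byte, and the value 20 for `PAYLOAD_HASH_LEN` are assumptions for illustration.

```rust
/// Assumed value of the payload hash length (20 bytes).
const PAYLOAD_HASH_LEN: usize = 20;
/// One protocol byte plus the payload hash, as in the diff.
const EXPECTED_LEN: usize = PAYLOAD_HASH_LEN + 1;

#[derive(Debug, PartialEq)]
pub enum SyscallError {
    BufferTooSmall,
}

/// Hypothetical kernel state whose mutation is the "side effect".
pub struct Kernel {
    pub nonce: u64,
}

impl Kernel {
    fn next_address_payload(&mut self) -> [u8; PAYLOAD_HASH_LEN] {
        self.nonce += 1;
        let mut payload = [0u8; PAYLOAD_HASH_LEN];
        payload[..8].copy_from_slice(&self.nonce.to_be_bytes());
        payload
    }
}

/// Hypothetical syscall: check the output buffer first, then mutate and write.
pub fn new_actor_address(kernel: &mut Kernel, obuf: &mut [u8]) -> Result<u32, SyscallError> {
    if obuf.len() < EXPECTED_LEN {
        return Err(SyscallError::BufferTooSmall);
    }
    let payload = kernel.next_address_payload();
    obuf[0] = 2; // assumed protocol byte for an actor address
    obuf[1..EXPECTED_LEN].copy_from_slice(&payload);
    Ok(EXPECTED_LEN as u32)
}
```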

Stebalien · 2022-05-13T15:16:26Z

fvm/src/syscalls/context.rs

+        let out = self.try_slice_mut(offset, len)?;
+
+        let mut buf = Cursor::new([0u8; MAX_CID_LEN]);
+        // At the moment, all CIDs are guaranteed to fit in 100 bytes (statically) because the max


I'll revert that change.

Stebalien · 2022-05-13T16:11:05Z

shared/src/error/mod.rs

 /// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves
 /// include documentation on _which_ syscall errors they can be expected to return, and what they
 /// mean in the context of the syscall.
+#[non_exhaustive]


Removing the non_exhaustive later would just mean that users can now exhaustively match, it won't affect existing code.

raulk

Thanks, this cleanup is great! 💅

The _actors_ care about them, but we don't. This:

1. Reduces the coupling between actors & the FVM.
2. Generalizes these methods for user actors.

- Rename `resolve_builtin_actor_type` to `get_builtin_actor_type`. There's no real resolution here.
- Make `resolve_address` fail with NotFound if the target actor doesn't exist.
- Make `get_actor_code_cid` take an actor ID, not an address.
  - We don't charge for address resolution here.
  - The address is almost always resolved anyway (i.e., the sender).
- Make CID return logic consistent:
  - Fail with `ErrorNumber::BufferTooSmall` if it doesn't fit into the output buffer.
  - Always return the size of the CID as the return value.
  - Never write partial CIDs only to fail later.
- Always check that we can write the return result before we execute the syscall.
  - More generally, be pedantic about checking for errors up-front. If a syscall fails, it must fail _without_ side effects.
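A sketch of the CID-return rules listed above, assuming a hypothetical helper that receives already-encoded CID bytes rather than a real `Cid`: the size check happens before any byte reaches the caller's buffer, and success always returns the full CID length. `MAX_CID_LEN` uses the value of 100 the review settled on; rejecting oversized CIDs with `IllegalCid` is an assumption.

```rust
/// Maximum encoded CID size accepted by this sketch.
const MAX_CID_LEN: usize = 100;

#[derive(Debug, PartialEq)]
pub enum ErrorNumber {
    IllegalCid,
    BufferTooSmall,
}

/// Writes `cid_bytes` into `out` all-or-nothing and returns the CID length.
pub fn write_cid(cid_bytes: &[u8], out: &mut [u8]) -> Result<u32, ErrorNumber> {
    let len = cid_bytes.len();
    if len > MAX_CID_LEN {
        return Err(ErrorNumber::IllegalCid);
    }
    // Fail before touching `out`: never leave a partial CID behind.
    if out.len() < len {
        return Err(ErrorNumber::BufferTooSmall);
    }
    out[..len].copy_from_slice(cid_bytes);
    // Always report the full size, whatever the buffer's capacity.
    Ok(len as u32)
}
```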

1. Fix/simplify error checks on put.
   1. The overflow checks were wrong.
   2. We didn't check for CBOR.
2. Refactor errors and implement conversions. This will make the next patch easier.

Previously, it returned the number of bytes read, which made it impossible to tell whether we had reached the end of the block. Now it returns the signed offset (negative or positive) from the end of the block to the end of the user-provided buffer.
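A sketch of these read semantics, assuming the sign convention "block length minus (offset + buffer length)": a positive result is the number of bytes left in the block past the end of the buffer, and zero or negative means the read reached the end of the block. `block_read` over a plain slice and the chunked `read_all` loop are hypothetical illustrations, not the kernel or SDK code.

```rust
use std::convert::TryFrom;

/// Copies block[offset..] into buf (as much as fits) and returns
/// block.len() - (offset + buf.len()).
pub fn block_read(block: &[u8], offset: u32, buf: &mut [u8]) -> Result<i32, &'static str> {
    // End of the logical read window; must fit in an i32 to be returnable.
    let end = i32::try_from(offset as u64 + buf.len() as u64)
        .map_err(|_| "offset plus buffer length does not fit in an i32")?;
    let block_len = i32::try_from(block.len()).map_err(|_| "block too large")?;

    let start = offset as usize;
    // Read until the block or the buffer runs out; zero if offset is past the end.
    let to_read = block.len().saturating_sub(start).min(buf.len());
    if to_read > 0 {
        buf[..to_read].copy_from_slice(&block[start..start + to_read]);
    }
    Ok(block_len - end)
}

/// Reads a whole block in fixed-size chunks, stopping once nothing remains.
pub fn read_all(block: &[u8], chunk: usize) -> Result<Vec<u8>, &'static str> {
    if chunk == 0 {
        return Err("chunk must be non-zero");
    }
    let mut out = Vec::new();
    let mut offset: u32 = 0;
    loop {
        let mut buf = vec![0u8; chunk];
        let remaining = block_read(block, offset, &mut buf)?;
        // A negative result is how far the buffer overshot the block's end.
        let filled = if remaining >= 0 {
            chunk
        } else {
            chunk - remaining.unsigned_abs() as usize
        };
        out.extend_from_slice(&buf[..filled]);
        if remaining <= 0 {
            return Ok(out);
        }
        offset += chunk as u32;
    }
}
```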

This combines two changes that are somewhat interlinked. Really, they're just hard to factor out into two different patches without a bunch of work.

Motivation:

1. Remove copying costs from send/return.
2. Avoid calling "syscall" kernel methods in the call manager.
3. I _really_ hated that `Kernel::block_get` method.
4. Remove an extra call on send to "stat" the returned object.

----

1. In the kernel, the `send` function now takes/returns a block ID. This moves all the parameter-related logic into the kernel itself. Otherwise, the syscall has to make three kernel calls, all of which charge gas.
2. In the call manager, instead of passing return values & parameters as raw bytes, we pass them as `Option<kernel::Block>` where `kernel::Block` is a reference-counted IPLD block (with a codec). This lets us pass them around without copying, and lets us be very clear about codecs, empty parameters, etc.
3. In the syscalls, return the block length/codec from send to avoid a second syscall _just_ to look those up.
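A sketch of the call-manager and syscall changes described above, with hypothetical names throughout (`SendResult`, `NO_DATA_BLOCK_ID`, `register_return`): the return value travels as an `Option<Block>` whose bytes are shared through `Rc`, and the syscall registers it and reports its ID, codec, and size together, so the actor needs no second call to stat it.

```rust
use std::rc::Rc;

pub type BlockId = u32;

/// Reserved "no block" ID in this sketch; real IDs start at FIRST_ID = 1.
pub const NO_DATA_BLOCK_ID: BlockId = 0;
const FIRST_ID: BlockId = 1;

pub struct Block {
    pub codec: u64,
    pub data: Rc<Box<[u8]>>,
}

/// Everything the actor needs about the return value, in one result.
#[derive(Debug, PartialEq)]
pub struct SendResult {
    pub exit_code: u32,
    pub return_id: BlockId,
    pub return_codec: u64,
    pub return_size: u32,
}

#[derive(Default)]
pub struct BlockRegistry {
    blocks: Vec<Block>,
}

impl BlockRegistry {
    fn put(&mut self, block: Block) -> BlockId {
        let id = self.blocks.len() as BlockId + FIRST_ID;
        self.blocks.push(block);
        id
    }
}

/// Registers the optional return block without copying its bytes.
pub fn register_return(reg: &mut BlockRegistry, exit_code: u32, ret: Option<Block>) -> SendResult {
    match ret {
        None => SendResult { exit_code, return_id: NO_DATA_BLOCK_ID, return_codec: 0, return_size: 0 },
        Some(block) => {
            let (codec, size) = (block.codec, block.data.len() as u32);
            let id = reg.put(block);
            SendResult { exit_code, return_id: id, return_codec: codec, return_size: size }
        }
    }
}
```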

Stebalien added 2 commits May 12, 2022 23:15

chore: remove params from backtrace

5739fd1

- We don't use it.
- We're not charging for the copy/memory.
- Tracing is better for this kind of thing.

Stebalien mentioned this pull request May 13, 2022

nv16 development checklist #531

Closed

48 tasks

Stebalien commented May 13, 2022

Stebalien requested a review from raulk May 13, 2022 03:56

raulk reviewed May 13, 2022

raulk mentioned this pull request May 13, 2022

fix: kernel: ipld::read not using offset correctly. #551

Closed

Stebalien commented May 13, 2022

Stebalien force-pushed the steb/refactor branch from f760ece to a94b494 Compare May 13, 2022 18:18

raulk approved these changes May 13, 2022

Stebalien added 3 commits May 13, 2022 15:00

fix: treat domain separation tags as arbitrary numbers

c6ccdea

The _actors_ care about them, but we don't. This:

1. Reduces the coupling between actors & the FVM.
2. Generalizes these methods for user actors.

refactor: BlockRegistry error cases

cf15572

1. Fix/simplify error checks on put.
   1. The overflow checks were wrong.
   2. We didn't check for CBOR.
2. Refactor errors and implement conversions. This will make the next patch easier.

Stebalien force-pushed the steb/refactor branch from 56f263e to e25459f Compare May 13, 2022 19:06

Stebalien added 2 commits May 13, 2022 15:07

Stebalien force-pushed the steb/refactor branch from e25459f to b1b35d0 Compare May 13, 2022 19:07

Stebalien merged commit bd4e01b into feat/syscall-changes May 13, 2022

Stebalien deleted the steb/refactor branch May 13, 2022 19:58

This was referenced May 16, 2022

merge syscall changes to master #533

Merged

Clarify documentation / fix IPLD syscall errors #543

Closed

Syscall Cleanup #738

Closed

Allow arbitrary domain separation tags #264

Closed
