Skip to content

Conversation

@Stebalien
Copy link
Member

This change-set includes 6 patches that should be reviewed (mostly) independently. They're bundled because they build on each other (at least to an extent).

Please see the commit messages for detailed descriptions. I don't feel too strongly about most of these changes, so I'm happy to punt anything we feel we can do later and/or just isn't necessary.

Stebalien added 2 commits May 12, 2022 23:15
- We don't use it.
- We're not charging for the copy/memory.
- Tracing is better for this kind of thing.
We'll be adding more error numbers as we add additional syscalls. Adding
new error numbers to the system shouldn't be a breaking change.

NOTE: adding/changing an error number on a specific syscall is a
different matter. But if we add a new syscall with a new error number,
that's not breaking anything.
@Stebalien Stebalien mentioned this pull request May 13, 2022
48 tasks
/// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves
/// include documentation on _which_ syscall errors they can be expected to return, and what they
/// mean in the context of the syscall.
#[non_exhaustive]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to do this, but it's easier to make this "extensible" now than it is to do that later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? If we were wrong about this decision so early on in the lifetime of the FVM, it's going to be much harder to backtrack later and make it exhaustive, than to go the other way around. IMO this wasn't necessary now, but I don't have a strong opinion because it does seem right for this to be non-exhaustive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the non_exhaustive later would just mean that users can now exhaustively match, it won't affect existing code.

/// ChainEpoch, Entropy from the ticket chain.
fn get_chain_randomness(
&self,
pers: DomainSeparationTag,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not consensus critical, but it has been on my mind for a while.

  1. Removes unnecessary error cases from the kernel.
  2. Paves the way for user actors (that won't care about specific tags).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although does not change the ABI of syscalls, so we could've done this at any time.

This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'm happy to break it into a second PR if you'd like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).

I've removed the type from the FVM entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, not a problem.


/// Returns whether the supplied code_cid belongs to a known built-in actor type.
fn resolve_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;
fn get_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to revert this. We had two get and two resolve functions in this module, and I was trying to define a pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. I think I had proposed get_ originally but for some reason we agreed on resolve_.

Comment on lines +15 to +18
let actor_id = context
.kernel
.resolve_address(&addr)?
.ok_or_else(|| syscall_error!(NotFound; "actor not found"))?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns an error on resolution failure, not a -1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the resolve_address kernel method is incapable of returning NotFound now or ever for a circumstance other than the address not existing. Otherwise we will have ambiguity. I think that's a fair assumption to make, but it might be worth to document it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could break-up the NotFound error, but I'm not sure if it's worth it.

I think that's a fair assumption to make, but it might be worth to document it.

The SDK documents the meanings of all errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, just commenting that we're masking a potential NotFound from the kernel, but that it currently can't produce it. Definitely an uber-nit, not important.

Comment on lines +94 to +97
let typ = RegisteredSealProof::from(proof_type);
if let RegisteredSealProof::Invalid(invalid) = typ {
return Err(syscall_error!(IllegalArgument; "invalid proof type {}", invalid).into());
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing a todo

pub type BlockId = u32;

const FIRST_ID: BlockId = 1;
const MAX_BLOCKS: u32 = i32::MAX as u32; // TODO: Limit
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set a limit in M2.

.len()
.try_into()
.map_err(|_| BlockError::TooManyBlocks)?;
id += FIRST_ID;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could overflow (technically...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, how? We check if we're full above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the old code. That is, we could successfully convert the length into a u32 (max), then overflow when we try to add 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, GitHub diff view...

if self.is_full() {
return Err(BlockPutError::TooManyBlocks);
}
if block.codec != DAG_CBOR {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing the TODO, we only put cbor.

pub enum BlockError {
#[error("block {0} is unreachable")]
Unreachable(Box<Cid>),
pub enum BlockPutError {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to just use syscall errors, but technically this isn't a part of the syscall interface.

@Stebalien Stebalien requested a review from raulk May 13, 2022 03:56
Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main pushback is the change in signature of the invoke entrypoint for actors.

/// Look up the code ID at an actor address.
fn get_actor_code_cid(&self, addr: &Address) -> Result<Option<Cid>>;
/// Look up the code CID of an actor.
fn get_actor_code_cid(&self, id: ActorID) -> Result<Option<Cid>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to adapt the built-in actors runtime too. I don't think we ever get the actor CID from a non-ID address, so it should be straighforward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already adapted the SDK to per-resolve the address, if necessary. And yes, we only ever get the actor CID for the caller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: no actor changes necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already been getting the built-ins changed to use IDs even tho the syscall was address

/// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves
/// include documentation on _which_ syscall errors they can be expected to return, and what they
/// mean in the context of the syscall.
#[non_exhaustive]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? If we were wrong about this decision so early on in the lifetime of the FVM, it's going to be much harder to backtrack later and make it exhaustive, than to go the other way around. IMO this wasn't necessary now, but I don't have a strong opinion because it does seem right for this to be non-exhaustive.

/// ChainEpoch, Entropy from the ticket chain.
fn get_chain_randomness(
&self,
pers: DomainSeparationTag,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although does not change the ABI of syscalls, so we could've done this at any time.

This also means that the type (DomainSeparationTag) can move from fvm_shared to fvm_sdk, or directly to the built-in actors runtime (this option is probably better to avoid confusion).


/// Returns whether the supplied code_cid belongs to a known built-in actor type.
fn resolve_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;
fn get_builtin_actor_type(&self, code_cid: &Cid) -> Option<actor::builtin::Type>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. I think I had proposed get_ originally but for some reason we agreed on resolve_.

.len()
.try_into()
.map_err(|_| BlockError::TooManyBlocks)?;
id += FIRST_ID;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, how? We check if we're full above.


fn block_read(&mut self, id: BlockId, offset: u32, buf: &mut [u8]) -> Result<u32> {
let data = self.blocks.get(id).or_illegal_argument()?.data();
fn block_read(&mut self, id: BlockId, offset: u32, buf: &mut [u8]) -> Result<i32> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this change renders #551 irrelevant now.


/// The maximum supported CID size. (SPEC_AUDIT)
pub const MAX_CID_LEN: usize = 100;
pub const MAX_CID_LEN: usize = 256;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we're doing this now, although we don't have a clear case for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I didn't realize I had actually made this change. We don't have a clear case and I'm happy to bring it back down to 100.

pub struct Block {
codec: u64,
data: Box<[u8]>,
data: Rc<Box<[u8]>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentional. Otherwise, converting from a vec/box would mean we'd need to copy the block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: we can avoid the extra allocation in most cases... with a custom library that I haven't published yet because it's a bit scary.


/// The maximum supported CID size. (SPEC_AUDIT)
pub const MAX_CID_LEN: usize = 100;
pub const MAX_CID_LEN: usize = 256;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I didn't realize I had actually made this change. We don't have a clear case and I'm happy to bring it back down to 100.

/// ChainEpoch, Entropy from the ticket chain.
fn get_chain_randomness(
&self,
pers: DomainSeparationTag,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'm happy to break it into a second PR if you'd like.

.len()
.try_into()
.map_err(|_| BlockError::TooManyBlocks)?;
id += FIRST_ID;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the old code. That is, we could successfully convert the length into a u32 (max), then overflow when we try to add 1.

/// Look up the code ID at an actor address.
fn get_actor_code_cid(&self, addr: &Address) -> Result<Option<Cid>>;
/// Look up the code CID of an actor.
fn get_actor_code_cid(&self, id: ActorID) -> Result<Option<Cid>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: no actor changes necessary.

Comment on lines +15 to +18
let actor_id = context
.kernel
.resolve_address(&addr)?
.ok_or_else(|| syscall_error!(NotFound; "actor not found"))?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could break-up the NotFound error, but I'm not sure if it's worth it.

I think that's a fair assumption to make, but it might be worth to document it.

The SDK documents the meanings of all errors.

let obuf = context.memory.try_slice_mut(obuf_off, obuf_len)?;

// Then make sure we can actually put the return result somewhere before we do anything else.
const EXPECTED_LEN: u32 = fvm_shared::address::PAYLOAD_HASH_LEN as u32 + 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree.

let out = self.try_slice_mut(offset, len)?;

let mut buf = Cursor::new([0u8; MAX_CID_LEN]);
// At the moment, all CIDs are gauranteed to fit in 100 bytes (statically) because the max
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert that change.

/// When a syscall fails, it returns an `ErrorNumber` to indicate why. The syscalls themselves
/// include documentation on _which_ syscall errors they can be expected to return, and what they
/// mean in the context of the syscall.
#[non_exhaustive]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the non_exhaustive later would just mean that users can now exhaustively match, it won't affect existing code.

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this cleanup is great! 💅

Stebalien added 3 commits May 13, 2022 15:00
The _actors_ care about them, but we don't. This:

1. Reduces the coupling between actors & the FVM.
2. Generalizes these methods for user actors.
- Renames `resolve_builtin_actor_type` to `get_actor_builtin_type`.
  There's no real resolution here.
- Make `resolve_address` fail with NotFound if the target actor doesn't
  exist.
- Make `get_actor_code_cid` take an actor id, not an address.
    - We don't charge for address resolution here.
    - The address is almost always resolved anyways (i.e., the sender).
- Make CID return logic consistent:
    - Fail with `ErrorNumber::BufferTooSmall` if it doesn't fit into the
      output buffer.
    - Always return the size of the CID as the return value.
    - Never write partial CIDs only to fail later.
- Always check that we can write the return result before we execute the
  syscall.
- More generally, be pedantic about checking for errors up-front. If a
  syscall fails, it must fail _without_ side effects.
1. Fix/simplify error checks on put.
  1. The overflow checks were wrong.
  2. We didn't check for CBOR.
2. Refactor errors and implement conversions. This will make the next
   patch easier.
Stebalien added 2 commits May 13, 2022 15:07
Previously, it would return the amount of data read. Unfortunately, this
made it impossible to tell if we were at the end of the block.

Now, we return the offset (negative or positive) from the end of the
block, to the end of the user provided buffer.
This combines two changes that are somewhat interlinked. Really, they're
just hard to factor out into two different patches without a bunch of work.

Motivation:

1. Remove copying costs from send/return.
2. Avoid calling "syscall" kernel methods in the call manager.
3. I _really_ hated that `Kernel::block_get` method.
4. Remove an extra call on send to "stat" the returned object.

----

1. In the kernel, the `send` function now takes/returns a block ID. This
moves all the parameter-relates logic into the kernel itself. Otherwise,
the syscall has to make three kernel calls, all of which charge gas.

2. In the call manager, instead of passing return values & parameters as
raw bytes, we pass them as `Option<kernel::Block>` where `kernel::Block`
is a reference counted IPLD block (with a codec). This, let's us pass
them around without copying, and lets us be very clear about codecs,
empty parameters, etc.

3. In the syscalls, return the block length/codec from send to avoid a
second syscall _just_ to look those up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants