Cleanup confusion arising from mixing of unix exit codes and Rust Result types #4261

Merged
merged 4 commits into from
Nov 24, 2023


roypat
Contributor

@roypat roypat commented Nov 23, 2023

An exit code is a kind of Result: either everything is Ok() (exit code == 0), or some Err() happened (exit code != 0). Therefore, nesting FcExitCode inside the Err variant of a Rust Result yields a construct of the form Result<_, Result<_, _>>. This is not well-defined, as it is unclear what Err(Ok(...)) is supposed to mean. This was made worse by the existence of ApiServerError::MicroVMStoppedWithoutError, which turned ApiServerError into a kind of Result itself. Since MicroVMStoppedWithoutError has a field of type FcExitCode, we ended up with a construct of the form Result<_, Result<Result<_, _>, _>>, or, in terms of user-visible error messages: "RunWithApiError error: MicroVMStopped without an error: GenericError". The confusion around this led to bugs such as #4176.

This PR attempts to clear up this confusion by removing ApiServerError::MicroVMStoppedWithoutError and eliminating the use of FcExitCode as an Err type wherever possible. We are left with only a single instance of this (ApiServerError::MicroVMStoppedWithError), which cannot be resolved without significant refactoring (and eliminating the use of exit codes from the emulation code).
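
The ambiguity described above can be reproduced in a few lines. The following is a minimal sketch, not Firecracker's code: `ExitCode` is a hypothetical two-variant stand-in for `FcExitCode`, and `describe` is an illustrative helper. It shows that once an exit-code enum with a success variant is used as the `Err` type, every caller has to decide what an "error that reports success" means.

```rust
/// Hypothetical stand-in for an exit-code enum (assumption: not the real `FcExitCode`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitCode {
    Ok = 0,
    GenericError = 1,
}

/// Interprets an outcome whose error type is an exit code.
/// The compiler forces us to handle `Err(ExitCode::Ok)`, a state
/// that type-checks but has no obvious meaning.
pub fn describe(outcome: Result<(), ExitCode>) -> &'static str {
    match outcome {
        Ok(()) => "success",
        Err(ExitCode::GenericError) => "failure",
        Err(ExitCode::Ok) => "ambiguous: an error that reports success",
    }
}
```

Calling `describe(Err(ExitCode::Ok))` takes the third arm. An error type without a success variant would make that arm impossible to write in the first place.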

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

This error variant was used to encode that no error actually happened,
which does not make much sense conceptually. What made this worse is
that it contained an FcExitCode, which is itself just a fake Result<(),
non-zero-exit code>. This means it was possible to get Firecracker to
exit with status "error, but not actually error, but actually there is
an error after all", or: "Firecracker exited with an error:
Microvm stopped without an error: GenericError".

The underlying problem here is the fact that we are using `FcExitCode`
as an error variant for `Result`. Since `FcExitCode::Ok` exists,
`FcExitCode` is a kind of `Result` itself, meaning we are dealing with
`Result<_, Result<_, _>>` as a type, which has no well-defined
interpretation.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
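
To see how the contradictory message quoted in this commit message can arise, here is a small sketch. All types are hypothetical simplifications (the real `ApiServerError` and `FcExitCode` are not shown on this page); only the message fragments are taken from the commit message.

```rust
use std::fmt;

/// Hypothetical exit-code enum (assumption, for illustration only).
#[derive(Debug, Clone, Copy)]
pub enum ExitCode {
    Ok,
    GenericError,
}

/// Hypothetical error enum with a "no error happened" variant.
#[derive(Debug)]
pub enum ServerError {
    StoppedWithoutError(ExitCode),
    StoppedWithError(ExitCode),
}

impl fmt::Display for ServerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ServerError::StoppedWithoutError(code) => {
                write!(f, "Microvm stopped without an error: {:?}", code)
            }
            ServerError::StoppedWithError(code) => {
                write!(f, "Microvm stopped with an error: {:?}", code)
            }
        }
    }
}

/// Top-level reporting wraps the error once more, yielding three layers.
pub fn report(err: &ServerError) -> String {
    format!("Firecracker exited with an error: {}", err)
}
```

Nothing prevents `StoppedWithoutError` from carrying `ExitCode::GenericError`, so `report` can produce "an error: ... without an error: GenericError". Removing the "without error" variant removes one of the three layers.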

codecov bot commented Nov 23, 2023

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Comparison is base (640b6d6) 81.66% compared to head (f4b8353) 81.69%.

| Files | Patch % | Lines |
|---|---|---|
| src/vmm/src/rpc_interface.rs | 0.00% | 9 Missing ⚠️ |
| src/firecracker/src/api_server_adapter.rs | 0.00% | 5 Missing ⚠️ |
| src/vmm/src/lib.rs | 50.00% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4261      +/-   ##
==========================================
+ Coverage   81.66%   81.69%   +0.03%     
==========================================
  Files         240      240              
  Lines       29303    29291      -12     
==========================================
- Hits        23931    23930       -1     
+ Misses       5372     5361      -11     
| Flag | Coverage Δ |
|---|---|
| 4.14-c7g.metal | 77.15% <0.00%> (+0.03%) ⬆️ |
| 4.14-m5d.metal | 79.04% <20.83%> (+0.03%) ⬆️ |
| 4.14-m6a.metal | 78.16% <20.83%> (+0.03%) ⬆️ |
| 4.14-m6g.metal | 77.15% <0.00%> (+0.03%) ⬆️ |
| 4.14-m6i.metal | 79.02% <20.83%> (+0.03%) ⬆️ |
| 5.10-c7g.metal | 80.03% <0.00%> (+0.03%) ⬆️ |
| 5.10-m5d.metal | 81.68% <20.83%> (+0.02%) ⬆️ |
| 5.10-m6a.metal | 80.90% <20.83%> (+0.03%) ⬆️ |
| 5.10-m6g.metal | 80.03% <0.00%> (+0.03%) ⬆️ |
| 5.10-m6i.metal | 81.67% <20.83%> (+0.03%) ⬆️ |
| 6.1-c7g.metal | 80.03% <0.00%> (+0.03%) ⬆️ |
| 6.1-m5d.metal | 81.69% <20.83%> (+0.04%) ⬆️ |
| 6.1-m6a.metal | 80.90% <20.83%> (+0.03%) ⬆️ |
| 6.1-m6g.metal | 80.03% <0.00%> (+0.03%) ⬆️ |
| 6.1-m6i.metal | 81.67% <20.83%> (+0.03%) ⬆️ |


@roypat roypat marked this pull request as ready for review November 23, 2023 12:54
Using FcExitCode as an error type is undesirable, as it allows us to
construct Err(FcExitCode::Ok), i.e. an object that says "error:
everything's okay!". This is confusing and has caused problems in
different contexts before, so replace FcExitCode with a proper error
type here.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
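
One way to keep exit codes out of `Result` is to use an error type with no success variant and derive the exit code once, at the process boundary. This is a sketch under assumptions: `RunError`, its variants, `ExitCode` and `exit_code_for` are hypothetical names for illustration and are not taken from the Firecracker code base.

```rust
use std::fmt;

/// Hypothetical exit-code enum (assumption, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitCode {
    Ok = 0,
    GenericError = 1,
}

/// Hypothetical error type. It has no "everything is fine" variant,
/// so `Err(...)` always denotes a real failure.
#[derive(Debug, PartialEq, Eq)]
pub enum RunError {
    InvalidConfig(String),
    VcpuFault,
}

impl fmt::Display for RunError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RunError::InvalidConfig(reason) => write!(f, "invalid configuration: {}", reason),
            RunError::VcpuFault => write!(f, "a vcpu reported a fault"),
        }
    }
}

impl std::error::Error for RunError {}

/// Errors map to non-zero exit codes only; there is no path to `ExitCode::Ok` here.
impl From<&RunError> for ExitCode {
    fn from(err: &RunError) -> Self {
        match err {
            RunError::InvalidConfig(_) | RunError::VcpuFault => ExitCode::GenericError,
        }
    }
}

/// The only place where a `Result` is turned into an exit code.
pub fn exit_code_for(result: &Result<(), RunError>) -> ExitCode {
    match result {
        Ok(()) => ExitCode::Ok,
        Err(err) => ExitCode::from(err),
    }
}
```

With this shape, `ExitCode::Ok` can only come from the `Ok(())` arm of `exit_code_for`, so the "error: everything's okay!" state cannot be expressed.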
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Nov 23, 2023
Previously, when a VM exited, we looked for the first vcpu that reported
an exit status, and then indiscriminately reported that back. However,
it is possible for one vcpu to exit successfully while another exits with
an error, and this could lead us to report "Firecracker Exited
Successfully" even though it did not.

Now, we explicitly look at the status codes of all vcpus. If any of them
reports an error, this takes precedence over non-error status codes.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
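
The difference between the old and new behaviour can be sketched as follows. This is an illustration under assumptions, not the actual change in src/vmm/src/lib.rs: per-vcpu statuses are modelled as a slice of `Option<ExitCode>` (with `None` meaning "no status reported yet"), and `ExitCode`, `first_reported` and `combined_exit_code` are hypothetical names.

```rust
/// Hypothetical exit-code enum (assumption, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitCode {
    Ok,
    GenericError,
}

/// Old behaviour (sketch): return whichever status is found first.
pub fn first_reported(statuses: &[Option<ExitCode>]) -> Option<ExitCode> {
    statuses.iter().flatten().next().copied()
}

/// New behaviour (sketch): inspect every status; any error takes
/// precedence over a successful exit.
pub fn combined_exit_code(statuses: &[Option<ExitCode>]) -> Option<ExitCode> {
    let mut combined = None;
    for code in statuses.iter().flatten() {
        if *code != ExitCode::Ok {
            // The first error found wins over any number of successes.
            return Some(*code);
        }
        combined = Some(ExitCode::Ok);
    }
    combined
}
```

For statuses `[Some(Ok), None, Some(GenericError)]`, `first_reported` yields `Some(Ok)` while `combined_exit_code` yields `Some(GenericError)`, which is the misreporting this commit addresses.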
@roypat roypat merged commit 7660a59 into firecracker-microvm:main Nov 24, 2023
6 of 7 checks passed
roypat added a commit to roypat/firecracker that referenced this pull request Nov 24, 2023
It is technically a bug fix

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
@roypat roypat mentioned this pull request Nov 24, 2023
9 tasks
wearyzen pushed a commit to roypat/firecracker that referenced this pull request Nov 27, 2023
It is technically a bug fix

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
roypat added a commit that referenced this pull request Nov 27, 2023
It is technically a bug fix

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
@roypat roypat deleted the no-exitcode-in-result branch April 15, 2024 14:27