Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: gracefully handle error conditions in PocketIC server #2556

Merged
merged 3 commits into from
Nov 12, 2024

Conversation

mraszyk
Copy link
Contributor

@mraszyk mraszyk commented Nov 11, 2024

This PR gracefully handles error conditions in the PocketIC server by returning OpOut::Error (resulting in an HTTP response with an error status code and error message in the response body) instead of panicking in the server (resulting in a confusing error message in the PocketIC library due to abruptly interrupted HTTP connection after the server panicked).

@mraszyk mraszyk requested review from a team as code owners November 11, 2024 17:39
@github-actions github-actions bot added the fix label Nov 11, 2024
@michael-weigelt
Copy link
Contributor

I agree that http responses from panics are bad. But I am not sure OpOut is the correct layer for errors. Aren't we wrapping the OpOut in some ApiResponse anymore? Which could have an Error variant, so that the http response can be nice nevertheless?

@mraszyk
Copy link
Contributor Author

mraszyk commented Nov 12, 2024

A computation returns OpOut so error conditions during a computation can only be reported via OpOut at the moment. The graph also stores OpOut. It is indeed the case that OpOut is converted to a status code and ApiResponse and then also back, but I see the separation as follows:

  • OpOut contains the result of the computation independently of its HTTP representation;
  • the conversion to ApiResponse deals with JSON encoding and enables returning an OpOut::Error as an OK response (e.g., if we decide that getting stable memory of a non-existing canister should not panic in the library, then we could say that the corresponding OpOut::Error maps to an OK response of a new Raw type that we introduce).

Does that make sense or should I elaborate more on some parts?

@michael-weigelt
Copy link
Contributor

error conditions during a computation

I think initially we aimed at making Ops infallible (and reject bad requests at parse time), but I am actually not sure that was ever realistic. But if possible, I would prefer that. Isn't it the case that all ingress messages will result in either WasmResult::Result or ::Reject, and we can always turn that into an OpOut? And all other Ops are System/Subnet-level operations, for which we could perhaps guarantee infallability upfront?

I think your new proposed strategy is viable too, but why switch the strategy now?

enables returning an OpOut::Error as an OK response

IIUC, this means we'd just use HTTP as a message protocol, and therefore don't provide a REST-API (where the state of the application/request is reflected in the http status code etc), which was the aim initially.

I am not sure how valuable adherence to REST is... but I could imagine that other implementers of libraries value this more than us internally.

Again what I question is not necessarily the proposition in the PR, but rather I want to point out that we used to have a strategy here and your proposal is a change in strategy.

I don't quite understand the example in parentheses. What does a library panic have to do with the server's return values?

@mraszyk
Copy link
Contributor Author

mraszyk commented Nov 12, 2024

I think initially we aimed at making Ops infallible (and reject bad requests at parse time), but I am actually not sure that was ever realistic.

This is not realistic because the PocketIC router cannot know if a canister exists (for instance).

Isn't it the case that all ingress messages will result in either WasmResult::Result or ::Reject, and we can always turn that into an OpOut?

No, they can also fail because the canister ID cannot be routed (the canister ID is valid, but there's no subnet to which the request could be routed).

And all other Ops are System/Subnet-level operations, for which we could perhaps guarantee infallability upfront?

This is not the case, e.g., setting time into the past is an error that you can only detect once you have access to the PocketIc instance (run a computation).

why switch the strategy now?

to me it doesn't feel like switching any strategy

where the state of the application/request is reflected in the http status code etc

it depends on the semantics of an operation, if you define that "get_stable_memory" must be called for a non-empty canister, then failing to do so should result in 4xx status code; if you define that "get_stable_memory" can be called for any canister and the structured return value (e.g., of some enumeration type) characterizes if the canister is empty, then we should get 200 even if the computation failed because the canister does not exist. Now you can encode this in OpOut (just like we do with OpOut::MaybeSubnetId), but it seems cleaner to return OpOut::Error and convert it into status code later (instead of having OpOut::MaybeA, OpOut::MaybeB, OpOut::MaybeC etc.)

What does a library panic have to do with the server's return values?

The library interprets 4xx as panic.

@michael-weigelt
Copy link
Contributor

to me it doesn't feel like switching any strategy

Ok, you have sufficiently shown that the original aim is not easily achievable, so it doesn't matter much if we don't get any closer.

Now you can encode this in OpOut (just like we do with OpOut::MaybeSubnetId), but it seems cleaner to return OpOut::Error and convert it into status code later (instead of having OpOut::MaybeA, OpOut::MaybeB, OpOut::MaybeC etc.)

Agreed.

The library interprets 4xx as panic.

Seems a bit strange, because from RESTful perspective I'd exactly expect a 404 if I try to interact with a canister that does not exist. To interpret every 4xx as a "axum panicked" is not right. But this is a bit off-topic.

@mraszyk mraszyk added this pull request to the merge queue Nov 12, 2024
Merged via the queue into master with commit bc2c84c Nov 12, 2024
24 checks passed
@mraszyk mraszyk deleted the mraszyk/pic-unwraps branch November 12, 2024 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants