-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: gracefully handle error conditions in PocketIC server #2556
Conversation
I agree that http responses from panics are bad. But I am not sure OpOut is the correct layer for errors. Aren't we wrapping the OpOut in some ApiResponse anymore? Which could have an Error variant, so that the http response can be nice nevertheless? |
A computation returns OpOut so error conditions during a computation can only be reported via OpOut at the moment. The graph also stores OpOut. It is indeed the case that OpOut is converted to a status code and ApiResponse and then also back, but I see the separation as follows:
Does that make sense or should I elaborate more on some parts? |
I think initially we aimed at making Ops infallible (and reject bad requests at parse time), but I am actually not sure that was ever realistic. But if possible, I would prefer that. Isn't it the case that all ingress messages will result in either WasmResult::Result or ::Reject, and we can always turn that into an OpOut? And all other Ops are System/Subnet-level operations, for which we could perhaps guarantee infallability upfront? I think your new proposed strategy is viable too, but why switch the strategy now?
IIUC, this means we'd just use HTTP as a message protocol, and therefore don't provide a REST-API (where the state of the application/request is reflected in the http status code etc), which was the aim initially. I am not sure how valuable adherence to REST is... but I could imagine that other implementers of libraries value this more than us internally. Again what I question is not necessarily the proposition in the PR, but rather I want to point out that we used to have a strategy here and your proposal is a change in strategy. I don't quite understand the example in parentheses. What does a library panic have to do with the server's return values? |
This is not realistic because the PocketIC router cannot know if a canister exists (for instance).
No, they can also fail because the canister ID cannot be routed (the canister ID is valid, but there's no subnet to which the request could be routed).
This is not the case, e.g., setting time into the past is an error that you can only detect once you have access to the PocketIc instance (run a computation).
to me it doesn't feel like switching any strategy
it depends on the semantics of an operation, if you define that "get_stable_memory" must be called for a non-empty canister, then failing to do so should result in 4xx status code; if you define that "get_stable_memory" can be called for any canister and the structured return value (e.g., of some enumeration type) characterizes if the canister is empty, then we should get 200 even if the computation failed because the canister does not exist. Now you can encode this in OpOut (just like we do with OpOut::MaybeSubnetId), but it seems cleaner to return OpOut::Error and convert it into status code later (instead of having OpOut::MaybeA, OpOut::MaybeB, OpOut::MaybeC etc.)
The library interprets 4xx as panic. |
Ok, you have sufficiently shown that the original aim is not easily achievable, so it doesn't matter much if we don't get any closer.
Agreed.
Seems a bit strange, because from RESTful perspective I'd exactly expect a 404 if I try to interact with a canister that does not exist. To interpret every 4xx as a "axum panicked" is not right. But this is a bit off-topic. |
This PR gracefully handles error conditions in the PocketIC server by returning
OpOut::Error
(resulting in an HTTP response with an error status code and error message in the response body) instead of panicking in the server (resulting in a confusing error message in the PocketIC library due to abruptly interrupted HTTP connection after the server panicked).