Skip to content

switch panic handling method #319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 22, 2021
Merged

Conversation

rasviitanen
Copy link
Contributor

Issue: #310

Description of changes:
Uses std::panic::catch_unwind instead of spawning a new tokio task to catch any panics.
By using std::panic::AssertUnwindSafe we don't have to add any new bounds on the handler, so this should be backwards compatible.

Using std::panic::AssertUnwindSafe should be OK. This is essentially what tokio did under the hood with the previous implementation, so the behavior should be equivalent. See:
https://github.com/tokio-rs/tokio/blob/b42f21ec3e212ace25331d0c13889a45769e6006/tokio/src/runtime/task/harness.rs#L409

What this attempts to solve:
Get rid of the Send + Sync + 'static bounds.

By submitting this pull request

  • I confirm that my contribution is made under the terms of the Apache 2.0 license.
  • I confirm that I've made a best effort attempt to update all relevant documentation.

@@ -0,0 +1,63 @@
// This example requires the following input to succeed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example demonstrates use of shared resources such as DB connections or local caches that can be initialized at the start of the runtime and reused by subsequent lambda handler calls. Run it with the following input:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good comment, thank you! Fixed in the latest push

@rasviitanen rasviitanen force-pushed the remove-static branch 2 times, most recently from 0a6f22a to 493ee5f Compare April 20, 2021 19:32
Copy link
Contributor

@coltonweaver coltonweaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great contribution, thank you! LGTM

@coltonweaver
Copy link
Contributor

Added @bahildebrand since he's been more involved with this discussion too.

@rimutaka
Copy link
Contributor

Can we not rush this PR? Even if it is backward compatible it is a major change in the expected behavior with some unexpected results in edge cases.

  1. catch_unwind does not guarantee all panics will be caught
  2. AssertUnwindSafe is not bullet-proof either

Even if the compiler guarantees memory safety of panics it does not statically guarantee that all panics will be safely unwound. Correct me if I'm wrong here.

I fully support this change, but would like to give it more time to understand all the implications.

@coltonweaver
Copy link
Contributor

Can we not rush this PR? Even if it is backward compatible it is a major change in the expected behavior with some unexpected results in edge cases.

1. catch_unwind does not guarantee all panics will be caught

2. AssertUnwindSafe is not bullet-proof either

Even if the compiler guarantees memory safety of panics it does not statically guarantee that all panics will be safely unwound. Correct me if I'm wrong here.

I fully support this change, but would like to give it more time to understand all the implications.

Intention definitely wasn't to rush this PR, which is why I asked Blake to also take a look :). Completely agree with your points, my desire is to get more eyes on it in case something has been missed.

@bahildebrand
Copy link
Contributor

Can we not rush this PR? Even if it is backward compatible it is a major change in the expected behavior with some unexpected results in edge cases.

1. catch_unwind does not guarantee all panics will be caught

2. AssertUnwindSafe is not bullet-proof either

Even if the compiler guarantees memory safety of panics it does not statically guarantee that all panics will be safely unwound. Correct me if I'm wrong here.

I fully support this change, but would like to give it more time to understand all the implications.

My reading of [catch_unwind(https://doc.rust-lang.org/std/panic/fn.catch_unwind.html) is that we would be guaranteed to catch any panics that were inherent to the types composing the handler since it requires an UnwindSafe implementation. I believe the only things that are !UnwindSafe are mutable references, which is not something that would make a ton of sense to use in this context.

One danger would be if someone were enable abort panics on their binary. I'm not sure if the compiler will get angry since we have a call to catch_unwind, or if it will allow someone to call an abort panic from within their handler. That being said I believe the behavior would be equivalent to spawn. I think this is kind of a foot gun that has existed from before this change.

Overall I don't see any problems with this change. It might be worthwhile to add a note that users should not compile this library panic abort.

@rasviitanen
Copy link
Contributor Author

rasviitanen commented Apr 22, 2021

re:

I believe the only things that are !UnwindSafe are mutable references, which is not something that would make a ton of sense to use in this context.

I think there are some other things that would make sense in this context that are !UnwindSafe. For example std::sync::Once or std::io::Error. That said, a regular Rust thread isn't really Unwindsafe either, I don't see why we would need it here.

About panic = "abort": The compiler will not complain and it will allow someone to abort on panic if they want to. Would it be a problem?

Also: Do we want to catch panics in the created future as well? I.e. if it panics here:
https://github.com/awslabs/aws-lambda-rust-runtime/pull/319/files#diff-66445b7321d3a64a01b04af7edeb065318728d4a137b6795f7121d61ef744dd9R61-R62

name: &'static str,
}

impl TryFrom<&'static str> for SharedClient {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be tryfrom if its infallible?

Copy link
Contributor Author

@rasviitanen rasviitanen Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial though was to avoid consumers of this library (or future contributors) to not run into the idea where: "Hey why is this a From, you can just do let client = &SharedClient("name"); directly". The problem with this is that client in this case would be static, so it wouldn't be as clear that you can use some non-static reference. Whereas TryFrom would make people think one extra step and realize that you can do some more advanced setup.

But the example in its current form is indeed confusing.
The better solution here is to simply implement a new function for the SharedClient and force it to be bound like that instead. Then I think the barrier to understand the example will be much lower.

I'll work on a fix, thanks!

let request_id = &ctx.request_id.clone();
#[allow(clippy::async_yields_async)]
let task = tokio::spawn(async move { handler.call(body, ctx) });
let task = panic::catch_unwind(panic::AssertUnwindSafe(|| handler.call(body, ctx)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this over set_hook?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime needs to report any panics to the invocation error endpoint: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html.

(I'm assuming that the set_hook you're referring to would be set by the customer?)

Copy link
Contributor

@bahildebrand bahildebrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm going to leave this open for another day in case anyone else has comments they'd like to get in. Otherwise great job @rasviitanen. This is an awesome contribution!

@seanpianka
Copy link
Contributor

Hi @bahildebrand, what is the timeline for merging this PR?

@bahildebrand
Copy link
Contributor

Merging this now. Rust fmt is broken, but I'll push up another PR to fix it.

@bahildebrand bahildebrand merged commit 53397cd into awslabs:master May 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants