Description
Hey! First let me say that I love this project. Clearly a lot of thoughtful design work has been put in. Kudos to all the contributors. Sorry in advance for the wall of text.
My scenario is this: I've noticed that even debug incremental recompiles of an absolutely barebones `lambda_http` app take upwards of 15 seconds. That doesn't sound like much, but multiply it across ten or so lambdas in a workspace and it gets out of hand pretty quickly.
I did some cursory investigation (the results of which I can post later), and it points to the `codegen_crate` and subsequent LLVM optimization steps of the build as the source of the slowness.
Checking the amount of emitted LLVM IR, I see that it's in the 4M+ line range! That is a lot of code for LLVM to optimize, which explains why both steps take so long.
I took a look at why the IR is being generated but I couldn't find any obvious pathological cases. It appears that the majority of the IR is `serde::Deserialize` impls on the `aws_lambda_events` types used by this crate.
I took a look at those types, and they have a heck of a lot of `#[serde(deserialize_with = "…")]` attributes on their fields. That attribute isn't too expensive on its own, but multiplied across every field of every struct, it adds up to a lot of codegen. There is also some duplicated code because `#[derive(Deserialize)]` implements both `visit_seq` and `visit_map`. The former is unnecessary here: we're dealing with JSON, and in most cases it's strictly invalid to let users deserialize a JSON array into a struct.
(Side-note: there was a proposal to share the `deserialize_with` wrappers between `visit_seq` and `visit_map`, but it was closed as wontfix.)
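
To make the pattern concrete, here's a rough sketch of the shape those types take. The struct and helper below are made up for illustration (they are not the real `aws_lambda_events` code), but the attribute pattern is the same:

```rust
use std::collections::HashMap;

use serde::{Deserialize, Deserializer};

// Hypothetical helper in the spirit of the crate's custom deserializers
// (name and behavior are illustrative only): treat an empty string the
// same as a missing value.
fn empty_string_as_none<'de, D>(d: D) -> Result<Option<String>, D::Error>
where
    D: Deserializer<'de>,
{
    let s: Option<String> = Option::deserialize(d)?;
    Ok(s.filter(|s| !s.is_empty()))
}

// Illustrative struct, not a real event type: every field carrying
// deserialize_with gets its own wrapper plus error-handling glue,
// duplicated across both the visit_map and visit_seq arms of the
// derived impl.
#[derive(Deserialize)]
struct ExampleEvent {
    #[serde(default, deserialize_with = "empty_string_as_none")]
    resource: Option<String>,
    #[serde(default, deserialize_with = "empty_string_as_none")]
    path: Option<String>,
    #[serde(default)]
    headers: HashMap<String, String>,
}
```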
So I just cut the `visit_seq` code out entirely by replacing `#[derive(Deserialize)]` with a hand-written `Deserialize` impl. That, plus stripping this crate's `LambdaRequest` enum down to only the cases I care about, brought the LLVM IR count down from 4.5M lines to about 3.5M.
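
For reference, here's a minimal sketch of the hand-rolled approach, assuming a made-up two-field event type rather than the real API Gateway structs:

```rust
use std::fmt;

use serde::de::{Deserializer, IgnoredAny, MapAccess, Visitor};
use serde::Deserialize;

// Illustrative event type: only visit_map is implemented, so the visit_seq
// path (and all of its generated error handling) never reaches LLVM.
struct SlimEvent {
    path: Option<String>,
    body: Option<String>,
}

impl<'de> Deserialize<'de> for SlimEvent {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct SlimVisitor;

        impl<'de> Visitor<'de> for SlimVisitor {
            type Value = SlimEvent;

            fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
                f.write_str("a JSON object")
            }

            // JSON events are always objects, so a map is the only shape we
            // accept; an array fails with the default "invalid type" error.
            fn visit_map<A>(self, mut map: A) -> Result<SlimEvent, A::Error>
            where
                A: MapAccess<'de>,
            {
                let mut path: Option<String> = None;
                let mut body: Option<String> = None;
                while let Some(key) = map.next_key::<String>()? {
                    match key.as_str() {
                        "path" => path = map.next_value()?,
                        "body" => body = map.next_value()?,
                        // Unknown fields are skipped rather than tracked,
                        // which also trims the generated code.
                        _ => {
                            map.next_value::<IgnoredAny>()?;
                        }
                    }
                }
                Ok(SlimEvent { path, body })
            }
        }

        deserializer.deserialize_map(SlimVisitor)
    }
}
```

Compared to the derive, this skips `visit_seq` and most of the per-field error plumbing (duplicate-field checks, the generated field-key enum, and so on).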
That was it for the low-hanging fruit. The remaining deserialization code seems necessary, but it is a bit bloated, mostly because serde is very generous with error-handling logic. I could probably write it out by hand to be more efficient, or improve the existing codegen of the `#[derive(Deserialize)]` macro.
But I've been rethinking my approach. The root problem is that we re-run the `#[derive(Deserialize)]` macro every time the lambda binary compiles. There are some efforts to disable derives when running `cargo check`, which is clever for improving type-hint resolution times and the like, but I don't think it's sufficient. What's needed is a true way to cache the results of proc macro evaluations, provided those proc macros are pure. This finally brings me to my first question: does anyone know of a way to cache the results of a proc-macro evaluation between builds?
Alternatively: is it possible to force rustc to evaluate a proc macro for a struct before that struct is used in a leaf crate?
cc @LegNeato @calavera as maintainers of the aws-lambda-events crate, any ideas? Perhaps we could selectively cargo-expand the `Deserialize` impls in the aws-lambda-events crate? It would result in a much larger source file, but it could cut out at least some of the `codegen_crate` time in incremental builds. I'm not sure how much, but it'd be a decent chunk of the ~40% of the build that step seems to take now.
cc @calavera @nmoutschen as maintainers of this crate, any ideas? Perhaps we could refactor things a bit so that users can provide their own APIGW types and deserialize on their end? That would give a bit more flexibility: for example, I could fork aws-lambda-events, strip it down to just the types I need, and hand-roll the `Deserialize` impls. At the moment I'm locked into the types this crate provides. I understand this crate does that because it handles a lot of the boilerplate involved in talking to AWS, but I'm fine with my crate handling some of that itself.
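
For what it's worth, here is a minimal sketch of what "bring your own type" can already look like by dropping down to `lambda_runtime` directly (assuming its current `service_fn`/`LambdaEvent` API; the request type is made up):

```rust
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde::Deserialize;

// Hypothetical stripped-down request type owned by my code instead of the
// full aws_lambda_events struct: only the fields my handler actually reads.
#[derive(Deserialize)]
struct MyApiGatewayRequest {
    path: Option<String>,
    body: Option<String>,
}

async fn handler(event: LambdaEvent<MyApiGatewayRequest>) -> Result<String, Error> {
    // The runtime deserializes only the fields declared above.
    Ok(event.payload.path.unwrap_or_default())
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    // lambda_runtime is generic over the payload type, so the deserialization
    // cost is bounded by the type supplied here; it's lambda_http's fixed
    // LambdaRequest/aws_lambda_events types that I can't currently swap out.
    run(service_fn(handler)).await
}
```

What this loses is the `http::Request`/`Response` conveniences that `lambda_http` layers on top, which is exactly the boilerplate I'd be volunteering to take back into my own crate.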