
Edge runtime - JavaScript heap out of memory #51298

Closed
1 task done
ValentinH opened this issue Jun 14, 2023 · 32 comments · Fixed by #62336
Assignees
Labels
bug Issue was opened via the bug report template. linear: next Confirmed issue that is tracked by the Next.js team. locked Runtime Related to Node.js or Edge Runtime with Next.js.

Comments

@ValentinH
Contributor

ValentinH commented Jun 14, 2023

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000
    Binaries:
      Node: 16.18.1
      npm: 8.19.2
      Yarn: 1.22.19
      pnpm: 7.26.1
    Relevant packages:
      next: 13.4.6-canary.4
      eslint-config-next: N/A
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 5.1.3

Which area(s) of Next.js are affected? (leave empty if unsure)

Middleware / Edge (API routes, runtime)

Link to the code that reproduces this issue or a replay of the bug

https://github.com/ValentinH/next-edge-build-issue

To Reproduce

  • clone the repository
  • yarn install
  • yarn build (or even NODE_OPTIONS='--max-old-space-size=4096' yarn build to make it crash sooner)

The memory consumption goes super high (more than 16GB on my machine) and the command ultimately fails with:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[42616:0x158008000] 35153 ms: Scavenge 4086.3 (4135.9) -> 4084.0 (4136.7) MB, 4.1 / 0.0 ms (average mu = 0.407, current mu = 0.205) allocation failure
[42616:0x158008000] 35159 ms: Scavenge 4087.1 (4136.7) -> 4084.7 (4137.4) MB, 4.4 / 0.0 ms (average mu = 0.407, current mu = 0.205) allocation failure
[42616:0x158008000] 35549 ms: Scavenge 4088.0 (4137.7) -> 4085.5 (4146.4) MB, 386.1 / 0.0 ms (average mu = 0.407, current mu = 0.205) allocation failure

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0x1000f9c84 node::Abort() [/whatever/node]
2: 0x1000f9e74 node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/whatever/node]
3: 0x10023e840 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/whatever/node]
4: 0x10023e800 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/whatever/node]
5: 0x1003c1d1c v8::internal::Heap::GarbageCollectionReasonToString(v8::internal::GarbageCollectionReason) [/whatever/node]
6: 0x1003c083c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/whatever/node]
7: 0x1003cbb84 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/whatever/node]
8: 0x1003cbc18 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/whatever/node]
9: 0x10039eaac v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/whatever/node]
10: 0x1006d6bd0 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/whatever/node]
11: 0x1009ea08c Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/whatever/node]
12: 0x1055b6444
13: 0x1052a2b90
14: 0x105115d20
15: 0x1055d44b4
16: 0x1052a40f4
17: 0x1055d34b0
18: 0x10504f498
19: 0x1055ccdb8
20: 0x1055d9aa4
21: 0x10097dd18 Builtins_InterpreterEntryTrampoline [/whatever/node]
22: 0x10504e368
23: 0x1052a1ab4
24: 0x104f61404
25: 0x104fe058c
26: 0x104ff0250
27: 0x104fdfc84
28: 0x10522f278
29: 0x100a32178 Builtins_PromiseFulfillReactionJob [/whatever/node]
30: 0x10099f6f4 Builtins_RunMicrotasks [/whatever/node]
31: 0x10097b9e4 Builtins_JSRunMicrotasksEntry [/whatever/node]
32: 0x10034e4cc v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/whatever/node]
33: 0x10034e900 v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/whatever/node]
34: 0x10034e9ec v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*, v8::internal::MaybeHandle<v8::internal::Object>) [/whatever/node]
35: 0x100371628 v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) [/whatever/node]
36: 0x100371ebc v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) [/whatever/node]
37: 0x100049c4c node::InternalCallbackScope::Close() [/whatever/node]
38: 0x10004977c node::CallbackScope::~CallbackScope() [/whatever/node]
39: 0x1000d1ae0 (anonymous namespace)::uvimpl::Work::AfterThreadPoolWork(int) [/whatever/node]
40: 0x10095c0c0 uv__work_done [/whatever/node]
41: 0x10095f85c uv__async_io [/whatever/node]
42: 0x1009715a8 uv__io_poll [/whatever/node]
43: 0x10095fcec uv_run [/whatever/node]
44: 0x10004a6d4 node::SpinEventLoop(node::Environment*) [/whatever/node]
45: 0x100133a90 node::NodeMainInstance::Run(int*, node::Environment*) [/whatever/node]
46: 0x100133770 node::NodeMainInstance::Run() [/whatever/node]
47: 0x1000cde38 node::Start(int, char**) [/whatever/node]
48: 0x19cd8fe50 start [/usr/lib/dyld]
error Command failed with signal "SIGABRT".

Describe the Bug

We are in the process of migrating all our API routes to the Edge runtime. So far we have migrated 43 of them, and for a few days we have been getting build errors on Vercel:

ERROR  run failed: command  exited (129)
Error: Command "turbo run build" exited with 129
BUILD_UTILS_SPAWN_129: Command "turbo run build" exited with 129

The build was actually also sometimes failing locally with the "Reached heap limit Allocation failed - JavaScript heap out of memory" error.

After digging a lot in our codebase to understand what was going on, we managed to narrow it down to our Edge functions specifically. These functions use our generated GraphQL client, which lives in a pretty large file (2.5 MB) containing all the possible operations.

I managed to create a reproduction in a greenfield Next.js project using the latest canary. However, to reach the same amount of memory, I had to create many more routes (200) to reproduce the crash. The main reason for this is that I'm not able to share our internal GraphQL client, which is generated from our private schema. Instead, I created a smaller client (3 times smaller) from the GitLab GraphQL endpoint.

If I create the same scenario of 200 handlers without using the Edge runtime, the build runs smoothly in under 10 seconds with no visible impact on my machine's memory. To witness this, you can try the serverless branch of the shared repository.
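For reference, the per-route difference between the two branches is just the Edge runtime opt-in. A minimal sketch of such a pages-router API route (illustrative only, not the exact repro code):

```javascript
// pages/api/hello.js — minimal Edge API route sketch (pages router).
// The `config` export below is what opts the route into the Edge runtime,
// and therefore into the separate Edge bundling/minification path at build time.
export default function handler() {
  return new Response(JSON.stringify({ ok: true }), {
    headers: { 'content-type': 'application/json' },
  });
}

export const config = { runtime: 'edge' };
```

Removing the `config` export (as on the serverless branch) keeps the route on the default Node.js serverless runtime.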

Expected Behavior

Compiling many Edge runtime API routes should behave similarly to compiling Serverless API routes.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

Vercel

NEXT-1785

@ValentinH ValentinH added the bug Issue was opened via the bug report template. label Jun 14, 2023
@github-actions github-actions bot added the Runtime Related to Node.js or Edge Runtime with Next.js. label Jun 14, 2023
@ValentinH
Contributor Author

Thanks to @elbalexandre, we discovered that setting swcMinify: false in next.config.js "solves" the issue: the build is around 10x slower than the serverless version, but it no longer crashes.

On the serverless version, there is not much difference whether swcMinify is enabled or not.
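For anyone applying the workaround, the change is one line in next.config.js (a sketch; as far as I know, turning swcMinify off falls back to the slower Terser-based minifier, and later comments note the flag is slated for removal in Next.js 15, where SWC minification is always on):

```javascript
// next.config.js — workaround sketch: disable SWC minification, falling back
// to the slower Terser-based minifier. Trades ~10x slower builds for not
// running out of memory during Edge-route minification.
/** @type {import('next').NextConfig} */
const nextConfig = {
  swcMinify: false,
};

module.exports = nextConfig;
```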

@Eusebiotrigo

Eusebiotrigo commented Nov 8, 2023

We are facing this issue as well: the build runs out of memory and does not let us publish the page, whether through GitHub Actions or through Cloudflare. It just raises "yarn run build exited with code 129".

With the flag set to false we can build; with the flag set to true, it goes OOM. And I just checked: this flag will be deprecated in Next 15 and always enabled.

Ref: https://github.com/vercel/next.js/pull/57467/files

@ValentinH
Contributor Author

Thanks for sharing, we haven't tried enabling the outputFileTracing flag.

@Eusebiotrigo

We moved some pages (3) from the pages folder to the app folder, and we got the high-memory error again in our Cloudflare build (and it already has --max-old-space set to 8 GB).

So setting minify to false helped for a little while, but it is not working for us anymore.

@izakfilmalter

I am having the same issue. It is very hard to figure out which page or lib is causing it.


@feedthejim feedthejim added the linear: next Confirmed issue that is tracked by the Next.js team. label Dec 4, 2023
@huozhi
Member

huozhi commented Dec 7, 2023

It's something related to the source maps of the SWC compilation; we're investigating now.

@huozhi huozhi self-assigned this Dec 7, 2023
@izakfilmalter

@huozhi let me know if you need help testing the fix. I can give you temp access to our closed-source repo. The build fails every time with the Edge runtime, but succeeds on Node.

@huozhi
Member

huozhi commented Dec 8, 2023

We landed a fix (#59393) in 14.0.5-canary.2; ideally it should reduce the memory consumption during minification a bit. Please test against it and let us know the result 🙏

The reproduction is still failing, as it has too many edge functions (not sure if it's too extreme); we'll see if we can keep improving on it.

@ValentinH
Contributor Author

The reproduction is still failing, as it has too many edge functions (not sure if it's too extreme); we'll see if we can keep improving on it.

In our application, we have a lot of API routes, and ultimately we would like to be able to use the Edge Runtime in all of them. Therefore, I don't think it's too extreme.
Would it be possible to minify them in batches rather than all at once?
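The batching idea could look something like this. An illustrative sketch only, not Next.js internals; minifyOne stands in for whatever per-asset minification call the build would use:

```javascript
// Illustrative: minify assets in fixed-size batches so that only `batchSize`
// jobs (and their intermediate ASTs / source maps) are alive at any moment,
// trading some throughput for a bounded peak memory footprint.
async function minifyInBatches(assets, minifyOne, batchSize = 8) {
  const results = [];
  for (let i = 0; i < assets.length; i += batchSize) {
    const batch = assets.slice(i, i + batchSize);
    // Wait for the whole batch before starting the next one, so memory
    // pressure never exceeds `batchSize` concurrent jobs.
    results.push(...(await Promise.all(batch.map(minifyOne))));
  }
  return results;
}
```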

@ValentinH
Contributor Author

Still, I'm looking forward to trying the fix from #59393, because our builds have recently been failing often on Vercel (not locally), even though we stopped using our huge codegen file in the Edge functions.
Interestingly, redeploying without the build cache makes it work. I'm not sure why doing more work (without cache) reduces memory consumption, but this might be a lead.

@izakfilmalter

@huozhi Tried 14.0.5-canary.5, still failed. https://vercel.com/steeple-inc/steeple-works/BHscDFNbaYJd4TVTJP9FK3CrszvP

@huozhi
Member

huozhi commented Dec 12, 2023

@izakfilmalter the deployment is a 404 for me; do you have a test app to share so that we can keep looking into it?

@oliversoar

@kdy1 @huozhi I can see this has been merged: swc-project/swc#8546

I'm happy to test once this lands on canary, just let me know! Thanks

@ValentinH
Contributor Author

ValentinH commented Feb 7, 2024

@kdy1 @huozhi any chance that the above-mentioned fix could be added to @next/swc?
We have reached a state where our production deployments keep failing on Vercel due to OOM. The only workaround we have for now is to "Redeploy": for some reason, redeploying without cache avoids the OOM.
But I'm scared that this will stop working at some point.

@ericmatthys
Contributor

We have reached a state where our production deployments keep failing on Vercel due to OOM. The only workaround we have for now is to "Redeploy": for some reason, redeploying without cache avoids the OOM.

I think creating an environment variable named VERCEL_FORCE_NO_BUILD_CACHE with a value of 1 would be a better stopgap solution for you. The build cache needs to be held in memory to be used, which can increase memory usage.
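In shell terms the stopgap is just the variable below; for deployed builds, set the same variable in the Vercel project's environment settings:

```shell
# Stopgap: tell Vercel to skip the build cache, which otherwise has to be
# held in memory during the build and adds to peak memory usage.
export VERCEL_FORCE_NO_BUILD_CACHE=1
```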

@oliversoar

We have reached a state where our production deployments keep failing on Vercel due to OOM. The only workaround we have for now is to "Redeploy": for some reason, redeploying without cache avoids the OOM.

I think creating an environment variable named VERCEL_FORCE_NO_BUILD_CACHE with a value of 1 would be a better stopgap solution for you. The build cache needs to be held in memory to be used, which can increase memory usage.

This is what works for us as a temporary solution. Painful as it doubles the build time.

Keen to see this fixed so we can roll out the edge runtime to the rest of our app.

@ValentinH
Contributor Author

We just reached the point where, even without the cache, the app won't build anymore.
We are therefore switching back to swcMinify: false, which is slower but doesn't have these leaks.

@gwkline

gwkline commented Feb 9, 2024

We just reached the point where, even without the cache, the app won't build anymore. We are therefore switching back to swcMinify: false, which is slower but doesn't have these leaks.

I believe this issue will be resolved once swc_core is updated (see: #61662). Waiting on this as well though.

@ValentinH
Contributor Author

Looking forward to it!

In the meantime, in case it helps someone else: we managed to reduce our Edge function bundles quite a lot by replacing an import of @sentry/nextjs with @sentry/core. We ended up having react-dom and @sentry/replay in each Edge function bundle (approx. 1 MB "Stat size") 🙈

@gwkline

gwkline commented Feb 14, 2024

Seems to be fixed as of 14.1.1-canary.52 for our project (without needing VERCEL_FORCE_NO_BUILD_CACHE or swcMinify: false).

@steve-marmalade

steve-marmalade commented Feb 19, 2024

After upgrading from next@14.1.1-canary.60 to next@14.1.1-canary.61 I am getting the memory error again:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

This is reproducible for me locally, and I can get the build to pass if I set export NODE_OPTIONS=--max_old_space_size=8192. But there is no workaround for deploying on Vercel.
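Spelled out, the local workaround looks like the following (8192 MB is just the value used above; pick a limit your machine can back):

```shell
# Raise V8's old-space heap limit for the build; NODE_OPTIONS is read by
# every Node process the build spawns.
export NODE_OPTIONS=--max_old_space_size=8192
# Sanity-check that the limit was picked up (heap_size_limit is in bytes,
# so it should now be well above 8 GB).
node -e "console.log(require('v8').getHeapStatistics().heap_size_limit)"
```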

Notably, the step: Creating an optimized production build ... has gone from taking 30 seconds to several minutes.

@ijjk , perhaps this is a side-effect of your split chunk handling update? FWIW I do see a dramatic reduction in edge-server-production file size when running next@14.1.1-canary.61 locally with the increased memory limits. #62205

@ijjk
Member

ijjk commented Feb 19, 2024

Hi, @steve-marmalade, could you provide a reproduction? The chunk-splitting PR you referenced fixed massive cache/memory usage in other Edge-runtime cases we've seen.

@ValentinH
Contributor Author

I just tried both next@14.1.1-canary.60 and next@14.1.1-canary.61 on the original reproduction I shared, and it is still crashing: https://github.com/ValentinH/next-edge-build-issue.
However, I don't know how realistic this is: the number of functions is realistic IMO (we already have more than 100 edge functions in our app), but the size of the codegen is probably too much (even though this is what we used to have; we have since switched to graphql-codegen, which generates much smaller files).

ijjk added a commit that referenced this issue Feb 21, 2024
We have a reproduction of OOMs still occurring with this chunking so
going to revert while we investigate further

x-ref:
#51298 (comment)

Reverts #62205

Closes NEXT-2548
ijjk added a commit that referenced this issue Feb 21, 2024
…62336)

This re-lands the chunking optimization with fix for the split chunks
config to ensure we aren't generating duplicate chunks from not chunking
`all` together.

Tested various configs against our repro case here:

https://vercel.com/vercel/vercel-site/2D5Xirs9Vr1M29WHAuNawgjvgE4G
https://vercel.com/vercel/vercel-site/B2aez1NNCyVvoUBTSMFy8npBKK3j

Closes NEXT-2552
closes: #51298
x-ref: #62313
@ValentinH
Contributor Author

THANK YOU SO MUCH GUYS @ijjk @huozhi!!! ❤️❤️❤️

I just tested 14.1.1-canary.69 on https://github.com/ValentinH/next-edge-build-issue and it now builds within 10 seconds with no visible impact on memory.

I still have to test it on our production repo but this looks super good 🎉

@ValentinH
Contributor Author

I confirmed that it fixes our issue on our prod repo: from more than 8GB for next build to around 1GB! 🎉🎉🎉
Congrats!

@ValentinH
Contributor Author

However, the memory is still getting really high when browsing the app in dev mode:
[screenshot: memory usage climbing while browsing the app in dev mode]

But this is a subject for another issue 🙈
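For anyone investigating the dev-mode growth, a quick way to sample a Node process's memory (generic Node tooling, nothing Next-specific):

```javascript
// Sample the current Node process's memory; logging this periodically while
// browsing the app makes heap growth easy to spot.
const { rss, heapUsed, heapTotal } = process.memoryUsage();
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);
console.log(`rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB heapTotal=${toMB(heapTotal)}MB`);
```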

@xanderim

xanderim commented Feb 22, 2024

I can confirm 14.1.1-canary.69 – no more OOM errors when deploying to vercel/cloudflare. Great job!

@dislick

dislick commented Feb 22, 2024

14.1.1-canary.69 has also fixed OOM exceptions in our build! Thanks everybody!

@froblesmartin

Nice! It works for us too! It halves the build time locally, and there is no need to specify --max-old-space-size anymore.

I wanted to ask: when is version 14.1.1 planned to be released? 😄

@ijjk
Member

ijjk commented Mar 1, 2024

Hi, the above-mentioned patch is now available in v14.1.1; please update and give it a try!


This closed issue has been automatically locked because it had no new activity for 2 weeks. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 16, 2024