Large unmanaged memory growth (leak?) when upgrading from .NET 6 to 8 #95922

Closed

Description

We have a few different services hosted on Kubernetes running on .NET. When we try to upgrade from .NET 6 to .NET 8, we see a steep but constant increase in memory usage, almost all of it unmanaged memory. It seems to level off at around four times the memory usage of .NET 6, ignoring imposed memory limits, then continues to creep up more slowly depending on workload. So far we haven't seen an upper bound on the amount of unmanaged memory being leaked(?) here. We haven't managed to reproduce the problem in a minimal way so far, but we do have lots of data gathered about it. 🙂

Configuration

.NET 8, from the docker image mcr.microsoft.com/dotnet/aspnet:8.0, running on x86-64 machines on AWS EC2.
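For context, here is a rough sketch of the kind of deployment we're describing (names and values are illustrative, not our actual manifests). The memory limit is the relevant part, since the .NET GC derives its heap hard limit from the container's cgroup memory limit:

```yaml
# Illustrative Kubernetes deployment fragment; names are hypothetical and the real
# manifests differ, but the memory limit is what matters for the GC behaviour.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: example-service
          # App image built on top of mcr.microsoft.com/dotnet/aspnet:8.0
          image: example.registry/example-service:latest
          resources:
            requests:
              memory: "256Mi"   # illustrative
            limits:
              memory: "512Mi"   # the limit both pods hit below (later raised to 1Gi)
```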

Regression?

Yes, see the data below. This issue does not occur on .NET 6, only on .NET 8. We think it might be related to the GC changes between .NET 6 and 7. Give us a shout and we can try to narrow this down by running on .NET 7.

Data

Initially we switched from .NET 6 to .NET 8 and monitored memory usage with Prometheus metrics; the graph below shows the result. On .NET 6, memory usage remained consistently around ~160MB, but as soon as we deployed the .NET 8 upgrade, memory increased without limit. Both pods reached the 512MB limit we'd imposed and were restarted (once at 15:30). After we reverted to .NET 6, memory usage went back to normal.
[Image: Prometheus memory usage graph showing the .NET 8 growth, the restarts at the 512MB limit, and the return to normal after reverting to .NET 6]

We then increased the available memory from 512MB to 1GB and re-deployed .NET 8. Memory usage increased rapidly as before, then levelled off at about 650MB and stayed there until midnight. Service load increases drastically around that time, and memory grew again to about 950MB, where it again stayed relatively level until the service was unwittingly redeployed by a coworker. At that point we reverted to .NET 6, and memory went back to the lower level. I suspect it would have passed the 1GB limit after another midnight workload, but we haven't tested that again (yet).
[Image: Prometheus memory usage graph for the 1GB-limit .NET 8 deployment, levelling off at ~650MB and then ~950MB]

After trying and failing to reproduce the issue in local containers, we re-deployed .NET 8 and attached the JetBrains dotMemory profiler to work out what was happening. This is the profile we collected, showing the unmanaged memory increase. Interestingly, the amount of managed memory actually goes down over time, with GCs becoming more frequent; presumably .NET knows the available memory is running low as the total approaches 1GB. There also seem to be some circumstances where .NET will not allocate from unmanaged memory, since the spikes near the left-hand side mirror each other for managed and unmanaged. We had to stop profiling before reaching the memory limit, since Kubernetes would have restarted the pod and the profile would have been lost.
[Image: dotMemory profile showing unmanaged memory growing over time while managed memory shrinks]
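To make the managed/unmanaged split easier to correlate with what Kubernetes sees, here is a minimal sketch (not code from our service; the class name and interval are made up) of the kind of in-process logging we could add, using only `GC.GetGCMemoryInfo` and `Environment.WorkingSet`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Minimal diagnostic sketch: periodically log the total process working set,
// the memory committed by the GC, and the difference, which roughly corresponds
// to the "unmanaged" number dotMemory reports.
class MemoryLogger
{
    public static async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            var gcInfo = GC.GetGCMemoryInfo();

            long workingSetMb  = Environment.WorkingSet / (1024 * 1024);            // roughly what k8s/cAdvisor sees
            long gcCommittedMb = gcInfo.TotalCommittedBytes / (1024 * 1024);        // memory committed by the GC
            long gcHeapMb      = gcInfo.HeapSizeBytes / (1024 * 1024);              // managed heap size at the last GC
            long gcLimitMb     = gcInfo.TotalAvailableMemoryBytes / (1024 * 1024);  // memory limit as seen by the GC

            Console.WriteLine(
                $"workingSet={workingSetMb}MB gcCommitted={gcCommittedMb}MB " +
                $"gcHeap={gcHeapMb}MB nonGc~={workingSetMb - gcCommittedMb}MB " +
                $"gcSeesLimit={gcLimitMb}MB");

            await Task.Delay(TimeSpan.FromSeconds(30), token);
        }
    }
}
```

`TotalAvailableMemoryBytes` would also confirm whether the GC is seeing the 512MB/1GB container limit, which would explain the more frequent GCs as the total approaches it.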
And the Prometheus memory usage graph, for completeness (one pod is higher than the other because it was running the dotMemory profiler this time, and its usage drops when the profiler is detached):
[Image: Prometheus memory usage graph during the profiling session]

Analysis

The only issue we could find that looked similar is #92490, which also affects ASP.NET Core services running in Kubernetes after moving to .NET 7. As it's memory-related, we suspect this has to do with the GC changes between .NET 6 and 7. We haven't been able to get a clean repro (or any repro outside our hosted environments) yet, but please let us know if there's anything we can do to help narrow this down. 🙂
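If it helps narrow things down, one thing we could try on a test deployment is switching back to the segments-based GC that .NET 6 used, since the regions GC is the big change in .NET 7. A sketch of the env settings, assuming the standalone `libclrgc.so` that ships alongside the .NET 7/8 runtime is available in the image (the container spec itself is illustrative):

```yaml
# Illustrative container spec for a test pod (not our real manifest).
# DOTNET_GCName points the runtime at the standalone segments-based GC
# (libclrgc.so) that ships with .NET 7/8, which should isolate the
# regions change if that's what's leaking.
spec:
  containers:
    - name: example-service
      image: example.registry/example-service:latest
      env:
        - name: DOTNET_GCName
          value: "libclrgc.so"
        # Optionally also compare server vs. workstation GC:
        # - name: DOTNET_gcServer
        #   value: "0"
```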
