GzipStream decompression memory (leak?) lingers in certain environments #114502

Open
@Matheos96

Description

At first I was unsure whether to blame this on the Blazor WebAssembly runtime or on GZipStream, but I have decided it mostly has to do with GZipStream, which is why I am posting it here.

EDIT: Actually, upon further investigation, this may be a Blazor WASM issue. Below I describe how the same issue is replicated even in a .NET console application; however, I realised a few days ago that in that test I had actually forgotten to dispose the stream, which is why the memory could not be GC'd (I THINK... I am a bit confused, since below I clearly say that I am sure I dispose everything, and it did not reproduce in Release - so iffy). Further investigation showed that I could replicate this very reliably in a standalone Blazor WebAssembly application stripped almost to the bones: a single page, no services etc., just a single function doing the decompression of a file. The page contains a single file picker; when a file is selected, it is decompressed using the code below and the memory usage is reported. I then have a separate button that forces GC and reports memory before and after forcing.
This bug occurs only in Blazor WASM standalone Debug builds. In a Release build it is not an issue (at times it could happen, but very rarely); in Debug builds it happens every time.
To make things even more confusing, the bug seems to be related to the PROJECT NAME(!!). I am currently investigating exactly how, but it is hard to pinpoint. I thought it had to do with the length of the project name: 16 characters (o234567891234567) did not reproduce the issue, and I am currently removing characters and seeing where it stops. My starting point was ExampleClient. It does not seem to be related to the characters themselves, as I could reproduce it using numbers in the name...

EDIT2: After even further testing I am even more confused. The issue seems to be related somehow to both the project name's length and what characters it contains... Could it have to do with the bytes used per character??? Here is a short summary of the names I tested and whether they reproduce the issue:

o234567 - NO (LENGTH = 7)
o2345678 - YES (LENGTH = 8)
...
o23456789123456 - YES (LENGTH = 15)
o234567891234567 - NO (LENGTH = 16)

abcdefg - NO
abcdefgh - NO
abcdefghi - NO (LENGTH = 9)
...

A minimal reproducing example can be found in my comment below.
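The stripped-down page is essentially the following (a sketch rather than the exact code from that comment; the /repro route, the pass count and the 200 ms delay are placeholders I use here for illustration):

@page "/repro"
@using System.IO.Compression
@using Microsoft.AspNetCore.Components.Forms

<InputFile OnChange="OnFileSelected" />
<button @onclick="ForceGcAsync">Force GC</button>

@code {
    private static long MemoryMb() => GC.GetTotalMemory(forceFullCollection: false) / (1024 * 1024);

    // Decompress the picked file fully into a MemoryStream, dispose everything,
    // then report memory. The decompression itself is the same snippet shown further down.
    private async Task OnFileSelected(InputFileChangeEventArgs e)
    {
        Console.WriteLine($"Baseline: {MemoryMb()} MB");

        await using var inputStream = e.File.OpenReadStream(maxAllowedSize: long.MaxValue);
        using var memoryStream = new MemoryStream();
        using (var gzipStream = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
        {
            await gzipStream.CopyToAsync(memoryStream);
        }

        Console.WriteLine($"After reading in gzip stream + decompress fully: {MemoryMb()} MB");
    }

    // Separate button: force GC a few times and report memory before/after each pass.
    private async Task ForceGcAsync()
    {
        for (var i = 0; i < 5; i++)
        {
            Console.WriteLine($"GC pass {i} - .NET memory before: {MemoryMb()} MB");
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            await Task.Delay(200);
            Console.WriteLine($"GC pass {i} - .NET memory after: {MemoryMb()} MB");
        }
    }
}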

In our Blazor WASM (standalone) application we use GZipStream to decompress part of a Stream (a binary file whose header is not compressed, but the rest is). For the sake of describing this bug, and during debugging, I used the following snippet for the decompression:

var memoryStream = new MemoryStream();
using (var gzipStream = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen:true))
{
    await gzipStream.CopyToAsync(memoryStream);
}
memoryStream.Position = 0;
dataStream = memoryStream;

... Actual deserialization code ...

In short, I decompress the data all in one go into another MemoryStream. After this our deserialization code would follow, but I currently have it commented out, as I have verified that the memory leak is not caused by it. At a later point I of course dispose of the decompressed stream (instantly afterwards in my tests).

The issue is that even after multiple rounds of forcing GC and making sure everything is collectible, memory is still reserved (if that is the correct term). Weirdly enough, it does not happen with all inputs, but the source is definitely the GZipStream. I tried the code both in Blazor WASM in my browser and in a test console application, and the same issue can be seen in both. That said, in the console application the memory actually DOES drop back down, but only if I run a Release build. A Debug build hogs the memory indefinitely, or at least long enough that my code with multiple GC runs does not collect it...
In Blazor it is pretty much the same. With a Debug build (dev server via Rider/VS) the memory drops to a certain point, after which it never returns to the baseline. With a published Release build served using the simple http-server node CLI, it can at times drop back to the baseline, but at times it stays reserved.
I hope this explanation is not too confusing, but just in case I will drop some outputs below to describe what is happening. I print the memory using GC.GetTotalMemory(forceFullCollection: false). Also please note that the difference between the baseline and the "hogged" memory is identical in the two cases: 576 - 64 == 519 - 7 == 512 MB, which, now that I write it out, oddly enough is exactly half a GB... The "deserialization" mentioned in the output is in practice only gzip decompression (full decompression); the actual deserialization code is commented out.
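For reference, the console test that produces the logs below looks roughly like this (a sketch; the MemoryMb/ForceGcAsync helper names, the input.bin path, the pass count and the 200 ms delay are placeholders, and the real input is our own binary file):

using System.IO.Compression;

// Local helpers used throughout; all "MB" numbers come from GC.GetTotalMemory.
static long MemoryMb() => GC.GetTotalMemory(forceFullCollection: false) / (1024 * 1024);

static async Task ForceGcAsync()
{
    for (var i = 0; i < 5; i++)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        await Task.Delay(200); // small delay between passes
        Console.WriteLine($".NET memory GC pass {i}: {MemoryMb()} MB");
    }
}

var inputBytes = await File.ReadAllBytesAsync("input.bin"); // placeholder for our binary file

Console.WriteLine($"Running force GC a few times with small delay. Memory before: {MemoryMb()} MB");
await ForceGcAsync();
Console.WriteLine($"Baseline: {MemoryMb()} MB");

Console.WriteLine($".NET memory before deserialization: {MemoryMb()} MB");
using (var inputStream = new MemoryStream(inputBytes))
using (var memoryStream = new MemoryStream())
using (var gzipStream = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
{
    // "Deserialization" here is really just the full decompression.
    await gzipStream.CopyToAsync(memoryStream);
}
Console.WriteLine($".NET memory after deserialization: {MemoryMb()} MB");

Console.WriteLine("Forcing GC after deserialization...");
await ForceGcAsync();
Console.WriteLine($".NET memory END: {MemoryMb()} MB");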

Console App Release build (memory correctly released)

Running force GC a few times with small delay. Memory before: 640 MB
.NET memory GC pass 0: 64 MB
.NET memory GC pass 1: 64 MB
.NET memory GC pass 2: 64 MB
.NET memory GC pass 3: 64 MB
.NET memory GC pass 4: 64 MB
Baseline: 64 MB
.NET memory before deserialization: 64 MB
.NET memory after deserialization: 832 MB
Forcing GC after deserialization...
.NET memory GC pass 0: 576 MB
.NET memory GC pass 1: 64 MB
.NET memory GC pass 2: 64 MB
.NET memory GC pass 3: 64 MB
.NET memory GC pass 4: 64 MB
.NET memory END: 64 MB

Console App Debug build (memory never released)

Running force GC a few times with small delay. Memory before: 640 MB
.NET memory GC pass 0: 64 MB
.NET memory GC pass 1: 64 MB
.NET memory GC pass 2: 64 MB
.NET memory GC pass 3: 64 MB
.NET memory GC pass 4: 64 MB
Baseline: 64 MB
.NET memory before deserialization: 64 MB
.NET memory after deserialization: 832 MB
Forcing GC after deserialization...
.NET memory GC pass 0: 576 MB
.NET memory GC pass 1: 576 MB
.NET memory GC pass 2: 576 MB
.NET memory GC pass 3: 576 MB
.NET memory GC pass 4: 576 MB
.NET memory END: 576 MB

Blazor WASM Debug (with dev server from Rider - excuse slightly different output, same theory though - memory never released)

Baseline: 7MB
After reading in gzip stream + decompress fully: 978 MB
GC pass 0 - .NET memory before: 978 MB
GC pass 0 -  .NET memory after: 519 MB
GC pass 1 -  .NET memory before: 519 MB
GC pass 1 -  .NET memory after: 519 MB
GC pass 2 -  .NET memory before: 519 MB
GC pass 2 -  .NET memory after: 519 MB
...

Blazor WASM published Release build (memory sometimes released)
Same as above, but randomly it actually drops back to 7MB, even after only 2 GC passes

I have yet to test this with a Native AOT published WASM build.

The main points here are these:

  • Both in the Console App and Blazor WASM Debug builds, with the exact same input and code (basically only full gzip decompression), 512 MB of extra memory lingers
  • At times (more often than not) it also happens in a Blazor WASM Release published build
  • Release Console App build does not have the issue
  • In all cases, all streams are definitely disposed of before trying to force GC and measure memory

Lastly, I have tried this with streamed decompression as opposed to decompressing the whole thing at once. With streamed decompression (the stream is consumed directly by our deserializer) there is no issue, but of course there is a big decrease in speed (not nice, but beside the point here). I have also tried decompressing in chunks into a target stream instead of doing the whole thing in one go. That does actually help a bit, but not fully: with this particular job, a chunk size of 68150 bytes was the best I could do (only tested in WASM), and the lingering memory was at its lowest then (still something like 200 MB, though).
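The chunked variant I tried is roughly the following (a sketch with the same surrounding context as the snippet above, so inputStream and dataStream come from there; 68150 is the actual chunk size that lingered the least for this payload):

const int chunkSize = 68150;

var memoryStream = new MemoryStream();
using (var gzipStream = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
{
    // Read from the GZipStream in fixed-size chunks and write each chunk to the
    // target stream, instead of a single CopyToAsync call.
    var buffer = new byte[chunkSize];
    int bytesRead;
    while ((bytesRead = await gzipStream.ReadAsync(buffer, 0, chunkSize)) > 0)
    {
        await memoryStream.WriteAsync(buffer, 0, bytesRead);
    }
}
memoryStream.Position = 0;
dataStream = memoryStream;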

Can anyone help me understand this behaviour? Is there a memory leak in GZipStream, or why is it so stubbornly holding on to such a large chunk of memory (large at least when you consider that we run this in a browser)? I understand that GC and runtime behaviour in Blazor WASM is a bit iffy, but with other inputs and other approaches (as explained above) I have gotten the usage to quickly and reliably drop back to the baseline. There is something going on with the GZipStream CopyTo and/or reading the whole thing at once into another stream that causes something weird to happen here...

Configuration

SDK: 8.0.407
Runtime: 8.0.14
wasm-tools Installation Source: SDK 8.0.400, VS 17.13.35919.96

Windows 11, .NET 8, for the Console Application tests

Blazor WebAssembly, .NET 8, in Chrome for the web-based tests

Labels

arch-wasm, area-GC-mono, os-browser, tenet-performance, untriaged
