
possible memory leak in server integrations, origins unclear #590

Closed
Zk2u opened this issue Feb 27, 2023 · 20 comments
Labels: question (Further information is requested)

Comments

Zk2u commented Feb 27, 2023

Context: #581

Removed all API use in valeralabs/web and hardcoded a user profile into users.rs. Running a stress test for 120 seconds:

Concurrency Level:      10
Time taken for tests:   120.001 seconds
Complete requests:      904656
Failed requests:        0
Total transferred:      4358651880 bytes
HTML transferred:       4260043940 bytes
Requests per second:    7538.75 [#/sec] (mean)
Time per request:       1.326 [ms] (mean)
Time per request:       0.133 [ms] (mean, across all concurrent requests)
Transfer rate:          35470.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       2
Processing:     0    1   0.5      1      23
Waiting:        0    1   0.5      1      23
Total:          0    1   0.5      1      23

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      2
  80%      2
  90%      2
  95%      2
  98%      2
  99%      3
 100%     23 (longest request)

The memory usage of the web server increased from ~5MB to 357MB, and did not decrease after the benchmark. This may be due to #529 somewhere. I'm not sure where this issue is coming from yet.

Zk2u commented Feb 27, 2023

I tested the hackernews example on Actix using regular serde. 3190 requests increased memory usage from 5MB to 17.5MB (again, not decreasing afterwards).

gbj commented Feb 27, 2023

Do you have a reason to believe this is coming from within Leptos in particular? In other words, if you run the same stress test in Actix with the same number of requests but serve some static or generated HTML instead of running it through Leptos, what are the results?

gbj commented Feb 27, 2023

This conversation may also provide a helpful frame of reference.

See also this GitHub issue and discussion.

Zk2u commented Feb 27, 2023

The first test calls another Actix server, our API, on every request (1:1 with leptos). The API instance remained at 4.8MB throughout the test, but the Leptos server grew to 977MB (check the latest reply on #581).

gbj commented Feb 28, 2023

I'm happy to take a look at this locally and try to figure it out, if you can share the steps you used for the stress testing. This is an area I'm pretty unfamiliar with.

@gbj gbj added the question (Further information is requested) label Feb 28, 2023
@Zk2u Zk2u changed the title from "Leptos is leaking memory on Actix ⚠️" to "Leptos is potentially leaking memory on Actix ⚠️" Feb 28, 2023

Zk2u commented Feb 28, 2023

I'm using ApacheBench - if you're on a MacBook (I think...), it should be preinstalled. The command I'm using is:

ab -n 15000 -c 10 http://127.0.0.1:3000/

Zipped test app. Run with cargo leptos watch -r
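
For reference, one way to watch the server's resident memory while the benchmark runs (a rough sketch, assuming a Unix-like system; "valeralabs-web" is a hypothetical process name, so substitute whatever your server binary is called):

while true; do ps -o rss= -p "$(pgrep -f valeralabs-web | head -n 1)" | awk '{printf "%.1f MB\n", $1 / 1024}'; sleep 1; done

This polls the process's RSS once a second, which is enough to see whether memory plateaus or keeps climbing during the run.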

gbj commented Feb 28, 2023

Thanks. I ran this against the todo apps to avoid spamming the 3rd-party hackernews API I use, and was able to reproduce on a smaller scale. Interestingly, Axum seems to do a better job of shrinking its memory footprint back down between bursts of requests, but I agree there's probably a Leptos leak somewhere.

I checked and can confirm that the Runtime structs are being dropped for each request, and that the RUNTIMES slotmap capacity is staying stable, both of which are very good signs (although it would have been an easy fix).

Based on some limited testing, I think the same behavior appears with very simple Leptos examples. I'm not very familiar with tools to trace this kind of memory leak, so I'd appreciate any help.

gbj commented Feb 28, 2023

Okay I tested this against a route that just uses render_to_string and found completely flat memory use, which is great. So it's something inside the render_to_stream implementation, which makes perfect sense to me.
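
Something along these lines is enough to exercise the synchronous path in isolation (a minimal sketch; it mirrors the streaming reproduction later in this thread and assumes the same 0.2-era leptos::ssr API, with an arbitrary view body):

use leptos::{ssr::render_to_string, view, IntoView};

fn main() {
    // Render the same view synchronously in a very tight loop, analogous to the
    // streaming loop below, to compare memory behavior between the two paths.
    loop {
        let html = render_to_string(|cx| view! { cx,
            <div>
                <p>"Paragraph 1."</p>
            </div>
        }.into_view(cx));
        println!("{html}");
    }
}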

Zk2u commented Feb 28, 2023

A similar thing may have occurred in #103 (adding for potential extra context).

@Zk2u Zk2u changed the title from "Leptos is potentially leaking memory on Actix ⚠️" to "leptos_actix::render_app_to_stream_with_context is leaking memory on Actix" Feb 28, 2023

gbj commented Feb 28, 2023

A similar thing may have occurred in #103 (adding for potential extra context).

Thanks. That particular issue was from before I implemented proper management for runtimes, which I've confirmed are now being disposed properly, so I think it's a red herring. But having narrowed it down to the streaming implementation is very helpful, and I can dig in a little more later.

@gbj gbj changed the title from "leptos_actix::render_app_to_stream_with_context is leaking memory on Actix" to "leptos_dom::ssr::render_to_stream is leaking memory" Mar 1, 2023

gbj commented Mar 1, 2023

Edited the title, if you don't mind. I was able to reproduce with this very simple main.rs, which makes it much easier for me to test.

use leptos::{ssr::render_to_stream, view, IntoView};
use futures::StreamExt;

#[tokio::main]
async fn main() {
    loop {
        let mut stream = Box::pin(render_to_stream(|cx| view! { cx,
            <div>
                <p>"Paragraph 1."</p>
                <p>"Paragraph 2."</p>
                <p>"Paragraph 3."</p>
            </div>
        }.into_view(cx)));
        while let Some(chunk) = stream.next().await {
            println!("{chunk}");
        }
    }
}

gbj commented Mar 1, 2023

Oh no, I've been a very naughty boy.

It turns out if you write a function called run_scope_undisposed and you write in the docs, "If you do not dispose of the scope on your own, memory will leak" then... if you do not dispose of the scope on your own, memory will leak. To be honest I thought those would be cleaned up when disposing of the runtime, but apparently I thought wrong.
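
To spell out the failure mode, here's a deliberately simplified, hypothetical sketch (the global arena, the run_scope_undisposed stand-in, and the disposer below are invented for illustration and are not the real Leptos internals):

use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Stand-in for the reactive system's global storage: scope id -> data owned by that scope.
static SCOPES: Mutex<Option<HashMap<usize, Vec<String>>>> = Mutex::new(None);
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

// Dropping this does nothing; only an explicit dispose() frees the scope's slot.
struct ScopeDisposer(usize);

impl ScopeDisposer {
    fn dispose(self) {
        if let Some(scopes) = SCOPES.lock().unwrap().as_mut() {
            scopes.remove(&self.0);
        }
    }
}

// "If you do not dispose of the scope on your own, memory will leak."
fn run_scope_undisposed<T>(f: impl FnOnce(usize) -> T) -> (T, ScopeDisposer) {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    SCOPES
        .lock()
        .unwrap()
        .get_or_insert_with(HashMap::new)
        .insert(id, Vec::new());
    (f(id), ScopeDisposer(id))
}

fn main() {
    for request in 0..1_000_000u64 {
        let (_html, disposer) = run_scope_undisposed(|id| {
            // Pretend this is a render: the scope accumulates some per-request state.
            if let Some(scopes) = SCOPES.lock().unwrap().as_mut() {
                scopes
                    .get_mut(&id)
                    .unwrap()
                    .push(format!("response for request {request}"));
            }
            String::from("<div></div>")
        });
        // Comment out the next line and the arena grows on every iteration,
        // even though everything else has gone out of scope.
        disposer.dispose();
    }
}

Nothing cleans up the arena entry except the explicit dispose() call, so skipping it grows the map on every request.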

So, I've managed to fix the memory leak I created in my example above. I'll need to do a little more testing to make sure I've got it figured out in the Actix integration case as well. I'm out of time this morning but should be able to finish this tonight, I'd think.

Zk2u commented Mar 1, 2023

That made me laugh. Nice job. Let me know when you've finished, and I'll rerun the initial stress test to confirm.

@gbj gbj changed the title from "leptos_dom::ssr::render_to_stream is leaking memory" to "create_resource is leaking memory" Mar 1, 2023
@gbj gbj changed the title from "create_resource is leaking memory" to "render_to_stream and create_resource leak memory" Mar 1, 2023

gbj commented Mar 1, 2023

I was actually able to reproduce the memory leak that I think is causing most of this problem.

It has to do with the interaction between spawning a large number of Tokio tasks and memory allocators not returning memory to the system.

I could toggle the memory leak on and off by rendering an app with a single create_resource, and then toggling a tokio::task::spawn_local on and off within leptos::spawn_local, which is used to load async resources.

It's entirely possible I'm using Tokio wrong here and there's an easy solution. But see also discussion in several Tokio issues:

tokio-rs/tokio#4406

tokio-rs/tokio#4321 (comment)

Zk2u commented Mar 1, 2023

So this isn't actually a “leak”, but rather the allocator holding onto the memory after it has finished handling requests, because it's faster to reuse?

What happens when we run it in a VM with a limited amount of memory?

gbj commented Mar 1, 2023

Nope, I was wrong. The longer I ran it, the clearer it became that this was a real leak. I've managed to create a streaming, non-leaking version, so I think I should be able to get those changes integrated into the Actix and Axum pieces.

gbj commented Mar 2, 2023

...Okay, sorry for all the back and forth here. I've managed to create several different memory leaks during my experiments, usually by doing something in a relatively stupid way. (For example, my "it was clearly a real leak" above was due to a test that looped over the synchronous render_to_string, spawning a basically unbounded number of async resources outside any context in which they would be handled properly.)

Importantly: when I isolate the largest possible chunks of Leptos code from the Axum or Actix integrations (literally everything that generates the streaming response), run it in a very tight loop, and log the streams of HTML out to the console, with no Actix or Axum server involved, I am unable to produce any kind of memory leak. Memory use stays stable and fairly low (about 5MB in my Actix test).

I'll merge my fix to render_to_stream, which was a real memory leak, although I don't think it was actually being used directly in the integrations. I'd recommend trying your test again once I've merged that, just in case. Importantly, let it run for a long time and see whether memory ever plateaus, or whether the growth is continuous and linear.

And maybe see whether using jemalloc does anything to the memory use; it was recommended in several places as returning memory to the OS more aggressively.

To use jemalloc, I think you can just add it to Cargo.toml

[target.'cfg(not(target_env = "msvc"))'.dependencies]
jemallocator = "0.5"

and then in main.rs

#[cfg(not(target_env = "msvc"))]
use jemallocator::Jemalloc;

#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

Beyond that, I'm not really sure what to say. I've spent a lot of time testing all the different combinations and I'm unable to produce a memory leak from within Leptos's code... If you're able to produce a reproduction of a leak that's using Leptos in a plain binary, independent of Actix or Axum, I'd be really happy to take a look, but I don't think there's anything else I can do at this point.

EDIT: Oops, I managed to click the wrong button and closed and reopened this. Thanks, GitHub.

@gbj gbj closed this as completed Mar 2, 2023
@gbj gbj reopened this Mar 2, 2023
@gbj gbj changed the title from "render_to_stream and create_resource leak memory" to "possible memory leak in server integrations, origins unclear" Mar 2, 2023

Zk2u commented Mar 2, 2023

I'm going to have a go at containerising the server, limiting its memory, and then stressing it to see whether it runs out of memory or not. I'll get back to you if this turns out to be an actual issue with Leptos.

I am curious why it keeps increasing, however, since I'd expect it to plateau at some point and start reusing allocated memory. The RPS doesn't change in any major way throughout the bench.

Zk2u commented Mar 2, 2023

I managed to containerise the server and... boom. That confirms it. There isn't a memory leak, but the allocator is not returning the memory to the system.

Again, this was the same continuous benchmark as before, run for 10 minutes. The web server's container was limited to 32 MB of memory, and the server had no issues staying within that. So I'm just marking this as done.
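
For anyone reproducing the setup, something like this is enough to apply the limit (a hedged sketch assuming Docker; the image tag is hypothetical, and the port mapping assumes the server listens on 3000 as in the earlier ab command, so adjust both to however you normally package and run the app, then point the same ab command as above at it):

docker run --rm -m 32m -p 3000:3000 valeralabs-web:stress

With -m 32m the container would eventually be OOM-killed if there really were unbounded growth, so surviving a 10-minute run is a decent sanity check that the allocator is holding onto and reusing pages rather than leaking.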

Edit: I'm interested in what those drops are and what controls them...

[screenshots: container memory usage graphs over the 10-minute benchmark, showing periodic drops]

@Zk2u Zk2u closed this as completed Mar 2, 2023

gbj commented Mar 2, 2023

Beautiful! And thanks for raising it... Having worked through it all, I feel much more confident than I did before that we're not doing anything wrong.
