Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow in release builds with large futures #6057

Open
mathieulj opened this issue Oct 6, 2023 · 3 comments
Open

Stack overflow in release builds with large futures #6057

mathieulj opened this issue Oct 6, 2023 · 3 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-runtime Module: tokio/runtime

Comments

@mathieulj
Copy link

Version

❯ cargo tree | rg tokio
repro-tokio v0.1.0 (/Users/mathieuletendre-jauniaux/work/repro-tokio)
└── tokio v1.32.0
    └── tokio-macros v2.1.0 (proc-macro)

Platform

I've observed this on both macos and windows but the problem likely also affects linux.

❯ uname -a
Darwin Mathieus-MacBook-Pro-2.local 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64

Description

In debug builds, spawn will put futures on the heap but the release behaviour is to just pass it on via the stack. This leads to code that passes tests and runs fine in debug but crashes in production if the futures are sufficiently large.

Reproduction steps:

cargo init --bin repro-tokio
cd repro-tokio
cargo add tokio --features rt-multi-thread,macros,time
# edit main.rs

# doesn't crash
cargo run 

#crashes
cargo run --release
use std::time::Duration;
use tokio::{task, time};

#[tokio::main]
async fn main() {
    println!("This works both in debug and in release");
    let _ = task::spawn(Box::pin(future())).await;

    println!("This compiles but will crash at runtime **only in release builds**");
    let _ = task::spawn(future()).await;
}

async fn future() {
    let data = [0; 200_000];
    time::sleep(Duration::from_millis(100)).await;
    println!("{}", data[22]);
}

I expected to see this happen: Ideally tokio wouldn't overflow the stack ever but it should at minimum have the same behaviour in debug builds as it does in release builds.

Instead, this happened: tokio overflowed the stack only in release builds, test and debug builds work as expected.

@mathieulj mathieulj added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Oct 6, 2023
@Darksonn Darksonn added the M-runtime Module: tokio/runtime label Oct 7, 2023
@Darksonn
Copy link
Contributor

Darksonn commented Oct 7, 2023

Having the same code for debug and release does not guarantee identical behavior. The code in question was added because of frequent stack overflows that only happened in debug mode.

Ideally, I would like to avoid the box on release mode.

That said, there are ways to avoid the stack overflow without introducing extra indirection. Those are probably worth exploring.

@Nerdy5k

This comment was marked as spam.

@mhils
Copy link
Contributor

mhils commented Sep 5, 2024

I've opened #6826 in an attempt to fix this footgun.

This problem can be relatively tedious to track down for users. To illustrate this with our example:

  1. We find our Windows CI failing with "Windows fatal exception: stack overflow" on a dependency update. No more details, Linux and macOS work fine.
  2. First attempts to reproduce on Windows fail because I used debug builds. Eventually realize I need release builds.
  3. Bisect our code to a Dependabot commit (mitmproxy/mitmproxy_rs@ea7123d).
  4. Bump dependencies individually. Figure out it's the tokio upgrade.
  5. Bisect tokio to ab53bf0 - apparently a slight increase in future size has pushed us over the edge - "niche-optimization" 😅.
  6. Searching the tokio issue tracker I find Stack overflow in release builds with large futures #6057. Verify that always pinning fixes it.
  7. Patch tokio to panic for std::mem::size_of::<T>() > BOX_FUTURE_THRESHOLD to find all offending futures in our code base. Find & pin them individually. (Not sure if there's a better way, but this worked?)

This was a fun riddle, so no complaints at all. :) But I hope this shows that cause and effect can be relatively far apart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-runtime Module: tokio/runtime
Projects
None yet
Development

No branches or pull requests

4 participants