You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Description:
Envoy occasionally crashes when validating bootstrap config
Repro steps:
It happens on 30 hosts out of Stripe's around 50k hosts. And I can login to those hosts and manually call and roughly get segfault 1 out of 2/3 times
We actually found a potential root cause to be a tcmalloc bug that's being used in Envoy 1.24.4 that is unable to handle non-sequential online CPUs. Those segfaults happen on ec2 instances with nitro-enclaves enabled so there are some hot-plugged off CPUs, i.e.
And the theory is tcmalloc uses the cpu's id to index into the per-cpu arrays that hold the per cpu data structures. If tcmalloc allocates 14 entries because ncpu is 14, but the 14th cpu id is 15 then its array access is out of bounds.
Description:
Envoy occasionally crashes when validating bootstrap config
Repro steps:
It happens on 30 hosts out of Stripe's around 50k hosts. And I can login to those hosts and manually call and roughly get segfault 1 out of 2/3 times
Or sometimes the validate is hanging indefinitely.
Call stack:
uname -a:
Linux 5.15.0-1036-aws #40~20.04.1-Ubuntu SMP Mon Apr 24 00:21:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Envoy version
2f44165e55dd47475c44d2d03018eac3cb8a6264/1.24.4-stripe1/Clean/RELEASE/BoringSSL
2f44165e55dd47475c44d2d03018eac3cb8a6264 is internal commit of Stripe's Envoy repo, it uses OSS envoy 1.24.4
The text was updated successfully, but these errors were encountered: