Description
Is your feature request related to a problem? Please describe.
The systemd package shipped in Amazon Linux 2023 provides a systemd-coredump binary that is linked only against liblzma for compression. While libzstd and liblz4 are available on the system, they are not compiled into the coredump handler.
For applications with large memory footprints, using xz (liblzma) results in extremely slow and CPU-intensive coredump processing. This leads to two critical failures on our production systems:
Loss of Critical Diagnostic Data: The systemd-coredump@.service frequently exceeds its default 5-minute timeout. The service is then terminated by systemd, and the coredump file is lost. This deprives our teams of essential data needed for post-mortem analysis of application crashes.
Extended Application Downtime: The crashing application cannot fully terminate until the coredump process either completes or times out. This blocks the service (in our case, a pod in EKS) from restarting, most often resulting in the pod being down for the full 5-minute timeout period.
Describe the solution you'd like
Add support for the liblz4 and/or libzstd libraries to the default coredump handler, or document the issue and propose solutions.
Describe alternatives you've considered
As a workaround, we intend to deploy our own handler script and point the kernel's core_pattern setting at it. The script writes the raw coredump to disk and then compresses it with zstd. Dumps that typically time out (5+ minutes) with the standard flow complete in a second or less this way.
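To illustrate the approach, here is a minimal sketch of what such a handler could look like, written in Python. It is not the script we run: the `/var/crash` destination, the file naming scheme, and the function names are assumptions made for the example. It expects the kernel to pass `%e` (executable name), `%p` (PID) and `%t` (dump time) as arguments and the core image on stdin, and it assumes the `zstd` command-line tool is installed.

```python
#!/usr/bin/env python3
"""Sketch of a core_pattern pipe handler: store the raw core, then compress it.

Assumed registration (the install path is hypothetical):
    kernel.core_pattern=|/usr/local/bin/core-handler.py %e %p %t
"""
import shutil
import subprocess
import sys
from pathlib import Path

CORE_DIR = Path("/var/crash")  # assumed destination directory


def core_path(core_dir, exe, pid, timestamp):
    """Build the destination path for the raw core file."""
    safe_exe = exe.replace("/", "_")  # defensive: keep the name inside core_dir
    return Path(core_dir) / f"core.{safe_exe}.{pid}.{timestamp}"


def save_raw_core(stream, dest):
    """Copy the core image from a binary stream to dest in 1 MiB chunks."""
    with open(dest, "wb") as out:
        shutil.copyfileobj(stream, out, length=1024 * 1024)
    return dest


def compress_with_zstd(path):
    """Compress path with the zstd CLI and delete the raw file on success."""
    # -T0 uses all available cores, --rm removes the source, -q silences output.
    subprocess.run(["zstd", "-q", "-T0", "--rm", str(path)], check=True)
    return Path(str(path) + ".zst")


def main(argv):
    if len(argv) != 4:
        print("usage: core-handler.py EXE PID TIMESTAMP < core", file=sys.stderr)
        return 2
    exe, pid, timestamp = argv[1:4]
    CORE_DIR.mkdir(parents=True, exist_ok=True)
    # Drain stdin first so the raw dump is on disk before any compression starts.
    raw = save_raw_core(sys.stdin.buffer, core_path(CORE_DIR, exe, pid, timestamp))
    compress_with_zstd(raw)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

With the script installed at the hypothetical path above and marked executable, it would be enabled with something like `sysctl -w 'kernel.core_pattern=|/usr/local/bin/core-handler.py %e %p %t'`. Writing the raw dump before compressing keeps the read from the kernel pipe as short as possible, at the cost of temporarily needing disk space for the uncompressed core.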
While this workaround is effective, it requires us to maintain custom infrastructure. We would much prefer if the default OS tools worked for modern, large-scale applications, and we believe many other AL2023 users would benefit from this change.
Additional context
Amazon Linux 2023 is marketed as a modern, high-performance OS. The use of a slow, outdated compression method for a critical diagnostic tool seems inconsistent with this philosophy.