
Enable coredump for containerized applications #419

Merged (1 commit) on Nov 22, 2021

Conversation

hoditohod
Contributor

Overview

This PR contains the proposed fix for issue #418.

It unblocks the fault signal handler after the log/stacktrace has been flushed. The mechanism is reused from the Windows crash handler, where this unblocking was already implemented. The modification does not affect non-containerized native processes (non-PID1): for those, the final step after the flush is re-emitting the fatal signal (kill), and process execution stops there. For containerized (PID1) processes the kill is ignored, so execution passes through it and the sleep in the fault signal handler is unblocked instead. This enables the containerized process to exit and produce a core dump.

No testing or documentation changes are included.

  • TDD

New/modified code must be backed up with unit tests (preferably TDD-style development)

  • Documentation

All new/modified functionality should be backed up with API documentation (API.markdown or README.markdown)

Cross-Platform Testing

  • Travis-CI (Linux, OSX) + AppVeyor-CI (Windows)
  • Optional: Local/VM testing: Windows
  • Optional: Local/VM testing: OSX
  • Optional: Local/VM testing: Linux

Testing Advice

mkdir build; cd build; cmake -DADD_G3LOG_UNIT_TEST=ON ..

Run Test Alternatives:

  • Cross-Platform: ctest
  • or ctest -V for verbose output
  • Linux: make test

// When running as PID1 the above kill doesn't have any effect (execution simply passes through it, contrary
// to a non-PID1 process where execution stops at kill and switches over to signal handling). Also as PID1
// we must unblock the thread that received the original signal otherwise the process will never terminate.
gBlockForFatal = false;
Owner


Good comments that explain this small but non-trivial system insight.

@KjellKod KjellKod merged commit c51128f into KjellKod:master Nov 22, 2021
GergoTot added a commit to GergoTot/g3log that referenced this pull request Mar 3, 2023
…er with PID 1 aborted

Our service (running in Docker as PID 1) crashed with a SIGABRT signal. After the SIGABRT, infinite SIGSEGV signals unfortunately started to arrive, and the loop never terminated, since the kill signal does not stop it when running in a Docker container as PID 1. We used a solution similar to the one in this PR: KjellKod#419. We also had to restore the saved signal handlers; without that, SIGSEGV signals kept arriving in a cycle, and the process hung when running in a Docker container as PID 1.
@GergoTot GergoTot mentioned this pull request Mar 3, 2023