Commit cc2be63

Print stack trace from all threads in crash report (redis#12453)
In this PR we are adding the functionality to collect all the process's threads' backtraces.

## Changes made in this PR

### **Introduce threads mngr API**

The **threads mngr API** has 2 abilities:
* `ThreadsManager_init()` - registers a handler for SIGUSR2. Called on server start-up.
* `ThreadsManager_runOnThreads()` - receives a list of `pid_t` and a callback, tells every thread in the list to invoke the callback, and returns the output collected by each invocation.

**Elaborating atomicvar API**
* `atomicIncrGet(var,newvalue_var,count)` -- Increment and get the atomic counter new value
* `atomicFlagGetSet` -- Get and set the atomic counter value to 1

### **Always set SIGALRM handler**

The SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if `server.watchdog_period` > 0. The handler can also be useful when debugging is needed. However, in situations where the server can't get requests (a deadlock, for example), we weren't able to change the signal handler. To make it available at run time, we set the SIGALRM handler on server startup. The signal handler name was changed to the more general `sigalrmSignalHandler`.

### **Print all the process' threads' stacktraces**

`logStackTrace()` now calls `writeStacktraces()` instead of logging only the current thread's stacktrace.

`writeStacktraces()`:
* On Linux systems we use the threads manager API to collect the backtraces of all the process' threads. To get the `tids` list (thread ids) we read the `/proc/<redis-server-pid>/task` directory, which contains a list of subdirectories. Each subdirectory name corresponds to one tid (including the main thread). For each thread, we also need to check that it can get the signal from the threads manager (meaning it is not blocking or ignoring that signal). We send the threads manager this tids list and the `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses, its name, and its tid.
* On other systems, the behavior remains as it was (writing only the current thread's stacktrace to the log file).

## Compatibility notes

1. **The threads mngr API is only supported on Linux.**
2. glibc earlier than 2.30: we use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated wrappers (`gettid()` and `tgkill()`) were only added in glibc 2.30.

## Output example

Each thread backtrace will have the following format:
`<tid> <thread_name> [additional_info]`
* **tid**: as read from the `/proc/<redis-server-pid>/task` directory
* **thread_name**: the thread name as it is registered in the OS
* **additional_info**: Sometimes we want to add specific information about one of the threads. Currently, it is only used to mark the thread that handles the backtrace collection by adding "*". In case of a crash, this also indicates which thread caused the crash. The handling thread won't necessarily appear first.

```
------ STACK TRACE ------
EIP:
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]

67089 redis-server *
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790]
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]
redis-server *:6379(+0x75e0c)[0xaaaac2fe5e0c]
redis-server *:6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00]
redis-server *:6379(aeMain+0x24)[0xaaaac2fe7038]
redis-server *:6379(main+0xe0c)[0xaaaac3001afc]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc]
redis-server *:6379(_start+0x30)[0xaaaac2fe0370]

67093 bio_lazy_free
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67091 bio_close_file
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67092 bio_aof
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]
67089:signal-handler (1693824528) --------
```
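The thread enumeration described above (walking the per-process task directory under `/proc` and identifying threads by their kernel tid) can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the code from this commit: `current_tid()` and `list_thread_ids()` are hypothetical helper names, and `/proc/self/task` is assumed here as a shortcut for the server's own pid directory.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the commit): kernel thread id of the caller.
 * Uses the raw syscall so it also builds against glibc without gettid(). */
static pid_t current_tid(void) {
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical helper: store up to `max` thread ids of the calling process
 * in `tids`, as read from /proc/self/task. Returns the number of ids stored,
 * or -1 if the directory cannot be opened. */
static int list_thread_ids(pid_t *tids, int max) {
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL) return -1;

    int n = 0;
    struct dirent *entry;
    while (n < max && (entry = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(entry->d_name, &end, 10);
        /* Skip "." and ".." and any entry that is not purely numeric. */
        if (end == entry->d_name || *end != '\0') continue;
        tids[n++] = (pid_t)tid;
    }
    closedir(dir);
    return n;
}
```

A caller would pass a fixed-size `pid_t` array to `list_thread_ids()` and then hand the resulting list to whatever mechanism signals the threads. The sketch does not check whether each thread blocks or ignores the signal, which the PR description says the real implementation must do.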
1 parent 2aad03f commit cc2be63

8 files changed, +561 -30 lines changed


src/Makefile

Lines changed: 1 addition & 1 deletion
@@ -345,7 +345,7 @@ endif

 REDIS_SERVER_NAME=redis-server$(PROG_SUFFIX)
 REDIS_SENTINEL_NAME=redis-sentinel$(PROG_SUFFIX)
-REDIS_SERVER_OBJ=adlist.o quicklist.o ae.o anet.o dict.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o logreqres.o
+REDIS_SERVER_OBJ=threads_mngr.o adlist.o quicklist.o ae.o anet.o dict.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o logreqres.o
 REDIS_CLI_NAME=redis-cli$(PROG_SUFFIX)
 REDIS_CLI_OBJ=anet.o adlist.o dict.o redis-cli.o zmalloc.o release.o ae.o redisassert.o crcspeed.o crc64.o siphash.o crc16.o monotonic.o cli_common.o mt19937-64.o strl.o cli_commands.o
 REDIS_BENCHMARK_NAME=redis-benchmark$(PROG_SUFFIX)

src/atomicvar.h

Lines changed: 39 additions & 2 deletions
@@ -1,16 +1,41 @@
 /* This file implements atomic counters using c11 _Atomic, __atomic or __sync
  * macros if available, otherwise we will throw an error when compile.
  *
- * The exported interface is composed of three macros:
+ * The exported interface is composed of the following macros:
  *
  * atomicIncr(var,count) -- Increment the atomic counter
  * atomicGetIncr(var,oldvalue_var,count) -- Get and increment the atomic counter
+ * atomicIncrGet(var,newvalue_var,count) -- Increment and get the atomic counter new value
  * atomicDecr(var,count) -- Decrement the atomic counter
  * atomicGet(var,dstvar) -- Fetch the atomic counter value
  * atomicSet(var,value) -- Set the atomic counter value
  * atomicGetWithSync(var,value) -- 'atomicGet' with inter-thread synchronization
  * atomicSetWithSync(var,value) -- 'atomicSet' with inter-thread synchronization
- *
+ *
+ * Atomic operations on flags.
+ * Flag type can be int, long, long long or their unsigned counterparts.
+ * The value of the flag can be 1 or 0.
+ *
+ * atomicFlagGetSet(var,oldvalue_var) -- Get and set the atomic counter value
+ *
+ * NOTE1: __atomic* and _Atomic implementations can be actually elaborated to support any value by changing the
+ * hardcoded new value passed to __atomic_exchange* from 1 to @param count
+ * i.e oldvalue_var = atomic_exchange_explicit(&var, count).
+ * However, in order to be compatible with the __sync functions family, we can use only 0 and 1.
+ * The only exchange alternative suggested by __sync is __sync_lock_test_and_set,
+ * But as described by the gnu manual for __sync_lock_test_and_set():
+ * https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html
+ * "A target may support reduced functionality here by which the only valid value to store is the immediate constant 1. The exact value
+ * actually stored in *ptr is implementation defined."
+ * Hence, we can't rely on it for any value other than 1.
+ * We eventually chose to implement this method with __sync_val_compare_and_swap since it satisfies functionality needed for atomicFlagGetSet
+ * (if the flag was 0 -> set to 1, if it's already 1 -> do nothing, but the final result is that the flag is set),
+ * and also it has a full barrier (__sync_lock_test_and_set has acquire barrier).
+ *
+ * NOTE2: Unlike other atomic types, which aren't guaranteed to be lock free, c11 atomic_flag is.
+ * To check whether a type is lock free, atomic_is_lock_free() can be used.
+ * It can be considered to limit the flag type to atomic_flag to improve performance.
+ *
  * Never use return value from the macros, instead use the AtomicGetIncr()
  * if you need to get the current value and increment it atomically, like
  * in the following example:
@@ -93,6 +118,8 @@
 #define atomicGetIncr(var,oldvalue_var,count) do { \
     oldvalue_var = atomic_fetch_add_explicit(&var,(count),memory_order_relaxed); \
 } while(0)
+#define atomicIncrGet(var, newvalue_var, count) \
+    newvalue_var = atomicIncr(var,count) + count
 #define atomicDecr(var,count) atomic_fetch_sub_explicit(&var,(count),memory_order_relaxed)
 #define atomicGet(var,dstvar) do { \
     dstvar = atomic_load_explicit(&var,memory_order_relaxed); \
@@ -103,6 +130,8 @@
 } while(0)
 #define atomicSetWithSync(var,value) \
     atomic_store_explicit(&var,value,memory_order_seq_cst)
+#define atomicFlagGetSet(var,oldvalue_var) \
+    oldvalue_var = atomic_exchange_explicit(&var,1,memory_order_relaxed)
 #define REDIS_ATOMIC_API "c11-builtin"

 #elif !defined(__ATOMIC_VAR_FORCE_SYNC_MACROS) && \
@@ -111,6 +140,8 @@
 /* Implementation using __atomic macros. */

 #define atomicIncr(var,count) __atomic_add_fetch(&var,(count),__ATOMIC_RELAXED)
+#define atomicIncrGet(var, newvalue_var, count) \
+    newvalue_var = __atomic_add_fetch(&var,(count),__ATOMIC_RELAXED)
 #define atomicGetIncr(var,oldvalue_var,count) do { \
     oldvalue_var = __atomic_fetch_add(&var,(count),__ATOMIC_RELAXED); \
 } while(0)
@@ -124,12 +155,16 @@
 } while(0)
 #define atomicSetWithSync(var,value) \
     __atomic_store_n(&var,value,__ATOMIC_SEQ_CST)
+#define atomicFlagGetSet(var,oldvalue_var) \
+    oldvalue_var = __atomic_exchange_n(&var,1,__ATOMIC_RELAXED)
 #define REDIS_ATOMIC_API "atomic-builtin"

 #elif defined(HAVE_ATOMIC)
 /* Implementation using __sync macros. */

 #define atomicIncr(var,count) __sync_add_and_fetch(&var,(count))
+#define atomicIncrGet(var, newvalue_var, count) \
+    newvalue_var = __sync_add_and_fetch(&var,(count))
 #define atomicGetIncr(var,oldvalue_var,count) do { \
     oldvalue_var = __sync_fetch_and_add(&var,(count)); \
 } while(0)
@@ -149,6 +184,8 @@
     ANNOTATE_HAPPENS_BEFORE(&var); \
     while(!__sync_bool_compare_and_swap(&var,var,value,__sync_synchronize)); \
 } while(0)
+#define atomicFlagGetSet(var,oldvalue_var) \
+    oldvalue_var = __sync_val_compare_and_swap(&var,0,1)
 #define REDIS_ATOMIC_API "sync-builtin"

 #else
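
To make the semantics of the two new operations concrete, here is a small sketch written against plain C11 `<stdatomic.h>` and the GCC/Clang `__sync` builtin. The functions `get_incr()`, `incr_get()`, `flag_get_set()` and `flag_get_set_sync()` are hypothetical stand-ins introduced for illustration; they are not the Redis macros, and they return values directly rather than assigning to an output variable as the macros do.

```c
#include <stdatomic.h>

/* Illustrative stand-ins only, not the macros from atomicvar.h. */

/* atomicGetIncr-like: returns the value *before* the addition. */
static long get_incr(atomic_long *var, long count) {
    return atomic_fetch_add_explicit(var, count, memory_order_relaxed);
}

/* atomicIncrGet-like: returns the value *after* the addition.
 * atomic_fetch_add returns the old value, so `count` is added back. */
static long incr_get(atomic_long *var, long count) {
    return atomic_fetch_add_explicit(var, count, memory_order_relaxed) + count;
}

/* atomicFlagGetSet-like: sets the flag to 1 and returns its previous value. */
static int flag_get_set(atomic_int *flag) {
    return atomic_exchange_explicit(flag, 1, memory_order_relaxed);
}

/* Same contract using the legacy __sync builtin (GCC/Clang assumed):
 * swap 0 -> 1 if the flag is clear, otherwise leave it at 1.
 * Either way the previous value is returned and the flag ends up set. */
static int flag_get_set_sync(int *flag) {
    return __sync_val_compare_and_swap(flag, 0, 1);
}
```

With a counter starting at 5, `get_incr(&counter, 2)` yields 5 and leaves 7 behind, while a following `incr_get(&counter, 3)` yields 10. For the flag helpers, only the first caller observes 0, which is what lets a single thread claim a one-shot role.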
