Fix deadlock in IO thread shutdown during panic #2898

Merged
ranshid merged 1 commit into valkey-io:unstable from ouriamzn:fix-io-thread-deadlock
Dec 4, 2025
ouriamzn (Contributor) commented Dec 3, 2025

Problem

IO thread shutdown can deadlock during server panic when the main thread calls pthread_cancel() while the IO thread holds its mutex, preventing the thread from observing the cancellation.

Solution

Release the IO thread mutex before cancelling to ensure clean thread termination.

Testing

Reproducer:

./src/valkey-server --io-threads 2 --enable-debug-command yes
./src/valkey-cli debug panic

Before: Server hangs indefinitely
After: Server terminates cleanly

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.46%. Comparing base (70196ee) to head (8ba08f8).
⚠️ Report is 2 commits behind head on unstable.

Additional details and impacted files
@@            Coverage Diff            @@
##           unstable    #2898   +/-   ##
=========================================
  Coverage     72.45%   72.46%           
=========================================
  Files           129      129           
  Lines         70526    70528    +2     
=========================================
+ Hits          51102    51109    +7     
+ Misses        19424    19419    -5     
Files with missing lines Coverage Δ
src/io_threads.c 35.57% <100.00%> (+0.41%) ⬆️

... and 13 files with indirect coverage changes

@ranshid ranshid requested review from ranshid and uriyage December 3, 2025 16:23
The IO thread shutdown could deadlock when the main thread calls
pthread_cancel() while the IO thread holds its mutex. The IO thread
would not observe the cancellation request, causing pthread_join()
to block indefinitely.

Release the IO thread mutex before cancelling to ensure the thread
can process the cancellation request and exit cleanly.

Reproducer:
./src/valkey-server --io-threads 2 --enable-debug-command yes
./src/valkey-cli debug panic

Signed-off-by: Ouri Half <ourih@amazon.com>
@ouriamzn ouriamzn force-pushed the fix-io-thread-deadlock branch from af34ff0 to 8ba08f8 Compare December 4, 2025 08:54
@ranshid ranshid moved this to To be backported in Valkey 8.1 Dec 4, 2025
@ranshid ranshid moved this to To be backported in Valkey 9.0 Dec 4, 2025
@ranshid ranshid removed this from Valkey 8.1 Dec 4, 2025
ouriamzn commented Dec 4, 2025

This deadlock was introduced by commit 3631208, which removed makeThreadKillable() from IO threads to fix jemalloc crashes. Removing makeThreadKillable() had an unintended side effect: IO threads no longer respond to pthread_cancel() calls while holding mutexes.

When makeThreadKillable() was present, threads could be cancelled at any point, even while holding locks. After its removal, pthread_cancel() only takes effect when the thread reaches a cancellation point such as an explicit pthread_testcancel() call, which doesn't happen while the thread holds its mutex in the shutdown path.
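The difference between the two cancellation modes can be sketched in isolation. This is a minimal illustration, not Valkey code: it assumes that makeThreadKillable() switched the calling thread to asynchronous cancellation (its body is not shown in this thread), and the names `spin_forever` and `cancel_spinning_thread` are hypothetical. The worker loop contains no cancellation point, so under the default deferred mode the same pthread_cancel() call would never take effect.

```c
#include <pthread.h>

/* Volatile sink so the compiler cannot remove the busy loop. */
static volatile unsigned long spin_sink;

/* Hypothetical worker: opts in to asynchronous cancellation, then spins
 * without ever reaching a cancellation point. */
static void *spin_forever(void *arg) {
    int old;
    (void)arg;
    /* Assumption: this mirrors what makeThreadKillable() used to do. */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
    for (;;) spin_sink++;
    return NULL; /* not reached */
}

/* Returns 1 if the spinning thread was cancelled and joined, 0 on error. */
int cancel_spinning_thread(void) {
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, spin_forever, NULL) != 0) return 0;
    if (pthread_cancel(tid) != 0) return 0;
    if (pthread_join(tid, &result) != 0) return 0;
    return result == PTHREAD_CANCELED;
}
```

Removing the pthread_setcanceltype() call from this sketch would leave the thread in deferred mode, and pthread_join() would then block forever, which is the same shape of hang described in this comment.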

This creates a deadlock scenario during panic shutdown:

  1. Main thread calls shutdownIOThread()
  2. IO thread holds io_threads_mutex[id]
  3. Main thread calls pthread_cancel() but thread doesn't observe it
  4. pthread_join() blocks indefinitely waiting for a thread that will never exit

The fix ensures the mutex is released before cancellation, allowing the thread to process the cancellation request in its main loop where pthread_testcancel() is called.
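The release-before-cancel ordering can be illustrated with a small standalone program. This is a sketch under stated assumptions, not the actual patch: `park_mutex` is a hypothetical stand-in for io_threads_mutex[id], and the sketch assumes the shutdown caller is the one holding the mutex while the worker is blocked acquiring it, which the PR text does not spell out. It relies on two POSIX facts: cancellation is deferred by default, and pthread_mutex_lock() is not a cancellation point.

```c
#include <pthread.h>

/* Hypothetical stand-in for the per-thread io_threads_mutex[id]. */
static pthread_mutex_t park_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Worker loop: the only cancellation point is the explicit
 * pthread_testcancel() at the top of each iteration. */
static void *worker_main(void *arg) {
    (void)arg;
    for (;;) {
        pthread_testcancel();
        /* Blocks for as long as the shutdown caller holds the mutex.
         * A pending cancellation is NOT acted on here. */
        pthread_mutex_lock(&park_mutex);
        pthread_mutex_unlock(&park_mutex);
    }
    return NULL; /* not reached */
}

/* Returns 1 if the worker was cancelled and joined, 0 on error. */
int shutdown_parked_worker(void) {
    pthread_t tid;
    void *result = NULL;

    /* Park the worker by holding the mutex before it starts. */
    pthread_mutex_lock(&park_mutex);
    if (pthread_create(&tid, NULL, worker_main, NULL) != 0) {
        pthread_mutex_unlock(&park_mutex);
        return 0;
    }

    /* Release the mutex first, so the worker can loop back to
     * pthread_testcancel(); cancelling while still holding it would
     * leave pthread_join() waiting on a thread stuck in the lock. */
    pthread_mutex_unlock(&park_mutex);
    if (pthread_cancel(tid) != 0) return 0;
    if (pthread_join(tid, &result) != 0) return 0;
    return result == PTHREAD_CANCELED;
}
```

Moving the pthread_mutex_unlock() call to after pthread_join() in this sketch reproduces the hang: the worker never leaves pthread_mutex_lock(), so it never reaches the cancellation point.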

@ranshid ranshid merged commit 3d65a4a into valkey-io:unstable Dec 4, 2025
55 checks passed
@zuiderkwast zuiderkwast moved this from To be backported to 9.0.1 (WIP) in Valkey 9.0 Dec 4, 2025
zuiderkwast pushed a commit to zuiderkwast/placeholderkv that referenced this pull request Dec 4, 2025
@ouriamzn ouriamzn deleted the fix-io-thread-deadlock branch December 7, 2025 10:00
zuiderkwast pushed a commit that referenced this pull request Dec 9, 2025
aradz44 pushed a commit to aradz44/valkey that referenced this pull request Dec 23, 2025
jdheyburn pushed a commit to jdheyburn/valkey that referenced this pull request Jan 8, 2026
Projects

Status: 9.0.1

3 participants