Skip to content

Conversation

@awelzel
Copy link
Contributor

@awelzel awelzel commented Jan 27, 2025

When setReadyState(CLOSED) is executed from two different threads (the server's thread, detecting a close trying to receive and a separate sending thread), there is a race where both see _readyState as non-CLOSED and both execute the _onCloseCallback(). Worse, the server's thread might be returning from ->run(), unsetting the callback, while the other thread is still about to call the _onCloseCallback(), resulting in a crash rather than "just" a duplicate invocation.

This change ensures that setReadyState() and the_onCloseCallback() are executed by a single thread till completion.

I was tempted to remove the std::atomic<> for _readyState, but left it assuming it's still good having for getReadyState().

I've first tried _readyState.exchange() which seemed promising, but still crashed once in a while as racing threads do not wait for the _onCloseCallback() to complete.

Closes #539

@bsergean
Copy link
Contributor

Can you make the getReadyState use the mutex as well, and stop using an atomic ?

Having a mutex AND an atomic seems weird.

@awelzel
Copy link
Contributor Author

awelzel commented Feb 16, 2025

Can you make the getReadyState use the mutex as well, and stop using an atomic ?

Thanks for the feedback. Done and re-pushed.

@bsergean
Copy link
Contributor

The change looks simple, but all the thread sanitizer checks are not red ....

@bsergean
Copy link
Contributor

Do you have access to a mac machine ?

@bsergean
Copy link
Contributor

Test project /Users/runner/work/IXWebSocket/IXWebSocket/build
      Start  1: IXSocketTest
 1/16 Test  #1: IXSocketTest .................................   Passed    1.15 sec
      Start  2: IXSocketConnectTest
 2/16 Test  #2: IXSocketConnectTest ..........................   Passed    3.31 sec
      Start  3: IXWebSocketServerTest
 3/16 Test  #3: IXWebSocketServerTest ........................Subprocess aborted***Exception:   6.36 sec
      Start  4: IXWebSocketTestConnectionDisconnection
 4/16 Test  #4: IXWebSocketTestConnectionDisconnection .......   Passed   18.83 sec
      Start  5: IXUrlParserTest
 5/16 Test  #5: IXUrlParserTest ..............................   Passed    0.29 sec
      Start  6: IXUnityBuildsTest
 6/16 Test  #6: IXUnityBuildsTest ............................   Passed    0.24 sec
      Start  7: IXHttpTest
 7/16 Test  #7: IXHttpTest ...................................   Passed    0.25 sec
      Start  8: IXDNSLookupTest
 8/16 Test  #8: IXDNSLookupTest ..............................   Passed    0.27 sec
      Start  9: IXWebSocketSubProtocolTest
 9/16 Test  #9: IXWebSocketSubProtocolTest ...................Subprocess aborted***Exception:   0.76 sec
      Start 10: IXStrCaseCompareTest
10/16 Test #10: IXStrCaseCompareTest .........................   Passed    0.27 sec
      Start 11: IXExponentialBackoffTest
11/16 Test #11: IXExponentialBackoffTest .....................   Passed    0.26 sec
      Start 12: IXWebSocketCloseTest
12/16 Test #12: IXWebSocketCloseTest .........................Subprocess aborted***Exception:   8.43 sec
      Start 13: IXWebSocketHostTest
13/16 Test #13: IXWebSocketHostTest ..........................Subprocess aborted***Exception:   1.47 sec
      Start 14: IXHttpServerTest
14/16 Test #14: IXHttpServerTest .............................   Passed    0.51 sec
      Start 15: IXWebSocketChatTest
15/16 Test #15: IXWebSocketChatTest ..........................Subprocess aborted***Exception:   3.80 sec
      Start 16: IXWebSocketPerMessageDeflateCompressorTest
16/16 Test #16: IXWebSocketPerMessageDeflateCompressorTest ...   Passed    0.30 sec
Errors while running CTest
Output from these tests are in: /Users/runner/work/IXWebSocket/IXWebSocket/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

69% tests passed, 5 tests failed out of 16

Total Test time (real) =  46.51 sec

The following tests FAILED:
	  3 - IXWebSocketServerTest (Subprocess aborted)
	  9 - IXWebSocketSubProtocolTest (Subprocess aborted)
	 12 - IXWebSocketCloseTest (Subprocess aborted)
	 13 - IXWebSocketHostTest (Subprocess aborted)
	 15 - IXWebSocketChatTest (Subprocess aborted)

@awelzel
Copy link
Contributor Author

awelzel commented Feb 21, 2025

Do you have access to a mac machine ?

It's failing on Linux, too. Sorry, it's due to unprotected _readyState accesses in IXWebSocketTransport.cpp. I've replaced them all with call's to the getReadyState() method and switched the mutex to a recursive mutex. I've pushed that as a follow-up, happy to squash it in if you're happy with it. See below for my thoughts though.

Having a mutex AND an atomic seems weird.

Looking at the new diff, going back might be an option IMO and not all that weird: The mutex prevents concurrent execution of setCloseState() - the main culprit of #539 - while reading / writing of _readyState is separate and using an std::atomic not all that unreasonable to me.

@bsergean
Copy link
Contributor

bsergean commented Mar 6, 2025

Alright then ... let's go back to your original idea, but do you mind adding a small comment explaining why we're doing things that way ?

When setReadyState(CLOSED) is executed from two different threads (the
server's thread, detecting a close trying to receive and a separate
sending thread), there is a race where both see _readyState as non-CLOSED
and both execute the _onCloseCallback().  Worse, the server's thread might
be returning from ->run(), unsetting the callback while the other thread
is still about to call the _onCloseCallback(), resulting in a crash
rather than "just" a duplicate invocation of the callback.

This change ensures that setReadyState() *and* the_onCloseCallback()
are executed by a single thread till completion, by introducing a new
_setReadyStateMutex instance held during setReadyState() execution.
@awelzel
Copy link
Contributor Author

awelzel commented Mar 9, 2025

Alright then ... let's go back to your original idea, but do you mind adding a small comment explaining why we're doing things that way ?

Done. Feel free to word-smith during/after the merge.

@awelzel
Copy link
Contributor Author

awelzel commented Apr 9, 2025

Hello @bsergean - friendly ping whether you'd consider merging this, or would rather have something changed with this PR? We're currently using a fork of IXWebSocket but would prefer to switch to your repo again :-)

@bsergean bsergean merged commit 1babe1d into machinezone:master Apr 9, 2025
7 checks passed
@bsergean
Copy link
Contributor

bsergean commented Apr 9, 2025

Thanks I did run the PR checks one last time. They don't run automatically for some reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race when closing in setReadyState()

2 participants