Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IXWebSocketTransport::setReadyState(): Run under lock #540

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

awelzel
Copy link

@awelzel awelzel commented Jan 27, 2025

When setReadyState(CLOSED) is executed from two different threads (the server's thread, detecting a close trying to receive and a separate sending thread), there is a race where both see _readyState as non-CLOSED and both execute the _onCloseCallback(). Worse, the server's thread might be returning from ->run(), unsetting the callback, while the other thread is still about to call the _onCloseCallback(), resulting in a crash rather than "just" a duplicate invocation.

This change ensures that setReadyState() and the_onCloseCallback() are executed by a single thread till completion.

I was tempted to remove the std::atomic<> for _readyState, but left it assuming it's still good having for getReadyState().

I've first tried _readyState.exchange() which seemed promising, but still crashed once in a while as racing threads do not wait for the _onCloseCallback() to complete.

Closes #539

@bsergean
Copy link
Collaborator

Can you make the getReadyState use the mutex as well, and stop using an atomic ?

Having a mutex AND an atomic seems weird.

@awelzel awelzel force-pushed the topic/awelzel/fix-set-ready-state-race branch from 80e6c4f to 81682fe Compare February 15, 2025 18:41
@awelzel
Copy link
Author

awelzel commented Feb 16, 2025

Can you make the getReadyState use the mutex as well, and stop using an atomic ?

Thanks for the feedback. Done and re-pushed.

@bsergean
Copy link
Collaborator

The change looks simple, but all the thread sanitizer checks are not red ....

@bsergean
Copy link
Collaborator

Do you have access to a mac machine ?

@bsergean
Copy link
Collaborator

Test project /Users/runner/work/IXWebSocket/IXWebSocket/build
      Start  1: IXSocketTest
 1/16 Test  #1: IXSocketTest .................................   Passed    1.15 sec
      Start  2: IXSocketConnectTest
 2/16 Test  #2: IXSocketConnectTest ..........................   Passed    3.31 sec
      Start  3: IXWebSocketServerTest
 3/16 Test  #3: IXWebSocketServerTest ........................Subprocess aborted***Exception:   6.36 sec
      Start  4: IXWebSocketTestConnectionDisconnection
 4/16 Test  #4: IXWebSocketTestConnectionDisconnection .......   Passed   18.83 sec
      Start  5: IXUrlParserTest
 5/16 Test  #5: IXUrlParserTest ..............................   Passed    0.29 sec
      Start  6: IXUnityBuildsTest
 6/16 Test  #6: IXUnityBuildsTest ............................   Passed    0.24 sec
      Start  7: IXHttpTest
 7/16 Test  #7: IXHttpTest ...................................   Passed    0.25 sec
      Start  8: IXDNSLookupTest
 8/16 Test  #8: IXDNSLookupTest ..............................   Passed    0.27 sec
      Start  9: IXWebSocketSubProtocolTest
 9/16 Test  #9: IXWebSocketSubProtocolTest ...................Subprocess aborted***Exception:   0.76 sec
      Start 10: IXStrCaseCompareTest
10/16 Test #10: IXStrCaseCompareTest .........................   Passed    0.27 sec
      Start 11: IXExponentialBackoffTest
11/16 Test #11: IXExponentialBackoffTest .....................   Passed    0.26 sec
      Start 12: IXWebSocketCloseTest
12/16 Test #12: IXWebSocketCloseTest .........................Subprocess aborted***Exception:   8.43 sec
      Start 13: IXWebSocketHostTest
13/16 Test #13: IXWebSocketHostTest ..........................Subprocess aborted***Exception:   1.47 sec
      Start 14: IXHttpServerTest
14/16 Test #14: IXHttpServerTest .............................   Passed    0.51 sec
      Start 15: IXWebSocketChatTest
15/16 Test #15: IXWebSocketChatTest ..........................Subprocess aborted***Exception:   3.80 sec
      Start 16: IXWebSocketPerMessageDeflateCompressorTest
16/16 Test #16: IXWebSocketPerMessageDeflateCompressorTest ...   Passed    0.30 sec
Errors while running CTest
Output from these tests are in: /Users/runner/work/IXWebSocket/IXWebSocket/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

69% tests passed, 5 tests failed out of 16

Total Test time (real) =  46.51 sec

The following tests FAILED:
	  3 - IXWebSocketServerTest (Subprocess aborted)
	  9 - IXWebSocketSubProtocolTest (Subprocess aborted)
	 12 - IXWebSocketCloseTest (Subprocess aborted)
	 13 - IXWebSocketHostTest (Subprocess aborted)
	 15 - IXWebSocketChatTest (Subprocess aborted)

@awelzel
Copy link
Author

awelzel commented Feb 21, 2025

Do you have access to a mac machine ?

It's failing on Linux, too. Sorry, it's due to unprotected _readyState accesses in IXWebSocketTransport.cpp. I've replaced them all with call's to the getReadyState() method and switched the mutex to a recursive mutex. I've pushed that as a follow-up, happy to squash it in if you're happy with it. See below for my thoughts though.

Having a mutex AND an atomic seems weird.

Looking at the new diff, going back might be an option IMO and not all that weird: The mutex prevents concurrent execution of setCloseState() - the main culprit of #539 - while reading / writing of _readyState is separate and using an std::atomic not all that unreasonable to me.

@bsergean
Copy link
Collaborator

bsergean commented Mar 6, 2025

Alright then ... let's go back to your original idea, but do you mind adding a small comment explaining why we're doing things that way ?

When setReadyState(CLOSED) is executed from two different threads (the
server's thread, detecting a close trying to receive and a separate
sending thread), there is a race where both see _readyState as non-CLOSED
and both execute the _onCloseCallback().  Worse, the server's thread might
be returning from ->run(), unsetting the callback while the other thread
is still about to call the _onCloseCallback(), resulting in a crash
rather than "just" a duplicate invocation of the callback.

This change ensures that setReadyState() *and* the_onCloseCallback()
are executed by a single thread till completion, by introducing a new
_setReadyStateMutex instance held during setReadyState() execution.
@awelzel awelzel force-pushed the topic/awelzel/fix-set-ready-state-race branch from ba0b2dc to e86e61b Compare March 9, 2025 15:47
@awelzel
Copy link
Author

awelzel commented Mar 9, 2025

Alright then ... let's go back to your original idea, but do you mind adding a small comment explaining why we're doing things that way ?

Done. Feel free to word-smith during/after the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race when closing in setReadyState()
2 participants