Fix issue where deadlock can occur over logging._lock #4636
Description
We (Google) saw a deadlock occur when `logging.config.dictConfig` is called after the OTel `LoggingHandler` is attached to the root logger. This happened because `dictConfig` acquires `logging._lock` and then clears the existing handlers, which calls `shutdown` on them, including the OTel `LoggingHandler`; that in turn calls `flush`. `flush` triggered an `export` call to our exporter. Deep inside our exporter we spin up a new thread to handle auth, and that thread also tried to acquire `logging._lock`, resulting in a deadlock.

To fix this, I updated `LoggingHandler.flush` to execute `force_flush` in a separate thread and to return without blocking or waiting on it. This should be reasonable because we don't return the result of the force flush anyway, so there is no reason to block there. This seems to reliably fix the deadlock, but I think it's technically possible for this new thread to spin up and reach the lock before `logging.config.dictConfig` releases its lock.
Another simple fix is to set `flushOnClose` to `False` on the OTel `LoggingHandler`, so that its `flush` is not called during `shutdown`. This seems fine to me as well because we call `shutdown` on exit anyway. Either of these solutions seems fine.

I also considered making our exporter `async`, but we don't have support for `async` exporters in this repo yet.

Type of change
How Has This Been Tested?
I've added a unit test to show how the deadlock happens. I don't think that test should actually be submitted, because there is a chance it deadlocks and locks up the test suite.
Does This PR Require a Contrib Repo Change?
Checklist: