Versions:
Reproduction
The following script will hang when run with `PYTHONBREAKPOINT=ipdb.set_trace`:
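A minimal sketch of such a script, assuming the CloudWatch handler in question is watchtower's `CloudWatchLogHandler` (an assumption; adjust for the actual handler in use):

```python
import logging

import watchtower  # assumption: the CloudWatch logging handler under discussion

logger = logging.getLogger("repro")
logger.setLevel(logging.INFO)
# CloudWatchLogHandler starts a background thread that drains an internal
# queue of log records to CloudWatch Logs.
logger.addHandler(watchtower.CloudWatchLogHandler())

logger.info("a record for the handler thread to deliver")

# With PYTHONBREAKPOINT=ipdb.set_trace, this imports ipdb, whose import-time
# side effects (described under "What is going on?") deadlock the process.
breakpoint()
```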
The running process has the following stacktraces:
What is going on?
The top thread is the handler thread dequeuing messages, and the bottom is the main thread.
The main thread is waiting for the queue to be emptied, and the handler thread is waiting to acquire the global lock in the `logging` module, which is already held by `DictConfigurator.configure` in the main thread.
The handler thread has to acquire the global lock in `Logger.isEnabledFor` because importing `ipdb`, as a side effect, calls `Logger.setLevel(…)`, which invalidates the enabled-level cache of all `Logger` instances, and then calls `logging.config.dictConfig`, which tries to shut down all existing handlers.
This can also be reproduced if `breakpoint()` is replaced by the equivalent calls directly; a sketch follows this paragraph.
Either way, the result is a pretty straightforward deadlock: the handler thread is not able to actually send the logs to CloudWatch, so it is never able to empty the queue.
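A hedged sketch of such a replacement, mirroring the two side effects attributed to the `ipdb` import above; the logger name and the config dict are placeholders:

```python
import logging
import logging.config

# Mimic the import-time side effects described above: a setLevel() call
# (which clears the enabled-level cache of every Logger) followed by a
# dictConfig() call (which shuts down existing handlers while holding the
# logging module's global lock).
logging.getLogger("any.logger").setLevel(logging.DEBUG)  # placeholder logger name
logging.config.dictConfig({"version": 1})                # placeholder minimal config
```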
Solutions?
The current behavior of `flush` is that it waits for the queue to be empty using `queue.Queue.join`.
This doesn't take a timeout; however, it is implemented as:
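For reference, the body of `queue.Queue.join` in CPython is essentially:

```python
def join(self):
    # CPython's Lib/queue.py (docstring elided): block until every item that
    # has been put() into the queue is marked done via task_done().
    with self.all_tasks_done:
        while self.unfinished_tasks:
            self.all_tasks_done.wait()
```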
and `queue.Queue.all_tasks_done` is a `threading.Condition`, and `threading.Condition.wait` does take a timeout, so it is possible to implement a `join` that can time out as well (sketched below).
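A minimal sketch of such a timeout-aware join, reusing the queue's existing `all_tasks_done` condition and `unfinished_tasks` counter (the function name is mine):

```python
import queue
import time

def join_with_timeout(q: queue.Queue, timeout: float) -> bool:
    """Like Queue.join(), but give up after `timeout` seconds.

    Returns True if every queued item was processed, False on timeout.
    """
    deadline = time.monotonic() + timeout
    with q.all_tasks_done:
        while q.unfinished_tasks:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            q.all_tasks_done.wait(remaining)
    return True
```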
`flush` could also be modified to send `(self.FLUSH, flush_condition)` to each queue and then call `flush_condition.wait(timeout)` to wait for that specific flush message to have been processed (see the sketch below).
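A rough sketch of that variant, using a `threading.Event` per call instead of a bare `threading.Condition` for simplicity; `self.queues` and `self.FLUSH` here are assumptions about the handler's internals:

```python
import threading
import time

def flush(self, timeout=None):
    # Sketch: put a per-call flush marker on every stream queue and wait only
    # for those markers to be processed, instead of waiting for the queues to
    # drain completely.
    deadline = None if timeout is None else time.monotonic() + timeout
    markers = []
    for q in self.queues.values():      # assumed: one queue per log stream
        done = threading.Event()
        q.put((self.FLUSH, done))       # self.FLUSH: the existing flush sentinel
        markers.append(done)
    for done in markers:
        remaining = None if deadline is None else max(0.0, deadline - time.monotonic())
        done.wait(remaining)

# Worker-thread side (sketch): on dequeuing a (FLUSH, event) pair, deliver any
# pending batch for that stream, then call event.set().
```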
This changes the behavior of `flush` under load, though, from waiting until the queue is empty to waiting until a specific flush is processed: if `flush` is called from multiple threads, each would only wait until its own flush message was processed and then proceed.
But this might be the more straightforward behavior, as the current one could be counterintuitive: given a queue state of `msg1, msg2, msg3, flush, msg4, msg5`, `flush` blocks until `msg5` is delivered.
Once `flush`/`close` are able to time out at all, they could also be taught to check whether `self.sequence_tokens[log_stream_name]` has changed in the timeout case, allowing them to keep waiting as long as batches are still able to be submitted. A rough sketch of that follows.
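A hedged sketch of that progress check, building on the hypothetical `join_with_timeout` helper above; `self.queues` and the exact shape of `self.sequence_tokens` are assumptions about the handler's internals:

```python
def flush(self, timeout=30.0):
    # Sketch: wait in timeout-sized slices, and keep waiting only while at
    # least one batch was accepted (i.e. some sequence token changed) during
    # the previous slice.
    last_tokens = dict(self.sequence_tokens)
    for q in self.queues.values():                 # assumed: one queue per log stream
        while not join_with_timeout(q, timeout):   # hypothetical helper sketched earlier
            current = dict(self.sequence_tokens)
            if current == last_tokens:
                # No batch was submitted during the wait; give up instead of
                # blocking forever.
                return False
            last_tokens = current
    return True
```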