Saturday, March 15, 2014

Signal/GC-safe cross-thread queueing in Python

I've just released a new version of Crochet, and one of the bugs fixed involves an interesting problem: reentrancy. In this particular case I'm talking about garbage collection and signal reentrancy - any function your Python program is running may be interrupted at any time (on a bytecode boundary) to do garbage collection or handle a signal. A signal handler can run arbitrary Python code, as can GC via __del__ methods or weakref callbacks. Once that code finishes running, control returns to the original location in the code.

Unfortunately, due to a bug in Python, Queue.put() can deadlock in the following situation:
  1. As part of calling Queue.put(), a thread acquires the Queue's lock. This lock does not support being acquired more than once by the same thread.
  2. GC or a signal handler interrupts the function call.
  3. If the GC or signal handler code then also does Queue.put(), it will try to acquire the lock again... and since it's already locked it blocks waiting for the lock to be released.
  4. Since the signal handler/GC code is now blocked, control is never returned to the original code, so the lock is never released there.
  5. The thread is now deadlocked and will never recover.
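The failure mode above can be probed without actually hanging the interpreter. The Queue's internal lock (its mutex attribute) is a plain non-reentrant lock, so a second acquisition by the same thread can never succeed. A minimal standalone sketch (not Crochet's code, and using the Python 3 module name):

```python
import queue  # the Queue module, renamed in Python 3

q = queue.Queue()

# Step 1: hold the Queue's internal, non-reentrant lock, exactly as
# Queue.put() does while it appends an item.
with q.mutex:
    # Step 3: a signal handler or __del__ that called q.put() here would
    # block forever on this same lock. We probe with a non-blocking
    # acquire instead of actually deadlocking:
    acquired = q.mutex.acquire(False)
    if acquired:
        q.mutex.release()

print("reentrant acquire would succeed?", acquired)  # False
```

The False result is the deadlock in miniature: the handler's blocking acquire can never return, and neither can the code it interrupted.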
Unfortunately there was no way to prevent the Queue.put() in GC: the Queue accepts log messages, and this particular logging message was caused by GC in code that is not under Crochet's control.

The obvious short-term solution is to reimplement a simplified Queue using Python's RLock, which allows the same thread to acquire the lock multiple times. But... RLock isn't reentrancy safe either, due to another bug in Python! I could wrap OS-specific reentrant lock implementations, but that's a bigger project than I want to start.
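For reference, here is what RLock's reentrancy buys you, and why it initially looks like a fix (a standalone sketch, not Crochet's code):

```python
import threading

lock = threading.RLock()

# The same thread can acquire an RLock as many times as it likes,
# as long as it releases it the same number of times:
with lock:
    reacquired = lock.acquire(False)  # succeeds, unlike a plain Lock
    if reacquired:
        lock.release()

print(reacquired)  # True
```

The catch, as noted above, is that in the affected Python versions RLock's own bookkeeping can itself be interrupted at the wrong moment, so it doesn't actually close the hole.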

The solution I ended up with (suggested by Jean-Paul Calderone, I believe) was giving up on using threading primitives to communicate across threads. Instead I used the self-pipe trick. That is, the thread uses select() (or poll() or epoll()) to wait on one end of a pipe; to wake the thread up and tell it to check for new messages to process, we simply write a byte to the other end of the pipe. Since Crochet uses Twisted, I had a pre-written event loop that already implemented self-pipe waking, and now the logging thread runs another Twisted reactor in addition to the regular reactor thread.
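Here's the shape of the self-pipe trick in miniature, without Twisted (names and structure are illustrative, not Crochet's actual implementation). The key property is that the producer side only appends to a deque and writes one byte to a pipe, neither of which can block on a lock the interrupted code might already hold:

```python
import os
import select
import threading
from collections import deque

read_fd, write_fd = os.pipe()
messages = deque()  # deque appends/pops are atomic under the GIL
results = []

def consumer():
    while True:
        select.select([read_fd], [], [])  # sleep until a byte arrives
        os.read(read_fd, 4096)            # drain the wake-up bytes
        while messages:
            msg = messages.popleft()
            if msg is None:               # sentinel: shut down
                return
            results.append(msg)

def send(msg):
    # Safe to call from a signal handler or __del__: no lock is taken,
    # so re-entering send() mid-send() cannot deadlock.
    messages.append(msg)
    os.write(write_fd, b"x")  # wake the consumer's select()

t = threading.Thread(target=consumer)
t.start()
send("log line 1")
send("log line 2")
send(None)
t.join()
print(results)  # ['log line 1', 'log line 2']
```

In Crochet's case the consumer loop is a full Twisted reactor rather than this hand-rolled select() loop, but the wake-up mechanism is the same.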

As far as I can tell this works, but it feels a bit like overkill. I'd welcome suggestions for other solutions.

Sunday, March 9, 2014

Twisted on Python 3 now pip installable

The subset of Twisted that has been ported to Python 3 can now be pip installed. By either pointing at a version control URL or requiring Twisted 14.0 (once it's released), you can now have Twisted as a dependency for your Python 3 packages.

Here's a slightly edited version of my Travis-CI config for Crochet, demonstrating how I run unit tests on both Python 2 and Python 3 versions of Twisted (slightly tricky because the trial test runner hasn't been ported yet):

language: python

env:
  - TWISTED=Twisted==13.0 RUNTESTS=trial
  - TWISTED=Twisted==13.1 RUNTESTS=trial
  - TWISTED=Twisted==13.2 RUNTESTS=trial

python:
  - 2.6
  - 2.7
  - pypy

matrix:
  include:
    - python: 3.3
      env: TWISTED=git+ RUNTESTS="python -m unittest discover"

install:
  - pip install -q --no-use-wheel $TWISTED --use-mirrors
  - python setup.py -q install

script: $RUNTESTS crochet.tests