Monday, April 14, 2014

Crochet 1.2.0, now with a better API!

Crochet is a library for using Twisted more easily from blocking programs and libraries. The latest version, released here at PyCon 2014, includes a much improved API for calling into Twisted from threads. In particular, a timeout is passed in - if it is hit the underlying operation is cancelled, and an exception is raised. Not all APIs in Twisted support cancellation, but for those that do (or APIs you implement) this is a really nice feature. You get high level timeouts (instead of blocking sockets' timeout-per-socket-operation) and automatic cleanup of resources if something takes too long.

Do a DNS lookup using Twisted's APIs.
from __future__ import print_function

# The Twisted code we'll be using:
from twisted.names import client

from crochet import setup, wait_for

# Crochet layer, wrapping Twisted's DNS library in a blocking call.
def gethostbyname(name):
    """Lookup the IP of a given hostname.

    Unlike socket.gethostbyname() which can take an arbitrary amount
    of time to finish, this function will raise crochet.TimeoutError
    if more than 5 seconds elapse without an answer being received.
    d = client.lookupAddress(name)
    d.addCallback(lambda result: result[0][0].payload.dottedQuad())
    return d

if __name__ == '__main__':
    # Application code using the public API - notice it works in a normal
    # blocking manner, with no event loop visible:
    import sys
    name = sys.argv[1]
    ip = gethostbyname(name)
    print(name, "->", ip)

Saturday, March 15, 2014

Signal/GC-safe cross-thread queueing in Python

I've just released a new version of Crochet, and one of the bugs fixed involves an interesting problem - reentrancy. In this particular case I'm talking about garbage collection and signal reentrancy - any function your Python program is running may be interrupted at any time (on bytecode boundaries) to do garbage collection or handle a signal. A signal handler can run arbitrary Python code, as can GC via to __del__ or weakref callbacks. Once that code finishes running control is returned to the original location in the code.

Unfortunately, due to a bug in Python, Queue.put() can deadlock in the following situation:
  1. As part of calling Queue.put(), a thread acquires the Queue's lock. This lock does not support being acquired more than once by the same thread.
  2. GC or a signal handler interrupts the function call.
  3. If the GC or signal handler code then also does Queue.put(), it will try to acquire the lock again... and since it's already locked it blocks waiting for the lock to be released.
  4. Since the signal handler/GC code is now blocked, control is never returned to original code, so lock is never released there.
  5. The thread is now deadlocked and will never recover.
Unfortunately there was no way to prevent the Queue.put() in GC; the Queue accepts log messages, and this is a GC-caused logging message coming out of code that is not under Crochet control's.

The obvious short-term solution is to reimplement a simplified Queue using Python's RLock, which allows the same thread to acquire the lock multiples times. But... RLock isn't reentrancy safe either due to another bug in Python! I could wrap OS-specific reentrant lock implementations, but that's a bigger project than I want to start.

The solution I ended up with (suggested by Jean-Paul Calderone I believe) was giving up on using threading primitives to communicate across threads. Instead I used the self-pipe trick. That is, the thread uses select() (or poll() or epoll()) to wait on one end of the pipe; to wake the thread up and tell it to check for new messages to process we simply write a byte to the other end of the pipe. Since Crochet uses Twisted, I had a pre-written event loop that already implemented self-pipe waking, and now the logging thread runs another Twisted reactor in addition to the regular reactor thread.

As far as I can tell this works, but it feels a bit like overkill. I'd welcome suggestions for other solutions.

Sunday, March 9, 2014

Twisted on Python 3 now pip installable

The subset of Twisted that has been ported to Python 3 can now be pip installed. By either pointing at a version control URL or requiring Twisted 14.0 (once it's released), you can now have Twisted as a dependency for your Python 3 packages.

Here's a slightly edited version of my Travis-CI config for Crochet, demonstrating how I run unit tests on both Python 2 and Python 3 versions of Twisted (slightly tricky because the trial test runner hasn't been ported yet):

language: python

  - TWISTED=Twisted==13.0 RUNTESTS=trial
  - TWISTED=Twisted==13.1 RUNTESTS=trial
  - TWISTED=Twisted==13.2 RUNTESTS=trial

  - 2.6
  - 2.7
  - pypy

    - python: 3.3
      env: TWISTED=git+ RUNTESTS="python -m unittest discover"

  - pip install -q --no-use-wheel $TWISTED --use-mirrors
  - python -q install

script: $RUNTESTS crochet.tests