Monday, April 14, 2014

Crochet 1.2.0, now with a better API!

Crochet is a library for using Twisted more easily from blocking programs and libraries. The latest version, released here at PyCon 2014, includes a much improved API for calling into Twisted from threads. In particular, a timeout is passed in: if it is hit, the underlying operation is cancelled and an exception is raised. Not all APIs in Twisted support cancellation, but for those that do (or APIs you implement) this is a really nice feature. You get high-level timeouts (instead of blocking sockets' timeout-per-socket-operation) and automatic cleanup of resources if something takes too long.

#!/usr/bin/python
"""
Do a DNS lookup using Twisted's APIs.
"""
from __future__ import print_function

# The Twisted code we'll be using:
from twisted.names import client

from crochet import setup, wait_for
setup()


# Crochet layer, wrapping Twisted's DNS library in a blocking call.
@wait_for(timeout=5.0)
def gethostbyname(name):
    """Lookup the IP of a given hostname.

    Unlike socket.gethostbyname() which can take an arbitrary amount
    of time to finish, this function will raise crochet.TimeoutError
    if more than 5 seconds elapse without an answer being received.
    """
    d = client.lookupAddress(name)
    d.addCallback(lambda result: result[0][0].payload.dottedQuad())
    return d


if __name__ == '__main__':
    # Application code using the public API - notice it works in a normal
    # blocking manner, with no event loop visible:
    import sys
    name = sys.argv[1]
    ip = gethostbyname(name)
    print(name, "->", ip)

Saturday, March 15, 2014

Signal/GC-safe cross-thread queueing in Python

I've just released a new version of Crochet, and one of the bugs fixed involves an interesting problem: reentrancy. In this particular case I'm talking about garbage collection and signal reentrancy. Any function your Python program is running may be interrupted at any time (on bytecode boundaries) to do garbage collection or handle a signal. A signal handler can run arbitrary Python code, as can GC via __del__ or weakref callbacks. Once that code finishes running, control is returned to the original location in the code.
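To make that concrete, here is a minimal sketch (not code from Crochet) of the garbage collector running arbitrary Python code in the middle of an ordinary call. The names `Node`, `events`, and `on_collected` are made up for illustration, and the sketch assumes CPython's cyclic garbage collector.

```python
import gc
import weakref

events = []


class Node(object):
    """Hypothetical object that ends up in a reference cycle."""


def on_collected(ref):
    # Arbitrary Python code, invoked by the garbage collector.
    events.append("weakref callback ran")


gc.disable()  # Keep automatic collection from firing before the explicit call.
try:
    node = Node()
    node.self_ref = node  # Reference cycle: only the cyclic GC can free it.
    watcher = weakref.ref(node, on_collected)
    del node  # The cycle is now unreachable garbage.

    events.append("before collect")
    gc.collect()  # The callback runs inside this call.
    events.append("after collect")
finally:
    gc.enable()

print(events)
```

In a real program the collection is triggered automatically by an allocation somewhere, so the callback can land between any two bytecodes of whatever function happens to be running.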

Unfortunately, due to a bug in Python, Queue.put() can deadlock in the following situation:
  1. As part of calling Queue.put(), a thread acquires the Queue's lock. This lock does not support being acquired more than once by the same thread.
  2. GC or a signal handler interrupts the function call.
  3. If the GC or signal handler code then also does Queue.put(), it will try to acquire the lock again... and since it's already locked it blocks waiting for the lock to be released.
  4. Since the signal handler/GC code is now blocked, control is never returned to the original code, so the lock is never released there.
  5. The thread is now deadlocked and will never recover.
Unfortunately there was no way to prevent the Queue.put() in GC; the Queue accepts log messages, and this is a GC-caused logging message coming out of code that is not under Crochet's control.
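The steps above can be imitated with a toy sketch. This is not the real Queue implementation: `put`, `reentrant_put`, and the `interrupt` parameter are hypothetical stand-ins, and the interruption is simulated with an explicit callback rather than an actual signal or GC pass. The non-blocking acquire is there only so the example reports the problem instead of hanging.

```python
import threading

lock = threading.Lock()  # Non-reentrant, like the lock guarding a Queue.


def put(item, interrupt=None):
    """Toy stand-in for Queue.put(): hold the lock while working."""
    with lock:
        if interrupt is not None:
            # Simulate GC or a signal handler running in the middle of put().
            return interrupt()
    return "stored %r" % (item,)


def reentrant_put():
    # A blocking acquire here would wait forever, because the only code
    # that can release the lock is the put() call we interrupted.
    acquired = lock.acquire(False)
    if acquired:
        lock.release()
        return "reentrant acquire succeeded"
    return "reentrant acquire would block"


outcome = put("log message", interrupt=reentrant_put)
print(outcome)
```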

The obvious short-term solution is to reimplement a simplified Queue using Python's RLock, which allows the same thread to acquire the lock multiple times. But... RLock isn't reentrancy safe either due to another bug in Python! I could wrap OS-specific reentrant lock implementations, but that's a bigger project than I want to start.

The solution I ended up with (suggested by Jean-Paul Calderone I believe) was giving up on using threading primitives to communicate across threads. Instead I used the self-pipe trick. That is, the thread uses select() (or poll() or epoll()) to wait on one end of the pipe; to wake the thread up and tell it to check for new messages to process we simply write a byte to the other end of the pipe. Since Crochet uses Twisted, I had a pre-written event loop that already implemented self-pipe waking, and now the logging thread runs another Twisted reactor in addition to the regular reactor thread.
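For readers who have not met the self-pipe trick, here is a minimal standalone sketch of the idea. It is not Crochet's implementation (which reuses Twisted's reactor); `SelfPipeQueue`, `_STOP`, and `handler` are hypothetical names, and the sketch assumes a POSIX system where select() works on pipes.

```python
import collections
import os
import select
import threading

_STOP = object()  # Sentinel telling the worker thread to exit.


class SelfPipeQueue(object):
    """Hypothetical cross-thread queue whose put() acquires no lock."""

    def __init__(self, handler):
        self._handler = handler
        self._messages = collections.deque()
        self._read_fd, self._write_fd = os.pipe()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def put(self, message):
        # deque.append() is atomic in CPython and os.write() is a plain
        # system call, so a put() that interrupts another put() has no
        # lock to wait on. Caveat: os.write() can block once the pipe
        # buffer fills up; a fuller version would make this end non-blocking.
        self._messages.append(message)
        os.write(self._write_fd, b"x")

    def stop(self):
        self.put(_STOP)
        self._thread.join()
        os.close(self._read_fd)
        os.close(self._write_fd)

    def _run(self):
        while True:
            # Sleep until some thread writes a wake-up byte.
            select.select([self._read_fd], [], [])
            os.read(self._read_fd, 4096)  # Drain pending wake-up bytes.
            while self._messages:
                message = self._messages.popleft()
                if message is _STOP:
                    return
                self._handler(message)


received = []
queue = SelfPipeQueue(received.append)
for number in range(3):
    queue.put(number)
queue.stop()
print(received)
```

The message is appended before the byte is written, so by the time the worker wakes up there is always something for it to find.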

As far as I can tell this works, but it feels a bit like overkill. I'd welcome suggestions for other solutions.

Sunday, March 9, 2014

Twisted on Python 3 now pip installable

The subset of Twisted that has been ported to Python 3 can now be pip installed. By either pointing at a version control URL or requiring Twisted 14.0 (once it's released), you can now have Twisted as a dependency for your Python 3 packages.

Here's a slightly edited version of my Travis-CI config for Crochet, demonstrating how I run unit tests on both Python 2 and Python 3 versions of Twisted (slightly tricky because the trial test runner hasn't been ported yet):

language: python

env:
  - TWISTED=Twisted==13.0 RUNTESTS=trial
  - TWISTED=Twisted==13.1 RUNTESTS=trial
  - TWISTED=Twisted==13.2 RUNTESTS=trial

python:
  - 2.6
  - 2.7
  - pypy

matrix:
  include:
    - python: 3.3
      env: TWISTED=git+https://github.com/twisted/twisted.git RUNTESTS="python -m unittest discover"

install:
  - pip install -q --no-use-wheel $TWISTED --use-mirrors
  - python setup.py -q install

script: $RUNTESTS crochet.tests

Saturday, October 19, 2013

Announcing Crochet v1.0: Use Twisted Anywhere!

It's been a busy 6 months since I first released Crochet, and now it's up to v1.0. Along the way I've expanded the documentation quite a bit and moved it to Sphinx, fixed a whole bunch of bug reports from users, added some new APIs and probably introduced some new bugs. What is Crochet, you ask?

Crochet is an MIT-licensed library that makes it easier for blocking or threaded applications like Flask or Django to use the Twisted networking framework. Crochet provides the following features:

  • Runs Twisted's reactor in a thread it manages.
  • The reactor shuts down automatically when the process' main thread finishes.
  • Hooks up Twisted's log system to the Python standard library logging framework. Unlike Twisted's built-in logging bridge, this includes support for blocking Handler instances.
  • A blocking API to eventual results (i.e. Deferred instances). This last feature can be used separately, so Crochet is also useful for normal Twisted applications that use threads.
You can download Crochet at: http://pypi.python.org/pypi/crochet

Documentation can be found on Read The Docs.

Bugs and feature requests should be filed at the project Github page.

Monday, June 10, 2013

Available for Python Consulting

Need someone to write high-quality, well-tested and robust Python code? If you need some Python or Twisted development done (or some other language, for that matter), I now have some free time in my schedule. You can reach me at itamar@futurefoundries.com.

Friday, May 24, 2013

Announcing Crochet 0.7: Easily use Twisted from threaded applications

Crochet is an MIT-licensed library that makes it easier for threaded applications like Flask or Django to use the Twisted networking framework. Features include:
  • Runs Twisted's reactor in a thread it manages.
  • Hooks up Twisted's log system to the Python standard library logging framework. Unlike Twisted's built-in logging bridge, this includes support for blocking logging.Handler instances.
  • Provides a blocking API to eventual results (i.e. Deferred instances).
This release includes better documentation and API improvements, as well as better error reporting.
You can see some examples, read the documentation, and download the package at:

https://pypi.python.org/pypi/crochet

For those of you who have seen Crochet before, I'd like to feature a new example. In the following code you can see how Twisted and Crochet allow you to download information in the background every few seconds and then cache it, so that the request handler for your web application is not slowed down retrieving the information:

"""
An example of scheduling time-based events in the background.

Download the latest EUR/USD exchange rate from Yahoo every 30
seconds in the background; the rendered Flask web page can use
the latest value without having to do the request itself.

Note that this example is for demonstration purposes only, and
is not actually used in the real world. You should not do this
in a real application without reading Yahoo's terms-of-service
and following them.
"""

from flask import Flask

from twisted.internet.task import LoopingCall
from twisted.web.client import getPage
from twisted.python import log

from crochet import run_in_reactor, setup
setup()


class ExchangeRate(object):
    """
    Download an exchange rate from Yahoo Finance using Twisted.
    """

    def __init__(self, name):
        self._value = None
        self._name = name

    # External API:
    def latest_value(self):
        """
        Return the latest exchange rate value.

        May be None if no value is available.
        """
        return self._value

    @run_in_reactor
    def start(self):
        """
        Start the background process.
        """
        self._lc = LoopingCall(self._download)
        # Run immediately, and then every 30 seconds:
        self._lc.start(30, now=True)

    def _download(self):
        """
        Do an actual download, runs in Twisted thread.
        """
        print("Downloading!")
        def parse(result):
            print("Got %r back from Yahoo." % (result,))
            values = result.strip().split(",")
            self._value = float(values[1])
        d = getPage(
            "http://download.finance.yahoo.com/d/quotes.csv?e=.csv&f=c4l1&s=%s=X"
            % (self._name,))
        d.addCallback(parse)
        d.addErrback(log.err)
        return d


# Start background download:
EURUSD = ExchangeRate("EURUSD")
EURUSD.start()


# Flask application:
app = Flask(__name__)

@app.route('/')
def index():
    rate = EURUSD.latest_value()
    if rate is None:
        rate = "unavailable, please refresh the page"
    return "Current EUR/USD exchange rate is %s." % (rate,)


if __name__ == '__main__':
    import sys, logging
    logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
    app.run()

Thursday, April 25, 2013

Unittesting With Localized Patching

In my previous two posts, I gave examples of alternatives to patching in unittests: class-based and function-based parameterization. Nick Coghlan pointed out that my example of patching was a little bit of a strawman argument - the most global (and therefore the most side-effecty) way of doing patching. Here's what he wrote:
While your point about the risks of patching destructive calls is valid, if you're going to decry the practice of using mocks in tests, at least decry a version which uses them properly. In your first example you are patching the wrong module - you shouldn't patch os._exit (with potentially non-local effects), you should patch the module under test so that *in that module only*, the reference "os._exit" resolves to your patched function.

Most functions under test *aren't* destructive (so you'll get the expected test result failure), and by jumping straight to dependency injection in cases where you don't need it, you can end up adding a huge amount of complexity to your production code without adequate reason *and* give yourself additional code paths to test in the process. Dependency injection should be used only if there is a *production* related reason for adding it (and "this function is destructive, so we should use dependency injection rather than mocking to test it" is a valid reason).

For those non-destructive cases, you can avoid most of the non-local effects without adding complexity to the production code by localising your mock operation to as narrow a target as possible.
The version of patching he suggests is definitely a lot better than my initial example, so let's take a look. First, the module we're going to test, saved as exitersketch.py so the tests can import it:
import os

def exit_with_result(function):
    result = function()
    if result:
        os._exit(0)
    else:
        os._exit(1)
And now the patch-based tests, based on example code from Nick Coghlan (any mistakes were added by me):

import unittest
# Note that we don't import os, because we're not touching it!

import exitersketch

class FakeOS:
    EXIT_NOT_CALLED = object()
    CALLED_WITH_DEFAULT = object()

    def __init__(self, module):
        self.module = module
        self.exit_code = self.EXIT_NOT_CALLED

    def _exit(self, code=CALLED_WITH_DEFAULT):
        self.exit_code = code

    def __getattr__(self, attr):
        return getattr(self.original_os, attr)

    def __enter__(self):
        self.original_os = self.module.os
        self.module.os = self
        return self

    def __exit__(self, *args):
        self.module.os = self.original_os


class ExiterTests(unittest.TestCase):

    def test_exiter_success(self):
        with FakeOS(exitersketch) as fake:
            exitersketch.exit_with_result(lambda: True)
        self.assertEqual(fake.exit_code, 0)

    def test_exiter_failure(self):
        with FakeOS(exitersketch) as fake:
            exitersketch.exit_with_result(lambda: False)
        self.assertEqual(fake.exit_code, 1)


if __name__ == '__main__':
    unittest.main()
This version is definitely a superior form of patching: only one module's view is impacted. Notice also the use of __getattr__ to ensure overriding exitersketch.os only overrides os._exit and not other parts of the os module. Nonetheless, it still suffers from the inherent problem of patching: it's overriding more state than necessary. Thus it's still possible to call destructive functions by mistake if you rearrange your imports. For non-destructive functions it's still possible to have a test unexpectedly call a patched function, albeit only from code in the same module rather than globally. If you are going to use patching, though, making the patching as local as possible is definitely the way to go.
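The point about rearranging imports can be shown with a small, hedged sketch. Everything here is made up for illustration: the two in-memory modules, the `make_module` helper, and the `FakeOS` class, and the harmless `os.getcwd()` stands in for a destructive call like `os._exit()`.

```python
import os
import types


def make_module(name, source):
    """Hypothetical helper: build a throwaway module from source text."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Looks up "os" in its own namespace every time the function is called.
via_module = make_module("via_module", """
import os
def current_dir():
    return os.getcwd()
""")

# Binds the real function directly at import time; no "os" lookup later.
direct = make_module("direct", """
from os import getcwd
def current_dir():
    return getcwd()
""")


class FakeOS(object):
    def getcwd(self):
        return "/fake"


# Localized patching: replace the module's own "os" name.
via_module.os = FakeOS()
direct.os = FakeOS()

patched_result = via_module.current_dir()
unpatched_result = direct.current_dir()
print(patched_result, unpatched_result)
```

The first module sees the fake, while the second still calls the real function even though the same patch was applied. A refactoring from `import os` to `from os import ...` would therefore silently disable the patch.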