Thursday, April 25, 2013

Unittesting With Localized Patching

In my previous two posts, I gave examples of alternatives to patching in unittests: class-based and function-based parameterization. Nick Coghlan pointed out that my example of patching was a little bit of a strawman argument - the most global (and therefore the most side-effecty) way of doing patching. Here's what he wrote:
While your point about the risks of patching destructive calls is valid, if you're going to decry the practice of using mocks in tests, at least decry a version which uses them properly. In your first example you are patching the wrong module - you shouldn't patch os._exit (with potentially non-local effects), you should patch the module under test so that *in that module only*, the reference "os._exit" resolves to your patched function.

Most functions under test *aren't* destructive (so you'll get the expected test result failure), and by jumping straight to dependency injection in cases where you don't need it, you can end up adding a huge amount of complexity to your production code without adequate reason *and* give yourself additional code paths to test in the process. Dependency injection should be used only if there is a *production* related reason for adding it (and "this function is destructive, so we should use dependency injection rather than mocking to test it" is a valid reason).

For those non-destructive cases, you can avoid most of the non-local effects without adding complexity to the production code by localising your mock operation to as narrow a target as possible.

The version of patching he suggests is definitely a lot better than my initial example, so let's take a look. First, the module we're going to test:

import os

def exit_with_result(function):
    result = function()
    if result:
        os._exit(0)
    else:
        os._exit(1)

And now the patch-based tests, based on example code from Nick Coghlan (any mistakes were added by me):

import unittest
# Note that we don't import os, because we're not touching it!

import exitersketch

class FakeOS:
    EXIT_NOT_CALLED = object()
    CALLED_WITH_DEFAULT = object()

    def __init__(self, module):
        self.module = module
        self.exit_code = self.EXIT_NOT_CALLED

    def _exit(self, code=CALLED_WITH_DEFAULT):
        self.exit_code = code

    def __getattr__(self, attr):
        return getattr(self.original_os, attr)

    def __enter__(self):
        self.original_os = self.module.os
        self.module.os = self
        return self

    def __exit__(self, *args):
        self.module.os = self.original_os


class ExiterTests(unittest.TestCase):

    def test_exiter_success(self):
        with FakeOS(exitersketch) as fake:
            exitersketch.exit_with_result(lambda: True)
        self.assertEqual(fake.exit_code, 0)

    def test_exiter_failure(self):
        with FakeOS(exitersketch) as fake:
            exitersketch.exit_with_result(lambda: False)
        self.assertEqual(fake.exit_code, 1)


if __name__ == '__main__':
    unittest.main()

This version is definitely a superior form of patching: only one module's view is impacted. Notice also the use of __getattr__ to ensure overriding exitersketch.os only overrides os._exit and not other parts of the os module. Nonetheless, it still suffers from the inherent problem of patching: it's overriding more state than necessary. Thus it's still possible to call destructive functions by mistake if you rearrange your imports. For non-destructive functions it's still possible to have a test unexpectedly call a patched function, albeit only from code in the same module rather than globally. If you are going to use patching, though, making the patching as local as possible is definitely the way to go.
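
For comparison, here is a rough sketch of the same localized patch using the standard library's unittest.mock (Python 3.3 and later). It is not Nick's code. To keep the snippet self-contained, the exitersketch module is rebuilt in-line with types.ModuleType, which is an assumption made purely for illustration. The stand-in os object wraps the real module, much like the __getattr__ trick above, and only _exit is replaced.

```python
import os
import types
from unittest import mock

# Stand-in for the exitersketch module, built in-line so this sketch is
# self-contained (an assumption made for illustration only).
exitersketch = types.ModuleType("exitersketch")
exitersketch.os = os
exec(
    "def exit_with_result(function):\n"
    "    if function():\n"
    "        os._exit(0)\n"
    "    else:\n"
    "        os._exit(1)\n",
    exitersketch.__dict__,
)


def run_with_fake_os(function):
    # wraps=os forwards every attribute we do not override to the real os.
    fake_os = mock.Mock(wraps=os)
    # Replace only the destructive call with a plain, non-wrapping Mock.
    fake_os._exit = mock.Mock()
    # Only exitersketch's view of "os" changes; the real os is untouched.
    with mock.patch.object(exitersketch, "os", fake_os):
        exitersketch.exit_with_result(function)
    return fake_os._exit


fake_exit = run_with_fake_os(lambda: True)
fake_exit.assert_called_once_with(0)
```

Once the with block exits, exitersketch.os is the real os module again, so the patch cannot leak into later tests.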

Monday, April 22, 2013

Unittesting Without Patching: A Followup

I got a couple of questions about my previous post asking why I didn't show the simpler, function-based style of parameterization. This style does make unittesting possible, and with less complexity than creating a new class:

import os

def exit_with_result(function, _exit=os._exit):
    result = function()
    if result:
        _exit(0)
    else:
        _exit(1)

The problem is that when you add arguments to a function, the parameterization leaks into your public API. This means that:
  • You need to document the fact that these extra arguments (e.g. _exit in the example above) should not be used.
  • *args and **kwargs can't be used at all.
  • Changing the function signature later on can be more difficult.
  • If you have large numbers of things you need to parameterize, the function definition gets pretty long and ugly.
In the class style, in contrast, the public API is not affected by the need to unittest.

What's more, you will often have a group of related functions using the same modules, functions or classes. By grouping them in a class, you can implement the parameterization hook once, rather than for every function. You can see an example of this in Crochet (specifically the Eventloop class), where the parameterized reactor is used by multiple functions. If the code you need to parameterize is already a method, setting the parameters in __init__ or as a class attribute is even more attractive, requiring only minimal additional complexity.

Update: If you go with this style of parameterization, you still need to assert that in the default case it actually calls the correct function (e.g. os._exit for exit_with_result). Probably the nicest way to do so is to use inspect.getcallargs.
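
A minimal sketch of that check might look like the following. The helper name default_exit_function is made up for this example. Note that current Python documentation marks inspect.getcallargs as deprecated in favour of inspect.signature, although the function is still available.

```python
import inspect
import os


def exit_with_result(function, _exit=os._exit):
    result = function()
    if result:
        _exit(0)
    else:
        _exit(1)


def default_exit_function():
    # getcallargs maps arguments to parameter names without calling the
    # function, so the real os._exit is never invoked here.
    bound = inspect.getcallargs(exit_with_result, lambda: True)
    return bound["_exit"]


assert default_exit_function() is os._exit
```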

Thursday, April 18, 2013

Unittesting Without Patching

Python has the power to override any attribute on any module or class, but just because you can doesn't mean you should. This is true in regular code, but just as true of unittests. Many testing libraries (mock, Twisted's trial, py.test) provide facilities for overriding some piece of global state; you can also do so manually. Occasionally these facilities prove invaluable, but often they are used unnecessarily. Better alternatives are available.

Before I explain why patching is problematic, let's look at an example. Consider the following module:

import os

def exit_with_result(function):
    result = function()
    if result:
        os._exit(0)
    else:
        os._exit(1)

On the face of it, patching is necessary to test this example. The tests would look something like this:

import unittest
import os

from exitersketch import exit_with_result


class ExiterTests(unittest.TestCase):
    def setUp(self):
        self.exited = None
        self.originalExit = os._exit
        os._exit = self.fakeExit

    def fakeExit(self, code=0):
        self.exited = code

    def tearDown(self):
        os._exit = self.originalExit

    def test_exiter_success(self):
        exit_with_result(lambda: True)
        self.assertEqual(self.exited, 0)

    def test_exiter_failure(self):
        exit_with_result(lambda: False)
        self.assertEqual(self.exited, 1)


if __name__ == '__main__':
    unittest.main()

Having seen patching, and seen that it works as a testing technique, why should we avoid it?

  1. Patching is fragile. If the example above changed import os to from os import _exit, the patching would need to be modified. If you forgot to modify the patching, unexpected code would run: in this case, your test run would mysteriously exit halfway through. If the function you are attempting to patch is more destructive, worse things may happen: credit cards may get charged, data may get deleted, etc.
  2. Patching leads to unexpected behaviour. Because patching is a global change, the patched code may be called not just by the function being tested, but also by any code it calls that happens to use the same patched function.
  3. Patching indicates bad design. Code should be designed to be easily testable. Having to modify global state suggests that the code is not as modular as one might hope.
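
The fragility in point 1 can be reproduced safely with a harmless stand-in. In this sketch the payments module and its charge function are invented for illustration. The two checkout functions mimic the two import styles, and only one of them sees the patch.

```python
import types

# Hypothetical dependency with a "destructive" function; the names
# payments and charge are invented for this sketch.
payments = types.ModuleType("payments")
real_charges = []
payments.charge = lambda amount: real_charges.append(amount)


# Style 1: `import payments`. The attribute is looked up on every call.
def checkout_via_module(amount):
    payments.charge(amount)


# Style 2: `from payments import charge`. The name is bound once, at import.
charge = payments.charge


def checkout_via_name(amount):
    charge(amount)


# A test patches the attribute on the payments module...
fake_charges = []
original = payments.charge
payments.charge = lambda amount: fake_charges.append(amount)
try:
    checkout_via_module(10)  # goes to the fake
    checkout_via_name(20)    # still reaches the real function
finally:
    payments.charge = original

print(fake_charges)  # [10]
print(real_charges)  # [20]
```

The second call slips past the patch because checkout_via_name holds its own reference to the original function, which is exactly what happens after a switch to from os import _exit.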

How to avoid patching? Parameterization, aka dependency injection. We refactor the code to accept the _exit function as a parameter. Notice that the public API has not changed:

import os

class _API(object):
    def __init__(self, exit):
        self.exit = exit

    def exit_with_result(self, function):
        result = function()
        if result:
            self.exit(0)  # os._exit requires an explicit status code
        else:
            self.exit(1)


_api = _API(os._exit)
exit_with_result = _api.exit_with_result

Our tests can now check both that the _API.exit_with_result method has the correct behavior in general, and that the public exit_with_result is going to call os._exit in particular.

import unittest
import os

from exiter import _api, _API, exit_with_result


class ExiterTests(unittest.TestCase):
    def setUp(self):
        self.exited = None

    def fakeExit(self, code=0):
        self.exited = code

    def test_api(self):
        self.assertIsInstance(_api, _API)
        self.assertEqual(_api.exit, os._exit)
        self.assertEqual(exit_with_result, _api.exit_with_result)

    def test_exiter_success(self):
        _API(self.fakeExit).exit_with_result(lambda: True)
        self.assertEqual(self.exited, 0)

    def test_exiter_failure(self):
        _API(self.fakeExit).exit_with_result(lambda: False)
        self.assertEqual(self.exited, 1)


if __name__ == '__main__':
    unittest.main()

The same technique is useful when you are tempted to store some state in a module. Instead, store an instance of a class:

class _Counter(object):
    def __init__(self):
        # Kept under a private name so it cannot shadow the value() method.
        self._value = 0

    def increment(self):
        self._value += 1

    def value(self):
        return self._value


_counter = _Counter()
increment = _counter.increment
value = _counter.value
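
One payoff of this layout is that each test can build its own counter instead of resetting module-level state. The sketch below assumes a counter that keeps its count in a _value attribute (a naming choice made here so that it cannot collide with the value() method). The test names are illustrative.

```python
import unittest


class _Counter(object):
    def __init__(self):
        self._value = 0

    def increment(self):
        self._value += 1

    def value(self):
        return self._value


class CounterTests(unittest.TestCase):
    def test_starts_at_zero(self):
        # A fresh instance per test: no shared state to clean up.
        self.assertEqual(_Counter().value(), 0)

    def test_increment(self):
        counter = _Counter()
        counter.increment()
        counter.increment()
        self.assertEqual(counter.value(), 2)
```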

As I've demonstrated, patching can often be avoided by restructuring code to be more testable. The same Python features that make patching so easy also make avoiding patching just as easy. Given the choice, you should avoid changing global state when testing individual components.

Friday, April 12, 2013

SSH Into Your Python Server

Have you ever wanted to see what's going on inside your Python server? With Crochet and Twisted, you can add a Python prompt to your process that is accessible via SSH, allowing you to poke around in the internals of your running program. Here's an example session with a Flask server:

$ ssh admin@localhost -p 5022
admin@localhost's password: ******
>>> app
<flask.app.Flask object at 0x28a96d0>
>>> app.url_map
Map([<Rule '/' (HEAD, OPTIONS, GET) -> index>,
 <Rule '/static/<filename>' (HEAD, OPTIONS, GET) -> static>])
>>> from twisted.internet import reactor
>>> reactor._selectables
{9: <SSHServerTransport #0 on 5022>, 3: <<class 'twisted.internet.tcp.Port'> of twisted.conch.manhole_ssh.ConchFactory on 5022>, 6: <twisted.internet.posixbase._UnixWaker object at 0x28a2510>}

The code to start the SSH server has quite a lot of boilerplate, so I filed a ticket to provide a utility function. If you're using the system Twisted, you may need to install Twisted's Conch package, e.g. apt-get install python-twisted-conch on Ubuntu.

import logging

from flask import Flask
from crochet import setup, in_reactor
setup()

# Web server:
app = Flask(__name__)

@app.route('/')
def index():
    return "Welcome to my boring web server!"


@in_reactor
def start_ssh_server(reactor, port, username, password,
                     namespace):
    """
    Start an SSH server on the given port, exposing a Python
    prompt with the given namespace.
    """
    from twisted.conch.insults import insults
    from twisted.conch import manhole, manhole_ssh
    from twisted.cred.checkers import (
        InMemoryUsernamePasswordDatabaseDontUse as MemoryDB)
    from twisted.cred.portal import Portal

    sshRealm = manhole_ssh.TerminalRealm()
    def chainedProtocolFactory():
        return insults.ServerProtocol(manhole.Manhole,
                                      namespace)
    sshRealm.chainedProtocolFactory = chainedProtocolFactory

    portal = Portal(
        sshRealm, [MemoryDB(**{username: password})])
    reactor.listenTCP(port, manhole_ssh.ConchFactory(portal),
                      interface="127.0.0.1")


if __name__ == '__main__':
    import sys
    logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
    start_ssh_server(
        5022, "admin", "secret", {"app": app}).wait()
    app.run()

Wednesday, April 10, 2013

Crochet: Background Operations for Threaded Applications

In my previous post I showed Crochet doing a blocking call against a Twisted API. In this example, you can see how Twisted and Crochet allow you to run an operation in the background. An HTTP request from a new user starts a download in the background, and a reference to it is stored in the user's session. Every time the user reloads the page, a check is made to see if the download has finished, and if so the result is displayed. You can also see the stash()/retrieve_result() API in use, which allows temporarily storing results under a key suitable for serialization in a session object.

import logging
from flask import Flask, session, escape
from crochet import setup, in_reactor, retrieve_result, TimeoutError
setup()

app = Flask(__name__)


@in_reactor
def download_page(reactor, url):
    """
    Download a page.
    """
    from twisted.web.client import getPage
    return getPage(url)


@app.route('/')
def index():
    if 'download' not in session:
        # @in_reactor functions return a DeferredResult:
        result = download_page('http://google.com')
        session['download'] = result.stash()
        return "Starting download, refresh to track progress."

    # retrieval is a one-time operation:
    result = retrieve_result(session.pop('download'))
    try:
        download = result.wait(timeout=0.1)
        return "Downloaded: " + escape(download)
    except TimeoutError:
        session['download'] = result.stash()
        return "Download in progress..."


if __name__ == '__main__':
    import os, sys
    logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
    app.secret_key = os.urandom(24)
    app.run()
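
To make the stash-and-retrieve idea above concrete, here is a small pure-Python sketch of the pattern. It is not Crochet's implementation: the ResultStore class and its method names are hypothetical. It only shows why handing out a plain key works well with sessions, and why retrieval is a one-time operation.

```python
import itertools
import threading


class ResultStore(object):
    """Hypothetical in-memory store; not Crochet's actual implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count(1)
        self._results = {}

    def stash(self, result):
        # Hand back a plain integer key, which is safe to put in a session.
        with self._lock:
            key = next(self._counter)
            self._results[key] = result
        return key

    def retrieve(self, key):
        # One-time operation: the entry is removed as it is returned.
        with self._lock:
            return self._results.pop(key)


store = ResultStore()
pending = object()  # stands in for a result that is still in progress
key = store.stash(pending)
assert store.retrieve(key) is pending
```

Because retrieval removes the entry, a caller that still needs the result later has to stash it again, as the Flask view above does when the download has not finished.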

Tuesday, April 9, 2013

Presenting Crochet: Use Twisted as a library

Twisted is an event-driven framework; by default it expects to run the reactor event loop in your main thread to drive your application. If, however, you're writing a Django or Flask application, you may want to use Twisted as just another library. Unless you choose to use Twisted as a WSGI container, this requires you to run the reactor in a thread. Today I am happy to announce Crochet, which makes using Twisted even easier in this situation.

Here's an example program that uses Crochet to call Twisted from a normal, blocking command-line tool:

from __future__ import print_function

from crochet import setup, in_reactor
setup()


@in_reactor
def mx(reactor, domain):
    """
    Return list of MX domains for a given domain.
    """
    from twisted.names.client import lookupMailExchange
    def got_records(result):
        hosts, authorities, additional = result
        return [str(record.name) for record in additional]
    d = lookupMailExchange(domain)
    d.addCallback(got_records)
    return d


def main(domain):
    print("Mail servers for %s:" % (domain,))
    for mailserver in mx(domain).wait():
        print(mailserver)


if __name__ == '__main__':
    import sys
    main(sys.argv[1])

When we run it on the command line, the output looks like this:

$ python mxquery.py gmail.com
Mail servers for gmail.com:
alt2.gmail-smtp-in.l.google.com
alt2.gmail-smtp-in.l.google.com
alt3.gmail-smtp-in.l.google.com
alt3.gmail-smtp-in.l.google.com
alt4.gmail-smtp-in.l.google.com
alt4.gmail-smtp-in.l.google.com
alt1.gmail-smtp-in.l.google.com
gmail-smtp-in.l.google.com
gmail-smtp-in.l.google.com
The library provides much more functionality, but that's the gist of it: it runs and stops the Twisted reactor for you, and wraps asynchronous results in a blocking API. If you'd like to try out Crochet, or learn more about its other features, visit Crochet's PyPI page.
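
For readers who want to see the general shape of that idea without Twisted installed, here is an analogous sketch built on the standard library's asyncio instead. This is a swapped-in technique, not how Crochet works internally. The functions slow_double and blocking_double are invented for the example. An event loop runs in a background thread, and a blocking wrapper waits for the asynchronous result.

```python
import asyncio
import threading

# Run an event loop in a background thread, leaving the main thread free
# for ordinary blocking code.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


async def slow_double(x):
    # Stand-in for a real asynchronous operation.
    await asyncio.sleep(0.01)
    return x * 2


def blocking_double(x, timeout=5.0):
    # Schedule the coroutine on the background loop, then block until the
    # result arrives or the timeout expires.
    future = asyncio.run_coroutine_threadsafe(slow_double(x), loop)
    return future.result(timeout=timeout)


result = blocking_double(21)
print(result)  # 42

# Shut the loop down cleanly once we are done with it.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```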