Tuesday, January 29, 2013

Test-Driven Development with Twisted: A PyCon 2013 Tutorial

Testing network applications is hard: the order of events is unpredictable, the passage of time is important, and the sources of errors are many. At PyCon this year I will be teaching a three-hour tutorial on test-driven development with Twisted, demonstrating how to build well-tested network applications.

(As a reminder, I'm also teaching a two-day Twisted class in San Francisco with Jean-Paul Calderone, on the Monday and Tuesday before PyCon; early bird pricing expires Feb 15th. The material in the testing tutorial is also included in the longer class.)

In the lecture part of the tutorial I will cover:
  • Testing network protocols in a deterministic manner (no need for actual TCP connections).
  • Testing the passage of time (no need to wait 2 hours in your test to prove that a timeout is hit).
  • Twisted's testing infrastructure for running the reactor and handling Deferreds.
Once that is done, students will begin a hands-on lab, implementing an HTTP server from scratch. I will provide pre-written unit tests, and students will write code to make these tests pass, with help from me (and perhaps an assistant or two, depending on the number of students).

The lessons here are implicit in the design of the tests, and the design of the server as shaped by the tests. If anything these lessons are more important than understanding Twisted's testing APIs:
  • The scoping of tests into small units of work.
  • Separation of concerns - parsing/generating bytes vs. business logic.
  • Design patterns for Deferred APIs.
  • Building robust network applications, including dealing with bad input and timeouts.
  • Separation of library code and application configuration.
Students who finish early can move on to a more difficult exercise, implementing both the tests and logic for an HTTP client, but benefiting from the ability to ask for in-person help.

You can sign up at the PyCon website as part of registration, or read more on the tutorial's PyCon web page.

Monday, January 28, 2013

Deferred Cancellation, part 3: Timeouts

Let's send an email! Here are the steps involved in what from the outside looks like a simple function call (and this is of course a very high-level view):
  1. Look up the IP of our SMTP server's domain using DNS. This may involve a series of UDP messages to one or more servers, which may do further work on our behalf.
  2. Establish a TCP connection with the server.
  3. Exchange a series of commands with the server over the TCP connection, some of which may involve arbitrarily complex processing on the server-side.
Obviously a lot can go wrong here, from communication problems to hardware failures to software bugs. The fact it usually works is an impressive engineering feat. For our purposes the interesting point is that failure can take an arbitrary amount of time.

The necessary and obvious solution is a timeout: if enough time has passed without getting a response, abort the operation and consider it to have failed. We may end up sending duplicate emails if we retry, but that is a business logic decision tied to the specifics of an application, so not something I'll be talking about. Now, we could have timeouts on each step of the process (DNS lookup, TCP connection, each command), and in fact may want timeouts for each of these. But from the point of view of the email sending API, the time it takes to do the underlying steps is irrelevant, except perhaps for debugging or performance: if we want to send an email within 5 seconds, we want it to take 5 seconds, and don't care which step happens to be the slow one.

This is where Deferred cancellation comes in. We want to make sure each step along the way has a cancellation function registered, if possible, but that's not strictly necessary. Our code looks something like this:
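from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint

def sendmail(fromAddr, to, data, smtphost, smtpport=25):
    # "from" is a reserved word in Python, hence fromAddr.
    endpoint = TCP4ClientEndpoint(reactor, smtphost, smtpport)
    d = endpoint.connect(SMTPFactory())
    def gotProtocol(smtpProtocol):
        return smtpProtocol.send(fromAddr, to, data)
    d.addCallback(gotProtocol)
    return d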
The wonderful thing about Deferreds is that, as with results, cancellation also gets automatically chained. Thus if we call cancel() on the result of sendmail(), it will cancel the Deferred connecting to the server if that's where we are in the process, or the Deferred returned from SMTPProtocol.send() if that's what we're waiting for. So if we want to time out sending an email after 5 seconds... all we have to do is cancel the Deferred returned by the sendmail() function after 5 seconds if we haven't gotten a result! The following utility function, soon to be part of Twisted (ticket #5786), does just that:
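from twisted.internet import reactor

def timeoutDeferred(deferred, timeout):
    delayedCall = reactor.callLater(timeout, deferred.cancel)
    def gotResult(result):
        # A result (or failure) arrived in time, so drop the pending timeout.
        if delayedCall.active():
            delayedCall.cancel()
        return result
    deferred.addBoth(gotResult)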
And now, we can send an email with a timeout of our choice, e.g. 5 seconds:
sent = sendmail("from@example.net", "to@example.net",
                "An email message.", "smtp.example.net")
timeoutDeferred(sent, 5)
The nice thing about this API is that it doesn't require adding extra timeout arguments to every function. Instead, the highest-level caller just adds a timeout. And underlying callers (e.g. the TCP connect, its underlying DNS lookup, etc.) can have their own, more limited timeouts as well.

To summarize: supporting Deferred cancellation is a great way to make the integration points of your library code more useful, by allowing users of your code both ad-hoc and timeout-driven cancellation of your operations. And as the user of a Twisted library, you can easily add timeouts to any Deferred-returning API, in particular those that explicitly support cancellation.

Sunday, January 13, 2013

Update: Early bird discount for our Twisted class

If you pay for our class before February 15th, you'll get $100 off; sign up for our two-day Twisted class in San Francisco now at http://futurefoundries.eventbrite.com.

Tuesday, January 8, 2013

2-day Twisted Class in San Francisco

Interested in learning the fundamentals of Twisted and event-driven networking with Python? If you live in the Bay Area or SF, or are visiting for PyCon, you can join Jean-Paul Calderone and me for a two-day intro to Twisted in San Francisco.
Location: San Francisco, exact site TBA.
Dates: March 11-12, the Monday and Tuesday before PyCon.
Cost: $650 early bird, or $750 after Feb 15.

Sign up now!

Covered Material

Combining a lecture with plenty of hands-on exercises, covered topics will include:
  • Understanding Event Loops: we'll re-implement Twisted's core APIs step-by-step (reactor, transport, protocol), explaining the why and how of event-driven networking.
  • TCP Clients and Servers.
  • Scheduling Timed Events.
  • Deferreds: the motivation and uses of Twisted's result callback abstraction.
  • Producers and Consumers: dealing with large amounts of data.
  • Unit Testing: how to test your networking code.
  • A large, self-paced exercise, implementing an HTTP server and client from scratch using pre-written unit tests as guidance, and our help as needed.

Daily Schedule (Tentative)

9:30-12:30: Lecture and exercises.
12:30-13:30: Lunch break.
13:30-16:30: Lecture and exercises.
16:30-17:30: Extended exercise time, and in-depth Q&A.