Monday, January 28, 2013

Deferred Cancellation, part 3: Timeouts

Let's send an email! Here are the steps involved in what from the outside looks like a simple function call (and this is of course a very high-level view):
  1. Look up the IP of our SMTP server's domain using DNS. This may involve a series of UDP messages to one or more servers, which may do further work on our behalf.
  2. Establish a TCP connection with the server.
  3. Exchange a series of commands with the server over the TCP connection, some of which may involve arbitrarily complex processing on the server-side.
Obviously a lot can go wrong here, from communication problems to hardware failures to software bugs. The fact it usually works is an impressive engineering feat. For our purposes the interesting point is that failure can take an arbitrary amount of time.

The necessary and obvious solution is a timeout: if enough time has passed without getting a response, abort the operation and consider it to have failed. We may end up sending duplicate emails if we retry, but that is a business logic decision tied to the specifics of an application, so not something I'll be talking about. Now, we could have timeouts on each step of the process (DNS lookup, TCP connection, each command), and in fact we may well want them. But from the point of view of the email sending API, the time it takes to do the underlying steps is irrelevant, except perhaps for debugging or performance: if we want to send an email within 5 seconds, we want it to take at most 5 seconds, and don't care which step happens to be the slow one.

This is where Deferred cancellation comes in. We want to make sure each step along the way has a cancellation function registered, if possible, but that's not strictly necessary. Our code looks something like this:
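from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint

def sendmail(sender, recipient, data, smtphost, smtpport=25):
    # SMTPFactory and send() are placeholders for an SMTP client implementation.
    endpoint = TCP4ClientEndpoint(reactor, smtphost, smtpport)
    d = endpoint.connect(SMTPFactory())
    def gotProtocol(smtpProtocol):
        return smtpProtocol.send(sender, recipient, data)
    d.addCallback(gotProtocol)
    return d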
The wonderful thing about Deferreds is that, as with results, cancellation also gets automatically chained. Thus if we call cancel() on the result of sendmail(), it will cancel the Deferred connecting to the server if that's where we are in the process, or the Deferred returned by SMTPProtocol.send() if that's what we're waiting for. So if we want to time out sending an email after 5 seconds... all we have to do is cancel the Deferred returned by the sendmail() function after 5 seconds if we haven't gotten a result! The following utility function, soon to be part of Twisted (ticket #5786), does just that:
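from twisted.internet import reactor

def timeoutDeferred(deferred, timeout):
    # Cancel the Deferred if it has not fired within `timeout` seconds.
    delayedCall = reactor.callLater(timeout, deferred.cancel)
    def gotResult(result):
        # A result or failure arrived first, so the pending timeout is moot.
        if delayedCall.active():
            delayedCall.cancel()
        return result
    deferred.addBoth(gotResult)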
And now, we can send an email with a timeout of our choice, e.g. 5 seconds:
sent = sendmail("from@example.net", "to@example.net",
                "An email message.", "smtp.example.net")
timeoutDeferred(sent, 5)
The nice thing about this API is that it doesn't require adding extra timeout arguments to every function. Instead, the highest-level caller just adds a timeout. And the lower-level operations (e.g. the TCP connect, its underlying DNS lookup, etc.) can have their own, more limited timeouts as well.

To summarize: supporting Deferred cancellation is a great way to make the integration points of your library code more useful, because it gives users of your code both ad-hoc and timeout-driven cancellation of your operations. And as the user of a Twisted library, you can easily add timeouts to any Deferred-returning API, in particular those that explicitly support cancellation.
