Friday, July 22, 2016

Python, the web, and snake oil - part 3

Here we are, over a year since I ranted about the goofiness of (most of) the python web ecosystem and later put together some coherent thoughts. So how did it go? Where have I ended up since? In a word...

Twisted

It's old; it has funny-looking style; it's not the new, cool whizzbang fresh off of that tech-news-source-that-shall-not-be-named.

But it is absolutely fantastic and you should use it.

There are few software projects in the world that will, given some time, practically bring you to tears of joy. Twisted is one of them. The API is divine. It's been running in production environments for over 15 years. You can imagine the rock-solid stability of a library that began development before the current generation of Python programmers learned how to use a toilet. Oh, and that's why it "looks funny": Twisted's style was very carefully designed to be consistent and informative, before the Python world even proposed PEP 8. Think about that: Twisted predates PEP 8.

Every single long-running Python application at Oscar speaks to the world using Twisted. This has expanded beyond web applications to services as well. Over the past year, Twisted has become the substrate for anything we write in Python.

Using Twisted with Blocking Code

While this is unsettling to some diehard Twisted users, we tend to hide the fact that our infrastructure runs on Twisted through extensive use of deferToThread. Twisted's WSGI container already does this, and I do the same in our RPC infrastructure. This is totally ok: it still provides some of the benefits of an async networking stack while staying compatible with more general, blocking code.

Since we perform all IO via Twisted and defer to a thread pool to do the work, we immediately gain the ability to hold thousands of mostly idle connections concurrently. This allows connections (and the results of their handshakes, e.g. SASL) to remain open beyond a single request/response. The benefit of reusing an authenticated TCP channel is significant. Some refer to this kind of architecture as "half-sync" (after the Half-Sync/Half-Async pattern): IO is done asynchronously and work is done synchronously in a thread pool. In addition, many workloads may currently be better suited to threading (contrary to popular belief, most RDBMS access is CPU bound, not IO bound).

Growing with Twisted

As time goes on, we have found ourselves relying more and more on the Twisted stack. LoopingCall has started to spread through the codebase (even around mostly blocking code as mentioned above). On one occasion, to debug a particularly nasty bug, I simply added a "manhole": the ability to ssh into a running process and drop into a REPL. Using Twisted endpoints allows a service to be brought up listening in a variety of ways purely through configuration (on a TCP port, a Unix domain socket, or an inherited file descriptor, with TLS or plain, etc.).

For services, we have written our own protocol and transport stack for Thrift, which gives us the same half-sync characteristics as our web containers.

At the same time, we use Twisted in a fully asynchronous manner where we can. Twisted itself provides the building blocks to talk to just about anything on the internet, and third party projects built on Twisted provide the rest. For example, the treq project brings the API of the popular requests package to Twisted.

Interpreter Environment

As mentioned previously (in parts 1 and 2), I was searching for a sane interpreter environment where development and production would be as close as possible. Every application and service is built into a pex using pants and is simply started with command-line/environment/configuration flags (using our published oscar.flag package). The process is the same in both development and production, and our Python applications are just that: Python applications. Since making that switch, we've had absolutely no errors due to differences in interpreter environment. This shouldn't be something to write home about, but in the current state of Python web deployment, it unfortunately is. Twisted is fully available as a set of Python modules, and it will offer no surprises in your interpreter environment.