Years ago I built web applications in Python. The first one predated all of today's popular web frameworks. This was long before Flask or even Django; Pylons didn't exist yet. We argued about Cheetah versus Mako templates. My team on my first Python web app actually implemented paste.httpserver in its current, threadpooled incarnation (approximately ten years ago).
About six years ago I more or less walked away from Python. Not because I wanted to, but because Google required me to write C++, and I was happy enough to do so. I did write a tiny bit of Python from time to time, but my bread and butter for several years was C++. After Google, I found myself dabbling in a bit of C, Java, Go and Ruby.
Now I'm back working day-to-day with Python. I just had my first experience in almost 6 years with web application deployment, and all I can say is, how did it end up like this? Who thought this was a good idea?
What am I talking about? I'm talking mostly about Gunicorn and uWSGI. Having deployed dozens of web apps a decade ago, I knew then that mod_python and mod_wsgi were a bad idea. Gunicorn and uWSGI are the natural result of spelunking deeper into that same (or very similar) rabbit hole.
Now, what has been the driving force behind these monoliths? Why have people chosen the sweat, blood and tears of deploying an application on an application server despite the gotchas, the errors, the hundreds of configuration options?
THE NEED FOR SPEED!
There exists a depressingly huge segment of the population that makes decisions in the following manner:
1. Need some unit of machine instruction to accomplish task.
2. Google for unit of machine instruction that solves task.
3. Find performance comparison of many such units.
4. Pick fastest unit.
You wrote an application in Python. It's not going to be fast. C is fast. Java is fast. C++ is fast. Go is pretty darn quick. Python is not fast. Think about why you are using Python; this is extremely important.
Because it's productive.
Performance still matters, but in choosing Python you made the decision that productivity is a higher priority than performance. When push comes to shove, you're actively, consciously sacrificing performance for productivity. You can buy more performance, but buying more productivity is markedly harder. And that's probably a really sensible decision. You should stick by it and be proud of it.
So why are people using uWSGI, Gunicorn, mod_wsgi and so on? Because it's snake oil. Because pretty graphs proved to you that it was twice as fast. Because pretty graphs showed it could handle three times as many concurrent users.
But these numbers were derived in one of two ways: either from an application that is little more than return "hello world", or from some crazy, harebrained, super-high-volume application at some company that had the developer resources on hand to develop something like a Tornado web app (and all of the corresponding infrastructure, since you won't be using full-blown SQLAlchemy in such an app). Allow me to let you in on a dirty little secret:
The amount of time your application spends executing application code is going to be drastically higher, as in orders of magnitude higher, than the time spent by the server writing bytes to a wire.
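To put numbers on that, here's a back-of-envelope sketch. The 50 ms and 50 µs figures are assumptions for illustration, not measurements, but the shape of the arithmetic holds for any real application:

```python
# Back-of-envelope arithmetic; the numbers are assumed, purely for illustration.
app_time = 50e-3      # 50 ms in your code: queries, templates, business logic
server_time = 50e-6   # 50 us for the server to put the response on the wire

total = app_time + server_time
print("server's share of the request: %.3f%%" % (100 * server_time / total))
# Even an infinitely fast server can only remove its own share:
print("best possible speedup: %.4fx" % (total / app_time))
```

Even an infinitely fast server buys you a tenth of a percent under these assumptions. A twice-as-fast one buys you half of that.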
Here's a tidbit about every single performance comparison I've seen around paste.httpserver: they all use the defaults for paste.httpserver and a few others, while carefully configuring the ones that demand it (mod_wsgi, for instance). For example, this one here. Had paste.httpserver been set up with multiple processes and given enough threads to match mod_wsgi's memory consumption in that article, well, you can go ahead and guess at the results; they would have been impressive. I suppose people don't realize paste's default threadpool size is 10 threads. And paste forces you to offload process supervision to an actual process supervisor (which is probably a good thing). And you're responsible for spawning a process for each CPU. But do that, and set it up comparably to your finely tuned application server, and you will be blown away at how your real-world application performs on ten-year-old technology.
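Concretely, leveling the playing field is a couple of keyword arguments plus a supervisor spawning one such process per CPU. A sketch; myapp is a hypothetical module holding your WSGI callable:

```python
# A fairer baseline than the defaults: widen paste's threadpool (default: 10)
# and run one process like this per CPU under a process supervisor.
from paste import httpserver

from myapp import wsgi_app  # hypothetical: wherever your WSGI callable lives

httpserver.serve(
    wsgi_app,
    host="127.0.0.1",       # a reverse proxy sits in front of these
    port=8080,              # one port per process in a multi-process setup
    use_threadpool=True,    # the default, but worth being explicit about
    threadpool_workers=30,  # sized to match whatever you're benchmarking against
)
```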
Here's another fun thing to think about regarding performance comparisons. If you're slamming a real application with 3000 requests per second, what's it doing to a database and other services?
But you know how to set up uWSGI/Gunicorn/mod_wsgi/whatever, and you figure: why not? Surely this performance boon is practically free, so you might as well take it.
Well, what do these application servers do? Presumably they run a Python interpreter and call your wsgi_app(env, start_response) function. They understand Python well enough to execute it, and they do some voodoo magic to turn that WSGI response into bytes on a wire.
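In code, that mechanical job is roughly the following. This is a deliberately simplified sketch: no error handling, no keep-alive, no chunked encoding, and a stand-in hello-world app of exactly the sort the benchmarks measure:

```python
import socket

def wsgi_app(environ, start_response):
    # A stand-in application; yours does real work here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello world"]

def handle_request(sock: socket.socket, environ: dict) -> None:
    # The translation every WSGI server performs, minus most of the spec.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers

    chunks = wsgi_app(environ, start_response)   # call into your code
    head = ["HTTP/1.1 %s" % captured["status"]]
    head += ["%s: %s" % (k, v) for k, v in captured["headers"]]
    sock.sendall(("\r\n".join(head) + "\r\n\r\n").encode("latin-1"))
    for chunk in chunks:                         # body bytes onto the wire
        sock.sendall(chunk)
```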
And that's where the similarities to a real, sane Python interpreter end. Abruptly. Full stop. The environment your Python code runs in would be hard pressed to look any more different from a real Python interpreter.
If it works correctly in pure Python, then it should work in production.
The very core belief that led me to write this article is this: if the application works in development on your machine, it should work in production and in every spot (staging, QA, etc.) in between. Couple that with the fact that you most likely cannot control everything going on in your application (dependencies on third-party libraries and systems), and you can end up in a situation where your pure Python web application code works perfectly on your system while failing, freezing or crashing on a production system.
But oh, there are workarounds. Workarounds abound. Are you opening some resource as a side effect of importing a module (hey there, 90% of "settings" modules I've seen in the wild)? Then go ahead and make sure your application server forks before loading. But bear in mind that's going to hurt performance and consume more memory, and performance is the reason you went down this path in the first place. Are you using threading? Be careful: you might need to make sure your application server isn't using sub-interpreters. Are you using C extensions? Again with the sub-interpreters (which, by the way, are the default for mod_wsgi and uWSGI, at least). Most deployments I see these days are strictly forking (and probably loading after fork, or heavily decorated to do so) with threading disabled. Are you not using sub-interpreters? Then be careful about global namespace pollution.
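That first gotcha is worth making concrete. Here's a minimal sketch of the import-time side effect and a fork-safe alternative; sqlite3 stands in for whatever resource a real settings module opens:

```python
import os
import sqlite3  # a stand-in for your real database driver

# The anti-pattern (commented out): a connection created at import time.
# If the server imports this module *before* forking, every worker
# inherits the same underlying handle, with confusing results.
# connection = sqlite3.connect("app.db")   # runs whenever the module loads!

_conn = None
_owner_pid = None

def get_connection():
    """Create the connection lazily, once per process, after any fork."""
    global _conn, _owner_pid
    if _conn is None or _owner_pid != os.getpid():
        _conn = sqlite3.connect("app.db")
        _owner_pid = os.getpid()
    return _conn
```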
As you can see, you can quickly find yourself in a situation where the production environment is the wild wild west of Python interpreter environments, really nothing like a Python interpreter launched from the command line. Furthermore, you can find yourself in a catch-22: your application won't even start in a sub-interpreter environment, but without one you are polluting some global namespace and getting odd crashes (or worse). Or you spend weeks developing, then deploy on an application server and it manages to segfault uWSGI without so much as a whisper in a log.
These are real world examples. These are things I have seen with my own two eyes.
So what am I advocating? To be honest, I'm not even completely sure. Years ago we used paste.httpserver processes managed by supervisord and reverse proxied by lighttpd (nginx didn't exist yet, or only had documentation in Russian). Without a sub-interpreter, without disabling or enabling strange harebrained options, without peppering the application with strange decorators tightly coupled to the application server, without fighting to get an application that already works to...
work.
After that application (this was circa 2005), I preferred multiple supervisord-managed, threaded FastCGI processes reverse proxied behind nginx. It was efficient, easy to set up, and robust. I did some performance testing with real applications and found an absolutely negligible performance gain from backflipping through mod_wsgi. Later I started mulling over the idea of just serving HTTP over a socket, and I'd bet that's almost as efficient, with the added benefit of being able to tinker with the actual processes themselves (which is nice for operations).
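That last idea is small enough to sketch with nothing but the standard library. This uses Python 3 names, is untuned, and wsgiref is simplistic about request handling, but it is a plain interpreter putting bytes on a wire; myapp is hypothetical:

```python
# A plain Python process serving HTTP, stdlib only. Run one per CPU
# under supervisord and reverse proxy to them with nginx.
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server

from myapp import wsgi_app  # hypothetical: your WSGI callable

class ThreadedWSGIServer(ThreadingMixIn, WSGIServer):
    daemon_threads = True  # don't let stuck requests block shutdown

if __name__ == "__main__":
    httpd = make_server("127.0.0.1", 8080, wsgi_app,
                        server_class=ThreadedWSGIServer)
    httpd.serve_forever()
```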
Look, maximizing the performance of an application server is not magic. Sure, some string manipulation happens in C, so the very marginal part of your app where some bytes in RAM get put into HTTP format and shoved into a buffer is faster, but Python is probably good enough at that; after all, your application is written in it.
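If you're skeptical, the formatting step is easy to measure in isolation. A quick sketch; absolute numbers will vary by machine, but expect it to be a vanishingly small fraction of any real request:

```python
import timeit

headers = [("Content-Type", "text/html; charset=utf-8"),
           ("Content-Length", "13542"),
           ("Cache-Control", "no-cache")]

def format_response_head():
    # Pure-Python version of the "hot" formatting work a server does.
    lines = ["HTTP/1.1 200 OK"]
    lines += ["%s: %s" % (k, v) for k, v in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")

n = 100_000
seconds = timeit.timeit(format_response_head, number=n)
print("%.1f microseconds per response head" % (seconds / n * 1e6))
```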
Perhaps I don't have anything concrete to advocate. It just seems to me that a Python application should be run by a Python interpreter, not some strange process-managing server that mangles the interpreter to the point that very basic, core functionality becomes impossible. And if this means pure Python putting bytes on a wire, it seems worth the tradeoff for a consistent environment, a distinct lack of show-stopping bugs, and frankly simpler operations.