tag:blogger.com,1999:blog-69536957237871515492024-03-12T23:40:59.371-07:00Despair in SoftwareIncoherent ramblings of a software engineer.Unknownnoreply@blogger.comBlogger9125tag:blogger.com,1999:blog-6953695723787151549.post-65745720940689381122018-12-14T11:49:00.000-08:002018-12-14T11:49:53.761-08:00Unsafe Safe Spaces<p>There are a lot of ways to describe various mental illnesses. But for
me, I could spend hours just talking about the shame, guilt and
self-loathing.
</p><p>
I recently attended some sensitivity training at work. It's not a
bad thing, though it's pretty predictable.
</p><p>
Out of the entire presentation, mental health was mentioned once, as an
afterthought, on one slide. It was the slide defining "protected
categories". I believe the line was "physical or mental disability".
</p><p>
Physical disability is something I remember being a very big deal when
I was younger. There was a lot of change, a lot of new
regulation. Something called the Americans with Disabilities Act was
kind of a big deal. When the dust settled, pretty much everyone agreed
it was a good thing and we had all become a lot more sensible.
</p><p>
Then I remember, at another point during my childhood, a similar
episode of public awakening around "sexual harassment". Again, it
seemed like some were upset, but generally it seemed like we had
collectively come to our senses.
</p><p>
Generally these days, I believe we vaguely agree simply on "don't be a
dick", though there are plenty of disagreements on some of the finer
points and boundaries. I don't have a problem with this at all. I
think most decent human beings are fine with being nice and respectful
to each other. Of course, not everyone is decent, nice or respectful.
</p><p>
And sometimes, that person is me. No really, sometimes I am a real
jerk. And it's not even hard for me to see. We're not even talking
about splitting hairs or grey areas. Sometimes I am a massive jerk,
and I should probably be fired, publicly, as an example of what
happens to assholes at safe workplaces. My actions and their
consequences should be clear. Nobody could possibly fault anyone for
ridding themselves of such a toxic creature. And I hate myself for
it. I live with crushing shame. Oftentimes I do and say things I
later cannot possibly fathom. I used to find myself completely out of
control.
</p><p>
Recently this impulsive behavior was explained to me, then to an
employer, by a doctor, in a letter, as a disability, protected by law.
</p><p>
And frankly, I don't feel one bit better about it. In fact, in a lot
of ways, I feel worse. Not only am I an absolutely miserable piece of
shit, I'm also "disabled", and I somehow get to make some kind of
excuse about how I'm an absolutely miserable piece of
shit. Furthermore, AND THIS IS THE IMPORTANT BIT, the disability isn't
that I don't know, or that I'm incapable of self-reflection or identifying
just how miserable I am. I am fully capable of that. The disability is
that *sometimes I can't control myself despite this*. So yes, I am
fully self-aware, and I get to spend my waking hours under the weight
of a completely functional and healthy conscience.
</p><p>
If you can imagine this existence, you can easily see how suicide is
not only an option, but a very attractive one. Tangential to this, my
latest medication carries a small risk of sudden death while tapering
up on it. Risk of death. That "side-effect" was not even a
consideration for me, as the alternative is sure death.
</p><p>
Furthermore, what you are reading, right now, at this moment, is from
someone who, thanks to medication, no longer suffers as described and,
only because of that, even has the awareness to describe it. In the
past, there was a time when I was not only ill but also not even aware
of it, let alone medicated or treated. I am able to live, today,
thanks to combined therapies. Imagine who and what I was before, and
how that led up to the breaking points where someone finally said, on
the record, in terms that carry medical and legal significance, as I
sat there devoid of shoelaces and belt, "this person has a
pathological condition and needs our help."
</p><p>
Now, if you recall, I had previously delivered such a written diagnosis
to an employer. This was not done lightly or for academic
purposes. This was done because I had been a gigantic, intolerable
asshole during a hypomanic episode. And, looking at the real
possibility of (totally deserved) disciplinary action from my
employer, I was convinced to accept protection as a disabled person.
</p><p>
I just want to restate, at this point, that I do not in any way feel
less guilt, or feel at all mollified because a piece of paper from
someone with a lot of schooling says I'm disabled. It just means I get
to have a job. A combination of medication and therapy has gotten me to the
point where I am far less of a jerk than I used to be, and maybe
that's good enough to see the sunrise tomorrow. Also, these pieces of
paper don't automagically smooth things over with the people you
screamed at. In case you were wondering.
</p><p>
So where does that leave things? Well, I presented my doctor's note
and diagnosis. I saved my own ass. What about the person who was
treated poorly by me? Do they get any justice? Should they?
</p><p>
What happens when you have the ADA behind you and an offended employee
in front of you? It's becoming increasingly common today for employees
to "stand up" against their company when they perceive no disciplinary
action. An employer cannot disclose a disability; they can only
respond that they have acted appropriately under the law. Put these
two together. Add in the increase of public shaming.
</p><p>
I don't like where this is going. It's not going to end well. In fact,
I'm confident people will die before it is over.
</p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-6953695723787151549.post-74514140360797673242016-09-14T15:24:00.000-07:002016-09-14T15:24:34.722-07:00How Open Source DevolvesYou know what I'm talking about. Why are you forced to use the build flag "utf8strings" to generate <i>correct </i>python thrift code? Why is the default behavior of MySQL to truncate data (among a million other things)? Why, over time, do so many projects/libraries/services become obtuse and require a wealth of knowledge to successfully use correctly? Why do so many open-source things come with absolute bonkers default behavior?<br />
<br />
Let me show you an example.<br />
<a href="https://issues.apache.org/jira/browse/THRIFT-395">https://issues.apache.org/jira/browse/THRIFT-395</a><br />
<br />
Let's systematically break this down. At the time, the Python code generated by the Thrift compiler was completely unaware of unicode strings. It was essentially broken, especially when talking to other thrift code. Thrift contains two string-like types: string and binary. Binary is for raw bytes, while string is for utf8-encoded strings. Python at the time wasn't correctly encoding unicode strings as utf8, so it was broken. Essentially every other thrift target language was doing the right thing.<br />
<br />
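To make the string/binary distinction concrete, here is a minimal sketch in plain Python. It is not Thrift's generated code, and the function names are made up for illustration; it only shows the contract described above, where text gets utf8-encoded on the way out and raw bytes pass through untouched.<br />
<br />

```python
# Illustration only: hypothetical helpers, not Thrift's generated code.


def encode_string_field(value: str) -> bytes:
    """A Thrift `string` is text, so it goes on the wire as UTF-8 bytes."""
    return value.encode("utf-8")


def decode_string_field(raw: bytes) -> str:
    """The reader reverses the step and gets the same text back."""
    return raw.decode("utf-8")


def encode_binary_field(value: bytes) -> bytes:
    """A Thrift `binary` is opaque bytes and passes through untouched."""
    return value


name = "na\u00efve caf\u00e9"      # 10 characters, two of them non-ASCII
wire = encode_string_field(name)   # 12 bytes once encoded as UTF-8
```

A writer that skips the encode step only works for as long as everything happens to be ASCII, which is exactly the kind of breakage that goes unnoticed for years.<br />
<br />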
Now if you notice in that thread, a tortured programmer soul was disturbed by this change, because it would break his existing code. This argument is the cancer of the open-source world: if the world is broken, it must remain broken because fixing it will break my thing.<br />
<br />
But this isn't true. This person's code would only be broken if 1) this code change landed, 2) they upgraded their thrift libraries to the new version containing the change, and 3) they refused to go through their broken codebase and change "string" to "binary". This person is willing to upgrade versions of thrift, but unwilling to run sed. Maybe they're on a Mac, and BSD sed can be tricky? I don't know. But this person could also just NOT upgrade thrift, and everything they've written will continue to work. Or they could both upgrade thrift and use some sed.<br />
<br />
Yet, because of this one person, the ENTIRE world gets to add "utf8strings" to their python thrift builds.<br />
<br />
Look, this is like if Ford made a truck and accidentally forgot one of the wheels. Then one person figures out how to load the truck bed so that it drives (albeit shakily) on 3 wheels. Then Ford issues a recall and this person protests, so they cancel THE ENTIRE RECALL and EVERY truck continues to be shipped with 3 wheels. The 4th wheel is included in the truck bed when you drive it off the lot, in case you want a truck with 4 wheels instead of 3.<br />
<br />
And if you go looking, you will find exactly this, over and over and over. This is literally how open source development works. <a href="http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mode-strict">You can't fix the world, you have to keep it broken</a>.<br />
<br />
This is how open source sucks.<br />
<br />
Don't even get me started on committee governance models. Let's go ahead and dilute any individual expertise on the committee by giving everyone an equal vote.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6953695723787151549.post-83417865462850058252016-07-22T15:52:00.004-07:002016-07-22T15:55:29.789-07:00Python, the web, and snake oil - part 3Here we are, over a year since I ranted about the goofiness of (most of) the python web ecosystem and later put together some coherent thoughts. So how did it go? Where have I ended up since? In a word...<br />
<br />
<h2>
<a href="http://twistedmatrix.com/trac/">Twisted</a></h2>
<div>
It's old; it has funny-looking style; it's not the new, cool whizzbang fresh off of that tech-news-source-that-shall-not-be-named.</div>
<div>
<br /></div>
<div>
But <b>it is absolutely fantastic and you should use it</b>.</div>
<div>
<br /></div>
<div>
There are few software projects in the world that will, given some time, practically bring you to tears of joy. The API is divine. It's been running in production environments for over 15 years. You can imagine the rock-solid stability of a library that began development before the current generation of python programmers learned how to use a toilet. Oh, and that's why it "looks funny"; <a href="https://twistedmatrix.com/documents/current/core/development/policy/coding-standard.html">twisted style</a> was very carefully designed to be consistent and informative, before the python world even proposed <a href="https://www.python.org/dev/peps/pep-0008/">pep8</a>. Think about that: Twisted predates pep8.</div>
<div>
<br /></div>
<div>
Every single long-running Python application at <a href="https://www.hioscar.com/">Oscar</a> speaks to the world using Twisted. This has expanded beyond just web applications to services. Over the past year, Twisted has become the substrate for anything written in Python.<br />
<br />
<h3>
Using Twisted with Blocking Code</h3>
</div>
<div>
While unsettling to some diehard Twisted users, we tend to hide the fact that our infrastructure is running with Twisted by extensive use of <a href="http://twistedmatrix.com/documents/current/api/twisted.internet.threads.deferToThread.html">deferToThread</a>. <a href="http://twistedmatrix.com/documents/current/web/howto/web-in-60/wsgi.html">Twisted's wsgi container</a> already does this, and I do so in our RPC infrastructure as well. This is totally ok, and still provides some benefits of an async networking stack while providing compatibility with more general, blocking code.</div>
<div>
<br /></div>
<div>
Since we perform all IO via Twisted, and defer to a threadpool to do work, we immediately gain the ability to concurrently hold thousands of mostly idle connections. This allows connections (and their <a href="https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer">associated handshakes, e.g. SASL</a>) to remain open beyond a single request/response. The benefit of reusing an authenticated TCP channel is significant. Some refer to this kind of architecture as "half-sync", where IO is done asynchronously and work is done synchronously in a thread pool. In addition, many workloads may <i>currently</i> be better suited to threading (contrary to popular belief, most RDBMS access is <a href="http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/">CPU bound, not IO bound</a>).<br />
<br /></div>
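<div>
The half-sync shape is easier to see in code. This sketch deliberately swaps in the standard library's asyncio and a ThreadPoolExecutor as stand-ins for Twisted's reactor and deferToThread, so it is an analogue and not our actual stack; blocking_work and the pool size of 4 are made up for illustration.</div>

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_work(request_id: int) -> str:
    """Stands in for ordinary blocking code, e.g. a database call."""
    time.sleep(0.05)
    return f"response-{request_id}"


async def handle(request_id: int, pool: ThreadPoolExecutor) -> str:
    # The event loop stays free to service other connections while
    # the blocking call runs on a worker thread.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_work, request_id)


async def serve(n: int) -> list:
    # IO side is asynchronous; the work side is a plain thread pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handle(i, pool) for i in range(n)))


results = asyncio.run(serve(8))
```

<div>
The pool size caps how much work runs at once, while the number of open, mostly idle connections is limited only by what the event loop can hold.</div>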
<h3>
Growing with Twisted</h3>
<div>
As time goes on, we have found ourselves relying more and more on the Twisted stack. <a href="http://twistedmatrix.com/documents/current/api/twisted.internet.task.LoopingCall.html">LoopingCall</a> has started to spread through the codebase (even around mostly blocking code as mentioned above). On one occasion, to debug a particularly nasty bug, I simply added a "<a href="http://twistedmatrix.com/documents/current/api/twisted.conch.manhole.html">manhole</a>"--the ability to ssh into a running process and drop into a REPL. Usage of <a href="https://twistedmatrix.com/documents/current/core/howto/endpoints.html">Twisted endpoints</a> allows a service to be brought up listening in a variety of ways simply by configuration (from a TCP port to a unix domain socket to an inherited file descriptor, TLS or plain, etc).</div>
<div>
<br /></div>
<div>
With services, we have written our own <a href="http://twistedmatrix.com/documents/current/api/twisted.protocols.basic.Int32StringReceiver.html">protocol</a> and transport stack for Thrift, which provides us with the same half-sync characteristics as our web containers.</div>
<div>
<br /></div>
<div>
At the same time, we use Twisted in a fully asynchronous manner where we can. Twisted itself provides the building blocks to talk to just about anything on the internet, and third party projects built on Twisted provide the rest. For example, the <a href="https://github.com/twisted/treq">treq</a> project is a Twisted-compatible port of the popular requests package.<br />
<br /></div>
<h2>
Interpreter Environment</h2>
<div>
As mentioned previously (in parts <a href="http://www.despairinsoftware.com/2015/04/python-web-and-snake-oil.html">1</a> and <a href="http://www.despairinsoftware.com/2015/04/python-web-and-snake-oil-part-2.html">2</a>), I was searching for a sane interpreter environment where development and production would be as close as possible. Every application and service is built into a <a href="https://github.com/pantsbuild/pex">pex</a> using <a href="http://www.pantsbuild.org/">pants</a> and is simply started with command-line/environment/configuration flags (using our published <a href="https://oscarflag.readthedocs.io/en/latest/">oscar.flag</a> package). The process is the same in both development and production, and our python applications are just that - python applications. Since then we've had absolutely no errors due to differences in interpreter environment. This shouldn't be something to write home about, but in the current state of Python web deployment, it unfortunately is. Twisted is fully available as a set of python modules, and it will offer no surprises in your interpreter environment.</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6953695723787151549.post-15710858908784845562015-10-13T09:34:00.000-07:002015-10-13T09:34:00.058-07:00My (Current) Favorite Talks<h3>
My Favorite Talks</h3>
<ol>
<li>"<a href="https://www.youtube.com/watch?v=oebqlzblfyo">Full Stack Awareness</a>" - Artur Bergman</li>
<li>"<a href="http://www.infoq.com/presentations/Simple-Made-Easy">Simple Made Easy</a>" - Rich Hickey</li>
<li>"<a href="https://www.youtube.com/watch?v=lKXe3HUG2l4">The Mess We're In</a>" - Joe Armstrong</li>
<li>"<a href="https://vimeo.com/9270320">What We Actually Know About Software Development, And Why We Believe It's True</a>" - Greg Wilson</li>
<li>"<a href="https://www.youtube.com/watch?v=neI_Pj558CY">The Top 10 Ways To Scam The Modern American Programmer</a>" - Zed A. Shaw</li>
</ol>
<h3>
Notable Mentions</h3>
<ul>
<li>"<a href="https://www.youtube.com/watch?v=LQ_6oS3UqsM">Latency is the Mind Killer</a>" - Artur Bergman</li>
<li>"<a href="https://www.youtube.com/watch?v=5kj5ApnhPAE">Public Static Void</a>" - Rob Pike</li>
</ul>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6953695723787151549.post-11843258235878374282015-05-12T23:00:00.002-07:002015-05-12T23:04:25.397-07:00BreakupDear Python,<br />
<br />
We had some good times. When I broke up with the last language, you were just what I needed. You were there when I wanted you, but never overbearing. You and I were two peas in a pod.<br />
<br />
But lately I've been getting restless. I've been getting a bit more concurrent, and a bit less "hey I need a web framework and a template language". I feel like you haven't been listening to me when I tell you that I really need massive concurrency. I know you hate types, but lately I've been begging for some type safety and the best you've given me are weird type annotations via doc strings. They work, I guess, but they're not all that I want. I'm just not getting what I really want and need from you anymore.<br />
<br />
Look, to be completely honest, I met another language. This other language, it has concurrency off the charts. It has a rich type system that has expanded my understanding and resulted in new ways to ship libraries that can be used in any way imaginable without issue. Instead of installing various libraries and creating directories and making virtualenvs, I can copy a single file to a server and run it. And it even has a fully compliant http 1.1 and http 2.0 server in it - so completely compliant and secure that there's no reason to proxy behind nginx. In fact, it's had fewer security issues in the past year than nginx (due to openssl).<br />
<br />
So really, my dear python, it's not you, it's me. I think you're grand, but I need something more. I'm sorry that it's come to this. We will still be friends since you are dating most of my coworkers now, but I've moved on.<br />
<br />
Of course you're still welcome at certain gatherings. Everyone is happy to have a go-to for web handy. Scripting is best done with you riding shotgun. Various sysadmin tasks wouldn't be the same without you. But for what I do day-to-day, you just aren't meeting my needs. I'm so sorry.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6953695723787151549.post-72052677161001144652015-04-11T12:46:00.003-07:002015-04-11T12:46:27.552-07:00Python, the web, and snake oil - part 2While my previous post was cathartic to write, it was not useful. In the hours that followed I became aware of others who share those same feelings. Through some online conversation I found a few very good solutions, further distilled my thoughts, and found some great resources that deserve to be shared.<br />
<br />
First, I would strongly urge everyone working in, around or near a Python application to watch <a href="http://pyvideo.org/video/83/djangocon-2011--keynote---glyph-lefkowitz">this talk</a>. All of it. It just keeps getting better and more specific the deeper Glyph gets into it. While given at DjangoCon as a keynote, it is broadly applicable.<br />
<br />
Watching this and speaking briefly with Glyph helped me distill my thoughts.<br />
<br />
<h3>
Your Python application should be a Python application, not a plugin for a web server.</h3>
<div>
<br /></div>
<div>
Your web server should be something you <i>can import</i>. Your application should not be something imported by a web server. This is an important distinction. The difference here is familiar; it is the distinction between framework and library. Having a web server import your application turns it into a peg that must be properly shaped to fit into its corresponding hole. Over time, the effects of various third-party libraries (e.g. something importing lxml) become harder to control and predict in relation to the peg's shape. Flip this over and force the web server to be a properly behaved unit of Python which may be used like any other unit of Python: imported, tested, etc.</div>
<div>
<br /></div>
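<div>
As a rough sketch of what "a web server you can import" looks like, here is the idea using only the standard library's wsgiref. I'm using wsgiref purely because it ships with Python, not because it belongs in production; app, serve and call_in_process are hypothetical names, and the host and port are placeholders.</div>

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical stand-in for your real WSGI application."""
    body = b"hello from a plain Python process\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="127.0.0.1", port=8080):
    """The application imports the server, not the other way around."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()


def call_in_process(path="/"):
    """Drive the app like any other Python callable, no server needed."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid environ
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body
```

<div>
Calling serve() from your own launcher script starts the server; call_in_process() shows the other half of the point, that the same code can be imported and exercised from a test without any server in the picture.</div>
<div>
<br /></div>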
<div>
Developers, this will demystify deployment. The magic that happens in production will suddenly be attainable inside your development virtual environment. There will be fewer (or no) surprises. Rather than fighting with some strange piece of software written in C, you will be doing what you've always been doing: installing a dependency and using it.</div>
<div>
<br /></div>
<div>
SREs, this will help you get out the door at 5PM and maybe sleep through a few more nights. What developers do locally in development will work in production. Re-read that a few times. That this even needs saying is the sad current state of affairs in so many deployment scenarios, and we've all sat back and accepted it! How many times have you issued a rollback because production and development behave completely differently? It won't solve every single one of these issues, but it will help enough that it warrants attention. By allowing the development environment to closely parallel the production environment, developers will be solving production problems <i>for you</i>, before those problems make it into production and wreak havoc.</div>
<div>
<br /></div>
<h3>
There are several WSGI containers to choose from that are well-behaved Python modules.</h3>
<div>
<br /></div>
<div>
There are several, including cherrypy and twisted.web. I am currently swooning over <a href="http://twistedmatrix.com/documents/current/web/howto/web-in-60/wsgi.html">twisted web's WSGI container</a>. Now sure, I said above that your web server should be something you <i>can</i> import, and the docs for these show examples of running a WSGI application in a manner that is slightly different. However, these WSGI containers (and some others) are well-behaved Python applications backed by well-behaved (and directly usable) Python packages. You can write your own script that imports the WSGI container and starts serving your application. When push comes to shove, you can treat the container like any other library, like real Python. There's no mystical loader machinery to work around. Want to know what twistd is doing when you tell it to run your app? It's <a href="http://twistedmatrix.com/trac/browser/tags/releases/twisted-15.0.0/twisted/scripts/twistd.py#L26">right here</a>, in Python.</div>
<div>
<br /></div>
<h3>
You will have to do a little bit of work, and you will have to understand what the web server is doing.</h3>
<div>
<br /></div>
<div>
And that's a really good thing. You <i>should</i> know what your web server is doing. Application developers may have to look at some documentation or code for a few minutes before properly initializing a WSGI container and serving it. Someone will have to take the time to write something slightly more sophisticated than <span style="font-family: Courier New, Courier, monospace;">app.run()</span> in your flask app, but it will only take a few lines and a few minutes to do so, and then you are developing on production infrastructure.</div>
<div>
<br /></div>
<div>
On the SRE side, there may be slightly more work as well. You might have to use monit or supervisord to run a process for each core. But this means you are explicitly in control of the process model of the web server. Rather than let declarative configuration options rigidly choose between a handful of ways to manage processes, you use a battle-tested tool you are comfortable with to precisely control the process model of the web application.</div>
<div>
<br /></div>
<div>
The entry point into the application can be made exactly the same whether I am running a development server on my laptop or behind a load-balancer in production. This will eliminate a whole class of unknowns.</div>
<div>
<br /></div>
<div>
This little bit of work is up-front and one-time only. As the saying goes, an ounce of prevention is worth a pound of cure.</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6953695723787151549.post-91305101440499891912015-04-09T18:04:00.000-07:002015-04-09T18:47:49.270-07:00Python, the web, and snake oilYears ago I built web applications in Python. The first one predated all of today's popular web frameworks. This was long before Flask or even Django. Pylons still didn't exist yet. We argued about Cheetah versus Mako templates. My team on <a href="https://usegalaxy.org/">my first python web app</a> actually implemented paste.httpserver in its current, threadpool'd incarnation (approximately 10 years ago).<br />
<br />
About six years ago I more or less walked away from Python. Not because I wanted to, but because Google required me to write C++, and I was happy enough to do so. I did write a tiny bit of Python from time to time, but my bread and butter for several years was C++. After Google, I found myself dabbling in a bit of C, Java, Go and Ruby.<br />
<br />
Now I'm back working day-to-day with Python. I just had my first experience in almost 6 years with web application deployment, and all I can say is, how did it end up like this? Who thought this was a good idea?<br />
<br />
What am I talking about? I'm talking mostly about Gunicorn and uWSGI. Having deployed dozens of web apps a decade ago, I knew then that mod_python and mod_wsgi were a bad idea. Gunicorn and uWSGI are the natural result of spelunking deeper into that same (or very similar) rabbit hole.<br />
<br />
Now, what has been the driving force behind these monoliths? Why have people chosen the sweat, blood and tears of deploying an application on an application server despite the gotchas, the errors, the hundreds of configuration options?<br />
<br />
<h4>
THE NEED FOR SPEED!</h4>
<br />
There exists a depressingly huge segment of the population that makes decisions in the following manner:<br />
<br />
1. Need some unit of machine instruction to accomplish task.<br />
2. Google for unit of machine instruction that solves task.<br />
3. Find performance comparison of many such units.<br />
4. Pick fastest unit.<br />
<br />
You wrote an application in Python. It's not going to be fast. C is fast. Java is fast. C++ is fast. Go is pretty darn quick. Python is not fast. Think about why you are using Python; this is <i>extremely important</i>.<br />
<br />
<h4>
Because it's productive.</h4>
<br />
Performance still matters, but in choosing Python you made the decision that productivity is a higher priority than performance. When push comes to shove, you're actively, consciously sacrificing performance for productivity. You can buy more performance, but buying more productivity is markedly harder. And that's probably a really sensible decision. You should stick by it and be proud of it.<br />
<br />
So why are people using uWSGI, Gunicorn, mod_wsgi and so on? Because it's snake oil. Because pretty graphs proved to you that it was twice as fast. Because pretty graphs showed it could handle three times as many concurrent users.<br />
<br />
But these numbers were derived in one of two ways. Either from an application that is little more than <span style="font-family: Courier New, Courier, monospace;">return "hello world"</span>, or on some crazy harebrained, super high-volume application at some company that had the developer resources on hand to develop something like a Tornado web app (and all of the corresponding infrastructure, since you won't be using full-blown SQLAlchemy in such an app). Allow me to let you in on a dirty little secret:<br />
<br />
<h4>
The amount of time your application spends executing application code is going to be drastically higher, as in orders of magnitude higher, than the time spent by the server writing bytes to a wire.</h4>
<br />
Here's a tidbit about every single performance comparison I've seen around paste.httpserver: they all use the defaults for paste.httpserver and a few others, and they all carefully configure the ones that demand it (mod_wsgi for instance). For example, <a href="http://nichol.as/benchmark-of-python-web-servers">this one here</a>. Had paste.httpserver been set up with multiple processes and given enough threads to match mod_wsgi in memory consumption in that article, well, you can go ahead and guess at the results; it would have been impressive. I suppose people don't realize paste's default threadpool size is 10 threads. And paste forces you to offload process supervision to an actual process supervisor (which is probably a good thing). And you're responsible for spawning a process for each CPU. But do that, and set it up comparable to your finely-tuned application server, and you will be blown away at how your real-world application performs on 10-year-old technology.<br />
<br />
Here's another fun thing to think about regarding performance comparisons. If you're slamming a real application with 3000 requests per second, what's it doing to a database and other services?<br />
<br />
But you know how to set up uWSGI/Gunicorn/mod_wsgi/whatever, and you figure, why not? Surely this performance boon is practically free, so you might as well take it.<br />
<br />
Well, what do these application servers do? Presumably they run a python interpreter and call your <span style="font-family: Courier New, Courier, monospace;">wsgi_app(env, start_response)</span> function. They understand Python enough to execute it and do some voodoo magic to turn that wsgi response into bytes on a wire.<br />
<br />
And that's where the similarities to a real, sane Python interpreter end. Abruptly. Full stop. The environment your Python code runs in would be hard pressed to look any more different from a real Python interpreter.<br />
<br />
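For reference, the "voodoo magic" of turning a WSGI response into bytes on a wire is modest in principle. The sketch below is a deliberately naive illustration, not how any of the servers named here are implemented: it calls a wsgi_app(env, start_response) function and serializes the result as an HTTP/1.1 response. render_response is a made-up name, and real servers also deal with chunking, keep-alive, error handling and much more.<br />
<br />

```python
def wsgi_app(env, start_response):
    """A trivial WSGI application of the `return "hello world"` variety."""
    body = b"hello world"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def render_response(app, env):
    """Call a WSGI app and serialize its response as HTTP/1.1 bytes."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(env, start_response)
    try:
        body = b"".join(chunks)
    finally:
        # WSGI lets the iterable carry a close() hook; honor it.
        if hasattr(chunks, "close"):
            chunks.close()

    head = [f"HTTP/1.1 {captured['status']}"]
    head += [f"{name}: {value}" for name, value in captured["headers"]]
    return "\r\n".join(head).encode("latin-1") + b"\r\n\r\n" + body


raw = render_response(wsgi_app, {"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
```

Everything a server does beyond this is about sockets, concurrency and process management, none of which requires replacing the interpreter your code runs in.<br />
<br />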
<h4>
If it works correctly in pure Python, then it should work in production.</h4>
<br />
The very core belief that led me to write this article is this: if the application works in development on your machine, it should work in production and every spot (staging, QA, etc) in between. Couple that with the fact that you most likely cannot control everything going on in your application (dependencies of third party libraries/systems), and you can end up in a situation where your pure Python web application code works perfectly on your system while failing, freezing or crashing on a production system.<br />
<br />
But oh, there are workarounds. Workarounds abound. Are you opening some resource as a side effect of importing a module (hey there 90% of "settings" modules I've seen in the wild)? Then go ahead and make sure your application server is forking before loading. But bear in mind that's going to hurt performance and consume more memory, and performance is why you went down this path in the first place. Are you using threading? Be careful, you might need to make sure your application server isn't using sub-interpreters. Are you using C extensions? Again with the sub-interpreters (which by the way are the default for mod_wsgi and uWSGI, at least). Most deployments I see these days are strictly forking (and probably load-after-fork or heavily decorated to do as such) with threading disabled. Are you not using sub-interpreters? Be careful about global namespace pollution.<br />
<br />
As you can see, you can quickly find yourself in a situation where the environment in production is the wild wild west of Python interpreter environments, and is really nothing like a Python interpreter launched from the command-line. Furthermore, you can find yourself in a catch-22. E.g. your application won't even start in a sub-interpreter environment, but without it you are polluting some global namespace and getting odd crashes (or worse). Or you spent weeks developing, and when you deploy on an application server it manages to segfault uWSGI without so much as a whisper in a log.<br />
<br />
These are real world examples. These are things I have seen with my own two eyes.<br />
<br />
So what am I advocating? To be honest, I'm not even completely sure. Years ago we used paste.httpserver processes managed by supervisord and reverse proxied by lighttpd (nginx didn't exist yet or only had documentation in Russian). Without a sub-interpreter, without disabling or enabling strange harebrained options, without peppering the application with strange decorators tightly coupled to the application server, without fighting to get an application that <i>already works</i> to...<i>work</i>.<br />
<br />
After that application (this was circa 2005), I preferred multiple supervisord-managed, threaded FastCGI processes reverse proxied behind nginx. It was efficient, easy to set up, and robust. I did some performance testing with real applications and found absolutely negligible performance gained by backflipping through mod_wsgi. Later I started mulling over the idea of just serving HTTP over a socket, and I'd bet that's almost as efficient, with the added benefit of being able to tinker with the actual processes themselves (which is nice for operations).<br />
<br />
Look, maximizing the performance of an application server is not magic. Sure, some string manipulation happens in C, so the very marginal part of your app where some bytes in RAM get put into HTTP format and shoved in a buffer is faster. But Python is probably good enough at that; after all, your application is written in it.<br />
<br />
Perhaps I don't have anything concrete to advocate. It just seems to me that a Python application should be run by a Python interpreter, not some strange process-managing server that mangles the interpreter to the point that very basic, core functionality becomes impossible. And if this means pure Python putting bytes on a wire, it seems worth the tradeoff for a consistent environment, a distinct lack of show-stopping bugs, and frankly simpler operations.<br />
<br />
Thrift and Go Generate (2015-02-19)<br />
<br />
Oh serialization wars. Every new team at some point has to standardize on a serialization system, and I've yet to see this decision end with smiles all around. I am personally biased towards protocol buffers but recently find myself living with <a href="https://thrift.apache.org/">Thrift</a>.<br />
<br />
I use the <a href="http://pantsbuild.github.io/">pants</a> build system, which is great for Python, Java and Scala. However, I also use a lot of Go, which doesn't have support from pants. Go's build system and tooling are very complete, allowing me to achieve the same ends without writing pants BUILD files and targets. For Thrift, the recent addition of <span style="font-family: Courier New, Courier, monospace;">go generate</span> allows me to generate my thrift bindings from the Go toolchain.<br />
<br />
Go generate is a new tool in go1.4 that allows a comment directive to run external commands. The <a href="http://blog.golang.org/generate">documentation</a> and <a href="http://golang.org/s/go1.4-generate">design doc</a> are fairly complete (caveat: the <span style="font-family: Courier New, Courier, monospace;">-run</span> flag is not yet implemented!). And it's really quite simple. Just drop a comment into any go file in your package:<br />
<br />
<pre style="background-image: URL(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9NTPq-fzzmciDCYpcAxrlI8VBAAiNi4bUVOisx2pLYfrxKRELpVigHIllT-08GVOJE8_1NdXmaRmC3UgCf7US8FMNg6sZKZUYO7CZpTlCGuVsOZhe1gqbNYPFnbuWlQ8iQdlwSJENhCrs/s320/codebg.gif); background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> //go:generate thrift -r --gen go --out . foo.thrift
</code></pre>
<br />
That's it. <span style="font-family: Courier New, Courier, monospace;">go generate</span> runs thrift and generates the go bindings.<br />
<br />
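To make the placement concrete, here is a minimal sketch of how the directive sits in an otherwise ordinary source file. The embedded file contents and the <span style="font-family: Courier New, Courier, monospace;">findDirectives</span> helper are hypothetical and only illustrate the idea; the real go tool does its own scanning. The one detail taken from the go documentation is that the directive must start at the beginning of the line, with no space between the slashes and <span style="font-family: Courier New, Courier, monospace;">go:generate</span>.<br />

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// exampleSource is a hypothetical Go file. The directive is an ordinary
// comment that starts at column one, with no space after the slashes.
const exampleSource = `package foo

//go:generate thrift -r --gen go --out . foo.thrift

// Version is hand-written code living next to the generated bindings.
func Version() string { return "0.1" }
`

// findDirectives returns the command part of every go:generate line in src.
// It is a simplified illustration, not the go tool's implementation.
func findDirectives(src string) []string {
	const prefix = "//go:generate "
	var cmds []string
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, prefix) {
			cmds = append(cmds, strings.TrimSpace(strings.TrimPrefix(line, prefix)))
		}
	}
	return cmds
}

func main() {
	// Print each command that a generate run would be asked to execute.
	for _, cmd := range findDirectives(exampleSource) {
		fmt.Println(cmd)
	}
}
```

Everything else in the file is compiled as usual; the directive itself is only acted on when <span style="font-family: Courier New, Courier, monospace;">go generate</span> is run explicitly.<br />
<br />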
Since we check in our entire workspace (including vendored dependencies; I should talk about why this is awesome later), we use thrift namespaces heavily, and my <span style="font-family: Courier New, Courier, monospace;">go:generate</span> directive looks a bit different. Because <span style="font-family: Courier New, Courier, monospace;">go generate</span> does not run the command through a shell, the <span style="font-family: Courier New, Courier, monospace;">$(...)</span> substitution has to be wrapped in <span style="font-family: Courier New, Courier, monospace;">sh -c</span>:<br />
<br />
<pre style="background-image: URL(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9NTPq-fzzmciDCYpcAxrlI8VBAAiNi4bUVOisx2pLYfrxKRELpVigHIllT-08GVOJE8_1NdXmaRmC3UgCf7US8FMNg6sZKZUYO7CZpTlCGuVsOZhe1gqbNYPFnbuWlQ8iQdlwSJENhCrs/s320/codebg.gif); background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> //go:generate sh -c "thrift -r --gen go -o $GOPATH/src/ $(git rev-parse --show-toplevel)/thrift/foo.thrift"
</code></pre>
<br />
We live with the gen-go import prefix:<br />
<br />
<pre style="background-image: URL(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9NTPq-fzzmciDCYpcAxrlI8VBAAiNi4bUVOisx2pLYfrxKRELpVigHIllT-08GVOJE8_1NdXmaRmC3UgCf7US8FMNg6sZKZUYO7CZpTlCGuVsOZhe1gqbNYPFnbuWlQ8iQdlwSJENhCrs/s320/codebg.gif); background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> import (
"gen-go/foo"
)
</code></pre>
<br />
The result is the ability to generate all of our thrift-go code via <span style="font-family: Courier New, Courier, monospace;">go generate</span>:<br />
<br />
<pre style="background-image: URL(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9NTPq-fzzmciDCYpcAxrlI8VBAAiNi4bUVOisx2pLYfrxKRELpVigHIllT-08GVOJE8_1NdXmaRmC3UgCf7US8FMNg6sZKZUYO7CZpTlCGuVsOZhe1gqbNYPFnbuWlQ8iQdlwSJENhCrs/s320/codebg.gif); background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> $ go generate all
</code></pre>
<br />
Engineering Principles (2014-05-28)<br />
<br />
Engineering principles, software best practices, methodologies, process, etc. We're constantly inundated with golden rules to code by, which, amazingly, are almost always boiled down to short sentences or even three-letter acronyms (e.g. <a href="http://en.wikipedia.org/wiki/Not_invented_here">NIH</a>, <a href="http://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY</a>). Most seem content to believe that the sum total of engineering can be distilled down to a distinct, infallible ruleset (and the reduction to acronyms might even hint that it could be subsymbolic).<br />
<br />
Especially frustrating is the moment when, in the middle of an honest discussion, some well-educated blog reader (much like you and me) quips one of these engineering principles with the fortitude and gusto to imply that not only is the discussion ended, but the answer has been provided by the holy commandments of best practice. Even worse is when a disagreement is not defeated on its merits, but paralyzed with the pinpoint insertion of best practice between the argument's metaphorical cervical vertebrae.<br />
<br />
After all, you aren't the kind of jackass to encourage an epidemic of NIH, are you?<br />
<br />
I certainly am. I also frequently reject code reuse, disregard the stability of my interfaces, ignore dependency management, and repeat myself. I learned these filthy habits from some of the best software engineers in the world. Allow me to explain.<br />
<br />
<h3>
NIH versus Limiting External Dependencies</h3>
<div>
To adhere absolutely to either of these principles is an impossible contradiction. The refusal to pass up anything that is available that works for your case, no matter how small, is to accept any dependency that might fit the bill. Even if you are using only the tiniest, most trivial slice of that pie, pies happen to be baked whole. This is like inviting folks with children to your barbecue and assuming the children won't track dirt into the house.</div>
<div>
<br /></div>
<div>
The children are going to drag dirt into the house, and you can't control them. Circular topologies in dependencies can now happen outside of your control. You might wake up next Monday with dependency conflicts.</div>
<div>
<br /></div>
<div>
At the same time, if it's a really good library, use the damn thing. Especially if it has managed to stay lean while being battle-tested (the limit of open source projects as they approach infinity is monolith).<br />
<br /></div>
<h3>
Interfaces Should be Stable</h3>
<div>
In theory, you should design your public interfaces once while in a stupor of enlightened engineering bliss. After that moment, they should remain constant for the remainder of forward-flowing time. 20,000 years from now robot archeologists will discover your brilliance, still churning away on one last node.</div>
<div>
<br /></div>
<div>
Except every single project EVER starts off with a huge disclaimer that the interfaces haven't stabilized yet, and you shouldn't rely on them until some fuzzy date roughly 20,000 years from now (may or may not be <a href="https://developer.valvesoftware.com/wiki/Valve_Time">Valve time</a>).</div>
<div>
<br /></div>
<div>
Don't get me wrong, at some point you need to freeze the interface. But let's examine why. Presumably your interface is going to be used by a whole heck of a bunch of people. It's going to be tweeted some day, retweeted, voted up to the top on HN and spread around as the glorious solution to that problem we all have (probably deployment related). At that point, you will have at least 13 million users of your beautifully crafted library. It's open source, so you won't make a dime from it, but before you go to sleep at night, a single tear will roll down your cheek as you smile to yourself knowing the world is a better place.</div>
<div>
<br /></div>
<div>
Except that's not going to happen. Because your library isn't open source; it's something nobody else will ever want, and your two users sit across from you and to your left. Your two coworkers are the only people using your library, and they're only calling into it from three places in the entire company's codebase. If your code lives in one repo, you can change the ENTIRE interface, and fix up those three places in a single commit. If your code is spread out, it's still at most four commits to change everything. The cost of changing the interface is practically zero. Furthermore, the advantage gained by this low overhead to change is substantial. You can move quickly. You can change. You can learn from how it's used by 100% of your user base (it takes but a few minutes to get feedback from BOTH coworkers), and adapt. You don't need to stuff yourself with <a href="http://en.wikipedia.org/wiki/Nootropic">nootropics</a> in search of some supernatural cognitive plateau the first time you give it a go.</div>
<div>
<br /></div>
<div>
And when you do end up in that situation where you have hit the open source lottery and actually have more than two users and want to change the interface long after the concrete has set? Create a new thing. The versioning of your interface can just as easily be embodied in the name itself. Luckily, there's no rule about that (yet). But really, that thing you built today because you had to get that darn thing working? That's going nowhere, ever.<br />
<br /></div>
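<div>
One way to read "embody the version in the name" is sketched below. The functions <span style="font-family: Courier New, Courier, monospace;">Lookup</span> and <span style="font-family: Courier New, Courier, monospace;">LookupV2</span> are hypothetical names made up for illustration: the old one stays frozen for whoever depends on it, and the new behavior ships as a new thing next to it.</div>

```go
package main

import (
	"fmt"
	"strings"
)

// Lookup is the original (hypothetical) interface. It stays frozen because
// outside users depend on it: exact, case-sensitive match, boolean result.
func Lookup(users []string, name string) bool {
	for _, u := range users {
		if u == name {
			return true
		}
	}
	return false
}

// LookupV2 is the "new thing". It changes both the matching rule and the
// return values, so it gets a new name instead of mutating Lookup.
func LookupV2(users []string, name string) (int, bool) {
	for i, u := range users {
		if strings.EqualFold(u, name) {
			return i, true
		}
	}
	return -1, false
}

func main() {
	users := []string{"Ann", "Bob"}
	fmt.Println(Lookup(users, "bob")) // old callers keep the old behavior
	idx, ok := LookupV2(users, "bob") // new callers opt in by name
	fmt.Println(idx, ok)
}
```

<div>
Old call sites keep compiling untouched, and nobody has to guess which behavior they are getting.</div>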
<h3>
Dependency Management</h3>
<div>
Ah yes, one must manage dependencies very wisely. So wisely, in fact, that this is a problem that continues to plague (almost) every platform, language, ecosystem or distro. That moment when you kick off a build and gosh darn it, one of those kids tracked dirt all over your house. Damn that guy! Doesn't he understand the FIRST THING about dependency management? Ugh, everyone (except you) is an idiot!</div>
<div>
<br /></div>
<div>
Vendor them. Oh, but then you can't get updates! Well, sorry folks, if software engineering was easy we'd all be getting paid minimum wage. I strongly recommend you vendor (or the functional equivalent) your dependencies and move on with life. Dependency management is an impossibly hard problem. Sure, there are various tools out there to help you, but those tools are the product of a labor of sweat, blood and tears. Even so, you're probably going to run into a problem at some point.</div>
<div>
<br /></div>
<div>
This rule is also the bastard father of Interfaces Should be Stable. The assumption here is that if everyone simply followed the rules, none of this would be a problem, at all. Good luck with that. It's unrealistic and impractical.</div>
<div>
<br />
The best way to solve a problem is to eliminate it. Sure, a tool might exist that "fixes" the problem, but now you've introduced a tool, and tools are built the same way our now-broken system was built, with code by a human. Humans never do anything completely right.<br />
<br /></div>
<h3>
Don't Repeat Yourself</h3>
<div>
This rule is actually pretty good, except when it's completely misunderstood, which is way too often. But even when it's understood, it can lead to some ridiculous abstractions. In addition to the engineering atrocities committed in the name of DRY, if someone discovers you ditched DRY for something as stupid as simplicity, they'll start hurling insults at your code like WET ("we enjoy typing" or "write everything twice").<br />
<br />
If being DRY requires mind-bending backflips, abstractions, extensive use of the keyword <span style="font-family: Courier New, Courier, monospace;">mutable</span>, and metaprogramming, simply stop. Despite a common misconception, code is not for computers. Code is for humans to read, which is then nicely translated into something a computer can do. One of the most (if not the most) important functions of code is to be easily understood by your coworkers, who were ruthlessly dropped onto your project yesterday with orders to fix the mess you made no later than tomorrow.<br />
<br />
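As a small, hedged illustration of choosing readability over DRY: the two validators below are hypothetical and deliberately repeat the same shape. A single configurable validator would remove the duplication, but each of these can be read, and changed, without knowing the other exists.<br />

```go
package main

import (
	"errors"
	"fmt"
)

// validateUsername and validateTitle are hypothetical examples. They share
// structure on purpose; the limits and messages are illustrative only.
func validateUsername(s string) error {
	if s == "" {
		return errors.New("username is empty")
	}
	if len(s) > 32 {
		return errors.New("username is longer than 32 bytes")
	}
	return nil
}

func validateTitle(s string) error {
	if s == "" {
		return errors.New("title is empty")
	}
	if len(s) > 120 {
		return errors.New("title is longer than 120 bytes")
	}
	return nil
}

func main() {
	fmt.Println(validateUsername(""))
	fmt.Println(validateTitle("Engineering Principles"))
}
```

<br />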
The other common misapplication of DRY seems to be this idea that code should be reusable, and therefore never repeated.<br />
<br />
<h3>
Code Reuse</h3>
</div>
<div>
Oh the abominations that have been created in the name of code reuse. Again, the tendency of any project is towards a monolithic, one-size-fits-all creation.</div>
<div>
<br /></div>
<div>
When you're shopping for a replacement part for an actual, real-world widget, and you have the option between the one-size-fits-all and the actual replacement part, what has painful experience taught us? The one-size-fits-all will probably work, but it won't work great. It will most likely fit awkwardly. It's nearly impossible to think about code reuse without assuming the one-size-fits-all mentality. In addition, there is probably zero chance anyone (including your teammates) is ever going to reuse your code.</div>
<div>
<br /></div>
<div>
If you are writing infrastructure code, or something akin to standard library code, then by all means make the code reusable. For the remaining 99.999% of us, please stop with the overuse of generics and parameterization and over-engineering.</div>
<div>
<br /></div>
<div>
First and foremost, your code should do what it was supposed to do in the first place, clearly, concisely and as simply as possible. If that isn't reusable, so what? And if someone wishes your prefix trie could also send emails, kindly tell them where to shove the pull request. Isn't it interesting how new projects are so exciting, so elegant, so simple while the old ones are an overwhelming, burdensome array of factory factories and page-long configuration objects? Do one thing and do it well. Keep it simple.</div>
<div>
<br /></div>
<h3>
Simplicity</h3>
<div>
This one is for real. Simplicity is hard, but it's important. In fact, it's probably the most important principle of all. Simple things are easy to understand, fix, maintain and change. Unfortunately, simplicity is very hard to define in concrete terms, measure or enforce. The truth is, engineering can't be boiled down into convenient limericks and acronyms. Engineering is really, really hard. As pointed out, many engineering best practices are completely at odds with each other or at odds with reality. The best thing, in my unimportant opinion, is to constantly consider, "is the next poor soul to work on this going to understand it?" That is absolutely important, and if you need to break a few rules to get there, then by all means do. Software engineering principles are like pirate law: they're loose guidelines, and they should never be treated otherwise.</div>