Fairly recently, I started reading up on quantum mechanics (QM) to brush up my understanding of the topic and, quite surprisingly, I’ve found it ripe with analogies to my typical interests: software development. The one that stands out particularly well relates to the very basics of QM and the way they were widely misunderstood for many decades. What’s really amusing here is that while majority of physicists seem to have been easily fooled by how the world operates on quantum level, any contemporary half-decent software engineer, faced with problems of very similar nature, typically doesn’t exhibit folly of this magnitude.
We are not uncovering the Grand Scheme of Things every day, of course; what I’m saying is that we seem to be much less likely to come up with certain extremely bad answers to all the why? questions we encounter constantly in our work. Even the really hard ones (“Why-oh-why it doesn’t work?!”) are rarely different in this regard.
Thus I dare to say that we would not be so easily tricked by some “bizarre” phenomena that have fooled many of the early QM researchers. In fact, they turn out to be perfectly reasonable (and rather simple) if we look at them with programmer’s mindset. The hard part, of course, is to discover that such a perspective applies here, instead of quickly jumping to “intuitive” but wrong conclusions.
To see how tempting that jump can be, we should now look at one simple experiment with light and mirrors, and try to decipher its puzzling results.
The setup is not very complicated. We have one light source, two detectors and two pairs of mirrors. One pair consists of standard, fully reflective mirrors. Second pair has half-silvered ones; they reflect only half of the light, letting the other half through without changing its direction.
We arrange this equipment as shown in the following picture. Here, the yellow lines depict path the light is taking after being emitted from the source, somewhere beyond the left edge.
Source of this and subsequent images
But in this experiment, we are not letting out a continuous ray of light. Instead, we send out individual photons. We know (from some previous observations) that half-silvered mirrors are still behaving correctly in this scenario: they just reflect a photon about 50% of the time. Normal mirrors, obviously, are always reflecting all the photons.
Knowing this, we would expect both detectors to go off with roughly similar frequency. What we find out in practice is that only detector 2 is ever registering any photons, and no particle whatsoever reaches detector 1, at any time. (This is illustrated by a dashed line).
At this point we might want to perform a sanity check, to see whether we are really dealing with individual particles (rather than waves that can interfere and thus cancel themselves out). So, we block out one of the paths:
and now both detectors are going off, but not simultaneously. This indicates that our photons are indeed localized particles, as they appear to be only in one place at a time. Yet, for some weird inexplicable reason, they don’t show at detector 1 if we remove the barrier.
There are all sorts of peculiar conclusions we could come up with already, including the mere possibility of photon going both ways to have an effect on results we observe. Let’s try not to be crazy just yet, though. Surely we can establish which one of the two paths is actually being taken; it’s just a matter of putting an additional sensor:
So we do just that, and we turn on the machinery again. What we find out, however, is far from definite answer. Actually, it’s totally opposite: both detectors are going off now, just like in the previous setup – but we haven’t blocked anything this time! We just wanted to take a sneak peak and learn about the actual paths that our photons are taking.
But as it turns out, we are now preventing the phenomenon from occurring at all… What the hell?!
Those cookies that we all know and love (if we’re web developers) or hate (if we’re overly paranoid about privacy) are not the only thing in computing to be known under this name, however. I learned this quite recently when talking to a friend of mine who is working in the realm of IT security. As it turns out, ‘cookie’ can refer to quite diverse array of different solutions, all unified through similar underlying concept.
It’s pretty much assumed that if you’re writing Python, you are not really concerned with the performance and speed of your code, provided it gets the job done in sufficiently timely manner. The benefits of using such a high level language usually outweigh the cons, so we’re at ease with sacrificing some of the speed in exchange for other qualities. The feasibility of this trade is always relative, though, and depends entirely on the tasks at hand. Sometimes the ‘sufficiently fast’ bar might be hung quite high up.
But while some attitudes are clearly beyond Python’s reach – like real-time software on embedded systems – it doesn’t mean it’s impossible to write efficient code. More importantly, it’s almost always possible to write more efficient code than we currently have; the nimble domain of optimization has its subdivision dedicated specifically to Python. And quite surprisingly, performance-tuning at this high level of abstraction often proves to be even more challenging than squeezing nanoseconds out of bare metal.
So today, we’re going to look at some basic principles of optimization and good practices targeted at writing efficient Python code.
Before jumping into specific advice, it’s essential to briefly mention few standard modules that are indispensable when doing any kind of optimization work.
The first one is timeit, a simple utility for measuring execution time of snippets of Python code. Using
timeit is often one of the easiest way to confirm (or refute) our suspicions about insufficient performance of particular piece of code. timeit helps us in straightforward way: by executing the statement in questions many times and showing average, as well as cumulative, time it has taken.
As for more detailed analysis, the profile and cProfile modules can be used to gain insight on CPU time consumed by different parts of our code. Profiling a statement will yield us some vital data about number of times that any particular function was called, how much time a single call takes on average and how big is the function’s impact on overall execution time. These are the essential information for identifying bottlenecks and therefore ensuring that our optimizations are correctly targeted.
Then there is the dis module: Python’s disassembler. This nifty tool allows us to inspect instructions that the interpreter is executing, with handy names translated from actual binary bytecode. Compared to actual assemblers or even code executed on Java VM, Python’s bytecode is pretty trivial to analyze:
Getting familiar with it proves to be very useful, though, as eliminating slow instructions in favor of more efficient ones is a fundamental (and effective) optimization technique.
There is a specific technology I wanted to play around with for some time now; it’s called node.js. It also happens that I think the best way to get to know new stuff is to create something small, but complete and functional. Note that by ‘functional’ I don’t really mean ‘practical’; that distinction is pretty important, given what I’m about to present here.
Basically, I wrote a package manager for jQuery. The idea was to have a straightforward way to install jQuery plugins – a way that somewhat mirrors the experience of dozens of other package managers, from
cabal. End result looks pretty decent, in my opinion:
The funny part? It doesn’t use any central, remote registry of plugins. What it does is searching GitHub and pulling code directly from there – provided it is able to find something relevant that looks like jQuery plugin. That seems to work well for quite a few popular ones, which is rather surprising given how silly and simplistic the underlying algorithm is. Certainly, there’s plenty of room for improvement, including support for jquery.json manifests – the future standard for the upcoming official plugin site.
As I said before, though, the main purpose of jqpm was educational one. After toying with underlying technologies for a couple of evenings, I definitely have better perspective to evaluate their usefulness. While the topic might warrant a follow-up posts in the future, I think I can briefly summarize my findings in few bullet points:
functions and loops, and without denser syntactic sugar such as list comprehensions.
require()calls, it makes for an unusual system that resembles classic C/C++
#includes – but improved, of course. What stands out the most is the lack of virtualenv/rvm-style utilities; instead, an equivalent approach of local
node_modulessubfolders is used instead. (
npm help foldersprovide more elaborate explanation on how does it work exactly).
The bottom line: node.js is definitely not a cancer and has many legitimate uses, mostly pertaining to rapid transfer of relatively small pieces of data over the Internet. API backends, single page web applications or certain game servers all fall easily into this category.
From developer’s point of view, it’s also quite fun platform to code in, despite the asynchronous PITA mentioned above (which is partially alleviated by libraries like async.js or frameworks providing futures/promises). On the overall abstraction ladder, I think it can be placed noticeably lower than Java and not very much higher than plain C. That place is an interesting one, and it’s also not densely populated by any similar technologies and languages (only Go and Objective-C come to mind). Occupying this mostly overlooked niche could very well be one of reasons for Node’s recent popularity.