It is quite likely you are familiar with the Wat talk by Gary Bernhardt. It has been sweeping through the Internet, giving a good laugh to pretty much anyone who watches it. It certainly did to me!
The speaker pokes fun at the Ruby and JavaScript languages (although mostly the latter, really), showing totally unexpected and baffling results of some seemingly trivial operations – like adding two arrays. It turns out that in JavaScript, the result is an empty string. (And the reasons for that provoke an even bigger “wat”.)
After watching the talk about five times (it hardly gets old), I started to wonder whether it is only those two languages that exhibit such confusing behavior… The answer is of course “no”, and that should be glaringly obvious to anyone who knows at least a bit of C++ ;) But beating on that horse would be way too easy, so I’d rather try something more ambitious.
Hence I ventured forth to search for “wat” in Python 2.x. The journey wasn’t short enough to stop at the mere addition operator, but nevertheless – and despite me being nowhere near a Python expert – I managed to find some candidates rather quickly.
I strove to keep to the original spirit of Gary’s talk, so I only included those quirks that can be easily shown in the interactive interpreter. The final result consists of three of them, arranged in order of increasing puzzlement. They are given without explanation or rationale, hopefully to encourage some thought beyond amusement :)
Behold, then, the Wat of Python!
Python dictionaries have an inconspicuous method named setdefault. Asking for its description, we’ll be presented with a rather terse interpretation:
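That terse description can be pulled up right in the interpreter; the snippet below merely displays it, and the exact docstring wording differs between Python versions:

```python
# Display setdefault's (rather terse) built-in description.
# The exact wording varies between Python versions.
print(dict.setdefault.__doc__)
```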
While it might not be immediately obvious what we could use this method for, we can actually find quite a lot of applications if we only pay a little attention. The main advantage of setdefault seems to lie in the elimination of ifs; a feat we can feel quite smug about. As an example, consider a function which groups a list of key–value pairs (with possible duplicate keys) into a dictionary of lists:
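A sketch of such a function, with setdefault doing away with the explicit if (the name group_pairs is mine, purely illustrative):

```python
def group_pairs(pairs):
    """Group (key, value) pairs, possibly with duplicate keys,
    into a dictionary of lists."""
    result = {}
    for key, value in pairs:
        # setdefault returns the existing list for `key`,
        # or inserts (and returns) a fresh empty one.
        result.setdefault(key, []).append(value)
    return result

print(group_pairs([('a', 1), ('b', 2), ('a', 3)]))
# -> {'a': [1, 3], 'b': [2]}
```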
This is something we could do when parsing the query string of a URL, if there weren’t a readily available API for that:
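The ready-made API in question is parse_qs – it lives in the urlparse module in Python 2 and under urllib.parse in Python 3:

```python
from urllib.parse import parse_qs  # Python 2: from urlparse import parse_qs

# parse_qs performs exactly this kind of grouping for query strings,
# collecting values of duplicate keys into lists.
print(parse_qs('tag=python&tag=wat&lang=en'))
# -> {'tag': ['python', 'wat'], 'lang': ['en']}
```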
So, with setdefault we can get the same job done more succinctly. It would seem that it is a nifty little function, and something we should keep in mind as part of our toolbox. Let’s remember it and move on, shall we?…
Actually, no. setdefault is really not a good piece of dict’s interface, and the main reason to remember it is mostly for its caveats. Indeed, there are quite a few of them, enough to shrink the space of possible applications to a rather tiny size. As a result, we should be cautious whenever we see (or write) setdefault in our own code.
Here’s why.
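One such caveat, for instance, is easy to demonstrate in the interpreter: the default value is evaluated eagerly, constructed anew on every call whether or not the key is already present – unlike with collections.defaultdict, whose factory is invoked only for missing keys:

```python
from collections import defaultdict

d = {'answer': 42}

# The second argument is evaluated eagerly: this throwaway list is
# built on every call, even though the key already exists.
d.setdefault('answer', [x * x for x in range(1000)])
print(d['answer'])  # -> 42

# defaultdict calls its factory lazily, only for missing keys.
groups = defaultdict(list)
groups['a'].append(1)
print(dict(groups))  # -> {'a': [1]}
```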
Unless you were living under an enormous, media-impervious rock for at least a few weeks, you must have heard of the recent ruckus about laws concerning so-called intellectual property. It was mostly due to the infamous SOPA and PIPA bills that were to pass in the US, and the temporary “blackout” of various big websites whose owners voiced their protest against those bills. But we also have something of a local spin-off, relevant to the EU and specifically Poland: an outcry against ACTA, a similar albeit slightly less ridiculous piece of law, which (as of this writing) is due to be signed in a few days.
Even though we do not really know yet how those issues are going to be resolved, there are already a few important lessons to be learned here. And I think the biggest one is not actually concerned with the merit of the laws in question. Instead, it draws attention to the problem of communication between the IT industry and the general public.
It never was good, really. If anything, recent events served as a wake-up call, reminding us that the Internet and all its related technological infrastructure are something we take for granted way too often. While I might exaggerate a bit, I don’t think it’s very far-fetched to say that for the average person, the Internet is pretty much magic. You pop open “the Internet app”, type whatever you are looking for, and a few moments later, voilà: you’ve got it, by the sheer magic of intertubes. Assuming, of course, that there is actually something specific you have in mind; otherwise, you can always look at what your friends have “shared”, and from there start your clicky journey through the nether. It’s awesome, virtually boundless, and it just works… right?
So far, the consequences of such technical ignorance have also been mostly technical, surfacing as security issues, loss of data, malware spread and so on. But that’s alright! We’re dabbling in the arcane and invoking supernatural powers, so it’s no wonder we sometimes accidentally summon a few annoying daemons. Should that happen, we can always call an exorcist in the form of the friendly geek next door, or (at worst) tech support.
Now, however, failure to grasp the fundamental nature of the Internet can result in a much more dramatic backlash. See, this technical stuff underneath is not just plumbing that can be safely ignored. The foundational, idealistic principles of the ‘net – decentralization, freedom, knowledge, progress, innovation, flexibility, and so on – are woven into its very fabric. It is by exposure to those “boring details” that these ideas may influence and reshape minds, helping to do away with flawed and outdated notions – including, for example, the 19th-century concept of intellectual property. I cannot even fathom how someone with a reasonable understanding of how software, the Internet and IT work could conceive of something as outrageous as those infamous IP laws. It just doesn’t compute.
Yet it was conceived, put into words, formed into a legal document and officially proposed – more than once, in fact. Obviously, such things don’t happen by themselves, especially when powerful interest groups are at play. And that’s precisely the reason why we have various public institutions, from parliaments to international organizations: to weed out blatantly bad solutions, and sometimes even let the good ones pass through.
At least, that’s the theory. It’s naive to expect virtue among politicians (i.e. those willingly aspiring to power), so we make them cling to the one value they’ll always embrace, for it is needed to maintain a ruling position: popularity. This should roughly translate to caring for the same things the voters care for. Roughly, because there will always be some disconnect due to the effects of scale, perception, biases and a multitude of other factors.
Still, this is pretty much how it works. In order to push an agenda we are vitally interested in (or obstruct one we are strongly against), we need to gain enough publicity. We need to make people share our values, and regard as useful the same things that we do. That’s how we set the appropriate causal chain in motion, eventually leading to the fulfillment of our goals.
And this is where the IT industry has failed miserably. It’s the reason why we’re frantically looking for support, rolling out our biggest cannons, hitting the news with blackouts of absolutely crucial Internet services and DDoS attacks on government sites. We have failed to make society share our values in an elegant, gradual and systematic manner, so now we need to resort to shock therapy to compensate for this negligence. Actually, in some cases we have been actively making things worse, professing the exact opposites of the values mentioned before: centralization, limitation, monocultures, stagnation and the rehashing of old concepts.
Now we are just reaping what we have sown.
As the Python docs state, the with statement is used to encapsulate common patterns of try–except–finally constructs. It is most often associated with managing external resources in order to ensure that they are properly freed, released, closed, disconnected from, or otherwise cleaned up. While at times it is sufficient to just write the finally block directly, repeated occurrences call for using this language goodness more consciously, including writing our own context managers for specialized needs.
Those managers – basically with-enabled helper objects – are strikingly similar to the small local objects involved in the RAII technique from C++. The acronym expands to Resource Acquisition Is Initialization, further emphasizing the resource-management part of this pattern. But let us not be constrained by that notion: the usage space of with is much wider.
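As a sketch of such a specialized manager – hypothetical, not from any standard library – here is one that temporarily switches the working directory, so the try–finally lives in a single place rather than at every call site:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily change the process's working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises -- this is the 'finally'
        # part of the pattern that 'with' encapsulates.
        os.chdir(previous)

with working_directory(tempfile.gettempdir()):
    pass  # do some work in the temporary directory
```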
On contemporary websites and web applications, it is an extremely common task to display a list of items on a page. In any reasonable framework and/or templating engine, this can be accomplished rather trivially. Here’s how it could look in Jinja or Django templates:
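Assuming a context variable named items, each with a name field (both names are illustrative), a minimal Jinja version of such a template might read:

```html
<ul>
{% for item in items %}
  <li>{{ item.name }}</li>
{% endfor %}
</ul>
```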
But it’s usually not too long before our list grows big and it becomes unfeasible to render it all on the server. Or maybe, albeit less likely, we want it to be updated in real time, without having to reload the whole page. Either case requires incorporating some JavaScript code that talks to the server and obtains the next portion of data whenever it’s needed.
Obviously, that data has to be rendered as well, and one option is to do it on the server side, serving actual HTML directly to JS. An arguably better solution is to respond with JSON or a similar representation of our items. This is conceptually simpler, feels less messy, and is potentially reusable as part of the website’s external API. There is just one drawback: it forces rendering to be done in JavaScript as well.
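A framework-agnostic sketch of the JSON variant (items_response and the field names are made up for illustration):

```python
import json

def items_response(items):
    """Serialize (id, name) items into the JSON payload that the
    client-side JavaScript will receive and render itself."""
    payload = [{'id': item_id, 'name': name} for item_id, name in items]
    return json.dumps({'items': payload})

print(items_response([(1, 'foo'), (2, 'bar')]))
# -> {"items": [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]}
```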
When thinking about concurrent programs, we are sometimes blinded by the notion of bare threads. We create them, start them, join them, and sometimes even interrupt them, all by operating directly on those tiny little abstractions over several paths of simultaneous execution. At the same time we might be extremely reluctant to use synchronization primitives (semaphores, mutexes, etc.) directly, preferring more convenient and tailored solutions – such as thread-safe containers. And this is great, because synchronization is probably the most difficult aspect of concurrent programming. Any place where we can avoid it is therefore one less place to be infested by the nasty bugs it could otherwise breed.
So why do we still cling to somewhat low-level Threads for actual execution, while having no problem with specialized solutions for concurrent data exchange and synchronization?… Well, we may simply be unaware that “getting code to execute in parallel” is also something that can benefit from the safety and clarity of a more targeted approach. In Java, one such approach is oh-so-object-orientedly called executors.
As we might expect, an Executor is something that executes, i.e. runs, code. Pieces of that code are given to it in the form of Runnables, just like they would be for regular Threads:
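In outline it might look like this (the counting task is a stand-in, and a Java 8 method reference is used for brevity):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorDemo {
    static int runBoth() throws Exception {
        AtomicInteger counter = new AtomicInteger();
        Runnable task = counter::incrementAndGet;  // the work to run

        // The bare-threads way: create, start and join a Thread ourselves.
        Thread thread = new Thread(task);
        thread.start();
        thread.join();

        // With an executor we only hand over the task; which thread
        // runs it, and when, is the executor's concern.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(task);
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);

        return counter.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tasks run: " + runBoth());  // tasks run: 2
    }
}
```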
Executor itself is just an interface, so it can be used without any knowledge of queuing policies, scheduling algorithms, or any other details of the way it conducts the execution of tasks. While this seems feasible in some real cases – such as servicing incoming network requests – executors are useful mainly because they are quite diverse in kind. Their complex and powerful variants are also relatively easy to use.
Simple functions for creating different types of executors are contained within the auxiliary Executors class. Behind the scenes, most of them maintain a thread pool from which they pull threads when they are needed to process tasks. This pool may be of fixed or variable size, and can reuse a thread for more than one task.
Depending on how much load we expect and how many threads we can afford to create, the choice is usually between newCachedThreadPool and newFixedThreadPool. There is also the peculiar (but useful) newSingleThreadExecutor, as well as the time-based newScheduledThreadPool and newSingleThreadScheduledExecutor, which allow specifying a delay for our Runnables by passing them to the schedule method instead of execute.
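A quick tour of those factory methods might look as follows (the pool sizes and the delay are arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PoolsDemo {
    public static void main(String[] args) throws Exception {
        // A pool with a fixed number of worker threads.
        ExecutorService fixed = Executors.newFixedThreadPool(4);

        // A pool that grows on demand and reuses threads that went idle.
        ExecutorService cached = Executors.newCachedThreadPool();

        // The scheduled variant: schedule() runs a task after a delay,
        // instead of execute()'s "as soon as possible".
        ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(1);
        ScheduledFuture<?> future = scheduled.schedule(
                () -> System.out.println("ran after 100 ms"),
                100, TimeUnit.MILLISECONDS);
        future.get();  // block until the delayed task has completed

        fixed.shutdown();
        cached.shutdown();
        scheduled.shutdown();
    }
}
```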
There is one case where the abstract nature of the base Executor type comes in handy: testing and performance tuning. Certain types of executors can serve as good approximations of some common concurrency scenarios.
Suppose that we normally handle our tasks using a pool with a fixed number of threads, but we are not sure whether it’s actually the optimal number. If our tasks appear to be mostly I/O-bound, it could be a good idea to increase the thread count, seeing that threads waiting for I/O operations simply lie dormant most of the time.
To see if our assumptions have grounds, and how big the increase can be, we can temporarily switch to a cached thread pool. By experimenting with different levels of throughput and observing the average execution time along with the number of threads used by the application, we can get a sense of the optimal number of threads for our fixed pool.
Similarly, we can adjust and possibly decrease this number for tasks that appear to be mostly CPU-bound.
Finally, it might also be sensible to use the single-threaded executor as a sort of “sanity check” for our complicated, parallel program. What we are checking this way is both correctness and performance, in a rather simple and straightforward way.
For starters, our program should still compute correct results. Failing to do so is an indication that seemingly correct behavior in a multi-threaded setting may actually be an accidental side effect of unspotted hazards. In other words, threads might “align just right” when there is more than one running, hiding some insidious race conditions which we failed to account for.
As for performance, we should expect the single-threaded code to run for a longer time than its multi-threaded variant. This is a somewhat obvious observation that we might carelessly take for granted and thus never verify explicitly – and that’s a mistake. Indeed, it’s not unheard of for parallelized algorithms to actually be slower than their serial counterparts. Throwing threads at a problem is not a magic bullet, unfortunately: concurrency is still hard.
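Sketching that check: because the code below depends only on the ExecutorService interface, swapping a fixed pool for a single-threaded executor is a one-line change, and the results must agree (sumOfSquares is a deliberately trivial stand-in for the real computation):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SanityCheck {
    // The computation under test: sum of squares 1..n, one task per term.
    static long sumOfSquares(ExecutorService executor, int n) throws Exception {
        List<Callable<Long>> tasks = IntStream.rangeClosed(1, n)
                .mapToObj(i -> (Callable<Long>) () -> (long) i * i)
                .collect(Collectors.toList());
        long total = 0;
        for (Future<Long> partial : executor.invokeAll(tasks)) {
            total += partial.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService parallel = Executors.newFixedThreadPool(4);
        ExecutorService serial = Executors.newSingleThreadExecutor();

        // Same code, two executors: the results must agree, and timing
        // them tells us whether parallelism actually pays off here.
        System.out.println(sumOfSquares(parallel, 1000) == sumOfSquares(serial, 1000));

        parallel.shutdown();
        serial.shutdown();
    }
}
```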
An extremely common programming exercise – popping up usually as an interview question – is to write a function that turns all characters in a string into uppercase. As you may know or suspect, such a task is not really about the problem stated explicitly. Instead, it’s meant to probe whether you can program at all, and whether you remember about handling special subsets of input data. That’s right: the actual problem is almost insignificant; it’s all about the necessary plumbing. Without a need for it, the task becomes awfully easy, especially in certain kinds of languages:
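In Python, for one, the explicit part shrinks to a single method call (the interview variant would instead loop over characters, mapping 'a'–'z' onto 'A'–'Z' by hand):

```python
def to_upper(s):
    # The built-in does all the work, special cases included.
    return s.upper()

print(to_upper('wat'))  # prints WAT
```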
This simplicity may be the cause of a misconception that the whole problem of letter case is similarly trivial. Actually, I would not be surprised if the notion of there being any sort of real ‘problem’ here is baffling to some. After all, every self-respecting language has those toLowerCase/toUpperCase functions built in, right?…
Sure it does. But even assuming they work correctly, they are usually the only case-related transformations available out of the box. As it turns out, it’s hardly uncommon to need something way more sophisticated.
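For a taste of “more sophisticated”, consider title case, where even the built-in str.title trips over apostrophes; the regex workaround below is adapted from the str.title documentation:

```python
import re

print("they're bill's friends.".title())
# -> "They'Re Bill'S Friends." -- apostrophes restart capitalization

# Adapted from the str.title docs: treat an apostrophe followed by
# letters as part of the preceding word.
def titlecase(s):
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)

print(titlecase("they're bill's friends."))
# -> "They're Bill's Friends."
```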