When working with dictionaries in Python, or any equivalent data structure in some other language, it is quite important to remember the difference between a key which is not present and a key that maps to a None (null) value. We often tend to blur that distinction, because more often than not, None and other falsy values (such as empty strings) are not interesting on their own, so we may as well lump them together with the “no value at all” case.
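One common way the two cases get blurred is a plain dict.get lookup. Whether this is the exact call the text had in mind is an assumption; the settings dictionary and its keys below are made up for illustration.

```python
# Hypothetical settings: "timeout" is present but None, "retries" is absent.
settings = {"name": "", "timeout": None}

# Both lookups return None, so .get() alone cannot tell the two cases apart.
present_but_none = settings.get("timeout")
not_present = settings.get("retries")

# A membership test does distinguish them.
has_timeout = "timeout" in settings
has_retries = "retries" in settings
```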
There are some situations, however, where these variants should be treated separately. One of them is building a dictionary of keyword arguments that are subsequently ‘unpacked’ through the **kwargs construct. Consider, for example, this code:
With a key mapping to None, we’re calling the function with the argument explicitly set to None. Without the key present, we’re not passing the argument at all, allowing it to assume its default value.
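The difference can be sketched as follows. The connect function, its timeout parameter and the build_kwargs helper are hypothetical names invented for this illustration, not code from the original post.

```python
def connect(host, timeout=30):
    """Hypothetical function with a default argument value."""
    return (host, timeout)


# Key present and mapped to None: the argument is passed explicitly as None.
explicit_none = {"host": "example.com", "timeout": None}

# Key absent: the argument is not passed, so the default applies.
key_absent = {"host": "example.com"}


def build_kwargs(host, timeout=None):
    """Add the optional key only when a value was actually given."""
    kwargs = {"host": host}
    if timeout is not None:  # this is the `if` statement we cannot avoid
        kwargs["timeout"] = timeout
    return kwargs
```

Calling connect(**explicit_none) overrides the default with None, while connect(**key_absent) leaves the default of 30 in place.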
But adding or not adding a key to a dictionary is somewhat more cumbersome than mapping it to some value or None. The latter can be done with a conditional expression (x if cond else None), together with many other keys and values at once. The former requires an if statement, as shown above.
Wouldn’t it be convenient if we had a special “missing” value that could be used like None, but caused the key to not be added to the dictionary at all? If we had it, we could (for example) rewrite parts of the previous function that currently contain
It shouldn’t be surprising that we could totally introduce such a value and extend dict to support this functionality – after all, it’s Python we’re talking about :) Patching the dict class itself is of course impossible, but we can inherit from it and come up with something like the following piece:
The missing object is only a marker here, used to filter out keys that we want to ignore.
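A minimal sketch of such a subclass is shown below. The class name MissingDict and the private _Missing type are assumptions made for this example; only the idea of a missing marker comes from the text.

```python
class _Missing(object):
    """Type of the `missing` marker; a single instance is all we need."""

    def __repr__(self):
        return "missing"


missing = _Missing()


class MissingDict(dict):
    """dict that drops keys whose value is `missing` at construction time."""

    def __init__(self, *args, **kwargs):
        # dict(*args, **kwargs) normalizes every supported constructor form
        # (mapping, iterable of pairs, keyword arguments) into one mapping.
        items = dict(*args, **kwargs).items()
        super().__init__(
            (key, value) for key, value in items if value is not missing
        )


def build_kwargs(host, timeout=None):
    """The earlier `if` statement collapses into a conditional expression."""
    return MissingDict(
        host=host,
        timeout=timeout if timeout is not None else missing,
    )
```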
With this class at hand, some dictionary manipulations become a bit shorter:
We could take this idea further and add support for missing not only in initialization, but also in other dictionary operations – most notably the __setitem__ assignments. This gist shows how it could be done.
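The gist itself is not reproduced here. As a rough sketch of what such an extension might look like (an assumption, not the gist’s actual code), the hypothetical MissingAwareDict class below treats assigning missing as a no-op. Note that the built-in dict.update does not go through __setitem__ in subclasses, so it has to be overridden as well.

```python
class _Missing(object):
    def __repr__(self):
        return "missing"


missing = _Missing()


class MissingAwareDict(dict):
    """dict that silently ignores any attempt to store `missing`."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Design choice for this sketch: assigning `missing` does nothing.
        # It neither adds the key nor removes an existing one.
        if value is not missing:
            super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # Route every item through __setitem__ so the filter applies.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value
```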
At work I’ve got a colleague who displays unusual aptitude in coming up with amusing terminology for everyday (coding) things. Among those, the Cake Pattern is always certain to provoke a few laughs.
Recently, though, I heard him mention a great common sense rule for any development decision, from the grand architecture down to a single line of code. It can be phrased like this:
One hack is fine. But if you need another one on top of the first, it’s probably high time to really consider what you are doing.
For good measure, I’ll call it the Principle of Two Hacks, and I’m pretty convinced that it’s a very beneficial rule to apply in programming – especially when creating any non-trivial, not throw-away programs.
At first it may sound rather vague, however. Its concept of a “hack” is not easily explained or pinned down with an indisputable definition. But that’s what makes it powerful: we don’t need to invoke elaborate (and often controversial) notions of design patterns or abstraction to be able to discuss it.
At best, the ideas of accidental complexity or technical debt might be somewhat close to what developers typically deem a hack. In practice, this is mostly an opaque intuition that stems from experience or skill, and it’s usually very hard to express in words. Yet, it’s always apparent whenever we encounter it, even though the exact sensation may vary considerably: from a dim feeling that something is maybe a bit off, up to severe intellectual nausea caused by looking at really bad code.
I also like how extremely pragmatic this principle is. Quick-and-dirty fixes making it into production code are just a fact of life, and we are not prohibited from letting them slip. What we are strongly advised to do here is to maintain the integrity of the software we’re writing, trying not to stack one hack upon another.
But even that is not absolute, nonnegotiable gospel; there might still be valid reasons to loosen up its structure. The important part, however, is to notice when we’re doing something fishy and consciously decide whether or not it is a good idea. It is much better than just plunging forward with total disregard for the sanity of future maintainers.
Ultimately, this principle is just subtly telling us to think, and that is never bad advice.
While at work a few days ago, I had an interesting albeit weird problem which started with the following cryptic error message:
ERROR 1005: can’t create table `qtn_formdisplay_product` (errno: 150)
It was produced by a local MySQL server running on my development machine when I tried to rebuild the test database to accommodate some model changes happening in the codebase. As you might have noticed, it’s not terribly informative, with the errno number as the only useful tidbit. A cursory glance at the top search result for this message said that the most probable cause was a malformed FOREIGN KEY constraint inside the offending CREATE TABLE query.
Upon reading this, I blinked several times; something here was definitely off. Of course, the query wasn’t written by hand – if it had been, we could at least consider an actual mistake to be the problem here. But no, it came from an ORM – and not just any ORM, but the best ORM known to mankind. While obviously nothing is perfect, I would think it’s extremely unlikely that I found a serious bug in a widely used library just by doing something as innocent as creating a table with a foreign key. I’m not that good, after all ;)
Well, except that it could totally be such a bug. The aforementioned search results also pointed to the MySQL issue tracker, where it was suggested that the error might happen after trying to create a foreign key constraint with a duplicate name. Supposedly, this could “corrupt” the parent table so that no new FOREIGN KEYs could reference it anymore, yielding the errno 150 if we attempted to create one. While it could not explain the behavior I observed (the parent table was freshly created), it raised some doubts about whether MySQL itself might be to blame here.
These were exacerbated when one of my colleagues tried out the same procedure, and it worked for him just fine. He turned out to use a newer version of MySQL, though: 5.5 versus 5.1. This appeared to support the hypothesis about a possible bug in MySQL, but it didn’t seem to help one bit in getting the thing running on the older version.
However, it was an important clue that something relevant to the whole issue had changed in between. It was not really any particular bugfix or new feature: it was a change of defaults.
function keyword is essential to becoming an effective JS coder.
So, let’s look into them one by one and see what the function might really mean.
Any and all code is enclosed within an anonymous function. It’s not even stored in a variable; it’s just called immediately so its content is just executed, now.
window in case of web browsers) which is a fragile namespace, easily polluted by defining things directly at the script level.
So in the example above, the function is used to confine the script’s code and all the symbols it defines. But sometimes we obviously want to let some things through, while restricting access to some others – a concept known as encapsulation and exposing an interface.
What we get here is a normal JS object, but it should be thought of more like a module. It offers some public interface in the form of functions such as getValue. But underneath, it also has some internal data stored within a closure: the value variable. If you know a few things about C or C++, you can easily see parallels with header files (.h, .hpp, …) which store declarations that are only implemented in the code files (.c, .cpp).
Or, alternatively, you may draw analogies to C# or Java with their public and private (or protected) members of a class. Incidentally, this leads us to another point…
Let’s assume that the counter object from the example above is practical enough to be useful in more than one place (a tall order, I know). The DRY principle of course prohibits blatant duplication of code such as this, so we’d like to make the piece more reusable.
Here’s how we typically tackle this problem – still using only vanilla
Pretty straightforward, right? Instead of calling the function on the spot, we keep it around and use it to create multiple objects. Hence the function becomes a constructor for them, while the whole mechanism is nothing else but a foundation for object-oriented programming.
I recently had a discussion with a co-worker about the feasibility of using anonymous functions in Python. We happen to overuse them quite a bit and this is not something I’m particularly fond of. To me, lambdas in Python look pretty weird and thus I prefer to use them sparingly. I wasn’t entirely sure why that is – given that I’m quite a fan of the functional programming paradigm – until I noticed a seemingly insignificant fact.
The lambda keyword is long. With six letters, it is among the longer keywords in Python 2.x, tied with several others such as global, and beaten only by continue and finally. Quite likely, this is what causes lambdas in Python to stand out and require additional mental resources to process (assuming we’re comfortable enough with the very idea of anonymous functions). The long lambda keyword seems slightly out of place because, in general, Python keywords are short.
Or are they?… Thinking about this, I got the idea of comparing the average length of keywords from different programming languages. I didn’t really anticipate what kind of information would be exposed by applying such a metric, but it seemed like a fun exercise. And it surely was; also, the results might provoke a thought or two.
Here they are:
|Language|Keywords|Total chars|Chars / keyword|
The newest incarnation of C++ seems to be losing badly in this competition, followed by C#. On the other side of the spectrum, Go and Python seem to be deliberately designed to avoid keyword bloat as much as possible. Java is somewhere in between when it comes to sheer numbers of keywords but their average length is definitely on the long side. This could very well be one of the reasons for the perceived verbosity of the language.
For those interested, the exact data and code I used to obtain these statistics are in this gist.
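The gist is not reproduced here, but the metric itself is easy to sketch for Python alone using the standard keyword module. The keyword_stats helper is a hypothetical name, and the numbers it prints depend on the interpreter version running it, so they will not match the Python 2.x figures discussed above.

```python
import keyword


def keyword_stats(keywords):
    """Return (count, total characters, average characters per keyword)."""
    count = len(keywords)
    total = sum(len(word) for word in keywords)
    return count, total, total / count


# keyword.kwlist holds the keywords of the interpreter running this code.
count, total, average = keyword_stats(keyword.kwlist)
print("Python: %d keywords, %d chars, %.2f chars / keyword"
      % (count, total, average))
```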
If you haven’t heard about it, DreamPie is an awesome GUI application layered on top of the standard Python shell. I use it for elaborate prototyping where its multi-line input box is a significant advance over the raw, terminal UX of IPython.
However, up until recently I didn’t know how to make DreamPie cooperate with virtualenv. Because it’s a GUI program, I scoured its menu and all the preference windows, searching for any trace of an option that would allow me to set the Python executable. Having failed, I was convinced that the authors hadn’t thought about including it – which was rather surprising, though.
But hey, DreamPie is open source! So I went to look around its code to see whether I could easily enhance it with an ability to specify the Python binary. It wasn’t too long before I stumbled upon this vital fragment:
The conclusions we could draw from this anecdote are thereby as follows:
With this newfound knowledge about dreampie arguments, it wasn’t very hard to make it use the current virtualenv:
Now I can simply type dp to get a DreamPie instance operating within the current virtualenv but independent from the terminal session. Very useful!
…for fun and profit!
I’m still kind of amazed at how malleable the Python language is. It’s no small feat to allow for messing with classes before they are created but it turns out to be pretty commonplace now. My latest frontier of pythonic hackery is import hooks and today I’d like to write something about them. I believe this will come in handy for at least a few pythonistas because the topic seems to be rather scarcely covered on the ‘net.
As you can easily deduce, the name ‘import hook’ indicates something related to Python’s mechanism of imports. More specifically, import hooks are about injecting our custom logic directly into Python’s importing routines. Before delving into details, though, let’s review how imports are handled by default.
As far as we are concerned, the process seems to be pretty simple. When the Python interpreter encounters an import statement, it looks up the list of directories stored inside sys.path. This list is populated at startup and usually contains entries inserted by external libraries or the operating system, as well as some standard directories (e.g. dist-packages). These directories are searched in order and in greedy fashion: if one of them contains the desired package/module, it’s picked immediately and the whole process stops right there.
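The greedy, first-match behavior can be observed with a small experiment. The sketch below creates two temporary directories that both contain a module with the same name (greedy_demo, a made-up name), puts them at the front of sys.path, and checks which copy gets imported. It is written against Python 3’s importlib, which is an assumption about the reader’s interpreter.

```python
import importlib
import os
import shutil
import sys
import tempfile


def import_from_first_match(module_name="greedy_demo"):
    """Import a module that exists in two sys.path entries; report the winner."""
    first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
    for directory, label in ((first, "first"), (second, "second")):
        with open(os.path.join(directory, module_name + ".py"), "w") as f:
            f.write("ORIGIN = %r\n" % label)

    sys.path[:0] = [first, second]  # `first` is searched before `second`
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
        return module.ORIGIN
    finally:
        # Undo every change so the experiment leaves no trace behind.
        sys.path.remove(first)
        sys.path.remove(second)
        sys.modules.pop(module_name, None)
        shutil.rmtree(first)
        shutil.rmtree(second)
```

The copy in the directory listed earlier wins, and the second directory is never consulted for that name.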
Should we run out of places to look, an ImportError is raised. Because this is an exception we can catch, it’s possible to try multiple imports before giving up:
While this is extremely ugly boilerplate, it serves to greatly increase portability of our application or package. Fortunately, there is only a handful of worthwhile libraries that we may need to handle this way; json is the most prominent example.
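The fallback described above might look like the sketch below. The first part is the customary try/except form, with the third-party simplejson package as the alternative to the standard json module. The import_first helper in the second part is a hypothetical convenience function, not something from the standard library.

```python
import importlib

# The classic boilerplate: prefer the standard module, fall back otherwise.
try:
    import json
except ImportError:
    import simplejson as json  # third-party package, may not be installed


def import_first(*names):
    """Return the first module from `names` that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(names)
    )
```

With the helper, the same intent reads as json = import_first("json", "simplejson").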
What I presented above as Python’s import flow is a sufficient description for most purposes but far from being complete. It omits a few crucial places where we can tweak things to our needs.
First is the __path__ attribute, which can be defined in a package’s __init__.py file. You can think of it as a local extension to the sys.path list that only works for submodules of this particular package. In other words, it contains directories that should be searched when a package’s submodule is being imported. By default it only has the directory of __init__.py but it can be extended to contain different paths as well.
A typical use case here is splitting a single “logical” package between several “physical” packages, distributed separately – typically as different PyPI packages. For example, let’s say we have a foo package with foo.server and foo.client as subpackages. They are registered in PyPI as separate distributions (foo-server and foo-client, for instance) and the user can have either or both of them installed at the same time. For this setup to work correctly, we need to modify foo.__path__ so that it points to the directory of foo.server and the directory of foo.client, depending on whether they are present or not. While this task sounds exceedingly complex, it is actually very easy thanks to the standard pkgutil module. All we need to do is to put the following two lines into the foo/__init__.py file:
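The two lines in question are presumably the pkgutil.extend_path idiom from the standard library documentation. They are shown below as the INIT_SOURCE string, wrapped in a runnable experiment that simulates two separately installed distributions in temporary directories. The build_distribution and demo helpers, as well as the ROLE attribute, are made up for this sketch.

```python
import importlib
import os
import shutil
import sys
import tempfile

# The two lines that go into every copy of foo/__init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def build_distribution(root, package, submodule, body):
    """Create <root>/<package>/ with __init__.py and a single submodule."""
    package_dir = os.path.join(root, package)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(package_dir, submodule + ".py"), "w") as f:
        f.write(body)


def demo(package="foo"):
    """Import foo.server and foo.client from two separate directories."""
    roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    build_distribution(roots[0], package, "server", "ROLE = 'server'\n")
    build_distribution(roots[1], package, "client", "ROLE = 'client'\n")
    sys.path[:0] = roots
    try:
        importlib.invalidate_caches()
        server = importlib.import_module(package + ".server")
        client = importlib.import_module(package + ".client")
        return server.ROLE, client.ROLE
    finally:
        for root in roots:
            sys.path.remove(root)
            shutil.rmtree(root)
        for name in list(sys.modules):
            if name == package or name.startswith(package + "."):
                del sys.modules[name]
```

Without the extend_path call, only the copy of foo found first on sys.path would be searched, and importing the submodule from the other directory would fail.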
Moving on, let’s focus on the parts of the import process that let you do truly amazing things. Here I’m talking about stuff like pulling modules directly from Zip files or remote repositories, or just creating them dynamically based on, say, WSDL descriptions of Web services, symbols exported by DLLs, REST APIs, command line tools and their arguments… pretty much anything you can think of (and your imagination is likely better than mine). I’m also referring to “aggressive” interoperability between independent modules: when one package can adjust or expand its functionality when it detects that another one has been imported. Finally, I’m also talking about security-enhanced Python sandboxes that intercept import requests and can deny access to certain modules or alter their functionality on the fly.
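To give a taste of the last idea, here is a minimal sketch of a hook that refuses to import selected modules. It uses the sys.meta_path mechanism with the find_spec protocol of Python 3, which may differ from the API the rest of this text has in mind. DenyListFinder and can_import are hypothetical names, and this is in no way a real sandbox: modules already present in sys.modules are never looked up again, and there are many other ways around it.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder


class DenyListFinder(MetaPathFinder):
    """Meta path finder that rejects imports of the listed module names."""

    def __init__(self, blocked):
        self.blocked = frozenset(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.blocked:
            raise ImportError("import of %r is blocked" % fullname)
        return None  # not our business; let the regular finders handle it


def can_import(name, blocked=()):
    """Try a fresh import of `name` with the given modules blocked."""
    finder = DenyListFinder(blocked)
    sys.modules.pop(name, None)       # force an actual lookup
    sys.meta_path.insert(0, finder)   # consulted before the default finders
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
    finally:
        sys.meta_path.remove(finder)
```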