While at work a few days ago, I had an interesting albeit weird problem which started with the following cryptic error message:
```
ERROR 1005: can't create table `qtn_formdisplay_product` (errno: 150)
```
It was produced by a local MySQL server running on my development machine when I tried to rebuild the test database to accommodate some model changes happening in the codebase. As you might have noticed, it's not terribly informative, with the errno number as the only useful tidbit. A cursory glance at the top search results for this message suggested that the most probable cause was a malformed `FOREIGN KEY` constraint inside the offending `CREATE TABLE` query.
Upon reading this, I blinked several times; something here was definitely off. The query wasn't written by hand, of course – if it had been, we could at least consider an actual mistake to be the problem here. But no, it came from an ORM – and not just any ORM, but the best ORM known to mankind. While obviously nothing is perfect, I would think it's extremely unlikely that I had found a serious bug in a widely used library just by doing something as innocent as creating a table with a foreign key. I'm not that good, after all ;)
Well, except that it could totally be such a bug. The aforementioned search results also pointed to the MySQL issue tracker, where it was suggested that the error might happen after trying to create a foreign key constraint with a duplicate name. Supposedly, this could “corrupt” the parent table so that no new `FOREIGN KEY`s could reference it anymore, yielding the errno 150 if we attempted to create one. While this could not explain the behavior I observed (the parent table was freshly created), it raised some doubts as to whether MySQL itself might be to blame here.
These doubts were exacerbated when one of my colleagues tried out the same procedure, and it worked for him just fine. He turned out to be using a newer version of MySQL, though: 5.5 versus my 5.1. This appeared to support the hypothesis about a possible bug in MySQL, but it didn't help one bit in getting the thing to run on the older version.
However, it was an important clue that something relevant had changed in between – something with an influence on the whole issue. It was not really any particular bugfix or new feature: it was a change of defaults. (If memory serves, the prime suspect here is the default storage engine, which MySQL 5.5 switched from MyISAM to InnoDB – and only InnoDB tables can participate in foreign key constraints.)
I recently had a discussion with a co-worker about the feasibility of using anonymous functions in Python. We happen to overuse them quite a bit, and this is not something I'm particularly fond of. To me, lambdas in Python look pretty weird, and thus I prefer to use them sparingly. I wasn't entirely sure why that is so – given that I'm quite a fan of the functional programming paradigm – until I noticed a seemingly insignificant fact.
Namely: the `lambda` keyword is long. With six letters, it is among the longer keywords in Python 2.x, tied with `return`, `import` and `global`, and beaten only by `continue` and `finally`. Quite likely, this is what causes lambdas in Python to stand out and require additional mental resources to process (assuming we're comfortable enough with the very idea of anonymous functions). The long `lambda` keyword seems slightly out of place because, in general, Python keywords are short.
Or are they?… Thinking about this, I got the idea of comparing the average length of keywords from different programming languages. I didn't really anticipate what kind of information such a metric would expose, but it seemed like a fun exercise. And it surely was; also, the results might provoke a thought or two.
Here they are:
| Language   | Keywords | Total chars | Chars / keyword |
|------------|----------|-------------|-----------------|
| Python 2.7 | 31       | 133         | 4.29            |
| C++03      | 74       | 426         | 5.76            |
| C++11      | 84       | 516         | 6.14            |
| Java 1.7   | 50       | 289         | 5.78            |
| C          | 32       | 166         | 5.19            |
| C# 4.0     | 77       | 423         | 5.49            |
| Go         | 25       | 129         | 5.16            |
The newest incarnation of C++ seems to be losing badly in this competition, followed by C#. On the other side of the spectrum, Go and Python seem to be deliberately designed to avoid keyword bloat as much as possible. Java is somewhere in between when it comes to the sheer number of keywords, but their average length is definitely on the long side. This could very well be one of the reasons for the perceived verbosity of the language.
For those interested, the exact data and code I used to obtain these statistics are in this gist.
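For Python, at least, the numbers are easy to double-check with the standard `keyword` module – a quick sketch:

```python
import keyword

kws = keyword.kwlist  # the 31 keywords of Python 2.7
total = sum(len(kw) for kw in kws)
print("%d keywords, %d chars, %.2f chars/keyword"
      % (len(kws), total, float(total) / len(kws)))
```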
If you haven't heard about it, DreamPie is an awesome GUI application layered on top of the standard Python shell. I use it for elaborate prototyping, where its multi-line input box is a significant advance over the raw, terminal UX of IPython.
However, up until recently I didn't know how to make DreamPie cooperate with virtualenv. Because it's a GUI program, I scoured its menu and all the preference windows, searching for any trace of an option that would allow me to set the Python executable. Having failed, I was convinced that the authors hadn't thought of including it – which was rather surprising, though.
But hey, DreamPie is open source! So I went to look around its code to see whether I could easily enhance it with the ability to specify the Python binary. It wasn't too long before I stumbled upon a vital fragment of its startup logic.
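Paraphrased from memory rather than quoted verbatim, it boiled down to this: the Python executable for the shell subprocess is taken straight from the command line.

```python
# dreampie's startup, roughly: the first positional argument
# (if present) is the interpreter used for the shell subprocess
import sys

if len(sys.argv) > 1:
    pyexec = sys.argv[1]
else:
    pyexec = sys.executable
```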
The conclusion to draw from this anecdote is that sometimes the “option” you are looking for is not in the GUI at all, but on the command line – and with open source programs, the code will readily tell you so.
With this newfound knowledge about dreampie arguments, it wasn't very hard to make it use the current virtualenv.
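Assuming the virtualenv is already activated (so that `which python` resolves to its interpreter), the invocation is – if memory serves – simply:

```bash
$ dreampie $(which python)
```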
And after doing some more research, I ended up adding the following line to my ~/.bash_aliases:
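```bash
# (approximately -- the exact line may have differed)
# the subshell and '&' detach DreamPie from the current terminal
alias dp='(dreampie $(which python) &)'
```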
Now I can simply type `dp` to get a DreamPie instance operating within the current virtualenv but independent of the terminal session. Very useful!
…for fun and profit!
I'm still kind of amazed at how malleable the Python language is. It's no small feat to allow for messing with classes before they are created, yet it turns out to be pretty commonplace now. My latest frontier of Pythonic hackery is import hooks, and today I'd like to write something about them. I believe this will come in handy for at least a few Pythonistas, because the topic seems to be rather scarcely covered on the 'net.
As you can easily deduce, the name ‘import hook’ indicates something related to Python's mechanism of imports. More specifically, import hooks are about injecting our custom logic directly into Python's importing routines. Before delving into details, though, let's review how imports are handled by default.
As far as we are concerned, the process seems to be pretty simple. When the Python interpreter encounters an `import` statement, it looks up the list of directories stored inside `sys.path`. This list is populated at startup and usually contains entries inserted by external libraries or the operating system, as well as some standard directories (e.g. `dist-packages`). These directories are searched in order and in a greedy fashion: if one of them contains the desired package/module, it's picked immediately and the whole process stops right there.
Should we run out of places to look, an `ImportError` is raised. Because this is an exception we can catch, it's possible to try multiple imports before giving up:
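```python
# the canonical example: json entered the standard library in Python 2.6,
# but older interpreters can use the external simplejson package instead
try:
    import json
except ImportError:
    import simplejson as json
```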
While this is extremely ugly boilerplate, it serves to greatly increase the portability of our application or package. Fortunately, there is only a handful of worthwhile libraries that we may need to handle this way; `json` is the most prominent example.
`__path__`
What I presented above as Python's import flow is a sufficient description for most purposes, but it is far from complete. It omits a few crucial places where we can tweak things to our needs.
First is the `__path__` attribute, which can be defined in a package's __init__.py file. You can think of it as a local extension to the `sys.path` list that works only for the submodules of this particular package. In other words, it contains the directories that should be searched when a package's submodule is being imported. By default it only holds the directory of __init__.py itself, but it can be extended to contain other paths as well.
A typical use case here is splitting a single “logical” package between several “physical” packages, distributed separately – typically as different PyPI packages. For example, let's say we have a `foo` package with `foo.server` and `foo.client` as subpackages. They are registered on PyPI as separate distributions (foo-server and foo-client, for instance), and the user can have either or both of them installed at the same time. For this setup to work correctly, we need to modify `foo.__path__` so that it points to `foo.server`'s directory and/or `foo.client`'s directory, depending on which of them are present. While this task sounds exceedingly complex, it is actually very easy thanks to the standard `pkgutil` module. All we need to do is put the following two lines into the foo/__init__.py file:
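```python
# the standard pkgutil idiom: extend __path__ with the directories
# of all physical 'foo' packages found along sys.path
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
```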
There is much more to `__path__` manipulation than this simple trick, of course. If you are interested, I recommend reading the issue of Python Module of the Week devoted solely to `pkgutil`.
`sys.meta_path` and `sys.path_hooks`
Moving on, let's focus on the parts of the import process that let you do the truly amazing things. Here I'm talking about stuff like pulling modules directly from Zip files or remote repositories, or just creating them dynamically based on, say, WSDL descriptions of Web services, symbols exported by DLLs, REST APIs, command line tools and their arguments… pretty much anything you can think of (and your imagination is likely better than mine). I'm also referring to “aggressive” interoperability between independent modules: when one package can adjust or expand its functionality once it detects that another one has been imported. Finally, I'm also talking about security-enhanced Python sandboxes that intercept import requests and can deny access to certain modules or alter their functionality on the fly.
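As a taste of what's to come, here is a minimal sketch (in the PEP 302 style of Python 2.x) of a `sys.meta_path` hook implementing the last idea – denying access to chosen modules:

```python
import sys

class ImportBlocker(object):
    """A PEP 302 finder/loader that refuses to import the named modules."""
    def __init__(self, *blocked):
        self.blocked = set(blocked)

    def find_module(self, fullname, path=None):
        # called for every import; returning None defers
        # to the rest of the import machinery
        if fullname in self.blocked:
            return self  # we will act as the loader, too
        return None

    def load_module(self, fullname):
        raise ImportError("import of %r is blocked" % (fullname,))

sys.meta_path.insert(0, ImportBlocker('socket'))
import socket  # now raises ImportError
```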
What happened today came as a real surprise to me. Up until then, I thought that long gone were the times when I stared at plain old syntax errors in confused bewilderment – well, at least in languages I have some experience with, like Python. So I was really caught off guard.
The crux of the issue can be demonstrated by the following artificial example.
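A minimal version of it (with made-up field names) looks like this:

```python
from lxml.builder import E

def build_user_xml(user):
    return E.user(
        E.name(user.name),
        E.address(
            E.street(user.address.street),
            city=user.address.city,
        ),
        *[E.phone(number) for number in user.phone_numbers],
    )
```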
The goal is to build a simple XML tree using the most convenient interface, i.e. the `lxml.builder.E` manipulator from the `lxml` library. The real code is somewhat longer and more complicated, but this snippet encapsulates the issue pretty neatly.
And strange as it may seem, this little piece produces a `SyntaxError` at the final closing parenthesis:
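```
  File "example.py", line 11
    )
    ^
SyntaxError: invalid syntax
```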
In such a case, the first obvious thing anyone would do is, of course, to look for an unmatched opening parenthesis. With the aid of modern editors (or even not so modern ones ;>) this is a trivial task. Before too long we would therefore find out that… all the parentheses are fine. Double-checking, just to be sure, yields the same result. Everything appears to be in order.
But, of course, we still have the syntax error. What the hell?!
As it turns out, the offending line is just above the seemingly erroneous parenthesis. Or, to be more specific, it is the very last character of this line that the interpreter has problems with:
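```python
        *[E.phone(number) for number in user.phone_numbers],  # <-- this comma
```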
See, Python really doesn't like this trailing comma. Which, admittedly, is more than surprising, given how lenient it is in pretty much any other setting. You may recall that it's perfectly OK to include an additional comma after the final element of a list, tuple, or dictionary, and it is quite useful to do so in practice. Not only that – it is also possible in the argument lists of function calls. Indeed, this very fragment has one instance of such a trailing comma, appearing after a keyword argument (`city=user.address.city,`).
But apparently this doesn't work for all kinds of arguments. If we unpack some positional ones (using the `*` operator), we cannot put a comma afterwards. The relevant part of the Python grammar specification does state this, of course:
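```
arglist: (argument ',')* (argument [',']
                         |'*' test (',' argument)* [',' '**' test]
                         |'**' test)
```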
but I wouldn't call it very explicit. And it seems that you actually can have a comma after `*foo`, but only if another argument follows. If my intuition of formal grammars is correct, the reason for this rule to prohibit `foo(*args,)` (or `foo(**kwargs,)`, for that matter) is strictly related to the fact that Python's grammar is LL(1). And this, by the way, is here to stay. Quoting PEP 3099:
> Simple is better than complex. This idea extends to the parser. Restricting Python's grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, such as Perl.
I, for one, deem this attitude completely reasonable – even if it results in 20 minutes of utter confusion once in a blue moon.
Footnote: The title is of course a not-so-obvious reference to The Infernal Semicolon.
A few days ago I needed to write a script which was supposed to run inside a temporary directory. The exact matter was about deployment from an ad hoc Git repository, and it's something I may describe in more detail later on. Today, however, I wanted to focus on a small part of it: one that (I think) neatly captures the notion of executing something within a non-persistent working directory. Because it's a very general technique, I suppose quite a few readers may find it pretty useful.
Obtaining a temporary file or even directory shouldn't be a terribly complicated thing – and indeed, it's very easy in case of Python. We have the standard `tempfile` module here, and it serves our needs pretty well in this regard. For one, it has the `mkdtemp` function, which creates a temporary directory and returns the path to it:
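```python
import tempfile

path = tempfile.mkdtemp()
print(path)  # e.g. /tmp/tmpv_YKtA -- the directory already exists
```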
That's what it does. What it doesn't do is, for example, ensure proper cleanup once the directory is not needed anymore. This is especially important on Windows, where the equivalent of `/tmp` is not wiped out at boot time.
We also wanted our fresh temp directory to be set as the program's working directory (`PWD`), and obviously this is also something we need to take care of manually. To combine those two needs, I think the best solution is to employ a context manager.
A context manager is basically a fancy name for an object that the `with` statement can be applied to. You may recall that some time ago I wrote about interesting use cases for the `with` construct. This one could also qualify as such, but the principles are very typical. It's about introducing a scope where some resource (here: a temporary directory) remains accessible as long as we're inside it. Once we leave the `with` block, it is cleaned up – just like file handles, network sockets, concurrent locks and plenty of other similar objects.
But while the semantics are pretty clear, there are of course several ways to do this syntactically. I took this opportunity to try out the supposedly simplest one, which I learned recently at a local Python community meet-up: the `contextlib` module. It includes the `contextmanager` decorator: a simple and clever way to write `with`-enabled objects as simple functions. It is based on a particular usage of the `yield` statement, which makes it very interesting even by itself.
So without further ado, let's look at the final solution I wanted to present.
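A minimal sketch of it (the original may have differed in a detail or two) looks like this:

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_directory():
    """Create a temporary directory, cd into it, and clean up afterwards."""
    old_cwd = os.getcwd()
    path = tempfile.mkdtemp()
    os.chdir(path)           # setup: runs when entering the `with` block
    try:
        yield path           # the body of the `with` block executes here
    finally:
        os.chdir(old_cwd)    # cleanup: runs when leaving the block,
        shutil.rmtree(path)  # even if it exited with an exception
```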
As we can see, `yield` divides this function into two parts: setup and cleanup. Setup is executed when we enter the `with` block, while cleanup runs when we're about to exit it. By the way, this scheme of multiple entry and exit points in one function is typically referred to as a coroutine, and it allows for several very intriguing techniques of smart computation.
Usage of the `temp_directory` function is pretty obvious, I'd say. Here's a simplified excerpt of the Git-based deployment script I used it in.
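With placeholder commands and a made-up repository URL, it went along these lines:

```python
import subprocess

with temp_directory():
    # inside the block, '.' refers to the temporary directory
    subprocess.check_call(
        ['git', 'clone', 'https://example.com/project.git', '.'])
    subprocess.check_call(['./deploy.sh'])

# back here, '.' is the original working directory again,
# and the temporary clone is already gone
```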
Note how the meaning of `'.'` (the current directory) shifts depending on whether we're inside or outside the `with` block. Users of Fabric (a Python- and SSH-based remote administration tool) will find this very similar to its `cd` context manager. The main difference is, of course, that the directory we're `cd`-ing into is not a predetermined one – and that it will disappear once we're done with it.
I suppose it is not uncommon to encounter a general situation such as the following. Say you have some well-defined function that performs a transformation of one value into another. It's not particularly important how lengthy or complicated this function is, only that it takes one parameter and outputs a result. Here's a somewhat trivial but astonishingly useful example.
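Say, a safe string-to-integer conversion (a representative stand-in, if not the exact original):

```python
def to_int(value, default=0):
    """Convert a single input (e.g. a query string param) to an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default
```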
Depending on what happens in other parts of your program, you may find yourself applying such a function to many different inputs. Then at some point, it is possible that you'll need to handle lists of those inputs in addition to supporting single values. Query strings of URLs, for example, often require such treatment, because they may contain more than one value for a given key, and web frameworks tend to collate those values into lists of strings.
In those situations, you will typically want to deal with just the list case. This leads to writing a conditional in either the caller code:
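```python
# hypothetical caller-side handling; `params` maps query string keys
# to either a single string or a list of strings
user_ids = params.get('user_id')
if isinstance(user_ids, list):
    user_ids = [to_int(x) for x in user_ids]
else:
    user_id = to_int(user_ids)
```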
or directly inside the particular function. I'm not a big fan of such solutions, because everyone does them differently, and writing the same piece several times is increasingly prone to errors. Not at all incidentally, a mistake is present in the very example above – it shouldn't be too hard to spot it.
In any case, repeated application calls for extracting the pattern into something tangible and reusable. What I devised is therefore a general “recursivator”, whose simplified version is given below.
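This is a sketch of the idea (the original may have differed in details); note how it recurses into anything iterable:

```python
def recursive(func):
    """Make `func` apply itself recursively over iterable inputs."""
    def wrapped(obj, *args, **kwargs):
        # in Python 2, strings have no __iter__, so they are
        # conveniently treated as single values here
        if hasattr(obj, '__iter__'):
            return [wrapped(item, *args, **kwargs) for item in obj]
        return func(obj, *args, **kwargs)
    return wrapped
```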
As for usage, I think it's equally feasible for both on-the-spot calls:
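```python
>>> recursive(to_int)(['1', '2', ['3', 'oops']])
[1, 2, [3, 0]]
```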
as well as for decorating functions to make them recursive permanently. For the latter, though, it would be wise to turn it into a class-based decorator, applying the technique I've described previously. This way we could easily extend the solution and tie it to our needs.
But what are the specific ways of doing so? I could think of a few, like:

- converting `datetime`s into ISO-formatted strings;
- preventing `set`s from being turned into lists, as obviously sets are also iterable – in a more general version, one could supply a predicate function for deciding whether to recurse or not;
- turning `recursive` into a generator for a more memory-efficient solution; if we're lucky enough to program in Python 3.x, it would be a good excuse to employ the new `yield from` construct from 3.3.

One way or another, capturing a particular concept of computation in an actual API such as `recursive` looks like a good way of making the code more descriptive and robust. It certainly adheres to one of the statements from the Zen: that explicit is better than implicit.