I recently had a discussion with a co-worker about feasibility of using anonymous functions in Python. We happen to overuse them quite a bit and this is not something I’m particularly fond of. For me lambdas in Python are looking pretty weird and thus I prefer to use them sparingly. I wasn’t entirely sure why is it so – given that I’m quite a fan of functional programming paradigm – until I noticed a seemingly insignificant fact.
Namely: the lambda
keyword is long. With six letters, it is among the longer keywords in Python 2.x, tied with return
, import
and global
, and beaten only by continue
and finally
. Quite likely, this is what causes lambdas in Python to stand out and require additional mental resources to process (assuming we’re comfortable enough with the very idea of anonymous functions). The long lambda
keyword seems slightly out of place because, in general, Python keywords are short.
Or are they?… Thinking about this, I’ve got an idea of comparing the average length of keywords from different programming languages. I didn’t really anticipate what kind of information would be exposed by applying such a metric but it seemed like a fun exercise. And it surely was; also, the results might provoke a thought or two.
Here they are:
Language | Keyword | Total chars | Chars / keyword |
Python 2.7 | 31 | 133 | 4.29 |
C++03 | 74 | 426 | 5.76 |
C++11 | 84 | 516 | 6.14 |
Java 1.7 | 50 | 289 | 5.78 |
C | 32 | 166 | 5.19 |
C# 4.0 | 77 | 423 | 5.49 |
Go | 25 | 129 | 5.16 |
The newest incarnation of C++ seems to be losing badly in this competition, followed by C#. On the other side of the spectrum, Go and Python seem to be deliberately designed to avoid keyword bloat as much as possible. Java is somewhere in between when it comes to sheer numbers of keywords but their average length is definitely on the long side. This could very well be one of the reasons for the perceived verbosity of the language.
For those interested, the exact data and code I used to obtain these statistics are in this gist.
If you haven’t heard about it, DreamPie is an awesome GUI application layered on top of standard Python shell. I use it for elaborate prototyping where its multi-line input box is a significant advance over raw, terminal UX of IPython.
However, up until recently I didn’t know how to make DreamPie cooperate with virtualenv. Because it’s a GUI program, I scoured its menu and all the preference windows, searching for any trace of option that would allow me to set the Python executable. Having failed, I was convinced that authors didn’t think about including it – which was rather surprising, though.
But hey, DreamPie is open source! So I went to look around its code to see whether I can easily enhance it with an ability to specify Python binary. It wasn’t too long before I stumbled into this vital fragment:
The conclusions we could draw from this anecdote are thereby as follows:
With this newfound knowledge about dreampie arguments, it wasn’t very hard to make it use current virtualenv:
And after doing some more research, I ended up adding the following line to my ~/.bash_aliases:
Now I can simply type dp to get a DreamPie instance operating within current virtualenv but independent from terminal session. Very useful!
…for fun and profit!
I’m still kind of amazed of how malleable the Python language is. It’s no small feat to allow for messing with classes before they are created but it turns out to be pretty commonplace now. My latest frontier of pythonic hackery is import hooks and today I’d like to write something about them. I believe this will come handy for at least a few pythonistas because the topic seems to be rather scarcely covered on the ‘net.
As you can easily deduce, the name ‘import hook’ indicates something related to Python’s mechanism of imports. More specifically, import hooks are about injecting our custom logic directly into Python’s importing routines. Before delving into details, though, let’s revise how the imports are being handled by default.
As far as we are concerned, the process seems to be pretty simple. When the Python interpreter encounters an import
statement, it looks up the list of directories stored inside sys.path
. This list is populated at startup and usually contains entries inserted by external libraries or the operating system, as well as some standard directories (e.g. dist-packages). These directories are searched in order and in greedy fashion: if one of them contains the desired package/module, it’s picked immediately and the whole process stops right there.
Should we run out of places to look, an ImportError
is raised. Because this is an exception we can catch, it’s possible to try multiple imports before giving up:
While this is extremely ugly boilerplate, it serves to greatly increase portability of our application or package. Fortunately, there is only handful of worthwhile libraries that we may need to handle this way; json
is the most prominent example.
__path__
What I presented above as Python’s import flow is sufficient as description for most purposes but far from being complete. It omits few crucial places where we can tweak things to our needs.
First is the __path__
attribute which can be defined in package’s __init__.py file. You can think of it as a local extension to sys.path
list that only works for submodules of this particular package. In other words, it contains directories that should be searched when a package’s submodule is being imported. By default it only has the __init__.py‘s directory but it can be extended to contain different paths as well.
A typical use case here is splitting single “logical” package between several “physical” packages, distributed separately – typically as different PyPI packets. For example, let’s say we have foo
package with foo.server
and foo.client
as subpackages. They are registered in PyPI as separate distributions (foo-server and foo-client, for instance) and user can have any or both of them installed at the same time. For this setup to work correctly, we need to modify foo.__path__
so that it may point to foo.server
‘s directory and foo.client
‘s directory, depending on whether they are present or not. While this task sounds exceedingly complex, it is actually very easy thanks to the standard pkgutil
module. All we need to do is to put the following two lines into foo/__init__.py file:
There is much more to __path__
manipulation than this simple trick, of course. If you are interested, I recommend reading an issue of Python Module of the Week devoted solely to pkgutil
.
sys.meta_path
and sys.path_hooks
Moving on, let’s focus on parts of import process that let you do the truly amazing things. Here I’m talking stuff like pulling modules directly from Zip files or remote repositories, or just creating them dynamically based on, say, WSDL description of Web services, symbols exported by DLLs, REST APIs, command line tools and their arguments… pretty much anything you can think of (and your imagination is likely better than mine). I’m also referring to “aggressive” interoperability between independent modules: when one package can adjust or expand its functionality when it detects that another one has been imported. Finally, I’m also talking about security-enhanced Python sandboxes that intercept import requests and can deny access to certain modules or alter their functionality on the fly.