…for fun and profit!
I’m still kind of amazed at how malleable the Python language is. It’s no small feat to allow for messing with classes before they are created, but it turns out to be pretty commonplace now. My latest frontier of pythonic hackery is import hooks, and today I’d like to write something about them. I believe this will come in handy for at least a few pythonistas, because the topic seems to be rather scarcely covered on the ‘net.
As you can easily deduce, the name ‘import hook’ indicates something related to Python’s mechanism of imports. More specifically, import hooks are about injecting our custom logic directly into Python’s importing routines. Before delving into details, though, let’s review how imports are handled by default.
As far as we are concerned, the process seems to be pretty simple. When the Python interpreter encounters an import statement, it looks up the list of directories stored inside sys.path. This list is populated at startup and usually contains entries inserted by external libraries or the operating system, as well as some standard directories (e.g. dist-packages). These directories are searched in order and in greedy fashion: if one of them contains the desired package/module, it’s picked immediately and the whole process stops right there.
Should we run out of places to look, an ImportError is raised. Because this is an exception we can catch, it’s possible to try multiple imports before giving up:
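A minimal sketch of such a fallback chain, assuming the classic case of the json module with the third-party simplejson package as the alternative (which of the two to prefer is my assumption, not something the text prescribes):

```python
# Prefer the standard library module; fall back to the third-party
# simplejson package (an assumed alternative) when json is unavailable.
try:
    import json
except ImportError:
    import simplejson as json

# Either way, the rest of the code only ever refers to the name "json".
encoded = json.dumps({"answer": 42})
```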
While this is extremely ugly boilerplate, it serves to greatly increase the portability of our application or package. Fortunately, there is only a handful of worthwhile libraries that we may need to handle this way; json is the most prominent example.
What I presented above as Python’s import flow is a sufficient description for most purposes, but it is far from complete. It omits a few crucial places where we can tweak things to our needs.
The first is the __path__ attribute, which can be defined in a package’s __init__.py file. You can think of it as a local extension to the sys.path list that only works for submodules of this particular package. In other words, it contains directories that should be searched when a package’s submodule is being imported. By default it only has the __init__.py’s directory, but it can be extended to contain different paths as well.
A typical use case here is splitting a single “logical” package between several “physical” packages, distributed separately, typically as different PyPI distributions. For example, let’s say we have a foo package with foo.server and foo.client as subpackages. They are registered in PyPI as separate distributions (foo-server and foo-client, for instance) and the user can have either or both of them installed at the same time. For this setup to work correctly, we need to modify foo.__path__ so that it may point to foo.server’s directory and foo.client’s directory, depending on whether they are present or not. While this task sounds exceedingly complex, it is actually very easy thanks to the standard pkgutil module. All we need to do is to put the following two lines into the foo/__init__.py file:
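The two lines in question are presumably the pkgutil.extend_path idiom from the standard library documentation. The sketch below wraps them in a throwaway demonstration: it builds two temporary directories standing in for the foo-server and foo-client distributions (the file contents and the ROLE constant are made up for illustration) and checks that both halves of foo become importable.

```python
import importlib
import os
import sys
import tempfile

# The two lines that go into every copy of foo/__init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_distribution(root, submodule, body):
    """Create root/foo/__init__.py and root/foo/<submodule>.py."""
    package_dir = os.path.join(root, "foo")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(package_dir, submodule + ".py"), "w") as f:
        f.write(body)


# Two separate installation directories, one per "distribution".
server_root = tempfile.mkdtemp()
client_root = tempfile.mkdtemp()
make_distribution(server_root, "server", "ROLE = 'server'\n")
make_distribution(client_root, "client", "ROLE = 'client'\n")

# Both directories must be on sys.path for extend_path to discover them.
sys.path[:0] = [server_root, client_root]
importlib.invalidate_caches()

import foo.server
import foo.client
```

After the imports, foo.__path__ lists the foo directory from each temporary root, which is why both submodules resolve even though they live in different places.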
Moving on, let’s focus on the parts of the import process that let you do truly amazing things. Here I’m talking about stuff like pulling modules directly from ZIP files or remote repositories, or just creating them dynamically based on, say, WSDL descriptions of Web services, symbols exported by DLLs, REST APIs, command line tools and their arguments… pretty much anything you can think of (and your imagination is likely better than mine). I’m also referring to “aggressive” interoperability between independent modules: when one package can adjust or expand its functionality when it detects that another one has been imported. Finally, I’m also talking about security-enhanced Python sandboxes that intercept import requests and can deny access to certain modules or alter their functionality on the fly.
All of these (and possibly much more) can be achieved through the usage of import hooks. There are two distinct types of them, usually referred to as meta hooks (defined in sys.meta_path) and path hooks (defined in sys.path_hooks). Although they are invoked at slightly different stages of the import flow, they are both built upon the same two concepts: that of a module finder and a module loader.
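To get a feel for both lists, it can help to look at what a stock interpreter already keeps in them. This is only an inspection snippet; the exact contents vary between Python versions, so no output is shown.

```python
import sys

# Meta hooks are consulted first, for every import of a new module.
for finder in sys.meta_path:
    print("meta hook:", finder)

# Path hooks are callables that get a chance to handle individual
# entries of sys.path (or of a package's __path__).
for hook in sys.path_hooks:
    print("path hook:", hook)
```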
A module finder is simply an object which implements one specific method: find_module(fullname, path=None). It receives the fully qualified name of the module to be imported, along with the (optional) path where it’s supposed to be found, and it is expected that the method does one of three things:
Raise an exception, which aborts the import altogether.
Return None, meaning that the given module cannot be found by this particular finder. It can still be found during the next stages of the import flow, either by some other custom finder or just the standard Python import mechanism.
Return a loader object, which will then be used to actually load the module.
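The three outcomes can be sketched as follows. The class, its constructor arguments and the module names are all hypothetical, and the finder is only called directly here rather than installed as a hook.

```python
class SketchFinder(object):
    """Hypothetical finder showing the three outcomes of find_module."""

    def __init__(self, forbidden, handled, loader):
        self.forbidden = set(forbidden)  # names whose import is aborted
        self.handled = set(handled)      # names this finder takes care of
        self.loader = loader             # what gets returned for those names

    def find_module(self, fullname, path=None):
        if fullname in self.forbidden:
            # Outcome 1: raise, which aborts the import.
            raise ImportError("import of %r was vetoed" % fullname)
        if fullname not in self.handled:
            # Outcome 2: decline, so that other finders get their turn.
            return None
        # Outcome 3: hand back the loader responsible for this module.
        return self.loader


# A stand-in; a real loader would define load_module().
placeholder_loader = object()
finder = SketchFinder(["secrets_vault"], ["virtual_mod"], placeholder_loader)
```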
The last case is of course the most interesting one, as it gracefully leads us to the concept of a module loader. This one is an object that implements the load_module(fullname) method. Again, the fullname parameter is the fully qualified name of the module that we want to import. The return value should be a module object: the final result of the whole importing process. Note that this could be something that was already imported; for such “duplicate” imports the loader should simply return the existing module:
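A minimal sketch of that check, assuming a loader that otherwise creates an empty module (the class name, the module name and the use of types.ModuleType are my choices for illustration):

```python
import sys
import types


class SketchLoader(object):
    """Hypothetical loader that honours already-imported modules."""

    def load_module(self, fullname):
        # A "duplicate" import: return what is already registered.
        if fullname in sys.modules:
            return sys.modules[fullname]

        # Otherwise create the module and register it before returning,
        # so that subsequent imports hit the branch above.
        module = types.ModuleType(fullname)
        module.__loader__ = self
        sys.modules[fullname] = module
        return module


loader = SketchLoader()
first = loader.load_module("sketch_virtual_module")
second = loader.load_module("sketch_virtual_module")  # same object as first
```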
If anything goes wrong at this stage, the loader should raise an exception (typically just an ImportError).
Here’s where most of the theory ends, as conveniently described in PEP 302. In practice, both finder and loader can be the same object, and the find_module method can simply return self. As an example, consider this simple hook which is intended to block some specific modules from being imported at all:
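What follows is a reconstruction under assumptions, not necessarily the original listing. The find_module and load_module methods follow the PEP 302 protocol described above; the extra find_spec method is my addition, because Python 3.4+ consults it first and recent releases no longer fall back to find_module at all. The choice of colorsys as the blocked module is arbitrary.

```python
import sys


class ImportBlocker(object):
    """Finder and loader in one object; refuses to import the given names."""

    def __init__(self, *names):
        self.blocked = set(names)

    def find_module(self, fullname, path=None):
        # PEP 302 protocol: claim the blocked names, ignore everything else.
        return self if fullname in self.blocked else None

    def load_module(self, fullname):
        raise ImportError("%s is blocked and cannot be imported" % fullname)

    def find_spec(self, fullname, path=None, target=None):
        # Modern protocol: repeat the refusal here for blocked names.
        if fullname in self.blocked:
            self.load_module(fullname)  # always raises ImportError
        return None


# Demonstration with colorsys, an arbitrary standard library module.
blocker = ImportBlocker("colorsys")
sys.modules.pop("colorsys", None)  # make sure the import is not cached
sys.meta_path.insert(0, blocker)
try:
    import colorsys
    outcome = "imported"
except ImportError as error:
    outcome = str(error)
finally:
    sys.meta_path.remove(blocker)  # uninstall the hook again
```

Inserting the hook at the front of sys.meta_path matters: it has to be asked before the standard finders, which would otherwise locate the module themselves.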
Once installed in sys.meta_path, it will intercept every attempt to import a new module and check whether its name exists on our list. This applies to indirect imports as well: if we attempt to import the Python Requests library, then it will also fail, as requests internally uses urllib3, which in turn uses the restricted module.
A hook that is a total blocker doesn’t seem very useful, so let’s try something slightly different. Rather than refusing to import a particular module, we’ll proceed normally and issue a warning instead. Such a hook can help detect when deprecated Python modules are introduced to the project:
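Here is one way such a warning hook could look on current Python 3. Note that it swaps in a different technique from the one discussed next: instead of delegating to the imp package (which is gone from Python 3.12 onwards), the finder simply emits the warning and returns None, which lets the regular import machinery carry on. The class name and the use of colorsys as the "deprecated" module are placeholders.

```python
import sys
import warnings


class ImportWarner(object):
    """Meta hook that warns about selected modules but lets them load."""

    def __init__(self, *names):
        self.deprecated = set(names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.deprecated:
            warnings.warn("%s is deprecated" % fullname, DeprecationWarning)
        # Returning None hands the import over to the remaining finders.
        return None


warner = ImportWarner("colorsys")
sys.modules.pop("colorsys", None)  # make sure the import is not cached
sys.meta_path.insert(0, warner)
try:
    # Record warnings so the effect is visible regardless of filters.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        import colorsys
finally:
    sys.meta_path.remove(warner)

messages = [str(w.message) for w in caught]
```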
In order to access the normal importing mechanism, we can use the imp package. Its functions find_module and load_module are roughly the equivalents of our import hook’s methods with the same names. But imp offers much more, as it also contains functions capable of creating modules from various inputs (e.g. load_compiled) or even creating them completely from scratch.
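As a rough sketch of the “from scratch” case: since imp has been removed from recent Python releases, the example below uses types.ModuleType from the standard library as a stand-in. The module name and its contents are invented for illustration.

```python
import sys
import types

# Create an empty module object; "virtual_config" is a made-up name.
module = types.ModuleType("virtual_config")
module.__doc__ = "A module that never existed on disk."

# Populate its namespace by executing source code inside it.
source = "DEBUG = True\ndef greet(name):\n    return 'Hello, ' + name\n"
exec(source, module.__dict__)

# Registering it in sys.modules lets a regular import statement find it.
sys.modules[module.__name__] = module

import virtual_config
```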
While all this is surely very interesting, we may doubt whether import hooks actually have any notable applications at all. There is surely potential for some really impressive things, including importing Python modules straight from remote URLs (security concerns notwithstanding). In my case, though, I had an actual need that import hooks seemed to satisfy best.
There is this great pytz package for supporting date and time calculations involving timezones. In general, it is really shaky ground to tread upon, where issues related to daylight saving time are among the easier ones to deal with. For the most part, pytz helps navigate the obstacles in an elegant manner, but there is one thing where it falls short.
But the keyword here is ‘usable’. As a matter of fact, pytz has Etc/GMT+X timezones. However, due to some obscure, decades-old compatibility imperative, they are the exact opposite of what we would expect them to be: their offsets are effectively negated. It means that Etc/GMT+2, for example, doesn’t refer to normal time in eastern Europe (or DST in central/western Europe) but to a timezone on the other side of the Prime Meridian, which is almost unused except as DST for a few South American countries and Greenland. It goes without saying that this is completely and utterly insane.
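To make the sign flip concrete, here is a small sketch using only the standard datetime module. The helper is hypothetical and not part of pytz; it merely mimics how an Etc/GMT+X name maps to an actual UTC offset.

```python
from datetime import timedelta, timezone


def etc_gmt_to_timezone(name):
    """Map an 'Etc/GMT+X' style name to a fixed-offset tzinfo (hypothetical)."""
    prefix = "Etc/GMT"
    if not name.startswith(prefix):
        raise ValueError("not an Etc/GMT zone: %r" % name)
    suffix = name[len(prefix):]
    hours = int(suffix) if suffix else 0
    # The sign in the name is the opposite of the real UTC offset.
    return timezone(timedelta(hours=-hours), name)


tz = etc_gmt_to_timezone("Etc/GMT+2")
offset_hours = tz.utcoffset(None).total_seconds() / 3600
```

Here offset_hours comes out as minus two: the zone named Etc/GMT+2 is two hours behind UTC, not ahead of it.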
In cases like this there are usually two solutions. You can put a thin layer with an appropriate fix in front of the (already perfect) library interface and make sure that no one uses the library directly, which is of course impossible. Or you can fork it and make the necessary changes, but then you’ll have to maintain the fork, manually pulling changes from upstream whenever the original timezone database is updated (which is every few months). Neither is satisfying; couldn’t we just use the library as it is, but somehow patch it on the fly, just before it’s used?…
Why, of course – hello import hooks! Using a relatively simple module finder and loader, we can easily achieve the desired effect and transparently expand the pytz library to include more useful generic timezones. The full code can be seen in this gist and it isn’t even long or complex.
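The gist itself is not reproduced here. As a sketch of the general idea on current Python 3, the hook below lets the standard machinery locate a module, then wraps its loader so that a patch function runs right after the module’s own code. All class names and the add_marker patch are hypothetical, and colorsys merely stands in for pytz so that the example needs nothing beyond the standard library.

```python
import importlib.abc
import sys


class PatchingLoader(importlib.abc.Loader):
    """Runs the real loader, then applies a patch to the fresh module."""

    def __init__(self, wrapped, patch):
        self.wrapped = wrapped
        self.patch = patch

    def create_module(self, spec):
        return self.wrapped.create_module(spec)

    def exec_module(self, module):
        self.wrapped.exec_module(module)  # the library loads as usual...
        self.patch(module)                # ...and is patched immediately after


class PatchingFinder(importlib.abc.MetaPathFinder):
    """Intercepts a single module name and wraps its loader."""

    def __init__(self, name, patch):
        self.name = name
        self.patch = patch

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None
        # Ask the other finders for the real spec, skipping ourselves
        # to avoid infinite recursion.
        for finder in sys.meta_path:
            if finder is self or not hasattr(finder, "find_spec"):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                spec.loader = PatchingLoader(spec.loader, self.patch)
                return spec
        return None


def add_marker(module):
    # Stand-in for a real fix, such as registering extra timezones.
    module.PATCHED_BY_HOOK = True


hook = PatchingFinder("colorsys", add_marker)
sys.modules.pop("colorsys", None)  # make sure the import is not cached
sys.meta_path.insert(0, hook)
try:
    import colorsys
finally:
    sys.meta_path.remove(hook)
```

The appeal of this design is that client code keeps importing the library under its usual name; nothing has to know that a patch was applied.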
That would be it for today’s write-up. If you want to learn more about the intricacies of Python’s import hooks – such as the meaning of sys.path_hooks, for example – the canonical source will of course be the appropriate PEP. Beyond that, there isn’t really any wealth of information to point at: some blog posts here and there, with this one being probably the most useful.