Posts tagged ‘import’

Hacking Python Imports

2012-05-06 19:05

…for fun and profit!

I’m still kind of amazed of how malleable the Python language is. It’s no small feat to allow for messing with classes before they are created but it turns out to be pretty commonplace now. My latest frontier of pythonic hackery is import hooks and today I’d like to write something about them. I believe this will come handy for at least a few pythonistas because the topic seems to be rather scarcely covered on the ‘net.

Importing: a simplistic view

As you can easily deduce, the name ‘import hook’ indicates something related to Python’s mechanism of imports. More specifically, import hooks are about injecting our custom logic directly into Python’s importing routines. Before delving into details, though, let’s revise how the imports are being handled by default.

As far as we are concerned, the process seems to be pretty simple. When the Python interpreter encounters an import statement, it looks up the list of directories stored inside sys.path. This list is populated at startup and usually contains entries inserted by external libraries or the operating system, as well as some standard directories (e.g. dist-packages). These directories are searched in order and in greedy fashion: if one of them contains the desired package/module, it’s picked immediately and the whole process stops right there.

Should we run out of places to look, an ImportError is raised. Because this is an exception we can catch, it’s possible to try multiple imports before giving up:

  1. try:
  2.     # Python 2.7 and 3.x
  3.     import json
  4. except ImportError:
  5.     try:
  6.         # Python 2.6 and below
  7.         import simplejson as json
  8.     except ImportError:
  9.         try:
  10.              # some older versions of Django have this
  11.              from django.utils import simplejson as json
  12.          except ImportError:
  13.              raise Exception("MyAwesomeLibrary requires a JSON package!")

While this is extremely ugly boilerplate, it serves to greatly increase portability of our application or package. Fortunately, there is only handful of worthwhile libraries that we may need to handle this way; json is the most prominent example.

More details: about __path__

What I presented above as Python’s import flow is sufficient as description for most purposes but far from being complete. It omits few crucial places where we can tweak things to our needs.

First is the __path__ attribute which can be defined in package’s __init__.py file. You can think of it as a local extension to sys.path list that only works for submodules of this particular package. In other words, it contains directories that should be searched when a package’s submodule is being imported. By default it only has the __init__.py‘s directory but it can be extended to contain different paths as well.

A typical use case here is splitting single “logical” package between several “physical” packages, distributed separately – typically as different PyPI packets. For example, let’s say we have foo package with foo.server and foo.client as subpackages. They are registered in PyPI as separate distributions (foo-server and foo-client, for instance) and user can have any or both of them installed at the same time. For this setup to work correctly, we need to modify foo.__path__ so that it may point to foo.server‘s directory and foo.client‘s directory, depending on whether they are present or not. While this task sounds exceedingly complex, it is actually very easy thanks to the standard pkgutil module. All we need to do is to put the following two lines into foo/__init__.py file:

  1. import pkgutil
  2. __path__ = pkgutil.extend_path(__path__, __name__)

There is much more to __path__ manipulation than this simple trick, of course. If you are interested, I recommend reading an issue of Python Module of the Week devoted solely to pkgutil.

Actual hooks: sys.meta_path and sys.path_hooks

Moving on, let’s focus on parts of import process that let you do the truly amazing things. Here I’m talking stuff like pulling modules directly from Zip files or remote repositories, or just creating them dynamically based on, say, WSDL description of Web services, symbols exported by DLLs, REST APIs, command line tools and their arguments… pretty much anything you can think of (and your imagination is likely better than mine). I’m also referring to “aggressive” interoperability between independent modules: when one package can adjust or expand its functionality when it detects that another one has been imported. Finally, I’m also talking about security-enhanced Python sandboxes that intercept import requests and can deny access to certain modules or alter their functionality on the fly.

Tags: , , , ,
Author: Xion, posted under Programming » 3 comments
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.