Hacking Python Imports

2012-05-06 19:05

…for fun and profit!

I’m still kind of amazed at how malleable the Python language is. It’s no small feat to allow for messing with classes before they are created, but it turns out to be pretty commonplace now. My latest frontier of pythonic hackery is import hooks, and today I’d like to write something about them. I believe this will come in handy for at least a few pythonistas because the topic seems to be rather scarcely covered on the ‘net.

Importing: a simplistic view

As you can easily deduce, the name ‘import hook’ indicates something related to Python’s import mechanism. More specifically, import hooks are about injecting our custom logic directly into Python’s importing routines. Before delving into details, though, let’s review how imports are handled by default.

As far as we are concerned, the process seems to be pretty simple. When the Python interpreter encounters an import statement, it looks up the list of directories stored inside sys.path. This list is populated at startup and usually contains entries inserted by external libraries or the operating system, as well as some standard directories (e.g. dist-packages). These directories are searched in order and in greedy fashion: if one of them contains the desired package/module, it’s picked immediately and the whole process stops right there.
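The search order described above can be inspected directly. The snippet below is only a small sketch for a Python 3 interpreter: it lists the entries of sys.path in the order they are consulted and then asks importlib.util.find_spec where the json module would come from, without naming any particular directory in advance.

```python
import importlib.util
import sys

# sys.path is an ordinary list of strings, consulted front to back.
for index, entry in enumerate(sys.path):
    print(index, repr(entry))

# find_spec() performs the lookup and reports where the module lives.
spec = importlib.util.find_spec("json")
print(spec.name, "found at", spec.origin)
```

Because sys.path is a plain list, inserting a directory at the front is enough to make it win over everything that follows.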

Should we run out of places to look, an ImportError is raised. Because this is an exception we can catch, it’s possible to try multiple imports before giving up:

    try:
        # Python 2.6+ and 3.x: json is in the standard library
        import json
    except ImportError:
        try:
            # Python 2.5 and below
            import simplejson as json
        except ImportError:
            try:
                # some older versions of Django have this
                from django.utils import simplejson as json
            except ImportError:
                raise ImportError("MyAwesomeLibrary requires a JSON package!")

While this is extremely ugly boilerplate, it serves to greatly increase the portability of our application or package. Fortunately, there is only a handful of worthwhile libraries that we may need to handle this way; json is the most prominent example.
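If the nesting gets out of hand, the same fallback chain can be flattened into a loop. This is just a sketch: import_first is a hypothetical helper name (not part of any library), and it relies only on importlib.import_module from the standard library.

```python
import importlib


def import_first(*names):
    """Return the first module from `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(names))


# Candidates are tried in order, just like the nested try blocks.
json = import_first("json", "simplejson", "django.utils.simplejson")
print(json.dumps({"answer": 42}))
```

The trade-off is that static tools can no longer see which module ends up bound to the name, which is one reason the verbose try/except form remains common.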

More details: about __path__

What I presented above as Python’s import flow is a sufficient description for most purposes but it is far from complete. It omits a few crucial places where we can tweak things to our needs.

First is the __path__ attribute, which can be defined in a package’s __init__.py file. You can think of it as a local extension to the sys.path list that only works for submodules of this particular package. In other words, it contains directories that should be searched when a package’s submodule is being imported. By default it only has the __init__.py’s directory but it can be extended to contain different paths as well.

A typical use case here is splitting a single “logical” package between several “physical” packages, distributed separately – typically as different PyPI distributions. For example, let’s say we have a foo package with foo.server and foo.client as subpackages. They are registered in PyPI as separate distributions (foo-server and foo-client, for instance) and the user can have either or both of them installed at the same time. For this setup to work correctly, we need to modify foo.__path__ so that it points to foo.server’s directory and foo.client’s directory, depending on whether they are present or not. While this task sounds exceedingly complex, it is actually very easy thanks to the standard pkgutil module. All we need to do is to put the following two lines into the foo/__init__.py file:

    import pkgutil
    __path__ = pkgutil.extend_path(__path__, __name__)

There is much more to __path__ manipulation than this simple trick, of course. If you are interested, I recommend reading an issue of Python Module of the Week devoted solely to pkgutil.
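To see the two-line trick at work, here is a self-contained sketch that fakes two separately installed distributions in temporary directories. All names in it (foo_demo, make_distribution, the ROLE constants) are made up for illustration; only pkgutil.extend_path and the standard import machinery are real.

```python
import importlib
import os
import sys
import tempfile

# The same two lines that would go into foo/__init__.py.
INIT = "import pkgutil\n__path__ = pkgutil.extend_path(__path__, __name__)\n"


def make_distribution(root, package, module, source):
    """Create root/package/__init__.py and root/package/module.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(source)


# Two "physical" install locations for one "logical" package.
server_root = tempfile.mkdtemp()
client_root = tempfile.mkdtemp()
make_distribution(server_root, "foo_demo", "server", "ROLE = 'server'\n")
make_distribution(client_root, "foo_demo", "client", "ROLE = 'client'\n")
sys.path[:0] = [server_root, client_root]
importlib.invalidate_caches()

server = importlib.import_module("foo_demo.server")
client = importlib.import_module("foo_demo.client")
print(server.ROLE, client.ROLE)
print(sys.modules["foo_demo"].__path__)
```

Only the first foo_demo/__init__.py found on sys.path is executed, which is why every distribution has to ship the same __init__.py content.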

Actual hooks: sys.meta_path and sys.path_hooks

Moving on, let’s focus on the parts of the import process that let you do the truly amazing things. Here I’m talking about stuff like pulling modules directly from ZIP files or remote repositories, or just creating them dynamically based on, say, WSDL descriptions of Web services, symbols exported by DLLs, REST APIs, command line tools and their arguments… pretty much anything you can think of (and your imagination is likely better than mine). I’m also referring to “aggressive” interoperability between independent modules: when one package can adjust or expand its functionality when it detects that another one has been imported. Finally, I’m also talking about security-enhanced Python sandboxes that intercept import requests and can deny access to certain modules or alter their functionality on the fly.
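As a taste of two of these ideas, the following is a minimal sketch of sys.meta_path finders written against the importlib API of Python 3.4 and later, which may differ from the interfaces available when this post was written. The class names and the module names virtual_greeting and forbidden_demo are hypothetical; one finder creates a module from a string of source code, the other refuses to import a blacklisted name.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serves modules whose source code is kept in a dictionary."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours: let the next finder try

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


class DenyFinder(importlib.abc.MetaPathFinder):
    """Rejects imports of selected modules before anyone else sees them."""

    def __init__(self, blocked):
        self.blocked = set(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.blocked:
            raise ImportError("import of %r is not allowed" % fullname)
        return None


sys.meta_path.insert(0, DenyFinder(["forbidden_demo"]))
sys.meta_path.insert(0, InMemoryFinder(
    {"virtual_greeting": "def hello():\n    return 'hello from nowhere'\n"}))

import virtual_greeting
print(virtual_greeting.hello())

try:
    import forbidden_demo
except ImportError as error:
    print("blocked:", error)
```

Note that finders are only consulted for modules not already present in sys.modules, so a deny list like this does nothing about modules imported before it was installed.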

Author: Xion, posted under Programming » 3 comments

Bootstrap, a UI framework for the modern Web

2012-04-22 20:11

It was almost exactly one year ago and I remember it quite vividly. I was attending an event organized by Google which was about the Chrome Web Store, as well as HTML5 and web applications in general. After the speaker finished pitching the awesomeness of this stuff (and how Chrome was the only browser that supported it all, of course), it was time for a round of questions and some discussion. I seized this opportunity and brought up the issue of user interface inconsistencies that plague contemporary web apps. Because the Web as a platform doesn’t really enforce any constraints on UI paradigms, we can experience all sorts of “creative” approaches to user interface. In their pursuit of novelty and eye candy, web designers often sacrifice usability by not adhering to well-known interface metaphors and by shying away from uniform UI elements.

At that time I didn’t really get a good answer that would address this issue. And it’s an important one, given the rate at which web applications are springing to life and replacing their equivalent desktop programs. Does it mean we’ll be stuck with this UI bonanza for the time being?…

Fortunately, there are some early signs that it might not necessarily be so.

Enter Bootstrap


A few months later, in August 2011, Twitter released the first version of the Bootstrap framework. Originally intended for internal use, this set of common HTML, CSS and JS patterns was made open source and released into the wild. The praise it subsequently gained is definitely well deserved, for it is a great set of tools for kickstarting development of any web-related project. Its features include:

  • a flexible grid system for establishing a skeleton of the web page or app
  • a set of great-looking styles for HTML form elements
  • many complex UI components, like collapsible alerts, dropdowns, navigation bars, modal windows, and so on
  • customizable set of CSS styles for typical markup elements, such as headers or tables

Along with universal acclaim came popularity: it is currently the most watched project on GitHub.

The value of consistency

However, some want to argue that being so popular also has an unanticipated, negative side. It makes developers lazy, convinced they can get away without a “proper” design for their site or app. Even more: it allegedly shows disrespect for their users, as if the creator simply didn’t care how their product looks.

Does it compute? I don’t think so. Do you complain if you find that any particular desktop application uses the very same looks for UI components, like buttons or list boxes?… Of course not. We learned to value the consistency and predictability that this entails, because it frees us from the burden of mentally adjusting to every single GUI app that we happen to use. Similarly, developers appreciate how this makes their work easier: they don’t have to code dropdown menus or modal dialogs, which in turn allows them to spend more time on actual, useful functionality.

Sure, it didn’t happen overnight when desktop OSes were emerging as software platforms. Also, there are still plenty of apps whose creators – willfully or unintentionally – chose not to adhere to the standards. Sometimes it’s even for the good, as it allows for new, useful UI patterns to emerge and be adopted by the mainstream. The resulting process is still that of convergence, producing interfaces which are more consistent and easier to use.

Bootstrap shapes the Web

An analogous process may just be happening to the Internet, considered as a “platform” for web applications. By steadily rising in popularity, Bootstrap has a chance of becoming the UI framework for the Web in general – an obvious first choice for any new project.

Of course, even if this happens, it is terribly unlikely that it would start reigning supreme and make every website look almost exactly the same – i.e. transform the Web into an equivalent of the desktop. What’s much more likely is that it will follow in the footsteps of mobile platforms. There, every app strives to be original and appealing, but only those that correctly balance usability with artsy design provide a really compelling user experience.

It will not be without differences, though. Mobile platforms are generally ruled with an iron (or clay) fist and have relevant UI guidelines spelled out explicitly. For the Web it’s very much not the case, so any potential “standardization” is necessarily a bottom-up process whose benefits have to be indisputable and clearly visible. Despite some opposition, Bootstrap seems to have enough momentum to really (ahem) bootstrap this process. It is already winning the hearts and minds of many web developers, so it may be just a matter of time.

If it happens, I believe the Web will be in a better place.

Author: Xion, posted under Internet, Programming, Thoughts » Comments Off on Bootstrap, a UI framework for the modern Web

The Infernal Comma

2012-04-16 19:58

It came as a real surprise to me today. Up until then, I thought that long gone were the times when I stared at plain old syntax errors in confused bewilderment. Well, at least if we’re talking about languages I have some experience with, like Python. So when it happened to me today, I was really caught off-guard.

The crux of the issue can be demonstrated in the following, artificial example:

  1. from lxml.builder import E
  2.  
  3. def user_to_xml(user):
  4.     address = [E.address(
  5.         street=user.address.street,
  6.         zipcode=user.address.zipcode,
  7.         city=user.address.city,
  8.     )] if user.address else []
  9.     return E.user(
  10.         dict(first_name=user.first_name,
  11.              last_name=user.last_name),
  12.         *address,
  13.     )

The goal is to build some simple XML tree using the most convenient interface, i.e. the lxml.builder.E manipulator from the lxml library. The real code is somewhat longer and more complicated but this snippet encapsulates the issue pretty neatly.

And strange as it may seem, this little piece produces a SyntaxError at the final closing parenthesis:

    SyntaxError: invalid syntax (at line 13 col 5)

In such a case, the first obvious thing anyone would do is of course to look for an unmatched opening brace. With the aid of modern editors (or even not so modern ones ;>) this is a trivial task. Before too long we would therefore find out that… all the braces are fine. Double-checking, just to be sure, will have the same result. Everything appears to be in order.

But, of course, we still have the syntax error. What the hell?!

As it turns out, the offending line is just above the seemingly erroneous parenthesis. It’s this one:

    *address,

Or, to be more specific, it is the very last character of this line that the interpreter has problems with:

    *address, # comma!

See, Python really doesn’t like this trailing comma. Which, admittedly, is more than surprising, given how lenient it is in pretty much any other setting. You may recall that it’s perfectly OK to include an additional comma after the final element of a list, tuple, or dictionary, and it is quite useful to do so in practice. Not only that – it is also possible for argument lists in a function call. Indeed, this very fragment has one instance of such a trailing comma that appears after a keyword argument (city=user.address.city,).

But apparently this doesn’t really work for all kinds of arguments. If we unpack some positional ones (using the * operator), we cannot put a comma afterwards. The relevant part of the Python grammar specification states this, of course:

    arglist: (argument ',')* (argument [',']
                             |'*' test (',' argument)* [',' '**' test]
                             |'**' test)

but I wouldn’t call it very explicit. And it seems that you actually can have a comma after *foo but only if another argument follows. If my intuition of formal grammars is correct, the reason for this rule to prohibit foo(*args,) (or foo(**kwargs,) for that matter) is strictly related to the fact that Python’s grammar is LL(1). And this, by the way, is here to stay. Quoting PEP 3099:

Simple is better than complex. This idea extends to the parser. Restricting Python’s grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, such as Perl.

I, for one, deem this attitude completely reasonable – even if it results in 20 minutes of utter confusion once in a blue moon.
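Rather than reading the grammar, one can also simply ask the parser. The sketch below uses the built-in compile() to check which call forms a given interpreter accepts; compiles is a hypothetical helper name. The answers depend on the Python version being run (the grammar quoted above is the one from the time of writing), so no output is shown here.

```python
def compiles(source):
    """Return True if this interpreter's parser accepts `source`."""
    try:
        # "eval" mode only parses and compiles; f, a, b etc. are never looked up.
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


for call in ("f(a, b,)", "f(*args, key=1)", "f(*args,)", "f(**kwargs,)"):
    print("%-18s %s" % (call, "ok" if compiles(call) else "SyntaxError"))
```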

Footnote: The title is of course a not-so-obvious reference to The Infernal Semicolon.

Author: Xion, posted under Computer Science & IT » 2 comments

Self-Reinforcement and Exponential Functions

2012-04-11 18:23

Special relativity is really kind of mean. Not only does it prohibit anything from going faster than the speed of light (therefore destroying our Star Trek-inspired dreams of interstellar travel) but it also threatens extreme adverse effects should anyone dare to even come close to the impenetrable barrier of c. Assuming we can deal with the steady increase of mass as the speed goes up, there is always this issue of time dilation. While you are taking your short (i.e. few years-long) trip to a nearby star, the time passed on Earth could very well be measured in centuries. Having a millennium to catch up on might prove cumbersome, and rather frustrating. Just think of all the iPhone models you would have missed!

As a solace, though, you could get quite a pile of cash waiting for you to pick up. Let’s say you’ve put 10,000 dollars (or euro, or your favorite currency) into an investment with a yearly interest rate of 10 percent. Every year, this deposit will therefore increase by one tenth, and this will keep happening for the next 1000 years. Could you quickly tell how big the final amount will be, compared to the initial one? How many times will it increase?…

You shouldn’t be very hard on yourself if you answered instinctively with e.g. 100 times or something similar. I mean, such figures are totally, utterly wrong by many orders of magnitude because the actual value is bigger than 10^40. But it’s absolutely common to have problems with grasping exponential functions intuitively. In many ways this is quite pitiful, for they accurately describe many phenomena that occur in nature, civilization, technology and culture. Yet they often escape understanding, leading to unfulfilled predictions, incorrect extrapolations, and plain old cognitive biases.
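The figure is easy to check. Assuming the interest is compounded once a year, the growth factor after 1000 years is simply 1.1 raised to the 1000th power; the variable names below are only for illustration.

```python
import math

rate = 0.10        # yearly interest rate
years = 1000
principal = 10000  # initial deposit

growth = (1 + rate) ** years  # how many times the deposit multiplies
print("growth factor:   %.3e" % growth)
print("final amount:    %.3e" % (principal * growth))
print("orders of magnitude: %.1f" % math.log10(growth))
```

The logarithm makes the scale clear: each year adds log10(1.1), roughly 0.041, to the exponent, so a thousand years add up to a little over 41 orders of magnitude.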

What is so bizarre about these functions that they tend to confuse a significant fraction, if not the majority of people?…

Counting Lines in Multiple Files

2012-04-06 13:40

Looks like using Linux is really bound to slowly – but steadily – improve your commandline-fu. As evidence, today I wanted to share a little piece of shell acolyte’s magic that I managed to craft without much trouble. It’s about counting lines in files – code lines in code files, to be specific.

For a single file, getting the number of text rows is very simple:

    $ wc -l some.file
      142 some.file

Although the name wc comes from “word count”, the -l switch changes its mode of operation into counting rows. The flexibility of this little program doesn’t end here; for example, it can also accept piped input (as stdin):

    $ cat some.file | wc -l
    142

as well as multiple files:

    $ wc -l some.file other.file
      142 some.file
       54 other.file
      196 total

or even wildcards, such as wc -l *.file. With these we could rather easily count the number of lines of code in our project:

    $ wc -l **/*.py
        3 foo/__init__.py
      189 foo/main.py
       89 foo/utils.py
       24 setup.py
      133 tests.py
      438 total

Unfortunately, the exact interpretation of the **/* wildcard seems to vary between shells. In zsh it works as shown above, but in bash I had it omit files from the current directory (bash only treats ** as recursive when its globstar option is enabled). While it might make some sense here (as it would give a total without the setup script and tests), I’m sure it won’t be the case in all projects.

And so we need something smarter.
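One shell-independent option, and not necessarily the one this post goes on to describe, is to leave the recursion to a few lines of Python. The sketch below walks a directory tree with os.walk and counts newline characters per file, which is what wc -l counts; count_lines is a hypothetical helper name.

```python
import os


def count_lines(root, suffix=".py"):
    """Map each matching file under `root` to its number of newlines."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as handle:
                    # Like wc -l, count newline characters, not "logical" lines.
                    counts[os.path.relpath(path, root)] = handle.read().count(b"\n")
    return counts


if __name__ == "__main__":
    counts = count_lines(".")
    for path in sorted(counts):
        print("%6d %s" % (counts[path], path))
    print("%6d total" % sum(counts.values()))
```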

Prison Escape: Game from IGK Compo

2012-04-03 18:50

Yesterday I came back from the IGK conference (Engineering of Computer Games) in Siedlce. Among a few interesting lectures and presentations it also featured a traditional, 7-hour long game development contest: Compo. Those unfamiliar with the concept should know that it’s a competition where every team (of max. 4 people) has to produce a playable game according to a particular topic, e.g. a theme or genre. When the work is done, the results are presented to the public while a committee of organizers votes for the winners.

As usual, I took part in the competition along with Adam Sawicki “Reg” and Krzysztof Kluczek “Krzysiek K.”. The topic for this year revolved around the idea of an “escape” or “survival” kind of game, so we designed an old school, pixel-artsy scroller where you play as a prisoner trying to escape detention by running and avoiding numerous obstacles. We coded intensely and passionately, and in the end it paid off really well because we actually managed to snatch first place! Woohoo!

 

A whopping 15 teams took part in this year’s Compo, so it might take some time before all the submissions are available online. Meanwhile, I’m mirroring our game here. Please note that it uses DirectX 9 (incl. some shaders), so for best results you should have a non-ancient GPU and Windows OS. (It might work under Wine; we haven’t tested it).

File: [2012-04-01] Prison Escape (5.8 MiB, 1,911 downloads)

Author: Xion, posted under Events, Games, Programming » 4 comments

Working within Temporary Directory

2012-03-24 15:35

A few days ago I needed to write a script which was supposed to run inside a temporary directory. The exact matter was about deployment from an ad hoc Git repository, and it’s something that I may describe in more detail later on. Today, however, I wanted to focus on a small part of it: one that (I think) has neatly captured the notion of executing something within a non-persistent working directory. Because it’s a very general technique, I suppose quite a few readers may find it pretty useful.

Obtaining a temporary file or even directory shouldn’t be a terribly complicated thing – and indeed, it’s very easy in the case of Python. We have a standard tempfile module here and it serves our needs pretty well in this regard. For one, it has the mkdtemp function which creates a temporary directory and returns the path to it:

    import tempfile
    temp_dir = tempfile.mkdtemp()

That’s what it does. What it doesn’t do is e.g. ensure a proper cleanup once the directory is not needed anymore. This is especially important on Windows where the equivalent of /tmp is not wiped out at boot time.
We also wanted our fresh temp directory to be set as the program’s working one (PWD), and obviously this is also something we need to take care of manually. To combine those two needs, I think the best solution is to employ a context manager.

A context manager is basically a fancy name for an object that the with statement can be applied to. You may recall that some time ago I wrote about interesting use cases for the with construct. This one could also qualify as such, but the principles are very typical. It’s about introducing a scope where some resource (here: a temporary directory) remains accessible as long as we’re inside it. Once we leave the with block, it is cleaned up – just like file handles, network sockets, concurrent locks and plenty of other similar objects.
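In its most explicit form, such an object is just a class with __enter__ and __exit__ methods. The following is a minimal sketch of that protocol; working_directory is a hypothetical helper that only switches the current directory and switches it back, without creating or deleting anything.

```python
import os


class working_directory(object):
    """Switch to `path` for the duration of a `with` block."""

    def __init__(self, path):
        self.path = path
        self.previous = None

    def __enter__(self):
        # Setup: remember where we were, then move.
        self.previous = os.getcwd()
        os.chdir(self.path)
        return self.path

    def __exit__(self, exc_type, exc_value, traceback):
        # Cleanup: called on normal exit and when the body raises.
        os.chdir(self.previous)
        return False  # do not swallow exceptions


start = os.getcwd()
with working_directory(os.path.dirname(start)):
    print("inside: ", os.getcwd())
print("outside:", os.getcwd())
```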

But while the semantics are pretty clear, there are of course several ways to do this syntactically. I took this opportunity to try out the supposedly simplest one, which I learned about recently at a local Python community meet-up: the contextlib library. It includes the contextmanager decorator: a simple and clever way to write with-enabled objects as simple functions. It is based on a particular usage of the yield statement, which makes it very interesting even by itself.

So without further ado, let’s look at the final solution I wanted to present:

    import os
    import shutil
    import tempfile
    from contextlib import contextmanager

    @contextmanager
    def temp_directory(*args, **kwargs):
        """Allows the program to operate inside temporary directory.
        Sets the app's working dir automatically and restores it
        to original one upon exiting the `with` clause.
        """
        orig_workdir = os.getcwd()
        temp_workdir = tempfile.mkdtemp(*args, **kwargs)
        os.chdir(temp_workdir)
        try:
            yield temp_workdir
        finally:
            # runs even if the body of the `with` block raises
            os.chdir(orig_workdir)
            shutil.rmtree(temp_workdir)

As we can see, yield divides this function into two parts: setup and cleanup. Setup will be executed when we enter the with block, while cleanup will run when we’re about to exit it. By the way, this scheme of multiple entry and exit points in one function is typically referred to as a coroutine, and it allows for several very intriguing techniques of smart computation.

Usage of the temp_directory function is pretty obvious, I’d say. Here’s a simplified excerpt of the Git-based deployment script that I used it in:
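The order of the two halves can be made visible with a toy example. In this sketch, traced is a made-up context manager that only records events in a list; it wraps the yield in try/finally so that the cleanup half runs even when the body of the with block raises.

```python
from contextlib import contextmanager

events = []


@contextmanager
def traced(name):
    events.append("setup " + name)        # runs on entering the block
    try:
        yield name                        # the body of the block runs here
    finally:
        events.append("cleanup " + name)  # runs on leaving it, error or not


with traced("a") as value:
    events.append("body " + value)

try:
    with traced("b"):
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)
```

Without the try/finally, an exception in the body would be re-raised at the yield and everything after it would be skipped.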

    import os
    import subprocess
    shell = lambda cmd: subprocess.call(cmd, shell=True)

    # build_products, message and deploy_remote are set elsewhere in the script
    orig_repo = os.getcwd()
    with temp_directory():
        shell('git clone --shared %s .' % orig_repo)
        shell('./build')
        shell('git add -f ' + build_products)
        shell('git commit -m "%s"' % message)
        shell('git push %s master' % deploy_remote)

Note how the meaning of '.' (current directory) shifts depending on whether we’re inside or outside the with block. Users of Fabric (a Python- and SSH-based remote administration tool) will find this very similar to its cd context manager. The main difference is of course that the directory we’re cd-ing to is not a predetermined one, and that it will disappear once we’re done with it.
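The shifting meaning of '.' is easy to observe. The sketch below swaps in tempfile.TemporaryDirectory, a standard library context manager available since Python 3.2 that removes the directory on exit but does not change the working directory by itself, so the chdir calls are done by hand. The file name scratch.txt is arbitrary.

```python
import os
import tempfile

before = os.path.realpath(".")
with tempfile.TemporaryDirectory() as temp_dir:
    expected = os.path.realpath(temp_dir)
    os.chdir(temp_dir)
    try:
        inside = os.path.realpath(".")
        # A relative path now lands inside the temporary directory.
        with open("scratch.txt", "w") as f:
            f.write("hello")
        created = os.path.exists(os.path.join(temp_dir, "scratch.txt"))
    finally:
        # Step out before the directory gets removed.
        os.chdir(before)
after = os.path.realpath(".")

print("before:", before)
print("inside:", inside)
print("after: ", after)
```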

Author: Xion, posted under Programming » Comments Off on Working within Temporary Directory
 


© 2023 Karol Kuczmarski "Xion".