Hashbang Hacks: Parameters for Python

2013-08-18 14:00

This:

  #!/bin/sh

is an example of a hashbang. It’s a very neat Unix concept: when placed at the beginning of a script, the line starting with # (hash) and ! (bang) indicates the interpreter that should be used when running the script as an executable. Often used for shells (#!/bin/bash, #!/bin/zsh…), it also works for many regular programming languages, like Ruby, Python or Perl. Some of them may not even use # as a comment character but still allow for hashbangs, simply by ignoring such a first line. Funnily enough, this is just enough to fully “support” them, as the choice of interpreter is made at the system level.

Sadly, though, the only portable way to write a hashbang is to follow it with an absolute path to an executable, which makes it problematic for pretty much anything other than /bin/*sh.
Take Python as an example. On many Linuxes it will be under /usr/bin/python, but that’s hardly a standard. What about /usr/local/bin/python? ~/bin/python?… Heck, one Python I use is under /usr/local/Cellar/python/2.7.3/bin – that’s installed by Homebrew on OS X, a perfectly valid Unix! And I haven’t even mentioned virtualenv.

This madness is typically solved by a standard tool called env, located under /usr/bin on anything at least somewhat *nixy:

  #!/usr/bin/env python

env looks up the correct executable for its argument, relying on the PATH environment variable (hence its name). Thanks to env, we can solve all of the problems signaled above, and any similar woes for many other languages. That’s because, by definition, running a Python file with the above hashbang is equivalent to passing it directly to the interpreter:

  $ cat >hello.py <<EOF
  #!/usr/bin/env python
  print("Hello, world")
  EOF
  $ chmod a+x hello.py
  $ ./hello.py
  Hello, world
  $ python hello.py
  Hello, world
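
What env does with its argument can be approximated in a few lines of Python. This is only a sketch of a PATH lookup, not how env itself is implemented; find_on_path is a hypothetical helper made up for illustration (the standard library offers shutil.which for real use).

```python
import os

def find_on_path(name, path=None):
    """Return the first executable called `name` found in a PATH-style string.

    Hypothetical helper: walks the directories in order, the way a PATH
    lookup does, and returns None when nothing matches.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # The first regular file with the executable bit set wins.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    # Prints whichever "python" comes first on the current PATH, or None.
    print(find_on_path("python"))
```

Because the directories are searched in order, an activated virtualenv or a Homebrew prefix placed early in PATH takes precedence over /usr/bin.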

Now, what if you wanted to also include some flags in the interpreter invocation? For python, for example, you can add -O to turn on some basic optimizations. The seemingly obvious solution is to include them in the hashbang:

  #!/usr/bin/env python -O

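Although this may very well work, it puts us again into “not really portable” land: some systems, Linux among them, hand everything after the interpreter path over as a single argument, so env ends up looking for a program literally named “python -O”. Thankfully, there is a very ingenious (but, sadly, quite Python-specific) trick that lets us add arguments and be confident that our program will run pretty much anywhere.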
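
To see where the portability trouble comes from, here is a rough model of how a Linux-like kernel turns a hashbang line into an argument list. The function name linux_style_argv is made up for this sketch, and the "everything after the interpreter is one argument" rule is an assumption about Linux-style behaviour; other systems may split the line differently, which is exactly the problem.

```python
def linux_style_argv(shebang, script_path):
    """Approximate the argv built for a script whose first line is `shebang`.

    Assumption: Linux-like behaviour, where the text after the interpreter
    path is passed on as ONE argument instead of being split on spaces.
    """
    line = shebang[2:].strip()                 # drop the leading "#!"
    interpreter, _, rest = line.partition(" ")
    argv = [interpreter]
    if rest.strip():
        argv.append(rest.strip())              # single argument, spaces included
    argv.append(script_path)
    return argv

print(linux_style_argv("#!/usr/bin/env python -O", "./hello.py"))
# → ['/usr/bin/env', 'python -O', './hello.py']
```

Under this model env receives "python -O" as the name of the program to find, which is not what we meant.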

Here’s what it looks like:

  #!/bin/sh
  """"exec python -O "$0" "$@";" """

Understandably, it may not be immediately obvious how it works. Let’s dismantle the pieces one by one, so we can see how they all fit together – down not just to every quotation mark, but also to every space.

That’s no moon; it’s a shell script

The first thing worth pointing out is that we’re back to the most standard hashbang: #!/bin/sh. So we pretend our Python file is a shell script now, but a very special kind of shell script. The only command we want it to execute is this:

  exec python -O "$0" "$@"

exec will tell the shell to replace the process running our script with the given command. In our case, this is the Python interpreter, along with the flags we want (-O).

We don’t just run it, though. We tell it to execute a specific Python file, which happens to be the very same file that was run through sh just a moment ago. Its path is stored in the $0 variable. Under $@, on the other hand, we can find all the command line arguments that the user originally passed.

Since all this juggling can get quite confusing, it’s best explained by an example. If foo.py is our Python file, then invoking it as:

  $ ./foo.py forty two

will put ./foo.py into $0 and the two arguments, forty and two, into $@. Quoted as "$@", they expand to separate words, so the shell gets to execute the following command:

  exec python -O "./foo.py" "forty" "two"

which correctly invokes the Python interpreter (with the -O flag), telling it to run our code.
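
To make the expansion concrete, here is a toy Python model of the argument list the shell ends up building. The helper names (expand_at, expand_star, exec_argv) are invented for this sketch, and the "$*" variant is included only for contrast, assuming the default separator of a single space.

```python
def expand_at(args):
    """Model "$@": every original argument stays a separate word."""
    return list(args)

def expand_star(args, ifs=" "):
    """Model "$*": the arguments are joined into a single word."""
    return [ifs.join(args)]

def exec_argv(script, args, expand=expand_at):
    """Build the argv that `exec python -O "$0" ...` would hand to Python."""
    return ["python", "-O", script] + expand(args)

print(exec_argv("./foo.py", ["forty", "two"]))
# → ['python', '-O', './foo.py', 'forty', 'two']
print(exec_argv("./foo.py", ["forty", "two"], expand=expand_star))
# → ['python', '-O', './foo.py', 'forty two']
```

Keeping the arguments separate is what we want here: the script should see exactly what the user typed.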

Python has no business here

But how does Python happen to handle our file properly? Does it “know” to skip over the parts that are only intended for the shell?…
Turns out, it totally does. Those two lines have virtually zero effect on any Python code that follows. The code is also recognized, parsed and executed just like any other Python source.

Shrugging off the first line is easy to explain. Like in shell scripts, hash (#) starts a comment in Python. There is not even a need to have any special machinery for handling hashbangs inside the interpreter. Normal parsing of code is just enough.

As for the second line… Well, that’s a string – a docstring, to be specific. Those texts are only in the code for documentation purposes and they normally don’t affect execution of Python programs. For all intents and purposes, the second line can therefore also be treated as nonexistent.
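
If you would rather not take this on faith, Python’s own parser can be asked what it sees. The snippet below is a small check written for Python 3; the file contents in SOURCE are just the two special lines plus a made-up print call.

```python
import ast

# The two "shell" lines followed by one ordinary Python statement.
SOURCE = '''#!/bin/sh
""""exec python -O "$0" "$@";" """
print("Hello, world")
'''

module = ast.parse(SOURCE)

# clean=False returns the docstring exactly as written, without trimming.
docstring = ast.get_docstring(module, clean=False)

# Names of the top-level statements the parser found.
statements = [type(node).__name__ for node in module.body]

print(repr(docstring))
print(statements)
```

The hashbang line leaves no trace in the parsed module, and the second line shows up as a plain string expression, which is all a docstring is.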

Always be closing

In the end, it all boils down to a combination of syntactic tricks that fool Python but appease the shell.

The first one is using triple quotations (""") for the docstring/command. This is a common way of putting long strings into Python code, especially texts spanning multiple lines. Moreover, it allows for the quotation character itself to be entered without escaping; this is crucial, as our exec command needs them to work correctly.
But from the shell’s PoV, three quotes signify an empty string followed by the start of another string, but not the end of it. To fix that, we add a fourth quote, thus “closing off” the string for the shell. For Python, this just means the docstring text starts with a " character – a weird but perfectly valid circumstance.
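
Both readings of the four quotes can be reproduced from Python. This is a sketch rather than a real shell: shlex only approximates POSIX word splitting and does not expand variables, so $0 and $@ stay literal. The shortened string passed to ast.literal_eval is an example made up for the demonstration.

```python
import ast
import shlex

# Shell's view: two empty strings glued to "exec", so the first word is exec.
shell_words = shlex.split('""""exec python -O "$0" "$@"')
print(shell_words)
# → ['exec', 'python', '-O', '$0', '$@']

# Python's view: a triple-quoted string whose text begins with a quote.
python_text = ast.literal_eval('""""exec" """')
print(repr(python_text))
```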

Now, you might think we can pull off the exact same trick at the end of the line. Again we need a triple of quotation characters, so let’s add one more to even everything out:

  """"exec python -O "$0" "$@" """"

The minor problem here is that the shell would read those four quotes as two empty strings glued into one empty argument, which would actually be passed to our Python script. Whether it would mess with the code depends entirely on the particulars of its logic; it might very well go unnoticed.
The bigger problem, however, is that the Python code in this form would not even parse. Just like in many other programming languages, the tokenizer in Python is greedy: it tries to match as many characters as possible into a single token. Staying true to this principle, a sequence of four " chars will always be interpreted as a triple (""") followed by a single (") quotation. And that’s exactly how it was done just a second ago, when we found the behavior very desirable.
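
Both problems are easy to confirm from Python itself. The check below is a sketch for Python 3: parses is a helper invented here, the two source strings are the four-quote attempt and the hashbang trick as first shown, and shlex again stands in for the shell only approximately.

```python
import shlex

def parses(source):
    """Return True if `source` compiles as Python, False on a SyntaxError."""
    try:
        compile(source, "<hashbang-demo>", "exec")
        return True
    except SyntaxError:
        return False

four_quotes = '#!/bin/sh\n""""exec python -O "$0" "$@" """"\n'
original = '#!/bin/sh\n""""exec python -O "$0" "$@";" """\n'

# The fourth closing quote opens a new string that is never terminated.
print(parses(four_quotes))   # → False
print(parses(original))      # → True

# Shell's view of the four-quote ending: one extra, empty argument.
print(shlex.split('"$@" """"'))
# → ['$@', '']
```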

It’s not so useful now, though. We obviously need to break the """" chain with a well-placed space. However, this makes the issue of extra arguments somewhat more pronounced: the stray argument is now a space rather than an empty string.
But that’s fine, since the problem is easy to solve. Commands in the shell can be terminated not only with a newline, but also with a semicolon. Add one, and the leftover quotes become a separate command that never runs, because exec has already replaced the shell by then:

  """"exec python -O "$0" "$@";" """

Aaand… that’s it. We have arrived at the final solution. Pretty trivial, eh? ;)

Author: Xion, posted under Computer Science & IT


One comment for post “Hashbang Hacks: Parameters for Python”.
  1. Xender:
    August 20th, 2013 at 13:39

    Actually, “$@” will evaluate to “forty” “two”, not “forty two”. And this is correct behaviour, as we don’t want to glue all arguments in one string, as “$*” would do (well, at least under bash, zsh treats “$*” same as “$@”).

© 2017 Karol Kuczmarski "Xion".