Posts tagged ‘syntax errors’

Hashbang Hacks: Parameters for Python

2013-08-18 14:00

This:

    #!/bin/sh

is an example of a hashbang. It’s a very neat Unix concept: when placed at the beginning of a script, the line starting with # (hash) and ! (bang) indicates the interpreter that should be used when running the script as an executable. Often used for shells (#!/bin/bash, #!/bin/zsh…), it also works for many regular programming languages, like Ruby, Python or Perl. Some of them may not even use # as a comment character but still allow for hashbangs, simply by ignoring such a first line. Funnily enough, this is just enough to fully “support” them, as the choice of interpreter is made at the system level.

Sadly, though, the only portable way to write a hashbang is to follow it with an absolute path to an executable, which makes it problematic for pretty much anything other than /bin/*sh.
Take Python as an example. On many Linuxes it will be under /usr/bin/python, but that’s hardly a standard. What about /usr/local/bin/python? ~/bin/python?… Heck, one Python I use is under /usr/local/Cellar/python/2.7.3/bin – that’s installed by Homebrew on OS X, a perfectly valid Unix! And I haven’t even mentioned virtualenv.

This madness is typically solved by a standard tool called env, located under /usr/bin on anything at least somewhat *nixy:

    #!/usr/bin/env python

env looks up the correct executable for its argument, relying on the PATH environment variable (hence its name). Thanks to env, we can solve all of the problems signaled above, and any similar woes for many other languages. That’s because, by definition, running a Python file with the above hashbang is equivalent to passing it directly to the interpreter:
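To get a feel for what env does with PATH, here is a minimal Python sketch of the same kind of lookup. It leans on shutil.which from the standard library; find_on_path and make_fake_executable are names made up for this illustration, and the fake python files are just tiny shell scripts, not real interpreters.

```python
import os
import shutil
import tempfile


def find_on_path(name, path=None):
    """Return the first executable called `name` found on `path`, or None.

    `path` is a string of directories joined by os.pathsep. When it is
    omitted, shutil.which falls back to the PATH environment variable.
    """
    return shutil.which(name, path=path)


def make_fake_executable(directory, name):
    """Create a tiny executable file (hypothetical stand-in for an interpreter)."""
    target = os.path.join(directory, name)
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(target, 0o755)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_fake_executable(first, "python")
        make_fake_executable(second, "python")

        # The directory listed first wins, which is why reordering PATH
        # (as virtualenv does) changes which python gets picked.
        print(find_on_path("python", path=os.pathsep.join([first, second])))
        print(find_on_path("python", path=os.pathsep.join([second, first])))
```

Both calls find a file named python, but each time it is the one from whichever directory comes first in the search path.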

    $ cat >hello.py <<EOF
    #!/usr/bin/env python
    print("Hello, world")
    EOF
    $ chmod a+x hello.py
    $ ./hello.py
    Hello, world
    $ python hello.py
    Hello, world

Now, what if you wanted to also include some flags in the interpreter invocation? For python, for example, you can add -O to turn on some basic optimizations. The seemingly obvious solution is to include them in the hashbang:

    #!/usr/bin/env python -O

Although this may very well work, it puts us again into “not really portable” land. Thankfully, there is a very ingenious (but, sadly, quite Python-specific) trick that lets us add arguments and be confident that our program will run pretty much anywhere.
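Part of the portability problem is how the system splits the hashbang line. On Linux, everything after the interpreter path is handed over as one single argument, so env gets asked for a program literally called "python -O". As far as I know, some other systems split on whitespace instead. The sketch below mimics the Linux behaviour only; split_hashbang and argv_for are hypothetical helpers written for this illustration, not any real API.

```python
def split_hashbang(line):
    """Split a hashbang line Linux-style: interpreter plus at most ONE argument."""
    if not line.startswith("#!"):
        raise ValueError("not a hashbang line")
    rest = line[2:].strip()
    if not rest:
        raise ValueError("no interpreter given")
    # Split only once: whatever follows the interpreter stays in one piece.
    parts = rest.split(None, 1)
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


def argv_for(line, script_path):
    """Build the argument vector the interpreter would be started with."""
    interpreter, argument = split_hashbang(line)
    argv = [interpreter]
    if argument is not None:
        argv.append(argument)
    argv.append(script_path)
    return argv


if __name__ == "__main__":
    print(argv_for("#!/usr/bin/env python", "./hello.py"))
    print(argv_for("#!/usr/bin/env python -O", "./hello.py"))
```

In the second case the list contains the single element 'python -O' rather than separate 'python' and '-O' entries, which is exactly what env cannot find on PATH.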

Here’s what it looks like:

    #!/bin/sh
    """"exec python -O "$0" "$@";" """

Understandably, it may not be immediately obvious how it works. Let’s dismantle the pieces one by one, so we can see how they all fit together – down not just to every quotation mark, but also to every space.
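One way to read the trick is that the same two lines mean different things to two different programs. To sh, the four leading quotes are just two empty strings glued to the word exec, so it runs exec python -O "$0" "$@" and never looks at the rest, because exec replaces the shell process. To Python, the first line is a comment and the second is a triple-quoted string, i.e. a harmless module docstring. The Python half can be checked with the standard ast module; the script body below (the print call) is an assumption added just so there is something to parse.

```python
import ast

HASHBANG = "#!/bin/sh"
TRICK = '""""exec python -O "$0" "$@";" """'
BODY = 'print("Hello, world")'  # hypothetical script body, for illustration
SCRIPT = "\n".join([HASHBANG, TRICK, BODY]) + "\n"

# Parse the whole file the way the Python interpreter would.
tree = ast.parse(SCRIPT)

# The hashbang is a comment, so only two statements remain:
# the string literal from the trick line and the print call.
print(len(tree.body))

# The string literal sits first in the module, so it counts as its docstring.
# clean=False keeps the text exactly as written, stray quotes included.
docstring = ast.get_docstring(tree, clean=False)
print(repr(docstring))
```

The docstring that comes out is the exec command wrapped in a few leftover quotation marks, which Python happily ignores.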

The Infernal Comma

2012-04-16 19:58

It came up today as a real surprise to me. Up until then, I thought that long gone were the times when I stared at plain old syntax errors in confused bewilderment. Well, at least if we’re talking languages I have some experience with, like Python. So when it happened to me today, I was really caught off-guard.

The crux of the issue can be demonstrated in the following, artificial example:

  1. from lxml.builder import E
  2.  
  3. def user_to_xml(user):
  4.     address = [E.address(
  5.         street=user.address.street,
  6.         zipcode=user.address.zipcode,
  7.         city=user.address.city,
  8.     )] if user.address else []
  9.     return E.user(
  10.         dict(first_name=user.first_name,
  11.              last_name=user.last_name),
  12.         *address,
  13.     )

The goal is to build some simple XML tree using the most convenient interface, i.e. the lxml.builder.E manipulator from the lxml library. The real code is somewhat longer and more complicated but this snippet encapsulates the issue pretty neatly.

And strange as it may seem, this little piece produces a SyntaxError at the final closing parenthesis:

    SyntaxError: invalid syntax (at line 13 col 5)

In such a case, the first obvious thing anyone would do is of course to look for an unmatched opening brace. With the aid of modern editors (or even not so modern ones ;>) this is a trivial task. Before too long we would therefore find out that… all the braces are fine. Double-checking, just to be sure, will have the same result. Everything appears to be in order.

But, of course, we still have the syntax error. What the hell?!

As it turns out, the offending line is just above the seemingly erroneous parenthesis. It’s this one:

    *address,

Or, to be more specific, it is the very last character of this line that the interpreter has problems with:

    *address, # comma!

See, Python really doesn’t like this trailing comma. Which, admittedly, is more than surprising, given how lenient it is in pretty much any other setting. You may recall that it’s perfectly OK to include an additional comma after the final element of a list, tuple, or dictionary, and it is quite useful to do so in practice. Not only that – it is also possible for argument lists in function calls. Indeed, this very fragment has one instance of such a trailing comma that appears after a keyword argument (city=user.address.city,).

But apparently this doesn’t really work for all kinds of arguments. If we unpack some positional ones (using the * operator), we cannot put a comma afterwards. The relevant part of the Python grammar specification states this, of course:

    arglist: (argument ',')* (argument [',']
                             |'*' test (',' argument)* [',' '**' test]
                             |'**' test)

but I wouldn’t call it very explicit. And it seems that you actually can have a comma after *foo but only if another argument follows. If my intuition of formal grammars is correct, the reason this rule prohibits foo(*args,) (or foo(**kwargs,) for that matter) is strictly related to the fact that Python’s grammar is LL(1). And this, by the way, is here to stay. Quoting PEP 3099:

Simple is better than complex. This idea extends to the parser. Restricting Python’s grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, such as Perl.

I, for one, deem this attitude completely reasonable – even if it results in 20 minutes of utter confusion once in a blue moon.
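For anyone who wants to check how their own interpreter treats these commas, here is a small probe built on the compile() builtin, which reports a SyntaxError without running anything. The parses helper and the probe list are made up for this sketch. Keep in mind that the post describes Python 2.7, where the last two probes are rejected; later Python 3 releases relaxed the rule for calls, so a current interpreter may well accept them.

```python
def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        # "eval" mode only checks syntax here; the names need not exist.
        compile(source, "<probe>", "eval")
    except SyntaxError:
        return False
    return True


PROBES = [
    "[1, 2,]",        # trailing comma in a list
    "(1, 2,)",        # ... in a tuple
    "{'a': 1,}",      # ... in a dictionary
    "f(a, b,)",       # ... after a positional argument
    "f(a, key=b,)",   # ... after a keyword argument
    "f(*args)",       # unpacking without a trailing comma
    "f(*args,)",      # the case from this post (version-dependent)
    "f(**kwargs,)",   # its keyword sibling (version-dependent)
]

for probe in PROBES:
    print("{:<16} {}".format(probe, parses(probe)))
```

Whatever the probe says for the last two lines, dropping the comma after *address is the form that works everywhere.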

Footnote: The title is of course a not-so-obvious reference to The Infernal Semicolon.

Author: Xion, posted under Computer Science & IT
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.