The Infernal Comma

2012-04-16 19:58

It came as a real surprise to me today. I thought that the times when I stared at plain old syntax errors in confused bewilderment were long gone, at least for languages I have some experience with, like Python. So when it happened, I was really caught off guard.

The crux of the issue can be demonstrated in the following, artificial example:

    from lxml.builder import E

    def user_to_xml(user):
        address = [E.address(
            street=user.address.street,
            zipcode=user.address.zipcode,
            city=user.address.city,
        )] if user.address else []
        return E.user(
            dict(first_name=user.first_name,
                 last_name=user.last_name),
            *address,
        )

The goal is to build some simple XML tree using the most convenient interface, i.e. the lxml.builder.E manipulator from the lxml library. The real code is somewhat longer and more complicated but this snippet encapsulates the issue pretty neatly.

And strange as it may seem, this little piece produces a SyntaxError at the final closing parenthesis:

    SyntaxError: invalid syntax (at line 13 col 5)

In such a case, the first obvious thing anyone would do is, of course, to look for an unmatched opening brace. With the aid of modern editors (or even not so modern ones ;>) this is a trivial task. Before too long we would therefore find out that… all the braces are fine. Double-checking, just to be sure, gives the same result. Everything appears to be in order.

But, of course, we still have the syntax error. What the hell?!

As it turns out, the offending line is just above the seemingly erroneous parentheses. It’s this one:

    *address,

Or, to be more specific, it is the very last character of this line that the interpreter has problems with:

    *address, # comma!

See, Python really doesn’t like this trailing comma. Which, admittedly, is more than surprising, given how lenient it is in pretty much any other setting. You may recall that it’s perfectly OK to include an additional comma after the final element of a list, tuple, or dictionary, and it is quite useful to do so in practice. Not only that: the same is possible for the argument list of a function call. Indeed, this very fragment has one instance of such a trailing comma, appearing after a keyword argument (city=user.address.city,).
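To make the lenient cases concrete, here is a small sketch of trailing commas that Python does accept. The describe function and all the values are made up purely for illustration; they are not part of the original code.

```python
# Trailing commas after the last element of a literal are accepted.
numbers = [1, 2, 3,]
point = (10, 20,)
config = {"host": "localhost", "port": 8080,}

# For a one-element tuple the comma is not optional; it is what makes the tuple.
single = (42,)


def describe(name, greeting="Hello"):
    """Hypothetical helper, used only to show calls with trailing commas."""
    return "%s, %s!" % (greeting, name)


# Trailing comma after a positional argument and after a keyword argument.
plain = describe("world",)
keyword = describe("world", greeting="Hi",)

print(numbers, point, single, config)
print(plain)
print(keyword)
```

The practical benefit is in multi-line literals and calls: adding or removing the last element touches only one line.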

But apparently this doesn’t work for all kinds of arguments. If we unpack some positional ones (using the * operator), we cannot put a comma afterwards. The relevant part of the Python grammar specification states this, of course:

    arglist: (argument ',')* (argument [',']
                             |'*' test (',' argument)* [',' '**' test]
                             |'**' test)

but I wouldn’t call it very explicit. And it seems that you actually can have a comma after *foo, but only if another argument follows. If my intuition of formal grammars is correct, the reason for this rule to prohibit foo(*args,) (or foo(**kwargs,) for that matter) is strictly related to the fact that Python’s grammar is LL(1). And this, by the way, is here to stay. Quoting PEP 3099:

Simple is better than complex. This idea extends to the parser. Restricting Python’s grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, such as Perl.

I, for one, deem this attitude completely reasonable – even if it results in 20 minutes of utter confusion once in a blue moon.
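If you would rather ask the interpreter than read the grammar, the built-in compile() function can tell you which call forms parse. This is only a minimal sketch: the names parses and CANDIDATES are invented for this illustration, and the result for the last two entries depends on the grammar of the interpreter you run it on. With the grammar quoted above, those two are the ones that get rejected.

```python
def parses(source):
    """Return True if `source` compiles as an expression, False on SyntaxError."""
    try:
        # Compiling only parses the text; nothing is evaluated,
        # so foo, args and kwargs do not need to exist.
        compile(source, "<probe>", "eval")
    except SyntaxError:
        return False
    return True


# Hypothetical call forms to probe, from ordinary to the one in question.
CANDIDATES = [
    "foo(a, b,)",          # trailing comma after a positional argument
    "foo(a, key=b,)",      # trailing comma after a keyword argument
    "foo(*args, key=b)",   # comma after *args, followed by another argument
    "foo(*args,)",         # trailing comma right after *args
    "foo(**kwargs,)",      # trailing comma right after **kwargs
]

results = dict((source, parses(source)) for source in CANDIDATES)

for source in CANDIDATES:
    print("%-20s %s" % (source, "ok" if results[source] else "SyntaxError"))
```

Whatever the probe reports, the fix for the snippet at the top is simply to drop the comma after *address.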

Footnote: The title is of course a not-so-obvious reference to The Infernal Semicolon.

Author: Xion, posted under Computer Science & IT


2 comments for post “The Infernal Comma”.
  1. Liosan:
    April 16th, 2012 at 22:47

    I recently got a bulldozer to the face when I found what the difference between

    localVar = complexFunction(lotsaArguments)

    and

    localVar = complexFunction(lotsaArguments),

    is. I can give you a hint – a syntax error would be a blessing. Considering it was part of a complex, 2-hours-long multi-threaded exception-ridden system test… well… things only got worse.

  2. Xion:
    April 16th, 2012 at 23:48

    Hah. Seems like Python is nice, clean and dandy – as long as you remember to put only the right amount of commas :)

© 2017 Karol Kuczmarski "Xion".