Beware of Sneaky Tuples

2014-01-12 0:47

If you code in Python, then chances are that at some point, you have written a check similar to this one:

    def function(arg):
        if not isinstance(arg, basestring):
            raise TypeError("%r is not a string" % arg)
        # ... rest of the function ...

Some would of course argue against putting such an explicit if in the code, insisting on relying on duck typing instead. But while this is an easy target for criticism, it’s nowhere near the biggest problem you can find in the snippet above.

This code has a subtle bug. The bug is not even limited to checks like this one; it can occur in many different situations. It surfaces rarely, too, so it’s all the more surprising when it actually rears its ugly head.

The bug is related to string formatting, which in this case points to this expression:

    "%r is not a string" % arg

Most of the time, it is perfectly fine and works flawlessly. But since arg is a value we do not have any control over, sometimes it may not work correctly. Sometimes, it can just blow the whole thing up, likely in a way we have not intended.

Wild tuple appears!

All it takes is for arg to be a tuple – any tuple. Tuples are special, because the string formatting operator (%) expects you’ll use them to pass more than one argument to fill in placeholders in the string:

    def print_square(x):
        print "%d ^ 2 = %d" % (x, x*x)

The construct of a string followed by a percent sign, followed by a parenthesized list of values, is very likely familiar to you. Notice, however, that there is nothing exceptional about using a tuple literal: what is important is the tuple type. Indeed, we could rewrite the above in the following manner:

    def print_square(x):
        args = (x, x*x)
        print "%d ^ 2 = %d" % args

and the end result would be exactly the same. The only reason we prefer the first version is its obviously superior readability.
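To make the point about the tuple type concrete, here is a minimal sketch comparing a list and a tuple on the right-hand side of %. The names fmt, as_list and as_tuple are made up for this illustration.

```python
fmt = "%r is not a string"

# A list is just one value to the formatter, so it fills the single placeholder.
as_list = fmt % [1, 2]

# A tuple is treated as the argument list, so its sole element fills the placeholder.
as_tuple = fmt % (1,)

print(as_list)    # [1, 2] is not a string
print(as_tuple)   # 1 is not a string
```

Only the tuple gets unpacked; the list is formatted as a whole.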

Tuple uses Misformat! It’s super effective!

Comparing that last piece of code with the first one, we can see quite clearly how everything will go horribly wrong should we try to format the TypeError's message using an arg which happens to be a tuple. Not just one, but three different failure modes are possible here:

  • empty tuple (too few arguments for string formatting)
  • tuple with at least 2 elements (too many arguments)
  • tuple with exactly one element

The last one is particularly jarring. It raises no exception of its own: the formatter silently unwraps the tuple and prints its sole element, which can result in confusing messages along the lines of:

    'Alice has a cat' is not a string

Much head-scratching would probably ensue if you stumbled upon an exception that reports something like this.
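The three failure modes can be reproduced with a short sketch. The helper describe is hypothetical and exists only to capture what the naive formatting does for each kind of argument; the exact wording of the TypeError messages may differ between Python versions, so it is not spelled out here.

```python
def describe(arg):
    """Return the naively formatted message, or a note about the error raised."""
    try:
        return "%r is not a string" % arg
    except TypeError as e:
        return "TypeError: %s" % (e,)

print(describe(()))                    # TypeError: too few arguments
print(describe((1, 2)))                # TypeError: too many arguments
print(describe(("Alice has a cat",)))  # 'Alice has a cat' is not a string
print(describe(42))                    # 42 is not a string
```

The first two cases at least fail loudly, although with an error about formatting rather than about the argument. The third produces a message that claims a string is not a string.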

Tuple was caught!

To avoid these problems, one solution is to engage in some sort of pythonic homeopathy. As it turns out, we can cure the malady of tuples by adding even more tuples:

    raise TypeError("%r is not a string" % (arg,))

Through this weird (arg,) singleton (1-tuple), we are explicitly sidestepping the error-prone feature of the % operator, where it allows a single right-hand side argument to be passed directly. Instead, we always wrap all the arguments in a tuple – yes, even if it means using the bizarre (1,) syntax. This way, we fully control how many arguments we actually give to the formatter, regardless of what they are and where they came from.
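Put together, the corrected check might look like the sketch below. The names ensure_string, error_message and string_types are hypothetical; the try/except around basestring is only there so that the snippet also runs on Python 3, where basestring does not exist.

```python
try:
    string_types = basestring   # Python 2
except NameError:
    string_types = str          # Python 3 has no basestring

def ensure_string(arg):
    if not isinstance(arg, string_types):
        # The 1-tuple guarantees exactly one argument reaches the formatter.
        raise TypeError("%r is not a string" % (arg,))
    return arg

def error_message(arg):
    """Return the TypeError message for arg, or None if arg is accepted."""
    try:
        ensure_string(arg)
    except TypeError as e:
        return str(e)
    return None

for value in [(), (1, 2), ("Alice has a cat",)]:
    print(error_message(value))
# () is not a string
# (1, 2) is not a string
# ('Alice has a cat',) is not a string
```

All three tuples are now reported as the tuples they are.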

It’s not pretty, I know – it adds some visual clutter. But the main alternative, the format method, is even more verbose and ridden with issues. C’est la vie.
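For comparison, here is a rough sketch of the same message built with the format method. The function name message is made up, and the explicit field index 0 is used on the assumption that the code may have to run on Python 2.6, which does not accept an empty {} field.

```python
def message(arg):
    # format() receives arg as a single positional argument, tuple or not.
    return "{0!r} is not a string".format(arg)

print(message(()))                     # () is not a string
print(message((1, 2)))                 # (1, 2) is not a string
print(message(("Alice has a cat",)))   # ('Alice has a cat',) is not a string
```

This avoids the tuple problem, but it says nothing about the other issues with format mentioned above.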

Author: Xion, posted under Programming


5 comments for post “Beware of Sneaky Tuples”.
  1. Xender:
    January 12th, 2014 at 14:49

    Or, instead of using %r, you could just explicitly use repr(arg) – problem solved and code is more readable. ;)

    Also, about the linked StackOverflow thread – you shouldn’t be complaining about an issue that has been fixed in the next version (first sentence from the first answer – “Your original code actually works in Python 2.7”).

    And why not Python3?

  2. Xion:
    January 19th, 2014 at 14:35

    “you shouldn’t be complaining about an issue that has been fixed in the next version”
    I’m not complaining, I’m warning. The choice of Python version is also not something that is always in your control.

    “And why not Python3?”
    I’d flip the question around: why Python 3?

  3. Xender:
    January 21st, 2014 at 19:07

    If your intention was to warn, not to complain, then sorry for misunderstanding.
    Well, of course Python version is up to choice of the programmer, but most of the time I’ve seen 2.newest or 3.newest used – that is, 2.7 or 3.3 at the time.

    Why Py3K? Well, it has some nice things, like strings being by default Unicode etc., but more importantly – version 2 is not being developed anymore, so all new things go to 3. And in IT, choosing to code in old technology could be not the best decision…

    I know, Python 2.7 isn’t “old” yet, but it has it’s years (it was released in 2010, right?), so it will eventually become old.

  4. Kos:
    February 3rd, 2014 at 10:38

    What bites me about `format` in py2 is that it’s more strict with unicode. Compare:


    >>> '%s' % u'żółw'
    u'\u017c\xf3\u0142w'
    >>> '{}'.format(u'żółw')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

  5. Xion:
    February 9th, 2014 at 13:39

    “Well, of course Python version is up to choice of the programmer, but most of the time I’ve seen 2.newest or 3.newest used – that is, 2.7 or 3.3 at the time.”

    Not if you have an existing system that’s written in 2.6 and cannot be upgraded easily. Granted, there aren’t many stark incompatibilities between, say, 2.6 and 2.7, but when the upgrade itself needs justification (because, you know, it’s a commercial production system) then “things are unlikely to break” might not be enough.

    “(…) version 2 is not being developed anymore, so all new things go to 3. ”

    Whether or not it’s a good thing depends on your view on the direction Python 3 is going. I’m quite indifferent or even nonplussed about it right now, although I can imagine once they get the async stuff finalized, going with 3 may be more compelling. So far I don’t really see anything that gives you obvious benefits and is not available in __future__.

    “I know, Python 2.7 isn’t “old” yet, but it has it’s years (it was released in 2010, right?), so it will eventually become old.”

    If only the Java guys could hear you ;-) “Old” or not, it is and will be maintained for the foreseeable future; the most recent release (2.7.6) was only in November 2013.

    “What bites me about `format` in py2 is that it’s more strict with unicode.”

    It’s actually the ANSI vs. Unicode separation, which (1) you can get rid of (from __future__ import unicode_literals); and (2) makes sense if you stick with it. Your example mixes narrow and wide char strings, so you’re asking for an error (u'{}'.format(u'żółw') works just fine).

© 2017 Karol Kuczmarski "Xion".