Posts tagged ‘tuples’

Beware of Sneaky Tuples

2014-01-12 0:47

If you code in Python, then chances are that at some point, you have written a check similar to this one:

  1. def function(arg):
  2.     if not isinstance(arg, basestring):
  3.         raise TypeError("%r is not a string" % arg)
  4.     # ... rest of the function ...

Some would of course argue against putting such an explicit if in the code, insisting to rely on duck typing instead. But while this is an easy target of critique, it’s nowhere near the biggest problem you can find in the snippet above.

This code has a subtle bug. The bug is not even limited to checks like this one; it can occur in many different situations. It surfaces rarely, too, so it’s all the more surprising when it actually rears its ugly head.

The bug is related to string formatting, which in this case points to this expression:

  1. "%r is not a string" % arg

Most of the time, it is perfectly fine and works flawlessly. But since arg is a value we do not have any control over, sometimes it may not work correctly. Sometimes, it can just blow the whole thing up, likely in a way we have not intended.

Wild tuple appears!

All it takes is for arg to be a tuple – any tuple. Tuples are special, because the string formatting operator (%) expects you’ll use them to pass more than one argument to fill in placeholders in the string:

  1. def print_square(x):
  2.     print "%d ^ 2 = %d" % (x, x*x)

The construct of a string followed by percent sign, followed by parenthesis, is very likely familiar to you. Notice, however, that there is nothing exceptional about using a tuple literal: what is important is the tuple type. Indeed, we could rewrite the above in the following manner:

  1. def print_square(x):
  2.     args = (x, x*x)
  3.     print "%d ^ 2 = %d" % args

and the end result would be exactly the same. The only reason we prefer the first version is its obviously superior readability.

Tuple uses Misformat! It’s super effective!

Comparing that last piece of code with the first one, we can see quite clearly how everything will go horribly wrong should we try to format the TypeError‘s message using arg which happens to be a tuple. Not just one, but three different failure modes are possible here:

  • empty tuple (too few arguments for string formatting)
  • tuple with at least 2 elements (too many arguments)
  • tuple with exactly one element

Last one is particularly jarring. It raises no exceptions on by itself, and can additionally result in confusing messages, along the lines of:

  1. 'Alice has a cat' is not a string

Much head-scratching would probably ensue if you stumbled upon exception that reports something like this.

Tuple was caught!

To avoid these problems, one solution is to engage in some sort of pythonic homeopathy. As it turns out, we can cure the malady of tuples by adding even more tuples:

  1. raise TypeError("%r is not a string" % (arg,))

Through this weird (arg,) singleton (1-tuple), we are explicitly sidestepping the error-prone feature of % operator, where it allows a single right-hand side argument to be passed directly. Instead, we are always wrapping all the arguments in a tuple – yes, even if it means using the bizarre (1,) syntax. This way, we can fully control how many of arguments we actually give to the formatter, regardless of what they are and where did they come from.

It’s not pretty, I know – it adds some visual clutter. But the total alternative, the format method, is even more verbose and ridden with issues. C’est la vie.

Tags: , ,
Author: Xion, posted under Programming » 5 comments

Lists vs Tuples

2013-09-04 18:42

Among the many things that Python does right, there are also a few that could have been better thought out. No language is perfect, of course, but since Python is increasingly adopted as the language of choice for complete beginners in programming, the net effect will be many new coders being heavily influenced by ideas they encounter here.

And unfortunately, some of those ideas are dead wrong. A chief example is the relation between lists and tuples, specifically the mistaken notion of their mutual similarity. In reality – that is, in every other language – they are completely different concepts, and rightfully so.

What is list?…

The term ‘list’ in programming is sometimes meant to refer to ‘linked list’ – one of the simple data structures you’d typically encounter in the introductory course to algorithms. It consists of a sequence of separately allocated elements, stitched together using pointers. Depending on the density and direction of those pointers, the type is further subdivided into singly and doubly linked lists, cyclic lists, and so on.

But the broader meaning, which is also more common now, describes a general collection of elements that can be linearly traversed, regardless of the details of its underlying representation. The List interface in Java or .NET is intended to capture the idea, and it does so pretty accurately. Concrete implementations (like ArrayList or LinkedList) can still be decided between if those details turn out to be important, but most of the code can deal with lists in quite abstract, storage-agnostic manner.

…in Python

In Python, lists generally conform to the above description. Here, you cannot really decide how they are stored under the hood (array module notwithstanding), but the interface is what you would expect:

  1. files = get_file_list('/home/xion')
  2. if files[0] == '.':
  3.     files.pop(0)
  4. if files[0] == '..':
  5.     files.pop(0)
  6. print "Found ", len(files), " files."
  7. for file in files:
  8.     print "- ", file

You can insert, append and remove elements, as well as iterate over the list and access elements by indexes. There is also a handy bracket notation for list literals in the code:

  1. a_list = [1, 2, 3]

About the only weird thing is the fact that you can freely put objects of different types into the same list:

  1. diverse_list = [42, "foo", False]

In a way, though, this appears to be in sync with the general free-form nature of Python. For analogy, think about List<Object> in Java that can do exactly the same (or, in fact, any List prior to introduction of generics).

On the other hand, this is a tuple:

  1. a_tuple = (1, 2, 3)

Besides brackets giving way to parentheses, there is no difference in literal notation. Similarly, you can iterate over the tuple just as well as you can over the list:

  1. for x in a_tuple:
  2.    print x

in addition to accessing elements by index. What you cannot do, though, is modifying a tuple once it is created: be it by trying to add more elements, or changing existing ones:

  1. >>> a_tuple[0] = 42
  2. Traceback (most recent call last):
  3.   File "<stdin>", line 1, in <module>
  4. TypeError: 'tuple' object does not support item assignment

But just like with lists, there is no limitation for the types of elements you put therein:

  1. diverse_tuple = (42, "foo", False)

All in all, a tuple in Python behaves just like an immutable (unchangeable) list and can be treated as such for all intents and purposes.

Tuplication

Now, let’s look at tuples in some other programming languages. Those that support them one way or another include C++, D, Haskell, Rust, Scala, and possibly few more exotic ones. There is also a nascent support for tuple-like constructs in Go, but it’s limited to returning multiple results from functions.

In any of these, tuple means several objects grouped together for a particular purpose. Unlike a list, it’s not a collection. That’s because:

  • objects retain their individual identity
  • their number is not arbitrary; in fact, tuples of different count have separate names: pairs, triples, etc.
  • objects have positions assigned: when handling a tuple, it is expected that certain thing X will be at position N

Statically typed languages expand especially on the last point, making it possible to specify what type we expect to see at every position. Together, they constitute the tuple type:

  1. typedef tuple<int, string> numbered_line;
  2.  
  3. // Example usage
  4. vector<numbered_line> read_numbered_lines(string filename, int start=1) {
  5.     vector<numbered_line> result;
  6.     ifstream infile(filename);
  7.     string line;
  8.     for (int i = start; getline(filename, line); ++i) {
  9.         result.push_back(make_tuple(i, line));
  10.     }
  11.     return result;
  12. }

In other words, you don’t operate on “tuples” in general. You use instances of specific tuple types – like a pair of int and string – to bind several distinct values together.
Then, as your code evolves, you may need to have more of them, or to manipulate them in more sophisticated ways. To accommodate, you may either expand the tuple type, or increase readability at the (slight) cost of verbosity and transform it into a structure:

  1. struct numbered_line {
  2.     string filename;
  3.     int number;
  4.     string text;
  5. };

Once here, you can stay with functions operating on those structures, or refactor those into methods, or use some combination of those two approaches.

Missing link

But in Python, that link between tuples and structures is almost completely lost. There is one built-in tuple type, which is simply tuple, but even that isn’t really paid attention to. What happens instead is that tuples are lumped together with lists and other collections into the umbrella term of “iterable”, i.e. something which can iterated over (typically via for loop).

As a result, a tuple can be converted to list and list can be converted to tuple. Both operations make no sense whatsoever. But they are possible, and that leads to tuples and lists being used interchangeably, making it harder to identify crucial pieces of data that drive your logic. And so they will grow uncontrollably, rather than having been harnessed at appropriate time into an object, or a namedtuple at the very least.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Linus Torvalds

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Lists vs Tuples
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.