Posts tagged ‘dictionaries’

Dictionary with Missing Values

2012-06-29 22:36

When working with dictionaries in Python, or any equivalent data structure in some other language, it is quite important to remember the difference between a key which is not present and a key that maps to None (null) value. We often tend to blur the distinction by using dict.get:

  1. thingie = some_dict.get('key')
  2. if thingie:
  3.     do_stuff_with(thingie)

We do that because more often that not, None and other falsy values (such as empty strings) are not interesting on their own, so we may as well lump them together with the “no value at all” case.

There are some situations, however, where these variants shall be treated separately. One of them is building a dictionary of keyword arguments that are subsequently ‘unpacked’ through the **kwargs construct. Consider, for example, this code:

  1. import re
  2. from bs4 import BeautifulSoup
  4. def find_images_by_text(html, text, in_alt=True, in_title=False):
  5.     """Search for <img> tags 'matching' given text in HTML document."""
  6.     regex = re.compile(".*%s.*" % re.escape(text), re.IGNORE_CASE)
  7.     attrs = {}
  8.     if in_alt:
  9.         attrs['alt'] = regex
  10.     if in_title:
  11.         attrs['title'] = regex
  12.     return BeautifulSoup(html).find_all('img', **attrs)

With a key mapping to None, we’re calling the function with argument explicitly set to None. Without the key present, we’re not passing the argument at all, allowing it to assume its default value.

But adding or not adding a key to dictionary is somewhat more cumbersome than mapping it to some value or None. The latter can be done with conditional expression (x if cond else None), together with many other keys and value at once. The former requires an if statement, as shown above.
Would it be convenient if we had a special “missing” value that could be used like None, but caused the key to not be added to dictionary at all? If we had it, we could (for example) rewrite parts of the previous function that currently contain if branches:

  1. attrs = {'alt': regex if in_alt else missing,
  2.          'title': regex if in_title else missing}

It shouldn’t be surprising that we could totally introduce such a value and extend dict to support this functionality – after all, it’s Python we’re talking about :) Patching the dict class itself is of course impossible, but we can inherit it and come up with something like the following piece:

  1. class MissingDict(dict):
  2.     def __init__(self, **kwargs):
  3.         kwargs = dict((k, v) for (k, v) in kwargs.iteritems()
  4.                       if v is not missing)
  5.         dict.__init__(self, **kwargs)
  7. missing = object()

The magical missing object is only a marker here, used to filter out keys that we want to ignore.

With this class at hand, some dictionary manipulations become a bit shorter:

  1. def find_images_by_text(html, text, in_alt=True, in_title=False):
  2.     regex = re.compile(".*%s.*" % re.escape(text), re.IGNORE_CASE)
  3.     return BeautifulSoup(html).find_all('img', **MissingDict(
  4.         alt=regex if in_alt else missing,
  5.         title=regex if in_title else missing))

We could take this idea further and add support for missing not only initialization, but also other dictionary operations – most notably the __setitem__ assignments. This gist shows how it could be done.

Tags: , ,
Author: Xion, posted under Computer Science & IT » Comments Off on Dictionary with Missing Values

Python’s setdefault Considered Harmful

2012-01-28 19:26

Python dictionaries have an inconspicuous method named setdefault. Asking for its description, we’ll be presented with rather terse interpretation:

  1. >>> help(dict.setdefault)
  3. setdefault(...)
  4.     D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D

While it might not be immediately obvious what we could use this method for, we can actually find quite a lot of applications if we only pay a little attention. The main advantage of setdefault seems to lie in elimination of ifs; a feat we can feel quite smug about. As an example, consider a function which groups list of key-value pairs (with possible duplicate keys) into a dictionary of lists:

  1. def group_assoc_list_(a_list): # with if
  2.     res = {}
  3.     for k, v in a_list:
  4.         if k in res:
  5.             res[k].append(v)
  6.         else:
  7.             res[k] = [v]
  8.     return res
  10. def group_assoc_list(a_list): # with setdefault
  11.     res = {}
  12.     for k, v in a_list:
  13.         res.setdefault(k, []).append(v)
  14.     return res

This is something which we could do when parsing query string of an URL, if there weren’t a readily available API for that:

  1. def parse_query_string(qs):
  2.     return [arg.split('=') for arg in qs.split('&')]
  4. >>> group_assoc_list(parse_query_string('a=1&b=2&c=3&b=4&a=5'))
  5. {'a': ['1', '5'], 'b': ['2', '4'], 'c': ['3']}

So, with setdefault we can get the same job done more succinctly. It would seem that it is a nifty little function, and something we should keep in mind as part of our toolbox. Let’s remember it and move on, shall we?…

Actually, no. setdefault is really not a good piece of dict‘s interface, and the main reason we should remember about it is mostly for its caveats. Indeed, there are quite a few of them, enough to shrink the space of possible applications to rather tiny size. As a result, we should be cautious whenever we see (or write) setdefault in our own code.

Here’s why.


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with