Dictionary with Missing Values

2012-06-29 22:36

When working with dictionaries in Python, or any equivalent data structure in some other language, it is quite important to remember the difference between a key that is not present and a key that maps to a None (null) value. We often tend to blur the distinction by using dict.get:

    thingie = some_dict.get('key')
    if thingie:
        do_stuff_with(thingie)

We do that because more often than not, None and other falsy values (such as empty strings) are not interesting on their own, so we may as well lump them together with the “no value at all” case.
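When the distinction does matter, the three cases can still be told apart with dict.get by passing a unique object as the default. The snippet below is only a small sketch; the names `_NOT_FOUND`, `describe` and `settings` are made up for illustration.

```python
# Hypothetical sentinel: any unique object can serve as a "no such key" marker.
_NOT_FOUND = object()


def describe(mapping, key):
    """Report whether `key` is absent, maps to None, or maps to a real value."""
    value = mapping.get(key, _NOT_FOUND)
    if value is _NOT_FOUND:
        return "absent"
    if value is None:
        return "present, but None"
    return "present: %r" % (value,)


settings = {'timeout': None, 'retries': 0}
print(describe(settings, 'timeout'))  # present, but None
print(describe(settings, 'retries'))  # present: 0
print(describe(settings, 'verbose'))  # absent
```

A plain `key in mapping` check works too; the sentinel merely saves a second lookup.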

There are some situations, however, where these variants must be treated separately. One of them is building a dictionary of keyword arguments that are subsequently ‘unpacked’ through the **kwargs construct. Consider, for example, this code:

    import re
    from bs4 import BeautifulSoup

    def find_images_by_text(html, text, in_alt=True, in_title=False):
        """Search for <img> tags 'matching' given text in HTML document."""
        regex = re.compile(".*%s.*" % re.escape(text), re.IGNORECASE)
        # Only add the attributes we actually want to filter on.
        attrs = {}
        if in_alt:
            attrs['alt'] = regex
        if in_title:
            attrs['title'] = regex
        return BeautifulSoup(html, 'html.parser').find_all('img', **attrs)

With a key mapping to None, we’re calling the function with the corresponding argument explicitly set to None. Without the key present, we’re not passing the argument at all, allowing it to assume its default value.
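The difference is easy to see on a toy function. This is only an illustration: `connect` and its `timeout` default are hypothetical and have nothing to do with BeautifulSoup.

```python
def connect(host, timeout=30):
    """Hypothetical function with a defaulted keyword argument."""
    return (host, timeout)


with_none = {'host': 'example.com', 'timeout': None}
without_key = {'host': 'example.com'}

# The key is present: None is passed explicitly and overrides the default.
print(connect(**with_none))    # ('example.com', None)

# The key is absent: the argument is not passed, so the default applies.
print(connect(**without_key))  # ('example.com', 30)
```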

But adding or not adding a key to a dictionary is somewhat more cumbersome than mapping it to some value or None. The latter can be done with a conditional expression (x if cond else None), together with many other keys and values at once. The former requires an if statement, as shown above.

Wouldn’t it be convenient if we had a special “missing” value that could be used like None, but caused the key to not be added to the dictionary at all? If we had it, we could (for example) rewrite the parts of the previous function that currently contain if branches:

    attrs = {'alt': regex if in_alt else missing,
             'title': regex if in_title else missing}

It shouldn’t be surprising that we could totally introduce such a value and extend dict to support this functionality; after all, it’s Python we’re talking about :) Patching the dict class itself is of course impossible, but we can inherit from it and come up with something like the following piece:

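    # Unique marker object; compared by identity, never by value.
    missing = object()

    class MissingDict(dict):
        def __init__(self, **kwargs):
            # items() works on both Python 2 and 3 (iteritems() is Python 2 only).
            kwargs = dict((k, v) for (k, v) in kwargs.items()
                          if v is not missing)
            dict.__init__(self, **kwargs)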

The magical missing object is only a marker here, used to filter out keys that we want to ignore.
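Because the filter uses an identity check (`is not missing`), ordinary falsy values such as None or 0 are kept; only the marker itself is dropped. A quick, self-contained demonstration follows; the class is repeated here in Python 3 spelling so the snippet runs on its own, and the keys are arbitrary examples.

```python
missing = object()


class MissingDict(dict):
    def __init__(self, **kwargs):
        # Keep everything except values that are the `missing` marker itself.
        kept = {k: v for k, v in kwargs.items() if v is not missing}
        dict.__init__(self, **kept)


d = MissingDict(alt=None, width=0, title=missing)
print('title' in d)                     # False
print(d == {'alt': None, 'width': 0})   # True
```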

With this class at hand, some dictionary manipulations become a bit shorter:

    def find_images_by_text(html, text, in_alt=True, in_title=False):
        regex = re.compile(".*%s.*" % re.escape(text), re.IGNORECASE)
        return BeautifulSoup(html, 'html.parser').find_all('img', **MissingDict(
            alt=regex if in_alt else missing,
            title=regex if in_title else missing))

We could take this idea further and support missing not only at initialization, but also in other dictionary operations, most notably the __setitem__ assignments. This gist shows how it could be done.
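The gist itself is not reproduced here. What follows is a rough sketch of one way such an extension might look, not the code from the gist. It assumes that assigning missing should simply be a no-op (so an existing key keeps its old value), and it covers only the constructor, item assignment and update.

```python
missing = object()


class MissingDict(dict):
    """dict that silently ignores attempts to store the `missing` marker."""

    def __init__(self, *args, **kwargs):
        dict.__init__(self)
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Assumption: storing `missing` is a no-op, not a deletion.
        if value is not missing:
            dict.__setitem__(self, key, value)

    def update(self, *args, **kwargs):
        # dict.update() bypasses __setitem__, so route every item through it.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


d = MissingDict(a=1, b=missing)
d['c'] = missing          # ignored
d['d'] = 4
d.update(e=missing, f=6)  # 'e' is ignored
print(d == {'a': 1, 'd': 4, 'f': 6})  # True
```

Other mutating methods, such as setdefault, would need the same treatment; they are left out to keep the sketch short.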

Author: Xion, posted under Computer Science & IT



 


© 2017 Karol Kuczmarski "Xion".