Posts tagged ‘higher-order functions’

Adding Recursive Depth to Our Functions

2012-03-05 22:46

I suppose it this not uncommon to encounter a general situation such as the following. Say you have some well-defined function that performs a transformation of one value into another. It’s not particularly important how lengthy or complicated this function is, only that it takes one parameter and outputs a result. Here’s a somewhat trivial but astonishingly useful example:

  1. def is_true(value):
  2.     ''' Checks whether given value can be interpreted as "true",
  3.    using various typical representations of truth. '''
  4.     s = str(value).lower()
  5.     can_be_true = s in ['1', 'y', 'yes', 'true']
  6.     can_be_false = s in ['0', 'n' 'no', 'false']
  7.     if can_be_true != (not can_be_false):
  8.         return bool(value) # fall back in case of inconsistency
  9.     return can_be_true

Depending on what happens in other parts of your program, you may find yourself applying such function to many different inputs. Then at some point, it is possible that you’ll need to handle lists of those inputs in addition to supporting single values. Query string of URLs, for example, often require such treatment because they may contain more than one value for given key, and web frameworks tend to collate those values into lists of strings.

In those situations, you will typically want to deal just with the list case. This leads to writing a conditional in either the caller code:

  1. if not isinstance(values, list):
  2.     values = [values]
  3. bools = map(is_true, values)

or directly inside a particular function. I’m not a big fan of similar solutions because everyone do them differently, and writing the same piece several times is increasingly prone to errors. Quite not incidentally, a mistake is present in the very example above – it shouldn’t be at all hard to spot it.

In any case, repeated application calls for extracting the pattern into something tangible and reusable. What I devised is therefore a general “recursivator”, whose simplified version is given below:

  1. def recursive(func):
  2.     ''' Creates a recursive function out of supplied one.
  3.    Resulting function recurses on lists, applying itself
  4.    to its elements. '''
  5.     def recursive_func(obj, *args, **kwargs):
  6.         if hasattr(obj, '__iter__'):
  7.             return [recursive_func(i, *args, **kwargs)
  8.                     for i in obj]
  9.         return obj
  10.     return recursive_func

As for usage, I think it’s equally feasible for both on-a-spot calls:

  1. bools = recursive(is_true)(values)

as well as decorating functions to make them recursive permanently. For this, though, it would be wise to turn it into class-based decorator, applying the technique I’ve described previously. This way we could easily extend the solution and tie it to our needs.

But what are the specific ways of doing so? I could think of some, like:

  • Recursing not only on lists, bit also on mappings (dictionaries) and applying the function to dictionary values. A common use case could be a kind of sanitization function for preparing values to be serialized, e.g. by turning datetimes into ISO-formatted strings.
  • Excluding some data types from recursion, preventing, say, sets from being turned into lists, as obviously sets are also iterable. In more general version, one could supply a predicate function for deciding whether to recurse or not.
  • Turning recursive into generator for more memory-efficient solution. If we’re lucky to program in Python 3.x, it would be a good excuse to employ the new yield from construct from 3.3.

One way or another, capturing a particular concept of computation into actual API such as recursive looks like a good way for making the code more descriptive and robust. Certainly it adheres to one of the statements from Zen: that explicit is better than implicit.

Tags: , , ,
Author: Xion, posted under Programming » Comments Off on Adding Recursive Depth to Our Functions

Generator Pitfalls

2012-02-08 19:34

I’m a big fan of Python generators. With seemingly little effort, they often allow to greatly reduce memory overhead. They do so by completely eliminating intermediate results of list manipulations. Along with functions defined within itertools package, generators also introduce a basic lazy computation capabilities into Python.
This combination of higher-order functions and generators is truly remarkable. Why? Because compared to ordinary loops, it gains us both speed and readability at the same time. That’s quite an achievement; typically, optimizations sacrifice one for the other.

All this goodness, however, comes with few strings attached. There are some ways in which we can be bitten by improperly used generators, and I think it’s helpful to know about them. So today, I’ll try to outline some of those traps.

Generators are lazy

You don’t say, eh? That’s one of the main reasons we are so keen on using them, after all! But what I’m implying is the following: if a generator-based code is to actually do anything, there must be a point where all this laziness “bottoms out”. Otherwise, you are just building expression thunks and not really evaluating anything.

How such circumstances might occur, though? One case involves using generators for their side effects – a rather questionable practice, mind you:

  1. def process_kvp(key, value=None):
  2.     print key,
  3.     if value: print "=", value
  5. itertools.starmap(process_kvp, some_dict.iteritems())

The starmap call does not print anything here, because it just constructs a generator. Only when this generator is iterated over, dictionary elements would be fed to process_kvp function. Such iteration would of course require a loop (or consume recipe from itertools docs), so we might as well do away with generator altogether and just stick with plain old for:

  1. for kv in some_dict.iteritems():
  2.     process_kvp(*kv)

In real code the judgement isn’t so obvious, of course. We’ll most likely receive the generator from some external function – a one that probably refers to its result only as an iterable. As usual in such cases, we must not assume anything beyond what is explicitly stated. Specifically, we cannot expect that we have any kind of existing, tangible collection with actual elements. An iterable can very well be just a generator, whose items are yet to be evaluated. This fact can impact performance by moving some potentially heavy processing into unexpected places, resulting in e.g. executing database queries while rendering website’s template.

Generators are truthy

The above point was concerned mostly with performance, but what about correctness? You may think it’s not hard to conform to iterables’ contract, which should in turn guarantee that they don’t backfire on us with unexpected behavior. Yet how many times did you find yourself reading (or writing!) code such as this:

  1. def do_stuff():
  2.     items = get_stuff_to_do() # returns iterable
  3.     if not items:
  4.         logging.debug("No stuff to do!")
  5.         return 0
  6.     # something with 'items'...

It’s small and may look harmless, but it still manages to make an ungrounded (thus potentially false) assumption about an iterable. The problem lies in if condition, meant to check whether we have any items to do stuff with. It would work correctly if items were a list, dict, deque or any other actual collection. But for generators, it will not work that way – even though they are still undoubtedly iterable!

Why? Because generators are not collections; they are just suppliers. We can only tell them to return their next value, but we cannot peek inside to see if the value is really there. As such, generators are not susceptible to boolean coercion in the same way that collections are; it’s not possible to check their “emptiness”. They behave like regular Python objects and are always truthy, even if it’s “obvious” they won’t yield any value when asked:

  1. >>> g = (x for x in [])
  2. >>> if g: print "Truthy!"
  3. Truthy!

Going back to previous sample, we can see that if block won’t be executed in case of get_stuff_to_do returning an “empty” generator. Consequences of this fact may vary from barely noticeable to disastrous, depending on how the rest of do_stuff function looks like. Nevertheless, that code will run with one of its invariants violated: a fertile ground for any kind of unintended effects.

Generators are once-off

An intuitive, informal understanding of the term ‘iterable’ is likely to include one more unjustified assumption: that it can iterated over, and over, i.e. multiple times. Again, this is very much true if we’re dealing with a collection, but generators simply don’t carry enough information to repeat the sequence they yield. In other words, they cannot be rewound: once we go through a generator, it’s stuck in its final state, not usable for anything more.

Just like with previous caveats, failure to account for that can very well go unnoticed – at least until we spot some weird behavior it results in. Continuing our superficial example from preceding paragraph, let’s pretend the rest of do_stuff function requires going over items at least twice. It could be, for example, an iterable of objects in a game or physics simulation; objects that can potentially move really fast and thus require some more sophisticated collision detection (based on e.g. intersection of segments):

  1. new_positions = calculate_displacement(items, delta_time)
  2. collision_pairs, rest = detect_collisions(items, new_positions)
  3. collided = compute_collision_response(collision_pairs)
  4. for item in itertools.chain(collided, rest):
  5.     item.draw()

Even assuming the incredible coincidence of getting all the required math right (;-)), we wouldn’t see any action whatsoever if items is a generator. The reason for that is simple: when calculate_displacement goes through items once (vigorously applying the Eulerian integration, most likely), it fully expends the generator. For any subsequent traversal (like the one in detect_collitions), the generator will appear empty. In this particular snippet, it will most likely result in blank screen, which hopefully is enough of a hint to figure out what’s going on :P

Generators are not lists

An overarching conclusion of the above-mentioned pitfalls is rather evident and seemingly contrary to statement from the beginning. Indeed, generators may not be a drop-in replacement for lists (or other collections) if we are very much relying on their “listy” behavior. And unless memory overhead proves troublesome, it’s also not worth it to inject them into older code that already uses lists.

For new code, however, sticking with generators right off the bat has numerous benefits which I mentioned at the start. What it requires, though, is evicting some habits that might have emerged after we spent some time manipulating lists. I think I managed to pinpoint the most common ways in which those habits result in incorrect code. Incidentally, they all origin from accidental, unfounded expectations towards iterables in general. That’s no coincidence: generators simply happen to be the “purest” of iterables, supporting only the bare minimum of required operations.

Tags: , , , , ,
Author: Xion, posted under Computer Science & IT » Comments Off on Generator Pitfalls

© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with