Posts tagged ‘generators’

Simulating nonlocal Keyword in Python 2.x

2013-08-04 15:30

The overall direction Python 3 is heading in might be a bit worrying, but it’s undeniable that the 3.0 line has some really nice features and quality-of-life improvements. What’s not to love about Unicode string literals enabled by default? Or the print function? What about range, map and filter all returning lazy iterables instead of lists? Neato!

There are also a few lesser-known additions. One is the new nonlocal keyword. It shares its syntax with the global keyword, which would make it instantly fishy by that connotation alone. However, nonlocal looks genuinely useful: it allows you to modify variables captured inside a function’s closure:

    def count_parity(numbers):
        even_count = odd_count = 0

        def examine(number):
            if number % 2 == 0:
                nonlocal even_count
                even_count += 1
            else:
                nonlocal odd_count
                odd_count += 1

        list(map(examine, numbers))
        return even_count, odd_count

It’s something you would do quite a lot in certain other languages (*cough* JavaScript), so it’s good that Python has got around to supporting the notion as well. The syntax here is a bit clunky, true, but that’s simply a trade-off stemming directly from the lack of variable declarations in Python.

What about Python 2.x, though – are we out of luck? Well, not completely. There are a few ways to emulate nonlocal through other Pythonic capabilities, sometimes even to better effect than the nonlocal keyword would yield.

Use a generator instead

Speaking of yielding… As you have probably noticed right away, the example above is quite overblown and just plain silly. You don’t need to go functional just to count some values – you would use a loop instead:

    def count_parity(numbers):
        even_count = odd_count = 0
        for number in numbers:
            if number % 2 == 0:
                even_count += 1
            else:
                odd_count += 1
        return even_count, odd_count

Really, the previous version is just a mindless application of the classic Visitor pattern, which is another reason why you shouldn’t do that: pattern overuse is bad. That said, Visitor obviously has its place: it’s irreplaceable when traversing more complicated structures in more bureaucratic languages. A simple list of numbers in Python is the direct opposite on both counts.

Complex data structures exist in any language, however. How would we run some Python code for every node in a tree, or maybe a graph? Unrolling the DFS or BFS or whatever traversal algorithm we use certainly doesn’t sound like an elegant and reusable approach.
But even then, there is still no need for functions and closures. We can easily get away with a simple for loop, if we just find a suitable iterable to loop over:

    def bst_count_parity(tree):
        """Count the number of even and odd numbers in a binary search tree."""
        even_count = odd_count = 0
        for node in bst_nodes(tree):
            if node.value % 2 == 0:
                even_count += 1
            else:
                odd_count += 1
        return even_count, odd_count

The bst_nodes function above is not black magic by any stretch. It’s just a simple example of a generator function, taking advantage of the powerful yield statement:

    from collections import deque

    def bst_nodes(tree):
        """Yield the nodes of a binary tree in breadth-first order."""
        queue = deque([tree])
        while queue:
            node = queue.popleft()
            yield node
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)

This works because the bst_count_parity and bst_nodes functions are executed in an interleaved fashion, effectively “simultaneously”. That has the same practical effect as calling a visitor function to process a node, only the “function” is concealed as the body of the for loop.
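
To see this interleaving directly, here is a toy generator with tracing prints (not part of the original example, just an illustration):

    def gen():
        print("computing first value")
        yield 1
        print("computing second value")
        yield 2

    for x in gen():
        print(x)

    # The output interleaves both "functions":
    #   computing first value
    #   1
    #   computing second value
    #   2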

Language geeks (and Lisp fans) would say that we’ve exchanged a closure for a continuation. There is probably a monad in here somewhere, too.

Create a reference where there is none

Generators can of course solve a lot of problems that we may want to address with nonlocal, but it’s true that you cannot write them all off just by clever use of the yield statement. For those rare occasions – when you really, positively, truly need a mutable closure – there are still some options on the board.

The crucial observation is that while a closure in Python 2.x is indeed immutable – you cannot rebind its variables or add new ones to it – the objects inside need not be. If you are normally able to change their state, you can do so through captured variables as well. After all, you are still just “reading” those variables; they do not change, even if the objects they point to do.

Hence the solution (or workaround, more accurately) is simple. You need to wrap your value inside a mutable object, and access it – both outside and inside the inner function – through that object only. There are a few choices of suitable objects to use here, with lists and dictionaries being the simplest built-in options:

    def incr(redis, key):
        """Increment the value of a Redis key, as if Redis didn't have the INCR command.

        :return: New value for the key
        """
        res = [None]

        def txn(pipe):
            res[0] = int(pipe.get(key)) + 1
            pipe.multi()
            pipe.set(key, res[0])

        redis.transaction(txn, key)
        return res[0]

If you become fond of this technique, you may want to be more explicit and roll your own wrapper. It might be something like a Var class with get and set methods, or just a value attribute.
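
A minimal sketch of such a wrapper – the Var name and its value attribute are just one possible design – could look like this:

    class Var(object):
        """A mutable cell that inner functions can capture and modify."""
        def __init__(self, value=None):
            self.value = value

    def incr(redis, key):
        res = Var()

        def txn(pipe):
            res.value = int(pipe.get(key)) + 1  # mutate the cell, not the variable
            pipe.multi()
            pipe.set(key, res.value)

        redis.transaction(txn, key)
        return res.value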

Classy solution

Finally, there is a variant of the above approach that involves a class rather than a function. It is strangely similar to “functor” objects from old C++, back when it didn’t support proper lambdas and closures. Here it is:

    def incr(redis, key):
        """Increment the value of a Redis key, as if Redis didn't have the INCR command.

        :return: New value for the key
        """
        class IncrTransaction(object):
            def __call__(self, pipe):
                self.result = int(pipe.get(key)) + 1
                pipe.multi()
                pipe.set(key, self.result)

        txn = IncrTransaction()
        redis.transaction(txn, key)
        return txn.result

Its main advantage (besides making it a bit clearer what’s going on) is the potential for extracting the class outside of the function – and thus reusing it. In the above example, you would just need to add an __init__(self, key) method to make the class independent of the enclosing function.
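
For illustration, the extracted, reusable variant might look more or less like this:

    class IncrTransaction(object):
        """Reusable transaction callable; the key is now passed in explicitly."""
        def __init__(self, key):
            self.key = key

        def __call__(self, pipe):
            self.result = int(pipe.get(self.key)) + 1
            pipe.multi()
            pipe.set(self.key, self.result)

    def incr(redis, key):
        txn = IncrTransaction(key)
        redis.transaction(txn, key)
        return txn.result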

Ironically, though, that would also defeat the whole point: you don’t need a mutable closure if you don’t need a closure at all. Problem solved? ;-)


Generator Pitfalls

2012-02-08 19:34

I’m a big fan of Python generators. With seemingly little effort, they often allow you to greatly reduce memory overhead, by completely eliminating the intermediate results of list manipulations. Along with the functions defined in the itertools package, generators also bring basic lazy computation capabilities into Python.
This combination of higher-order functions and generators is truly remarkable. Why? Because compared to ordinary loops, it gains us both speed and readability at the same time. That’s quite an achievement; typically, optimizations sacrifice one for the other.
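
As a trivial illustration (numbers here is just some arbitrary sequence), compare summing the squares of even numbers with a loop and with a generator expression:

    # Loop version: explicit mutable state and bookkeeping.
    total = 0
    for x in numbers:
        if x % 2 == 0:
            total += x * x

    # Generator version: the same computation, done lazily, with no intermediate list.
    total = sum(x * x for x in numbers if x % 2 == 0)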

All this goodness, however, comes with a few strings attached. There are some ways in which we can be bitten by improperly used generators, and I think it’s helpful to know about them. So today, I’ll try to outline some of those traps.

Generators are lazy

You don’t say, eh? That’s one of the main reasons we are so keen on using them, after all! But what I’m implying is the following: if generator-based code is to actually do anything, there must be a point where all this laziness “bottoms out”. Otherwise, you are just building expression thunks and not really evaluating anything.

How might such circumstances occur, though? One case involves using generators for their side effects – a rather questionable practice, mind you:

    import itertools

    def process_kvp(key, value=None):
        print key,
        if value: print "=", value

    itertools.starmap(process_kvp, some_dict.iteritems())

The starmap call does not print anything here, because it just constructs a generator. Only when this generator is iterated over would the dictionary elements be fed to the process_kvp function. Such iteration would of course require a loop (or the consume recipe from the itertools docs), so we might as well do away with the generator altogether and just stick with a plain old for:

    for kv in some_dict.iteritems():
        process_kvp(*kv)

In real code the judgement isn’t so obvious, of course. We’ll most likely receive the generator from some external function – one that probably documents its result only as an iterable. As usual in such cases, we must not assume anything beyond what is explicitly stated. Specifically, we cannot expect that we have any kind of existing, tangible collection with actual elements. An iterable can very well be just a generator whose items are yet to be evaluated. This fact can impact performance by moving some potentially heavy processing into unexpected places, resulting in e.g. executing database queries while rendering a website’s template.
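
A sketch of how this can happen – expensive_lookup, ids and render are hypothetical names here:

    def get_items():
        # Nothing is evaluated yet; this merely builds a generator.
        return (expensive_lookup(x) for x in ids)

    items = get_items()
    # ...much later, possibly deep inside a template renderer:
    for item in items:
        render(item)  # the expensive_lookup() calls actually execute here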

Generators are truthy

The above point was concerned mostly with performance, but what about correctness? You may think it’s not hard to conform to the iterable contract, which should in turn guarantee that iterables don’t backfire on us with unexpected behavior. Yet how many times did you find yourself reading (or writing!) code such as this:

    def do_stuff():
        items = get_stuff_to_do()  # returns iterable
        if not items:
            logging.debug("No stuff to do!")
            return 0
        # ...do something with 'items'...

It’s small and may look harmless, but it still manages to make an ungrounded (and thus potentially false) assumption about an iterable. The problem lies in the if condition, meant to check whether we have any items to do stuff with. It would work correctly if items were a list, dict, deque or any other actual collection. But for generators, it will not work that way – even though they are still undoubtedly iterable!

Why? Because generators are not collections; they are just suppliers. We can only ask them for their next value, but we cannot peek inside to see whether that value is really there. As such, generators are not susceptible to boolean coercion in the same way that collections are; it’s not possible to check their “emptiness”. They behave like regular Python objects and are always truthy, even if it’s “obvious” they won’t yield any value when asked:

    >>> g = (x for x in [])
    >>> if g: print "Truthy!"
    Truthy!

Going back to the previous sample, we can see that the if block won’t be executed when get_stuff_to_do returns an “empty” generator. The consequences of this may vary from barely noticeable to disastrous, depending on what the rest of the do_stuff function looks like. Nevertheless, that code will run with one of its invariants violated: fertile ground for all kinds of unintended effects.
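
If we really need an emptiness check but cannot rule out receiving a generator, one possible workaround (just a sketch, not the only option) is to peek at the first element and chain it back on:

    import itertools

    def do_stuff():
        items = iter(get_stuff_to_do())
        try:
            first = next(items)
        except StopIteration:
            logging.debug("No stuff to do!")
            return 0
        # Put the peeked element back in front of the remaining ones.
        items = itertools.chain([first], items)
        # ...do something with 'items'...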

Generators are once-off

An intuitive, informal understanding of the term ‘iterable’ is likely to include one more unjustified assumption: that it can be iterated over and over, i.e. multiple times. Again, this is very much true if we’re dealing with a collection, but generators simply don’t carry enough information to repeat the sequence they yield. In other words, they cannot be rewound: once we go through a generator, it’s stuck in its final state, not usable for anything more.

Just like with the previous caveats, failure to account for this can very well go unnoticed – at least until we spot the weird behavior it results in. Continuing our superficial example from the preceding paragraph, let’s pretend the rest of the do_stuff function requires going over items at least twice. It could be, for example, an iterable of objects in a game or physics simulation; objects that can potentially move really fast and thus require more sophisticated collision detection (based on e.g. intersection of segments):

    new_positions = calculate_displacement(items, delta_time)
    collision_pairs, rest = detect_collisions(items, new_positions)
    collided = compute_collision_response(collision_pairs)
    for item in itertools.chain(collided, rest):
        item.draw()

Even assuming the incredible coincidence of getting all the required math right (;-)), we wouldn’t see any action whatsoever if items is a generator. The reason is simple: when calculate_displacement goes through items once (vigorously applying Eulerian integration, most likely), it fully expends the generator. For any subsequent traversal (like the one in detect_collisions), the generator will appear empty. In this particular snippet, the most likely result is a blank screen, which hopefully is enough of a hint to figure out what’s going on :P
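
When multiple passes are unavoidable, the simplest remedy is to materialize the iterable up front, trading back some of the memory savings:

    items = list(items)  # force evaluation once; safe to traverse repeatedly now
    new_positions = calculate_displacement(items, delta_time)
    collision_pairs, rest = detect_collisions(items, new_positions)

(itertools.tee is an alternative, but when each pass consumes the whole sequence it ends up buffering just as much internally.)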

Generators are not lists

An overarching conclusion of the above pitfalls is rather evident, and seemingly contrary to the statement from the beginning. Indeed, generators are not a drop-in replacement for lists (or other collections) when we rely heavily on their “listy” behavior. And unless memory overhead proves troublesome, it’s also not worth it to inject them into older code that already uses lists.

For new code, however, sticking with generators right off the bat has the numerous benefits I mentioned at the start. What it requires, though, is shedding some habits that may have formed over time spent manipulating lists. I think I managed to pinpoint the most common ways in which those habits result in incorrect code. Incidentally, they all originate from accidental, unfounded expectations towards iterables in general. That’s no coincidence: generators simply happen to be the “purest” of iterables, supporting only the bare minimum of required operations.


Modern Enumeration

2009-11-04 1:33

Probably every language that has containers (like vectors or lists) also has the concept of iterators: objects which let you traverse a collection and which are a generalization of pointers. In the simplest variant they only allow fetching the current element and advancing to the next one, but that is entirely sufficient for enumeration purposes.
On the surface, then, they look quite simple and clear – especially if the language offers a foreach-type loop that wraps them nicely and transparently. That’s why it may seem strange that this mechanism is usually employed only for containers; in theory, iterators (known in some places as enumerators) could be used to traverse literally anything.
Take, for instance, searching for files on disk – plenty of programs at some point of their execution have to find files, e.g. ones with a given name in a certain directory. Usually we then roll up our sleeves and write an appropriate recursive procedure, or fiddle with a stack or a queue. But wouldn’t it be nicer if we could do it simply like this:

    for (FsIterator it = ListFiles("c:\\", "*.exe"); it; ++it) { /* do something */ }

while of course searching through subdirectories as well, without explicitly “entering” them?… In my opinion that would be very cool :)

Let me note right away that, contrary to appearances, such an iterator is entirely possible to write. The problem, however, is how to store its state. When the search or traversal is implemented directly as a single function, this practically takes care of itself: in the form of local variables (stack/queue) or parameters (recursion). Nobody pays much attention to that fact. But the moment we try to “extract” a Next operation out of the algorithm (in order to build an iterator), it suddenly turns out that this requires explicitly remembering all the data needed to compute the next element. For directory searching we would have to remember, for example, some system search handle for the current directory, the nesting level, and analogous handles… for every such level!
Quite a hassle, right? :) No wonder that treating searches “per iterator” is not a popular practice. From the point of view of whoever writes the enumeration algorithm, it is incomparably easier to just demand a callback and invoke it for every element; let the client programmer worry about fitting that callback into their own code. And the fact that said code would be much more readable with iterators in play? Oh well – iterators just aren’t that easy to write…

…unless we’re programming in Python. There, “iterators” (called generators) can be written in a completely unique, easy way. Let’s take, for example, the following binary search tree (BST) class:

    class Tree:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

        def __insert(self, k):  # insert a single element
            if k < self.key:
                if self.left is not None: self.left.__insert(k)
                else:                     self.left = Tree(k)
            else:
                if self.right is not None: self.right.__insert(k)
                else:                      self.right = Tree(k)

        def insert(self, *keys):  # insert several elements at once
            for k in keys: self.__insert(k)

To make it traversable with an ordinary for loop in inorder order (which gives the keys in sorted order), we write an appropriate generator for it:

        def inorder(self):
            if self.left is not None:
                for t in self.left.inorder(): yield t
            yield self.key
            if self.right is not None:
                for t in self.right.inorder(): yield t

And that’s it – this is enough for the loop below:

    for i in tree.inorder():
        print i  # or whatever processing we need

to actually “walk” the edges of the tree “in real time” – without any buffering of the elements in a list.

The secret here lies in the yield statement: it acts as a “temporary return” of an element, which is then processed by the body of the for loop. When the next element is needed, the inorder function simply resumes execution from the following statement – and so on, until the next yield, the next cycle of the loop, and the function’s further work. yield thus works like a callback, only in both directions. Quite clever, isn’t it?
One is tempted to ask whether something similar couldn’t be done in other languages, especially compiled ones. In theory the answer is yes: with clever tricks on the function call stack (a technique known as stack winding) one can achieve the effect of “suspending” a function after it returns a result, with the ability to resume it from the next instruction onward. I’m not convinced, however, how such a feature would cooperate with other elements of modern programming languages, such as exceptions. It’s hard to say whether it’s even possible.

But since Python can do it, maybe C++2x will already have it? ;-)
