Archive for Programming

Min-Maxing Readability

2013-12-01 16:44

Let me introduce you to the following two important functions:

min(a_1, \dots, a_n) = \begin{cases}     a_1 &\mbox{if } n = 1 \mbox{ or } a_1 \le min(a_2, \dots, a_n) \\     min(a_2, \dots, a_n) &\mbox{otherwise} \end{cases}

max(a_1, \dots, a_n) = \begin{cases}     a_1 &\mbox{if } n = 1 \mbox{ or } a_1 \ge max(a_2, \dots, a_n) \\     max(a_2, \dots, a_n) &\mbox{otherwise} \end{cases}

I don’t believe, of course, that this is the first time you may have encountered them. Nor that they are even half as complicated as the definitions above would suggest.

But although they appear rather awkward, what I wanted for those formulations to highlight is one particular way of interpreting the min and max functions as they are used in code. You may think it’s easy enough to read them quite literally (“get me the smallest/largest value”), and in many cases you are absolutely correct:

  1. scores = {
  2.     'Alice': 10,
  3.     'Bob': 12,
  4.     'Charlie': 11,
  5. }
  6. winning_score = max(scores.values())

Often though, we don’t want to get the extreme value from a set, list, vector or other collection of unspecified size. Instead, we call the min/max functions on a few arguments that are known and “hard-coded” upfront. In many cases, this is mostly done to spare us from introducing a verbose if statement, or a more cryptic ternary operator (?:).

Or is it? I was recently surprised when discussing some very simple programming exercise on one of the IRC channels I frequent. Someone pointed out how my proposed solution is quite unusual with its (over)use of the max function. The task goes somewhere along these lines:

You are presented with a device that has n counters (c0, …, cn-1) and n+1 buttons (b0, …, bn). Every counter ci is incremented whenever a corresponding button bi is pressed. Additionally, pressing the extra button bn sets all the counters to the maximum value displayed on any of them (e.g. [1, 4, 3][4, 4, 4]).
Find a way to compute the final state of all counters given a sequence of k buttons presses.

Just by reading the above description and implementing it in the most straightforward way, we can trivially arrive at an algorithm which solves the problem in O(nk) time. And since we need to go through the sequence of button presses at least once, the lower bound for complexity of any other solution is therefore O(k).

My version was a simple improvement over the obvious one, scoring O(n+k) at the cost of maintaining a negligible amount of extra data:

  1. min_state = max_state = 0

The noteworthy application of max function was responsible for updating one of those values:

  1. max_state = max(max_state, button_states[press])

If you look closely, you will notice how the first argument serves as a reference point, while only the second one is the actual ‘input’. What this invocation is saying is essentially “make max_state at least as big as it was before – and possibly bigger”.

Hardly a groundbreaking insight, eh? This approach, however, allows to rapidly parse even complicated applications of min and max. If you encounter, for example, an inlined version of a “clamping” function that is written in the following manner:

  1. finalValue = Math.max(Math.min(someValue, b), a);

I’m pretty sure it will take you at least a few moments to grok what it’s doing. The maxmin compound is puzzling, and the meaningful arguments (a and b) are disconnected from the function names. Had it had more nesting, you might even have needed to *gasp* count the parentheses.
But we know it’s all about trimming a number to the <a; b> interval. Why not express it in a way that readily highlights this fact?

  1. finalValue = Math.max(a, Math.min(b, someValue));

max(a, ...) means “I want the result to be at least a“.
min(b, ...) means “I want the result to be at most b“.

Making both thresholds into first arguments to their respective functions is just one of those nice small things that very subtly, almost invisibly, will make your code easier to work with.

Tags: , , ,
Author: Xion, posted under Programming » 5 comments

Making App Engine More Pythonic

2013-10-18 22:27

Lately, I was re-evaluating Google App Engine – the cloud computing platform – to see how feasible it would be for one pet project I’ve had in mind. It was pleasantly surprising overall, as the platform improved quite a lot while I wasn’t looking, since about a year and a half ago. Mostly interested in the Python part, I noticed that version 2.7 is now standard, lots of libraries are available out of the box, and it’s possible to use to pretty much any web framework you’d like to, such as Flask or Django.

Still, there are some quirks. App Engine SDK, for example, is a self-contained bundle with a bunch of Python packages that make it possible to run the development server with your app on your local machine. You don’t really “install” it into your Python interpreter, though.
Same goes for any additional, third party libraries your app may need. They must all be deployed along with it, as there is no setup.py or requirements.txt to specify your dependencies in. If you’re used to how e.g. Heroku handles dependencies, GAE’s way will undoubtedly be quite a letdown.

Good news are: you can still make it work sanely. By that I mean using virtualenv for development rather than your global, system-level interpreter, and keeping the code of any third party libraries out of your project’s repository. You may not get quite the same experience of pip install and pip freeze > requirements.txt but well… it’s close enough :)

Dependencies as Git submodules

So you have an application that requires some external libraries. Few of them are provided by App Engine itself, and you will be able to access them after you specify your requirement in app.yaml. Many times, however, you will want to tap into broader open source ecosystem, just like you’d like with any other Python app.

There is a way, fortunately, to include external libraries to go with your application without them cluttering your repository. Since the de facto standard for publishing code on the ‘net is to push it to a public Git repository, we can use Git submodules to “symlink” to those repositories. Our own Git repo won’t store any of their actual content, but only a list of URLs; the .gitmodules file.

If you held your breath at the mere mention of Git submodules, don’t panic. They get a lot of flak, that’s true, and many of their claimed shortcomings are quite genuine. All of them apply to the scenario where a main repo uses submodules to reuse shared subproject that is modified in conjunction with the main one.
As you have probably noticed, this is totally different than the setting we’re discussing here. When including an external dependency, the fact that Git submodule points to specific commit in the other repo is a feature, not bug. It’s the exact same reason why we should always put version numbers in requirements.txt: upgrading a third party library must never be accidental, or you risk breaking your code through unexpected API or behavior changes.

So, how to do it – use Git submodules, that is? You substitute pip install with git submodule add:

  1. $ git submodule add git://github.com/mitsuhiko/flask lib/flask

This will establish reference between the repo under given URL and a directory path inside your project, fetching the repo’s content in the process. But as you will quickly notice in $ git status, that content won’t become part of the working directory.
After all this talk about being explicit with your libraries’ version, you probably also want to checkout a correct release:

  1. $ cd lib/flask
  2. $ git checkout 0.10.1

Otherwise, you will work off whatever the current HEAD happened to be, exactly how pip install flask would give you whatever is the newest release in PyPI.

Working alone from a single machine, this would set you up for the time being. For starting somewhere else, though, you need equivalent of pip install -r requirements.txt, i.e. a way to fetch all your libraries at once. Here’s where git submodule update comes handy:

  1. $ git submodule update --init

It will both setup your freshly cloned repo to use submodules specified in .gitmodules files, as well as pull the submodules’ content.

There’s much more to Git submodules, of course, so if you want to gain much more thorough insight into them than this short overview, I recommend having a look at the Git book. And as with most things, $ man git submodule is always helpful.

virtualenv for it all

With dependencies seemingly in place, you might be quite disappointed trying to, you know, use them:

  1. $ workon myawesomeproject
  2. (myawesomeproject)$ python
  3. Python 2.7.4 (default, Sep 26 2013, 03:20:26)
  4. [GCC 4.7.3] on linux2
  5. Type "help", "copyright", "credits" or "license" for more information.
  6. >>> import flask
  7. Traceback (most recent call last):
  8.   File "<stdin>", line 1, in <module>
  9. ImportError: No module named flask

The reason for that is simple, though: the libraries are physically there on your disk, but they are not in your virtualenv’s $PYTHONPATH, so Python has no idea where to import them from. There are ways to solve this problem that I could ramble for a while about, but I will just go ahead and demonstrate a ready-made shell script which handles it all :)
You might need to tweak it, e.g. if your GAE SDK installation path is different than /opt/google_appengine, but otherwise it should be pretty straightforward. One caveat, though: the script should be re-run after adding a brand new library, as described in previous section:

  1. $ git submodule add ...
  2. $ ./setup_virtualenv.sh

As an added bonus, you will get dev_appserver and appcfg binaries inside your virtualenv’s ./bin, so you may remove App Engine’s SDK directory from your regular $PATH.

Deployment shenanigans

Setup of a local development environment generally ends here – you should be now ready to run your app through dev_appserver. What’s still missing is making your bundled libraries work with remote Python on actual App Engine instance. Sadly, there is no virtualenv in the cloud.

Instead, we need to revert to the glorified sys.path hacks. Before importing anything, we extend the actual PYTHONPATH so that it covers our third party libraries. If their directory layout is just like shown in the first section (lib/ root with subdirs for different libraries), the following shim will suffice to correctly bootstrap the import mechanics:

  1. # run.py
  2. import os
  3. import sys
  4.  
  5.  
  6. lib_dir = os.path.join(os.path.abspath('.'), 'lib')
  7. sys.path[1:1] = [os.path.join(lib_dir, name)
  8.                  for name in os.listdir(lib_dir)]
  9.  
  10. app = __import__('myapppackage').app

Place this in the root of your project’s source tree (outside the main Python package) and point the app.yaml to it:

  1. handlers:
  2. # ... other handlers
  3. - url: .*
  4.   script: run.app

With this, you may now deploy your app and see whether it works correctly. If you encounter problems, I recommend taking a look at Flask on App Engine Project Template. Even if you intend to use different web framework, the example code should be largely applicable.

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Making App Engine More Pythonic

Promise Objects in JavaScript

2013-10-07 21:48

JavaScript’s default mode of operation is to rely heavily on callbacks: functions invoked when a longer operation (such as network I/O) finishes and delivers result. This makes it asynchronous, which is a notably different programming style than using blocking operations contained within threads (or their equivalents, like goroutines in Go).

Callbacks have numerous problems, though, out of which the most severe one is probably the phenomenon of “marching to the right”:

  1. doSomething(withTheseArgs, function(result) {
  2.     theDoSomethingElse(function() {
  3.         nowDoThis(toThat, function(result) {
  4.             andThen(function() {
  5.                 stop(hammertime, function(result) {
  6.                     // ...and so on...
  7.                 });
  8.             });
  9.         });
  10.     });
  11. });

When using the (still) common style of providing a callback as the last argument to a function initiating an asynchronous operation, you get this annoying result of ever-increasing indentation as you chain those operations together. It feels like the language itself is telling you that it was not designed for such a complex stuff… Coincidence? ;)

But it gets worse. Operations may fail somewhere along the way, which is something you’d probably like to know about. Depending on the conventions your project, framework or environment uses, this could mean additional boilerplate inside the callbacks to distinguish success from error cases. This is typical in Node.js, where first argument of callback represents the error, if any:

  1. fs.readFile('/etc/passwd', 'utf8', function (err,data) {
  2.   if (err) {
  3.     return console.log("Sorry, no hacking for you (" + err.message + ")");
  4.   }
  5.   console.log(data);
  6. });

Alternatively, you may be asked to provide the error handler separately; an “errback”, as it’s sometimes called. Splitting the code into small parts is great and everything, but here it means you’ll have two functions as arguments:

  1. doSomething(withThis, function(result) {
  2.     // ...
  3. }, function(error) {
  4.     // ...
  5. });

Giving them names and extracting somewhere outside may help readability a little, but will also prevent you from taking advantage of one of the JavaScript’s biggest benefits: superior support for anonymous functions and closures.

Tags: , , , ,
Author: Xion, posted under Programming » 3 comments

Literal __imports__

2013-09-28 22:56

As part of a language, Python obviously has import statements. They allow us to divide the code into different modules and packages:

  1. import collections
  2. import os
  3.  
  4. import thirdpartylib
  5. import anotherlib
  6.  
  7. from myapp.util import do_stuff

What is lesser known fact is that it also has an __import__ function. This function retains all functionality of the import statement, but has some additional features and slightly different use cases. With it, for example, you can import a module whose name you only know at “runtime”:

  1. def import_plugin(name):
  2.     plugins_package = __import__('myapp.plugins', fromlist=[name])
  3.     return getattr(plugins_package, name)

This comes handy in various types of general dispatchers, factory functions, plugin systems, and so forth. Returned from __import__ function is always a module object (even in cases when fromlist argument is used), so often a getattr is needed to extract a specific symbol from it.

Quite surprisingly, I have discovered that __import__ function may very well be useful also when you do know the desired module name. Reason is that import statement is sometimes unwieldy. It has similar problem as global variables (i.e. global statements) and inner function definitions (as opposed to lambdas): it makes the code stretch unnecessarily in the vertical dimension.

Just for one thing

This can be considered a waste if you only need to access one specific thing from one specific module. Using __import__ function, you can golf the import and the usage into a single statement. Here’s an example, coming straight from my own project recursely:

  1. __import__('recursely').install()

Incidentally, the other uses of literal __import__ described can be conveniently replaced thanks to that small library :)

Will it lint?

Another issue with import statement is that it introduces symbols into the global (or local) namespace. Most of the time, this is precisely what we want. Occasionally, though, a sole fact of loading the module is enough.

A canonical example of the latter case is web application with request handlers scattered between different Python files, or even packages. All those files have to be imported if the handlers are to be added to framework’s routing table; but beyond that, we have no business with them.

As a result, the import statement(s):

  1. import handlers

introduces an unused symbol – here, it is handlers. Many linting tools will be eager to point this fact out, which is not really that helpful. There is sometimes an option to disable the warning on per line basis, but some checkers (e.g. pep8.py) don’t offer this functionality.
Universal solution? Use __import__ function, of course:

  1. __import__('handlers')

The module is still loaded just fine, but since we’re ignoring the return value, no stray variables are created. As a added bonus, the __import__ call also looks very different, signifying its special purpose.

Staying in place

Actually, there is one more benefit of this trick, also coming from fooling-the-tools department. Many Python IDEs, like Eclipse/Pydev, are able to automatically insert necessary imports and organize them in groups, effectively providing a neat, Java-like experience. What is not so neat is that they often insist on putting every import statement somewhere near the beginning of the file. preceding any other definition, variable, class or function.

In a scenario described above, this behavior may actually cause problems. When the handlers’ module gets imported, it may need to refer back to the application object; this is exactly the case in the Flask framework, for example. If that object happens to be defined in the module importing handlers, we’ll have a circular import error because the application object has not yet been defined. It would have been defined, however, if the statement:

  1. import handlers

hasn’t been touched by the IDE when it wanted to be helpful and organize our imports. All imports, as it turns out.

Fortunately, mechanisms like that tend to be easy to fool. Per answers to StackOverflow question I’ve once asked, it is a matter breaking the textual pattern that the algorithm searches our code for. As you may have guessed by now, one of the ways of achieving that goal is to shed the import statement in favor of __import__ function.

Tags: , ,
Author: Xion, posted under Programming » Comments Off on Literal __imports__

Inheritance as an API

2013-09-15 22:46

Saying “API” nowadays is thought first and foremost to refer to a collection of HTTP request handlers that expose data from some Web service in a machine-readable format (usually JSON). This is not the meaning of API I have in mind right here, though. The classic one talks about any conglomerate of programming language constructs – functions, classes, packages – that are available for us to use.

You can interact with an API in numerous different ways, but one of them is somewhat less common. Occasionally, besides having functions to call and objects to create, you are also presented with base classes to inherit. This is symptomatic to more complex libraries, written mostly in statically typed languages. Among them, the one that takes the most advantage of this technique is probably Java.

Note that what I’m talking about here is substantially different from implementing an interface. That is a relatively common occurrence, required when working with listeners and callbacks:

  1. private OnClickListener mHelloButtonClickListener = new OnClickListener() {
  2.     @Override
  3.     public void onClick(View v) {
  4.         Toast.makeText(getContext(), "Hello world!", Toast.LENGTH_SHORT).show();
  5.     }
  6. };

as well as in many other situations and patterns. There is nothing terribly special about it, given that even non-object oriented languages have equivalent mechanisms of accomplishing the same objective: separating the “how” from “what” in code, possibly to exchange or expand the latter.

Inheriting from a base class is something else entirely. The often criticized, ideologically skewed interpretation of OOP would claim that inheritance is meant to introduce more specialized kinds of existing types of objects. In practice, that’s almost completely missing the real point.

What’s important in creating a derived class is that:

  • like with interface, it needs to override certain methods and fill them with code
  • but also, it may do so using a unique, additional API

Combined, these two qualities allow to introduce much more sophisticated ways of communication between the API and its client code. It’s a powerful tool and should be used sparingly, but certain types of libraries and (especially) frameworks can benefit greatly from it.

As an example, look at the Guice framework for dependency injection. If you use it in your project, you need to configure it by specifying bindings: mapping between the interfaces used by your code and their implementations. This is necessary for the framework can wire them together, possibly in more then one way (different in production code and in test code, for example).
Bindings are quite sophisticated constructs. For non-trivial applications – and these are the ones you generally want to apply dependency injection to – they cannot really be pinned down to a single function call. Other, similar frameworks would therefore use completely external configuration files (usually in XML), which has a lot of downsides.

Guice, however, has a sufficiently smart API that allows to realize all configuration in actual, compilable code. To achieve this, it uses inheritance, exactly as described above. Here’s a short sample:

  1. public class BillingModule extends AbstractModule {
  2.     @Override
  3.     protected void configure() {
  4.       bind(TransactionLog.class).to(DatabaseTransactionLog.class);
  5.       bind(CreditCardProcessor.class).to(PaypalCreditCardProcessor.class);
  6.     }
  7. }

Overridden methods in the above class (well, one method) serve as “sections” in the “configuration file”. Inherited methods, on the other hand, provide an internal, limited namespace that can be used to compose configuration “entries”. Since everything is real code, we can have the compiler check everything for some basic sanity as well.

Tags: , , ,
Author: Xion, posted under Programming » 4 comments

Lists vs Tuples

2013-09-04 18:42

Among the many things that Python does right, there are also a few that could have been better thought out. No language is perfect, of course, but since Python is increasingly adopted as the language of choice for complete beginners in programming, the net effect will be many new coders being heavily influenced by ideas they encounter here.

And unfortunately, some of those ideas are dead wrong. A chief example is the relation between lists and tuples, specifically the mistaken notion of their mutual similarity. In reality – that is, in every other language – they are completely different concepts, and rightfully so.

What is list?…

The term ‘list’ in programming is sometimes meant to refer to ‘linked list’ – one of the simple data structures you’d typically encounter in the introductory course to algorithms. It consists of a sequence of separately allocated elements, stitched together using pointers. Depending on the density and direction of those pointers, the type is further subdivided into singly and doubly linked lists, cyclic lists, and so on.

But the broader meaning, which is also more common now, describes a general collection of elements that can be linearly traversed, regardless of the details of its underlying representation. The List interface in Java or .NET is intended to capture the idea, and it does so pretty accurately. Concrete implementations (like ArrayList or LinkedList) can still be decided between if those details turn out to be important, but most of the code can deal with lists in quite abstract, storage-agnostic manner.

…in Python

In Python, lists generally conform to the above description. Here, you cannot really decide how they are stored under the hood (array module notwithstanding), but the interface is what you would expect:

  1. files = get_file_list('/home/xion')
  2. if files[0] == '.':
  3.     files.pop(0)
  4. if files[0] == '..':
  5.     files.pop(0)
  6. print "Found ", len(files), " files."
  7. for file in files:
  8.     print "- ", file

You can insert, append and remove elements, as well as iterate over the list and access elements by indexes. There is also a handy bracket notation for list literals in the code:

  1. a_list = [1, 2, 3]

About the only weird thing is the fact that you can freely put objects of different types into the same list:

  1. diverse_list = [42, "foo", False]

In a way, though, this appears to be in sync with the general free-form nature of Python. For analogy, think about List<Object> in Java that can do exactly the same (or, in fact, any List prior to introduction of generics).

On the other hand, this is a tuple:

  1. a_tuple = (1, 2, 3)

Besides brackets giving way to parentheses, there is no difference in literal notation. Similarly, you can iterate over the tuple just as well as you can over the list:

  1. for x in a_tuple:
  2.    print x

in addition to accessing elements by index. What you cannot do, though, is modifying a tuple once it is created: be it by trying to add more elements, or changing existing ones:

  1. >>> a_tuple[0] = 42
  2. Traceback (most recent call last):
  3.   File "<stdin>", line 1, in <module>
  4. TypeError: 'tuple' object does not support item assignment

But just like with lists, there is no limitation for the types of elements you put therein:

  1. diverse_tuple = (42, "foo", False)

All in all, a tuple in Python behaves just like an immutable (unchangeable) list and can be treated as such for all intents and purposes.

Tuplication

Now, let’s look at tuples in some other programming languages. Those that support them one way or another include C++, D, Haskell, Rust, Scala, and possibly few more exotic ones. There is also a nascent support for tuple-like constructs in Go, but it’s limited to returning multiple results from functions.

In any of these, tuple means several objects grouped together for a particular purpose. Unlike a list, it’s not a collection. That’s because:

  • objects retain their individual identity
  • their number is not arbitrary; in fact, tuples of different count have separate names: pairs, triples, etc.
  • objects have positions assigned: when handling a tuple, it is expected that certain thing X will be at position N

Statically typed languages expand especially on the last point, making it possible to specify what type we expect to see at every position. Together, they constitute the tuple type:

  1. typedef tuple<int, string> numbered_line;
  2.  
  3. // Example usage
  4. vector<numbered_line> read_numbered_lines(string filename, int start=1) {
  5.     vector<numbered_line> result;
  6.     ifstream infile(filename);
  7.     string line;
  8.     for (int i = start; getline(filename, line); ++i) {
  9.         result.push_back(make_tuple(i, line));
  10.     }
  11.     return result;
  12. }

In other words, you don’t operate on “tuples” in general. You use instances of specific tuple types – like a pair of int and string – to bind several distinct values together.
Then, as your code evolves, you may need to have more of them, or to manipulate them in more sophisticated ways. To accommodate, you may either expand the tuple type, or increase readability at the (slight) cost of verbosity and transform it into a structure:

  1. struct numbered_line {
  2.     string filename;
  3.     int number;
  4.     string text;
  5. };

Once here, you can stay with functions operating on those structures, or refactor those into methods, or use some combination of those two approaches.

Missing link

But in Python, that link between tuples and structures is almost completely lost. There is one built-in tuple type, which is simply tuple, but even that isn’t really paid attention to. What happens instead is that tuples are lumped together with lists and other collections into the umbrella term of “iterable”, i.e. something which can iterated over (typically via for loop).

As a result, a tuple can be converted to list and list can be converted to tuple. Both operations make no sense whatsoever. But they are possible, and that leads to tuples and lists being used interchangeably, making it harder to identify crucial pieces of data that drive your logic. And so they will grow uncontrollably, rather than having been harnessed at appropriate time into an object, or a namedtuple at the very least.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Linus Torvalds

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Lists vs Tuples

Simulating nonlocal Keyword in Python 2.x

2013-08-04 15:30

The overall direction where Python 3 is going might be a bit worrying, but it’s undeniable that the 3.0 line has some really nice features and quality-of-life improvements. What’s not to love about Unicode string literals enabled by default? Or the print function? What about range, map and filter all being generators? Neato!

There are also few lesser known points. One is the new nonlocal keyword. It shares the syntax with the global keyword, which would make it instantaneously fishy just by this connotation. However, nonlocal looks genuinely useful: it allows to modify variables captured inside function’s closure:

  1. def count_parity(numbers):
  2.     even_count = odd_count = 0
  3.  
  4.     def examine(number):
  5.         if number % 2 == 0:
  6.             nonlocal even_count
  7.             even_count += 1
  8.         else:
  9.             nonlocal odd_count
  10.             odd_count += 1
  11.  
  12.     list(map(examine, numbers))
  13.     return even_count, odd_count

It’s something you would do quite a lot in certain other languages (*cough* JavaScript), so it’s good that Python has got around to support the notion as well. Syntax here is a bit clunky, true, but that’s simply a trade off, stemming directly from the lack of variable declarations in Python.

What about Python 2.x, though – are we out of luck? Well, not completely. There are a few ways to emulate nonlocal through other pythonic capabilities, sometimes even to better effect than the nonlocal keyword would yield.

Use generator instead

Speaking of yielding… As you have probably noticed right away, the example above is quite overblown and just plain silly. You don’t need to play functional just to count some values – you would use a loop instead:

  1. def count_parity(numbers):
  2.     even_count = odd_count = 0
  3.     for number in numbers:
  4.         if number % 2 == 0:
  5.             even_count += 1
  6.         else:
  7.             odd_count += 1
  8.      return even_count, odd_count

Really, the previous version is just a mindless application of the classic Visitor pattern, which is another reason why you shouldn’t do that: pattern overuse is bad. This saying, Visitor obviously has its place: it’s irreplaceable when traversing more complicated structures in more bureaucratic languages. A simple list of numbers in Python is the direct opposite of both of these characteristics.

Complex data structures exist in any language, however. How would we run some Python code for every node in a tree, or maybe graph? Unrolling the DFS or BFS or whatever traversal algorithm we use certainly doesn’t sound like an elegant and reusable approach.
But even then, there is still no need for functions and closures. We can easily get away with the simple for loop, if we just find a suitable iterable to loop over:

  1. def bst_count_parity(tree):
  2.     """Count the number of even and odd numbers in binary search tree."""
  3.     even_count = odd_count = 0
  4.     for node in bst_nodes(tree):
  5.         if node.value % 2 == 0:
  6.             even_count += 1
  7.         else:
  8.             odd_count += 1
  9.     return even_count, odd_count

The bst_nodes function above is not black magic by any stretch. It’s just a simple example of generator function, taking advantage of the powerful yield statement:

  1. def bst_nodes(tree):
  2.     """Yields nodes of binary tree in breadth-first order."""
  3.     queue = [tree]
  4.     while queue:
  5.         node = queue.popleft()
  6.         yield node
  7.         if node.left:
  8.             queue.append(node.left)
  9.         if node.right:
  10.             queue.append(node.right)

This works because both bst_count_parity and bst_nodes functions are executed “simultaneously”. That has the same practical effect as calling the visitor function to process a node, only the “function” is concealed as the body of for loop.

Language geeks (and Lisp fans) would say that we’ve exchanged a closure for continuation. There is probably a monad here somewhere, too.

Create reference where there is none

Generators can of course solve a lot of problems that we may want to address with nonlocal, but it’s true you cannot write them all off just by clever use of yield statement. For the those rare occasions – when you really, positively, truly need a mutable closure – there are still some options on the board.

The crucial observation is that while the closure in Python 2.x is indeed immutable – you cannot add new variables to it – the objects inside need not be. If you are normally able to change their state, you can do so through captured variables as well. After all, you are still just “reading” those variables; they do not change, even if the objects they point to do.

Hence the solution (or workaround, more accurately) is simple. You need to wrap your value inside a mutable object, and access it – both outside and inside the inner function – through that object only. There are few choices of suitable objects to use here, with lists and dictionaries being the simplest, built-in options:

  1. def incr(redis, key):
  2.     """Increments value of Redis key, as if Redis didn't have INCR command.
  3.    :return: New value for the key
  4.    """
  5.     res = []
  6.  
  7.     def txn(pipe):
  8.         res[0] = int(pipe.get(key)) + 1
  9.         pipe.multi()
  10.         pipe.set(key, res[0])
  11.  
  12.     redis.transaction(txn, key)
  13.     return res[0]

If you become fond of this technique, you may want to be more explicit and roll out your own wrapper. It might be something like a Var class with get and set methods, or just a value attribute.

Classy solution

Finally, there is a variant of the above approach that involves a class rather than function. It is strangely similar to “functor” objects from the old C++, back when it didn’t support proper lambdas and closures. Here it is:

  1. def incr(redis, key):
  2.     """Increments value of Redis key, as if Redis didn't have INCR command.
  3.    :return: New value for the key
  4.    """
  5.     class IncrTransaction(object):
  6.         def __call__(self, pipe):
  7.             self.result = int(pipe.get(key)) + 1
  8.             pipe.multi()
  9.             pipe.set(key, self.result)
  10.  
  11.     txn = IncrTransaction()
  12.     redis.transaction(txn, key)
  13.     return txn.result

Its main advantage (besides making it a bit clearer what’s going on) is the potential for extracting the class outside of the function – and thus reusing it. In the above example, you would just need to add the __init__(self, key) method to make the class independent from the enclosing function.

Ironically, though, that would also defeat the whole point: you don’t need a mutable closure if you don’t need a closure at all. Problem solved? ;-)

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Simulating nonlocal Keyword in Python 2.x
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.