Importance of Using Virtual Environments

2012-02-22 19:59

One of the technological marvels behind modern languages is the ease of installing new libraries, packages and modules. Thanks to having a central repository (PyPI, RubyGems, Hackage, …) and a suitable installer (pip/easy_install, gem, cabal, …), any library is usually just one command away. For one, this makes it very easy to bootstrap development of a new project – or alternatively, to abandon the idea of doing so because there is already something that does what you need :)

But being generous with external libraries also means adding a lot of dependencies. After a short while, they become practically untraceable, unless we keep an up-to-date list. In Python, for example, it would be the content of the requirements.txt file, or the value of the install_requires (setuptools) or requires (distutils) parameter of the setup function call in the project’s setup script. Other languages have their own means of specifying dependencies but the principles are generally the same.

How to ensure this list is correct, though?… The best way is to create a dedicated virtual environment specifically for our project. An environment is simply a sandboxed interpreter/compiler, along with all the packages that it can use for executing (or compiling) programs.


Normally, there is just one, global environment for a system as a whole: all external libraries or packages for a particular language are installed there. This makes it easy to accidentally introduce extraneous dependencies to our project. More importantly, with this setting we are sharing our required libraries with other applications installed or developed on the system. This spells trouble if we’re relying on a particular version of a library: some other program could update it and suddenly break our application.

If we use a virtual environment instead, our program is isolated from the rest and uses its own, dedicated set of libraries and packages. Besides preventing conflicts, this also has the added benefit of keeping our dependency list up to date. If we use an API which isn’t present in our virtual environment, the program will simply blow up – hopefully with a helpful error :) Should this happen, we need to amend the list accordingly, and use it to update the environment by reinstalling our project into it. As a bonus – though in practice that’s the main treat – deploying our program to another machine is as trivial as repeating this last step, preferably also in a dedicated virtual environment created there.

Using it

So, how to use all this goodness? It heavily depends on what programming language we are actually using. The idea of virtual environments (or at least this very term) comes from Python, where it coalesced into the virtualenv package. For Ruby, there is a pretty much exact equivalent in the form of Ruby Version Manager (rvm). Haskell has the somewhat less developed cabal-dev utility, which should nevertheless suffice for most purposes.

More exotic languages might have their own tools for that. In that case, searching for “language virtualenv” is an almost certain way to find them.
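For Python specifically, newer versions of the interpreter (3.3 and later) ship a venv module in the standard library that covers the same ground as virtualenv. The snippet below is only a minimal sketch under that assumption; the helper names create_env and in_virtual_env are made up for illustration and are not part of any tool mentioned in this post.

```python
import os
import sys
import venv


def create_env(path):
    """Create an isolated environment at `path`, without bootstrapping pip.

    Returns the path of the pyvenv.cfg file that marks the environment.
    """
    venv.create(path, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

Once an environment exists, installing the project into it (and nothing else) is what keeps the dependency list honest.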

Author: Xion, posted under Programming » 2 comments

Generator Pitfalls

2012-02-08 19:34

I’m a big fan of Python generators. With seemingly little effort, they often allow us to greatly reduce memory overhead. They do so by completely eliminating intermediate results of list manipulations. Along with functions defined within the itertools module, generators also introduce basic lazy computation capabilities into Python.
This combination of higher-order functions and generators is truly remarkable. Why? Because compared to ordinary loops, it gains us both speed and readability at the same time. That’s quite an achievement; typically, optimizations sacrifice one for the other.

All this goodness, however, comes with a few strings attached. There are some ways in which we can be bitten by improperly used generators, and I think it’s helpful to know about them. So today, I’ll try to outline some of those traps.
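To make the "no intermediate results" claim concrete, here is a small sketch written for Python 3 (the variable names are made up for illustration). It chains two generator expressions over an infinite counter; no list is built until the very last step, and only five values are ever computed.

```python
import itertools

# An endless stream of even numbers; nothing is computed yet.
evens = (n for n in itertools.count() if n % 2 == 0)

# Another lazy layer on top; still no work has been done.
squares = (n * n for n in evens)

# Only here are values actually pulled through the pipeline.
first_five = list(itertools.islice(squares, 5))
```

With lists instead of generators, the first step alone would never terminate.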

Generators are lazy

You don’t say, eh? That’s one of the main reasons we are so keen on using them, after all! But what I’m implying is the following: if generator-based code is to actually do anything, there must be a point where all this laziness “bottoms out”. Otherwise, you are just building expression thunks and not really evaluating anything.

How might such circumstances occur, though? One case involves using generators for their side effects – a rather questionable practice, mind you:

    def process_kvp(key, value=None):
        print key,
        if value: print "=", value

    itertools.starmap(process_kvp, some_dict.iteritems())

The starmap call does not print anything here, because it just constructs a generator. Only when this generator is iterated over will dictionary elements be fed to the process_kvp function. Such iteration would of course require a loop (or the consume recipe from the itertools docs), so we might as well do away with the generator altogether and just stick with a plain old for:

    for kv in some_dict.iteritems():
        process_kvp(*kv)
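The laziness is easy to observe directly. The following sketch is written for Python 3 (so items() stands in for iteritems()), and it records calls in a list instead of printing; the calls list and the sample dictionary are assumptions added purely for the demonstration.

```python
import itertools

calls = []  # records every invocation of process_kvp


def process_kvp(key, value=None):
    calls.append((key, value))


some_dict = {"a": 1, "b": 2}

# Building the starmap object runs nothing at all.
lazy = itertools.starmap(process_kvp, some_dict.items())
calls_before = len(calls)

# Only iterating over it triggers the side effects.
for _ in lazy:
    pass
calls_after = len(calls)
```

Here calls_before stays at zero, and the function runs once per dictionary item only during the loop.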

In real code the judgement isn’t so obvious, of course. We’ll most likely receive the generator from some external function – one that probably refers to its result only as an iterable. As usual in such cases, we must not assume anything beyond what is explicitly stated. Specifically, we cannot expect that we have any kind of existing, tangible collection with actual elements. An iterable can very well be just a generator, whose items are yet to be evaluated. This fact can impact performance by moving some potentially heavy processing into unexpected places, resulting in e.g. executing database queries while rendering a website’s template.

Generators are truthy

The above point was concerned mostly with performance, but what about correctness? You may think it’s not hard to conform to iterables’ contract, which should in turn guarantee that they don’t backfire on us with unexpected behavior. Yet how many times did you find yourself reading (or writing!) code such as this:

    def do_stuff():
        items = get_stuff_to_do() # returns iterable
        if not items:
            logging.debug("No stuff to do!")
            return 0
        # something with 'items'...

It’s small and may look harmless, but it still manages to make an ungrounded (thus potentially false) assumption about an iterable. The problem lies in the if condition, meant to check whether we have any items to do stuff with. It would work correctly if items were a list, dict, deque or any other actual collection. But for generators, it will not work that way – even though they are still undoubtedly iterable!

Why? Because generators are not collections; they are just suppliers. We can only tell them to return their next value, but we cannot peek inside to see if the value is really there. As such, generators are not susceptible to boolean coercion in the same way that collections are; it’s not possible to check their “emptiness”. They behave like regular Python objects and are always truthy, even if it’s “obvious” they won’t yield any value when asked:

    >>> g = (x for x in [])
    >>> if g: print "Truthy!"
    Truthy!

Going back to the previous sample, we can see that the if block won’t be executed when get_stuff_to_do returns an “empty” generator. Consequences of this fact may vary from barely noticeable to disastrous, depending on what the rest of the do_stuff function looks like. Nevertheless, that code will run with one of its invariants violated: fertile ground for any kind of unintended effects.
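If an emptiness check on an arbitrary iterable is really needed, one possible approach is to pull the first element and then stitch it back on. This is only a sketch in Python 3; the peek helper is a hypothetical name of my own, not something from the standard library.

```python
import itertools


def peek(iterable):
    """Return (is_empty, iterator) without losing the first element."""
    it = iter(iterable)
    sentinel = object()          # unique marker meaning "nothing was yielded"
    first = next(it, sentinel)
    if first is sentinel:
        return True, iter(())
    # Put the consumed element back in front of the remaining ones.
    return False, itertools.chain([first], it)


empty_flag, _ = peek(x for x in [])
nonempty_flag, restored = peek(x for x in [1, 2, 3])
restored_items = list(restored)
```

Note that the caller has to continue with the returned iterator, since the original one has already advanced by one element.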

Generators are once-off

An intuitive, informal understanding of the term ‘iterable’ is likely to include one more unjustified assumption: that it can be iterated over, and over, i.e. multiple times. Again, this is very much true if we’re dealing with a collection, but generators simply don’t carry enough information to repeat the sequence they yield. In other words, they cannot be rewound: once we go through a generator, it’s stuck in its final state, not usable for anything more.

Just like with previous caveats, failure to account for that can very well go unnoticed – at least until we spot some weird behavior it results in. Continuing our superficial example from the preceding section, let’s pretend the rest of the do_stuff function requires going over items at least twice. It could be, for example, an iterable of objects in a game or physics simulation; objects that can potentially move really fast and thus require some more sophisticated collision detection (based on e.g. intersection of segments):

    new_positions = calculate_displacement(items, delta_time)
    collision_pairs, rest = detect_collisions(items, new_positions)
    collided = compute_collision_response(collision_pairs)
    for item in itertools.chain(collided, rest):
        item.draw()

Even assuming the incredible coincidence of getting all the required math right (;-)), we wouldn’t see any action whatsoever if items is a generator. The reason for that is simple: when calculate_displacement goes through items once (vigorously applying Euler integration, most likely), it fully expends the generator. For any subsequent traversal (like the one in detect_collisions), the generator will appear empty. In this particular snippet, it will most likely result in a blank screen, which hopefully is enough of a hint to figure out what’s going on :P
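The exhaustion itself can be shown in a few lines. This is a Python 3 sketch with a made-up squares generator standing in for items; the usual remedy, when several passes are needed and memory allows, is to materialize the values into a list once.

```python
def squares(n):
    """A toy generator standing in for an iterable received from elsewhere."""
    for i in range(n):
        yield i * i


gen = squares(3)
first_pass = list(gen)    # consumes the generator completely
second_pass = list(gen)   # the generator is spent, so nothing is left

# Materializing once gives something that can be traversed repeatedly.
reusable = list(squares(3))
pass_a = [x for x in reusable]
pass_b = [x for x in reusable]
```

The second traversal of gen silently yields nothing, which is exactly why this kind of bug tends to go unnoticed.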

Generators are not lists

An overarching conclusion of the above-mentioned pitfalls is rather evident and seemingly contrary to the statement from the beginning. Indeed, generators may not be a drop-in replacement for lists (or other collections) if we are very much relying on their “listy” behavior. And unless memory overhead proves troublesome, it’s also not worth it to inject them into older code that already uses lists.

For new code, however, sticking with generators right off the bat has numerous benefits which I mentioned at the start. What it requires, though, is evicting some habits that might have emerged after we spent some time manipulating lists. I think I managed to pinpoint the most common ways in which those habits result in incorrect code. Incidentally, they all originate from accidental, unfounded expectations towards iterables in general. That’s no coincidence: generators simply happen to be the “purest” of iterables, supporting only the bare minimum of required operations.

Author: Xion, posted under Computer Science & IT » Comments Off on Generator Pitfalls

Checking Whether IP is Within a Subnet

2012-02-04 22:11

Recently I had to solve a simple but very practical coding puzzle. The task was to check whether an IPv4 address (given in traditional dot notation) is within a specified network, described as a CIDR block.
You may notice that this is almost as ubiquitous as a programming problem can get. Implementations of its simplified version are executed literally millions (if not billions) of times per second, for every IP packet at every junction of this immensely vast Internet. Yet, when it comes to doing such a thing in Python, one is actually left without much help from the standard library and must simply Do It Yourself.

It’s not particularly hard, of course. But before jumping to a solution, let’s look at how we expect our function to behave:

    >>> ip_in_network("", "")
    True
    >>> ip_in_network("", "")
    True
    >>> ip_in_network("", "")
    False

Pretty obvious, isn’t it? Now, if you recall how packet routing works under the hood, you might remember that there are some bitwise operations involved. They are necessary to determine whether a specific IP address can be found within a given network. However low-level this may sound, there is really no escape from doing it this way. Yes, even in Python :P

It goes deeper, though. Most of the interface of Python’s socket module is actually carbon-copied from POSIX socket.h and related headers, down to exact names of particular functions. As a result, solving our task in C isn’t very different from doing it in Python. I’d even argue that the C version is clearer and easier to follow. This is what it could look like:

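    #include <arpa/inet.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    bool ip_in_network(const char* addr, const char* net) {
        struct in_addr ip_addr;
        if (!inet_aton(addr, &ip_addr)) return false;

        /* Copy the CIDR string so that it can be split at the slash. */
        char network[32];
        if (strlen(net) >= sizeof(network)) return false;
        strcpy(network, net);

        char* slash = strchr(network, '/');
        if (!slash) return false;
        int mask_len = atoi(slash + 1);
        if (mask_len < 0 || mask_len > 32) return false;

        *slash = '\0';
        struct in_addr net_addr;
        if (!inet_aton(network, &net_addr)) return false;

        /* s_addr is in network byte order; convert before bit arithmetic. */
        uint32_t ip_bits = ntohl(ip_addr.s_addr);
        uint32_t net_bits = ntohl(net_addr.s_addr);
        /* mask_len leading one bits (a shift by 32 would be undefined). */
        uint32_t netmask = mask_len ? ~UINT32_C(0) << (32 - mask_len) : 0;
        return (ip_bits & netmask) == (net_bits & netmask);
    }

A similar thing done in Python obviously requires less scaffolding, but it also introduces its own intricacies via the struct module (for unpacking bytes). All in all, it seems like there is not much to be gained here from Python’s high level of abstraction.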
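For comparison, here is one way the Python counterpart might look. It is a sketch rather than the definitive solution: it is written for Python 3, it uses socket.inet_aton and struct.unpack as hinted at above, and the addresses in the usage lines are arbitrary examples of my own choosing.

```python
import socket
import struct


def ip_in_network(addr, net):
    """Check whether dotted IPv4 `addr` lies within the CIDR block `net`."""
    try:
        # "!I" unpacks 4 bytes in network byte order into an unsigned integer.
        ip_bits = struct.unpack("!I", socket.inet_aton(addr))[0]
        network, _, mask_len = net.partition("/")
        net_bits = struct.unpack("!I", socket.inet_aton(network))[0]
        mask_len = int(mask_len)
    except (OSError, ValueError):
        return False  # malformed address or missing/invalid prefix length
    if not 0 <= mask_len <= 32:
        return False
    # mask_len leading one bits, truncated to 32 bits.
    netmask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    return (ip_bits & netmask) == (net_bits & netmask)


inside = ip_in_network("192.168.1.5", "192.168.0.0/16")
outside = ip_in_network("10.0.0.1", "192.168.0.0/16")
```

The struct call is the "intricacy" in question: inet_aton returns raw bytes, and they have to be unpacked into an integer before any masking can happen.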

And that’s perfectly OK: no language is a silver bullet. Sometimes we need to do things the quirky way.

Author: Xion, posted under Computer Science & IT » 3 comments

The “Wat” of Python

2012-01-31 21:19

It is quite likely you are familiar with the Wat talk by Gary Bernhardt. It is sweeping through the Internet, giving a good laugh to pretty much anyone who watches it. It surely did to me!
The speaker is making fun of the Ruby and JavaScript languages (although mostly the latter, really), showing totally unexpected and baffling results of some seemingly trivial operations – like adding two arrays. It turns out that in JavaScript, the result is an empty string. (And the reasons for that provoke an even bigger “wat”.)

After watching the talk about five times (it hardly gets old), I started to wonder whether it is only those two languages that exhibit similarly confusing behavior… The answer is of course “No”, and that should be glaringly obvious to anyone who knows at least a bit of C++ ;) But beating on that horse would be way too easy, so I’d rather try something more ambitious.
Hence I ventured forth to search for “wat” in Python 2.x. The journey wasn’t short enough to stop at the mere addition operator but nevertheless – and despite me being nowhere near a Python expert – I managed to find some candidates rather quickly.

I strove to keep with the original spirit of Gary’s talk, so I only included those quirks that can be easily shown in the interactive interpreter. The final result consists of three of them, arranged in the order of increasing puzzlement. They are given without explanation or rationale, hopefully to encourage some thought beyond amusement :)

Behold, then, the Wat of Python!

Author: Xion, posted under Internet, Programming » 12 comments

Python’s setdefault Considered Harmful

2012-01-28 19:26

Python dictionaries have an inconspicuous method named setdefault. Asking for its description, we’ll be presented with a rather terse explanation:

    >>> help(dict.setdefault)

    setdefault(...)
        D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
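Spelled out on a tiny dictionary, that one-line description amounts to the following (a short sketch; the keys and values are arbitrary):

```python
d = {"a": 1}

# Key already present: the stored value is returned and the dict is untouched.
existing = d.setdefault("a", 100)

# Key absent: the default is stored under the key and then returned.
missing = d.setdefault("b", 2)
```

After both calls, d holds the original "a" entry plus the newly inserted "b".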

While it might not be immediately obvious what we could use this method for, we can actually find quite a lot of applications if we only pay a little attention. The main advantage of setdefault seems to lie in the elimination of ifs; a feat we can feel quite smug about. As an example, consider a function which groups a list of key-value pairs (with possible duplicate keys) into a dictionary of lists:

    def group_assoc_list_(a_list): # with if
        res = {}
        for k, v in a_list:
            if k in res:
                res[k].append(v)
            else:
                res[k] = [v]
        return res

    def group_assoc_list(a_list): # with setdefault
        res = {}
        for k, v in a_list:
            res.setdefault(k, []).append(v)
        return res

This is something which we could do when parsing the query string of a URL, if there weren’t a readily available API for that:

    def parse_query_string(qs):
        return [arg.split('=') for arg in qs.split('&')]

    >>> group_assoc_list(parse_query_string('a=1&b=2&c=3&b=4&a=5'))
    {'a': ['1', '5'], 'b': ['2', '4'], 'c': ['3']}
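As for the readily available API: the post does not name it, but in Python 3 the likely candidate is parse_qs from urllib.parse (in Python 2 it lived in the urlparse module). Assuming that is the function meant, the same grouping comes for free:

```python
from urllib.parse import parse_qs

# parse_qs collects the values of repeated keys into lists on its own.
grouped = parse_qs('a=1&b=2&c=3&b=4&a=5')
```

The result has the same shape as the output of group_assoc_list: a dictionary mapping each key to a list of its values.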

So, with setdefault we can get the same job done more succinctly. It would seem that it is a nifty little function, and something we should keep in mind as part of our toolbox. Let’s remember it and move on, shall we?…

Actually, no. setdefault is really not a good piece of dict’s interface, and the main reason we should remember it is its caveats. Indeed, there are quite a few of them, enough to shrink the space of possible applications to a rather tiny size. As a result, we should be cautious whenever we see (or write) setdefault in our own code.

Here’s why.

On Clever Usage of Python ‘with’ Clauses

2012-01-21 20:21

As the Python docs state, the with statement is used to encapsulate common patterns of try/except/finally constructs. It is most often associated with managing external resources in order to ensure that they are properly freed, released, closed, disconnected from, or otherwise cleaned up. While at times it is sufficient to just write the finally block directly, repeated occurrences ask for using this language goodness more consciously, including writing our own context managers for specialized needs.

Those managers – basically with-enabled helper objects – are strikingly similar to the small local objects involved in the RAII technique from C++. The acronym expands to Resource Acquisition Is Initialization, further emphasizing the resource management part of this pattern. But let us not be constrained by that notion: the usage space of with is much wider.
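As a small illustration of a context manager that manages no resource at all, consider timing a block of code. This is merely a sketch of the idea in Python 3; the timed helper and the results dictionary are hypothetical names chosen for this example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(results, label):
    """Store the wall-clock duration of the with block under results[label]."""
    start = time.perf_counter()
    try:
        yield  # the body of the with statement runs here
    finally:
        # Runs whether the body finished normally or raised an exception.
        results[label] = time.perf_counter() - start


timings = {}
with timed(timings, "summing"):
    total = sum(range(1000))
```

The try/finally inside the generator is the very construct that with encapsulates, written once here instead of around every block we want to measure.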

Looking Back, and Maybe Even Forward

2012-01-01 12:35

Also known as Obligatory New Year’s Post.

It was quite a year, this 2011. No single ground-breaking change, but a lot of somewhat significant events and small steps – mostly in the right direction. A short summary is of course in order, because taking time to stop and reflect is a good thing from time to time.

Technically, the biggest change would be the fact that I’m no longer a student. Having attained my MSc some time in the first quarter, I finished a five-year-long period of computer science studies at Warsaw University of Technology. While there are mixed views on the importance of formal education, I consider this a major and important achievement – and one with practical impact as well.

Being a polyglot is fun

My master’s thesis was about implementing a reflection system for C++. Ironically, since then I haven’t really got to code anything in this language. That’s not actually something I’m at odds with. For me, sticking to just one language for an extended period of time seems somewhat detrimental to the development of one’s programming skills. On the other hand, there goes the saying that a language which doesn’t change your view on programming as a whole is not worth learning. As usual, it looks like a question of proper balance.

This year, I’ve got to use a handful of distinct languages in different contexts and applications. There was Java but mostly (if not exclusively) on the Android platform. There was JavaScript in its original incarnation – i.e. on client side, in the browser.
Finally, there was Python: for scripts, for cloud computing on Google App Engine, for general web programming, and for many everyday tasks and experiments. It seems to be my first choice language as of now – the one that I’m most productive in. Still, it probably has many tricks and crispy details waiting to be uncovered, which makes it likely to grab my attention for quite a bit longer.

Its status always has contenders, though. Clojure, Ruby and Haskell are among the languages which I gave at least a brief glance in 2011. The last one is especially intriguing and may therefore be the subject of a few posts later on.

Speaking and listening

2011 was also a busy year for me when it comes to attending various software-related events. Many of these were organized or influenced by the local Google Technology User Group. Some of those I even got to speak at, lecturing on the Google App Engine platform or advanced topics in Android UI programming. In either case it was an exciting and refreshing experience.

There were also several other events and meet-ups I got to attend in the passing year. Some of them even required traveling abroad, some resulted in grabbing juicy awards (such as autographed books), while some were slightly less formal albeit still very interesting.
And kinda unexpected, too. I learned that there is a bunch of thriving communities gathered around specific technologies, and they are all just around the corner – literally. Because contrary to the stereotype of the lone hacker, their members are regularly meeting in real life. Wow! ;-)

Author: Xion, posted under Computer Science & IT, Life, Thoughts » 2 comments

