Real-world metaphors are quite abundant when discussing topics related to IT and programming. They seem to be particularly useful when introducing newcomers, although it’s equally easy to point out mature and established techniques that originate from the non-digital world (OOP, anyone?…). Regardless, the flow of ideas seems to be extremely one-directional, and I think that’s very unfortunate. There’s a wealth of concepts specific to IT or the software industry that the general public would benefit from knowing about.
One of those, in my not-so-humble opinion, is the idea of issue tracking. I suppose the vast majority of readers is intimately familiar with it, but let’s put it into words anyway, for the sake of clarity and explicitness. The concept revolves around a system which allows its users to create tickets describing various issues pertaining to some particular project, or part of it, or process, or any similar endeavor. Those tickets necessarily consist of a title and content, very much like emails. Usually, though, they also have a few additional fields that are more meta, and describe the ticket itself. Typical examples include:

- status, describing where the issue currently is in its life cycle (e.g. new, in progress, resolved);
- priority, indicating how important or urgent the issue is;
- assignee, i.e. the person currently responsible for resolving it.
Lastly, every ticket allows for discussion in a forum-like manner, and for adding comments to any metadata changes we make.
That’s it, in a nutshell. It doesn’t seem very complicated and frankly, it may not sound very innovative either. Why, then, do I think such a concept is worthy of attention in a broader context?…
I suppose it is not uncommon to encounter a general situation such as the following. Say you have some well-defined function that performs a transformation of one value into another. It’s not particularly important how lengthy or complicated this function is, only that it takes one parameter and outputs a result. Here’s a somewhat trivial but astonishingly useful example:
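The example itself did not survive in this copy; a made-up stand-in of comparable triviality could be:

```python
def transform(value):
    # A hypothetical one-parameter function: one value in, one result out.
    # Here it just produces the textual representation of its argument.
    return str(value)
```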
Depending on what happens in other parts of your program, you may find yourself applying such a function to many different inputs. Then at some point, it is possible that you’ll need to handle lists of those inputs in addition to supporting single values. Query strings of URLs, for example, often require such treatment, because they may contain more than one value for a given key, and web frameworks tend to collate those values into lists of strings.
In those situations, you will typically want to deal just with the list case. This leads to writing a conditional in either the caller code:
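The code sample that followed here is missing from this copy; a hypothetical reconstruction of such a caller-side conditional might look like this (true to the paragraph below, it even contains an easy-to-spot mistake):

```python
def transform(value):
    # Made-up stand-in for "some well-defined function".
    return str(value).upper()

def handle(value):
    # Ad-hoc conditional in the caller, covering both a single value
    # and a list of values.
    if isinstance(value, list):
        return [transform(v) for v in value]
    else:
        return transform(v)   # the mistake: `v` instead of `value`
```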
or directly inside the particular function. I’m not a big fan of solutions like these, because everyone does them differently, and writing the same piece several times is increasingly prone to errors. Not quite incidentally, a mistake is present in the very example above – it shouldn’t be at all hard to spot it.
In any case, repeated application calls for extracting the pattern into something tangible and reusable. What I devised is therefore a general “recursivator”, whose simplified version is given below:
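The original listing is not preserved here; a sketch of what such a recursivator might look like (simplified, with illustrative names):

```python
def recursive(func):
    # Wrap a one-argument function so that it descends into lists,
    # applying itself to every element (including nested lists).
    def wrapper(value):
        if isinstance(value, list):
            return [wrapper(item) for item in value]
        return func(value)
    return wrapper
```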
As for usage, I think it’s equally feasible for both on-a-spot calls:
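The call sample is likewise missing; a self-contained sketch of an on-a-spot call (with a minimal recursivator restated inline so the snippet runs on its own) might be:

```python
def recursive(func):
    # Minimal recursivator, restated here for self-containment.
    def wrapper(value):
        if isinstance(value, list):
            return [wrapper(item) for item in value]
        return func(value)
    return wrapper

# On-a-spot calls over a single value and over nested lists:
print(recursive(str.upper)("abc"))           # -> ABC
print(recursive(str.upper)([["a"], "b"]))    # -> [['A'], 'B']
```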
as well as for decorating functions to make them recursive permanently. For this, though, it would be wise to turn it into a class-based decorator, applying the technique I’ve described previously. This way we could easily extend the solution and tailor it to our needs.
But what are the specific ways of doing so? I could think of some, like:

- converting datetimes into ISO-formatted strings;
- preventing sets from being turned into lists, as sets are obviously iterable too. In a more general version, one could supply a predicate function for deciding whether to recurse or not;
- turning recursive into a generator, for a more memory-efficient solution. If we’re lucky enough to program in Python 3.x, it would be a good excuse to employ the new yield from construct from 3.3.

One way or another, capturing a particular concept of computation into an actual API such as recursive looks like a good way of making the code more descriptive and robust. Certainly it adheres to one of the statements from the Zen of Python: that explicit is better than implicit.
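That last idea could be sketched as follows (a hypothetical variant; note that, unlike the list-based version, it naturally yields a flat stream of results rather than nested lists):

```python
def recursive_iter(func):
    # Generator-based recursivator sketch; `yield from` needs Python 3.3+.
    def wrapper(value):
        if isinstance(value, list):
            for item in value:
                yield from wrapper(item)   # recurse lazily into sublists
        else:
            yield func(value)
    return wrapper

results = recursive_iter(str.upper)(["a", ["b", "c"]])
print(list(results))   # -> ['A', 'B', 'C']
```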
About two weeks ago I moved to the Netherlands, landing in the medium-sized but cozy town of Groningen. This, of course, deserves a general blogpost in its own right (more than a single one, in fact), but the story I wanted to share today is extremely specific. For the most part, it’s also purely technical, exhibiting the typical hacker’s dynamics. The overarching theme is an “itch” that needed to be scratched.
Let’s start then, first by defining the problem at hand.
I happen to live on a square which is referred to as the town’s center, where a significant fraction of buildings – maybe even the majority – look kinda like this. Don’t pay too much attention to the outside appearance, as it can be very misleading. Despite their seemingly old architectural style, they’re often quite new and modern, but have been “retrofitted” to match the surrounding urban landscape. The final effect is rather pleasing aesthetically, I’d say.
What is more important and relevant here, though, is the size of the windows. By my standards at least, they are simply enormous – a fact that defies simple evaluation. (On one hand, there’s a lot of sunlight! On the other hand, there’s a lot of sunlight…). Moreover, it hints at how large the vertical distance between the floor and the ceiling is. In my case, it’s about 2.8 m (that’s 9 ft for you Imperialists), which has a significant impact on how spacious the apartment feels.
But for the issue I want to talk about, this was actually a downside. The matter concerns Internet access, which is shared between mine and three neighboring apartments through a Wi-Fi network with a single access point. In theory, the area it has to cover spans just a couple of meters. In practice, however, it’s hindered not only by walls and doors, but also – and maybe even primarily – by this significant distance along the vertical axis.
See, most typical household wireless routers have directional antennae which are deliberately set to output signal mostly in the horizontal plane. While this allows a single AP to easily cover even a big apartment, it is also a liability in a setting similar to mine. Because the access point is on the ground floor, it fails to appropriately cover the higher levels. And since this includes the very place I’m living in, I’ve been having rather annoying problems caused by the signal’s low strength and quality: dropped packets, lost connections, and all that stuff. Even web browsing (or similar activities that don’t really depend on latency) has been very cumbersome, despite sitting less than 5 meters in a straight line from the access point! It’s almost amazing how one can get screwed over by such a simple design limitation as the direction of an antenna.
So, you can easily see how this was a problem that yearned to be solved. But how? One does not simply make the waves go further, right?…
Actually, this is perfectly possible: devices known as WLAN repeaters do just that. Serving as a sort of amplifier, they can extend the range of a wireless network by relaying its signal between the base access point and the final receiver, e.g. the WLAN interface in a laptop. Basic physics suggests that this obviously cannot be done for free, so side effects include decreased performance of such range-extended networks due to reduced bandwidth. But where applicable, this solution is usually worth its while.
My scenario was definitely one of those, as I vastly prefer “not ideal Wi-Fi connectivity” to “almost no Wi-Fi connectivity at all”. So I went to investigate how I could procure such a device. Specifically, I had an old router lying around unplugged and useless – and it quickly gained my attention.
I cannot say that I’m any sort of expert when it comes to telecommunications, electronics or hardware of any kind (that’s vastly below the level of abstraction I typically operate at), but I have some basic idea of what a wireless router really is. Elementary deduction suggests it’s a transceiver: a piece of equipment able to receive and send out radio signals. What those signals are should be pretty much irrelevant, as long as (1) they fit the physical characteristics of the device and (2) the only thing we want to do is propagate them further.
In short, it should be capable of acting as a repeater! Yay?
Well, not really – not at first glance, at least. While many routers feature a repeater option in their firmware, mine is a somewhat old, low-end model: it doesn’t even support IPv6, not to mention goodies like 802.11n. Acting as a relay was also on this “nay” list, because being an ordinary access point is pretty much the only thing this inconspicuous black box knew how to do.
But thanks to one impressive piece of hackery, it was possible to lift its limitations. What I’m talking about is DD-WRT – a community project that provides custom firmware for a variety of models of popular routers. This firmware is very powerful and makes it easy to tap into the device’s hidden potential, exposing capabilities omitted from the vendor software. In my case, it promised to provide the crucial Client Bridge feature: the ability to create more than one virtual wireless interface and form a bridge between them in order to relay network traffic.
I set out to try it – and it turned out to be non-trivial, to say the least. Some steps of the process were amusing due to their obscurity – like setting up a local server for the archaic TFTP protocol. It turns out this was needed for the actual transfer of the new firmware, along with a small Linux distro that works on top of it. I suppose you have to do the same when installing Linux on a microwave but, admittedly, I haven’t tried that just yet ;)
The most troublesome part was actually the very beginning. It involved connecting to the device via an Ethernet wire and then telnetting in at the right moment during its boot-up. This way, it is possible to access the RedBoot bootstrapping shell and perform all kinds of surgery on the software internals.
Unfortunately, the instructions for making this happen were hopelessly unclear. I spent a good several quarter-hours troubleshooting potential issues, even going as far as to use Wireshark to monitor any traffic originating from the router, looking into how it identified itself within this crude two-node network.
Fortunately, I later found a much better set of instructions that didn’t omit the rather important part about holding down the RESET button for, well, a long time. From there everything went rather smoothly.
The last part was tweaking the numerous options and settings in DD-WRT’s web interface in order to make the router talk to its cousin downstairs. This level of abstraction was obviously much more comfortable for me to work at. Still, there was some sorcery involved, as in deciding whether I’d like to have my own subnet or operate within the existing one – essentially a choice between a WLAN-to-WLAN “router” or “switch”. The second option was of course much more appealing, because it is completely transparent to clients. By choosing it, I didn’t have to reconfigure the vast multitude of my 3 (three) devices that use Wi-Fi :)
This variant ended up being more complicated, though. And again, I found both good and rather crappy instructions on how to make it happen – but unlike last time, they were now both coming from the DD-WRT wiki. Well, it seems documentation is not among the strongest sides of this project…
But rest assured: this story has a happy ending :) Yes, it was preceded by juggling the IP configuration of my PC and a few reboots of the router, but it would be malicious to consider this more than a little nuisance.
So, what’s the point in all of this?… I guess the bottom line would be about not being afraid to experiment if something needs to be improved. Risk aversion is powerful, true – but sometimes even failure is not that bad, especially if everything remains inside the realm of software. Here, even if you “blow” something up there will be no holes left to cover with duct tape ;-)
When discussing the topic of unit testing and the methodologies it might entail (mostly TDD, i.e. Test-Driven Development), I noticed a curious imbalance in the number and strength of arguments pro and contra. The latter are few and far between, up to the point of ridiculous scarcity, when googling “arguments against TDD” is equally likely to yield stories from both sides of the fence. That’s pretty telling. Is it so that TDD in general, and unit tests in particular, are just the best thing ever, because there is an industry-wide consensus about them?…
I wouldn’t be so sure. All this unequivocal acknowledgement looks suspiciously similar to many other trends and fashions that were (or still are) sweeping through the IT domain, receding only when the alternative approach gains enough traction.
Take OOP, for example. Back in the 90s and around 2000, you would hear all kinds of praise for the object-oriented methodology: how natural it is, how it helps to model problems in intuitive way, how flexible and useful its abstractions are. Critics’ camp existed, of course, but they were small, scattered and not taken very seriously. Objects and classes were reigning supreme.
Compare this to the present day, when OOP is taking blows from almost every direction. On one hand, it is rejected on a performance basis, as the unknown cost of a virtual method call is seen as a liability. On the other hand, its abstraction patterns are considered baroque, overblown and outdated, unfit for modern computing challenges – most notably concurrency and asynchrony.
Could it be that approaches emphasizing the utmost importance of unit tests are following the same route? Given the pretty much universal praise they are receiving, it’s not unimaginable. In this context, providing some reasonable counterarguments seems like a good thing: if we let some air out of this balloon, we may prevent it from popping later on.
Incidentally, this is a service for TDD/unit testing that I’m glad to provide ;-) So in the rest of this post, I’m going to discuss some of their potential drawbacks, hopefully helping to even-out the playing field. Ultimately, this should always lead to better software engineering practices, and better software.
One of the technological marvels behind modern languages is the ease of installing new libraries, packages and modules. Thanks to having a central repository (PyPI, RubyGems, Hackage, …) and a suitable installer (pip/easy_install, gem, cabal, …), any library is usually just one command away. For one, this makes it very easy to bootstrap development of a new project – or alternatively, to abandon the idea of doing so because there is already something that does what you need :)
But being generous with external libraries also means adding a lot of dependencies. After a short while, they become practically untraceable, unless we keep an up-to-date list. In Python, for example, it would be the content of the requirements.txt file, or a value for the requires parameter of the setuptools.setup/distutils.setup function call inside the setup.py module. Other languages have their own means of specifying dependencies, but the principles are generally the same.
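For illustration, a minimal, hypothetical setup.py carrying such a list (the project name and version specifiers are made up; note that setuptools spells the parameter install_requires):

```python
from setuptools import setup

setup(
    name='myproject',          # made-up project metadata
    version='0.1',
    install_requires=[         # the dependency list in question
        'requests>=2.0',
        'six',
    ],
)
```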
How to ensure this list is correct, though?… The best way is to create a dedicated virtual environment specifically for our project. An environment is simply a sandboxed interpreter/compiler, along with all the packages that it can use for executing/compiling programs.
Normally, there is just one global environment for the system as a whole: all external libraries or packages for a particular language are installed there. This makes it easy to accidentally introduce extraneous dependencies into our project. More importantly, with this setup we are sharing our required libraries with other applications installed or developed on the system. This spells trouble if we’re relying on a particular version of a library: some other program could update it and suddenly break our application this way.
If we use a virtual environment instead, our program is isolated from the rest and uses its own, dedicated set of libraries and packages. Besides preventing conflicts, this also has the added benefit of keeping our dependency list up to date. If we use an API which isn’t present in our virtual environment, the program will simply blow up – hopefully with a helpful error :) Should this happen, we need to amend the list accordingly, and use it to update the environment by reinstalling our project into it. As a bonus – though in practice that’s the main treat – deploying our program to another machine is as trivial as repeating this last step, preferably also in a dedicated virtual environment created there.
So, how to use all this goodness? It heavily depends on what programming language we are actually using. The idea of virtual environments (or at least this very term) comes from Python, where it coalesced into the virtualenv package. For Ruby, there is a pretty much exact equivalent in the form of the Ruby Version Manager (rvm). Haskell has the somewhat less developed cabal-dev utility, which should nevertheless suffice for most purposes.
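As a small aside, Python 3.3+ even ships a stripped-down variant of virtualenv in the standard library as the venv module, which can also be driven programmatically (the directory name below is made up):

```python
# Create an isolated environment with the standard library's venv module.
import venv

venv.create("my-env", with_pip=False)  # with_pip=True also bootstraps pip
```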
More exotic languages might have their own tools for that. In that case, searching for “(language) virtualenv” is an almost certain way to find them.
In a post from the top of the year I mentioned that I was looking into the Haskell programming language, and promised to give some insight into how it fares compared to others. Well, I think it’s time to deliver on that promise, for the topic of this particular language – and functional programming in general – is indeed very insightful.
I do not claim to have achieved any kind of proficiency in Haskell, of course; I might very well be just scratching the surface. However, this is exactly the sort of perspective I wanted to employ when evaluating language usefulness: a practical standpoint, held by programmer who is looking to use it for actual tasks, without having mastered it in great detail – at least initially. This is, by the way, a pretty common setting when tackling all new things related to coding, be it frameworks, software platforms or languages.
Still, I knew it was all but guaranteed to be a rough ride. A language whose tutorials purposefully shy away from the classic Hello World example (typically inserting a factorial function instead) looks like something designed specifically to melt the brains of poor programmers who dare to venture forth and investigate it. Those few who make their way back are told about in folk tales, portrayed as mildly crazy types who profess this weird idea of “purity”, and utter the word ‘monad’ with great contempt.
Okay, I may be exaggerating… slightly. But there is no denying that Haskell attained a certain kind of reputation, something like a quirky cousin in the big and diverse family of languages. Unlike Lisp, he isn’t picked on due to any (particular (and (rather) obvious)) property. There is just this general aura of weirdness and impracticality that he supposedly radiates, repelling all but the most adventurous of coders.
If it really exists, then it’s a pity: it certainly failed to repel me. As a result, I may have a story (or two) to share with those who want to learn what this functional business is really about, and why they should care.
Grab a chair and something to drink – it’s going to take a while. But in the end, you shouldn’t regret staying.
I’m a big fan of Python generators. With seemingly little effort, they often allow us to greatly reduce memory overhead. They do so by completely eliminating intermediate results of list manipulations. Along with the functions defined in the itertools package, generators also introduce basic lazy computation capabilities into Python.
This combination of higher-order functions and generators is truly remarkable. Why? Because compared to ordinary loops, it gains us both speed and readability at the same time. That’s quite an achievement; typically, optimizations sacrifice one for the other.
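A small, made-up illustration of that combination: nothing below materializes an intermediate list, yet the final sum forces exactly as much evaluation as needed.

```python
from itertools import count, islice

evens = (n for n in count() if n % 2 == 0)   # infinite, but lazy
squares = (n * n for n in evens)             # still nothing computed
print(sum(islice(squares, 5)))               # -> 120 (0 + 4 + 16 + 36 + 64)
```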
All this goodness, however, comes with a few strings attached. There are some ways in which we can be bitten by improperly used generators, and I think it’s helpful to know about them. So today, I’ll try to outline some of those traps.
You don’t say, eh? That’s one of the main reasons we are so keen on using them, after all! But what I’m implying is the following: if generator-based code is to actually do anything, there must be a point where all this laziness “bottoms out”. Otherwise, you are just building expression thunks and not really evaluating anything.
How might such circumstances occur, though? One case involves using generators for their side effects – a rather questionable practice, mind you:
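The snippet itself is missing from this copy; a hypothetical reconstruction (process_kvp and the dictionary are made up) could be: the intent is to print every key-value pair, yet nothing happens.

```python
from itertools import starmap

def process_kvp(key, value):
    # A side-effecting "processor", made up for illustration.
    print(key, '=', value)

d = {'answer': 42}
starmap(process_kvp, d.items())   # prints nothing: only builds a generator
```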
The starmap call does not print anything here, because it merely constructs a generator. Only when this generator is iterated over would the dictionary elements be fed to the process_kvp function. Such iteration would of course require a loop (or the consume recipe from the itertools docs), so we might as well do away with the generator altogether and just stick with a plain old for loop:
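A plain-loop equivalent might look like this (process_kvp and the dictionary are again made up for illustration):

```python
def process_kvp(key, value):
    print(key, '=', value)

d = {'answer': 42}
for key, value in d.items():
    process_kvp(key, value)   # the side effect happens right away
```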
In real code the judgement isn’t so obvious, of course. We’ll most likely receive the generator from some external function – one that probably refers to its result only as an iterable. As usual in such cases, we must not assume anything beyond what is explicitly stated. Specifically, we cannot expect that we have any kind of existing, tangible collection with actual elements. An iterable can very well be just a generator, whose items are yet to be evaluated. This fact can impact performance by moving some potentially heavy processing into unexpected places, resulting in e.g. executing database queries while rendering a website’s template.
The above point was concerned mostly with performance, but what about correctness? You may think it’s not hard to conform to the iterables’ contract, which should in turn guarantee that they don’t backfire on us with unexpected behavior. Yet how many times have you found yourself reading (or writing!) code such as this:
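The code itself did not survive in this copy; a hypothetical reconstruction, consistent with the discussion below (all names are made up, and get_stuff_to_do could just as well return a generator):

```python
def get_stuff_to_do():
    # Stand-in supplier of work items.
    return [1, 2, 3]

def process(item):
    print('processing', item)

def do_stuff():
    items = get_stuff_to_do()
    if not items:        # assumes "emptiness" can be checked
        return
    for item in items:
        process(item)

do_stuff()
```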
It’s small and may look harmless, but it still manages to make an ungrounded (thus potentially false) assumption about an iterable. The problem lies in the if condition, meant to check whether we have any items to do stuff with. It would work correctly if items were a list, a dict, a deque or any other actual collection. But for generators, it will not work that way – even though they are still undoubtedly iterable!
Why? Because generators are not collections; they are just suppliers. We can only tell them to return their next value, but we cannot peek inside to see if the value is really there. As such, generators are not susceptible to boolean coercion in the same way that collections are; it’s not possible to check their “emptiness”. They behave like regular Python objects and are always truthy, even if it’s “obvious” they won’t yield any value when asked:
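An “obviously” empty generator demonstrates this:

```python
empty = (x for x in [])
print(bool(empty))   # -> True: generators are always truthy...
print(list(empty))   # -> []: ...even when they yield nothing
```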
Going back to the previous sample, we can see that the if block won’t be executed when get_stuff_to_do returns an “empty” generator. The consequences of this fact may vary from barely noticeable to disastrous, depending on what the rest of the do_stuff function looks like. Nevertheless, that code will run with one of its invariants violated: a fertile ground for any kind of unintended effects.
An intuitive, informal understanding of the term ‘iterable’ is likely to include one more unjustified assumption: that it can be iterated over, and over – i.e. multiple times. Again, this is very much true if we’re dealing with a collection, but generators simply don’t carry enough information to repeat the sequence they yield. In other words, they cannot be rewound: once we go through a generator, it’s stuck in its final state, not usable for anything more.
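A quick demonstration:

```python
g = (x * 2 for x in [1, 2, 3])
print(list(g))   # -> [2, 4, 6]
print(list(g))   # -> []: the generator is already expended
```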
Just like with the previous caveats, failure to account for this can very well go unnoticed – at least until we spot some weird behavior it results in. Continuing the superficial example from the preceding paragraph, let’s pretend the rest of the do_stuff function requires going over items at least twice. It could be, for example, an iterable of objects in a game or physics simulation; objects that can potentially move really fast and thus require some more sophisticated collision detection (based on e.g. intersection of segments):
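The original snippet is not preserved here; a hypothetical sketch of such a simulation step (the functions, data and “physics” are entirely made up – only the iteration pattern matters):

```python
def calculate_displacement(objects, dt):
    # First pass over `items`: crude Euler integration along one axis.
    for obj in objects:
        obj['x'] += obj['vx'] * dt

def detect_collisions(objects):
    # Second pass over `items`: naive pairwise proximity check.
    objects = list(objects)
    hits = []
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if abs(a['x'] - b['x']) < 1.0:
                hits.append((a, b))
    return hits

items = [{'x': 0.0, 'vx': 1.0}, {'x': 2.0, 'vx': -1.0}]
calculate_displacement(items, dt=0.6)
print(detect_collisions(items))   # works for a list; a generator would
                                  # already be expended after pass one
```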
Even assuming the incredible coincidence of getting all the required math right (;-)), we wouldn’t see any action whatsoever if items is a generator. The reason is simple: when calculate_displacement goes through items once (vigorously applying Eulerian integration, most likely), it fully expends the generator. For any subsequent traversal (like the one in detect_collisions), the generator will appear empty. In this particular snippet, that will most likely result in a blank screen, which hopefully is enough of a hint to figure out what’s going on :P
The overarching conclusion from the above pitfalls is rather evident, and seemingly contrary to the statement from the beginning. Indeed, generators may not be a drop-in replacement for lists (or other collections) if we rely heavily on their “listy” behavior. And unless memory overhead proves troublesome, it’s also not worth it to inject them into older code that already uses lists.
For new code, however, sticking with generators right off the bat has the numerous benefits I mentioned at the start. What it requires, though, is unlearning some habits that might have emerged after we spent some time manipulating lists. I think I managed to pinpoint the most common ways in which those habits result in incorrect code. Incidentally, they all originate from accidental, unfounded expectations towards iterables in general. That’s no coincidence: generators simply happen to be the “purest” of iterables, supporting only the bare minimum of required operations.