Archive for Computer Science & IT

Thinning Your Fat Fingers

2012-09-07 10:32

I often say I don’t believe programmers need to be great typists. No software project was ever late because its code couldn’t be typed fast enough. However, the fact that a developer’s job consists mostly of thinking, interspersed with short bursts of typing, means that it is beneficial to type fast and so get back quickly to what’s really important.
Yet typing code is a significantly different game than writing prose in natural language (unless you are sprinkling your code with copious amounts of comments and docstrings). I don’t suppose the skill of typing regular text fast (i.e. with all ten fingers) translates well into building screens of code listings. You need a different sort of exercise to be effective at that; usually, it just comes with a lot of coding practice.

But you may want to rush things a bit, and maybe have some fun in the process. I recently discovered a website which aims to help you improve your code-specific typing skills. When you sign up, you are presented with a choice of about a dozen common languages and popular open source projects written in them. Your task is simple: you have to type their code in short, 15-line sprints, and your speed and accuracy will be measured and reported afterwards.

The choice of projects, and their fragments to type in, is generally pretty good. It definitely provides a very nice way to get the “feel” of any language you might want to learn in the future. You’ll get to see a lot of good, working, practical code written in it – not to mention you get to type it yourself :) Personally, I’ve found the C listings (of the Redis data store) to be the most pleasant to both read and type, but it’s pretty likely you will have different preferences.

The application isn’t perfect, of course: it doesn’t really replicate the typical indentation dynamics of most code editors and IDEs. Instead, it opts for handling indentation implicitly, so the only whitespace you get to type is line and word breaks. You also don’t get to use your text navigation skills and clipboard-fu, which I’ve seen many coders leverage extensively when they are programming.
I think that’s fine, though, because the whole thing is specifically about typing. It’s a great and pretty clear idea, and as such I strongly encourage you to try it out!

Author: Xion, posted under Applications, Programming

Don’t Write Classes

2012-08-29 11:17

At this year’s PyCon US, there was a talk with a rather (thought-)provoking title: Stop Writing Classes. The speaker might not be the most charismatic one you’ve listened to, but his point is important, even if very simple. Whenever you have a class with a constructor and just one other method, you could probably do better by turning it into a single function instead.

The examples given in the presentation were in Python, of course, but the advice as a whole is pretty generic. It can be applied with equal success even to languages that are object-oriented to the extreme (like Java): just replace ‘function’ with ‘static method’. However, if we are talking about Python, there are many more situations where we can replace classes with functions. Often this will result in simpler code with fewer nesting levels.

Let’s see a few examples.

Inheriting for custom __init__

Sometimes we want to construct many similar objects that differ only slightly in the way their constructors are invoked. A rather simple example would be a urllib2.Request with some custom HTTP headers included:

    import urllib2

    class CustomRequest(urllib2.Request):
        def __init__(self, url, data=None, headers=None, *args, **kwargs):
            headers = headers or {}
            headers.setdefault('User-Agent', 'MyAwesomeApplication')
            # Call the base initializer directly; this also works if Request
            # is an old-style class, where super() would fail.
            urllib2.Request.__init__(self, url, data, headers, *args, **kwargs)

    # usage
    content = urllib2.urlopen(CustomRequest("")).read()

That works, but it’s unnecessarily complex without adding any notable benefits. It’s unlikely that we will ever want to perform an isinstance check to distinguish between CustomRequest and the original Request, which is the main “perk” of using the class-based approach.

Indeed, we could do just as well with a function:

    def CustomRequest(url, data=None, headers=None, *args, **kwargs):
        headers = headers or {}
        headers.setdefault('User-Agent', 'MyAwesomeApplication')
        return urllib2.Request(url, data, headers, *args, **kwargs)

Note how the usage doesn’t even change, thanks to Python treating classes like any other callables. Also, notice the reduced number of underscores ;)
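For readers on Python 3, where urllib2 has become urllib.request, the same factory function might look roughly like the sketch below. The snake_case name custom_request and the example URL are assumptions made for illustration, and copying the headers dict is an extra precaution that the snippet above does not take.

```python
import urllib.request


def custom_request(url, data=None, headers=None, *args, **kwargs):
    """Build a urllib.request.Request with a default User-Agent header."""
    # Copy, so that the caller's dict is never modified.
    headers = dict(headers or {})
    headers.setdefault('User-Agent', 'MyAwesomeApplication')
    return urllib.request.Request(url, data, headers, *args, **kwargs)


# The result is a plain Request object; nothing is sent over the network here.
request = custom_request("http://example.com/")
print(isinstance(request, urllib.request.Request))  # True
```

Since the function returns an ordinary Request, any code that accepts one (urlopen included) should not be able to tell the difference.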

Patching-in methods

Even if the method we want to override is not __init__, it might still make sense not to do it through inheritance. Python allows us to add or replace methods of specific objects simply by assigning them to some attribute. This is commonly referred to as monkey patching, and it lets us more or less transparently change the behavior of most objects once they have been created:

    import logging
    import functools

    def log_method_calls(obj, method_name):
        """Wrap the given method of ``obj`` object with log calls."""
        method = getattr(obj, method_name)

        @functools.wraps(method)
        def wrapped_method(*args, **kwargs):
            logging.debug("Calling %r.%s with args=%s, kwargs=%s",
                          obj, method_name, args, kwargs)
            result = method(*args, **kwargs)
            logging.debug("Call to %r.%s returned %r",
                          obj, method_name, result)
            return result

        setattr(obj, method_name, wrapped_method)
        return obj

You will likely say that this looks more hackish than using inheritance and/or decorators, and you’ll be correct. In some cases, though, this might be the right thing. If the solution for the moment is indeed a bit hacky, “disguising” it in a seemingly more mature and idiomatic form is unwarranted pretension. Sometimes a hack is fine as long as you are honest about it.
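To see what the patching actually does, here is a small self-contained sketch. It repeats the helper from above so that it runs on its own; the Greeter class is hypothetical and exists only for the demonstration.

```python
import functools
import logging


def log_method_calls(obj, method_name):
    """Wrap the given method of ``obj`` with log calls (same idea as above)."""
    method = getattr(obj, method_name)

    @functools.wraps(method)
    def wrapped_method(*args, **kwargs):
        logging.debug("Calling %r.%s with args=%s, kwargs=%s",
                      obj, method_name, args, kwargs)
        result = method(*args, **kwargs)
        logging.debug("Call to %r.%s returned %r", obj, method_name, result)
        return result

    setattr(obj, method_name, wrapped_method)
    return obj


class Greeter(object):
    """Hypothetical class, used only to demonstrate the patching."""

    def greet(self, name):
        return "Hello, %s!" % name


patched = log_method_calls(Greeter(), "greet")
untouched = Greeter()

# The wrapper lives in the instance's own __dict__, shadowing the class method.
print("greet" in vars(patched))    # True
print("greet" in vars(untouched))  # False
print(patched.greet("world"))      # Hello, world!
```

Only the one patched object changes; other instances of the same class keep the original method, which is exactly what makes this lighter than subclassing.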

Plain old data objects

Coming to Python from a stricter language, like C++ or Java, you may be tempted to construct types such as this:

    #include <string>
    #include <vector>
    #include <boost/algorithm/string.hpp>

    /*
     * Represents a MIME content type, e.g. text/html or image/png.
     */
    class ContentType {
    private:
        std::string _major, _minor;
    public:
        ContentType(const std::string& major, const std::string& minor)
            : _major(major), _minor(minor) { }
        ContentType(const std::string& spec) {
            std::vector<std::string> parts;
            boost::split(parts, spec, boost::is_any_of("/"));
            // at() throws if the spec does not contain a slash
            _major = parts.at(0); _minor = parts.at(1);
        }

        const std::string& Major() const { return _major; }
        const std::string& Minor() const { return _minor; }

        operator std::string () const {
            return Major() + "/" + Minor();
        }
    };

    // usage
    ContentType plainText("text/plain");
    sendMail("Hello", "Hello world!", plainText);

The idea is to encapsulate some common piece of data and pass it along in a uniform way. In compiled, statically typed languages this is a good way to make the type checker work for us to eliminate certain kinds of bugs and errors. If we declare a function to take ContentType, we can be sure we won’t get anything else. As a result, once we convert the initial string (like "application/json") into an object somewhere at the edge of the system, the rest of it can be simpler: it doesn’t have to bother with strings anymore.

But in dynamically typed, interpreted languages you can’t really extract such benefits because there is no compiler you can instruct to do your bookkeeping. Although you are perfectly allowed to write analogous classes:

    class ContentType(object):
        """Represents a MIME content type, e.g. text/html or image/png."""
        def __init__(self, major, minor=None):
            if minor is None:
                self._major, self._minor = major.split('/')
            else:
                self._major = major
                self._minor = minor

        major = property(lambda self: self._major)
        minor = property(lambda self: self._minor)

        def __str__(self):
            return '%s/%s' % (self.major, self.minor)

there is no real benefit in doing so. Since you cannot be bulletproof-sure that a function will only receive objects of your type, a better solution (some would say “more pythonic”) is to keep the data in original form, or a simple form that is immediately usable. In this particular case a raw string will probably do best, although a tuple ("text", "html") – or better yet, namedtuple – may be more convenient in some applications.
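The namedtuple variant could be sketched as follows. The names ContentType and parse_content_type are chosen here to mirror the class above; they are illustrative, not a prescribed API.

```python
from collections import namedtuple

# Hypothetical names, chosen to mirror the class above.
ContentType = namedtuple('ContentType', ['major', 'minor'])


def parse_content_type(spec):
    """Split a string such as 'text/html' into a ContentType tuple."""
    major, minor = spec.split('/', 1)
    return ContentType(major, minor)


content_type = parse_content_type('text/html')
print(content_type.major)                # text
print('/'.join(content_type))            # text/html
print(content_type == ('text', 'html'))  # True
```

The result still behaves like a plain tuple, so code that only expects a pair of strings keeps working, while the named fields give the readability of the class version.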

So indeed…

…stop writing classes. Not literally all of them, of course, but always be on the lookout for alternatives. More often than not, they tend to make code (and life) simpler and easier.

Author: Xion, posted under Programming

Hello World Fallacy

2012-08-14 19:50

These days you cannot take more than a few steps on the Web before tripping over yet another wonderful framework, technology, library, platform… or even language. More often than not they promise heaven and stars: ease of use, flexibility, scalability, performance, and so on. Most importantly, they almost always emphasize how easy it is to get started and have working, tangible results – sometimes even whole apps – in a very short time.

In many cases, they are absolutely right. With just the right tools, you can make some nice stuff pretty quickly. True, we’re still far from a scenario where you simply choose the features you’d like to have, with them blending together automatically – even if some folks are making serious leaps in that direction.
But if you think about it for a moment, it’s not something that we actually want, for reasons that are pretty obvious. The less effort is needed to create something, the less value it presents, all other things being equal. We definitely don’t expect to see software development reduced to a rough equivalent of clicking through Windows wizards, because everything produced like that would be just hopelessly generic.

But think how easy it would be to get started with that

And thus we come to the titular issue, which I took the liberty of calling the “Hello World” Fallacy. It occurs when a well-meaning programmer tries out a new piece of technology and finds how easy it is to do simple stuff in it. Everything seems to fall into place: tutorials are clear, to the point and easy to follow; results appear quickly and are rather impressive; difficulties or setbacks are few and far between. Everything just goes extremely well. What is the problem, then?

The problem lies in a sort of “halo effect” those early successes are likely to create. While surveying a new technology, it’s extremely tempting to look at the early victories as a useful heuristic for evaluating the solution as a whole. We may think the way a particular tech makes it easy to produce relatively simple apps is a good indicator of how it would work for bigger, more complicated projects. It’s about assuming a specific type of scalability: not necessarily tied to performance under the heavy load of thousands of users, but to the size and complexity of the system handling it.

Point is, your new technology may not really scale all that well. What makes it easy to pick up, among other things, is how well it fits the simple use cases you will typically exercise when you are just starting out. But this early adequacy is not evidence of an ability to scale to bigger, more serious applications. If anything, it might constitute a feasible argument for the contrary. Newbie-friendliness often goes against long-term usability for more advanced users; compare, for example, the “intuitive” Ribbon UI introduced in a relatively recent version of Microsoft Office to its previous, much more powerful and convenient interface. While I don’t claim it’s a pure zero-sum game, I think catering to beginners and experts alike is surely more difficult than addressing the needs of only one target audience. The former is definitely a road less traveled.

When talking about software libraries or frameworks, the ‘expert’ would typically be a developer using the tech for a large, long-term project. They are likely to explore most of the nooks and crannies, often hitting brick walls that at first may even appear impassable. For them, the most important quality of a software library is its “workaroundability”: how well it performs at not getting in the way between the programmer and the job done, and how hackable it is – i.e. how susceptible to having its limits stretched beyond what the authors originally intended.

This quality is hardly evident when you’ve only done a few casual experiments with your shiny new package. General experience can help a great deal with arriving at an unbiased conclusion, and so can explicit knowledge of the whole issue. While it’s beyond my limited powers to help you significantly with the former, I can at least gently point to the latter.

Happy hacking!

You Are Smarter than Quantum Physicists

2012-07-31 22:13

Fairly recently, I started reading up on quantum mechanics (QM) to brush up on my understanding of the topic and, quite surprisingly, I’ve found it rife with analogies to my usual interest: software development. The one that stands out particularly well relates to the very basics of QM and the way they were widely misunderstood for many decades. What’s really amusing here is that while the majority of physicists seem to have been easily fooled by how the world operates on the quantum level, any contemporary half-decent software engineer, faced with problems of a very similar nature, typically doesn’t exhibit folly of this magnitude.

We are not uncovering the Grand Scheme of Things every day, of course; what I’m saying is that we seem to be much less likely to come up with certain extremely bad answers to all the why? questions we encounter constantly in our work. Even the really hard ones (“Why-oh-why doesn’t it work?!”) are rarely different in this regard.

Thus I dare say that we would not be so easily tricked by some “bizarre” phenomena that fooled many of the early QM researchers. In fact, they turn out to be perfectly reasonable (and rather simple) if we look at them with a programmer’s mindset. The hard part, of course, is to discover that such a perspective applies here, instead of quickly jumping to “intuitive” but wrong conclusions.

To see how tempting that jump can be, we should now look at one simple experiment with light and mirrors, and try to decipher its puzzling results.

A story of shy photons

The setup is not very complicated. We have one light source, two detectors and two pairs of mirrors. One pair consists of standard, fully reflective mirrors. The second pair has half-silvered ones; they reflect only half of the light, letting the other half through without changing its direction.
We arrange this equipment as shown in the following picture. Here, the yellow lines depict the path the light takes after being emitted from the source, somewhere beyond the left edge.

Source of this and subsequent images

But in this experiment, we are not letting out a continuous ray of light. Instead, we send out individual photons. We know (from some previous observations) that half-silvered mirrors are still behaving correctly in this scenario: they just reflect a photon about 50% of the time. Normal mirrors, obviously, are always reflecting all the photons.

Knowing this, we would expect both detectors to go off with roughly similar frequency. What we find out in practice is that only detector 2 is ever registering any photons, and no particle whatsoever reaches detector 1, at any time. (This is illustrated by a dashed line).

At this point we might want to perform a sanity check, to see whether we are really dealing with individual particles (rather than waves that can interfere and thus cancel themselves out). So, we block out one of the paths:

and now both detectors are going off, but not simultaneously. This indicates that our photons are indeed localized particles, as they appear to be only in one place at a time. Yet, for some weird inexplicable reason, they don’t show at detector 1 if we remove the barrier.

There are all sorts of peculiar conclusions we could come up with already, including that the mere possibility of a photon going both ways has an effect on the results we observe. Let’s try not to be crazy just yet, though. Surely we can establish which one of the two paths is actually being taken; it’s just a matter of adding an additional sensor:

So we do just that, and we turn on the machinery again. What we find out, however, is far from a definite answer. Actually, it’s the total opposite: both detectors are going off now, just like in the previous setup – but we haven’t blocked anything this time! We just wanted to take a sneak peek and learn about the actual paths that our photons are taking.

But as it turns out, we are now preventing the phenomenon from occurring at all… What the hell?!
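For readers who like to check such things numerically, the three outcomes above can be reproduced with a toy model based on complex amplitudes. The rules used here are an assumption (one common textbook convention), not something stated in the text: a half-silvered mirror scales the amplitude by 1/sqrt(2) and additionally multiplies it by i on reflection, and a full mirror multiplies it by i. The path labels 'A' and 'B' and the assignment of detector numbers are likewise made up for this sketch.

```python
import math

# Toy rules (an assumption, not taken from the text above):
#   - a half-silvered mirror scales the amplitude by 1/sqrt(2) either way
#     and additionally multiplies it by i when the photon is reflected;
#   - a full mirror multiplies the amplitude by i.
HALF = 1 / math.sqrt(2)
I = 1j


def detector_probabilities(blocked_path=None, which_path_known=False):
    """Return (P(detector 1), P(detector 2)) for the toy interferometer.

    ``blocked_path`` may be None, 'A' or 'B' (hypothetical path labels).
    """
    # Amplitude with which each path arrives at the second half-silvered mirror.
    arriving = {
        'A': HALF * I,      # passed the first half-mirror, then hit a full mirror
        'B': HALF * I * I,  # reflected by the first half-mirror, then a full mirror
    }
    if blocked_path is not None:
        del arriving[blocked_path]

    # Path A reaches detector 1 by passing through the last half-mirror and
    # detector 2 by being reflected; for path B it is the other way round.
    to_detector_1 = {'A': HALF, 'B': HALF * I}
    to_detector_2 = {'A': HALF * I, 'B': HALF}

    results = []
    for weights in (to_detector_1, to_detector_2):
        contributions = [arriving[p] * weights[p] for p in arriving]
        if which_path_known:
            # Distinguishable paths: probabilities add.
            probability = sum(abs(c) ** 2 for c in contributions)
        else:
            # Indistinguishable paths: amplitudes add first, then get squared.
            probability = abs(sum(contributions)) ** 2
        results.append(round(probability, 2))
    return tuple(results)


print(detector_probabilities())                       # (0.0, 1.0)
print(detector_probabilities(blocked_path='B'))       # (0.25, 0.25)
print(detector_probabilities(which_path_known=True))  # (0.5, 0.5)
```

In the blocked case the remaining half of the photons is simply absorbed by the barrier, which is why the two probabilities do not add up to one.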

Author: Xion, posted under Computer Science & IT, Science, Thoughts


Cookies!

2012-07-24 13:39

The term ‘cookie’ should be familiar not only to programmers, but also to many of the more conscious, ordinary computer users. I’m not, unfortunately, talking about the sweet pastry, but about HTTP cookies.

Those cookies that we all know and love (if we’re web developers) or hate (if we’re overly paranoid about privacy) are not the only thing in computing to be known under this name, however. I learned this quite recently when talking to a friend of mine who works in the realm of IT security. As it turns out, ‘cookie’ can refer to quite a diverse array of solutions, all unified by a similar underlying concept.

Author: Xion, posted under Computer Science & IT

Python Optimization 101

2012-07-14 21:17

It’s pretty much assumed that if you’re writing Python, you are not really concerned with the performance and speed of your code, provided it gets the job done in a sufficiently timely manner. The benefits of using such a high-level language usually outweigh the cons, so we’re at ease with sacrificing some of the speed in exchange for other qualities. The feasibility of this trade is always relative, though, and depends entirely on the tasks at hand. Sometimes the ‘sufficiently fast’ bar might be hung quite high up.

But while some altitudes are clearly beyond Python’s reach – like real-time software on embedded systems – it doesn’t mean it’s impossible to write efficient code. More importantly, it’s almost always possible to write more efficient code than we currently have; the nimble domain of optimization has its subdivision dedicated specifically to Python. And quite surprisingly, performance-tuning at this high level of abstraction often proves to be even more challenging than squeezing nanoseconds out of bare metal.

So today, we’re going to look at some basic principles of optimization and good practices targeted at writing efficient Python code.

Tools of the trade

Before jumping into specific advice, it’s essential to briefly mention a few standard modules that are indispensable when doing any kind of optimization work.

The first one is timeit, a simple utility for measuring the execution time of snippets of Python code. Using timeit is often one of the easiest ways to confirm (or refute) our suspicions about insufficient performance of a particular piece of code. timeit helps us in a straightforward way: by executing the statement in question many times and reporting how long that took, either in total or per single execution.
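As a minimal sketch of how that might look in practice, the snippet below times two ways of building the same list. Both code fragments are made up for this example, and the actual numbers will of course depend on the machine and interpreter.

```python
import timeit

# Two hypothetical candidates for building the same list of squares.
loop_version = """
result = []
for i in range(1000):
    result.append(i * i)
"""
comprehension_version = "result = [i * i for i in range(1000)]"

# timeit.timeit() returns the total time, in seconds, of `number` executions.
runs = 1000
loop_total = timeit.timeit(loop_version, number=runs)
comprehension_total = timeit.timeit(comprehension_version, number=runs)

print("loop:          %.6f s per run" % (loop_total / runs))
print("comprehension: %.6f s per run" % (comprehension_total / runs))
```

The same measurement can be taken from the shell with python -m timeit followed by the statement in quotes, which picks the number of repetitions automatically.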

As for more detailed analysis, the profile and cProfile modules can be used to gain insight into the CPU time consumed by different parts of our code. Profiling a statement will yield some vital data about the number of times any particular function was called, how much time a single call takes on average, and how big the function’s impact is on overall execution time. This is essential information for identifying bottlenecks and therefore ensuring that our optimizations are correctly targeted.
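A small profiling session could be sketched like this. The functions slow_square_sum and workload are hypothetical stand-ins for real application code; only the cProfile and pstats calls are the point here.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Deliberately naive helper (hypothetical), so there is something to measure."""
    return sum([i * i for i in range(n)])


def workload():
    return [slow_square_sum(2000) for _ in range(200)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(5)
print(report.getvalue())
```

The printed table has one row per function, with columns for the call count (ncalls), the time spent in the function itself (tottime) and the time including everything it called (cumtime).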

Then there is the dis module: Python’s disassembler. This nifty tool allows us to inspect the instructions that the interpreter is executing, with handy names translated from the actual binary bytecode. Compared to actual assembly or even code executed on the Java VM, Python’s bytecode is pretty trivial to analyze:

    >>> import dis
    >>> dis.dis(lambda x: x + 1)
      1           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_ADD
                  7 RETURN_VALUE

Getting familiar with it proves to be very useful, though, as eliminating slow instructions in favor of more efficient ones is a fundamental (and effective) optimization technique.
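The listing can also be obtained programmatically, which is handy when comparing two variants of a function. The sketch below uses dis.get_instructions, available in Python 3.4 and later; the function increment is just an example.

```python
import dis


def increment(x):
    return x + 1


# Instruction names differ between interpreter versions, so no particular
# output is assumed here; the loop just lists whatever this interpreter emits.
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]
for name in opnames:
    print(name)
```

Running this on different Python versions is a quick way to see how the bytecode has evolved since the listing above was produced.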

Author: Xion, posted under Computer Science & IT

jqpm: Package Manager for jQuery

2012-07-07 15:05

There is a specific technology I wanted to play around with for some time now; it’s called node.js. It also happens that I think the best way to get to know new stuff is to create something small, but complete and functional. Note that by ‘functional’ I don’t really mean ‘practical’; that distinction is pretty important, given what I’m about to present here.

Basically, I wrote a package manager for jQuery. The idea was to have a straightforward way to install jQuery plugins – a way that somewhat mirrors the experience of dozens of other package managers, from pip to cabal. The end result looks pretty decent, in my opinion:

    $ jqpm install flot
    [jqpm] flot installed successfully
    $ ls *.js
    jquery.flot.js

The funny part? It doesn’t use any central, remote registry of plugins. What it does is search GitHub and pull code directly from there – provided it is able to find something relevant that looks like a jQuery plugin. That seems to work well for quite a few popular ones, which is rather surprising given how silly and simplistic the underlying algorithm is. Certainly, there’s plenty of room for improvement, including support for jquery.json manifests – the future standard for the upcoming official plugin site.

As I said before, though, the main purpose of jqpm was an educational one. After toying with the underlying technologies for a couple of evenings, I definitely have a better perspective from which to evaluate their usefulness. While the topic might warrant a follow-up post in the future, I think I can briefly summarize my findings in a few bullet points:

  • Node’s JavaScript is almost the same language you can find in your browser, with all of its wats, warts and shortcomings. That’s not a big problem if you have already learned to deal with them, but I surely wouldn’t recommend it as a starter language for novices. Additionally, it also turns out to be quite a verbose language, with all the ubiquitous functions and loops, and without denser syntactic sugar such as list comprehensions.
  • By contrast, the standard library of Node is a very nice mixture of usefulness and minimalism. It’s certainly not as rich as Python’s or Java’s, but it’s more than usable, despite sitting a bit on the low-level side.
  • The canonical tool for managing dependencies, npm, is a rather curious creature. Combined with the way Node resolves require() calls, it makes for an unusual system that resembles classic C/C++ #includes – but improved, of course. What stands out the most is the lack of virtualenv/rvm-style utilities; an equivalent approach of local node_modules subfolders is used instead. (npm faq and npm help folders provide a more elaborate explanation of how exactly it works).
  • The callback-based, asynchronous computation is a big hindrance that doesn’t really seem worthwhile. Intriguingly, the hassles of async vs. sync feel strangely similar to issues with pure vs. impure code in functional languages such as Haskell; in both cases you need some serious refactoring of brainware to start coding effectively. In Haskell, however, you are gaining tremendous boons to correctness, modularization, parallelization and testability. In Node, it’s disputable whether you actually gain anything: the whole idea of I/O based on a single event loop sounds all too similar to what an operating system already does with threads sleeping on I/O calls and hardware interrupts that wake them. Granted, this incarnation of asynchronous I/O is much better than some older ones, but that’s mostly thanks to JavaScript being much better equipped to handle the callback bonanza than plain ol’ C.
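The node_modules lookup mentioned in the third point can be illustrated with a short sketch, written in Python for consistency with the other examples here. It only models the order in which folders are tried, as I understand it from Node’s documentation, and deliberately ignores core modules, relative paths and package.json handling; the function name and the example path are made up.

```python
from pathlib import PurePosixPath


def node_modules_candidates(start_dir):
    """List the node_modules folders that would be searched, nearest first."""
    start = PurePosixPath(start_dir)
    candidates = []
    for directory in [start] + list(start.parents):
        # A folder that is itself called node_modules is not searched
        # for a nested node_modules/node_modules.
        if directory.name == 'node_modules':
            continue
        candidates.append(str(directory / 'node_modules'))
    return candidates


for path in node_modules_candidates('/home/user/project/lib'):
    print(path)
# /home/user/project/lib/node_modules
# /home/user/project/node_modules
# /home/user/node_modules
# /home/node_modules
# /node_modules
```

Because the nearest folder wins, every project (and every dependency) can carry its own copies of the packages it needs, which is what makes virtualenv-style tools unnecessary.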

The bottom line: node.js is definitely not a cancer and has many legitimate uses, mostly pertaining to rapid transfer of relatively small pieces of data over the Internet. API backends, single page web applications or certain game servers all fall easily into this category.

From a developer’s point of view, it’s also quite a fun platform to code in, despite the asynchronous PITA mentioned above (which is partially alleviated by libraries like async.js or frameworks providing futures/promises). On the overall abstraction ladder, I think it can be placed noticeably lower than Java and not very much higher than plain C. That place is an interesting one, and it’s also not densely populated by similar technologies and languages (only Go and Objective-C come to mind). Occupying this mostly overlooked niche could very well be one of the reasons for Node’s recent popularity.

Author: Xion, posted under Internet, Programming, Thoughts

© 2017 Karol Kuczmarski "Xion".