Archive for Programming

Infinitely Extensible Projects

2012-10-23 14:10

When diving into new language, or radically different framework, it may be a good idea to have a bigger project where you can apply your newfound skills. In my experience, this is typically better than having a lot of smaller ones, because it minimizes the hassle of project’s initial setup. Therefore, it encourages you to experiment more.

To reap the largest benefits of this approach, the project of choice should exhibit two important properties:

  • It must be easy to get started. Note that it doesn’t necessarily mean a complete programming newbie should be able to code the initial scaffolding in one session. But getting started must be easy for you specifically, so that you can dabble in interesting stuff almost right away.
    What it means exactly is dependent on your overall coding experience, and also on comparative difficulty of whatever you are trying to learn. For those taking their first steps in programming as a whole, extending their first Hello world program might be appropriate. However, if you are learning your fourth of fifth language you can aim for something a tad more ambitious.
  • It should offer practically infinite possibilities of extension. The idea is for the project to grow along with skills and knowledge you acquire, enabling you to try ever more things.
    In normal development, this typically results in feature creep and is best avoided. But experimental, exploratory, educational coding does not really have to be concerned with such notions. Of course, if you can pack your learning experiences into a usable program then double kudos to you.

What types of projects fit into this characteristic? I’d say quite a lot of them.

When I was honing my Python skills, I started programming an IRC bot so that I could cram a few ideas into it rather quickly. They were implemented mostly as commands that users could input and have the bot perform some actions, like searching Wikipedia for any given term.
A similar pattern (collection of mostly independent commands) can be realized in many different scenarios. Aspiring web programmers could come up with something like a YubNub clone (bonus points if it allows users to add their own commands). Complete coding novices would probably have to resort to simple, menu-based programs in terminal instead.

Another option is to attack a problem which is a very broad and/or vague. Text editors, for example, fuel countless discussions (even wars) over what functionality should they contain and how it should be accessible in the UI. Chances are slim that your take on the problem sprouts a new Emacs or Vim, but a home-brewed editor is easy enough to start and obviously extensible, almost without limits. Additionally, editors can fit into pretty much any environment, from terminals to desktop UIs or HTML5 applications.

Some endeavors are a bit more specific, though. In web development, a CMS or blogging engine became something of a timeless classic now. Everyone has written one at some point, and there’s a lot of additional (thought not always useful) functionality that can be added to it. Getting the basics right is also a challenge here, especially from security standpoint.

For mobile app creators, the infamous To-Do list app is an idea exercised ad nauseam. But it’s actually a good playground for toying with various device capabilities (e.g. location-based reminders) or web services (like Google or iCloud calendar).

More?…

I’m pretty sure I’m far from exhausting the list of possibilities here. I cannot really speak for domains I have little-to-no experience with, for example embedded or hardware-oriented programming with equipment such as Arduino.

It should be possible to come up with infinitely extensible projects for almost every environment and platform, though. After all, every program has always one more feature to add ;)

Tags: ,
Author: Xion, posted under Applications, Programming » Comments Off on Infinitely Extensible Projects

Yummy YAML

2012-10-15 19:48

It won’t be a stretch to bet that you’ve heard of XML. The infamous markup format was intended to be easily parseable by machines, in addition to being readable by humans. Needless to say, it failed to deliver on either of these promises. Markup elements tend to obscure the actual data, while parsing it – with all its namespaces, !DOCTYPEs and ![CDATA[s – is convoluted and not exactly efficient.

Other formats have thus risen to popularity, out of which JSON is probably widest known and used. It also has excellent support on many platforms.

For the purpose of transporting data between Internet endpoints, or for various APIs offered by websites and services, it works pretty great. There are other applications, though, where it might be the obvious first choice – but not necessarily the best one.

What JSON does well is ease to read and parse: the syntax can be outlined and defined in few paragraphs. However, at the same it’s not that convenient to write.

Unlike actual JavaScript, it needs quotes around keynames, even if they would pass as identifiers. (And by quotes I mean double-quotes, also where apostrophes would be better). Furthermore, it requires keeping track of separators between key-value pairs or array elements, not allowing to have a handy trailing comma at the end.

And lastly, there is no good support for longer texts due to lack of multi-line string literals.

Enter YAML

There is another, lesser known format which addresses these concerns very well. It’s called YAML, recursively from YAML Ain’t Markup Language. Here’s a short sample:

  1. name: John Smith
  2. birth date:
  3.   day: 10
  4.   month: 11
  5.   year: 1982
  6. phone:
  7. - type: mobile
  8.   number: 1123581321
  9. - type: work
  10.   number: 0918237465
  11. friends:
  12. - Robert Jones
  13. - Jane Taylor

At first sight it probably appears as pythonic (or haskellish) counterpart to the C-like JSON. Indentation does indeed matter, at least in the most popular and powerful variant of YAML syntax.

However, giving this bit of significance to whitespace allows to substantially reduce syntactic clutter. There are no curly or square brackets, and no commas. For the most part, there’s not much use for quotes either: strings are easily recognized as both keys and values, even if they contain spaces. Arrays are also supported in a straightforward way: by listing their elements with a leading dash (-), including cases where they are key-value objects themselves.

There’s even support for long strings that span multiple lines:

  1. almost_limerick: |  
  2.     There once was a man from the sticks
  3.     Who liked to compose limericks.
  4.         But he failed at the sport,
  5.         For he wrote 'em too short.

Here, pipe character (|) instructs parsers to preserve newlines and most of the whitespace, excluding the leading indent. Were it replaced with greater-than sign (>), an HTML-like fold would be performed, converting chains of whitespace into a single space.

Parsing the thing

Simple YAML documents represent a tree of keys and values which is much the same as the one produced from JSON files. For some programming languages, this similarity extends to the parsing libraries, as they offer more or less the same interface:

  1. def parse_file(path, parser):
  2.     with open(path, 'r') as f:
  3.         return parser.load(f)
  4.  
  5. import json, yaml
  6. print parse_file('data.json', parser=json)
  7. print parse_file('data.yaml', parser=yaml)

Even if it’s not the case for your language, it’s quite likely there is a YAML parser readily available.

More features

Funny thing about YAML is that its glaringly simple syntax hides very powerful and flexible format.

For example, the values are actually typed. They can be strings or nulls, but also integers, floats or booleans. Syntactic rules make a very good job here at automatically detecting the types; for instance, the word Yes will be recognized as boolean truth if standing on its own, but a text starting with it will be correctly recognized as string.

The other nifty feature is referencing. Parts of YAML document tree can be given labels, while other parts can later use those labels to point back to specific nodes. The whole structure can therefore morph into something more general than just a tree:

  1. triangle_graph: &first_vertex
  2. - label: First vertex
  3.   adjacent_to:
  4.   - label: Second vertex
  5.     adjacent_to:
  6.     - label: Third vertex
  7.       adjacent_to:
  8.      - *first_vertex

With types (incl. custom ones) and references, YAML can actually serve pretty well as serialization format for persisting objects.

Uses?…

But besides that, what YAML is actually good for?

I have hinted in the beginning that it seems like a decent choice for structured text data which is meant to be hand-edited. Various configuration files fall into this category, as well as some datasets around the size of contact list. I used YAML as config format for my IRC bot and it worked very well for this purpose. I’m also using it to store initialization data for database used by another side project of mine.

So, if you are not exercising YAML in any of your current endeavors, I’m encouraging you to give it a try. It might not be the best thing since sliced bread, but it’s very pleasant format to work with.

Tags: , ,
Author: Xion, posted under Applications, Programming » 2 comments

Unintended Consequences

2012-10-02 12:13

In Unix-like systems, files and directories with names starting from ‘.’ (dot) are “hidden”: they don’t appear when listed by ls and are not shown by default in graphical file browsers. It doesn’t seem very clear why there is such a mechanism, especially when we have extensive chmod permissions and attributes which are not tied to filename. Actually, that’s one of distinguishing features of Unix/Linux: there are neither .exe nor .app files, just chmod +x.

But, here it is: name-based visibility control for files and directories. Why such a thing was ever implemented in the first place? Well, it turns out it was purely by accident:

Long ago, as the design of the Unix file system was being worked out, the entries . and .. appeared, to make navigation easier. (…) When one typed ls, however, these entries appeared, so either Ken [Thompson] or Dennis [Ritchie] added a simple test to the program. It was in assembler then, but the code in question was equivalent to something like this:

  1. if (name[0] == '.') continue;

That test was meant to filter out . and .. only. Unintentionally, though, it ruled out a much bigger class of names: all that start with a dot, because that’s what it actually checked for. Back then it probably seemed like an innocuous detail. Fast-forward a couple of decades and it’s a de facto standard for storing program-specific data inside user’s home path. They can grow quite numerous over time, too:

  1. $ ls -A1 ~ | grep '^\.' | wc -l
  2. 113

That’s over 100 entries, a vast majority of my home directory. It’s neither elegant nor efficient to have that much of app-specific cruft inside the most important place in the filesystem. And even if GUI applications tend to collectively use a single ~/.config directory, the tradition to clutter the root $HOME path is strong enough to persist for foreseeable future.

Heed this as a warning. In the event your software becomes a basis for many derived solutions, future programmers will exploit every corner case of every piece of logic you have written. It doesn’t really matter what you wanted to put into your code, but only what you actually did.

Tags: , , ,
Author: Xion, posted under Programming » 3 comments

Singleton is Bad Example

2012-09-20 16:59

Design patterns are often criticized, typically in the context of object-oriented programming. I buy into many such critiques, mostly because I value simplicity as one of the most important qualities of good code. Patterns – especially when overused – often stand in the way to achieve it,

Not all critique aimed towards design patterns is well founded and targeted, though. More specifically, the example I’ve seen brought up quite often is the Singleton pattern, and I don’t think it’s a good one in this context. Actually, for making a case for design patterns being (sometimes) harmful, the singleton is probably one of the worst picks.
Realizing this is important, because whatever point you’re trying to convey will be significantly watered down if you use an inadequate example. It’s just too easy to make up counterarguments or excuses, concentrating on specific flaws of your sloppy choice, rather than addressing more general issues you wanted to put some light on. A bad example can simply be a red herring, drawing attention from the topic you wanted it to stand for.

What’s so bad about singleton pattern, though?

It’s not representative

Especially in their classic incarnation formulated in famous work of Gang of Four, design patterns are mostly about increasing robustness and flexibility of software design by introducing additional layers of indirection between existing concepts. For instance, you can consider the Factory pattern as proxy that separates the process of creating an object from specific type (class) of that object.

This goes along the same lines as separation between interface and implementation, a fundamental concept behind the whole object-oriented paradigm. The purpose is to decrease coupling, i.e. dependencies between different parts of the code, and it’s noble goal in its own regard.

Unfortunately, the Singleton pattern doesn’t really aid us in this pursuit. Quite the opposite: it talks about having at most one single instance of some class, which will easily make it a choke point for many otherwise independent parts of program logic. It happens especially often with top-level objects, representing whole subsystems; thanks to making them into singletons, they end up being used almost everywhere.

It has lots of its own problems

We also shouldn’t forget what singletons really are – that is, global variables. (You can have singletons with more limited scope, of course, but OO languages typically support them as language feature that doesn’t require dedicated design pattern). The pattern attempts to abstract them away but they tend to leak out rather eagerly, causing numerous problems.
Indeed, there are all sorts of nastiness related to global variables, with these two being – in my opinion – the most important ones:

  • They play badly when concurrency is involved. Global state is the prime cause of difficulties in concurrent programming, which is why one of the possible solutions is just doing away with state altogether (see functional programming). Regardless whether you are willing to go to such extremes, it’s indisputable that global variables in concurrent setting are liability to be minimized, if not straight out eliminated.
  • They complicate automated testing. Modules that use global variables cannot really be unit tested, because they have (implicit) interdependencies with other parts of the code that use them. More importantly, it can be hard to replace global singletons with their mocked (“fake”) versions for the purpose of testing.

It is worth noting that these problems are somewhat language-specific. In several programming languages, you can relatively easily create “global” variables which are only apparent; in reality, they proxy to thread-local and/or mockable objects, addressing both concerns outlined above.
However, in such languages the Singleton pattern is often obsolete as explicit technique, because they readily provide it as part of the language. For example, Python module objects are already singletons: their singularity is guaranteed by interpreter itself.

Try something else

So, if you are to discuss the merits of software design patterns: pros and (specifically) cons, make sure you don’t base your whole argumentation on the example of Singleton. Accuracy, integrity and honesty would require choosing a target which is more representative and has no severe, unrelated issues.

Something like, say, Iterator. Or Factory. Or Composite.
Or pretty much anything else.

Essential Coding Activities (That I No Longer Do)

2012-09-13 18:32

Writing code is not everything there is in programming. But writing code comprises of much more than just typing it in. There is compiling or otherwise building it; running the application to see whether it works how it breaks; and of course debugging to pinpoint the issue and fix it. These are inherent parts of development process and we shouldn’t expect to be skipping them anytime soon…

Well, except that right now, I virtually pass on all of them. Your mileage may vary, of course, but I wouldn’t be surprised if many more developers found themselves in this peculiar position. I actually think this might be a sign of times, and that we should expect more changes in developer’s workflow that head in this very direction.

So, how do you “develop without developing”? Let’s look at the before mentioned activities one by one.

Compiling

Getting rid of the build step is not really inconceivable. There are plenty of languages that do not require additional processing prior to running their code. They are called interpreted languages, and are steadily gaining grounds (and hype) in the programming world for quite some time now.
Python, JavaScript and Ruby are probably the most popular among them. Given that I’m currently doing most of my development work in the first two, it’s no wonder I don’t find myself compiling stuff all that often.

But even if we’re talking about traditional, de facto compiled languages (like Java or C++), there’s still something missing. It’s the fact that you don’t often have to explicitly order your IDE to compile & build your project, because it’s already doing it, all the time.
I feel there’s tremendous productivity gain by shortening the feedback loop and having your editor/IDE work with you as your write the code. When you can spot and correct simple mistakes as you go, you end up having more time and cognitive power for more interesting problems. This background assistance is something that I really like to have at all times, therefore I’ve set it up in my editor for Python as well.

Running

The kind of programs I’m writing most often now – server-side code for web applications and backends – does not require another, seemingly necessary step all that often: running the app. As it stands, their software scaffolding is clever enough to detect changes in runtime and automatically reload program’s code without explicit prompting.

Granted, this works largely because we’re talking about interpreted languages. For compiled ones, there are usually many more hurdles to overcome if we want to allow for hot-swapping code into and out of a running program. Still, there are languages that allow for just that, but they are usually chosen because of reliability requirements for some mission critical systems.

In my opinion, there are also significant programming benefits if you can pull it off on your development machine. They are again related to making the cycle of writing code and testing it shorter, therefore making the whole flow more interactive and “real-time”. As of recently, we can see some serious pushes into this very direction. Maybe we will see this approach hitting mainstream soon enough.

Debugging

“Oh, come on”, you might say, “how can you claim you’ve got rid of debugging? Is all your code always correct and magically bug-free?…”

I wish this was indeed true, but so far reality refuses to comply. What I’m referring to is proactive debugging: stepping though code to investigate the state of variables and objects. This is done to verify whether the actual control flow of particular piece of code is the one that we’ve really intended. If we find a divergence, it might indicate a possible cause for a bug we’re trying to find and fix.

Unfortunately, this debugging ordeal is both ineffective and time consuming. It’s still necessary for investigating errors in some remote, test-forsaken parts of the code which are not (easily) traceable with other methods and tools. For most, however, it’s an obsolete, almost antiquated way of doing things. That’s mainly because:

  • Post-mortem investigation is typically more than enough. Between log messages and stacktraces (especially if they contain full frames with local variables), you’re often very capable of figuring out what’s wrong, and what part of the code you should be looking at. Once there, the fix should be evident… usually :)
  • Many (most?) fixes are made because of failed run of an automated test suite. In this scenario, you have even more information about the bug (like the exacting assert that fails), while the relevant part of the code is even easier to localize. You might occasionally drop into debugger to examine local variables of the test run, but you never really step through whole algorithms.
  • It’s increasingly easier to “try stuff” before putting it into project’s codebase, reducing the likelihood of some unknown factor making our program blow up. If we have a REPL (Read-Eval-Print Loop) to test small snippets of code and e.g. verify assumptions about API we’re going to use, we can eliminate whole classes of errors.

It’s not like you can throw away your Xdb completely. With generous logging, decent test coverage and a little cautiousness when adding new things, the usefulness of long debugging sessions is greatly diminished, though. It is no longer mandatory, or even typical part of development workflow.

Whatever else it may be, I won’t hesitate calling it a progress.

Tags: , , ,
Author: Xion, posted under Programming » 1 comment

Thinning Your Fat Fingers

2012-09-07 10:32

I often say I don’t believe programmers need to be great typists. No software project was ever late because its code couldn’t be typed fast enough. However, the fact that developer’s job consists mostly of thinking, intertwined with short outbursts of typing, means that it is beneficial to type fast, therefore getting back quickly to what’s really important.
Yet, typing code is significantly different game than writing prose in natural language (unless you are sprinkling your code with copious amount of comments and docstrings). I don’t suppose the skill of typing regular text fast (i.e. with all ten fingers) translates well into building screens of code listings. You need a different sort of exercise to be effective at that; usually, it just comes with a lot of coding practice.

But you may want to rush things a bit, and maybe have some fun in the process. I recently discovered a website called typing.io which aims to help you with improving your code-specific typing skills. When you sign up, you get presented with a choice of about dozen common languages and popular open source projects written in them. Your task is simple: you have to type their code in short, 15-line sprints, and your speed and accuracy will be measured and reported afterwards.

The choice of projects, and their fragments to type in, is generally pretty good. It definitely provides a very nice way to get the “feel” of any language you might want to learn in the future. You’ll get to see a lot of good, working, practical code written in it – not to mention you get to type it yourself :) Personally, I’ve found the C listings (of Redis data store) to be the most pleasant to both read and type, but it’s pretty likely you will have different preferences.

The application isn’t perfect, of course: it doesn’t really replicate the typical indentation dynamics of most code editors and IDEs. Instead, it opts for handling it implicitly, so the only whitespace you get to type is line and word break. You also don’t get to use your text navigation skills and clipboard-fu, which I’ve seen many coders leverage extensively when they are programming.
I think that’s fine, though, because the whole thing is specifically about typing. It’s great and pretty clear idea, and as such I strongly encourage you to try it out!

Tags: , ,
Author: Xion, posted under Applications, Programming » 1 comment

Don’t Write Classes

2012-08-29 11:17

On this year’s PyCon US, there was a talk with rather (thought-)provoking title Stop Writing Classes. The speaker might not be the most charismatic one you’ve listened to, but his point is important, even if very simple. Whenever you have class with a constructor and just one other method, you could probably do better by turning it into a single function instead.

Examples given in the presentation were in Python, of course, but the whole advice is pretty generic. It can be applied with equal success even to languages that are object-oriented to the extreme (like Java): just replace ‘function’ with ‘static method’. However, if we are talking about Python, there are many more situations where we can replace classes with functions. Often this will result in simpler code with less nesting levels.

Let’s see a few examples.

Inheriting for custom __init__

Sometimes we want to construct many similar objects that differ only slightly in a way their constructors are invoked. A rather simple example would be a urllib2.Request with some custom HTTP headers included:

  1. import urllib2
  2.  
  3. class CustomRequest(urllib2.Request):
  4.     def __init__(self, url, data=None, headers=None, *args, **kwargs):
  5.         headers = headers or {}
  6.         headers.setdefault('User-Agent', 'MyAwesomeApplication')
  7.         super(CustomRequest, self).__init__(url, data, headers, *args, **kwargs)
  8.  
  9. # usage
  10. request = urllib2.urlopen(CustomRequest("http://www.google.com")).read()

That works, but it’s unnecessarily complex without adding any notable benefits. It’s unlikely that we ever want to perform an isinstance check to distinguish between CustomRequest and the original Request, which is the main “perk” of using class-based approach.

Indeed, we could do just as well with a function:

  1. def CustomRequest(url, data=None, headers=None, *args, **kwargs):
  2.     headers = headers or {}
  3.     headers.setdefault('User-Agent', 'MyAwesomeApplication')
  4.     return urllib2.Request(url, data, headers, *args, **kwargs)

Note how usage doesn’t even change, thanks to Python handling classes like any other callables. Also, notice the reduced amount of underscores ;)

Patching-in methods

Even if the method we want to override is not __init__, it might still make sense to not do it through inheritance. Python allows to add or replace methods of specific objects simply by assigning them to some attribute. This is commonly referred to as monkey patching and it enables to more or less transparently change behavior of most objects once they have been created:

  1. import logging
  2. import functools
  3.  
  4. def log_method_calls(obj, method_name):
  5.     """Wrap the given method of ``obj`` object with log calls."""
  6.     method = getattr(obj, method_name)
  7.  
  8.     @functools.wraps(method)
  9.     def wrapped_method(*args, **kwargs):
  10.         logging.debug("Calling %r.%s with args=%s, kwargs=%s",
  11.                       obj, method_name, args, kwargs)
  12.         result = method(*args, **kwargs)
  13.         logging.debug("Call to %r.%s returned %r",
  14.                       obj, method_name, result)
  15.         return result
  16.  
  17.     setattr(obj, method_name, wrapped_method)
  18.     return obj

You will likely say that this look more hackish than using inheritance and/or decorators, and you’ll be correct. In some cases, though, this might be a right thing. If the solution for the moment is indeed a bit hacky, “disguising” it into seemingly more mature and idiomatic form is unwarranted pretension. Sometimes a hack is fine as long as you are honest about it.

Plain old data objects

Coming to Python from a more strict language, like C++ or Java, you may be tempted to construct types such as this:

  1. /*
  2.  * Represents a MIME content type, e.g. text/html or image/png.
  3.  */
  4. class ContentType {
  5. private:
  6.     std:string _major, _minor;
  7. public:
  8.     ContentType(const std::string& major, const std::string& minor)
  9.         : _major(major), _minor(minor) { }
  10.     ContentType(const std::string& spec) {
  11.         std::vector<string> parts;
  12.         boost::split(parts, spec, boost::is_any_of("/"));
  13.         _major = parts.at(0); _minor = parts.at(1);
  14.     }
  15.  
  16.     const std::string& Major() const { return _major; }
  17.     const std::string& Minor() const { return _minor; }
  18.  
  19.     operator std::string () const {
  20.         return Major() + "/" + Minor();
  21.     }
  22. };
  23.  
  24. // usage
  25. ContentType plainText("text/plain");
  26. sendMail("Hello", "Hello world!", plainText);

An idea is to encapsulate some common piece of data and pass it along in uniform way. In compiled, statically typed languages this is a good way to make the type checker work for us to eliminate certain kind of bugs and errors. If we declare a function to take ContentType, we can be sure we won’t get anything else. As a result, once we convert the initial string (like "application/json") into an object somewhere at the edge of the system, the rest of it can be simpler: it doesn’t have to bother with strings anymore.

But in dynamically typed, interpreted languages you can’t really extract such benefits because there is no compiler you can instruct to do your bookkeeping. Although you are perfectly allowed to write analogous classes:

  1. class ContentType(object):
  2.     """Represents a MIME content type, e.g. text/html or image/png."""
  3.     def __init__(self, major, minor=None):
  4.         if minor is None:
  5.             self._major, self._minor = major.split('/')
  6.         else:
  7.             self._major = major
  8.             self._minor = minor
  9.  
  10.     major = property(lambda self: self._major)
  11.     minor = property(lambda self: self._minor)
  12.  
  13.     def __str__(self):
  14.         return '%s/%s' % (self.major, self.minor)

there is no real benefit in doing so. Since you cannot be bulletproof-sure that a function will only receive objects of your type, a better solution (some would say “more pythonic”) is to keep the data in original form, or a simple form that is immediately usable. In this particular case a raw string will probably do best, although a tuple ("text", "html") – or better yet, namedtuple – may be more convenient in some applications.

So indeed…

…stop writing classes. Not literally all of them, of course, but always be on the lookout for alternatives. More often than not, they tend to make code (and life) simpler and easier.

Tags: , ,
Author: Xion, posted under Programming » 3 comments
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.