When diving into new language, or radically different framework, it may be a good idea to have a bigger project where you can apply your newfound skills. In my experience, this is typically better than having a lot of smaller ones, because it minimizes the hassle of project’s initial setup. Therefore, it encourages you to experiment more.
To reap the largest benefits of this approach, the project of choice should exhibit two important properties:
What types of projects fit into this characteristic? I’d say quite a lot of them.
When I was honing my Python skills, I started programming an IRC bot so that I could cram a few ideas into it rather quickly. They were implemented mostly as commands that users could input and have the bot perform some actions, like searching Wikipedia for any given term.
A similar pattern (collection of mostly independent commands) can be realized in many different scenarios. Aspiring web programmers could come up with something like a YubNub clone (bonus points if it allows users to add their own commands). Complete coding novices would probably have to resort to simple, menu-based programs in terminal instead.
Another option is to attack a problem which is a very broad and/or vague. Text editors, for example, fuel countless discussions (even wars) over what functionality should they contain and how it should be accessible in the UI. Chances are slim that your take on the problem sprouts a new Emacs or Vim, but a home-brewed editor is easy enough to start and obviously extensible, almost without limits. Additionally, editors can fit into pretty much any environment, from terminals to desktop UIs or HTML5 applications.
Some endeavors are a bit more specific, though. In web development, a CMS or blogging engine became something of a timeless classic now. Everyone has written one at some point, and there’s a lot of additional (thought not always useful) functionality that can be added to it. Getting the basics right is also a challenge here, especially from security standpoint.
For mobile app creators, the infamous To-Do list app is an idea exercised ad nauseam. But it’s actually a good playground for toying with various device capabilities (e.g. location-based reminders) or web services (like Google or iCloud calendar).
I’m pretty sure I’m far from exhausting the list of possibilities here. I cannot really speak for domains I have little-to-no experience with, for example embedded or hardware-oriented programming with equipment such as Arduino.
It should be possible to come up with infinitely extensible projects for almost every environment and platform, though. After all, every program has always one more feature to add ;)
It won’t be a stretch to bet that you’ve heard of XML. The infamous markup format was intended to be easily parseable by machines, in addition to being readable by humans. Needless to say, it failed to deliver on either of these promises. Markup elements tend to obscure the actual data, while parsing it – with all its namespaces, !DOCTYPE
s and ![CDATA[
s – is convoluted and not exactly efficient.
Other formats have thus risen to popularity, out of which JSON is probably widest known and used. It also has excellent support on many platforms.
For the purpose of transporting data between Internet endpoints, or for various APIs offered by websites and services, it works pretty great. There are other applications, though, where it might be the obvious first choice – but not necessarily the best one.
What JSON does well is ease to read and parse: the syntax can be outlined and defined in few paragraphs. However, at the same it’s not that convenient to write.
Unlike actual JavaScript, it needs quotes around keynames, even if they would pass as identifiers. (And by quotes I mean double-quotes, also where apostrophes would be better). Furthermore, it requires keeping track of separators between key-value pairs or array elements, not allowing to have a handy trailing comma at the end.
And lastly, there is no good support for longer texts due to lack of multi-line string literals.
There is another, lesser known format which addresses these concerns very well. It’s called YAML, recursively from YAML Ain’t Markup Language. Here’s a short sample:
At first sight it probably appears as pythonic (or haskellish) counterpart to the C-like JSON. Indentation does indeed matter, at least in the most popular and powerful variant of YAML syntax.
However, giving this bit of significance to whitespace allows to substantially reduce syntactic clutter. There are no curly or square brackets, and no commas. For the most part, there’s not much use for quotes either: strings are easily recognized as both keys and values, even if they contain spaces. Arrays are also supported in a straightforward way: by listing their elements with a leading dash (-
), including cases where they are key-value objects themselves.
There’s even support for long strings that span multiple lines:
Here, pipe character (|
) instructs parsers to preserve newlines and most of the whitespace, excluding the leading indent. Were it replaced with greater-than sign (>
), an HTML-like fold would be performed, converting chains of whitespace into a single space.
Simple YAML documents represent a tree of keys and values which is much the same as the one produced from JSON files. For some programming languages, this similarity extends to the parsing libraries, as they offer more or less the same interface:
Even if it’s not the case for your language, it’s quite likely there is a YAML parser readily available.
Funny thing about YAML is that its glaringly simple syntax hides very powerful and flexible format.
For example, the values are actually typed. They can be strings or null
s, but also integers, floats or booleans. Syntactic rules make a very good job here at automatically detecting the types; for instance, the word Yes
will be recognized as boolean truth if standing on its own, but a text starting with it will be correctly recognized as string.
The other nifty feature is referencing. Parts of YAML document tree can be given labels, while other parts can later use those labels to point back to specific nodes. The whole structure can therefore morph into something more general than just a tree:
With types (incl. custom ones) and references, YAML can actually serve pretty well as serialization format for persisting objects.
But besides that, what YAML is actually good for?
I have hinted in the beginning that it seems like a decent choice for structured text data which is meant to be hand-edited. Various configuration files fall into this category, as well as some datasets around the size of contact list. I used YAML as config format for my IRC bot and it worked very well for this purpose. I’m also using it to store initialization data for database used by another side project of mine.
So, if you are not exercising YAML in any of your current endeavors, I’m encouraging you to give it a try. It might not be the best thing since sliced bread, but it’s very pleasant format to work with.
The upcoming release of Windows 8 is stirring up the tech world, mostly because of some controversial decisions Microsoft has made regarding the new version of their flagship OS.
Were it a few years ago, I would likely participate in various discussions about this, springing up on message boards I typically frequent. Probably I would have also tested the development preview which was made available several months before. Most likely, I would have clear opinion about viability of the new Metro UI, WinRT apps’ platform or the overall push in the general direction of HTML5.
I would… But I do not. Actually, I couldn’t care less about the whole thing.
It’s not only because I touch my Windows installation once in a blue moon, almost exclusively for gaming. No, that’s mostly because it’s 2012, and Windows still doesn’t support some crucial functionality you could expect from modern operating system.
Unfortunately, most users don’t expect it because they never knew any better. Therefore I will try to highlight some features from alternative operating systems (typically Linux or OSX) that I think Windows can be rightfully scorned for lacking.
In Unix-like systems, files and directories with names starting from ‘.’ (dot) are “hidden”: they don’t appear when listed by ls and are not shown by default in graphical file browsers. It doesn’t seem very clear why there is such a mechanism, especially when we have extensive chmod permissions and attributes which are not tied to filename. Actually, that’s one of distinguishing features of Unix/Linux: there are neither .exe nor .app files, just chmod +x.
But, here it is: name-based visibility control for files and directories. Why such a thing was ever implemented in the first place? Well, it turns out it was purely by accident:
Long ago, as the design of the Unix file system was being worked out, the entries . and .. appeared, to make navigation easier. (…) When one typed ls, however, these entries appeared, so either Ken [Thompson] or Dennis [Ritchie] added a simple test to the program. It was in assembler then, but the code in question was equivalent to something like this:
if (name[0] == '.') continue;
That test was meant to filter out . and .. only. Unintentionally, though, it ruled out a much bigger class of names: all that start with a dot, because that’s what it actually checked for. Back then it probably seemed like an innocuous detail. Fast-forward a couple of decades and it’s a de facto standard for storing program-specific data inside user’s home path. They can grow quite numerous over time, too:
That’s over 100 entries, a vast majority of my home directory. It’s neither elegant nor efficient to have that much of app-specific cruft inside the most important place in the filesystem. And even if GUI applications tend to collectively use a single ~/.config directory, the tradition to clutter the root $HOME path is strong enough to persist for foreseeable future.
Heed this as a warning. In the event your software becomes a basis for many derived solutions, future programmers will exploit every corner case of every piece of logic you have written. It doesn’t really matter what you wanted to put into your code, but only what you actually did.