An extremely common programming exercise – popping up usually as an interview question – is to write a function that turns all characters in a string into uppercase. As you may know or suspect, such a task is not really about the problem stated explicitly. Instead, it’s meant to probe whether you can program at all, and whether you remember to handle special subsets of input data. That’s right: the actual problem is almost insignificant; it’s all about the necessary plumbing. Without a need for it, the task becomes awfully easy, especially in certain kinds of languages.
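To illustrate, here is a minimal sketch in Python, with and without the plumbing. The function names are made up for this example, and the manual version assumes that only the ASCII letters a-z need converting.

```python
def to_upper_manual(text):
    """Uppercase ASCII letters by hand, leaving all other characters untouched."""
    if text is None:
        # One of those "special subsets of input" an interviewer looks for.
        return None
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            # In ASCII, lowercase letters sit exactly 32 code points above uppercase.
            result.append(chr(ord(ch) - 32))
        else:
            result.append(ch)
    return ''.join(result)


def to_upper_builtin(text):
    """The same task when the language does the heavy lifting."""
    return text.upper() if text is not None else None
```

The second function is the "awfully easy" version: all that remains of the exercise is the check for missing input.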
This simplicity may give rise to the misconception that the whole problem of letter case is similarly trivial. Actually, I would not be surprised if the notion of having any sort of real ‘problem’ here is baffling to some. After all, every self-respecting language has those toLowerCase/toUpperCase functions built-in, right?…
Sure it does. But even assuming they work correctly, they are usually the only case-related transformations available out of the box. As it turns out, it’s hardly uncommon to need something way more sophisticated.
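One transformation of that kind, picked here purely as an assumed example, is converting a camelCase identifier into snake_case. A rough sketch in Python follows; the function name is hypothetical and the regular expression deliberately naive.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase identifier to snake_case.

    An underscore goes before every uppercase letter except the first
    character, and then the whole string is lowercased.
    """
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
```

Note that this treats every capital letter as the start of a new word, so acronyms such as "HTTP" get split letter by letter. Handling those properly is exactly the sort of sophistication that built-in functions do not offer.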
Also known as Obligatory New Year’s Post.
It was quite a year, this 2011. No single ground-breaking change, but a lot of somewhat significant events and small steps – mostly in the right direction. A short summary is of course in order, because taking time to stop and reflect is a good thing from time to time.
Technically, the biggest change would be the fact that I’m no longer a student. Having attained my MSc some time in the first quarter, I finished a five-year-long period of computer science studies at Warsaw University of Technology. While there are mixed views on the importance of formal education, I consider this a major and important achievement – and one with practical impact as well.
My master’s thesis was about implementing a reflection system for C++. Ironically, since then I haven’t really got to code anything in this language. That’s not actually something I’m at odds with. For me, sticking to just one language for an extended period of time seems somewhat detrimental to the development of one’s programming skills. On the other hand, the saying goes that a language which doesn’t change your view on programming as a whole is not worth learning. As usual, it looks like a question of proper balance.
This year, I’ve got to use a handful of distinct languages in different contexts and applications. There was Java, but mostly (if not exclusively) on the Android platform. There was JavaScript in its original incarnation – i.e. on the client side, in the browser.
Finally, there was Python: for scripts, for cloud computing on Google App Engine, for general web programming, and for many everyday tasks and experiments. It seems to be my first-choice language as of now – the one I’m most productive in. Still, it probably has many tricks and crispy details waiting to be uncovered, which makes it likely to grab my attention for quite a bit longer.
Its status always has contenders, though. Clojure, Ruby and Haskell are among the languages which I gave at least a brief glance in 2011. The last one is especially intriguing and may therefore be the subject of a few posts later on.
2011 was also a busy year for me when it comes to attending various software-related events. Many of these were organized or influenced by the local Google Technology User Group. Some of those I even got to speak at, lecturing on the Google App Engine platform or advanced topics in Android UI programming. In either case it was an exciting and refreshing experience.
There were also several other events and meet-ups I got to attend in the passing year. Some of them even required traveling abroad, some resulted in grabbing juicy awards (such as autographed books), while some were slightly less formal albeit still very interesting.
And kinda unexpected, too. I learned that there is a bunch of thriving communities gathered around specific technologies, and they are all just around the corner – literally. Because contrary to the stereotype of the lone hacker, their members regularly meet in real life. Wow! ;-)
Great services like GitHub encourage sharing projects and collaborating on them publicly. But not every piece of code feels like it deserves its own repository. Thus it’s quite reasonable to keep a “miscellaneous” repo which collects smaller, often unrelated hacks.
But how to set up such a repository and what structure should it have? Possible options include separate branches or separate folders within a single branch. Personally, I prefer the former approach, as it keeps both the commit history and working directory cleaner. It also makes it rather trivial to promote a project into its own repo.
I speak from experience here, since I did exactly this with my repository of presentation slides. So far, it serves me well.
It’s not hard to arrange a new Git repository in such a manner. The idea is to keep the master branch either completely empty, or only store common stuff there – such as a README file.
The actual content will be kept in separate branches, with no relation to each other or to the master one. Such entities are sometimes referred to as root branches. We create them as usual – for example via git checkout.
However, this is not nearly enough. We don’t want to base the new branch upon the content from master, but we still have it in the working directory. And even if we were to clean it up manually (using a spell such as ls | xargs rm -r to make sure the .git subdirectory is preserved), the removal would have to be registered as a commit in the new branch. Certainly, it would go against our goal of making it independent from master.
But the working copy is just one thing. In order to have a truly independent root branch, we also need to disconnect its history from everything else in the repo. Otherwise, any changesets added before the branch was created would carry over and appear in its log.
Fortunately, clearing the history is very easy – although somewhat scary. We need to reach into the internal .git directory and remove the index file.
Don’t worry, this doesn’t touch any actual data, which is mostly inside the .git/objects directory. What we removed is a “table of contents” for the current branch, making it pristine – just like the master right after git init.
As a nice side effect, the whole content of the working directory is now unknown to Git. Once we have removed the index, every file and directory has become untracked. Now it’s possible to remove all of them in one go using git clean.
And that’s it. We now have a branch that has nothing in common with the rest of the repository. If we need more, we can simply repeat those three steps, starting from a clean working copy (not necessarily from the master branch).
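The three steps can be sketched as a small Python helper that drives the git command line. A few assumptions are worth flagging: the function names and the "slides" branch in the usage note are hypothetical; the branch is created with git checkout --orphan, which guarantees that the first commit on it will have no parent regardless of the state of the index; and the -fdx flags passed to git clean are just one reasonable choice.

```python
import os
import subprocess


def root_branch_commands(branch):
    """Return the git invocations used for steps 1 and 3 (step 2 is a file removal)."""
    return [
        ["git", "checkout", "--orphan", branch],  # step 1: new branch without a parent commit
        ["git", "clean", "-fdx"],                 # step 3: delete everything that is untracked
    ]


def create_root_branch(repo_dir, branch):
    """Create an empty branch, unrelated to existing history, in the repo at repo_dir."""
    checkout, clean = root_branch_commands(branch)
    subprocess.check_call(checkout, cwd=repo_dir)

    # Step 2: drop the index, so that every file in the working copy becomes untracked.
    index_path = os.path.join(repo_dir, ".git", "index")
    if os.path.exists(index_path):
        os.remove(index_path)

    subprocess.check_call(clean, cwd=repo_dir)
```

Calling create_root_branch(".", "slides") from a clean working copy should leave an empty directory on a brand new "slides" branch, ready for its first commit.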
The other day I encountered a small but very interesting effect, visible in Bitbucket issues’ table. Some of the cells were slightly too narrow for the text they contained, and it had to be ellipsized. Usually this is done by cropping some of the text’s trailing chars and replacing them with dots – mostly because that’s what the text-overflow: ellipsis style does. Here, however, I saw something much more original: the text was fading out in gradient-like style, going from full black to full transparent/white over a distance of about 30 pixels. It made for quite an eye-catching effect.
So, I decided to bring up Firebug and find out how this nifty trick actually works. Taught by past experiences, I expected a tightly coupled mishmash of DOM and CSS hacks, with lots of moving parts that need to be carefully adjusted in the face of any changes. Happily, I was wrong: it turned out to only use CSS, in a succinct and elegant manner. After simple reverse-engineering, I uncovered a clever solution involving gradients, opacity and :before/:after pseudo-elements. It definitely deserves some press, so let’s look into it.
As the ubiquitous spirit of laziness, er, relaxation permeates the holiday season, the will to do any serious and productive coding seems appallingly low. Instead of fighting it, I went along and tried to write something just for pure fun.
And guess what: it worked pretty well. Well enough, in fact, that the result seems to be worthy of public exposure – which I’m hereby doing. Behold the coded4 project:
What exactly is this thing? It’s a “time sheet” for the marvelous jQuery project, reconstructed from commit timestamps in its Git repository. coded4 created this result by analyzing the repo’s history, grouping changesets by contributors, and running some heuristics to approximate the timespans of their coding sessions.
And of course, this can be done for any Git (or Hg) repository. Pretty neat for a mere *pops up shell and types coded4 .* 3 hours of casual coding, eh?
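To give a rough idea of what such a heuristic could look like, here is a sketch in Python. It is not coded4's actual algorithm: the function name, the 90-minute gap threshold and the 30-minute credit given to the start of every session are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def approximate_sessions(commits, max_gap=timedelta(minutes=90),
                         lead_in=timedelta(minutes=30)):
    """Estimate total coding time per author from (author, datetime) pairs.

    Commits by one author that are at most max_gap apart count as a single
    session. Every session is also credited with lead_in, a guess at the
    time spent working before its first commit.
    """
    by_author = defaultdict(list)
    for author, when in commits:
        by_author[author].append(when)

    totals = {}
    for author, times in by_author.items():
        times.sort()
        total = lead_in  # credit for the first session
        for prev, cur in zip(times, times[1:]):
            gap = cur - prev
            # A small gap extends the current session; a large one starts a new session.
            total += gap if gap <= max_gap else lead_in
        totals[author] = total
    return totals


sample = [
    ("alice", datetime(2011, 12, 27, 10, 0)),
    ("alice", datetime(2011, 12, 27, 10, 45)),
    ("alice", datetime(2011, 12, 27, 18, 0)),
    ("bob", datetime(2011, 12, 28, 9, 0)),
]
sample_totals = approximate_sessions(sample)
```

With the sample data, the first author ends up with two sessions and the second with one. The thresholds are the interesting part to tune, since they decide how generous the resulting time sheet is.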
Quite a few languages allow strings to be put in either single (') or double (") quotes. Some of them – like PHP, Perl or Ruby – make a minor distinction by enabling string interpolation to occur only in double-quoted ones. Others – including JavaScript and Python – offer no distinction whatsoever, bar the possible (in)convenience of using quote chars inside the strings themselves.
But if neither of those applies to your specific case, is there any compelling argument to prefer one type of quotes over another?…
Replying with “Who cares?” seems like a sane thing to do and until recently, I would have concurred. Indeed, it looks like a token example of something totally irrelevant. That’s why I was rather surprised to discover that there might be deep logic behind such a choice.
And it’s pretty simple, really – almost obvious in hindsight. Use double quotes for strings which are to be eventually seen by a user. Not necessarily the end-user, mind you; an admin or coder looking at logs is an equally valid recipient. Similarly, reserve single quotes (apostrophes) for texts used internally: identifiers, enum-like values, keys within hashmaps, XML/JSON attributes, and the like.
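A short Python sketch of this convention in practice; all the names and messages below are invented for the example.

```python
import logging

# Internal identifier, never shown to anyone as prose: single quotes.
logger = logging.getLogger('quotes.example')

# Keys are internal, enum-like values; messages are meant for human eyes.
STATUS_LABELS = {
    'ok': "Everything went fine.",
    'error': "Something went wrong, please try again.",
}


def describe(status):
    """Return the user-facing message for an internal status key."""
    message = STATUS_LABELS.get(status, "Unknown status.")
    # Log lines are read by admins and coders, so they count as user-facing too.
    logger.info("Status %s reported as: %s", status, message)
    return message
```

Even in such a small snippet, the quote style hints at which strings are safe to rename freely and which would have to go through translation or proofreading.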
It might still seem like a somewhat superficial distinction – and blurry at times. But I think that ultimately, it pays off to focus a little on details such as these. As a benefit, we may develop a subtle sense of structure, allowing us to see into the underlying semantics that much quicker.
There is a somewhat common misconception about garbage collection: that it totally frees the programmer from memory-related concerns. Granted, it makes the task easier in a great many cases, but it does so at the expense of a significant loss of control over objects’ lifetime. Normally, they are kept around at least until they are no longer needed – and usually that’s fine for the typical definitions of “need” and “at least”. Usually – but not always.
For those less typical use cases, garbage-collected environments provide mechanisms that allow us to regain some of that lost control, to the extent necessary for a particular task. Java, for example, offers a variety of different types of references, which make it possible to change the notion of what it means for an object to be eligible for garbage collection. Choosing the right one for the problem at hand can be crucial, especially if we are concerned with the memory footprint of our application. Since – as the proverb goes – the JVM expands to fill all available memory, it’s good to know about techniques which help keep our heap size in check.
So today, I will discuss the SoftReference and WeakReference classes, which can both be found in the java.lang.ref package. They provide the so-called soft and weak references, which are both considerably less powerful than ordinary (strong) references when it comes to prolonging the lifetime of an object.
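The basic idea of a weak reference can be shown in a few lines. The sketch below uses Python's weakref module rather than Java, so it is only an analogy: weakref.ref plays a role similar to WeakReference, while Python has no direct counterpart of SoftReference. The Resource class is a made-up stand-in.

```python
import gc
import weakref


class Resource:
    """Stand-in for some object that is expensive to keep around."""
    def __init__(self, name):
        self.name = name


strong = Resource("bitmap")
weak = weakref.ref(strong)        # a weak reference does not keep the object alive

alive_before = weak() is strong   # while a strong reference exists, we get the object back

del strong                        # drop the only strong reference
gc.collect()                      # CPython frees the object right away; other runtimes may need a GC run
alive_after = weak() is not None  # the weak reference now yields None
```

Once the last strong reference is gone, dereferencing the weak one yields nothing, which is why code using such references always has to be prepared for the object having vanished.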