When thinking about concurrent programs, we are sometimes blinded by the notion of bare threads. We create them, start them, join them, and sometimes even interrupt
them, all by operating directly on those tiny little abstractions over several paths of simultaneous execution. At the same time, we might be extremely reluctant to use synchronization primitives (semaphores, mutexes, etc.) directly, preferring more convenient and tailored solutions such as thread-safe containers. And this is great, because synchronization is probably the most difficult aspect of concurrent programming. Any place where we can avoid it is therefore one less place to be infested by the nasty bugs it tends to breed.
So why do we still cling to somewhat low-level Threads for actual execution, while having no problem with specialized solutions for concurrent data exchange and synchronization?… Well, we may simply be unaware that “getting code to execute in parallel” is also something that can benefit from the safety and clarity of a more targeted approach. In Java, one such approach is oh-so-object-orientedly called executors.
As we might expect, an Executor is something that executes, i.e. runs, code. Pieces of that code are given to it in the form of Runnables, just as they would be for regular Threads:
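A minimal sketch of what this looks like (the particular executor used here is just a placeholder; any implementation would do):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorHello {
    public static void main(String[] args) {
        ExecutorService service = Executors.newSingleThreadExecutor();
        Executor executor = service; // all we need here is the execute() method

        executor.execute(new Runnable() {
            @Override
            public void run() {
                System.out.println("Hello from an executor-run task!");
            }
        });

        service.shutdown(); // let the JVM exit once the task completes
    }
}
```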
Executor itself is just an interface, so it can be used without any knowledge of the queuing policy, scheduling algorithm, or any other details of how it conducts the execution of tasks. While this seems feasible in some real cases – such as servicing incoming network requests – executors are useful mainly because they are quite diverse in kind. Their complex and powerful variants are also relatively easy to use.
Simple functions for creating different types of executors are contained within the auxiliary Executors class. Behind the scenes, most of them maintain a thread pool from which they pull threads whenever tasks need to be processed. This pool may be of fixed or variable size, and it can reuse a thread for more than one task.
Depending on how much load we expect and how many threads we can afford to create, the choice is usually between newCachedThreadPool and newFixedThreadPool. There is also the peculiar (but useful) newSingleThreadExecutor, as well as the time-based newScheduledThreadPool and newSingleThreadScheduledExecutor, which allow us to specify a delay for our Runnables by passing them to the schedule method instead of execute.
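A quick sketch of these factory methods in action (the pool sizes, delay and messages here are arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PoolsDemo {
    public static void main(String[] args) {
        // Fixed pool: at most 4 threads, excess tasks wait in a queue
        ExecutorService fixed = Executors.newFixedThreadPool(4);
        // Cached pool: grows on demand, reuses threads that become idle
        ExecutorService cached = Executors.newCachedThreadPool();
        // Scheduled pool: tasks can be run after a delay
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        fixed.execute(new Runnable() {
            public void run() { System.out.println("task on the fixed pool"); }
        });
        scheduler.schedule(new Runnable() {
            public void run() { System.out.println("ran after 5 seconds"); }
        }, 5, TimeUnit.SECONDS);

        // Shut down so the JVM can exit; already-scheduled tasks still run
        fixed.shutdown();
        cached.shutdown();
        scheduler.shutdown();
    }
}
```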
There is one case where the abstract nature of the base Executor interface comes in handy: testing and performance tuning. Certain types of executors can serve as good approximations of some common concurrency scenarios.
Suppose that we normally handle our tasks using a pool with a fixed number of threads, but we are not sure whether it’s actually the optimal number. If our tasks appear to be mostly I/O-bound, it could be a good idea to increase the thread count, seeing that threads waiting for I/O operations simply lie dormant most of the time.
To see whether our assumptions have grounds, and how big the increase could be, we can temporarily switch to a cached thread pool. By experimenting with different levels of throughput and observing the average execution time along with the number of threads used by the application, we can get a sense of the optimal size for our fixed pool.
Similarly, we can adjust and possibly decrease this number for tasks that appear to be mostly CPU-bound.
Finally, it might also be sensible to use the single-threaded executor as a sort of “sanity check” for our complicated, parallel program. What we are checking this way is both correctness and performance, in a rather simple and straightforward way.
For starters, our program should still compute correct results. Failing to do so is an indication that seemingly correct behavior in a multi-threaded setting may actually be an accidental side effect of unspotted hazards. In other words, threads might “align just right” when more than one is running, hiding some insidious race conditions we failed to account for.
As for performance, we should expect the single-threaded code to run for a longer time than its multi-threaded variant. This is a somewhat obvious observation that we might carelessly take for granted and thus never verify explicitly – and that’s a mistake. Indeed, it’s not unheard of to have parallelized algorithms which are actually slower than their serial counterparts. Throwing threads at a problem is not a magic bullet, unfortunately: concurrency is still hard.
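Since all of these pools are created through Executors and used through the same ExecutorService interface, switching between them for such experiments can be a one-line change. A minimal sketch of this idea (the enum and helper names here are hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolFactory {
    enum PoolMode { FIXED, CACHED, SINGLE }

    // Hypothetical helper: the rest of the program only ever sees an
    // ExecutorService, so swapping pool types for an experiment
    // touches just this one spot.
    static ExecutorService createPool(PoolMode mode, int fixedSize) {
        switch (mode) {
            case CACHED: return Executors.newCachedThreadPool();
            case SINGLE: return Executors.newSingleThreadExecutor();
            default:     return Executors.newFixedThreadPool(fixedSize);
        }
    }
}
```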
An extremely common programming exercise – popping up usually as an interview question – is to write a function that turns all characters in a string into uppercase. As you may know or suspect, such a task is not really about the problem stated explicitly. Instead, it’s meant to probe whether you can program at all, and whether you remember to handle special subsets of input data. That’s right: the actual problem is almost insignificant; it’s all about the necessary plumbing. Without a need for it, the task becomes awfully easy, especially in certain kinds of languages:
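In Python, say, it can be as trivial as this (once we wave away all the input-validation plumbing):

```python
# One-liner, sans any plumbing: rely on the built-in str.upper()
uppercase = lambda s: s.upper()
```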
This simplicity may be the cause of a misconception that the whole problem of letter case is similarly trivial. Actually, I would not be surprised if the notion of there being any sort of real ‘problem’ here is baffling to some. After all, every self-respecting language has those toLowerCase/toUpperCase functions built-in, right?…
Sure it has. But even assuming they work correctly, they are usually the only case-related transformations available out of the box. As it turns out, it’s hardly uncommon to need something way more sophisticated.
Also known as Obligatory New Year’s Post.
It was quite a year, this 2011. No single ground-breaking change, but a lot of somewhat significant events and small steps – mostly in the right direction. A short summary is of course in order, because stopping to reflect is a good thing from time to time.
Technically, the biggest change would be the fact that I’m no longer a student. Having attained an MSc sometime in the first quarter, I finished a five-year period of computer science studies at Warsaw University of Technology. While there are mixed views on the importance of formal education, I consider this a major and important achievement – and one with practical impact as well.
My master’s thesis was about implementing a reflection system for C++. Ironically, I haven’t really gotten to code anything in this language since then. That’s not actually something I’m at odds with. For me, sticking to just one language for an extended period of time seems somewhat detrimental to the development of one’s programming skills. On the other hand, there goes the saying that a language which doesn’t change your view on programming as a whole is not worth learning. As usual, it looks like a question of proper balance.
This year, I’ve gotten to use a handful of distinct languages in different contexts and applications. There was Java, though mostly (if not exclusively) on the Android platform. There was JavaScript in its original incarnation – i.e. on the client side, in the browser.
Finally, there was Python: for scripts, for cloud computing on Google App Engine, for general web programming, and for many everyday tasks and experiments. It seems to be my first-choice language as of now – the one I’m most productive in. Still, it probably has many tricks and crispy details waiting to be uncovered, which makes it likely to hold my attention for quite a bit longer.
That status is not without contenders, though. Clojure, Ruby and Haskell are among the languages I gave at least a brief glance in 2011. The last one is especially intriguing and may therefore be the subject of a few posts later on.
2011 was also a busy year for me when it comes to attending various software-related events. Many of these were organized or influenced by the local Google Technology User Group. Some of them I even got to speak at, lecturing on the Google App Engine platform or on advanced topics in Android UI programming. In either case it was an exciting and refreshing experience.
There were also several other events and meet-ups I got to attend in the passing year. Some of them even required traveling abroad, some resulted in grabbing juicy awards (such as autographed books), while some were slightly less formal albeit still very interesting.
And kinda unexpected, too. I learned that there is a bunch of thriving communities gathered around specific technologies, and they are all just around the corner – literally. Because contrary to the stereotype of the lone hacker, their members regularly meet in real life. Wow! ;-)
Great services like GitHub encourage us to share projects and collaborate on them publicly. But not every piece of code feels like it deserves its own repository. Thus it’s quite reasonable to keep a “miscellaneous” repo which collects smaller, often unrelated hacks.
But how should such a repository be set up, and what structure should it have? Possible options include separate branches, or separate folders within a single branch. Personally, I prefer the former approach, as it keeps both the commit history and the working directory cleaner. It also makes it rather trivial to promote a project into its own repo.
I speak from experience here, since I did exactly this with my repository of presentation slides. So far, it serves me well.
It’s not hard to arrange a new Git repository in such a manner. The idea is to keep the master branch either completely empty, or to store only common stuff there – such as a README file:
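Something along these lines, for instance (the file content and commit message are just examples):

```
$ git init
$ echo "Miscellaneous small projects, one per branch" > README
$ git add README
$ git commit -m "Add README"
```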
The actual content will be kept in separate branches, with no relation to each other or to the master one. Such entities are sometimes referred to as root branches. We create them as usual – for example via git checkout:
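```
$ git checkout -b some-project
```

(The branch name is, of course, just a placeholder.)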
However, this is not nearly enough. We don’t want to base the new branch upon the content from master, but we still have that content in the working directory. And even if we were to clean it up manually (using a spell such as ls | xargs rm -r to make sure the .git subdirectory is preserved), the removal would have to be registered as a commit in the new branch. That would certainly go against our goal of making it independent from master.
But the working copy is just one thing. In order to have a truly independent root branch, we also need to disconnect its history from everything else in the repo. Otherwise, any changesets added before the branch was created would carry over and appear in its log.
Fortunately, clearing the history is very easy – although somewhat scary. We need to reach into the internal .git directory and remove the index file:
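```
$ rm .git/index
```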
Don’t worry: this doesn’t touch any actual data, which lives mostly inside the .git/objects directory. What we removed is a “table of contents” for the current branch, leaving it pristine – just like master right after git init.
As a nice side effect, the whole content of the working directory is now unknown to Git. Once we removed the index, every file and directory became untracked. It’s now possible to remove all of them in one go using git clean:
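```
$ git clean -df
```

(The -d flag removes untracked directories along with files; a cautious soul can run it with -n first for a dry run.)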
And that’s it. We now have a branch that has nothing in common with the rest of the repository. If we need more, we can simply repeat those three steps, starting from a clean working copy (not necessarily from the master branch).
The other day I encountered a small but very interesting effect, visible in Bitbucket’s issues table. Some of the cells were slightly too narrow for the text they contained, and it had to be ellipsized. Usually this is done by cropping some of the text’s trailing characters and replacing them with dots – mostly because that’s what the text-overflow: ellipsis style does. Here, however, I saw something much more original: the text was fading out in gradient-like style, going from full black to full transparent/white over a distance of about 30 pixels. It made for quite an eye-catching effect.
So, I decided to bring up Firebug and find out how this nifty trick actually works. Taught by past experiences, I expected a tightly coupled mish-mash of DOM and CSS hacks, with lots of moving parts that would need careful adjustment in the face of any changes. Alas, I was wrong: it turned out to use only CSS, in a succinct and elegant manner. After simple reverse-engineering, I uncovered a clever solution involving gradients, opacity and the :before/:after pseudo-elements. It definitely deserves some press, so let’s look into it.
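The gist of it might look something like this – a sketch reconstructed from the description above, not Bitbucket’s exact stylesheet (the class name and the unprefixed gradient syntax are my assumptions):

```css
/* Container crops the text instead of wrapping it */
.fade-out {
    position: relative;
    overflow: hidden;
    white-space: nowrap;
}

/* Pseudo-element overlays the right edge with a white gradient,
   fading the text out over the last ~30 pixels */
.fade-out:after {
    content: "";
    position: absolute;
    top: 0;
    right: 0;
    width: 30px;
    height: 100%;
    background: linear-gradient(to right,
                                rgba(255, 255, 255, 0),
                                rgba(255, 255, 255, 1));
}
```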
As the ubiquitous spirit of laziness and relaxation permeates the holiday season, the will to do any serious and productive coding seems appallingly low. Instead of fighting it, I went along and tried to write something just for pure fun.
And guess what: it worked pretty well. Well enough, in fact, that the result seems to be worthy of public exposure – which I’m hereby doing. Behold the coded4 project:
What exactly is this thing? It’s a “time sheet” for the marvelous jQuery project, reconstructed from commit timestamps in its Git repository. coded4 created this result by analyzing the repo’s history, grouping changesets by contributor, and running some heuristics to approximate the timespans of their coding sessions.
And of course, this can be done for any Git (or Hg) repository. Pretty neat for a mere *pops up shell and types coded4 .* 3 hours of casual coding, eh?
Quite a few languages allow strings to be put in either single (') or double (") quotes. Some of them – like PHP, Perl or Ruby – make a minor distinction by enabling string interpolation to occur only in doubly-quoted ones. Others – including JavaScript and Python – offer no distinction whatsoever, bar the possible (in)convenience of using quote chars inside the strings themselves.
But if neither of those apply to your specific case, is there any compelling argument to prefer one type of quotes over another?…
Replying with “Who cares?” seems like a sane thing to do, and until recently I would have concurred. Indeed, it looks like a textbook example of something totally irrelevant. That’s why I was rather surprised to discover that there might be deep logic behind such a choice.
And it’s pretty simple, really – almost obvious in hindsight. Use double quotes for strings which are eventually going to be seen by a user. Not necessarily the end user, mind you; an admin or coder looking at logs is an equally valid recipient. Similarly, reserve single quotes (apostrophes) for texts used internally: identifiers, enum-like values, keys within hashmaps, XML/JSON attributes, and the like.
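In Python, for instance, the convention might play out like this (all the names and values are made up for illustration):

```python
# User-visible text: double quotes
print("Backup finished successfully.")

# Internal identifiers and keys: single quotes
settings = {'max_retries': 3, 'mode': 'incremental'}
status = 'pending'
```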
It might still seem like a somewhat superficial distinction – and a blurry one at times. But I think that ultimately it pays off to focus a little on details such as these. As a benefit, we may develop a subtle sense of structure, allowing us to see into the underlying semantics that much quicker.