How IT Failed to Make People Care

2012-01-24 0:28

Unless you were living under an enormous, media-impervious rock for at least a few weeks, you must have heard of the recent ruckus about laws concerning so-called intellectual property. It was mostly due to the infamous SOPA and PIPA bills that were to pass in the US, and the temporary “blackout” of various big websites whose owners voiced their protest against those bills. But we also have something of a local spin-off, relevant to the EU and specifically Poland: an outcry against ACTA, a similar albeit slightly less ridiculous piece of law, which (as of this writing) is due to be signed in a few days.

Even though we do not really know yet how those issues are going to be resolved, there are already a few important lessons to be learned here. And I think that the biggest one is not actually concerned with the merits of the laws in question. Instead, it draws attention to the problem of communication between the IT industry and the general public.

It’s no longer OK to be ignorant of the Internet

It never was, really. If anything, recent events served as a good wake-up call, reminding us that the Internet and all the related technological infrastructure is something that we take for granted way too often. While I might exaggerate a bit, I don’t think it’s very far-fetched to say that for the average person, the Internet is pretty much magic. You pop up “the Internet app”, type whatever you are looking for, and a few moments later, voilà: you just got it, by the sheer magic of intertubes. Assuming, of course, that there is actually something specific you have in mind; otherwise, you can always look at what your friends have “shared”, and from there start your clicky journey through the nether. It’s awesome, virtually boundless, and it just works… right?

So far, the consequences of such technical ignorance were also mostly technical, surfacing as security issues, loss of data, malware spread and so on. But that’s alright! We’re dabbling in the arcane and invoking supernatural powers, so it’s no wonder we sometimes accidentally summon a few annoying daemons. Should that happen, we can always call an exorcist in the form of a friendly geek-next-door, or (at worst) tech support.

Now, however, failure to grasp the fundamental nature of the Internet can result in a much more dramatic backlash. See, this technical stuff underneath is not just plumbing that can be safely ignored. The foundational, idealistic principles of the ‘net – decentralization, freedom, knowledge, progress, innovation, flexibility, and so on – are woven into its very fabric. It is by exposure to those “boring details” that those ideas may influence and reshape minds, helping to do away with flawed and outdated notions – including, for example, the 19th century’s concept of intellectual property. I cannot even fathom how someone with a reasonable understanding of how software, the Internet and IT work could conceive something as outrageous as those infamous IP laws. It just doesn’t compute.

It is no longer OK to be ignorant of society

Yet it was conceived, put into words, formed into a legal document and officially proposed – more than once, in fact. Obviously, such things don’t happen by themselves, especially when powerful interest groups are at play. And that’s precisely the reason why we have various public institutions, from parliaments to international organizations: to weed out blatantly bad solutions, and sometimes even let the good ones pass through.

At least, that’s the theory. It’s naive to postulate virtues among politicians (i.e. those willingly aspiring to power), so we make them cling to one value they’ll always embrace, for it is needed to maintain a ruling position: popularity. This should roughly translate to caring for the same things the voters care for. Roughly, because there will always be some disconnect due to effects of scale, perception, biases and a multitude of other factors.

Still, this is pretty much how it works. In order to push an agenda we are vitally interested in (or obstruct one we are strongly against), we need to gain enough publicity. We need to make people share our values and see as useful those things that we consider as such. That’s how we set the appropriate causal chain in motion, eventually leading to the fulfillment of our goals.

And this is where the IT industry has failed miserably. It’s the reason why we’re frantically looking for support, rolling out our biggest cannons, hitting the news with blackouts of absolutely crucial Internet services and DDoS attacks on government sites. We have failed to make society share our values in an elegant, gradual and systematic manner, so now we need to resort to shock therapy to compensate for this negligence. Actually, in some cases we have been actively making things worse, professing the exact opposites of those values mentioned before: centralization, limitation, monocultures, stagnation and rehashing of old concepts.

Now we are just reaping what we have sown.

On Clever Usage of Python ‘with’ Clauses

2012-01-21 20:21

As the Python docs state, the with statement is used to encapsulate common patterns of try-except-finally constructs. It is most often associated with managing external resources in order to ensure that they are properly freed, released, closed, disconnected from, or otherwise cleaned up. While at times it is sufficient to just write the finally block directly, repeated occurrences call for using this language goodness more consciously, including writing our own context managers for specialized needs.

Those managers – basically with-enabled helper objects – are strikingly similar to the small local objects involved in the RAII technique from C++. The acronym expands to Resource Acquisition Is Initialization, further emphasizing the resource management part of this pattern. But let us not be constrained by that notion: the usage space of with is much wider.

Self-Replacing Script Blocks for Dynamic Lists

2012-01-17 8:52

On contemporary websites and web applications, it is an extremely common task to display a list of items on a page. In any reasonable framework and/or templating engine, this can be accomplished rather trivially. Here's what it could look like in Jinja or Django templates:

<ul>
{% for item in items %}
    <li>{{ item }}</li>
{% endfor %}
</ul>

But it's usually not too long before our list grows big and it becomes infeasible to render it all on the server. Or maybe, albeit less likely, we want it to be updated in real time, without having to reload the whole page. Either case requires incorporating some JavaScript code that talks to the server and obtains the next portion of data whenever it's needed.

Obviously, that data has to be rendered as well. One option is to do it on the server side, serving actual HTML directly to JS. An arguably better solution is to respond with JSON or a similar representation of our items. This is conceptually simpler, feels less messy and is potentially reusable as part of the website's external API. There is just one drawback: it forces rendering to be done in JavaScript as well.

Author: Xion, posted under Applications, Internet

Using Executors in Java

2012-01-12 21:17

When thinking about concurrent programs, we are sometimes blinded by the notion of bare threads. We create them, start them, join them, and sometimes even interrupt them, all by operating directly on those tiny little abstractions over several paths of simultaneous execution. At the same time we might be extremely reluctant to directly use synchronization primitives (semaphores, mutexes, etc.), preferring more convenient and tailored solutions - such as thread-safe containers. And this is great, because synchronization is probably the most difficult aspect of concurrent programming. Any place where we can avoid it is therefore one less place to be infested by the nasty bugs that could ensue.

So why do we still cling to somewhat low-level Threads for actual execution, while having no problems with specialized solutions for concurrent data exchange and synchronization?... Well, we can be simply unaware that "getting code to execute in parallel" is also something that can benefit from the safety and clarity of a more targeted approach. In Java, one such approach is oh-so-object-orientedly called executors.

As we might expect, an Executor is something that executes, i.e. runs, code. Pieces of that code are given to it in the form of Runnables, just like it would happen for regular Threads:

executor.execute(new Runnable() {
    @Override public void run() {
        calculatePiToDecimalPlaces(10000000);
    }
});

Executor itself is an interface, so it can be used without any knowledge of the queuing policy, scheduling algorithm or any other details of the way it conducts execution of tasks. While this seems feasible in some real cases - such as servicing incoming network requests - executors are useful mainly because they are quite diverse in kind. Their complex and powerful variants are also relatively easy to use.

Let's play in the pool

Simple functions for creating different types of executors are contained within the auxiliary Executors class. Behind the scenes, most of them have a thread pool which they pull threads from when they are needed to process tasks. This pool may be of fixed or variable size, and can reuse a thread for more than one task.

Depending on how much load we expect and how many threads we can afford to create, the choice is usually between newCachedThreadPool and newFixedThreadPool. There is also the peculiar (but useful) newSingleThreadExecutor, as well as the time-based newScheduledThreadPool and newSingleThreadScheduledExecutor, which allow us to specify a delay for our Runnables by passing them to the schedule method instead of execute.
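
The factory methods mentioned above could be tried out roughly like this. It is only a minimal sketch: the class ExecutorKinds and its helpers (runTasks, runDelayed, shutdownAndWait) are names made up for illustration, and the tasks do nothing but bump a counter. Note that execute is available on every executor, while schedule exists only on the ScheduledExecutorService variants.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorKinds {

    // Hypothetical helper: hands `tasks` trivial Runnables to the executor
    // and returns how many of them have actually run.
    public static int runTasks(ExecutorService executor, int tasks) {
        final AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runnable() {
                @Override public void run() {
                    completed.incrementAndGet();
                }
            });
        }
        shutdownAndWait(executor);
        return completed.get();
    }

    // Hypothetical helper: schedules one Runnable with a delay
    // and reports whether it has run exactly once.
    public static boolean runDelayed(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        final AtomicInteger ran = new AtomicInteger();
        scheduler.schedule(new Runnable() {
            @Override public void run() {
                ran.incrementAndGet();
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
        // By default, already scheduled delayed tasks still run after shutdown().
        shutdownAndWait(scheduler);
        return ran.get() == 1;
    }

    // Stops accepting new tasks and waits for the submitted ones to finish.
    static void shutdownAndWait(ExecutorService executor) {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(Executors.newFixedThreadPool(4), 100));
        System.out.println(runTasks(Executors.newCachedThreadPool(), 100));
        System.out.println(runTasks(Executors.newSingleThreadExecutor(), 100));
        System.out.println(runDelayed(50));
    }
}
```

The calling code is identical for all three pools; only the factory method changes, which is what makes the experiments described later so cheap.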

Swapping them

There is one case where the abstract nature of the Executor interface comes in handy: testing and performance tuning. Certain types of executors can serve as a good approximation of some common concurrency scenarios.

Suppose that we are normally handling our tasks using a pool with a fixed number of threads, but we are not sure whether it's actually the optimal number. If our tasks appear to be mostly I/O-bound, it could be a good idea to increase the thread count, seeing that threads waiting for I/O operations simply lie dormant most of the time.
To see if our assumptions have grounds, and how big the increase can be, we can temporarily switch to a cached thread pool. By experimenting with different levels of throughput and observing the average execution time along with the number of threads used by the application, we can get a sense of the optimal number of threads for our fixed pool.
Similarly, we can adjust and possibly decrease this number for tasks that appear to be mostly CPU-bound.
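
Such an experiment might look more or less like the sketch below. Everything in it is an assumption made for illustration: the PoolTuning class, the Measurement holder, and Thread.sleep standing in for a blocking I/O call. Reading the peak thread count also relies on the executor being a ThreadPoolExecutor, which the interface does not promise, hence the instanceof check.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolTuning {

    // Outcome of one experiment: wall-clock time and peak number of threads.
    public static final class Measurement {
        public final long elapsedMillis;
        public final int peakThreads;  // -1 if the executor does not expose it

        Measurement(long elapsedMillis, int peakThreads) {
            this.elapsedMillis = elapsedMillis;
            this.peakThreads = peakThreads;
        }
    }

    // Runs `tasks` simulated I/O-bound tasks on the given executor and measures them.
    public static Measurement measure(ExecutorService executor, int tasks, final long ioMillis) {
        final CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runnable() {
                @Override public void run() {
                    try {
                        Thread.sleep(ioMillis);  // stand-in for a blocking I/O call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            });
        }
        try {
            done.await();  // wait until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        int peak = executor instanceof ThreadPoolExecutor
                ? ((ThreadPoolExecutor) executor).getLargestPoolSize()
                : -1;
        executor.shutdown();
        return new Measurement(elapsed, peak);
    }

    public static void main(String[] args) {
        Measurement fixed = measure(Executors.newFixedThreadPool(4), 32, 50);
        Measurement cached = measure(Executors.newCachedThreadPool(), 32, 50);
        System.out.println("fixed(4): " + fixed.elapsedMillis + " ms, peak threads: " + fixed.peakThreads);
        System.out.println("cached:   " + cached.elapsedMillis + " ms, peak threads: " + cached.peakThreads);
    }
}
```

The peak reported for the cached pool is a hint at how many threads the workload would use if unconstrained; it is an upper bound to experiment below, not a recommended size for the fixed pool.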

Finally, it might also be sensible to use the single-threaded executor as a sort of "sanity check" for our complicated, parallel program. What we are checking this way is both correctness and performance, in a rather simple and straightforward way.
For starters, our program should still compute correct results. Failing to do so serves as an indication that seemingly correct behavior in a multi-threaded setting may actually be an accidental side effect of unspotted hazards. In other words, threads might "align just right" if there is more than one running, and this would hide some insidious race conditions which we failed to account for.

As for performance, we should expect the single-threaded code to run for a longer time than its multi-threaded variant. This is a somewhat obvious observation that we might carelessly take for granted and thus never verify explicitly - and that's a mistake. Indeed, it's not unheard of to have parallelized algorithms which are actually slower than their serial counterparts. Throwing some threads at a problem is not a magic bullet, unfortunately: concurrency is still hard.
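
Both halves of this sanity check can be sketched in a few lines. The example below is an illustration under assumptions, not a benchmark: SanityCheck and sumOfSquares are invented names, the workload is a toy one, and it uses Callable with invokeAll (from ExecutorService) rather than plain Runnables so that tasks can return partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SanityCheck {

    // Splits [0, n) into up to `chunks` ranges and sums the squares
    // of each range as a separate task on the given executor.
    public static long sumOfSquares(ExecutorService executor, long n, int chunks) {
        List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
        final long step = (n + chunks - 1) / chunks;
        for (long from = 0; from < n; from += step) {
            final long lo = from;
            final long hi = Math.min(n, from + step);
            tasks.add(new Callable<Long>() {
                @Override public Long call() {
                    long sum = 0;
                    for (long i = lo; i < hi; i++) {
                        sum += i * i;
                    }
                    return sum;
                }
            });
        }
        try {
            long total = 0;
            // invokeAll blocks until all tasks have completed.
            for (Future<Long> part : executor.invokeAll(tasks)) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        long n = 50000000L;

        long start = System.nanoTime();
        long serial = sumOfSquares(Executors.newSingleThreadExecutor(), n, 8);
        long serialNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long parallel = sumOfSquares(Executors.newFixedThreadPool(4), n, 8);
        long parallelNanos = System.nanoTime() - start;

        // Correctness: both variants must agree.
        System.out.println("results match: " + (serial == parallel));
        // Performance: verify the speed-up instead of assuming it.
        System.out.println("single thread: " + serialNanos / 1000000 + " ms");
        System.out.println("four threads:  " + parallelNanos / 1000000 + " ms");
    }
}
```

A single timed run like this is noisy (JIT warm-up, other processes), so the numbers are worth repeating a few times before drawing conclusions.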

Author: Xion, posted under Programming

A Curious Case of Letter Case

2012-01-08 18:32

An extremely common programming exercise - popping up usually as an interview question - is to write a function that turns all characters in a string into uppercase. As you may know or suspect, such a task is not really about the problem stated explicitly. Instead, it's meant to probe if you can program at all, and whether you remember to handle special subsets of input data. That's right: the actual problem is almost insignificant; it's all about the necessary plumbing. Without a need for it, the task becomes awfully easy, especially in certain kinds of languages:

import Data.Char (toUpper)

toUpperCase :: String -> String
toUpperCase s = map toUpper s

This simplicity may be the cause of a misconception that the whole problem of letter case is similarly trivial. Actually, I would not be surprised if the notion of having any sort of real 'problem' here is baffling to some. After all, every self-respecting language has those toLowerCase/toUpperCase functions built-in, right?...

Sure it does. But even assuming they work correctly, they are usually the only case-related transformations available out of the box. As it turns out, it's hardly uncommon to need something way more sophisticated.

Looking Back, and Maybe Even Forward

2012-01-01 12:35

Also known as Obligatory New Year's Post.

It was quite a year, this 2011. No single ground-breaking change, but a lot of somewhat significant events and small steps - mostly in the right direction. A short summary is of course in order, because taking time to stop and reflect is a good thing from time to time.

Technically, the biggest change would be the fact that I'm no longer a student. Having attained an MSc. some time in the first quarter, I finished a five-year-long period of computer science studies at Warsaw University of Technology. While there are mixed views on the importance of formal education, I consider this a major and important achievement - and one with practical impact as well.

Being a polyglot is fun

My master's thesis was about implementing a reflection system for C++. Ironically, since then I haven't really got to code anything in this language. That's not actually something I'm at odds with. For me, sticking to just one language for an extended period of time seems somewhat detrimental to the development of one's programming skills. On the other hand, there goes the saying that a language which doesn't change your view on programming as a whole is not worth learning. As usual, it looks like a question of proper balance.

This year, I've got to use a handful of distinct languages in different contexts and applications. There was Java, but mostly (if not exclusively) on the Android platform. There was JavaScript in its original incarnation - i.e. on the client side, in the browser.
Finally, there was Python: for scripts, for cloud computing on Google App Engine, for general web programming, and for many everyday tasks and experiments. It seems to be my first-choice language as of now - the one that I'm most productive in. Still, it probably has many tricks and crispy details waiting to be uncovered, which makes it likely to grab my attention for quite a bit longer.

Its status always has contenders, though. Clojure, Ruby and Haskell are among the languages which I gave at least a brief glance in 2011. The last one is especially intriguing and may therefore be the subject of a few posts later on.

Speaking and listening

2011 was also a busy year for me when it comes to attending various software-related events. Many of these were organized or influenced by the local Google Technology User Group. Some of those I even got to speak at, lecturing on the Google App Engine platform or advanced topics in Android UI programming. In either case it was an exciting and refreshing experience.

There were also several other events and meet-ups I got to attend in the past year. Some of them even required traveling abroad, some resulted in grabbing juicy awards (such as autographed books), while some were slightly less formal albeit still very interesting.
And kinda unexpected, too. I learned that there is a bunch of thriving communities gathered around specific technologies, and they are all just around the corner - literally. Because contrary to the stereotype of the lone hacker, their members are regularly meeting in real life. Wow! ;-)

Author: Xion, posted under Computer Science & IT, Life, Thoughts

Disjoint Branches in Git

2011-12-30 12:18

Great services like GitHub encourage us to share projects and collaborate on them publicly. But not every piece of code feels like it deserves its own repository. Thus it's quite reasonable to keep a "miscellaneous" repo which collects smaller, often unrelated hacks.

But how to set up such a repository and what structure should it have? Possible options include separate branches or separate folders within a single branch. Personally, I prefer the former approach, as it keeps both the commit history and working directory cleaner. It also makes it rather trivial to promote a project into its own repo.
I speak from experience here, since I did exactly this with my repository of presentation slides. So far, it serves me well.

It's not hard to arrange a new Git repository in such a manner. The idea is to keep the master branch either completely empty, or only store common stuff there - such as a README file:

$ git init
$ echo "This is my repo with miscellaneous hacks." > README
$ git add . && git commit -m "Initial commit"

The actual content will be kept in separate branches, with no relation to each other or to the master one. Such entities are sometimes referred to as root (or orphan) branches. Creating one the usual way, with git checkout -b, would not work here: the new branch would start from the current commit and inherit its whole history. We need to pass the --orphan option instead:

$ git checkout --orphan foo

However, this is not nearly enough. We don't want to base the new branch upon the content from master, but we still have it in the working directory. And even if we were to clean it up manually (using a spell such as ls | xargs rm -r, which leaves the hidden .git subdirectory alone), the files would still be listed in the index, ready to slip into the first commit on the new branch. Certainly, that would go against our goal of making it independent from master.

The history itself is already taken care of. Thanks to --orphan, the first commit on the new branch will have no parent, so none of the changesets added before the branch was created will carry over and appear in its log.
What remains is the index, which still remembers every file tracked on master. Fortunately, clearing it is very easy - although somewhat scary. We need to reach into the internal .git directory and remove the index file:

$ rm .git/index

Don't worry, this doesn't touch any actual data, which is mostly inside the .git/objects directory. What we removed is a "table of contents" for the current branch, making it pristine - just like the master right after git init.

As a nice side effect, the whole content of the working directory is now unknown to Git. Once we removed the index, every file and directory has become untracked. Now it's possible to remove all of them in one go using git clean:

$ git clean -xdf

And that's it. We now have a branch that has nothing in common with the rest of the repository. If we need more, we can simply repeat those three steps, starting from a clean working copy (not necessarily from the master branch).

 


© 2012 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.