Archive for Programming

Working within Temporary Directory

2012-03-24 15:35

Few days ago I needed to write a script which was supposed to run inside a temporary directory. The exact matter was about deployment from an ad hoc Git repository, and it’s something that I may describe in more detail later on. Today, however, I wanted to focus on its small part: a one that (I think) has neatly captured the notion of executing something within a non-persistent, working directory. Because it’s a very general technique, I suppose quite a few readers may find it pretty useful.

Obtaining a temporary file or even directory shouldn’t be a terribly complicated thing – and indeed, it’s very easy in case of Python. We have a standard tempfile module here and it serves our needs pretty well in this regard. For one, it has the mkdtemp function which creates a temporary directory and returns path to it:

  1. temp_dir = tempfile.mkdtemp()

That’s what it does. What it doesn’t do is e.g ensuring a proper cleanup once the directory is not needed anymore. This is especially important on Windows where the equivalent of /tmp is not wiped out at boot time.
We also wanted our fresh temp directory to be set as the program’s working one (PWD), and obviously this is also something we need to manually take care of. To combine those two needs, I think the best solution is to employ a context manager.

Context manager is basically a fancy name for an object that the with statement can be applied upon. You may recall that some time ago I wrote about interesting use cases for the with construct. This one could also qualify as such, but the principles are very typical. It’s about introducing a scope where some resource (here: a temporary directory) remains accessible as long as we’re inside it. Once we leave the with block, it is cleaned up – just like file handles, network sockets, concurrent locks and plenty of other similar objects.

But while semantics are pretty clear, there are of course several ways to do this syntactically. I took this opportunity to try out the supposedly simplest one which I learned recently on local Python community meet-up: the contextlib library. It includes the contextmanager decorator: a simple and clever way to write with-enabled objects as simple functions. It is based on particular usage of yield statement which makes it very interesting even by itself.

So without further ado, let’s look at the final solution I wanted to present:

  1. import os
  2. import shutil
  3. import tempfile
  4. from contextlib import contextmanager
  5.  
  6. @contextmanager
  7. def temp_directory(*args, **kwargs):
  8.     """Allows the program to operate inside temporary directory.
  9.    Sets the app's working dir automatically and restores it
  10.    to original one upon existing the `with` clause.
  11.    """
  12.     orig_workdir = os.getcwd()
  13.     temp_workdir = tempfile.mkdtemp(*args, **kwargs)
  14.     os.chdir(temp_workdir)
  15.  
  16.     yield temp_workdir
  17.  
  18.     os.chdir(orig_workdir)
  19.     shutil.rmtree(temp_workdir)

As we can see, yield divides this function into two parts: setup and cleanup. Setup will be executed when we enter the with block, while cleanup will run when we’re about to exit it. By the way, this scheme of multiple entry and exit points in one function is typically referred to as coroutine, and it allows for several very intriguing techniques of smart computation.

Usage of temp_directory function is pretty obvious, I’d say. Here’s a simplified excerpt of the Git-based deployment script that I used it in:

  1. import subprocess
  2. shell = lambda cmd: subprocess.call(cmd, shell=True)
  3.  
  4. orig_repo = os.getcwd()
  5. with temp_directory():
  6.     shell('git clone --shared %s .' % orig_repo)
  7.     shell('./build')
  8.     shell('git add -f ' + build_products)
  9.     shell('git commit -m "%s"' % message)
  10.     shell('git push %s master' % deploy_remote)

Note how the meaning of '.' (current directory) shifts depending on whether we’re inside or outside the with block. Users of Fabric (Python- and SSH-based remote administration tool) will find this very similar to its cd context manager. The main difference is of course that directory we’re cd-ing to is not a predetermined one, and that it will disappear once we’re done with it.

Tags: , , ,
Author: Xion, posted under Programming » Comments Off on Working within Temporary Directory

How Does this Work (in JavaScript)

2012-03-18 21:19

Many caveats clutter the JavaScript language. Some of them are quite hilarious and relatively harmless, but few can get really nasty and lead to insidious bugs. Today, I’m gonna talk about something from the second group: the semantics of this keyword in JavaScript.

Why’s this?

It is worth noting why JS has the this keyword at all. Normally, we would expect it only in those languages which also have the corresponding class keyword. That’s what C++, Java and C# have taught us: that this represents the current object of a class when used inside one of its methods. It only makes sense, then, to use this keyword in a class scope, denoted by the class keyword – both of which JavaScript doesn’t seem to have. So, why’s this even there?

The most likely reason is that JavaScript actually has something that resembles traditional classes – but it does so very poorly. And like pretty much everything in JS, it is written as a function:

  1. function Greeting(text) {
  2.     this.text = text
  3. }
  4. Greeting.prototype.greet = function(who) {
  5.     alert("Hello, " who + "! " + this.text);
  6. }
  7.  
  8. var greeting = new Greeting("Nice to meet you!");
  9. greeting.greet("Alice");

Here, the Greeting is technically a function and is defined as one, but semantically it works more like constructor for the Greeting “class”. As for this keyword, it refers to the object being created by such a constructor when invoked by new statement – another familiar construct, by the way. Additionally, this also appears inside greet method and does its expected job, allowing access to the text member of an object that the method was called upon.

So it would seem that everything with this keyword is actually fine and rather unsurprising. Have we maybe overlooked something here, looking only at half of the picture?…

Well yes, very much so. And not even a half but more like a quarter, with the remaining three parts being significantly less pretty – to say it mildly.

Tags: , , , ,
Author: Xion, posted under Programming » 2 comments

Adding Recursive Depth to Our Functions

2012-03-05 22:46

I suppose it this not uncommon to encounter a general situation such as the following. Say you have some well-defined function that performs a transformation of one value into another. It’s not particularly important how lengthy or complicated this function is, only that it takes one parameter and outputs a result. Here’s a somewhat trivial but astonishingly useful example:

  1. def is_true(value):
  2.     ''' Checks whether given value can be interpreted as "true",
  3.    using various typical representations of truth. '''
  4.     s = str(value).lower()
  5.     can_be_true = s in ['1', 'y', 'yes', 'true']
  6.     can_be_false = s in ['0', 'n' 'no', 'false']
  7.     if can_be_true != (not can_be_false):
  8.         return bool(value) # fall back in case of inconsistency
  9.     return can_be_true

Depending on what happens in other parts of your program, you may find yourself applying such function to many different inputs. Then at some point, it is possible that you’ll need to handle lists of those inputs in addition to supporting single values. Query string of URLs, for example, often require such treatment because they may contain more than one value for given key, and web frameworks tend to collate those values into lists of strings.

In those situations, you will typically want to deal just with the list case. This leads to writing a conditional in either the caller code:

  1. if not isinstance(values, list):
  2.     values = [values]
  3. bools = map(is_true, values)

or directly inside a particular function. I’m not a big fan of similar solutions because everyone do them differently, and writing the same piece several times is increasingly prone to errors. Quite not incidentally, a mistake is present in the very example above – it shouldn’t be at all hard to spot it.

In any case, repeated application calls for extracting the pattern into something tangible and reusable. What I devised is therefore a general “recursivator”, whose simplified version is given below:

  1. def recursive(func):
  2.     ''' Creates a recursive function out of supplied one.
  3.    Resulting function recurses on lists, applying itself
  4.    to its elements. '''
  5.     def recursive_func(obj, *args, **kwargs):
  6.         if hasattr(obj, '__iter__'):
  7.             return [recursive_func(i, *args, **kwargs)
  8.                     for i in obj]
  9.         return obj
  10.     return recursive_func

As for usage, I think it’s equally feasible for both on-a-spot calls:

  1. bools = recursive(is_true)(values)

as well as decorating functions to make them recursive permanently. For this, though, it would be wise to turn it into class-based decorator, applying the technique I’ve described previously. This way we could easily extend the solution and tie it to our needs.

But what are the specific ways of doing so? I could think of some, like:

  • Recursing not only on lists, bit also on mappings (dictionaries) and applying the function to dictionary values. A common use case could be a kind of sanitization function for preparing values to be serialized, e.g. by turning datetimes into ISO-formatted strings.
  • Excluding some data types from recursion, preventing, say, sets from being turned into lists, as obviously sets are also iterable. In more general version, one could supply a predicate function for deciding whether to recurse or not.
  • Turning recursive into generator for more memory-efficient solution. If we’re lucky to program in Python 3.x, it would be a good excuse to employ the new yield from construct from 3.3.

One way or another, capturing a particular concept of computation into actual API such as recursive looks like a good way for making the code more descriptive and robust. Certainly it adheres to one of the statements from Zen: that explicit is better than implicit.

Tags: , , ,
Author: Xion, posted under Programming » Comments Off on Adding Recursive Depth to Our Functions

Against Unit Tests

2012-02-26 21:23

When discussing the topic of unit testing and methodologies they might entail (mostly TDD, i.e. Test-Driven Development), I noticed a curious imbalance in the number and strength of arguments pro and contra. The latter are few and far between, up to the point of ridiculous scarcity when googling “arguments against TDD” is equally likely to yield stories from both sides of the fence. That’s pretty telling. Is it so that TDD in general and unit tests in particular are just the best thing ever, because there is an industry-wide consensus about them?…

I wouldn’t be so sure. All this unequivocal acknowledgement looks suspiciously similar to many other trends and fashions that were (or still are) sweeping through the IT domain, receding only when the alternative approach gains enough traction.


O RLY?

Take OOP, for example. Back in the 90s and around 2000, you would hear all kinds of praise for the object-oriented methodology: how natural it is, how it helps to model problems in intuitive way, how flexible and useful its abstractions are. Critics’ camp existed, of course, but they were small, scattered and not taken very seriously. Objects and classes were reigning supreme.

Compare this to present day, when OOP is taking blows from almost every direction. On one hand, it is rejected on performance basis, as the unknown factors of virtual method’s call are seen as a liability. On the other hand, its abstraction patterns are considered baroque, overblown and outdated, unfit for modern computing challenges – most notably concurrency and asynchronism.

Could it be that approaches emphasizing the utmost importance of unit tests are following the same route? Given the pretty much universal praise they are receiving, it’s not unimaginable. In this context, providing some reasonable counterarguments seems like a good thing: if we let some air out of this balloon, we may prevent it from popping later on.

Incidentally, this is a service for TDD/unit testing that I’m glad to provide ;-) So in the rest of this post, I’m going to discuss some of their potential drawbacks, hopefully helping to even-out the playing field. Ultimately, this should always lead to better software engineering practices, and better software.

Tags: , , ,
Author: Xion, posted under Programming, Thoughts » 3 comments

Importance of Using Virtual Environments

2012-02-22 19:59

One of technological marvels behind modern languages is the easiness of installing new libraries, packages and modules. Thanks to having a central repository (PyPI, RubyGems, Hackage, …) and a suitable installer (pip/easy_install, gem, cabal, …), any library is usually just one command away. For one, this makes it very easy to bootstrap development of a new project – or alternatively, to abandon the idea of doing so because there is already something that does what you need :)

But being generous with external libraries also means adding a lot of dependencies. After a short while, they become practically untraceable, unless we keep an up-to-date list. In Python, for example, it would be the content of requirements.txt file, or a value for requires parameter of the setuptools.setup/distutils.setup function call inside setup.py module. Other languages have their own means of specifying dependencies but the principles are generally the same.

How to ensure this list is correct, though?… The best way is to create a dedicated virtual environment specifically for our project. An environment is simply a sandboxed interpreter/compiler, along with all the packages that it can use for executing/compiling) programs.

Rationale

Normally, there is just one, global environment for a system as a whole: all external libraries or packages for a particular language are being installed there. This makes it easy to accidentally introduce extraneous dependencies to our project. More importantly, with this setting we are sharing our required libraries with other applications installed or developed on the system. This spells trouble if we’re relying on particular version of a library: some other program could update it and suddenly break our application this way.

If we use a virtual environment instead, our program is isolated from the rest and is using its own, dedicated set of libraries and packages. Besides preventing conflicts, this also has an added benefit of keeping our dependency list up to date. If we use an API which isn’t present in our virtual environment, the program will simply blow up – hopefully with a helpful error :) Should this happen, we need to make proper amends to the list, and use it to update the environment by reinstalling our project into it. As a bonus – though in practice that’s the main treat – deploying our program to another machine is as trivial as repeating this last step, preferably also in a dedicated virtual environment created there.

Using it

So, how to use all this goodness? It heavily depends on what programming language are we actually using. The idea of virtual environments (or at least this very term) comes from Python, where it coalesced into the virtualenv package. For Ruby, there is a pretty much exact equivalent in the form of Ruby Version Manager (rvm). Haskell has somewhat less developed cabal-dev utility, which should nevertheless suffice for most purposes.

More exotic languages might have their own tools for that. In that case, searching for “language virtualenv” is almost certain way to find them.

Tags: , , , , ,
Author: Xion, posted under Programming » 2 comments

The “Wat” of Python

2012-01-31 21:19

It is quite likely you are familiar with the Wat talk by Gary Bernhardt. It is sweeping through the Internet, giving some good laugh to pretty much anyone who watches it. Surely it did to me!
The speaker is making fun of Ruby and JavaScript languages (although mostly the latter, really), showing totally unexpected and baffling results of some seemingly trivial operations – like adding two arrays. It turns out that in JavaScript, the result is an empty string. (And the reasons for that provoke even bigger “wat”).

After watching the talk for about five times (it hardly gets old), I started to wonder whether it is only those two languages that exhibit similarly confusing behavior… The answer is of course “No”, and that should be glaringly obvious to anyone who knows at least a bit of C++ ;) But beating on that horse would be way too easy, so I’d rather try something more ambitious.
Hence I ventured forth to search for “wat” in Python 2.x. The journey wasn’t short enough to stop at mere addition operator but nevertheless – and despite me being nowhere near Python expert – I managed to find some candidates rather quickly.

I strove to keep with the original spirit of Gary’s talk, so I only included those quirks that can be easily shown in interactive interpreter. The final result consists of three of them, arranged in the order of increasing puzzlement. They are given without explanation or rationale, hopefully to encourage some thought beyond amusement :)

Behold, then, the Wat of Python!

Tags: , , ,
Author: Xion, posted under Internet, Programming » 12 comments

Using Executors in Java

2012-01-12 21:17

When thinking about concurrent programs, we are sometimes blinded by the notion of bare threads. We create them, start them, join them, and sometimes even interrupt them, all by operating directly on those tiny little abstractions over several paths of simultaneous execution. At the same time we might be extremely reluctant to directly use synchronization primitives (semaphores, mutexes, etc.), preferring more convenient and tailored solutions – such as thread-safe containers. And this is great, because synchronization is probably the most difficult aspect of concurrent programming. Any place where we can avoid it is therefore one less place to be infested by nasty bugs it could ensue.

So why we still cling to somewhat low-level Threads for actual execution, while having no problems with specialized solutions for concurrent data exchange and synchronization?… Well, we can be simply unaware that “getting code to execute in parallel” is also something that can benefit from safety and clarity of more targeted approach. In Java, one such approach is oh-so-object-orientedly called executors.

As we might expect, an Executor is something that executes, i.e. runs, code. Pieces of those code are given it in a form of Runnables, just like it would happen for regular Threads:

  1. executor.execute(new Runnable() {
  2.     @Override public void run() {
  3.         calculatePiToDecimalPlaces(10000000);
  4.     }
  5. });

Executor itself is an abstract class, so it could be used without any knowledge about queuing policy, scheduling algorithms and any other details of the way it conducts execution of tasks. While this seems feasible in some real cases – such as servicing incoming network requests – executors are useful mainly because they are quite diverse in kind. Their complex and powerful variants are also relatively easy to use.

Let’s play in pool

Simple functions for creating different types of executors are contained within the auxiliary Executors class. Behind the scenes, most of them have a thread pool which they pull threads from when they are needed to process tasks. This pool may be of fixed or variable size, and can reuse a thread for more than one task,

Depending on how much load we expect and how many threads can we afford to create, the choice is usually between newCachedThreadPool and newFixedThreadPool. There is also peculiar (but useful) newSingleThreadExecutor, as well as time-based newScheduledThreadPool and newSingleThreadScheduledExecutor, allowing to specify delay for our Runnables by passing them to schedule method instead of execute.

Swapping them

There is one case where the abstract nature of base Executor class comes handy: testing and performance tuning. A certain types of executors can serve as good approximation of some common concurrency scenarios.

Suppose that we are normally handling our tasks using a pool with fixed number of threads, but we are not sure whether it’s actually the most optimal number. If our tasks appear to be mostly I/O-bound, it could be good idea to increase the thread count, seeing that threads waiting for I/O operations simply lay dormant for most of the time.
To see if our assumptions have grounds, and how big the increase can be, we can temporarily switch to cached thread pool. By experimenting with different levels of throughput and observing the average execution time along with numbers of threads used by application, we can get a sense of optimal number of threads for our fixed pool.
Similarly, we can adjust and possibly decrease this number for tasks that appear to be mostly CPU-bound.

Finally, it might be also sensible to use the single-threaded executor as a sort of “sanity check” for our complicated, parallel program. What we are checking this way is both correctness and performance, in rather simple and straightforward way.
For starters, our program should still compute correct results. Failing to do so serves as indication that seemingly correct behavior in multi-threaded setting may actually be an accidental side effect of unspotted hazards. In other words, threads might “align just right” if there is more than one running, and this would hide some insidious race conditions which we failed to account for.

As for performance, we should expect the single-thread code to run for longer time than its multi-thread variant. This is somewhat obvious observation that we might carelessly take for granted and thus never verify explicitly – and that’s a mistake. Indeed, it’s not unheard of to have parallelized algorithms which are actually slower than their serial counterparts. Throwing some threads is not a magic bullet, unfortunately: concurrency is still hard.

Tags: , , ,
Author: Xion, posted under Programming » 1 comment
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.