Monthly archive for December, 2011

Disjoint Branches in Git

2011-12-30 12:18

Great services like GitHub encourage us to share projects and collaborate on them publicly. But not every piece of code feels like it deserves its own repository. Thus it’s quite reasonable to keep a “miscellaneous” repo which collects smaller, often unrelated hacks.

But how should we set up such a repository, and what structure should it have? Possible options include separate branches or separate folders within a single branch. Personally, I prefer the former approach, as it keeps both the commit history and the working directory cleaner. It also makes it rather trivial to promote a project into its own repo.
I speak from experience here, since I did exactly this with my repository of presentation slides. So far, it has served me well.

It’s not hard to arrange a new Git repository in such a manner. The idea is to keep the master branch either completely empty, or to store only common stuff there – such as a README file:

  $ git init
  $ echo "This is my repo with miscellaneous hacks." > README
  $ git add . && git commit -m "Initial commit"

The actual content will be kept in separate branches, with no relation to each other or to master. Such entities are sometimes referred to as root branches. We create them via git checkout:

  # --orphan gives foo no parent commit; a plain -b would inherit master's history
  $ git checkout --orphan foo

However, this is not nearly enough. We don’t want to base the new branch upon the content from master, but we still have it in the working directory. And even if we were to clean it up manually (using a spell such as ls | xargs rm -r to make sure the .git subdirectory is preserved), the removal would have to be registered as a commit in the new branch. Certainly, it would go against our goal to make it independent from master.

But the working copy is just one thing. To have a truly independent root branch, its history must also be disconnected from everything else in the repo. Otherwise, any changesets added before the branch was created would carry over and appear in its log.
Fortunately, wiping the slate clean is very easy – although somewhat scary. We need to reach into the internal .git directory and remove the index file:

  $ rm .git/index

Don’t worry, this doesn’t touch any actual data, which lives mostly inside the .git/objects directory. What we removed is the index: a “table of contents” listing the files Git currently tracks. Without it, the branch starts out pristine – just like master right after git init.

As a nice side effect, the whole content of the working directory is now unknown to Git. Once we removed the index, every file and directory became untracked. Now it’s possible to remove all of them in one go using git clean:

  $ git clean -xdf

And that’s it. We now have a branch that has nothing in common with the rest of the repository. If we need more, we can simply repeat those three steps, starting from a clean working copy (not necessarily from the master branch).

Text Ellipsis with Gradient Fade in Pure CSS

2011-12-26 18:56

The other day I encountered a small but very interesting effect, visible in the Bitbucket issues table. Some of the cells were slightly too narrow for the text they contained, so it had to be ellipsized. Usually this is done by cropping some of the text’s trailing chars and replacing them with dots – mostly because that’s what the text-overflow: ellipsis style does. Here, however, I saw something much more original: the text was fading out in a gradient-like style, going from full black to fully transparent/white over a distance of about 30 pixels. It made for quite an eye-catching effect.

So, I decided to bring up Firebug and find out how this nifty trick actually works. Taught by past experience, I expected a tightly coupled mishmash of DOM and CSS hacks, with lots of moving parts that need to be carefully adjusted in the face of any changes. Happily, I was wrong: it turned out to use only CSS, in a succinct and elegant manner. After some simple reverse-engineering, I uncovered a clever solution involving gradients, opacity and :before/:after pseudo-elements. It definitely deserves some press, so let’s look into it.

Author: Xion, posted under Internet, Programming » 3 comments

Coded4: Time-based Statistics for Git/Hg Repos

2011-12-24 16:23

As the ubiquitous spirit of laziness (er, relaxation) permeates the holiday season, the will to do any serious and productive coding seems appallingly low. Instead of fighting it, I went along and tried to write something just for pure fun.

And guess what: it worked pretty well. Well enough, in fact, that the result seems worthy of public exposure – which I’m hereby giving it. Behold the coded4 project:

  $ git clone git@github.com:jquery/jquery
  $ coded4 jquery

  name                   commits time
  -------------------------------------------
  John Resig             1211    5d 09:11:59
  jeresig                503     2d 07:50:02
  Jörn Zaefferer         309     1d 09:05:53
  Brandon Aaron          247     1d 01:34:29
  Dave Methvin           235     22:17:27
  jaubourg               221     22:22:31
  timmywil               221     23:00:56
  Ariel Flesler          200     22:10:46

  ...

What exactly is this thing? It’s a “time sheet” for the marvelous jQuery project, reconstructed from commit timestamps in its Git repository. coded4 created this result by analyzing the repo’s history, grouping changesets by contributor, and running some heuristics to approximate the timespans of their coding sessions.
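To give a rough idea of what such a heuristic might look like, here is a minimal sketch. It is not coded4's actual algorithm, which isn't described here: the function name estimate_time and the max_gap and lead_in thresholds are assumptions made up for illustration. Commits by the same author that are close together count as one coding session, and every session is credited with a fixed lead-in for the work done before its first commit.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def estimate_time(commits,
                  max_gap=timedelta(minutes=30),   # assumed session cutoff
                  lead_in=timedelta(minutes=10)):  # assumed time before a session's first commit
    """Approximate coding time per author from (author, datetime) pairs."""
    by_author = defaultdict(list)
    for author, when in commits:
        by_author[author].append(when)

    totals = {}
    for author, stamps in by_author.items():
        stamps.sort()
        total = lead_in  # the first session always gets its lead-in
        for prev, cur in zip(stamps, stamps[1:]):
            gap = cur - prev
            # A short gap extends the current session; a long one starts a new session.
            total += gap if gap <= max_gap else lead_in
        totals[author] = total
    return totals


commits = [
    ("alice", datetime(2011, 12, 24, 10, 0)),
    ("alice", datetime(2011, 12, 24, 10, 20)),
    ("alice", datetime(2011, 12, 24, 14, 0)),
    ("bob", datetime(2011, 12, 24, 9, 0)),
]
totals = estimate_time(commits)
```

With these made-up thresholds, alice's first two commits form one session and the third starts another, while bob's single commit is credited with the lead-in only.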

And of course, this can be done for any Git (or Hg) repository. Pretty neat for a mere *pops up shell and types coded4 .* 3 hours of casual coding, eh?

A Brief Note on Quotes

2011-12-20 20:30

Quite a few languages allow strings to be put in either single (') or double (") quotes. Some of them – like PHP, Perl or Ruby – make a minor distinction by enabling string interpolation only in double-quoted ones. Others – including JavaScript and Python – offer no distinction whatsoever, bar the possible (in)convenience of using quote chars inside the strings themselves.

But if neither of those applies to your specific case, is there any compelling argument to prefer one type of quotes over the other?…

Replying with “Who cares?” seems like a sane thing to do and until recently, I would have concurred. Indeed, it looks like a textbook example of something totally irrelevant. That’s why I was rather surprised to discover that there might be deep logic behind such a choice.

And it’s pretty simple, really – almost obvious in hindsight. Use double quotes for strings which will eventually be seen by a user. Not necessarily the end-user, mind you; an admin or coder looking at logs is an equally valid recipient. Similarly, reserve single quotes (apostrophes) for texts used internally: identifiers, enum-like values, keys within hashmaps, XML/JSON attributes, and the like.
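As a small illustration of the convention in Python, where both kinds of quotes behave identically, consider the sketch below. The STATUS_LABELS mapping and the describe function are hypothetical names invented for this example.

```python
import logging

# Keys are internal identifiers: single quotes.
# Values are shown to people: double quotes.
STATUS_LABELS = {
    'ok': "Everything is fine.",
    'not_found': "Sorry, we couldn't find that page.",
}


def describe(status):
    """Translate an internal status code into a message for the user."""
    # Log messages are read by humans too, hence double quotes.
    logging.debug("Describing status %s", status)
    return STATUS_LABELS.get(status, "Unknown status.")


message = describe('ok')
```

A pleasant side effect is that human-readable text, which is more likely to contain apostrophes, rarely needs escaping.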

It might still seem like a somewhat superficial distinction – and a blurry one at times. But I think that ultimately, it pays off to focus a little on details such as these. As a benefit, we may develop a subtle sense of structure, allowing us to see into the underlying semantics that much quicker.

Author: Xion, posted under Computer Science & IT, Thoughts » 2 comments

About Java references

2011-12-17 19:22

There is a somewhat common misconception about garbage collection: that it totally frees the programmer from memory-related concerns. Granted, it makes the task easier in a great many cases, but it does so at the expense of a significant loss of control over objects’ lifetime. Normally, they are kept around at least until they are no longer needed – and usually that’s fine for the typical definitions of “need” and “at least”. Usually – but not always.

For those less typical use cases, garbage-collected environments provide mechanisms that let us regain some of that lost control, to the extent necessary for a particular task. Java, for example, offers several different types of references, which change the notion of what it means for an object to be eligible for garbage collection. Choosing the right one for the problem at hand can be crucial, especially if we are concerned with the memory footprint of our application. Since – as the proverb goes – the JVM expands to fill all available memory, it’s good to know about techniques which help keep our heap size in check.

The default is strong

So today, I will discuss the SoftReference and WeakReference classes, both of which can be found in the java.lang.ref package. They provide the so-called soft and weak references, which are both considerably less powerful than ordinary (strong) references when it comes to prolonging the lifetime of an object.
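The general idea of a weak reference is not specific to Java. As a loose analogy only, and not a demonstration of java.lang.ref, the sketch below uses Python's standard weakref module to show a reference that does not keep its target alive. The Image class is a made-up placeholder, and the timing of the cleanup assumes CPython.

```python
import gc
import weakref


class Image:
    """Placeholder for some memory-hungry object (hypothetical)."""

    def __init__(self, name):
        self.name = name


picture = Image("cat.png")        # ordinary (strong) reference
weak = weakref.ref(picture)       # weak reference: does not prolong the lifetime

alive_before = weak() is picture  # the target is still reachable through the weak ref

del picture                       # drop the only strong reference
gc.collect()                      # a no-op for this case in CPython, but explicit

alive_after = weak() is not None  # the weak reference has been cleared
```

Once the last strong reference is gone, calling the weak reference yields None instead of the object, so the code using it must always be prepared for the target to have disappeared.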

Author: Xion, posted under Programming » 1 comment

Decorators with Optional Arguments in Python

2011-12-13 18:34

Features dubbed ‘syntactic sugar’ often foster novel approaches to programming problems. Python’s decorators are no different here, and this is a topic I have touched upon before. Today I’d like to discuss a few quirks which, unfortunately, add to their complexity in a way that often doesn’t feel necessary.

Let’s start with something easy. Pretend that we have a simple decorator named @trace, which logs every call of the function it is applied to:

  @trace
  def some_function(*args):
      pass

An implementation of such a decorator is relatively trivial, as it wraps the decorated function directly. One of the possible variants can be seen below:

  import functools
  import logging

  def trace(func):
      @functools.wraps(func)  # keep the original function's name and docstring
      def wrapped(*args, **kwargs):
          logging.debug("Calling %s with args=%s, kwargs=%s",
                        func.__name__, args, kwargs)
          return func(*args, **kwargs)
      return wrapped

That’s pretty cool for starters, but let’s say we want some calls to stand out in the logging output. Perhaps there are functions that we are more interested in than the rest. In other words, we’d like to adjust the priority of log messages that are generated by @trace:

  @trace(level=logging.INFO)
  def important_func():
      pass

This seemingly small change actually mandates a massive conceptual leap in what our decorator really does. It becomes apparent when we de-sugar the @decorator syntax and look at the plumbing underneath:

  important_func = trace(level=logging.INFO)(important_func)

Introducing parameters requires adding a new level of indirection, because it’s the return value of trace(level=logging.INFO) that does the actual decorating (i.e. transforming a given function into another). This might not be obvious at first glance and admittedly, the notion of a function that returns a function which takes some other function in order to output a final function might be – ahem – slightly confusing ;-)
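To make those three layers concrete, here is a minimal sketch of a parameterized @trace. The inner names decorator and wrapped, and the default of logging.DEBUG, are my own choices for illustration.

```python
import functools
import logging


def trace(level=logging.DEBUG):
    """Decorator factory: takes the arguments, returns the real decorator."""
    def decorator(func):
        """The actual decorator: takes a function, returns its replacement."""
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # The replacement logs the call at the requested level, then delegates.
            logging.log(level, "Calling %s with args=%s, kwargs=%s",
                        func.__name__, args, kwargs)
            return func(*args, **kwargs)
        return wrapped
    return decorator


@trace(level=logging.INFO)
def important_func():
    return 42


result = important_func()
```

Each nesting level corresponds to one pair of parentheses in the de-sugared form: trace(...) produces decorator, decorator(func) produces wrapped, and wrapped(...) finally calls the original function.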

But wait! There is just one more thing… When we added the level argument, we didn’t necessarily want to lose the ability to invoke @trace without it. Yes, it is still possible – but the syntax is rather awkward:

  @trace()
  def some_function(*args):
      pass

That’s expected – trace only returns the actual decorator now – but at least slightly annoying. Can we get our pretty syntax back while maintaining the added flexibility of specifying custom arguments? Better yet: can we make @trace, @trace() and @trace(level) all work at the same time?…

Looks like a tough call, but fortunately the answer is positive. Before we delve into details, though, let’s step back and try to somewhat improve the way we are writing our decorators.
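For the impatient, here is one common way of getting all three spellings to work. It is a sketch of a widely used idiom and not necessarily the solution this post arrives at. It assumes that level is always passed as a keyword argument, so that a lone positional argument can only be the decorated function.

```python
import functools
import logging


def trace(func=None, *, level=logging.DEBUG):
    """Usable as @trace, @trace() or @trace(level=...)."""
    def decorator(f):
        @functools.wraps(f)
        def wrapped(*args, **kwargs):
            logging.log(level, "Calling %s with args=%s, kwargs=%s",
                        f.__name__, args, kwargs)
            return f(*args, **kwargs)
        return wrapped

    if func is None:
        # Called with parentheses: hand back the decorator for Python to apply.
        return decorator
    # Used bare, as @trace: Python passed the function in directly.
    return decorator(func)


@trace
def plain():
    return "plain"


@trace()
def with_parens():
    return "parens"


@trace(level=logging.INFO)
def with_level():
    return "level"


results = [plain(), with_parens(), with_level()]
```

The keyword-only marker (the bare *) is what keeps this unambiguous; accepting level positionally would require guessing whether the first argument is a function or a log level.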

Author: Xion, posted under Programming » 2 comments

Synchronization Through Memcache

2011-12-10 18:11

In web and server-side development, a memcache is a remote service that keeps transient data in a temporary, in-memory store and allows for fast access. It is usually based on keys which identify the values being stored and retrieved. Because it stores data in RAM rather than on actual disks, it offers great speed (often less than a few milliseconds per operation), while introducing the possibility that values get evicted (deleted) if the memcache runs out of space. Should that happen, the oldest values are usually disposed of first.

This functionality makes memcache a good secondary storage that complements the usual persistent database, increasing the efficiency of the system as a whole. The usual scenario is to poll memcache first, and resort to querying the database only if the result cannot be found in the cache. Once the query is made, its results are memcached for some reasonable amount of time (usually called TTL: time to live), depending on the allowed trade-off between speed and the freshness of our results.
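The scenario above can be sketched in a few lines. To keep the example self-contained, TinyCache is a made-up, in-process stand-in for a real memcache client, and query_database is a hypothetical placeholder for an actual database call.

```python
import time


class TinyCache:
    """In-process stand-in for a memcache client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires <= time.monotonic():
            del self._data[key]  # the entry outlived its TTL
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)


db_queries = []  # records every trip to the "database"


def query_database(user_id):
    """Hypothetical slow query; here it just fabricates a record."""
    db_queries.append(user_id)
    return {"id": user_id, "name": "user%d" % user_id}


def get_user(cache, user_id, ttl=60):
    key = "user:%d" % user_id
    user = cache.get(key)           # poll the cache first
    if user is None:                # miss: fall back to the database
        user = query_database(user_id)
        cache.set(key, user, ttl)   # remember the result for ttl seconds
    return user


cache = TinyCache()
first = get_user(cache, 7)
second = get_user(cache, 7)
```

The second call is served from the cache, so the database is queried only once. A larger ttl means fewer queries but potentially staler data.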

While making applications more responsive is the primary use case for memcache, it turns out that we can also utilize it for something completely different. That’s because memcache implementations offer something more than the obvious get/set commands. They also have operations which make it possible to use memcache as a synchronization facility.
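One operation that lends itself to this is an atomic add, which stores a value only if the key is not already present. Whether this is the operation the post has in mind is my assumption. Below is a minimal sketch of a lock built on it; FakeMemcache is an in-process stand-in invented for this example, and try_lock and unlock are hypothetical helper names.

```python
import threading
import time


class FakeMemcache:
    """In-process stand-in offering an atomic add (illustration only)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value, ttl):
        """Store value only if key is absent or expired; return True on success."""
        now = time.monotonic()
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False  # somebody else got there first
            self._data[key] = (value, now + ttl)
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


def try_lock(cache, name, ttl=10):
    """Attempt to take the lock; the TTL frees it if the holder crashes."""
    return cache.add("lock:" + name, 1, ttl)


def unlock(cache, name):
    cache.delete("lock:" + name)


cache = FakeMemcache()
first = try_lock(cache, "report")   # acquired
second = try_lock(cache, "report")  # refused: already held
unlock(cache, "report")
third = try_lock(cache, "report")   # acquired again after release
```

Keep in mind that values can be evicted, as noted earlier, so a lock like this is best treated as advisory and not as a hard guarantee.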

 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.