Posts tagged ‘Git’

One Thing that Git Does Better: Rebranching

2013-05-04 0:06

Much can be said about similarities between two popular, distributed version control systems: Git and Mercurial. For the most part, choosing between them can be a matter of taste. And because Git seems to have considerably more proponents in the Cool Kids camp, it doesn’t necessarily follow it’s a better option.

But I have found at least one specific and common scenario where Git clearly outshines Hg. Suppose you have coded yourself into a dead end: the feature you planned doesn’t pan out the way you wanted it; or you have some compatibility issues you cannot easily resolve; or you just need to escape the refactoring bathtub.

Rebranching in Git (start)

In any case, you just want to step back a few commits and pretend nothing happened, for now. The mishap might be useful later, though, so it’d be nice if we left it marked for the future.

In Git, this is easily done. You would start by creating a new branch that points to your dead end:

  1. $ git checkout my_feature
  2. $ git branch __my_feature__dead_end__

Afterwards, both my_feature and __my_feature__dead_end__ refer to the same, head commit. We would then move the former a little back, sticking it to one of the earlier hashes. Let’s find a suitable target:

  1. $ git log
  2. commit 130fe41a3a0587a48c1ef8797030ae2e682c6fb4
  3. Author: John Doe <john.doe@example.com>
  4. Date:   Fri May 3 21:39:32 2013 +0200
  5.  
  6.     I don't like this adventure. Not one bit.
  7.  
  8. commit f834a10d199f5c23fa14ed0ebfcf89226d9c21a1
  9. Author: John Doe <john.doe@example.com>
  10. Date:   Fri May 3 21:25:46 2013 +0200
  11.  
  12.     Let's try this, maybe...
  13.  
  14. commit 9d877d32f3a08f416615f9a051ad3985e6e4d2ad
  15. Author: John Doe <john.doe@example.com>
  16. Date:   Fri May 3 21:18:57 2013 +0200
  17.  
  18.     Shiny!
  19. $ git checkout 9d877
  20. Note: checking out '9d877'.
  21. # trimmed for brevity

If it looks right, we can reset the my_feature branch so it points to this specific commit:

  1. $ git reset --hard 9d877

Our final situation would then looks like this:

Rebranching in Git (end)

which is exactly what we wanted. Note how any further commits starting from the referent of my_feature would fork at that point, diverging from development line which has lead us into dead end.

Why the same thing is not so easily done in Mercurial?… In general, this is mostly because of its one fundamental design decision: every commit belongs to one branch, forever and for always. Branch designation is actually part of the changeset’s metadata, just like the commit message or diff. Moving things around – like we did above – is therefore equivalent to changing history and requires tools that are capable of doing so, such as hg strip.

Tags: , , ,
Author: Xion, posted under Programming » 5 comments

Slides from My Talk at Name Collision

2013-02-26 14:12

Last Friday I had the pleasure of attending the Name Collision hackathon in Warsaw. I haven’t actually taken part in the competition but I served as a mentor, trying to help the teams get stuff done as much I could. It was really fun experience.

Before that, though, I gave a short lightning talk about the Git version control systems. I showed a few (hopefully) lesser known tricks that may prove useful both in everyday workflow and in some exceptional situations.
The slides from this talk are available here. They might be a bit uninformative on their own but I suppose the liberal use of memes can make up for this somewhat ;-)

Tags: , ,
Author: Xion, posted under Computer Science & IT, Events » Comments Off on Slides from My Talk at Name Collision

git outgoing

2012-11-20 12:05

Now that I don’t use Mercurial at work anymore, I’ve found that despite its shortcomings (hg status taking 10+ seconds?!) it has some few nice features. One of those is hg outgoing, which shows you which changesets you are going to send to remote repo in your next push. A quick glance at this list will typically ensure that everything is in order, or allow to amend some commits before making them public.

In Git you can do the similar by applying a filter to git log:

  1. $ git log origin/master..

But while origin is most often the remote you want to compare against, the master branch is typically not the one where most of development takes place. So if we want to create a git outgoing command, we would rather check what the current branch is and compare it with its remotely tracked equivalent:

  1. #!/bin/sh
  2. BRANCH=$(git name-rev HEAD 2>/dev/null | awk "{ print \$2 }")
  3. git log origin/$BRANCH..

Simply naming this script git-outgoing and making it executable somewhere within your $PATH (e.g. /usr/bin) will make the git outgoing command available:

  1. $ git outgoing
  2. commit 8c96c21c420dd10a34441cbd7d4c6904a6077716
  3. Author: Karol Kuczmarski <karol.kuczmarski@gmail.com>
  4. Date:   Tue Nov 20 11:51:44 2012 +0100
  5.  
  6.     Add .gitignore
  7.  
  8. commit 8a51a4f39b383c9dff64532403ab3922bc2ae13c
  9. Author: Karol Kuczmarski <karol.kuczmarski@gmail.com>
  10. Date:   Tue Nov 20 11:50:01 2012 +0100
  11.  
  12.     Comments in install script

There are few untold assumptions here, like the fact that branch names must match on both local and remote repo. If you find yourself breaking those, then you’re probably better to just use git log directly.

Tags: , ,
Author: Xion, posted under Programming » 3 comments

Working within Temporary Directory

2012-03-24 15:35

Few days ago I needed to write a script which was supposed to run inside a temporary directory. The exact matter was about deployment from an ad hoc Git repository, and it’s something that I may describe in more detail later on. Today, however, I wanted to focus on its small part: a one that (I think) has neatly captured the notion of executing something within a non-persistent, working directory. Because it’s a very general technique, I suppose quite a few readers may find it pretty useful.

Obtaining a temporary file or even directory shouldn’t be a terribly complicated thing – and indeed, it’s very easy in case of Python. We have a standard tempfile module here and it serves our needs pretty well in this regard. For one, it has the mkdtemp function which creates a temporary directory and returns path to it:

  1. temp_dir = tempfile.mkdtemp()

That’s what it does. What it doesn’t do is e.g ensuring a proper cleanup once the directory is not needed anymore. This is especially important on Windows where the equivalent of /tmp is not wiped out at boot time.
We also wanted our fresh temp directory to be set as the program’s working one (PWD), and obviously this is also something we need to manually take care of. To combine those two needs, I think the best solution is to employ a context manager.

Context manager is basically a fancy name for an object that the with statement can be applied upon. You may recall that some time ago I wrote about interesting use cases for the with construct. This one could also qualify as such, but the principles are very typical. It’s about introducing a scope where some resource (here: a temporary directory) remains accessible as long as we’re inside it. Once we leave the with block, it is cleaned up – just like file handles, network sockets, concurrent locks and plenty of other similar objects.

But while semantics are pretty clear, there are of course several ways to do this syntactically. I took this opportunity to try out the supposedly simplest one which I learned recently on local Python community meet-up: the contextlib library. It includes the contextmanager decorator: a simple and clever way to write with-enabled objects as simple functions. It is based on particular usage of yield statement which makes it very interesting even by itself.

So without further ado, let’s look at the final solution I wanted to present:

  1. import os
  2. import shutil
  3. import tempfile
  4. from contextlib import contextmanager
  5.  
  6. @contextmanager
  7. def temp_directory(*args, **kwargs):
  8.     """Allows the program to operate inside temporary directory.
  9.    Sets the app's working dir automatically and restores it
  10.    to original one upon existing the `with` clause.
  11.    """
  12.     orig_workdir = os.getcwd()
  13.     temp_workdir = tempfile.mkdtemp(*args, **kwargs)
  14.     os.chdir(temp_workdir)
  15.  
  16.     yield temp_workdir
  17.  
  18.     os.chdir(orig_workdir)
  19.     shutil.rmtree(temp_workdir)

As we can see, yield divides this function into two parts: setup and cleanup. Setup will be executed when we enter the with block, while cleanup will run when we’re about to exit it. By the way, this scheme of multiple entry and exit points in one function is typically referred to as coroutine, and it allows for several very intriguing techniques of smart computation.

Usage of temp_directory function is pretty obvious, I’d say. Here’s a simplified excerpt of the Git-based deployment script that I used it in:

  1. import subprocess
  2. shell = lambda cmd: subprocess.call(cmd, shell=True)
  3.  
  4. orig_repo = os.getcwd()
  5. with temp_directory():
  6.     shell('git clone --shared %s .' % orig_repo)
  7.     shell('./build')
  8.     shell('git add -f ' + build_products)
  9.     shell('git commit -m "%s"' % message)
  10.     shell('git push %s master' % deploy_remote)

Note how the meaning of '.' (current directory) shifts depending on whether we’re inside or outside the with block. Users of Fabric (Python- and SSH-based remote administration tool) will find this very similar to its cd context manager. The main difference is of course that directory we’re cd-ing to is not a predetermined one, and that it will disappear once we’re done with it.

Tags: , , ,
Author: Xion, posted under Programming » Comments Off on Working within Temporary Directory

Disjoint Branches in Git

2011-12-30 12:18

Great services like GitHub encourage to share projects and collaborate on them publicly. But not every piece of code feels like it deserves its own repository. Thus it’s quite reasonable to keep a “miscellaneous” repo which collects smaller, often unrelated hacks.

But how to set up such a repository and what structure should it have? Possible options include separate branches or separate folders within single branch. Personally, I prefer the former approach, as it keeps both the commit history and working directory cleaner. It also makes it rather trivial to promote a project into its own repo.
I speak from experience here, since I did exactly this with my repository of presentation slides. So far, it serves me well.

It’s not hard to arrange a new Git repository in such manner. The idea is to keep the master branch either completely empty, or only store common stuff there – such as a README file:

  1. $ git init
  2. $ echo "This is my repo with miscellaneous hacks." > README
  3. $ git add . && git commit -m "Initial commit"

The actual content will be kept in separate branches, with no relation to each other and to the master one. Such entities are sometimes referred to as root branches. We create them as usual – for example via git checkout:

  1. $ git checkout -b foo

However, this is not nearly enough. We don’t want to base the new branch upon the content from master, but we still have it in the working directory. And even if we were to clean it up manually (using a spell such as ls | xargs rm -r to make sure the .git subdirectory is preserved), the removal would have to be registered as a commit in the new branch. Certainly, it would go against our goal to make it independent from master.

But the working copy is just one thing. In order to have truly independent, root branch we also need to disconnect its history from everything else in the repo. Otherwise, any changesets added before the branch was created would carry over and appear in its log.
Fortunately, making the history clear is very easy – although somewhat scary. We need to reach out to internal .git directory and remove the index file:

  1. $ rm .git/index

Don’t worry, this doesn’t touch any actual data, which is mostly inside .git/objects directory. What we removed is a “table of contents” for current branch, making it pristine clear – just like the master right after git init.

As a nice side effect, the whole content of working directory is now unknown to Git. Once we removed the index, every file and directory has became untracked. Now it’s possible to remove all of them in one go using git clean:

  1. $ git clean -xdf

And that’s it. We now have a branch that has nothing in common with rest of the repository. If we need more, we can simply repeat those three steps, starting from a clean working copy (not necessarily from master branch).

Coded4: Time-based Statistics for Git/Hg Repos

2011-12-24 16:23

As the ubiquitous spirit of laziness relaxation permeates the holiday season, a will to do any serious and productive coding seems appallingly low. Instead of fighting it, I went along and tried to write something just for pure fun.

And guess what: it worked pretty well. Well enough, in fact, that the result seems to be worthy of public exposure – which I’m hereby doing. Behold the coded4 project:

  1. $ git clone git@github.com:jquery/jquery
  2. $ coded4 jquery

  1. name                   commits time        
  2. -------------------------------------------
  3. John Resig             1211    5d 09:11:59
  4. jeresig                503     2d 07:50:02
  5. Jörn Zaefferer         309     1d 09:05:53
  6. Brandon Aaron          247     1d 01:34:29
  7. Dave Methvin           235     22:17:27    
  8. jaubourg               221     22:22:31    
  9. timmywil               221     23:00:56    
  10. Ariel Flesler          200     22:10:46    
  11.  
  12. ...

What exactly is this thing? It’s a “time sheet” for the marvelous jQuery project, reconstructed from commit timestamps in its Git repository. coded4 created this result by analyzing repo’s history, grouping changesets by contributors, and running some heuristics to approximate timespans of their coding sessions.

And of course, this can be done for any Git (or Hg) repository. Pretty neat for a mere *pops up shell and types coded4 .* 3 hours of casual coding, eh?

Terminologia rozproszonych systemów kontroli wersji

2011-06-06 19:31

W ostatnich latach rozproszone systemy kontroli wersji (w skrócie DVCS) stały się popularne, zwłaszcza w środowiskach związanych z open source. Warto je jednak znać nie tylko wtedy, gdy pracujemy nad projektami z otwartym źródłem. Jak bowiem pisałem wcześniej, mogą być one przydatne chociażby do małych jednoosobowych projektów. Ponadto bywają nierzadko używane w większych oraz mniejszych firmach. Ogólnie mamy więc duże szanse, aby prędzej lub później zetknąć się z takimi systemami.
Ten post jest adresowany przede wszystkim do osób, które nie miały wcześniej dłuższej styczności rozproszonymi systemami kontroli wersji. Ma on na celu wyjaśnienie wielu spośród tych tajemniczo brzmiących słówek, na jakie nieustannie można się natknąć, gdy pracujemy z DVCS-ami. Istnieje oczywiście szansa, że bardziej zaawansowani użytkownicy również wyciągną nowe wiadomości z lektury (albo chociaż przejrzenia) niniejszej notki. A jeśli nawet nie, to powtórzenie znanej sobie wiedzy na pewno im nie zaszkodzi ;)

Tags: , ,
Author: Xion, posted under Applications, Programming » 3 comments
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.