Frequent code checkpointing with git

A nice feature of distributed version control is that you can commit into your repository more often because you aren't impacting others with each commit. Recently I've been taking this to the extreme using git and performing a commit at almost every save. I have an emacs key wired for the job: it saves the buffers and then runs 'git commit -a -m "checkpoint"'. During a coding session I hammer it frequently.
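Outside emacs, the same idea could be sketched as a small shell function. The name checkpoint and the "nothing to checkpoint" message are made up for illustration; only the git commit line comes from the key binding described above.

```shell
# Hypothetical helper: commit every change to tracked files as a throwaway checkpoint.
# Note that 'git commit -a' does not pick up untracked files; those still need 'git add'.
checkpoint() {
    # If nothing has changed since the last commit, skip the commit
    # (git commit would otherwise exit with an error).
    if git diff --quiet HEAD 2>/dev/null; then
        echo "nothing to checkpoint"
        return 0
    fi
    git commit -q -a -m "checkpoint"
}
```

Running checkpoint after each save leaves a trail of commits that all carry the same message, which is exactly what needs rolling up later.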

I've found this approach particularly useful when I'm programming with unfamiliar tools (as I am at the moment with factor). I tend to re-write stuff a lot and occasionally change my mind mid-rewrite, wanting some old code back. Using git in the above manner makes it painless to checkpoint my work continuously and pull back changes from previous revisions when appropriate.
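Pulling old code back might look something like the sketch below. The function name pull_back and its arguments are hypothetical; it simply wraps 'git checkout <revision> -- <path>', which overwrites the file in the working directory with the older version.

```shell
# Hypothetical helper: restore one file as it was N checkpoints ago.
# Usage: pull_back <path> [<commits-back>]   (defaults to 1)
pull_back() {
    path=$1
    back=${2:-1}
    # Copies the old version of the file into the index and working directory.
    # The history itself is untouched; the restored file shows up as a change to commit.
    git checkout -q "HEAD~$back" -- "$path"
}
```

For example, 'pull_back src/parser.factor 3' (an invented path) would bring back the file from three checkpoints earlier.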

However, there's a problem when I want to share my repository with others: nobody else is interested in the hundreds of incremental checkpoint commits with no meaningful commit messages; they want to see commits in functional units which tally with the changelog. To make things more legible for them I need to be able to roll up the checkpoints into functional commits with full commit messages.

Originally I assumed that the best way to do this was to have two branches: one for checkpoint commits whilst developing, and a second public one for containing the larger functional commits.

o--o--o--o--o--o--o--o--o  <--- checkpoint (dev) branch
  /  ___/ ______/
 /  /  __/
|  |  |
v  v  v
o--o--o                    <--- public branch ('proper' commits)

I tried various approaches to merging multiple commits from the checkpoint branch into single commits in the public branch, but couldn't find anything that worked.

I think the main problem with the two-branch idea is that once you've rolled up the commits from one branch into a single commit in the other (e.g. with git merge --squash), the histories no longer match and so the branches don't have a recent common ancestor. This means git is unable to relate the two histories by checksum, and so each subsequent merge results in conflicts that must be resolved by hand.

I found the easiest way to get round this was to just copy the content over from one branch to the other rather than merging it. This could be done either by creating patches or by simply checking out the contents of one into the other with 'git checkout branch .' (i.e. checkout branch <path>). This removes the need to resolve conflicts, but the branches still don't share common ancestors. Ultimately you have to manage the branches separately - for example you have to pull external changesets into each branch individually.

In the end the best way turned out to be to dispense with the second public branch altogether and just operate in one. My method is as follows:

I tag the most recent public commit in the branch, and then perform lots of checkpoint commits as I code. When I'm ready to roll up the checkpoint commits into the next 'proper' commit I go back to the previous public commit with:

% git checkout public

Then, assuming master is the branch I've been working in, I check out the contents of the tip of that branch (i.e. all the checkpointed commits) into the working directory, but without moving HEAD:

% git checkout master .   

Then I move the HEAD of the master branch to this point. I do this by deleting and recreating the branch again:

% git branch -D master
% git checkout -b master

Finally I commit the changes and tag this as the new latest public commit:

% git commit -a
% git tag -f public
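The whole sequence could be wrapped up roughly as below. This is only a sketch of the steps above: the function name rollup and its arguments are invented, and it passes the message with -m instead of opening an editor so that it can run unattended.

```shell
# Hypothetical helper wrapping the roll-up steps.
# Usage: rollup "<commit message>" [<branch>] [<tag>]
# Assumes the working directory is clean (everything checkpointed) before it runs.
rollup() {
    msg=$1
    branch=${2:-master}
    tag=${3:-public}

    git checkout -q "$tag" &&            # go back to the last public commit
    git checkout -q "$branch" -- . &&    # copy the branch tip's files over it
    git branch -q -D "$branch" &&        # drop the old branch and its checkpoints
    git checkout -q -b "$branch" &&      # recreate the branch at the public commit
    git commit -q -a -m "$msg" &&        # one 'proper' commit holding all the changes
    git tag -f "$tag" >/dev/null         # move the public tag forward
}
```

Something like 'rollup "Add the new parser"' would then replace the run of checkpoints with one commit. One caveat, as far as I can tell: 'git checkout master .' only copies files that exist at the tip of master, so a file deleted during the checkpoints is not removed by this step and would need a 'git rm' before the commit.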

And that's it. I haven't been using this technique for long, so there's a good chance that something might trip me up in the future. If anybody can see a problem with this (or a better way) then I'd really appreciate a comment.

Bzr Vs Mercurial (again)

Ok - for me it boils down to: hg's speed vs bzr's renames

Mercurial is much, much quicker. That's not an empirical measurement using a large source tree; it's an anecdotal observation on a tiny one. For example, just typing 'hg' returns in 80ms, while 'bzr' takes more than half a second, and that's just to dump the help text. Actual commits, logs etc. appear to show similar order-of-magnitude differences in time.

This shouldn't matter to me; with a small source repository it's just a small user experience thing, but for some reason it niggles. Actually I can't help thinking this is an easy fix. Maybe I'll crank up the profiler tonight if I have a spare moment.

On the other hand, afaics mercurial doesn't do directory renames. If I move my source directory structure about I lose the per-file history. That sucks, especially for new projects that haven't quite got their file structures sorted yet.

BazaarNG and Mercurial and Git

I've been using BazaarNG (bzr) for bicyclerepairman version control recently, but I've also got a close eye on Mercurial (hg) and Git.

Git is Linus' implementation of a distributed SCM tool (sort of) for managing the decentralized development of the Linux kernel. Mercurial is a project started at much the same time, as Linux development steered away from the commercial BitKeeper.

Here are the differences according to my very limited experience:

  • Both mercurial and git feel more snappy and responsive than bzr. The hg command returns immediately, most operations are O(1) or O(files).
  • Bzr and Mercurial are x-platform and work on windows. Git only works on unix (slowly on cygwin apparently).
  • Bzr and Mercurial implement a single branch in a working directory. For multiple branches, you need multiple copies of the working directory. Git provides multiple branches in the same working directory, and you change between them with 'git checkout <branchname>'.
  • Bzr is python only, Mercurial is python with a bit of C, Git is C only. Git is currently more of a pain to build - no autoconf.
  • Tailor 0.9.21 can convert both to and from bzr and git repositories, but only to mercurial.
  • Bzr is maintained by a commercial company, which always makes me a little wary - does the development community disappear when the company goes bust?