Frequent code checkpointing with git

Oct 20 2007

A nice feature of distributed version control is that you can commit into your repository more often, because you aren't impacting others with each commit. Recently I've been taking this to the extreme with git, performing a commit at almost every save. I have an emacs key wired for the job: it saves the buffers and then runs 'git commit -a -m "checkpoint"'. During a coding session I hammer it frequently.
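For anyone not using emacs, the same checkpoint can be sketched as a small shell function. This is only an illustration: the function name checkpoint is hypothetical, and unlike the emacs key it does not save any editor buffers first.

```shell
# Hypothetical helper (the name `checkpoint` is an assumption, not from the post).
checkpoint() {
    # -a stages modified and deleted *tracked* files only; brand new files
    # still need an explicit `git add` before they are picked up.
    # -q keeps the output quiet so it can be run on every save.
    git commit -q -a -m "checkpoint"
}
```

Run it from anywhere inside the working tree. If nothing has changed since the last checkpoint, git commit exits with a non-zero status and no commit is made.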

I've found this approach particularly useful when I'm programming with unfamiliar tools (as I am at the moment with Factor). I tend to rewrite stuff a lot and occasionally change my mind mid-rewrite, wanting some old code back. Using git in the above manner makes it painless to checkpoint my work continuously and pull back changes from previous revisions when appropriate.

However, there's a problem when I want to share my repository with others: nobody else is interested in hundreds of incremental checkpoint commits with no meaningful commit messages; they want to see commits in functional units which tally with the changelog. To make things more legible for them I need to be able to roll up the checkpoints into functional commits with full commit messages.

Originally I assumed that the best way to do this was to have two branches: one for checkpoint commits whilst developing, and a second public one for containing the larger functional commits.

o--o--o--o--o--o--o--o--o  <--- checkpoint (dev) branch
  /  ___/ ______/
 /  /  __/
|  |  |
v  v  v
o--o--o                    <--- public branch ('proper' commits)

I tried various approaches to merging multiple commits from the checkpoint branch into single commits in the public branch, but couldn't find anything that worked.

I think the main problem with the two-branch idea is that once you've rolled up the commits from one branch into a single commit in the other (e.g. with git merge --squash), the new commit records no link back to the commits it was made from, so the two branches never gain a more recent common ancestor. This means git is unable to relate the two histories by checksum, and so each subsequent merge results in conflicts that must be resolved by hand.
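The ancestry problem can be seen in a throwaway repository. The sketch below is an illustration only: the branch names public and dev, the file notes.txt and the commit messages are all made up for the example.

```shell
set -e
# Throwaway repository; every name here is an assumption for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/public   # start on a branch called "public"
git config user.name "Example"
git config user.email "example@example.com"

echo one > notes.txt
git add notes.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Two checkpoint commits on a separate dev branch.
git checkout -q -b dev
echo two >> notes.txt
git commit -q -a -m "checkpoint"
echo three >> notes.txt
git commit -q -a -m "checkpoint"

# Roll them up into one commit on the public branch.
git checkout -q public
git merge -q --squash dev
git commit -q -m "Add lines two and three"

# The squashed commit has a single parent, so the common ancestor
# of the two branches is still the original base commit.
merge_base=$(git merge-base public dev)
echo "base:       $base"
echo "merge-base: $merge_base"
```

Both lines print the same hash: even though the contents of the two branches are now identical, git still treats everything after the base commit as unrelated work on each side.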

I found the easiest way to get round this was to just copy the content over from one branch to the other rather than merging it. This could be done either by creating patches or by simply checking out the contents of one into the other with 'git checkout branch .' (i.e. checkout branch <path>). This removes the need to resolve conflicts, but the branches still don't share common ancestors. Ultimately you have to manage the branches separately - for example you have to pull external changesets into each branch individually.

In the end the best way turned out to be to dispense with the second public branch altogether and just operate in one. My method is as follows:

I tag the most recent public commit in the branch with the tag 'public', and then perform lots of checkpoint commits as I code. When I'm ready to roll up the checkpoint commits into the next 'proper' commit I go back to the previous public commit with:

% git checkout public

Then, assuming master is the current branch, I check out the contents of the HEAD of the branch (i.e. all the checkpointed commits) into the index and working directory, but without moving HEAD, which stays at the public commit:

% git checkout master .

Then I move the HEAD of the master branch to this point. I do this by deleting and recreating the branch again:

% git branch -D master
% git checkout -b master

Finally I commit the changes and tag this as the new latest public commit:

% git commit -a
% git tag -f public
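Put together, the steps above can be tried end to end in a scratch repository. This is a sketch under stated assumptions rather than a polished tool: the file notes.txt, the commit messages and the temporary directory are invented for the example, and -m is used in place of the interactive editor.

```shell
set -e
# Scratch repository; file names and messages are assumptions for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example"
git config user.email "example@example.com"
git config advice.detachedHead false      # silence the detached HEAD hint

# The last 'proper' commit, tagged public.
echo one > notes.txt
git add notes.txt
git commit -q -m "Initial public commit"
git tag public

# A coding session's worth of checkpoints.
echo two >> notes.txt
git commit -q -a -m "checkpoint"
echo three >> notes.txt
git commit -q -a -m "checkpoint"

# 1. Go back to the last public commit (this detaches HEAD).
git checkout -q public
# 2. Copy the contents of master's tip into the index and working
#    directory; HEAD stays at the public commit.
git checkout master .
# 3. Recreate master at the current (public) commit.
git branch -q -D master
git checkout -q -b master
# 4. Commit the rolled-up changes and move the tag.
git commit -q -a -m "Add lines two and three"
git tag -f public

git log --format=%s master
```

The final log shows only the two 'proper' commits. One caveat worth checking in your own repository: in its default mode, 'git checkout master .' only adds and updates files, so a file that was deleted during the checkpoints is still present at step 2 and would need to be removed by hand before committing.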

And that's it. Now, I haven't been using this technique for long, so there's a good chance that something might trip me up in the future. If anybody can see a problem with this (or a better way) then I'd really appreciate a comment.