My recent work on protest has rekindled my latent interest in bicyclerepairman - my (aging) Python refactoring toolkit project. BRM has a number of useful facilities for parsing and querying Python source trees, and I think this functionality is just the sort of thing protest needs for inter-package dependency tracking.

The problem is that these facilities are heavily tangled into the bicyclerepairman codebase and are proving tricky to prise out. The main reason for this is state and singletons.

In the early days BRM worked by constructing a big up-front abstract syntax tree model of the code from which to do its code traversal and manipulations. This central tree was maintained by BRM throughout the refactoring/development session. The 'big-AST' design was informed at the time by the then state-of-the-art refactoring toolkits from the Smalltalk and Java worlds, but the approach turned out to be a dead end: Python is too dynamic to be able to do this sort of up-front code inspection accurately. And building the AST was slooooowww...

Later I migrated the design to a completely different approach which involved dynamically inspecting the code base. This design didn't physically require a big stateful tree to be maintained, but the legacy of long-term state left its hooks in the design well past its sell-by date. Migrating the design over a period of time meant leaving the illusion of a central AST (so that the old code still worked), but now generated on the fly in a 'lazy' fashion. Unfortunately this notion of centralized structure still provided implicit hooks for hidden global state (like a common 'pythonpath').
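To make the 'hidden global state' point concrete, here is a minimal sketch of the kind of thing I mean. It is not BRM's actual code: the function names (set_pythonpath, resolve_module_implicit, resolve_module, imported_names) are made up for illustration, and the only real library used is Python's standard ast module. The first half shows a shared search path that every query silently depends on; the second half passes the path in explicitly and parses source only when asked.

```python
import ast
import os

# "Before" (hypothetical): a module-level search path that queries read implicitly.
_pythonpath = []


def set_pythonpath(paths):
    """Mutate the shared search path; every later query is affected."""
    global _pythonpath
    _pythonpath = list(paths)


def resolve_module_implicit(name):
    """Resolve a dotted module name against the hidden global path."""
    return resolve_module(name, _pythonpath)


# "After": the search path is an explicit argument, so the caller owns the state.
def resolve_module(name, pythonpath):
    """Return the first matching .py file for `name` under `pythonpath`, or None."""
    relative = os.path.join(*name.split(".")) + ".py"
    for root in pythonpath:
        candidate = os.path.join(root, relative)
        if os.path.isfile(candidate):
            return candidate
    return None


def imported_names(source):
    """Parse `source` on demand and list the modules it imports."""
    tree = ast.parse(source)
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return names
```

With the explicit version, two callers can query with different search paths in the same process without treading on each other, and a test can pass in a throwaway path rather than having to reset a global afterwards.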

As an aside: I always find it interesting when I come back to code I wrote a few years previously - if you'd asked me to list the things I'd learnt about programming between then and now I probably wouldn't have been able to come up with anything tangible, but looking at this I can easily spot old habits and subsequent hard-learnt lessons. These days I have a strong urge to break things into small chunks, and that generally means minimising state wherever possible. I think this is informed by my coding at work, where I sit in an infrastructure team in which everybody has their own programming skills and favourite languages. I write my stuff mostly in Python, whereas other team members prefer Ruby, PHP and Perl and occasionally Java. I think having a successful team with diverse language skills without stifling productivity means that one must minimise the amount of maintenance time spent on any component - essentially building applications to be replaced rather than maintained. Thus all our stuff comes in small packages; big web applications and workflows end up being lots of smaller components knitted together with HTTP, RSS and databases.

Anyway, back to the main picture: I spent a few hours yesterday refactoring the central state out of bicyclerepairman. I've been off work recently with food poisoning and was feeling pretty weak, so a hot cup of tea and a braindead refactoring session sounded like a palatable idea. Actually it turned out to be a game of wits lasting many hours: teasing out the global state, propping up the old algorithms with stubs and hacks, then removing them one by one.

Going through this exercise reminded me of a couple of things: 1) Unittests are the only way you can do this sort of thing productively. To go fast you need to make big educated bets about how things work and then be cavalier about changing and simplifying them: a unittest framework tells you when you're wrong, and sooner rather than later.

2) It reminded me why emacs is the king of editors. I've been doubting this a bit recently, but I think this would have taken twice as long without emacs' powerful keyboard macro support. Somebody ought to do an emacs screencast.

Anyway, the code is now clean. Although the external API is the same, internally BRM is a loosely coupled lean machine ripe for further beautification and cleanup. It should be trivial now to use the BRM query and parsing functionality within the protest project. In return, I hope protest will repay bicyclerepairman in the form of testable documentation and a new lease of life.

The moral of the story: I'm now more convinced than ever that singletons and 'central' state are the root of all programming evil. The last 15 years in IT have demonstrated numerous general trends where moving from the central to the decentralized has created a network effect of value - I'm not sure code is any different.