Somebody does a (better?) bicyclerepairman

I don't use python that much any more for my own projects, and so it shouldn't be any surprise to me that I missed one of the most exciting developments:

Somebody else has done a python refactoring library!

Rope looks tres cool. The problem with bicyclerepairman was that after it did just about enough to allow me to develop code efficiently I lost interest and concentrated more on other projects. At first glance Rope looks considerably more complete and polished than my efforts, which is excellent news for the python community. Hurrah!!

Application UIs - automating the CRUD

Pretty much all the 'work' applications I've built in the last couple of years can be split into 2 parts:

1) CRUD: The database, and a UI for manipulating the data in it
2) LOGIC: The actual application functionality (i.e. the reason for the app to exist)

These days I'm finding that the amount of actual logic/functionality I put into each app is pretty small, and so part (1) becomes a correspondingly big piece of each application. This is especially true now that we're getting used to gluing small applications together with HTTP rather than building big pieces. This appears to be a global trend - especially with web2.0 mashup buzz and lots of apps that focus on doing a single thing well.

The interesting thing is that the CRUD part is usually pretty generic. One UI fronting a bunch of CRUD operations on a database isn't that much different to another. Thus it makes sense to factor this out and implement the two parts as separate physical deployments that share the same database.

Breaking the two parts out means you can choose different technologies for each one. This works really well with web applications because you can stitch the separate deployments together into a cohesive-looking site using links and a shared CSS. In my most recent app I used Django for the CRUD UI, and implemented the application logic (which didn't need much of a UI - it was an opaque service) using Tomcat. In fact the actual app functionality was implemented as a single servlet.

So why Django for the CRUD UI? Well mainly because it dynamically creates a pretty sophisticated admin UI for you for free.

Aside: A few years ago I wrote a small webapp that dynamically generated a UI to edit relational database tables. Sort of similar to phpMyAdmin, but it used foreign key (FK) information to work out relationships between entities and gave you hyperlinks and drop-down lists. This was fine, except that RDBs don't provide particularly intuitive mechanisms for dealing with complex relationships (ownership, many-to-many etc.), so you can only get so far using one.
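
The FK trick in that aside can be sketched in a few lines using SQLite's introspection pragmas (PRAGMA table_info and PRAGMA foreign_key_list). This is an illustration under stated assumptions, not the original webapp: `build_form` is a hypothetical helper, the author/book schema is made up, the HTML is bare-bones, and it assumes each foreign key names its target column explicitly.

```python
import sqlite3
from html import escape


def build_form(conn, table):
    """Return a bare-bones HTML insert form for `table`.

    Foreign-key columns become <select> drop-downs listing the rows of the
    referenced table; other non-primary-key columns become text inputs.
    Table names are interpolated into SQL, so only pass trusted names.
    """
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fks = {}
    for _id, _seq, ref_table, from_col, to_col, *_rest in conn.execute(
            f"PRAGMA foreign_key_list({table})"):
        fks[from_col] = (ref_table, to_col)

    parts = [f'<form method="post" action="/{table}">']
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, _type, _notnull, _default, pk in list(
            conn.execute(f"PRAGMA table_info({table})")):
        if pk:
            continue  # let the database assign primary keys
        if name in fks:
            ref_table, to_col = fks[name]
            cur = conn.execute(f"SELECT * FROM {ref_table}")
            cols = [d[0] for d in cur.description]
            key = cols.index(to_col)  # assumes the FK names its target column
            # label each option with the first column that isn't the key
            label = next((i for i in range(len(cols)) if i != key), key)
            options = "".join(
                f'<option value="{escape(str(row[key]))}">'
                f'{escape(str(row[label]))}</option>'
                for row in cur)
            parts.append(f'<label>{name} <select name="{name}">{options}</select></label>')
        else:
            parts.append(f'<label>{name} <input name="{name}"></label>')
    parts.append("</form>")
    return "\n".join(parts)


# Hypothetical example schema: each book references an author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author (name) VALUES ('Ada'), ('Grace');
""")
print(build_form(conn, "book"))
```

Picking the label column is pure guesswork here (the first non-key column), which is exactly the kind of thing the database schema alone can't tell you.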

Django goes further than this by moving the model definition from the database into a configuration file (well, actually a python source file, but you can treat it as a config file). This allows you to augment the basic entities with relationship metadata. Django can then autogenerate tables from the config (or you can wire the model up to existing tables). However, the icing on the cake is that the config also lets you give Django UI hints for how the dynamically-generated admin GUI should look. I think this is a really powerful idea and gives Django an edge over Rails when building simple CRUD apps.

So this leads me to think that there's a killer business application here just waiting to be written: Decent dynamic RDB CRUD GUIs without code.

(yes, I know dabbledb does something like this, but I think relational DBs are key here. Both because they're well understood by 'business' developers and because they're an excellent technology for integrating data with the 'functionality' bit of an application.)

Here's some basic ideas:

  • UI for creating tables and metadata. Metadata stored in the RDB with the tables.
  • Auto generated sophisticated admin gui, a-la Django.
  • Security model that enables access control at a data level. (e.g. think unix file permissions: if I add a row, only people in the appropriate group can edit it)
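
The third idea could look something like the following minimal sketch, assuming a hypothetical invoices table where each row stores its owner and an edit group, plus a memberships table mapping users to groups. All table and function names are made up for illustration; a real system would also want read permissions and an 'other' class like unix has.

```python
import sqlite3

# Hypothetical schema: every data row records its owner and the group
# allowed to edit it, much like a unix file's owner and group.
SCHEMA = """
CREATE TABLE memberships (username TEXT, groupname TEXT);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL,
                       owner TEXT, edit_group TEXT);
"""


def add_row(conn, user, group, amount):
    """Insert an invoice owned by `user` and editable by members of `group`."""
    cur = conn.execute(
        "INSERT INTO invoices (amount, owner, edit_group) VALUES (?, ?, ?)",
        (amount, user, group))
    return cur.lastrowid


def can_edit(conn, user, row_id):
    """True if `user` owns the row or belongs to the row's edit group."""
    row = conn.execute(
        "SELECT owner, edit_group FROM invoices WHERE id = ?", (row_id,)).fetchone()
    if row is None:
        return False
    owner, group = row
    if user == owner:
        return True
    member = conn.execute(
        "SELECT 1 FROM memberships WHERE username = ? AND groupname = ?",
        (user, group)).fetchone()
    return member is not None


def update_amount(conn, user, row_id, amount):
    """Apply an edit only if the permission check passes."""
    if not can_edit(conn, user, row_id):
        raise PermissionError(f"{user} may not edit invoice {row_id}")
    conn.execute("UPDATE invoices SET amount = ? WHERE id = ?", (amount, row_id))
```

Keeping the permission columns in the same database as the data fits the first bullet too: the auto-generated GUI could read them like any other metadata.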

It might make sense to actually implement this with django.

(N.B. I'm not intending to actually write this app - I've got my spare-time hands full with large-scale data aggregation. Hopefully somebody else will do it. LazyWeb?)

All roads lead to lisp?

I started programming seriously when I was at school. I became seduced by the idea of writing games and ended up learning 6502 and then ARM assembler. I sort of skipped the typically British BBC BASIC introduction because I was impatient and had read in my Dad's computer magazines that real games didn't get written in BASIC. Assembler was the real game programmer's language.

A couple of years later I got an x86 PC and learnt 386 assembler language with all the dirty hardware-hacking tricks that it entailed. My mum characterized the process of me programming as hours of typing occasionally punctuated by a flurry of excitement and a small graphic moving across the screen. (moving very smoothly across the screen I'd always point out).

At university I learnt C and C++, and over the first two years transitioned slowly from one to the other. The seductive thing about C was that it would interface easily with assembler, and so I started using it as a prototyping language. Eventually I wasn't writing any assembler any more.

A similar thing happened a couple of years after I left university. I started using python, mainly because I could use it as a prototyping language, and it supported all the nice OO stuff I'd got used to in C++. Eventually I stopped writing stuff in C/C++.

N.B. During this time I used Java a lot at work (after all, I did end up working for BEA/Weblogic), but I never really fell in love with this language: It wasn't as fast as C/C++, and wasn't as productive to program as python. I estimated that I could turn stuff around in python about 3-4 times faster than I could in Java.

Anyway, I've been writing my stuff almost exclusively in python for the last 6 years or so and am starting to look for the next leap. All the previous moves have been ones that involved a large productivity boost: Assembler->C->python. The main reason I didn't properly switch to Ruby is that (despite the nice block syntax) I didn't seem to get a large productivity gain. The amount of code you write appears to be pretty comparable between python and ruby - certainly not the less-than-half you get between python/ruby and java.

So now I'm left with lisp: the 1950s language that all the others are slowly emulating. Most of the features that lisp introduced have now been integrated into the mainstream dynamic languages - dynamic typing/strong typing, garbage collection, conditionals/loops, first class functions, closures. Even continuations are now getting a look-in in python and ruby. However, lispers keep going on about this macros thing - writing code to write code. Now that sounds like something worth investigating...

Bzr Vs Mercurial (again)

Ok - for me it boils down to: hg's speed vs bzr's renames

Mercurial is much, much quicker. That's not an empirical measurement using a large source tree - it's an anecdotal observation on a tiny one. For example, just typing 'hg' returns in 80ms, while 'bzr' takes more than half a second, and that's just to dump the help text. Actual commits, logs etc. appear to show a similar order-of-magnitude difference.
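
For anyone wanting to reproduce the startup comparison a bit more carefully, here's a small timing sketch using only the standard library. It isn't how the numbers above were obtained; `time_command` is a hypothetical helper, and it assumes hg and bzr are on the PATH (it reports them as missing otherwise).

```python
import subprocess
import sys
import time


def time_command(args, runs=5):
    """Return the fastest of `runs` wall-clock timings (in seconds) of `args`.

    Output is discarded. Taking the minimum filters out some of the noise
    from disk caches and other processes.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # A do-nothing Python process gives a baseline for interpreter start-up.
    commands = [[sys.executable, "-c", "pass"], ["hg"], ["bzr"]]
    for cmd in commands:
        try:
            print(f"{cmd[0]}: {time_command(cmd) * 1000:.0f} ms")
        except FileNotFoundError:
            print(f"{cmd[0]}: not installed")
```

The interpreter baseline matters because both tools are (mostly) python, so some of the startup cost is shared. For the profiling step, the standard library's profiler can be run against a script with `python -m cProfile -s cumulative` followed by the script's path.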

This shouldn't matter to me; with a small source repository it's just a small user-experience thing, but for some reason it niggles. Actually I can't help thinking this is an easy fix. Maybe I'll crank up the profiler tonight if I have a spare moment.

On the other hand, afaics mercurial doesn't do directory renames. If I move my source directory structure about I lose the per-file history. That sucks, especially for new projects that haven't quite got their file structures sorted yet.

Solving the Bicyclerepairman ‘you have to save before you query’ problem

BicycleRepairMan operates by searching and modifying python files on the filesystem, and thus has always required that you save your work before you do a query or a refactoring. I've never felt this to be a big deal before, but more recently I've been using its functionality more aggressively within emacs and I've started to see this as a bit of a pain.

Moreover, as BRM (hopefully) develops 'autocomplete' functionality IDEs are going to want to pass partially completed (unsaved) code to BRM. I originally thought this problem could be solved by passing a copy of the unsaved buffer through to BRM, however this proved to be more tricky than I thought - the pymacs python bridge for emacs doesn't cope well with large chunks of unescaped text and even if I fix that I can't expect that other IDEs will be problem free.

The best solution came in the form of emacs 'autosave' files: For those not familiar: emacs periodically saves the contents of unsaved buffers into temporary files (just in case the power goes or something). The filenames are the same as the original filename but prefixed with a # or a dot. All I had to do was make emacs auto-save all the modified buffers prior to the query, and then have BRM load these files if they existed and were newer than the 'real' python ones. I can't think of any reason why this shouldn't work with other IDEs - any ideas?
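
The BRM side of this boils down to a 'prefer the newer autosave file' check. Here's a minimal sketch; it assumes Emacs's default naming for file-visiting buffers (foo.py is auto-saved as #foo.py# in the same directory), and `autosave_path`/`read_latest_source` are hypothetical names rather than BRM's actual functions.

```python
import os
import tempfile


def autosave_path(path):
    """Assumed Emacs default autosave name for a visited file: foo.py -> #foo.py#."""
    directory, name = os.path.split(path)
    return os.path.join(directory, "#" + name + "#")


def read_latest_source(path):
    """Return the text to analyse for `path`.

    If an autosave file exists and is newer than the real file, it holds the
    editor's unsaved changes, so read that instead.
    """
    candidate = autosave_path(path)
    if (os.path.exists(candidate)
            and os.path.getmtime(candidate) > os.path.getmtime(path)):
        path = candidate
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        real = os.path.join(d, "foo.py")
        with open(real, "w", encoding="utf-8") as f:
            f.write("x = 1\n")
        with open(autosave_path(real), "w", encoding="utf-8") as f:
            f.write("x = 2  # unsaved edit\n")
        # Backdate the real file so the autosave copy is clearly newer.
        os.utime(real, (1_000_000, 1_000_000))
        print(read_latest_source(real), end="")
```

The only editor-specific part is the naming function, so supporting another IDE would mean swapping in its own equivalent of `autosave_path`.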

Ruby and Python

I've been playing with Ruby a bit recently. The big question: is the language better than python? For me it comes down to a punch-up between 2 killer features: Ruby's blocks vs Python's whitespace-indent magic.

Ruby's smalltalk style blocks are great - they neatly support looping, closures and resource management idioms in one simple powerful construct. Much better than the 3 or 4 python alternatives required to support them (for, with, def etc..).
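
To make the comparison concrete, here are the three separate Python constructs that cover what a Ruby block does on its own, with the rough Ruby equivalent noted in each comment. The helper names are invented for illustration.

```python
import os
import tempfile


# 1. Looping: Ruby's `items.each { |x| ... }` becomes a for loop.
def total(items):
    result = 0
    for x in items:
        result += x
    return result


# 2. Closures: Ruby's `lambda { |x| x + n }` becomes a nested def (or lambda).
def make_adder(n):
    def add(x):
        return x + n  # `n` is captured from the enclosing scope
    return add


# 3. Resource management: Ruby's `File.open(path) { |f| ... }` becomes `with`.
def first_line(path):
    with open(path) as f:  # the file is closed when the with-block exits
        return f.readline()


if __name__ == "__main__":
    print(total([1, 2, 3]))       # 6
    print(make_adder(10)(5))      # 15
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("hello\nworld\n")
    print(first_line(tmp.name).rstrip())  # hello
    os.remove(tmp.name)
```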

On the other hand, Ruby's superfluous 'end' statements everywhere make me balk. As Tim Bray mentioned on a recent podcast: Python is just right about the whitespace thing. The problem is that I've been spoilt by python's killer indents-separate-blocks feature for too long to want to go back now.

Will I convert in the end? Hmm... dunno.

BazaarNG and Mercurial and Git

I've been using bazaarNG (bzr) for bicyclerepairman version control recently, but I've also got a close eye on Mercurial(hg) and Git.

Git is Linus' implementation of a distributed SCM tool (sort of) for managing the decentralized development of the Linux kernel; Mercurial is a project started at much the same time, when Linux steered away from the commercial BitKeeper.

Here's the differences according to my very limited experience:

  • Both mercurial and git feel more snappy and responsive than bzr. The hg command returns immediately; most operations are O(1) or O(files).
  • Bzr and Mercurial are cross-platform and work on Windows. Git only works on Unix (and slowly on Cygwin, apparently).
  • Bzr and Mercurial implement a single branch in a working directory. For multiple branches, you need multiple copies of the working directory. Git provides multiple branches in the same working directory, and you change between them with 'git checkout <branchname>'.
  • Bzr is python only, Mercurial is python with a bit of C, Git is C only. Git is currently more of a pain to build - no autoconf.
  • Tailor 0.9.21 can convert both to and from bzr and git repositories, but only to mercurial.
  • Bzr is maintained by a commercial company, which always makes me a little wary - does the development community disappear when the company goes bust?

BicycleRepairMan performance tricks: Masking strings and comments in the source

I didn't have a weblog when I originally wrote the bulk of the bicyclerepairman querying and refactoring functionality, which I think is a shame because it meant that the design decisions never really got documented. In an attempt to rectify this I'm (hopefully) going to sling these up onto my weblog now in the hope that they'll be of use/interest to somebody.

Basically, BRM's parsing and querying engine is stateless - it parses the code fresh off the filesystem for each query or refactoring. AFAIK this is in contrast to other refactoring engines (for other languages), which build up a detailed upfront model of the source code and operate on that*. Perhaps the most surprising thing is the speed at which it's able to do this - especially if you've ever used the python compiler package, which parses at a snail's pace.

The key to BRM's query speed is careful leverage of the re regular expression module: BRM text-searches the code first to narrow down the search space before embarking on detailed parsing of the interesting sections. To do this accurately BRM employs a simple but effective technique - it masks out the contents of strings and comments with '*' characters. This means that occurrences of keywords and variable names in strings and comments won't be found by a regex search.

I've pasted the code of the 'maskStringsAndComments()' function below for reference (it's currently part of the bike.parsing.parserutils module).


import re
import string

escapedQuotesRE = re.compile(r"(\\\\|\\\"|\\\')")

stringsAndCommentsRE =  \
      re.compile("(\"\"\".*?\"\"\"|'''.*?'''|\"[^\"]*\"|\'[^\']*\'|#.*?\n)", re.DOTALL)

allchars = string.maketrans("", "")
allcharsExceptNewline = allchars[: allchars.index('\n')]+allchars[allchars.index('\n')+1:]
allcharsExceptNewlineTranstable = string.maketrans(allcharsExceptNewline, '*'*len(allcharsExceptNewline))


# replaces all chars in a string or a comment with * (except newlines).
# this ensures that text searches don't mistake comments for keywords, and that all
# matches are in the same line/column as the original
def maskStringsAndComments(src):
    src = escapedQuotesRE.sub("**", src)
    allstrings = stringsAndCommentsRE.split(src)
    # every odd element is a string or comment
    for i in xrange(1, len(allstrings), 2):
        if allstrings[i].startswith("'''") or allstrings[i].startswith('"""'):
            allstrings[i] = allstrings[i][:3]+ \
                           allstrings[i][3:-3].translate(allcharsExceptNewlineTranstable)+ \
                           allstrings[i][-3:]
        else:
            allstrings[i] = allstrings[i][0]+ \
                           allstrings[i][1:-1].translate(allcharsExceptNewlineTranstable)+ \
                           allstrings[i][-1]

    return "".join(allstrings)
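
The function above is Python 2 code (`string.maketrans` and `xrange` are gone in Python 3). Here's a sketch of a Python 3 port, which is not code from the bike package itself: it keeps the original regexes and function name, but replaces the 256-character translate table with a `re.sub` that masks any character other than a newline, so it also copes with non-ASCII source. The demo at the bottom shows the point of the exercise: a regex search for definitions on the masked text ignores the `def foo` inside the string, and because masking preserves length, match offsets index straight back into the original source.

```python
import re

# Same patterns as the original: escaped backslashes/quotes first, then
# triple-quoted strings, single-quoted strings and comments.
escapedQuotesRE = re.compile(r"(\\\\|\\\"|\\\')")
stringsAndCommentsRE = re.compile(
    r"(\"\"\".*?\"\"\"|'''.*?'''|\"[^\"]*\"|'[^']*'|#.*?\n)", re.DOTALL)


def _mask(text):
    """Replace every character except newlines with '*', preserving length."""
    return re.sub(r"[^\n]", "*", text)


def maskStringsAndComments(src):
    # Escaped quotes become '**' so they can't end a string early.
    src = escapedQuotesRE.sub("**", src)
    # re.split with one capturing group puts each match at an odd index.
    parts = stringsAndCommentsRE.split(src)
    for i in range(1, len(parts), 2):
        s = parts[i]
        if s.startswith(("'''", '"""')):
            parts[i] = s[:3] + _mask(s[3:-3]) + s[-3:]
        else:
            # keep the opening quote/'#' and the closing quote/newline
            parts[i] = s[0] + _mask(s[1:-1]) + s[-1]
    return "".join(parts)


if __name__ == "__main__":
    src = 'x = "def foo"  # call foo\ndef foo(): pass\n'
    masked = maskStringsAndComments(src)
    for m in re.finditer(r"\bdef\s+(\w+)", masked):
        # offsets in `masked` are valid in `src` too
        print(m.group(1), "at line", src.count("\n", 0, m.start()) + 1)  # foo at line 2
```

Like the original, a comment on the final line with no trailing newline is left unmasked, since the comment pattern requires a `\n`.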

* N.B. BRM used to work that way in the early days, but I found that building a model was far too slow (taking minutes), and maintaining the model was cumbersome.

Decentralized Version Control

I've been experimenting with BazaarNG, a decentralized version control system. I find decentralized source control intriguing, mainly because I've been using CVS for so long that it's odd not to have a central coordinating server. Anyway, I've converted the bicyclerepairman CVS tree into a Bazaar branch using Tailor, so you could say I'm sort of committed now.

A benefit of decentralized version control is a low barrier to entry - you don't need to set up and manage a central server, so getting something version controlled is just a case of doing 'bzr init' in the root directory of the project you want versioned. This creates a local branch - note that the working copy is the branch - there's no checking out of a separate copy to develop on. Now, I never got round to creating a Subversion repo for my tagtriples stuff, mainly because the overhead didn't seem worth it for a one-man project. I suspect I would have created a bzr one from the word go, and that would have had a number of advantages.

From an opensource perspective the interesting thing is that everyone effectively has their own branch by default. They can publish this branch on the web by just sticking it somewhere (e.g. using rsync to keep it up to date) and then merge and cherry pick updates from other branches trivially - the powerful decentralized tracking algorithms carefully track the provenance and history of each changeset.

The appealing thing about this for me is the ease with which people can join in project development. With CVS or Subversion the project maintainer must approve somebody and give them write access to the source repository for versioned development to happen. As a developer this usually requires creating some unversioned patches here and there to prove your worth. With Bazaar you just create your own branch from somebody else's public one and away you go - do some versioned changes and then mail the maintainer pointing them at your branch.