Somebody does a (better?) bicyclerepairman

I don't use python that much any more for my own projects, and so it shouldn't be any surprise to me that I missed one of the most exciting developments:

Somebody else has done a python refactoring library!

Rope looks tres cool. The problem with bicyclerepairman was that after it did just about enough to allow me to develop code efficiently I lost interest and concentrated more on other projects. At first glance Rope looks considerably more complete and polished than my efforts, which is excellent news for the python community. Hurrah!!

Spread Betting

Over the last few years I’ve read a number of books on stock trading but despite this I never really felt compelled to risk any of my own money on the markets.

Then in October I discovered that spread betting offers a cheap way to test the water on a small budget. Spread betting on markets is treated as gambling by uk laws and so there’s no stamp duty or capital gains tax to pay. Also the competition between firms is enough that pretty much all of them provide £1 per point betting on the major stocks and indices. Competition keeps the spreads narrow too - usually less than 1%.

Finally a lot of the firms offer a period of training where for a few weeks you can put on bets at 10 or 20p a point with guaranteed stops, which means it’s feasible to be putting £3-£6 total risk on each trade, especially for stocks priced under 500p.

I got started with IGIndex, which provides an impressive range of uk stocks including all of the ftse 100 and 250. The online web software is really very good: the trading platform is totally ajax using yahoo YUI libraries. Unfortunately the charting stuff is a java applet, but that appears to be par for the course with spread betting platforms.

I funded the account with £100 of risk capital and started betting. As an employee of an investment bank I need to get clearance for each stock from our compliance dept, but this turned out to be not as onerous as it sounds. Anyway, I had the misfortune of my first trade in october winning big: a £6 short bet on the ftse100 index at 10p a point. I trailed a stop and made £30 before the trade was stopped out. 30% appreciation in capital in one trade!

Of course this was pure fluke but it made me over confident and I quickly lost almost everything over the next 3 weeks: I got down from £130 to about £35. This was exactly what I needed: in my opinion you want to lose big during the training period when your capital exposure is small. During this first month I learnt a bunch of things:

  • Money management and risk management are really important!

    I suspect this is more important than stock picking skills. Trading is a probabilistic thing and so you have to expect to lose a (maybe large) percentage of trades. For example I currently get stopped out on approximately two thirds of my trades. Apportioning risk capital between trades and using calculated stops ensures that a run of bad trades doesn’t destroy your capital account and allows you to stop and re-consider.

  • I was confusing short term oscillations with trends

    I did this quite a bit in the first few trades, and the lesson is: always look at longer term graphs for overall patterns. My trades usually last a couple of weeks when they don’t get stopped out. This means I need to look at charts over a year to check for trends.

  • Plan the trade, trade the plan

    I’ve discovered that trading can be a stressful affair and it’s easy to make silly mistakes in front of the 5 minute charts. I suspect this is because for a beginner like me there’s a big perception of time pressure, and also because you’re always in a position to do something with your stocks which makes it difficult look away.

    To mitigate the background stress and the opportunity for mistakes I’ve found it much better to make plans when the markets are closed (e.g. in the evening or weekend) and not in my lunch break. This is one reason I prefer to trade uk stocks which trade during the day rather than e.g. forex or commodities which are trading all the time.

    For the stocks I have open at a point in time I make notes about what I expect to happen and what to do if it does or doesn’t happen. IG index offers an SMS alert service which I use to alert me of price breaks. The alerts and the written plan keep me from worrying about my stocks while I’m working.

  • record everything

    This is super important! I keep a spreadsheet of trades, a diary of notes about each entry/exit and screenshots of the intraday charts. This was especially important during the early weeks as I was able to learn from the many beginner mistakes I made. A couple of months later and I’m still making plenty of mistakes that I can learn from and I don’t expect that to change any time soon.

I’m out of the training period at IGIndex now and so am betting £1 a point on uk stocks and I’m currently apportioning a maximum risk of £20 per trade. I’m starting to become more consistent and confident but I’m wary that it’s difficult to tell at this early stage whether this is more luck or skill. If you’re thinking about trying your hand at trading I’d definitely recommend spread betting as a first step before committing any real money to the markets.

Disqus comments

Last night I switched to using the Disqus service for comments on this blog and I must say I’m pretty impressed so far.

The reason for the switch is that I’m just not staying on top of wordpress upgrades and tweaks and so ideally I’d like to replace my current setup with something much simpler, probably a static html blog.

Also I recently got a cheap 64mb vps from vps villiage and I’d like to move my blog to it. Running a static html blog will be much less memory hungry and considerably less vulnerable to attack than running wordpress, php, mysql et al.

I figured that if I outsourced the comments to a dynamic comment provider then the rest of the site can be made static relatively easy. I won’t need to worry about user authentication, spam, openid or any of the other things that don’t quite work perfectly on my blog.

Disqus was the first I tried and so far it’s been pretty amazing: Using the disqus wordpress plugin you can actually import all of your existing wordpress comments into it and it keeps both sides in sync, so comments made through disqus also get added to your wordpress database. Also it looks like it would be pretty easy to back out without losing comments if I find something wrong with Disqus - sweet!

Really simple html templating in factor

I'm currently building an early release of the webapp database project I've been working on in my spare time. The app is predictably written in factor which has a neat deployment mechanism: you create an image file with just the compiled code that you need to run the app and then ship that for each platform.

This is a nice solution, except I found that using the built in 'fhtml' templating language in factor requires the whole parser-compiler runtime which ramps up the image footprint of my app from 3MB to about 20MB - D'oh!

So instead I've written a very simple web templating language which doesn't require the factor parser at all:


DEFER: parse-text

: display-variable ( elem -- )
  dup callable? [ call ] [ write ] if ;

: parse-variable ( -- )
  "}" read-until [ get display-variable parse-text ]
                 [ drop "missing closing }" throw ] if ;

: maybe-variable ( -- )
  read1 dup CHAR: { = [ drop parse-variable ] [ write1 parse-text ] if ;

: parse-text ( -- )
  "$" read-until [ write maybe-variable ] [ write ] if ;

: eval-template ( path -- )
  utf8 <file-reader> [ parse-text ] with-input-stream ;

To run it you just call 'eval-template' with the path to the template file.

The syntax for variables within the template is similar to that of korn shell scripts: e.g. '${MYVAR}'. On processing the template, variable strings are just looked up in the current factor namespace. If the result is a quotation it gets invoked, otherwise the value of the variable is output as a string.

The namespace thing is cool because in factor you can 'bind' a hashtable to the top of the namespace stack before invoking eval-template, so this removes the need to pass tables around and makes the lookup code very neat.

You'll also notice that the template input is bound to stdin using the with-input-stream word prior to processing, which means I also don't have to pass the template stream around. Similarly the code expects the output stream to be bound so there's no passing that around either. Dynamic variables are a really neat factor feature and it alludes me as to why they don't appear in more mainstream languages.

Finally, something I've learnt about html templating languages over the last couple of years:

Html templates are best when the html boilerplate is clean and free of control statements (loops, conditionals), but they begin to suck as the page gets sophisticated about its rendering; usually when you hit tables and nested lists.

However the opposite seems to be true of html DSLs in lisp and factor: they're great when there's plenty of logic but become overkill when you're just doing mostly static html, especially header and footer boilerplate.

The best compromise I've found is to drive the page from the html template, but then call into real code to render tables, lists and conditional content. This also has the nice side effect of making the template language trivial.

Searching arrays in X86 assembler with a bloom filter pt 3

Continuing on from yesterday evening, I had a bit of time tonight in front of the telly so I implemented the rest of the fast-search functionality in factor using the assembler code from the previous post.

The first step was to create the simple bloom filter for sending to the assember code. To keep the assembler fast and simple I'm just setting one bit for each element in the input sequence. Also I hardcoded the size of the filter for simplicity (more later).

Factor has a bit-array type that backs onto a block of memory so writing 'prepare-filter' was easy:


: set-bit ( n filter -- ) t -rot set-nth ; inline

: (prepare-filter) ( filter seq -- )
  [ [ 1048576 mod ] dip set-bit ] curry each ; 

: prepare-filter ( seq -- filter )
  1048576 <bit-array> [ (prepare-filter) ] keep ;

I also needed some code to filter out false positives found by the assembler filter. To do this I pre-create a hash from the input set and then pass each result to a 'collect-element-if-correct' word:


: seq>hash ( seq -- hash ) [ dup ] H{ } map>assoc ; inline

: check-elt ( ptr i -- v/f )
  alien-unsigned-cell itemshash get at ;

: collect-element-if-correct ( alien i -- )
  tuck check-elt [ 2array , ] [ drop ] if* ;

Incidently, somebody left a comment on my last post saying that the bt (bit-test) instruction was a performance hog when testing directly against memory, so I modified the code to load the relevant bits into a register and bt against that instead. He was right - on the simple test I got a speedup from 170ms to around 105ms! I replaced the single bt instruction from the last post with the following:


  tmpreg "val" operand MOV
  tmpreg 5 SHR                    ! tmp contains dword offset into filter
  tmpreg 2 SHL                    ! clear bottom 2 bits and make it a byte offset
  tmpreg "filter" operand tmpreg [+] MOV   ! tmp contains filter dword
  "val" operand tmpreg BT

which disassembled with gas looks like this:


mov    %edx,%edi
shr    $0x5,%edi
shl    $0x2,%edi
mov    (%eax,%edi,1),%edi
bt     %edx,%edi

I'd actually intended on using bit shifts to eliminate the BT instruction altogether but it seems bitshifts can only take immediate operands in X86 assembler so that means I can't dynamically shift according to the contents of a register. There's probably another way - are there any x86 assembler experts out there with hints?

The only other performance affecting variable was the size of the bitarray used as the filter. This was where I was expecting to come unstuck: I sometimes have 1000s of elements in the input set and so would need a pretty big bitarray to keep the number of false positives low enough to get good performance. On the other hand I expected a large bitarray would hammer the cpu caches (as access is pretty random) and kill performance.

As it happens the performance on a 1Kbit filter seems to be about the same as one 4Mbits wide (i.e. half a megabyte). I'm not sure I fully understand why is as I'm pretty sure the L1 data cache on a core2duo is 32kB. Probably all the action takes place in L2 cache.

Anyway, I'm getting ~107ms to check an entire 48MB array (12M elements) for matches against a set of a 2836 elements using a 1Mb filter on a CPU pegged to 800mhz. At 1800mhz it takes about 48ms. For my requirements that pretty much validates the idea and the approach.

Searching arrays in X86 assembler with a bloom filter pt 2

This is a continuation of the previous post

Thanks to the commenters who suggested using other datastructures rather than an unordered array. The reason I'm going for linear search is that it allows the searching of an index sorted for other purposes. The fewer indexes I use, the less data has to be in memory and the larger datasets I can work with. In short, fast-enough linear searching means more data in memory.

I'm pretty sure I'm not going to get near the 100ms at 800mhz I was aiming for yesterday. I wrote the assembler code this evening and did some preliminary tests.

The idea is that I load each element in the array, mod the value to the size of the bloom filter and then see if the corresponding bit is set in the filter and escape out if so. X86 assembler has a BT instruction that tests if an individual bit in memory is set. That's got to be faster than doing all the loading ANDing and bitshifting myself.

Unfortunately the factor X86 assembler doesn't support the BT instruction since it's just designed for generating machinecode for the compiler. A quick look at the intel manuals gave me the opcodes I needed to add it though:


: BT ( operand operand -- ) { HEX: 0F HEX: A3 } 2-operand ;

The assembler loop that does the work looks like this in factor's X86 DSL:


: %idx-matching-filter ( -- )
  "loop" define-label "end" define-label

  "filter" get dup %unbox-byte-array               ! unbox input params
  "ptr" get dup %unbox-alien
  "i" operand %untag-fixnum
  "val" operand %untag-fixnum        ! val contains length at this point

  ds-reg [] "val" operand MOV        ! free up val register for tmp use

"loop" resolve-label                      
  "val" operand "ptr" operand "i" operand [+] MOV  
  "val" operand 1023 AND   ! val mod 1024 
  "val" operand "filter" operand [] BT
  "end" get JB

  "i" operand 4 ADD
  "i" operand ds-reg [] CMP
  "loop" get JNE

"end" resolve-label
   "i" operand %tag-fixnum
;

Notice that it's full of '"foo" operand' clauses - factor maps these to registers (e.g. EAX, EBX etc..) according to function parameters. This is really handy and in this case saves doing the stack->register mapping manually. For completeness, here's the gas dissassembly which includes unpacking registers from the stack and unboxing integers:


( scratchpad ) \ idx-matching-filter disassemble
[Thread debugging using libthread_db enabled]
[New Thread 0xb79dc6c0 (LWP 26068)]
[New Thread 0xb12f1b90 (LWP 26069)]
0xb7f18410 in __kernel_vsyscall ()
Dump of assembler code from 0xb17028e0 to 0xb1702927:
0xb17028e0: mov    $0xb17028e0,%ebx
0xb17028e5: mov    -0xc(%esi),%eax
0xb17028e8: mov    -0x8(%esi),%ecx
0xb17028eb: mov    -0x4(%esi),%edx
0xb17028ee: mov    (%esi),%ebp
0xb17028f0: lea    0x5(%eax),%eax
0xb17028f3: mov    0x9(%ecx),%ecx
0xb17028f6: sar    $0x3,%edx
0xb17028f9: sar    $0x3,%ebp
0xb17028fc: mov    %ebp,(%esi)
0xb17028fe: mov    (%ecx,%edx,1),%ebp
0xb1702901: and    $0x3ff,%ebp
0xb1702907: bt     %ebp,(%eax)
0xb170290a: jb     0xb170291b
0xb1702910: add    $0x4,%edx
0xb1702913: cmp    (%esi),%edx
0xb1702915: jne    0xb17028fe
0xb170291b: shl    $0x3,%edx
0xb170291e: mov    %edx,-0xc(%esi)
0xb1702921: sub    $0xc,%esi
0xb1702924: ret    

I've got to go to bed soon so I just tried the best case of having an empty filter (and thus no matches): I was able to get 171ms for the 48MB array on a processor pegged at 800mhz, which isn't the sub-100ms I was hoping for but is still pretty good. I'll maybe experiment with real data and filter sizes tomorrow if I get any spare time in the evening.

Searching arrays in X86 assembler with a bloom filter

I'm currently writing what amounts to an olap database in factor in my spare time, which is a pretty interesting project. Actually spare time is limited now that I have a child but factor is affording me a pretty good productivity compared to python and scheme so it's probably net-par compared to what I was doing a couple of years ago. I should probably write a bit more about that some time.

Anyway I recently made the bizarro discovery that in the world of factor for ease of deployment it makes more sense to write low level code in assembler rather than C. This is because factor effectively has a built in macro assembler DSL for each of the chipsets it supports, whereas shipping C code requires having a C compiler setup on the target machine which is a pita for windows platforms. Also X86 is fast becoming the only server processor that matters. Crazy world.

Ideally I'd not have to write any low-level code at all and I'd do it all in factor. To be honest factor is pretty bloody fast, and could conceivably compete well with raw assembler in the not-too-distant future given enough tweaking of the code generator. For the time being though C and assembler rule the performance roost in this environment.

Ok, so the interesting problem I have is:

I have a set of 32bit integers, and I want to search a large unsorted 32bit array accumulating the array indices of values matching those from the set. To solve this in factor I've been putting the input set in a hashtable and then sequentially looking up each element from the array, which algorithmically is pretty optimal O(n+m) but the ever present constant is pretty high. This takes about 4 seconds on a processor pegged at 800mhz for a 48MB array.

Ideally I want this performance down to under 100ms. This is going to be very tight because a 48mb file contains ~12M elements, so if I'm aiming for 100ms that means about 6 cycles per element on an 800mhz chip. Probably a bit too ambitious.

I'm thinking I'll create a bloom filter with a simple hashing function (like e.g. 'mod' ) and iterate the large array in assembler checking the filter for each element and then escaping back to factor to confirm and accumulate matches.

Incidentally, I've noticed that my motivation for writing about my programming ideas drops after I've implemented them, so this time I thought I'd experiment with using my blog to think through the idea before I attempt it. Can anybody see a better way to solve this problem given the constraints? I'll post again with the results...

Nesting REPLs in factor

Factor ships with a gui ide, but I prefer to use the command line listener REPL within an emacs buffer for my coding. The advantage is that it's more tightly integrated with emacs - e.g. I can send blocks of code to the repl with keypresses. The downside is the lack of debugger.

Today I discovered that factor REPLs can be nested, rather like creating subshells in unix, using the 'listener' word. This is really handy because it means you can drop into a listener at any point in your code. E.g.

"mydir" [ listener ] with-directory

...opens a nested repl with "mydir" being the current working directory. Exit the repl with 'bye' (or ctrl-d) and execution returns to the parent listener in the parent directory.

This trick can be employed from any point so it's really handy for debugging state-machine style nested code. Here's a trivial example to illustrate the point:

3 [ listener ] map
  --- starts a new listener ---
drop "hello" bye
  --- starts a new listener ---
bye
  --- starts a new listener ---
drop "goodbye" bye
  --- back to the parent ---
.
==> { "hello" 1 "goodbye" }

Cool.

Grown up shaving

I shave every day and recently I've gotten tired of pissing money away on expensive mach3 blades. After dabbling a bit with cheap disposables without much joy I hit the internet for some shaving advice. A bit of research threw up the 'double edge' safety razor option and I decided to give it a go, buying a Merkur Hefty Classic (HD) which I understand is a good DE razor for beginners. (voted best safety razor for beginners in some shaving forum or other).

I shave in the shower without a mirror (comfort is more important to me than shave quality) and so was a little tentative after reading about nicks and cuts associated with DE razor shaving. Also my posh shave cream and brush hadn't arrived yet so I was using baby oil and el-crappo gel-in-a-can.

As it turned out the shave was easy and comfortable which was a bit of a surprise. The heavy razor head and sharp blade means you don't need to apply pressure, and the top of the razor is rounded so you can vary the cutting angle starting off with it barely touching your stubble and then varying as you get more confident.

So even after one shave I think I can recommend this to other people looking for a decent but much cheaper shave than Mach3. I'm not sure how long the blades last yet but you do get two edges per blade and they can work out really cheap even high-end ones. Maybe I'll post an update in a few days once I've got my cream and brush.