Feed on
Posts
Comments

I found an excellent and short introductory tutorial pdf on principal components analysis (PCA). It provides a good overview of the following concepts in a particularly intuitive manner:

1) Mean Average
2) Standard Deviation
3) Variance
4) Co-Variance
5) Matrix transformations
6) Eigenvectors & EigenValues
7) Principal Component Analysis

Unfortunately I found the eigenvectors bit a bit heavy going. Luckily the wikipedia page for eigenvectors has a fantastic illustration on the right that gave me an instant feel of what was happening.

I program in Python, Javascript and Factor on a roughly daily basis. My experience is that I can write functions/methods quicker in Python and Javascript than I can in Factor, but that my Factor code ends up being of considerably higher quality. By higher quality I mean that it’s better factored and easier to pull apart and change. In this post I’m making the claim that factor forces me to write better code, and I’m going to illustrate this with an example.

(I also use perl, ruby, scheme and java, but not nearly as often)

I’ve recently been writing a trading simulator in my spare time so that I can test my trading ideas on historical data. As part of this project I’ve written some of the same functionality in both javascript and factor and this experience gave me a good basis from which to compare the languages.

The example I’m going to use to illustrate the comparison is: Coding a simple moving average (SMA) function.

A simple moving average involves stepping along an array of numbers, generating at each step the average (mean) of the last p elements of the sequence (where p is the period). The output of the function is the sequence/array of averages.

E.g. an sma with period 4 on a six element array:

sma([0,1,2,3,4,5],4) => [0,0,0,1.5,2.5,3.5]

(I padded the start of the array with zeros in the javascript version)

For the javascript implementation I built SMA as two nested ‘for’ loops, with the inner loop summing the last n elements at each turn. This isn’t the most efficient way of computing a moving average, but it is what I thought of and implemented first:


function sma (arr,period) {
    var out = [];
    // fill initial space with zeros
    for(var i=0;i<period -1;i++) { out.push(0);}
    // fill rest with averages
    for (var i=period-1; i<arr.length;i++) {
        var sum = 0;
        for (var j=i-(period-1); j<=i; j++){
           sum += arr[j];
        }
        out.push(sum / period) ;
    }
    return out;
}

When I went to code the Factor version the idea of coding up nested loops made my head hurt. Factor’s stack based approach effectively means serial access to state - you have to shuffle the right variables into the right order at the right time. This makes it is very hard to write functions that manage more than ~3-4 variables at a time.

Javascript by comparison has random access to local variables* and my javascript version uses: ‘arr’, ‘out’, ‘i’, ‘j’, ‘period’, ’sum’, not to mention a bunch of unnamed temporaries like ‘length’ arr[j], ‘period-1′ etc…

Shuffling all these variables manually on a stack while mentally keeping tabs on the order and position of each variable is a pretty tough challenge. I suspect the resultant code would be the sort of thing only a compiler could love.

So faced with this problem I used my traditional factor problem-hammer, which is to step away from the screen, walk around a bit and ask myself the question: ‘What abstraction could there be that would make this easier?’.

I came up with ‘map-window’ which implements a sliding window across the input sequence and applies a block of code to each subset in turn. The code to implement the moving average is then:


[ mean ] map-window

Which is clearly a much cleaner implementation of SMA.

Before I continue I should mention that I could also have written the map-window abstraction in javascript (javascript has good higher order function facilities), but the point of this post is that factor *forced* me to come up with the approach.

Once I’d had the ‘map-window’ idea I could easily see how to compute the moving average. I also had an idea of how I could build map-window using ‘head’ and ‘tail’, or at least I had enough of an idea to motivate my trying it.

Ok, so here’s my full implementation for comparison with the javascript:


: window ( seq start window-width -- subseq )
    [ 1+ head ] dip short tail* ; 

: map-i ( seq quot: ( seq i -- elt ) -- seq' )
    [ dup length ] dip with map ; inline

: map-window ( seq window-width quot -- seq )
    '[ _ window @ ] map-i ; inline

: sma ( seq period -- seq' )
    [ mean ] map-window ;

To my eyes the factor implementation is quite a bit more complex than the javascript one, at least consumed in its entirety. This might be because the concept of a for-loop is deeply engrained in my brain whereas the Factor implementation invents both map-i and map-window to build sma.

However the individual parts of the factor implementation are both generic and composable, and once you know what each bit does the whole thing pretty elegantly describes itself.

A big advantage to all this abstraction is that when you discover an implementation pattern occuring more than once, the chances are that the pattern is already factored out to some extent and is ripe for reuse with very little modification. I find this makes refactoring quicker and easier than with python and keeps the codebase relatively lean. This in turn means that the codebase doesn’t drag as much as it gets bigger. The tradeoff is that I spend more time upfront finding and creating abstractions in the first place.

Of course if the right abstractions already exist then coding performance is improved dramatically. e.g. if map-window had already existied then sma would have been a slam dunk. I’d assume that as the factor library improves the likelyhood of this happening will increase, maybe at the expense of more time required to learn the core vocabularies. Programming in factor is already more about the libraries than the native language and I’d imagine this trend will continue, especially when you consider that in a lot of cases the libraries implement the core language.

Aside: I was surprised to discover last year that genuinely new and important stack language abstractions like ‘fry’ and the cleave/spread combinators were only just being conceived, despite Factor being quite a few years old and stack languages in general being many decades old. When you consider that very few languages actually ‘invent’ new features this makes Factor quite an interesting language in itself. Also interesting is that apart from a small bootstrapping core, the factor language is actually implemented in libraries meaning that anybody can build and experiment with new language constructs.

Anyway I’m diverging from the subject so I ought to sum up. The takeaway is: Whereas other languages provide the ability to create good abstractions, Factor pretty much forces you to create good abstractions because it is so bloody difficult to write any code without them.

Update: During writing this post I realised that what I’m doing with map-window is actually very similar to an abstraction in the factor library called <clumps> which constructs a virtual array of overlapping subsequences. That’s the nature of factor programming: you keep finding that somebody else has built a similar abstraction to yours and it would have saved you a ton of time if only you’d realised!

* Factor actually has support for efficient lexical local variables via the ‘locals’ vocabulary (library), which is a pretty impressive feat. However I only tend to use this when the problem I’m solving doesn’t factor well (or sometimes temporarily out of desperation when I can’t come up with the right abstraction).

I don’t use python that much any more for my own projects, and so it shouldn’t be any surprise to me that I missed one of the most exciting developments:

Somebody else has done a python refactoring library!

Rope looks tres cool. The problem with bicyclerepairman was that after it did just about enough to allow me to develop code efficiently I lost interest and concentrated more on other projects. At first glance Rope looks considerably more complete and polished than my efforts, which is excellent news for the python community. Hurrah!!

Spread Betting

Over the last few years I’ve read a number of books on stock trading but despite this I never really felt compelled to risk any of my own money on the markets.

Then in October I discovered that spread betting offers a cheap way to test the water on a small budget. Spread betting on markets is treated as gambling by uk laws and so there’s no stamp duty or capital gains tax to pay. Also the competition between firms is enough that pretty much all of them provide £1 per point betting on the major stocks and indices. Competition keeps the spreads narrow too - usually less than 1%.

Finally a lot of the firms offer a period of training where for a few weeks you can put on bets at 10 or 20p a point with guaranteed stops, which means it’s feasible to be putting £3-£6 total risk on each trade, especially for stocks priced under 500p.

I got started with IGIndex, which provides an impressive range of uk stocks including all of the ftse 100 and 250. The online web software is really very good: the trading platform is totally ajax using yahoo YUI libraries. Unfortunately the charting stuff is a java applet, but that appears to be par for the course with spread betting platforms.

I funded the account with £100 of risk capital and started betting. As an employee of an investment bank I need to get clearance for each stock from our compliance dept, but this turned out to be not as onerous as it sounds. Anyway, I had the misfortune of my first trade in october winning big: a £6 short bet on the ftse100 index at 10p a point. I trailed a stop and made £30 before the trade was stopped out. 30% appreciation in capital in one trade!

Of course this was pure fluke but it made me over confident and I quickly lost almost everything over the next 3 weeks: I got down from £130 to about £35. This was exactly what I needed: in my opinion you want to lose big during the training period when your capital exposure is small. During this first month I learnt a bunch of things:

  • Money management and risk management are really important!
  • I suspect this is more important than stock picking skills. Trading is a probabilistic thing and so you have to expect to lose a (maybe large) percentage of trades. For example I currently get stopped out on approximately two thirds of my trades. Apportioning risk capital between trades and using calculated stops ensures that a run of bad trades doesn’t destroy your capital account and allows you to stop and re-consider.

  • I was confusing short term oscillations with trends
  • I did this quite a bit in the first few trades, and the lesson is: always look at longer term graphs for overall patterns. My trades usually last a couple of weeks when they don’t get stopped out. This means I need to look at charts over a year to check for trends.

  • Plan the trade, trade the plan
  • I’ve discovered that trading can be a stressful affair and it’s easy to make silly mistakes in front of the 5 minute charts. I suspect this is because for a beginner like me there’s a big perception of time pressure, and also because you’re always in a position to do something with your stocks which makes it difficult look away.

    To mitigate the background stress and the opportunity for mistakes I’ve found it much better to make plans when the markets are closed (e.g. in the evening or weekend) and not in my lunch break. This is one reason I prefer to trade uk stocks which trade during the day rather than e.g. forex or commodities which are trading all the time.

    For the stocks I have open at a point in time I make notes about what I expect to happen and what to do if it does or doesn’t happen. IG index offers an SMS alert service which I use to alert me of price breaks. The alerts and the written plan keep me from worrying about my stocks while I’m working.

  • record everything
  • This is super important! I keep a spreadsheet of trades, a diary of notes about each entry/exit and screenshots of the intraday charts. This was especially important during the early weeks as I was able to learn from the many beginner mistakes I made. A couple of months later and I’m still making plenty of mistakes that I can learn from and I don’t expect that to change any time soon.

I’m out of the training period at IGIndex now and so am betting £1 a point on uk stocks and I’m currently apportioning a maximum risk of £20 per trade. I’m starting to become more consistent and confident but I’m wary that it’s difficult to tell at this early stage whether this is more luck or skill. If you’re thinking about trying your hand at trading I’d definitely recommend spread betting as a first step before committing any real money to the markets.

Disqus comments

Last night I switched to using the Disqus service for comments on this blog and I must say I’m pretty impressed so far.

The reason for the switch is that I’m just not staying on top of wordpress upgrades and tweaks and so ideally I’d like to replace my current setup with something much simpler, probably a static html blog.

Also I recently got a cheap 64mb vps from vps villiage and I’d like to move my blog to it. Running a static html blog will be much less memory hungry and considerably less vulnerable to attack than running wordpress, php, mysql et al.

I figured that if I outsourced the comments to a dynamic comment provider then the rest of the site can be made static relatively easy. I won’t need to worry about user authentication, spam, openid or any of the other things that don’t quite work perfectly on my blog.

Disqus was the first I tried and so far it’s been pretty amazing: Using the disqus wordpress plugin you can actually import all of your existing wordpress comments into it and it keeps both sides in sync, so comments made through disqus also get added to your wordpress database. Also it looks like it would be pretty easy to back out without losing comments if I find something wrong with Disqus - sweet!

I’m currently building an early release of the webapp database project I’ve been working on in my spare time. The app is predictably written in factor which has a neat deployment mechanism: you create an image file with just the compiled code that you need to run the app and then ship that for each platform.

This is a nice solution, except I found that using the built in ‘fhtml’ templating language in factor requires the whole parser-compiler runtime which ramps up the image footprint of my app from 3MB to about 20MB - D’oh!

So instead I’ve written a very simple web templating language which doesn’t require the factor parser at all:

DEFER: parse-text

: display-variable ( elem -- )
  dup callable? [ call ] [ write ] if ;

: parse-variable ( -- )
  "}" read-until [ get display-variable parse-text ]
                 [ drop "missing closing }" throw ] if ;

: maybe-variable ( -- )
  read1 dup CHAR: { = [ drop parse-variable ] [ write1 parse-text ] if ;

: parse-text ( -- )
  "$" read-until [ write maybe-variable ] [ write ] if ;

: eval-template ( path -- )
  utf8 <file-reader> [ parse-text ] with-input-stream ;

To run it you just call ‘eval-template’ with the path to the template file.

The syntax for variables within the template is similar to that of korn shell scripts: e.g. ‘${MYVAR}’. On processing the template, variable strings are just looked up in the current factor namespace. If the result is a quotation it gets invoked, otherwise the value of the variable is output as a string.

The namespace thing is cool because in factor you can ‘bind’ a hashtable to the top of the namespace stack before invoking eval-template, so this removes the need to pass tables around and makes the lookup code very neat.

You’ll also notice that the template input is bound to stdin using the with-input-stream word prior to processing, which means I also don’t have to pass the template stream around. Similarly the code expects the output stream to be bound so there’s no passing that around either. Dynamic variables are a really neat factor feature and it alludes me as to why they don’t appear in more mainstream languages.

Finally, something I’ve learnt about html templating languages over the last couple of years:

Html templates are best when the html boilerplate is clean and free of control statements (loops, conditionals), but they begin to suck as the page gets sophisticated about its rendering; usually when you hit tables and nested lists.

However the opposite seems to be true of html DSLs in lisp and factor: they’re great when there’s plenty of logic but become overkill when you’re just doing mostly static html, especially header and footer boilerplate.

The best compromise I’ve found is to drive the page from the html template, but then call into real code to render tables, lists and conditional content. This also has the nice side effect of making the template language trivial.

Continuing on from yesterday evening, I had a bit of time tonight in front of the telly so I implemented the rest of the fast-search functionality in factor using the assembler code from the previous post.

The first step was to create the simple bloom filter for sending to the assember code. To keep the assembler fast and simple I’m just setting one bit for each element in the input sequence. Also I hardcoded the size of the filter for simplicity (more later).

Factor has a bit-array type that backs onto a block of memory so writing ‘prepare-filter’ was easy:

: set-bit ( n filter -- ) t -rot set-nth ; inline

: (prepare-filter) ( filter seq -- )
  [ [ 1048576 mod ] dip set-bit ] curry each ; 

: prepare-filter ( seq -- filter )
  1048576 <bit-array> [ (prepare-filter) ] keep ;

I also needed some code to filter out false positives found by the assembler filter. To do this I pre-create a hash from the input set and then pass each result to a ‘collect-element-if-correct’ word:

: seq>hash ( seq -- hash ) [ dup ] H{ } map>assoc ; inline

: check-elt ( ptr i -- v/f )
  alien-unsigned-cell itemshash get at ;

: collect-element-if-correct ( alien i -- )
  tuck check-elt [ 2array , ] [ drop ] if* ;

Incidently, somebody left a comment on my last post saying that the bt (bit-test) instruction was a performance hog when testing directly against memory, so I modified the code to load the relevant bits into a register and bt against that instead. He was right - on the simple test I got a speedup from 170ms to around 105ms! I replaced the single bt instruction from the last post with the following:

  tmpreg "val" operand MOV
  tmpreg 5 SHR                    ! tmp contains dword offset into filter
  tmpreg 2 SHL                    ! clear bottom 2 bits and make it a byte offset
  tmpreg "filter" operand tmpreg [+] MOV   ! tmp contains filter dword
  "val" operand tmpreg BT

which disassembled with gas looks like this:

mov    %edx,%edi
shr    $0x5,%edi
shl    $0x2,%edi
mov    (%eax,%edi,1),%edi
bt     %edx,%edi

I’d actually intended on using bit shifts to eliminate the BT instruction altogether but it seems bitshifts can only take immediate operands in X86 assembler so that means I can’t dynamically shift according to the contents of a register. There’s probably another way - are there any x86 assembler experts out there with hints?

The only other performance affecting variable was the size of the bitarray used as the filter. This was where I was expecting to come unstuck: I sometimes have 1000s of elements in the input set and so would need a pretty big bitarray to keep the number of false positives low enough to get good performance. On the other hand I expected a large bitarray would hammer the cpu caches (as access is pretty random) and kill performance.

As it happens the performance on a 1Kbit filter seems to be about the same as one 4Mbits wide (i.e. half a megabyte). I’m not sure I fully understand why is as I’m pretty sure the L1 data cache on a core2duo is 32kB. Probably all the action takes place in L2 cache.

Anyway, I’m getting ~107ms to check an entire 48MB array (12M elements) for matches against a set of a 2836 elements using a 1Mb filter on a CPU pegged to 800mhz. At 1800mhz it takes about 48ms. For my requirements that pretty much validates the idea and the approach.

This is a continuation of the previous post

Thanks to the commenters who suggested using other datastructures rather than an unordered array. The reason I’m going for linear search is that it allows the searching of an index sorted for other purposes. The fewer indexes I use, the less data has to be in memory and the larger datasets I can work with. In short, fast-enough linear searching means more data in memory.

I’m pretty sure I’m not going to get near the 100ms at 800mhz I was aiming for yesterday. I wrote the assembler code this evening and did some preliminary tests.

The idea is that I load each element in the array, mod the value to the size of the bloom filter and then see if the corresponding bit is set in the filter and escape out if so. X86 assembler has a BT instruction that tests if an individual bit in memory is set. That’s got to be faster than doing all the loading ANDing and bitshifting myself.

Unfortunately the factor X86 assembler doesn’t support the BT instruction since it’s just designed for generating machinecode for the compiler. A quick look at the intel manuals gave me the opcodes I needed to add it though:

: BT ( operand operand -- ) { HEX: 0F HEX: A3 } 2-operand ;

The assembler loop that does the work looks like this in factor’s X86 DSL:

: %idx-matching-filter ( -- )
  "loop" define-label "end" define-label

  "filter" get dup %unbox-byte-array               ! unbox input params
  "ptr" get dup %unbox-alien
  "i" operand %untag-fixnum
  "val" operand %untag-fixnum        ! val contains length at this point

  ds-reg [] "val" operand MOV        ! free up val register for tmp use

"loop" resolve-label
  "val" operand "ptr" operand "i" operand [+] MOV
  "val" operand 1023 AND   ! val mod 1024
  "val" operand "filter" operand [] BT
  "end" get JB

  "i" operand 4 ADD
  "i" operand ds-reg [] CMP
  "loop" get JNE

"end" resolve-label
   "i" operand %tag-fixnum
;

Notice that it’s full of ‘”foo” operand’ clauses - factor maps these to registers (e.g. EAX, EBX etc..) according to function parameters. This is really handy and in this case saves doing the stack->register mapping manually. For completeness, here’s the gas dissassembly which includes unpacking registers from the stack and unboxing integers:

( scratchpad )  idx-matching-filter disassemble
[Thread debugging using libthread_db enabled]
[New Thread 0xb79dc6c0 (LWP 26068)]
[New Thread 0xb12f1b90 (LWP 26069)]
0xb7f18410 in __kernel_vsyscall ()
Dump of assembler code from 0xb17028e0 to 0xb1702927:
0xb17028e0: mov    $0xb17028e0,%ebx
0xb17028e5: mov    -0xc(%esi),%eax
0xb17028e8: mov    -0x8(%esi),%ecx
0xb17028eb: mov    -0x4(%esi),%edx
0xb17028ee: mov    (%esi),%ebp
0xb17028f0: lea    0x5(%eax),%eax
0xb17028f3: mov    0x9(%ecx),%ecx
0xb17028f6: sar    $0x3,%edx
0xb17028f9: sar    $0x3,%ebp
0xb17028fc: mov    %ebp,(%esi)
0xb17028fe: mov    (%ecx,%edx,1),%ebp
0xb1702901: and    $0x3ff,%ebp
0xb1702907: bt     %ebp,(%eax)
0xb170290a: jb     0xb170291b
0xb1702910: add    $0x4,%edx
0xb1702913: cmp    (%esi),%edx
0xb1702915: jne    0xb17028fe
0xb170291b: shl    $0x3,%edx
0xb170291e: mov    %edx,-0xc(%esi)
0xb1702921: sub    $0xc,%esi
0xb1702924: ret

I’ve got to go to bed soon so I just tried the best case of having an empty filter (and thus no matches): I was able to get 171ms for the 48MB array on a processor pegged at 800mhz, which isn’t the sub-100ms I was hoping for but is still pretty good. I’ll maybe experiment with real data and filter sizes tomorrow if I get any spare time in the evening.

I’m currently writing what amounts to an olap database in factor in my spare time, which is a pretty interesting project. Actually spare time is limited now that I have a child but factor is affording me a pretty good productivity compared to python and scheme so it’s probably net-par compared to what I was doing a couple of years ago. I should probably write a bit more about that some time.

Anyway I recently made the bizarro discovery that in the world of factor for ease of deployment it makes more sense to write low level code in assembler rather than C. This is because factor effectively has a built in macro assembler DSL for each of the chipsets it supports, whereas shipping C code requires having a C compiler setup on the target machine which is a pita for windows platforms.
Also X86 is fast becoming the only server processor that matters. Crazy world.

Ideally I’d not have to write any low-level code at all and I’d do it all in factor. To be honest factor is pretty bloody fast, and could conceivably compete well with raw assembler in the not-too-distant future given enough tweaking of the code generator. For the time being though C and assembler rule the performance roost in this environment.

Ok, so the interesting problem I have is:

I have a set of 32bit integers, and I want to search a large unsorted 32bit array accumulating the array indices of values matching those from the set.
To solve this in factor I’ve been putting the input set in a hashtable and then sequentially looking up each element from the array, which algorithmically is pretty optimal O(n+m) but the ever present constant is pretty high. This takes about 4 seconds on a processor pegged at 800mhz for a 48MB array.

Ideally I want this performance down to under 100ms. This is going to be very tight because a 48mb file contains ~12M elements, so if I’m aiming for 100ms that means about 6 cycles per element on an 800mhz chip. Probably a bit too ambitious.

I’m thinking I’ll create a bloom filter with a simple hashing function (like e.g. ‘mod’ ) and iterate the large array in assembler checking the filter for each element and then escaping back to factor to confirm and accumulate matches.

Incidentally, I’ve noticed that my motivation for writing about my programming ideas drops after I’ve implemented them, so this time I thought I’d experiment with using my blog to think through the idea before I attempt it. Can anybody see a better way to solve this problem given the constraints? I’ll post again with the results…

Factor ships with a gui ide, but I prefer to use the command line listener REPL within an emacs buffer for my coding. The advantage is that it’s more tightly integrated with emacs - e.g. I can send blocks of code to the repl with keypresses. The downside is the lack of debugger.

Today I discovered that factor REPLs can be nested, rather like creating subshells in unix, using the ‘listener’ word. This is really handy because it means you can drop into a listener at any point in your code. E.g.

"mydir" [ listener ] with-directory

…opens a nested repl with “mydir” being the current working directory. Exit the repl with ‘bye’ (or ctrl-d) and execution returns to the parent listener in the parent directory.

This trick can be employed from any point so it’s really handy for debugging state-machine style nested code. Here’s a trivial example to illustrate the point:

3 [ listener ] map
  --- starts a new listener ---
drop "hello" bye
  --- starts a new listener ---
bye
  --- starts a new listener ---
drop "goodbye" bye
  --- back to the parent ---
.
==> { "hello" 1 "goodbye" }

Cool.

Grown up shaving

I shave every day and recently I’ve gotten tired of pissing money away on expensive mach3 blades. After dabbling a bit with cheap disposables without much joy I hit the internet for some shaving advice. A bit of research threw up the ‘double edge’ safety razor option and I decided to give it a go, buying a Merkur Hefty Classic (HD) which I understand is a good DE razor for beginners. (voted best safety razor for beginners in some shaving forum or other).

I shave in the shower without a mirror (comfort is more important to me than shave quality) and so was a little tentative after reading about nicks and cuts associated with DE razor shaving. Also my posh shave cream and brush hadn’t arrived yet so I was using baby oil and el-crappo gel-in-a-can.

As it turned out the shave was easy and comfortable which was a bit of a surprise. The heavy razor head and sharp blade means you don’t need to apply pressure, and the top of the razor is rounded so you can vary the cutting angle starting off with it barely touching your stubble and then varying as you get more confident.

So even after one shave I think I can recommend this to other people looking for a decent but much cheaper shave than Mach3. I’m not sure how long the blades last yet but you do get two edges per blade and they can work out really cheap even high-end ones. Maybe I’ll post an update in a few days once I’ve got my cream and brush.

salt

A day’s worth of sodium in a single serve. I need to start bringing food into work.

A colleague in my team at work has a theory regarding the nature of the much-hyped Dan Kaminsky DNS attack (which he’ll be revealing at Black Hat this year). Take it away Rob…

Slava proposes a new syntax for multi-methods in factor which just happens to be a total re-invention of the way normal words work by turning everything generic. As a side-effect we then wind up getting optional static type checks and compiler dispatch-elimination optimisation across word boundaries.

It’s stuff like this that makes factor an exciting language to be programming with. It seems every couple of months we get a new feature or approach which turns everything on its head. Last time it was fry, then cleave combinators.

I regularly question doing my own projects in such a niche language, especially one with a brutal learning curve, but the reality is that factor is streets ahead of anything else I’ve developed with, and still accelerating…

In determining the meaning of tokens used in communication there are two widely used approaches to disambiguate that I’ll charactise as ‘namespacing’ and ‘context’.

When humans communicate amongst themselves they use the context of the communication to narrow down the range of possible meanings of terms used in the exchange, and human language doesn’t employ namespaces at all. On the other hand computer identifier schemes typically use namespaces to prevent term clash, and don’t use context at all.

The mechanisms operate differently:

Namespaces:

  • Every use of the namespaced term refers to the same concept.
    (or at least if it doesn’t this is considered an error)
  • Deterministic

Context:

  • the concept denoted to by the term depends on the context in which it is used
  • Statistical ( is that the right word? )

My thinking is that namespaces work well in a closed environment because coordination overhead is low and deterministic programs are easy to write. Namespaced schemes do however require a management mechanism to ensure that each use of the same term denotes exactly the same thing. This works well if the terms are grounded in the system - e.g. on the www a URL is used to fetch a document, and thus its use as identifier for that document is grounded.

However the semantic web is an open environment with little grounding, which means that holistic term coordination and management isn’t practical. Thus web-scale semantic web systems need to employ some degree of context based disambiguation anyway - i.e. the system can’t globally merge statements together without considering issues of provenance and consistency. I wrote about this issue here and at present this consideration is usually handled manually by the person operating the RDF store or software, but as these systems grow and scale more of these issues will need to be addressed by software.

Note that it is important to distinguish between this and the issue of trust in the content of the communication - here I am purely talking about interpreting the meaning of the communication, specifically measuring term consistency between documents from disconnected sources.

Now if you take this this inevitable use of context at web scale as given, my question is: Could the semantic web bootstrap and scale better with a system that disambiguated entirely based on context and didn’t employ namespaces at all? (i.e. like human language communication).

So I’ve been thinking along the lines of a scheme where literals and bnodes are used in place of URIs in RDF documents. Vocabularies use literal terms in place of URIs, and the combination of terms are used to infer meaning in aggregate.

Non-determinism issues aside this approach does have a central advantage: it removes the coordination and bootsrap overhead associated with use of namespaced identifiers, and particularly with issues peculiar to URIs:

  • artificial namespaces mean there’s little term match serendipity between disconnected uncoordinated clients
  • pre-existing identifier schemes are commonly not valid URIs, making reuse difficult
  • URIs introduce unnecessary term ownership, authority and lifecycle issues
  • Other URI proprietary issues add to cognitive overhead: hash v slash, uri denotes document vs thing it describes

One particular advantage of the literals-in-combination approach is that data can be lifted from existing sources without the requirement to invent and translate identifiers into URI schemes. Currently translation of data into traditional RDF consists of two challenges:

  • converting the structure of the data into a triple graph
  • translating the identifiers into a URI scheme

Whereas the former is a one-shot deal for each data format, the latter frequently requires manual input for each document and is IMO the single biggest hurdle to putting data onto the semantic web.

Of course the downside of the approach is that software consuming the data needs to take a non-deterministic approach to term meaning. There is no globally correct answer to ‘does this term in this document mean the same as this one?’ - instead it is a function of both the context under which the documents were written and of the requirements of the querying client.
Unfortunately I suspect that as people try to get traditional w3c semweb technologies to scale up in web scale environments they’re going to find themselves in the same non-deterministic boat.

I’m experimenting with a literals and bnodes approach in my own software and will post updates to my blog.

I like listening to the talis podcasts because they motivate me to think about semantic-web issues. Unfortunately I usually spend the entire session muttering to myself because I disagree with so much that is said.

The issue for me is that speakers often paint a rosy view of a merged data world where ‘if only’ people would adopt RDF and share their ontologies, systems would be able to communicate and share data. OWL is commonly painted in a broad-brushed way as the mechanism that would then enable semantic web interoperability. I have my doubts - deterministic ontologies get complicated and brittle very quickly.

Here’s a test:

If atom were an RDF format (i.e. same data structure, just in RDF), could OWL realistically be used to allow an RSS1.0 app to interpret atom data?

I wrote this post partly as an advocacy piece and partly to put down a bunch of things I’ve learnt about the factor compiler over the last few weeks. I should point out that I’m no expert in this area and so there are probably inaccuracies and omissions - hopefully Slava or one of the factor gurus will point them out. With that in mind, check this out!:

Factor has an optimising compiler which generates machine code as you type code into the REPL. If you have gdb on your system you can see this in action by firing up factor and using the tools.disassembler vocab:

( scratchpad ) : hello "hello" print ;           ! defines the word hello
( scratchpad ) USING: tools.disassembler ;
( scratchpad ) \ hello disassemble

Using host libthread_db library “/lib/tls/i686/cmov/libthread_db.so.1″.
[Thread debugging using libthread_db enabled]
[New Thread -1213614400 (LWP 25688)]
0xffffe410 in __kernel_vsyscall ()
Dump of assembler code from 0xb0f694b0 to 0xb0f694c7:
0xb0f694b0: mov    $0xb0f694b0,%ecx
0xb0f694b5: mov    0xb0f694e4,%ebx
0xb0f694bb: mov    %ebx,0×4(%esi)
0xb0f694be: add    $0×4,%esi
0xb0f694c1: jmp    0xb123e7d0
…
End of assembler dump.

This is very cool in itself, but for me the real beauty of the factor compiler is the very modular design, composed of small pieces that you can pull apart and tinker with in isolation. This makes the compiler accessible to people both as a learning tool and for those wanting to generate highly optimized code for tight loops.

The three stages of the compiler are

  1. Parsing the code and generating a ‘dataflow’ abstract syntax tree. (Also called ‘IR’ - intermediate representation)
  2. Optimizing the dataflow tree
  3. Generating machine code from the dataflow tree

I’ll dig into each of these steps in order:

Stage 1: Parsing factor code to dataflow IR

The first step parses factor code into a dataflow datastructure. You can run and inspect the results of this yourself using the dataflow word:

USE: inference
[ "hello" print ] dataflow pprint

=> T{
    #push
    T{
        node
        f
        f
        f
        V{ T{ value f “hello” 673850 f } }
        f
        f
        f
        f
        f
        f
        T{
            #call
            T{
                node
                f
                print
                V{ T{ value f “hello” 673850 f } }
                V{ }
                f
                f
                f
                f
                f
                f
                T{
                    #return
                    T{ node f f V{ } f f f f f f f f f }
                }
                f
            }
        }
        f
    }
}

Obviously inspecting this datastructure manually is pretty cumbersome, so fortunately there’s some dataflow inspection functionality in the ‘optimizer.debugger’ vocab. The dataflow>quot word renders the dataflow structure back into quotations (code blocks) that you can print and inspect. I use it here to define some words for dataflow pretty-printing:

USE: optimizer.debugger
: print-dataflow f dataflow>quot pprint nl ;
: print-annotated-dataflow t dataflow>quot pprint nl ;

So now we can turn quotations into dataflow graphs and back again:

[ "hello" print ] dataflow print-dataflow
=> [ “hello” print ]

(N.B. there are already words in the optimizer.debugger vocab for displaying optimized dataflows, but for this post I wanted to be able to print dataflows prior to optimisation)

This also works for pre-defined words using ‘word-dataflow’ :

: print-hello "hello" print ;
USE: generator
\ print-hello word-dataflow print-dataflow
=> [ “hello” print ]

In most cases the output quotation will be the same as the input quotation, however there are a couple of expansions that happen at this stage. The first is that words marked ‘inline’ are inlined directly into the dataflow:

: inlinedword "this" "is" "an" "inlined" "word" ; inline
[ inlinedword ] dataflow print-dataflow
=> [ “this” “is” “an” “inlined” “word” ]

Also any compiler-transforms (macros) are evaluated at this stage.

USE shuffle
[ 1 2 2 ndup ] dataflow print-dataflow      ! ndup is a macro
=> [ 1 2 2 drop 2 drop >r dup r> swap 2 drop >r dup r> swap ]

Stage 2: Dataflow Optimisation

Here’s where the fun starts. You can get a feel for how this stage works by looking at ‘optimizer.factor’. Here it is in its entirety:

! Copyright (C) 2006, 2008 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel namespaces optimizer.backend optimizer.def-use
optimizer.known-words optimizer.math optimizer.control
optimizer.inlining inference.class ;
IN: optimizer

: optimize-1 ( node -- newnode ? )
    [
        H{ } clone class-substitutions set
        H{ } clone literal-substitutions set
        H{ } clone value-substitutions set
        dup compute-def-use
        kill-values
        dup detect-loops
        dup infer-classes
        optimizer-changed off
        optimize-nodes
        optimizer-changed get
    ] with-scope ;

: optimize ( node -- newnode )
    optimize-1 [ optimize ] when ;

‘optimize’ iteratively calls ‘optimize-1′ until nothing changes in the output graph - i.e. that it has reached a fixed point and no more optimizations can be performed. If you dig into the words used by optimize-1 (try executing them individually and inspecting the dataflow result) you’ll find that optimize-1 performs a number of inferences and optimizations:

  • It tracks the types (classes) of stack elements created within the code block
  • It inlines specific generic word implementations (methods) when it can deduce the class instance on the stack
  • It prunes unused literals and flushable words. (this is actually more useful than it sounds, since other optimisations can generate unused code)
  • It performs branch analysis, marking tail calls in loops and pruning branches that can’t be executed
  • It evaluates ‘foldable’ words at compile time if the values of arguments are known
  • It executes any custom inference code attached to words, allowing words to evaluate their results at compile time if inputs are known

Examples:

evaluating foldable words at compile time

‘+’ is a foldable word (see help for ‘+’), so the optimizer evaluates it at compile time if the values of both arguments are known. Here’s the dataflow before and after optimization:

[ 2 3 + ] dataflow print-dataflow            ! before optimization
=> [ 2 3 + ]

[ 2 3 + ] dataflow optimize print-dataflow   ! after optimization
=> [ 5 ]

type inference and inlining generic word implementations

To illustrate this we first create two tuples (classes) with constructors, and a generic word

TUPLE: classa ;
C: <classa> classa
TUPLE: classb ;
C: <classb> classb 

GENERIC: dosomething ( obj -- val )

Now we create an implementation of the generic word specialised for each class:

M: classa dosomething drop "something for class a" ;
M: classb dosomething drop "something for class b" ;

Finally, some code which calls ‘dosomething’ with an instance of ‘classa’ on the stack, before and after optimization:

[ <classa> dosomething ] dataflow print-dataflow            ! before optimization
=> [  classa drop  classa 2 <tuple -boa> dosomething ]

[ <classa> dosomething ] dataflow optimize print-dataflow    ! after optimization
=> [  classa 2 <tuple -boa> drop “something for class a” ]

It’s a little messy because of the inlined tuple creation, but you can see that prior to optimization ‘dosomething’ is a word call in the dataflow, and afterwards the optimizer has inlined the implementation of ‘dosomething’ specialized on ‘classa’. (if you look you can also see an example of pruning literals here, as the first dataflow has resulted in a superflous ‘\ classa drop’).

This is easier to see if I cheat a bit and use factor’s ‘declare’ word, which declares that elements on the top of the stack are instances of specific classes. So this quotation assumes that top stack element before it is called is of type ‘classa’:

[ { classa } declare dosomething ] dataflow optimize print-dataflow
=> [ drop “something for class a” ]

N.B. you wouldn’t normally use ‘declare’ in user code, but it could be really handy for optimizing performance sensitive tight loops where the results of an external word call are known to the programmer but not the compiler.

conditional folding

The compiler optimizes out conditional branches when it can deduce the outcome of the conditional at compile time:

[ 1 0 =  [ "do if true" ] [ "do if false" ] if ] dataflow optimize print-dataflow
=> [ “do if false” ]

1 isn’t equal to 0 so it optimizes this whole block into the contents of the false quotation.
This is a simple example, but it turns out to be really cool in performance sensitive code (e.g. tight loops) because you can use a generic library function whose behaviour depends on a conditional, specialize it with a hardcoded ‘f’ and the compiler will optimize the conditional branch right out of the resulting code. You get the elegance of the generic combinator with the speed of a hand coded loop.

Stage 3: Machine Code Generation

Code generation is implemented by the ‘generate’ word. This iterates through the nodes calling ‘generate-node’ on each.

: generate-nodes ( node -- )
    [ node@ generate-node ] iterate-nodes end-basic-block ;

: generate ( node word label -- )
    [
        init-generate-nodes
        [ generate-nodes ] with-node-iterator
    ] with-generator ;

‘generate-node’ is a generic word with specialized implementations for each type of dataflow node.

As described in this post from Slava’s excellent Factor blog there are a number of dataflow node types, the important ones being:

* #push - push literals on the data stack
* #shuffle - permute the elements of the data or call stack
* #call - call a word
* #label - an inlined recursive block (loop, etc)
* #if - conditional with two child nodes
* #dispatch - jump table with multiple nodes; jumps to the node indexed by a number on the data stack

The generate-node implementations invoke lower level words in the ‘architecture’ vocabulary, which in turn are generic words that write out small pieces of machine code specialized for each CPU architecture.

The machine code generation code is particularly cool and easy to follow because factor has an assembler DSL for each cpu architecture it supports. The assembler words match the commands and registers of the target cpu architecture and evaluate to their corresponding machine code.

You can even try this out in the REPL using ‘make’ to collect the results into an array. I’m on x86 so I load the cpu.x86.assembler vocabulary:

USE: cpu.x86.assembler

[ EAX 35 MOV ] { } make .   ! postfix assembler evaluates to machine code!

=> { 184 35 0 0 0 }

The assembler DSLs make code generation easy to follow because you can see the assembler in the generation code and then check it against the disassembled machine code using the ‘disassemble’ word we used at the start of the post. When following the code it helps to know which registers are used for what purpose. I found this information in assembler files in the factor VM source - I’m an x86 so for me the declares are in the ‘cpu-x86.32.S’ file:

#define ARG0 %eax
#define ARG1 %edx
#define XT_REG %ecx
#define STACK_REG %esp
#define DS_REG %esi
#define RETURN_REG %eax

#define CELL_SIZE 4

#define PUSH_NONVOLATILE 
	push %ebx ; 
	push %ebp

#define POP_NONVOLATILE 
	pop %ebp ; 
	pop %ebx

register CELL ds asm("esi");
register CELL rs asm("edi");

So lets check this against some generated code for a really basic word that just pops the number 42 on the stack:

: myfunc 42 ;
 myfunc disassemble
=>
Using host libthread_db library “/lib/tls/i686/cmov/libthread_db.so.1″.
[Thread debugging using libthread_db enabled]
[New Thread -1213696320 (LWP 32499)]
0xffffe410 in __kernel_vsyscall ()
Dump of assembler code from 0xb136b230 to 0xb136b247:
0xb136b230: mov    $0xb136b230,%ecx
0xb136b235: mov    $0×150,%ebx
0xb136b23a: mov    %ebx,0×4(%esi)
0xb136b23d: add    $0×4,%esi
0xb136b240: ret 

The first assembler line puts the address of the word into the XT_REG, which is %ecx. For some reason the start of each function puts it’s address into this register - not quite sure why.
The second line puts the number 42 into the ebx register. Note that factor uses the first 3 bits of a value (’cell’) to store its type (called a tag - see layouts.h). In this case it’s a fixnum which is 000. 42 shifted left 3 bits is 336, which in hex is 0×150.
The third line puts the number onto the stack, and the forth updates the stack pointer to point to the new top of the stack.

code generation optimizations

Factor has another couple of tricks up its sleeve during the code generation stages:

optimizing shuffle words

The first is that stack shuffle words (e.g. dup, swap, tuck etc..) don’t get translated into machine code. Instead the compiler has a compile time ‘phantom stack’ which records the positions of items in the stack. When it generates the machine code values are accessed from the stack out of order (the runtime stack is after all a random access piece of memory). This makes stack shuffling words and the retain stack effectively ‘free’ within a code block. A #merge node in the dataflow signifies a code boundary (usually before a subroutine call) which causes the compiler to output instructions which synchronise the physical runtime stack with its phantom stack.

word intrinsics

The second trick is word ‘intrinsics’. Word intrinsics are essentially blocks of open-coded assembler that are output in place of a subroutine call. They are associated with the word via a ‘word-property’, which is a nifty feature of factor that allows meta information to be attached to each word. For example the ‘fixnum+fast’ word has intrinsics which you can see using ‘word-prop’:

 fixnum+fast "intrinsics" word-prop pprint
=>
{
    {
        [ “x” operand “y” operand ADD ]
        H{
            { +output+ { “x” } }
            { +input+ { { f “x” } { [ small-tagged? ] “y” } } }
        }
    }
    {
        [ “x” operand “y” operand ADD ]
        H{
            { +output+ { “x” } }
            { +input+ { { f “x” } { f “y” } } }
        }
    }
}

This tells the compiler to inline the x86 ADD instruction instead of making a subroutine call to the implementation of fixnum+fast. You can add assembler intrinsics to existing words with ‘define-intrinsics’; Here’s a description from the help for the define-intrinsics word:

Defines a set of assembly intrinsics for the word. When a call to the word is being compiled, each intrinsic is tested in turn; the first applicable one will be called to generate machine code. If no suitable intrinsic is found, a simple call to the word is compiled instead.

What I particularly like about this feature is that it neatly provides the ability to specialize a highly optimized implementation for a particular hardware set, and then fall back gracefully on other architectures.

That concludes my ad-hoc tour of the factor compiler. I’ve skipped over a number of things and no doubt there are bits I haven’t discovered yet and some inaccuracies, but I hope I’ve supplied enough information to spark interest in this excellent compiler.

As I mentioned in a previous post I got interested in factor as a direct result of tinkering with jonesforth, which takes you through the entire forth bootstrap process starting with raw assembly. I’ve been delighted to find that factor retains a lot of the ‘right-down-to-the-metal’ accessibility of its low level cousin.

I finally got my F5D7132 wireless repeater working. The trick was to ignore all the auto-negotiate ‘one push setup’ rubbish and actually connect to the web interface of the device.
I don’t use windows, so the installation CD was no good to me, but I found some docs from Belkin that the NIC for the device is on IP 192.168.2.254 by default. The trick was to connect my laptop via ethernet to the repeater and configure the laptop nic to be another address on that range

ifconfig eth0 192.168.2.2

Then I was able to point my browser at http://192.168.2.254/ to configure the repeater, adding the right SID and connection details and manually finding the right wireless network - Yay!

Motivated by Tom Moertel’s ‘A coders guide to coffee’, I’ve been experimenting with roasting my own coffee on the cheap. Here’s my equipment bought to date:

- Bodum 5679 C-Mill Electric Coffee Grinder
- Rival popcorn popper (cost me a fiver from ebay).
- Aeropress brewer
- Digital oven Thermometer 100169 E19

N.B. I didn’t start off roasting: a couple of months ago I bought a grinder and started ordering roasted beans from hasbean. If you live in the UK then HasBean comes highly recommended - the coffee has usually been roasted on the morning of dispatch and the beans arrive through the door the next day in a vacuum sealed bag.

I happily made filter coffee for a couple of weeks before Jay at work encouraged me to get an Aeropress. This is a cheap device that allows you to force hot water through the grinds under pressure to create shots. It’s not the same as a 500 quid espresso maker but I’ve been really happy with the quality of brew I get from this contraption; I don’t think you can get a better cup of coffee for the price and it’s a lot better than the filter in my mind.

Anyway, it was only a matter of time before the lure of subverting cheap consumer appliances to roast coffee proved too tempting. I found a wealth of info on the net about roasting coffee in popcorn poppers and bought a cheap popper for a fiver off of ebay. The best guides I found were:

There’s also a guide to modding your popper on engadget, although I’m not convinced this is strictly necessary. Ed Spiegel’s guide has numerous tips to change the roasting speed and profile without modifying the innards simply by varying bean mass, the tilt of the popper, using an extension cable (which adds resistance and slows the roast) and stirring the beans.

Also the Sweet maria’s site has loads of material on coffee roasting and brewing in general. In particular I found this colour chart really helpful. If you live in the US then consider getting your beans from them as people rave about them all over the internet.

I’m certainly no coffee expert, but I really recommend trying fresh coffee you’ve ground yourself. I understand that ground coffee goes stale in a matter of hours, so pretty much anything ground you buy from a supermarket is already well off by the time you brew it regardless of the vacuum packaging it comes in. Roasted whole beans go stale in a couple of weeks apparently, so it’s best to order them from a supplier that indicates the date of roast. In the UK this means getting them from the internet. If you need more persuading, check out ‘A coders guide to coffee‘ for some top quality coffee advocacy.

At this stage the benefit of home roasting for me is mainly that I can roast what I need for the next few days and since we only drink a couple of cups a day that’s not very much. Also the green beans are pretty cheap even for the gourmet stuff and keep well for years. As I get better at roasting I’m hoping I’ll develop the technique and a palette that’ll give me pleasure experimenting with varieties and varying roasts.
I’m not at that level of precision yet, but I’m certainly getting better tasting coffee than I pick up at starbucks.

Yesterday I was musing over why it took me so much longer to get going with Factor compared to the other languages I’ve learnt over the years. I came up with this:

Factor’s base language is very fundamental and primitive in nature. Programming it is similar to programming with assembler language: you have to keep track of the computation machinery in your head.

Where factor differs from assembler is that careful combination of primitives allows the level of abstraction to be ramped right up. This means that the real productivity action happens not in the base language but in the additional language components built on it in the extensive packaged libraries. Unfortunately until you’re familiar with these libraries you’re stuck programming at a nut-and-bolts level of abstraction.

Contrast this to your typical applicative language, maybe python or ruby: here the core language includes some useful abstractions which allow you to be relatively productive without having up-front intimate knowledge of the libraries. A couple of hours learning the ideas and syntax is enough to get you up and running and feeling like you’re progressing.

So the result is: Factor’s initial learning curve is brutally steep (unless maybe if you’re already proficient in stack languages). Getting to productivity takes time and study, and unfortunately probably more than you’re prepared to give up front if you’re just checking out the language.

Older Posts »

generic acomplia purchase cialis overnight delivery cheap acomplia online buy generic clomid buy cialis low price viagra without prescription where to buy cialis lowest price levitra where to buy propecia cheap cialis from canada lasix no prescription viagra without rx cheap accutane tablets viagra online without prescription viagra no rx buying cialis online zithromax viagra in uk free cialis cialis us where to buy acomplia find cialis online buy viagra lowest price accutane prescription buy cheap accutane online cialis buy buy generic cialis online acomplia order propecia online lowest price synthroid synthroid without a prescription synthroid online buy propecia online cheap levitra online where to buy levitra cialis online review synthroid prices cialis generic cialis buy drug buy viagra on line viagra pharmacy cialis for order price of levitra zithromax online where to buy synthroid soma generic generic clomid propecia online stores viagra cheap drug cheap generic soma cialis cheap zithromax online cheap order accutane online purchase zithromax online purchase viagra online buy cheap clomid cheap generic propecia zithromax pharmacy online pharmacy cialis cheapest acomplia cost of cialis no prescription viagra free viagra purchase lasix online cialis from india viagra from india order discount cialis soma online stores find no rx cialis cialis no rx required find viagra without prescription approved cialis pharmacy lasix discount