<?xml version="1.0" encoding="utf-8"?>
<!-- generator="wordpress/2.1.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Phil Dawes' Stuff</title>
	<link>http://www.phildawes.net/blog</link>
	<description>Programming, data, web things, ai, stuff like that.</description>
	<pubDate>Mon, 09 Jun 2008 11:58:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1.1</generator>
	<language>en</language>
			<item>
		<title>Multi-Methods in factor</title>
		<link>http://www.phildawes.net/blog/2008/06/09/multi-methods-in-factor/</link>
		<comments>http://www.phildawes.net/blog/2008/06/09/multi-methods-in-factor/#comments</comments>
		<pubDate>Mon, 09 Jun 2008 11:58:52 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[factor]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/06/09/multi-methods-in-factor/</guid>
		<description><![CDATA[Slava proposes a new syntax for multi-methods in factor which just happens to be a total re-invention of the way normal words work by turning everything generic. As a side-effect we then wind up getting optional static type checks and compiler dispatch-elimination optimisation across word boundaries.
It&#8217;s stuff like this that makes factor an exciting language [...]]]></description>
			<content:encoded><![CDATA[<p>Slava proposes a <a href="http://factor-language.blogspot.com/2008/06/syntax-proposal-for-multi-methods.html">new syntax for multi-methods in factor</a> which just happens to be a total re-invention of the way normal words work by turning everything generic. As a side-effect we then wind up getting optional static type checks and compiler dispatch-elimination optimisation across word boundaries.</p>
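<p>For readers who haven&#8217;t met multi-methods: a multi-method picks its implementation using the classes of <em>all</em> of its arguments, not just the first one. The Python sketch below only illustrates that dispatch idea under simple assumptions. It is not Factor&#8217;s proposed syntax or implementation, and the <code>MultiMethod</code> class and the <code>collide</code> example are hypothetical.</p>

```python
from itertools import product


class MultiMethod:
    """A generic function that dispatches on the classes of all its arguments."""

    def __init__(self, name):
        self.name = name
        self.methods = {}  # tuple of classes -> implementation

    def method(self, *types):
        def register(fn):
            self.methods[types] = fn
            return self  # keep the generic name bound to the dispatcher
        return register

    def __call__(self, *args):
        # Walk every combination of the arguments' class hierarchies (MROs).
        # product() varies the rightmost argument fastest, so the leftmost
        # argument's most specific class takes precedence.
        for combo in product(*(type(arg).__mro__ for arg in args)):
            impl = self.methods.get(combo)
            if impl is not None:
                return impl(*args)
        names = ", ".join(type(arg).__name__ for arg in args)
        raise TypeError(f"no method {self.name} for ({names})")


class Shape: ...
class Circle(Shape): ...
class Square(Shape): ...


collide = MultiMethod("collide")


@collide.method(Circle, Circle)
def _(a, b):
    return "circle-circle"


@collide.method(Shape, Shape)
def _(a, b):
    return "shape-shape"


print(collide(Circle(), Circle()))  # circle-circle
print(collide(Circle(), Square()))  # shape-shape
```

<p>Real multi-method systems use more careful specificity rules than this left-to-right search. When the argument classes are known at compile time, a compiler can also resolve the choice ahead of time.</p>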
<p>It&#8217;s stuff like this that makes factor an exciting language to be programming with. It seems every couple of months we get a new feature or approach which turns everything on its head. Last time it was fry, then cleave combinators.</p>
<p>I regularly question doing my own projects in such a niche language, especially one with a brutal learning curve, but the reality is that factor is streets ahead of anything else I&#8217;ve developed with, and still accelerating&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/06/09/multi-methods-in-factor/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Namespacing &#038; Context - ramifications for the semantic web</title>
		<link>http://www.phildawes.net/blog/2008/04/18/namespacing-context-ramifications-for-the-semantic-web/</link>
		<comments>http://www.phildawes.net/blog/2008/04/18/namespacing-context-ramifications-for-the-semantic-web/#comments</comments>
		<pubDate>Fri, 18 Apr 2008 13:42:46 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/04/18/namespacing-context-ramifications-for-the-semantic-web/</guid>
		<description><![CDATA[In determining the meaning of tokens used in communication there are two widely used approaches to disambiguation, which I&#8217;ll characterise as &#8216;namespacing&#8217; and &#8216;context&#8217;.
When humans communicate amongst themselves they use the context of the communication to narrow down the range of possible meanings of terms used in the exchange, and human language doesn&#8217;t employ namespaces [...]]]></description>
			<content:encoded><![CDATA[<p>In determining the meaning of tokens used in communication there are two widely used approaches to disambiguate that I&#8217;ll charactise as &#8216;namespacing&#8217; and &#8216;context&#8217;.</p>
<p>When humans communicate amongst themselves they use the context of the communication to narrow down the range of possible meanings of terms used in the exchange, and human language doesn&#8217;t employ namespaces at all. On the other hand computer identifier schemes typically use namespaces to prevent term clash, and don&#8217;t use context at all.</p>
<p>The mechanisms operate differently:</p>
<p>Namespaces:</p>
<ul>
<li>Every use of the namespaced term refers to the same concept.<br />
(or at least, if it doesn&#8217;t, this is considered an error)</li>
<li>Deterministic</li>
</ul>
<p>Context:</p>
<ul>
<li>The concept denoted by the term depends on the context in which it is used</li>
<li>Statistical (is that the right word?)</li>
</ul>
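<p>To make the contrast concrete, here is a toy Python sketch of the two mechanisms. It is illustrative only. The <code>example.org</code> URIs, the &#8216;jaguar&#8217; senses and the cue-word sets are made up, and simple word overlap stands in for whatever statistical technique a real system would use.</p>

```python
# Namespacing: the full identifier is looked up in one table, so every use of
# the same identifier yields the same concept (deterministic).
NAMESPACED = {
    "http://example.org/zoo#jaguar": "the animal",
    "http://example.org/cars#jaguar": "the car maker",
}


def resolve_namespaced(uri):
    return NAMESPACED[uri]  # an undefined identifier raises KeyError


# Context: a bare term has several candidate senses, and the surrounding words
# decide between them (statistical). The cue words are hypothetical.
SENSES = {
    "jaguar": {
        "the animal": {"cat", "jungle", "prey", "spots"},
        "the car maker": {"engine", "dealer", "saloon", "wheels"},
    }
}


def resolve_in_context(term, context_words):
    """Return the sense whose cue words overlap the context most, plus all scores."""
    scores = {
        sense: len(cues.intersection(context_words))
        for sense, cues in SENSES[term].items()
    }
    # Ties (including no overlap at all) fall back to the first-listed sense;
    # a real system would report its low confidence instead.
    return max(scores, key=scores.get), scores


print(resolve_namespaced("http://example.org/cars#jaguar"))
print(resolve_in_context("jaguar", ["the", "engine", "needs", "new", "wheels"])[0])
```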
<p>My thinking is that namespaces work well in a closed environment because coordination overhead is low and deterministic programs are easy to write. Namespaced schemes do, however, require a management mechanism to ensure that each use of the same term denotes exactly the same thing. This works well if the terms are grounded in the system: e.g. on the WWW a URL is used to fetch a document, and thus its use as an identifier for that document is grounded.</p>
<p>However, the semantic web is an open environment with little grounding, which means that holistic term coordination and management isn&#8217;t practical. Thus web-scale semantic web systems need to employ some degree of context-based disambiguation anyway, i.e. the system can&#8217;t globally merge statements together without considering issues of provenance and consistency. I wrote about this issue <a href="http://www.phildawes.net/blog/2007/12/15/uris-are-syntactically-universal-not-semantically-universal/">here</a>. At present this consideration is usually handled manually by the person operating the RDF store or software, but as these systems grow and scale, more of these issues will need to be addressed by software.</p>
<p>Note that it is important to distinguish between this and the issue of trust in the <em>content</em> of the communication - here I am purely talking about interpreting the <em>meaning</em> of the communication, specifically measuring term consistency between documents from disconnected sources.</p>
<p>Now if you take this inevitable use of context at web scale as a given, my question is: <strong>Could the semantic web bootstrap and scale better with a system that disambiguated <em>entirely</em> based on context and didn&#8217;t employ namespaces at all?</strong> (i.e. like human language communication).</p>
<p>So I&#8217;ve been thinking along the lines of a scheme where literals and bnodes are used in place of URIs in RDF documents. Vocabularies use literal terms in place of URIs, and the combination of terms is used to infer meaning in aggregate.</p>
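<p>Here is a rough sketch of what &#8216;inferring meaning in aggregate&#8217; could look like. The Python below treats each blank node as the set of literal (predicate, object) pairs attached to it, and scores two nodes from different documents by Jaccard overlap. The data, the choice of Jaccard similarity and the threshold are all assumptions for illustration, not a specification of the scheme described in this post.</p>

```python
# Two documents from unconnected sources. Subjects are document-local blank
# nodes; predicates and objects are plain literals rather than URIs.
doc_a = [
    ("_:x", "name", "Alice Example"),
    ("_:x", "email", "alice@example.org"),
    ("_:x", "city", "Leeds"),
]
doc_b = [
    ("_:y", "name", "Alice Example"),
    ("_:y", "email", "alice@example.org"),
    ("_:y", "employer", "Acme"),
]


def description(triples, node):
    """The set of (predicate, object) pairs attached to one blank node."""
    return {(p, o) for s, p, o in triples if s == node}


def similarity(triples_a, node_a, triples_b, node_b):
    """Jaccard overlap of two nodes' descriptions, between 0.0 and 1.0."""
    a = description(triples_a, node_a)
    b = description(triples_b, node_b)
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / len(a.union(b))


def probably_same(triples_a, node_a, triples_b, node_b, threshold):
    # The threshold is chosen by the querying client: there is no globally
    # correct answer to "do these two nodes denote the same thing?".
    return similarity(triples_a, node_a, triples_b, node_b) >= threshold


print(similarity(doc_a, "_:x", doc_b, "_:y"))  # 0.5
```

<p>A cautious client might demand a high threshold, while an exploratory one might accept a lower one. That client-dependence is the price paid for dropping coordinated identifiers.</p>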
<p>Non-determinism issues aside, this approach does have a central advantage: it removes the coordination and bootstrap overhead associated with the use of namespaced identifiers, particularly the issues peculiar to URIs:</p>
<ul>
<li>artificial namespaces mean there&#8217;s little term match serendipity between disconnected uncoordinated clients</li>
<li>pre-existing identifier schemes are commonly not valid URIs, making reuse difficult</li>
<li>URIs introduce unnecessary term ownership, authority and lifecycle issues</li>
<li>Other URI-specific issues add to the cognitive overhead: hash vs slash, and whether a URI denotes a document or the thing it describes</li>
</ul>
<p>One particular advantage of the literals-in-combination approach is that data can be lifted from existing sources without the requirement to invent and translate identifiers into URI schemes. Currently translation of data into traditional RDF consists of two challenges:</p>
<ul>
<li>converting the structure of the data into a triple graph</li>
<li>translating the identifiers into a URI scheme</li>
</ul>
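<p>Here is a minimal Python sketch of the two steps, using a made-up parts record and a hypothetical <code>http://example.org/</code> base. Step 1 is mechanical and can be written once per format. Step 2 is also trivial to code as a naive URI minter, but the URIs it mints won&#8217;t match anyone else&#8217;s, and agreeing on them is the part that needs manual effort.</p>

```python
from urllib.parse import quote

# A record lifted from some existing, non-RDF source (made-up data).
record = {"part_no": "AX-1138", "description": "hex bolt", "supplier": "Acme Fasteners"}


def structure_to_triples(record, node="_:r0"):
    """Step 1: convert the record's structure into triples. With literals and
    blank nodes this is the whole job: field names stay as literals."""
    return [(node, field, value) for field, value in record.items()]


def translate_identifiers(triples, base, key_field):
    """Step 2 (traditional RDF): replace the blank node and the field names
    with URIs. This naive minter only works inside one system; mapping onto
    URIs that other publishers already use is the hard, manual part."""
    key = next(o for _, p, o in triples if p == key_field)
    subject = base + "part/" + quote(key)
    return [(subject, base + "vocab#" + quote(p), o) for _, p, o in triples]


triples = structure_to_triples(record)
for t in translate_identifiers(triples, "http://example.org/", "part_no"):
    print(t)
```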
<p>Whereas the former is a one-shot deal for each data format, the latter frequently requires manual input for each document and is IMO the single biggest hurdle to putting data onto the semantic web.</p>
<p>Of course the downside of the approach is that software consuming the data needs to take a non-deterministic approach to term meaning. There is no globally correct answer to &#8216;does this term in this document mean the same as this one?&#8217; - instead it is a function of both the context under which the documents were written and of the requirements of the querying client.<br />
Unfortunately, I suspect that as people try to get traditional W3C semweb technologies to scale up in web-scale environments, they&#8217;re going to find themselves in the same non-deterministic boat.</p>
<p>I&#8217;m experimenting with a literals and bnodes approach in my own software and will post updates to my blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/04/18/namespacing-context-ramifications-for-the-semantic-web/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How realistic is using OWL for semweb data integration?</title>
		<link>http://www.phildawes.net/blog/2008/04/11/how-realisitic-is-using-owl-for-semweb-data-integration/</link>
		<comments>http://www.phildawes.net/blog/2008/04/11/how-realisitic-is-using-owl-for-semweb-data-integration/#comments</comments>
		<pubDate>Fri, 11 Apr 2008 12:00:51 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/04/11/how-realisitic-is-using-owl-for-semweb-data-integration/</guid>
		<description><![CDATA[I like listening to the talis podcasts because they motivate me to think about semantic-web issues. Unfortunately I usually spend the entire session muttering to myself because I disagree with so much that is said.
The issue for me is that speakers often paint a rosy view of a merged data world where &#8216;if only&#8217; people [...]]]></description>
			<content:encoded><![CDATA[<p>I like listening to the <a href="http://talk.talis.com/">talis podcasts</a> because they motivate me to think about semantic-web issues. Unfortunately I usually spend the entire session muttering to myself because I disagree with so much that is said.</p>
<p>The issue for me is that speakers often paint a rosy view of a merged data world where &#8216;if only&#8217; people would adopt RDF and share their ontologies, systems would be able to communicate and share data. OWL is commonly painted in a broad-brushed way as the mechanism that would then enable semantic web interoperability. I have my doubts - deterministic ontologies get complicated and brittle very quickly.</p>
<p>Here&#8217;s a test:</p>
<p>If Atom were an RDF format (i.e. the same data structure, just expressed in RDF), could OWL realistically be used to allow an RSS 1.0 app to interpret Atom data?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/04/11/how-realisitic-is-using-owl-for-semweb-data-integration/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Digging into Factor&#8217;s compiler</title>
		<link>http://www.phildawes.net/blog/2008/04/08/digging-into-factors-compiler/</link>
		<comments>http://www.phildawes.net/blog/2008/04/08/digging-into-factors-compiler/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 08:27:16 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[factor]]></category>

		<category><![CDATA[compiler]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/04/08/digging-into-factors-compiler/</guid>
		<description><![CDATA[I wrote this post partly as an advocacy piece and partly to put down a bunch of things I&#8217;ve learnt about the factor compiler over the last few weeks. I should point out that I&#8217;m no expert in this area and so there are probably inaccuracies and omissions - hopefully Slava or one of the [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote this post partly as an advocacy piece and partly to put down a bunch of things I&#8217;ve learnt about the factor compiler over the last few weeks. I should point out that I&#8217;m no expert in this area and so there are probably inaccuracies and omissions - hopefully Slava or one of the factor gurus will point them out. With that in mind, check this out!:</p>
<p>Factor has an optimising compiler which generates machine code as you type code into the <a href="http://en.wikipedia.org/wiki/REPL">REPL</a>. If you have gdb on your system you can see this in action by firing up factor and using the tools.disassembler vocab:</p>
<p><code></p>
<pre>
( scratchpad ) : hello "hello" print ;           ! defines the word hello
( scratchpad ) USING: tools.disassembler ;
( scratchpad ) \ hello disassemble
<em>
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread -1213614400 (LWP 25688)]
0xffffe410 in __kernel_vsyscall ()
Dump of assembler code from 0xb0f694b0 to 0xb0f694c7:
0xb0f694b0: mov    $0xb0f694b0,%ecx
0xb0f694b5: mov    0xb0f694e4,%ebx
0xb0f694bb: mov    %ebx,0x4(%esi)
0xb0f694be: add    $0x4,%esi
0xb0f694c1: jmp    0xb123e7d0
&#8230;
End of assembler dump.
</em>
</pre>
<p></code></p>
<p>This is very cool in itself, but for me the real beauty of the factor compiler is the very modular design, composed of small pieces that you can pull apart and tinker with in isolation. This makes the compiler accessible to people both as a learning tool and for those wanting to generate highly optimized code for tight loops. </p>
<p>The three stages of the compiler are:</p>
<ol>
<li>Parsing the code and generating a &#8216;dataflow&#8217; abstract syntax tree. (Also called &#8216;IR&#8217; - intermediate representation)</li>
<li>Optimizing the dataflow tree</li>
<li>Generating machine code from the dataflow tree</li>
</ol>
<p>I&#8217;ll dig into each of these steps in order:</p>
<h3>Stage 1: Parsing factor code to dataflow IR</h3>
<p>The first step parses factor code into a dataflow datastructure. You can run and inspect the results of this yourself using the dataflow word:</p>
<p><code></p>
<pre>
USE: inference
[ "hello" print ] dataflow pprint
<em>
=> T{
    #push
    T{
        node
        f
        f
        f
        V{ T{ value f "hello" 673850 f } }
        f
        f
        f
        f
        f
        f
        T{
            #call
            T{
                node
                f
                print
                V{ T{ value f "hello" 673850 f } }
                V{ }
                f
                f
                f
                f
                f
                f
                T{
                    #return
                    T{ node f f V{ } f f f f f f f f f }
                }
                f
            }
        }
        f
    }
}
</em>
</pre>
<p></code></p>
<p>Obviously inspecting this datastructure manually is pretty cumbersome, so fortunately there&#8217;s some dataflow inspection functionality in the &#8216;optimizer.debugger&#8217; vocab. The dataflow>quot word renders the dataflow structure back into quotations (code blocks) that you can print and inspect. I use it here to define some words for dataflow pretty-printing:</p>
<p><code></p>
<pre>
USE: optimizer.debugger
: print-dataflow f dataflow>quot pprint nl ;
: print-annotated-dataflow t dataflow>quot pprint nl ;
</pre>
<p></code></p>
<p>So now we can turn quotations into dataflow graphs and back again:</p>
<p><code></p>
<pre>
[ "hello" print ] dataflow print-dataflow
<em>=> [ "hello" print ]</em>
</pre>
<p></code></p>
<p>(N.B. there are already words in the optimizer.debugger vocab for displaying optimized dataflows, but for this post I wanted to be able to print dataflows prior to optimisation)</p>
<p>This also works for pre-defined words using &#8216;word-dataflow&#8217;:<br />
<code></p>
<pre>
: print-hello "hello" print ;
USE: generator
\ print-hello word-dataflow print-dataflow
<em>=> [ "hello" print ]</em>
</pre>
<p></code></p>
<p>In most cases the output quotation will be the same as the input quotation; however, there are a couple of expansions that happen at this stage. The first is that words marked &#8216;inline&#8217; are inlined directly into the dataflow:</p>
<p><code></p>
<pre>
: inlinedword "this" "is" "an" "inlined" "word" ; inline
[ inlinedword ] dataflow print-dataflow
<em>=> [ "this" "is" "an" "inlined" "word" ]</em>
</pre>
<p></code></p>
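<p>Conceptually, inline expansion is a rewrite over the quotation: any call to a word marked &#8216;inline&#8217; is replaced by that word&#8217;s definition. The toy Python model below shows just that idea, using a list of tokens and a hypothetical <code>WORDS</code> table. Factor actually does this while building the dataflow graph, not on token lists.</p>

```python
# Hypothetical word table: name -> (definition tokens, marked inline?)
WORDS = {
    "inlinedword": (['"this"', '"is"', '"an"', '"inlined"', '"word"'], True),
    "print-hello": (['"hello"', "print"], False),
}


def expand_inline(quotation, words=WORDS):
    """Replace calls to inline words with their (recursively expanded) bodies.
    Ordinary words stay as calls. No guard against recursive inline words."""
    out = []
    for token in quotation:
        entry = words.get(token)
        if entry is not None and entry[1]:
            out.extend(expand_inline(entry[0], words))
        else:
            out.append(token)
    return out


print(expand_inline(["inlinedword"]))
print(expand_inline(["print-hello"]))  # ['print-hello']: not inline, stays a call
```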
<p>Any compiler transforms (macros) are also evaluated at this stage:</p>
<p><code></p>
<pre>
USE: shuffle
[ 1 2 2 ndup ] dataflow print-dataflow      ! ndup is a macro
<em>=> [ 1 2 2 drop 2 drop >r dup r> swap 2 drop >r dup r> swap ]</em>
</pre>
<p></code></p>
<h3>Stage 2: Dataflow Optimisation</h3>
<p>Here&#8217;s where the fun starts. You can get a feel for how this stage works by looking at &#8216;optimizer.factor&#8217;. Here it is in its entirety:</p>
<p><code></p>
<pre>
! Copyright (C) 2006, 2008 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel namespaces optimizer.backend optimizer.def-use
optimizer.known-words optimizer.math optimizer.control
optimizer.inlining inference.class ;
IN: optimizer

: optimize-1 ( node -- newnode ? )
    [
        H{ } clone class-substitutions set
        H{ } clone literal-substitutions set
        H{ } clone value-substitutions set
        dup compute-def-use
        kill-values
        dup detect-loops
        dup infer-classes
        optimizer-changed off
        optimize-nodes
        optimizer-changed get
    ] with-scope ;

: optimize ( node -- newnode )
    optimize-1 [ optimize ] when ;
</pre>
<p></code></p>
<p>&#8216;optimize&#8217; repeatedly calls &#8216;optimize-1&#8217; until nothing changes in the output graph, i.e. until it has reached a fixed point and no more optimizations can be performed. If you dig into the words used by optimize-1 (try executing them individually and inspecting the dataflow result) you&#8217;ll find that optimize-1 performs a number of inferences and optimizations:</p>
<ul>
<li>It tracks the types (classes) of stack elements created within the code block</li>
<li>It inlines specific generic word implementations (methods) when it can deduce the class instance on the stack</li>
<li>It prunes unused literals and flushable words (this is more useful than it sounds, since other optimisations can generate unused code)
</li>
<li>It performs branch analysis, marking tail calls in loops and pruning branches that can&#8217;t be executed
</li>
<li>It evaluates &#8216;foldable&#8217; words at compile time if the values of arguments are known
</li>
<li>It executes any custom inference code attached to words, allowing words to evaluate their results at compile time if inputs are known</li>
</ul>
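<p>The &#8216;repeat until nothing changes&#8217; structure of <code>optimize</code> is easy to model. The Python sketch below uses two hypothetical peephole passes on token lists (cancelling <code>dup drop</code> and <code>swap swap</code>) rather than Factor&#8217;s real passes. It shows why one round is not always enough: removing one pattern can expose another.</p>

```python
def cancel_pair(first, second):
    """Build a pass that deletes the first adjacent `first second` pair,
    reporting whether it changed anything (like optimizer-changed)."""
    def run(tokens):
        for i in range(len(tokens) - 1):
            if tokens[i] == first and tokens[i + 1] == second:
                return tokens[:i] + tokens[i + 2:], True
        return tokens, False
    return run


def optimize(tokens, passes):
    """Run every pass once per round; repeat until a round changes nothing.
    (Factor's optimize recurses instead of looping; the effect is the same.)"""
    while True:
        changed = False
        for run in passes:
            tokens, did_change = run(tokens)
            changed = changed or did_change
        if not changed:
            return tokens


PASSES = [cancel_pair("dup", "drop"), cancel_pair("swap", "swap")]
print(optimize(["dup", "swap", "dup", "drop", "swap", "drop"], PASSES))  # []
```

<p>Here the first round leaves <code>dup drop</code>, which only the second round can remove; a third round finds nothing to do, so the loop stops.</p>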
<p>Examples:</p>
<h5>evaluating foldable words at compile time</h5>
<p>&#8216;+&#8217; is a foldable word (see help for &#8216;+&#8217;), so the optimizer evaluates it at compile time if the values of both arguments are known. Here&#8217;s the dataflow before and after optimization:<br />
<code></p>
<pre>
[ 2 3 + ] dataflow print-dataflow            ! before optimization
<em>=> [ 2 3 + ]</em>

[ 2 3 + ] dataflow optimize print-dataflow   ! after optimization
<em>=> [ 5 ]</em>
</pre>
<p></code></p>
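<p>One way to picture constant folding is a compile-time stack of known values. When a foldable word&#8217;s inputs are all known literals, the call is replaced by its result. This is a simplified Python model over token lists with a hypothetical <code>FOLDABLE</code> table; it is not how Factor&#8217;s optimizer is written.</p>

```python
import operator

# Hypothetical table of foldable words: pure functions of their inputs.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(tokens):
    """Tokens are int literals or word names. Any int at the end of `out` is a
    value known at compile time, so a foldable word applied to two of them
    can be evaluated now."""
    out = []
    for tok in tokens:
        if (
            tok in FOLDABLE
            and len(out) >= 2
            and isinstance(out[-1], int)
            and isinstance(out[-2], int)
        ):
            y = out.pop()
            x = out.pop()
            out.append(FOLDABLE[tok](x, y))
        else:
            out.append(tok)
    return out


print(fold_constants([2, 3, "+"]))  # [5]
```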
<h5>type inference and inlining generic word implementations</h5>
<p>To illustrate this, we first create two tuple classes with constructors, and a generic word:<br />
<code></p>
<pre>
TUPLE: classa ;
C: &lt;classa> classa
TUPLE: classb ;
C: &lt;classb> classb 

GENERIC: dosomething ( obj -- val )
</pre>
<p></code></p>
<p>Now we create an implementation of the generic word specialised for each class:<br />
<code></p>
<pre>
M: classa dosomething drop "something for class a" ;
M: classb dosomething drop "something for class b" ;
</pre>
<p></code></p>
<p>Finally, some code which calls &#8216;dosomething&#8217; with an instance of &#8216;classa&#8217; on the stack, before and after optimization:<br />
<code></p>
<pre>
[ &lt;classa> dosomething ] dataflow print-dataflow            ! before optimization
<em>=> [ \ classa drop \ classa 2 &lt;tuple-boa> dosomething ]</em>

[ &lt;classa> dosomething ] dataflow optimize print-dataflow    ! after optimization
<em>=> [ \ classa 2 &lt;tuple-boa> drop "something for class a" ]</em>
</pre>
<p></code></p>
<p>It&#8217;s a little messy because of the inlined tuple creation, but you can see that prior to optimization &#8216;dosomething&#8217; is a word call in the dataflow, and afterwards the optimizer has inlined the implementation of &#8216;dosomething&#8217; specialized on &#8216;classa&#8217;. (If you look closely you can also see an example of literal pruning here: the unoptimized dataflow contains a superfluous &#8216;\ classa drop&#8217;, which the optimizer has removed.)</p>
<p>This is easier to see if I cheat a bit and use factor&#8217;s &#8216;declare&#8217; word, which declares that elements on the top of the stack are instances of specific classes. So this quotation assumes that the top stack element is an instance of &#8216;classa&#8217; when it is called:</p>
<p><code></p>
<pre>
[ { classa } declare dosomething ] dataflow optimize print-dataflow
<em>=> [ drop "something for class a" ]</em>
</pre>
<p></code></p>
<p>N.B. you wouldn&#8217;t normally use &#8216;declare&#8217; in user code, but it could be really handy for optimizing performance-sensitive tight loops where the classes of values returned by an external word are known to the programmer but not to the compiler.</p>
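<p>The inlining above boils down to this: if the class of the value on top of the stack is known at compile time, the compiler can perform the generic word&#8217;s dispatch itself and splice in the matching method body. The toy Python model below assumes a hypothetical <code>METHODS</code> table and a <code>("declare", class)</code> token. Unlike Factor&#8217;s <code>declare</code>, that token covers only the top stack item.</p>

```python
# Hypothetical method table mirroring the M: definitions above.
METHODS = {
    "dosomething": {
        "classa": ["drop", '"something for class a"'],
        "classb": ["drop", '"something for class b"'],
    }
}


def inline_known_dispatch(tokens, methods=METHODS):
    """Replace a generic call with the specialised method body whenever the
    class of the top stack item is known at compile time."""
    out = []
    top_class = None  # class of the top stack item, if known
    for tok in tokens:
        if isinstance(tok, tuple) and tok[0] == "declare":
            top_class = tok[1]  # emits no code; only adds knowledge
            continue
        table = methods.get(tok, {})
        if top_class in table:
            out.extend(table[top_class])  # dispatch resolved by the compiler
        else:
            out.append(tok)  # class unknown: keep the runtime dispatch
        top_class = None  # conservatively forget after any other token
    return out


print(inline_known_dispatch([("declare", "classa"), "dosomething"]))
```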
<h5>conditional folding</h5>
<p>The compiler optimizes out conditional branches when it can deduce the outcome of the conditional at compile time:</p>
<p><code></p>
<pre>
[ 1 0 =  [ "do if true" ] [ "do if false" ] if ] dataflow optimize print-dataflow
<em>=> [ "do if false" ]</em>
</pre>
<p></code></p>
<p>1 isn&#8217;t equal to 0, so the optimizer reduces this whole block to the contents of the false quotation.<br />
This is a simple example, but it turns out to be really cool in performance-sensitive code (e.g. tight loops): you can use a generic library function whose behaviour depends on a conditional, specialize it with a hardcoded &#8216;f&#8217;, and the compiler will optimize the conditional branch right out of the resulting code. You get the elegance of the generic combinator with the speed of a hand-coded loop.</p>
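<p>Here is a toy Python model of that folding. It assumes quotations are Python lists and that only ints and booleans count as known values. <code>False</code> plays the role of Factor&#8217;s <code>f</code>, and, as in Factor, every other value (including 0) counts as true. It is a sketch of the idea, not the compiler&#8217;s implementation.</p>

```python
def is_known(value):
    # In this toy model only ints and booleans are compile-time constants.
    return isinstance(value, (bool, int))


def fold_conditionals(tokens):
    """Fold `x y =` on known values, then `flag [ t ] [ f ] if` on a known
    flag, keeping only the branch that can actually run."""
    out = []
    for tok in tokens:
        if tok == "=" and len(out) >= 2 and is_known(out[-1]) and is_known(out[-2]):
            y = out.pop()
            x = out.pop()
            out.append(x == y)
        elif (
            tok == "if"
            and len(out) >= 3
            and is_known(out[-3])
            and isinstance(out[-2], list)
            and isinstance(out[-1], list)
        ):
            false_branch = out.pop()
            true_branch = out.pop()
            flag = out.pop()
            # Only f (False) is false; the chosen branch is folded on its own.
            chosen = false_branch if flag is False else true_branch
            out.extend(fold_conditionals(chosen))
        else:
            out.append(tok)
    return out


code = [1, 0, "=", ['"do if true"'], ['"do if false"'], "if"]
print(fold_conditionals(code))  # ['"do if false"']
```

<p>Specialising a generic routine with a hardcoded flag works the same way. For example, <code>[False, ["slow-path"], ["fast-path"], "if"]</code> folds down to just <code>["fast-path"]</code>.</p>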
<h3>Stage 3: Machine Code Generation</h3>
<p>Code generation is implemented by the &#8216;generate&#8217; word, which iterates through the nodes calling &#8216;generate-node&#8217; on each:</p>
<p><code></p>
<pre>
: generate-nodes ( node -- )
    [ node@ generate-node ] iterate-nodes end-basic-block ;

: generate ( node word label -- )
    [
        init-generate-nodes
        [ generate-nodes ] with-node-iterator
    ] with-generator ;
</pre>
<p></code></p>
<p>&#8216;generate-node&#8217; is a generic word with specialized implementations for each type of dataflow node. </p>
<p>As described in <a href="http://factor-language.blogspot.com/2006/04/look-at-new-compiler-design.html">this post from Slava&#8217;s excellent Factor blog</a> there are a number of dataflow node types, the important ones being:</p>
<blockquote><p>
    * #push - push literals on the data stack<br />
    * #shuffle - permute the elements of the data or call stack<br />
    * #call - call a word<br />
    * #label - an inlined recursive block (loop, etc)<br />
    * #if - conditional with two child nodes<br />
    * #dispatch - jump table with multiple nodes; jumps to the node indexed by a number on the data stack
</p></blockquote>
<p>The generate-node implementations invoke lower-level words in the &#8216;architecture&#8217; vocabulary, which in turn are generic words that write out small pieces of machine code specialized for each CPU architecture.</p>
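<p>This design uses one generic word with a method per node type, each method writing out instructions. That shape can be sketched in Python with <code>functools.singledispatch</code>. The node classes and the textual pseudo-instructions are hypothetical; the real methods call architecture-specific words that emit machine code.</p>

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Push:  # like #push: push literals on the data stack
    literals: list


@dataclass
class Call:  # like #call: call a word
    word: str


@singledispatch
def generate_node(node, emit):
    raise NotImplementedError(f"no code generator for {type(node).__name__}")


@generate_node.register
def _(node: Push, emit):
    for literal in node.literals:
        emit(f"push {literal!r}")


@generate_node.register
def _(node: Call, emit):
    emit(f"call {node.word}")


def generate(nodes):
    """Walk the nodes in order, dispatching on each node's type."""
    code = []
    for node in nodes:
        generate_node(node, code.append)
    return code


print(generate([Push(["hello"]), Call("print")]))
```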
<p>The machine code generation code is particularly cool and easy to follow because factor has an assembler DSL for each CPU architecture it supports. The assembler words match the instructions and registers of the target CPU and evaluate to their corresponding machine code.</p>
<p>You can even try this out in the REPL, using &#8216;make&#8217; to collect the results into an array. I&#8217;m on x86, so I load the cpu.x86.assembler vocabulary:</p>
<p><code></p>
<pre>
USE: cpu.x86.assembler

[ EAX 35 MOV ] { } make .   ! postfix assembler evaluates to machine code!

<em>=> { 184 35 0 0 0 }</em>
</pre>
<p></code></p>
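<p>Those five bytes follow the standard x86 encoding of <code>MOV r32, imm32</code>: opcode 0xB8 plus the register number (EAX is 0), then the 32-bit immediate in little-endian byte order. A small Python sketch of just that one instruction form (not Factor&#8217;s assembler) reproduces them:</p>

```python
# x86 register numbers used by "+rd" opcodes such as B8+rd.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def mov_r32_imm32(reg, imm):
    """Encode MOV reg, imm: one opcode byte, then a little-endian imm32."""
    opcode = 0xB8 + REG32[reg]
    # Reduce modulo 2**32 so negative values encode as two's complement.
    return [opcode] + list((imm % 2**32).to_bytes(4, "little"))


print(mov_r32_imm32("EAX", 35))  # [184, 35, 0, 0, 0]
```

<p>With EBX (register 3) the opcode byte becomes 0xBB. That is the form used when the compiler loads a literal into %ebx.</p>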
<p>The assembler DSLs make code generation easy to follow because you can see the assembler in the generation code and then check it against the disassembled machine code using the &#8216;disassemble&#8217; word we used at the start of the post. When following the code it helps to know which registers are used for what purpose. I found this information in assembler files in the factor VM source; I&#8217;m on x86, so for me the definitions are in the &#8216;cpu-x86.32.S&#8217; file:</p>
<p><code></p>
<pre>
#define ARG0 %eax
#define ARG1 %edx
#define XT_REG %ecx
#define STACK_REG %esp
#define DS_REG %esi
#define RETURN_REG %eax

#define CELL_SIZE 4

#define PUSH_NONVOLATILE \
	push %ebx ; \
	push %ebp

#define POP_NONVOLATILE \
	pop %ebp ; \
	pop %ebx

register CELL ds asm("esi");
register CELL rs asm("edi");
</pre>
<p></code></p>
<p>So let&#8217;s check this against some generated code for a really basic word that just pushes the number 42 onto the stack:</p>
<p><code></p>
<pre>
: myfunc 42 ;
\ myfunc disassemble
<em>=>
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread -1213696320 (LWP 32499)]
0xffffe410 in __kernel_vsyscall ()
Dump of assembler code from 0xb136b230 to 0xb136b247:
0xb136b230: mov    $0xb136b230,%ecx
0xb136b235: mov    $0x150,%ebx
0xb136b23a: mov    %ebx,0x4(%esi)
0xb136b23d: add    $0x4,%esi
0xb136b240: ret </em>
</pre>
<p></code></p>
<p>The first assembler line puts the address of the word into the XT_REG, which is %ecx. For some reason the start of each function puts its address into this register; I&#8217;m not quite sure why.<br />
The second line puts the number 42 into the %ebx register. Note that factor uses the low 3 bits of a value (&#8216;cell&#8217;) to store its type (called a tag; see layouts.h). In this case it&#8217;s a fixnum, whose tag is 000. 42 shifted left by 3 bits is 336, which in hex is 0x150.<br />
The third line stores the value just above the current top of the stack (4 bytes past %esi, the data stack register), and the fourth advances %esi by 4 so that it points to the new top of the stack.</p>
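<p>The tagging arithmetic is simple enough to check in a few lines of Python. This sketch assumes, as described above, 3 tag bits with the fixnum tag 000; the function names are made up for illustration.</p>

```python
TAG_BITS = 3
FIXNUM_TAG = 0b000
TAG_RANGE = 2**TAG_BITS  # 8 possible tags


def tag_fixnum(n):
    """Shift the integer up by TAG_BITS and put the fixnum tag in the low bits."""
    return n * TAG_RANGE + FIXNUM_TAG


def tag_of(cell):
    return cell % TAG_RANGE  # the low 3 bits


def untag_fixnum(cell):
    return cell // TAG_RANGE  # floor division acts as an arithmetic shift right


print(hex(tag_fixnum(42)))  # 0x150
```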
<h4>code generation optimizations</h4>
<p>Factor has a couple of other tricks up its sleeve during the code generation stage:</p>
<h5>optimizing shuffle words</h5>
<p>The first is that stack shuffle words (e.g. dup, swap, tuck etc.) don&#8217;t get translated into machine code. Instead the compiler keeps a compile-time &#8216;phantom stack&#8217; which records the positions of items on the stack. When it generates the machine code, values are accessed from the stack out of order (the runtime stack is, after all, a random-access piece of memory). This makes stack shuffling words and the retain stack effectively &#8216;free&#8217; within a code block. A #merge node in the dataflow signifies a code boundary (usually before a subroutine call), which causes the compiler to output instructions that synchronise the physical runtime stack with its phantom stack.</p>
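<p>Here is a toy Python model of the phantom-stack idea. Shuffle words only rearrange a compile-time list recording which runtime slot holds each item, and code is emitted only when the stacks are synchronised. The slot numbering, the register names and the textual instructions are assumptions for illustration, not Factor&#8217;s code generator.</p>

```python
# Each shuffle word rewrites the phantom stack (top of stack at the end).
SHUFFLES = {
    "dup": lambda s: s + [s[-1]],
    "drop": lambda s: s[:-1],
    "swap": lambda s: s[:-2] + [s[-1], s[-2]],
    "over": lambda s: s + [s[-2]],
}


def sync(phantom, depth):
    """Emit the moves that make the runtime stack match the phantom stack.
    All sources are loaded into registers before any store, so no store
    overwrites a slot that still needs to be read."""
    moves = [(dst, src) for dst, src in enumerate(phantom) if dst != src]
    regs = {}
    code = []
    for _, src in moves:
        if src not in regs:
            regs[src] = f"r{len(regs)}"
            code.append(f"load {regs[src]}, ds[{src}]")
    for dst, src in moves:
        code.append(f"store ds[{dst}], {regs[src]}")
    if len(phantom) != depth:
        code.append(f"adjust ds by {len(phantom) - depth}")
    return code


def compile_shuffles(words, depth):
    """phantom[i] is the runtime slot currently holding item i (0 = bottom)."""
    phantom = list(range(depth))
    for word in words:
        phantom = SHUFFLES[word](phantom)  # compile time only: no code
    return sync(phantom, depth)


print(compile_shuffles(["swap", "swap"], 2))  # []
print(compile_shuffles(["swap"], 2))
```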
<h5>word intrinsics</h5>
<p>The second trick is word &#8216;intrinsics&#8217;. Word intrinsics are essentially blocks of open-coded assembler that are output in place of a subroutine call. They are associated with the word via a &#8216;word-property&#8217;, a nifty feature of factor that allows metadata to be attached to each word. For example the &#8216;fixnum+fast&#8217; word has intrinsics, which you can see using &#8216;word-prop&#8217;:</p>
<p><code></p>
<pre>
\ fixnum+fast "intrinsics" word-prop pprint
<em>=>
{
    {
        [ "x" operand "y" operand ADD ]
        H{
            { +output+ { "x" } }
            { +input+ { { f "x" } { [ small-tagged? ] "y" } } }
        }
    }
    {
        [ "x" operand "y" operand ADD ]
        H{
            { +output+ { "x" } }
            { +input+ { { f "x" } { f "y" } } }
        }
    }
}</em>
</pre>
<p></code></p>
<p>This tells the compiler to inline the x86 ADD instruction instead of making a subroutine call to the implementation of fixnum+fast. You can add assembler intrinsics to existing words with &#8216;define-intrinsics&#8217;. Here&#8217;s a description from the help for the define-intrinsics word:</p>
<blockquote><p>
Defines a set of assembly intrinsics for the word. When a call to the word is being compiled, each intrinsic is tested in turn; the first applicable one will be called to generate machine code. If no suitable intrinsic is found, a simple call to the word is compiled instead.
</p></blockquote>
<p>What I particularly like about this feature is that it neatly provides the ability to specialize a highly optimized implementation for a particular hardware set, and then fall back gracefully on other architectures.</p>
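<p>The help text describes a selection rule: try each intrinsic in turn, use the first applicable one, and otherwise compile a plain call. That rule can be sketched in Python. The <code>INTRINSICS</code> table, its predicates and the pseudo-instructions are hypothetical; they loosely mirror the two <code>fixnum+fast</code> entries shown earlier.</p>

```python
def compile_call(word, inputs, intrinsics):
    """Return inline code from the first applicable intrinsic for `word`,
    or fall back to an ordinary subroutine call."""
    for applies, generate in intrinsics.get(word, []):
        if applies(inputs):
            return generate(inputs)
    return [f"call {word}"]


# Hypothetical entries: (predicate on the inputs, code generator).
INTRINSICS = {
    "fixnum+fast": [
        # y is an integer literal: use it as an immediate operand
        (
            lambda ins: isinstance(ins["y"], int),
            lambda ins: [f"add {ins['x']}, {ins['y']} ; immediate"],
        ),
        # y is already in a register
        (
            lambda ins: isinstance(ins["y"], str),
            lambda ins: [f"add {ins['x']}, {ins['y']} ; register"],
        ),
    ]
}

print(compile_call("fixnum+fast", {"x": "eax", "y": "edx"}, INTRINSICS))
print(compile_call("print", {}, INTRINSICS))  # ['call print']
```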
<p>&#8211;</p>
<p>That concludes my ad-hoc tour of the factor compiler. I&#8217;ve skipped over a number of things and no doubt there are bits I haven&#8217;t discovered yet and some inaccuracies, but I hope I&#8217;ve supplied enough information to spark interest in this excellent compiler. </p>
<p>As I mentioned in a previous post I got interested in factor as a direct result of tinkering with <a href="http://annexia.org/forth">jonesforth</a>, which takes you through the entire forth bootstrap process starting with raw assembly. I&#8217;ve been delighted to find that factor retains a lot of the &#8216;right-down-to-the-metal&#8217; accessibility of its low level cousin.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/04/08/digging-into-factors-compiler/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Belkin F5D7132 wireless repeater working!</title>
		<link>http://www.phildawes.net/blog/2008/03/25/belkin-f5d7132-wireless-repeater-working/</link>
		<comments>http://www.phildawes.net/blog/2008/03/25/belkin-f5d7132-wireless-repeater-working/#comments</comments>
		<pubDate>Tue, 25 Mar 2008 12:50:20 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/03/25/belkin-f5d7132-wireless-repeater-working/</guid>
		<description><![CDATA[I finally got my F5D7132 wireless repeater working. The trick was to ignore all the auto-negotiate &#8216;one push setup&#8217; rubbish and actually connect to the web interface of the device.
I don&#8217;t use windows, so the installation CD was no good to me, but I found some docs from Belkin that the NIC for the device [...]]]></description>
			<content:encoded><![CDATA[<p>I finally got my F5D7132 wireless repeater working. The trick was to ignore all the auto-negotiate &#8216;one push setup&#8217; rubbish and actually connect to the web interface of the device.<br />
I don&#8217;t use Windows, so the installation CD was no good to me, but I found some docs from <a href="http://www.belkin.com/support/download/downloaddetails.asp?download=2022&#038;lang=1">Belkin</a> saying that the device&#8217;s network interface is on IP 192.168.2.254 by default. The trick was to connect my laptop to the repeater via ethernet and configure the laptop&#8217;s NIC with another address in that range:<br />
<code><br />
ifconfig eth0 192.168.2.2<br />
</code></p>
<p>Then I was able to point my browser at http://192.168.2.254/ to configure the repeater, adding the right SSID and connection details and manually selecting the right wireless network. Yay!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/03/25/belkin-f5d7132-wireless-repeater-working/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Roasting coffee on the cheap</title>
		<link>http://www.phildawes.net/blog/2008/03/20/roasting-coffee-on-the-cheap/</link>
		<comments>http://www.phildawes.net/blog/2008/03/20/roasting-coffee-on-the-cheap/#comments</comments>
		<pubDate>Thu, 20 Mar 2008 10:02:25 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[workfriendly]]></category>

		<category><![CDATA[coffee]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/03/20/roasting-coffee-on-the-cheap/</guid>
		<description><![CDATA[Motivated by Tom Moertel&#8217;s &#8216;A coders guide to coffee&#8217;, I&#8217;ve been experimenting with roasting my own coffee on the cheap. Here&#8217;s my equipment bought to date:
- Bodum 5679 C-Mill Electric Coffee Grinder
- Rival popcorn popper (cost me a fiver from ebay).
- Aeropress brewer
- Digital oven Thermometer 100169 E19 
N.B. I didn&#8217;t start off roasting: a [...]]]></description>
			<content:encoded><![CDATA[<p>Motivated by Tom Moertel&#8217;s <a href="http://blog.moertel.com/pages/coders-guide-to-coffee">&#8216;A coder&#8217;s guide to coffee&#8217;</a>, I&#8217;ve been experimenting with roasting my own coffee on the cheap. Here&#8217;s the equipment I&#8217;ve bought to date:</p>
<ul>
<li>Bodum 5679 C-Mill Electric Coffee Grinder</li>
<li>Rival popcorn popper (cost me a fiver from eBay)</li>
<li><a href="http://www.sweetmarias.com/aeropress_instructions.html">Aeropress</a> brewer</li>
<li><a href="http://www.thermometersdirect.co.uk/acatalog/Thermometers_Direct__Oven_Thermometers_12.html">Digital oven thermometer</a> 100169 E19</li>
</ul>
<p>N.B. I didn&#8217;t start off roasting. A couple of months ago I bought a grinder and started ordering roasted beans from HasBean. If you live in the UK, <a href="http://www.hasbean.co.uk/">HasBean</a> comes highly recommended: the coffee is usually roasted on the morning of dispatch, and the beans arrive through the door the next day in a vacuum-sealed bag.</p>
<p>I happily made <a href="http://www.bluecoffeecafe.com/home.php?page_id=10">filter coffee</a> for a couple of weeks before Jay at work encouraged me to get an <a href="http://www.aerobie.com/Products/aeropress_story.htm">Aeropress</a>. This is a cheap device that lets you force hot water through the grounds under pressure to make shots. It&#8217;s not the same as a 500 quid espresso maker, but I&#8217;ve been really happy with the quality of brew I get from this contraption. I don&#8217;t think you can get a better cup of coffee for the price, and to my mind it&#8217;s a lot better than filter.</p>
<p>Anyway, it was only a matter of time before the lure of subverting cheap consumer appliances to roast coffee proved too tempting. I found a wealth of info on the net about roasting coffee in popcorn poppers and bought a cheap popper for a fiver off eBay. The best guides I found were:</p>
<ul>
<li><a href="http://www.thedomesticbarista.com/index.php/Roast_coffee_at_home_with_a_popcorn_popper">Roast coffee at home with a popcorn popper</a></li>
<li><a href="http://www.edwardspiegel.org/coffee/roastingwithpoppers.htm">Roasting with poppers</a></li>
</ul>
<p>There&#8217;s also a <a href="http://www.engadget.com/2006/02/28/how-to-make-a-popcorn-popper-coffee-roaster/">guide to modding your popper</a> on Engadget, although I&#8217;m not convinced this is strictly necessary. <a href="http://www.edwardspiegel.org/coffee/roastingwithpoppers.htm">Ed Spiegel&#8217;s guide</a> has numerous tips for changing the roasting speed and profile without modifying the innards. You can vary the bean mass or the tilt of the popper, run it from an extension cable (which adds resistance and slows the roast), or stir the beans.</p>
<p>The <a href="http://www.sweetmarias.com/">Sweet Maria&#8217;s</a> site also has loads of material on coffee roasting and brewing in general. In particular I found this <a href="http://www.sweetmarias.com/roasting-VisualGuideV2.html">colour chart</a> really helpful. If you live in the US, consider getting your beans from them; people rave about them all over the internet.</p>
<p>I&#8217;m certainly no coffee expert, but I really recommend trying fresh coffee you&#8217;ve ground yourself. I understand that <a href="http://coffeegeek.com/forums/espresso/questions/321174">ground coffee goes stale in a matter of hours</a>. That means pretty much any ground coffee you buy from a supermarket is already well past its best by the time you brew it, regardless of the vacuum packaging it comes in. Roasted whole beans apparently go stale in a couple of weeks, so it&#8217;s best to order them from a supplier that states the roast date. In the UK this means getting them from the <a href="http://www.hasbean.co.uk/">internet</a>. If you need more persuading, check out &#8216;<a href="http://blog.moertel.com/pages/coders-guide-to-coffee">A coder&#8217;s guide to coffee</a>&#8217; for some top-quality coffee advocacy.</p>
<p>At this stage the main benefit of home roasting for me is that I can roast just what I need for the next few days. Since we only drink a couple of cups a day, that&#8217;s not very much. The green beans are also <a href="http://www.hasbean.co.uk/index.php?cPath=45_30">pretty cheap</a>, even for the gourmet stuff, and keep well for years. As I get better at roasting, I&#8217;m hoping to develop the technique and the palate to enjoy experimenting with different varieties and roasts.<br />
I&#8217;m not at that level of precision yet, but I&#8217;m certainly getting better-tasting coffee than I pick up at Starbucks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/03/20/roasting-coffee-on-the-cheap/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Beginning Factor is like programming assembler</title>
		<link>http://www.phildawes.net/blog/2008/03/12/beginning-factor-is-like-programming-assembler/</link>
		<comments>http://www.phildawes.net/blog/2008/03/12/beginning-factor-is-like-programming-assembler/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 20:42:43 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[factor]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/03/12/beginning-factor-is-like-programming-assembler/</guid>
		<description><![CDATA[Yesterday I was musing over why it took me so much longer to get going with Factor compared to the other languages I&#8217;ve learnt over the years. I came up with this:
Factor&#8217;s base language is very fundamental and primitive in nature. Programming it is similar to programming with assembler language: you have to keep track [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I was musing over why it took me so much longer to get going with <a href="http://factorcode.org/">Factor</a> compared to the other languages I&#8217;ve learnt over the years. I came up with this:</p>
<p>Factor&#8217;s base language is very low-level and primitive. Programming in it is similar to programming in assembly language: you have to keep track of the computation machinery in your head (in Factor&#8217;s case, mostly the contents of the data stack).</p>
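<p>To give a feel for that bookkeeping, here&#8217;s a toy postfix interpreter in Python (not Factor itself) that can print the stack after every word. It&#8217;s a minimal sketch that supports only integers and a handful of words. The names <code>dup</code>, <code>drop</code> and <code>swap</code> are borrowed from Factor&#8217;s shuffle words. The function <code>run</code> is hypothetical.</p>

```python
def run(program, trace=False):
    """Toy postfix interpreter: integers are pushed onto the stack, and each
    word pops its inputs and pushes its results."""
    words = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "dup": lambda s: s.append(s[-1]),                 # ( x -- x x )
        "drop": lambda s: s.pop(),                        # ( x -- )
        "swap": lambda s: s.extend([s.pop(), s.pop()]),   # ( x y -- y x )
    }
    stack = []
    for token in program.split():
        if token in words:
            words[token](stack)
        else:
            stack.append(int(token))
        if trace:
            print(f"{token:>5}  {stack}")
    return stack


run("3 dup * 4 +", trace=True)
```

<p>Tracing <code>3 dup * 4 +</code> shows the stack going from <code>[3]</code> to <code>[3, 3]</code>, <code>[9]</code>, <code>[9, 4]</code> and finally <code>[13]</code>. That is the state you end up carrying in your head when reading real stack code.</p>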
<p>Where Factor differs from assembler is that careful combination of primitives lets you ramp the level of abstraction right up. This means the real productivity gains come not from the base language but from the language extensions built on top of it in the extensive bundled libraries. Unfortunately, until you&#8217;re familiar with these libraries you&#8217;re stuck programming at a nuts-and-bolts level of abstraction.</p>
<p>Contrast this with a typical applicative language such as Python or Ruby. There the core language includes some useful abstractions that let you be relatively productive without intimate up-front knowledge of the libraries. A couple of hours learning the ideas and syntax is enough to get you up and running and feeling like you&#8217;re making progress.</p>
<p>So the result is that Factor&#8217;s initial learning curve is brutally steep (unless you&#8217;re already proficient in stack languages). Getting to productivity takes time and study, probably more than you&#8217;re prepared to give up front if you&#8217;re just checking out the language.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/03/12/beginning-factor-is-like-programming-assembler/feed/</wfw:commentRss>
		</item>
		<item>
		<title>SSH connection multiplexing!</title>
		<link>http://www.phildawes.net/blog/2008/03/12/ssh-connection-multiplexing/</link>
		<comments>http://www.phildawes.net/blog/2008/03/12/ssh-connection-multiplexing/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 20:28:25 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[workfriendly]]></category>

		<category><![CDATA[ssh]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/03/12/ssh-connection-multiplexing/</guid>
		<description><![CDATA[Uncovered an awesome ssh trick today. At work we use ssh extensively to run stuff on remote unix and windows servers, using SSH agents to handle batch job authentication. That&#8217;s all sweet because on unix you don&#8217;t need to install client software on each box, and you don&#8217;t need a root account - just copy [...]]]></description>
			<content:encoded><![CDATA[<p>Uncovered an awesome ssh trick today. At work we use ssh extensively to run stuff on remote unix and windows servers, using SSH agents to handle batch job authentication. That&#8217;s all sweet because on unix you don&#8217;t need to install client software on each box, and you don&#8217;t need a root account - just copy a public key and you&#8217;re good to go. In a big multi-national company not installing stuff means less bureaucracy, and that counts for a lot.</p>
<p>Now, unbeknown to me, it turns out that on top of all that good stuff you can also <a href="http://blog.johnjosephbachir.org/2006/11/19/multiplex-several-ssh-sessions-over-a-single-tcp-connection/">multiplex ssh sessions over existing connections</a> using just vanilla OpenSSH 4. This is tres-bien-cool for a couple of reasons:</p>
<p>1) Connection times are massively reduced: a few milliseconds rather than a couple of seconds.</p>
<p>2) Lots of sessions to the same account share a single TCP connection and don&#8217;t load the CPU with repeated handshakes.</p>
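<p>Here&#8217;s a minimal sketch of driving this from Python with <code>subprocess</code>. It assumes an OpenSSH client that supports <code>-M</code> (master mode) and <code>-S</code> (control socket path). A single long-lived master is started in the background with <code>-f -N</code>. Later sessions pass the same <code>-S</code> path, so they ride over the master&#8217;s TCP connection. The socket location, the function names and the <code>me@build-box</code> target are all hypothetical.</p>

```python
import os
import subprocess

# Hypothetical socket location. ssh itself expands %r, %h and %p to the
# remote user, host and port, so each account gets its own socket.
CONTROL_PATH = os.path.expanduser("~/.ssh/mux-%r@%h:%p")


def master_argv(target, control_path=CONTROL_PATH):
    # -M: act as the multiplexing master, -S: listen on this control socket,
    # -f -N: drop into the background without running a remote command.
    return ["ssh", "-M", "-S", control_path, "-f", "-N", target]


def session_argv(target, command, control_path=CONTROL_PATH):
    # Same -S path: ssh talks to the master over the socket instead of
    # opening a fresh TCP connection and redoing the handshake.
    return ["ssh", "-S", control_path, target, command]


def start_master(target):
    # The backgrounded master outlives this call, so its output goes to
    # /dev/null rather than a pipe we would wait on forever.
    subprocess.run(master_argv(target), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def run_remote(target, command):
    # Assumes the master above is running and key-based authentication
    # (e.g. via an SSH agent) works for target.
    result = subprocess.run(session_argv(target, command),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

<p>Call <code>start_master("me@build-box")</code> once, then call <code>run_remote("me@build-box", "uptime")</code> as often as you like. You can get the same effect without code by setting <code>ControlMaster auto</code> and <code>ControlPath</code> in <code>~/.ssh/config</code>, in which case the first ordinary session becomes the master.</p>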
<p>OMG! Surely this makes ssh the killer app for a whole slew of things: monitoring remote processes, interactive remote applications, in fact anything you&#8217;d previously have installed client software for. So why isn&#8217;t there a whole tool industry built around this awesome protocol?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/03/12/ssh-connection-multiplexing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Introduction to Garbage Collection</title>
		<link>http://www.phildawes.net/blog/2008/03/10/introduction-to-garbage-collection/</link>
		<comments>http://www.phildawes.net/blog/2008/03/10/introduction-to-garbage-collection/#comments</comments>
		<pubDate>Mon, 10 Mar 2008 15:54:27 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[workfriendly]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[factor]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/03/10/introduction-to-garbage-collection/</guid>
		<description><![CDATA[Dan Ehrenberg completes a second short post on garbage collection, expanding on his excellent &#8216;Quick intro to garbage collection&#8216; post. Although Dan&#8217;s focus is on research prior to implementing a better collector for the factor language, the lucid explanations and chatty style make this a must-read for anybody with a passing interest in language runtime [...]]]></description>
			<content:encoded><![CDATA[<p>Dan Ehrenberg has written a <a href="http://useless-factor.blogspot.com/2008/03/little-more-about-garbage-collection.html">second short post on garbage collection</a>, expanding on his excellent &#8216;<a href="http://useless-factor.blogspot.com/2008/02/quick-intro-to-garbage-collection.html">Quick intro to garbage collection</a>&#8217; post. Dan&#8217;s focus is on research ahead of implementing a better collector for the Factor language. Even so, the lucid explanations and chatty style make this a must-read for anybody with a passing interest in language runtime systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/03/10/introduction-to-garbage-collection/feed/</wfw:commentRss>
		</item>
		<item>
		<title>whitebox unit tests slow you down</title>
		<link>http://www.phildawes.net/blog/2008/03/06/whitebox-unit-tests-slow-you-down/</link>
		<comments>http://www.phildawes.net/blog/2008/03/06/whitebox-unit-tests-slow-you-down/#comments</comments>
		<pubDate>Thu, 06 Mar 2008 12:30:24 +0000</pubDate>
		<dc:creator>Phil Dawes</dc:creator>
		
		<category><![CDATA[workfriendly]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.phildawes.net/blog/2008/03/06/whitebox-unit-tests-slow-you-down/</guid>
		<description><![CDATA[Is it just me or do whitebox unit tests really bog you down? 
I do pretty much all my coding in a test-first stylee; it&#8217;s the only way to code if you&#8217;re snatching 20mins here and there for spare time projects. Much of the time these tests serve as scaffolding to keep me on the [...]]]></description>
			<content:encoded><![CDATA[<p>Is it just me or do whitebox unit tests really bog you down? </p>
<p>I do pretty much all my coding in a test-first stylee; it&#8217;s the only way to code if you&#8217;re <a href="http://www.phildawes.net/blog/2007/07/23/coding-when-youre-tired-and-unmotivated/">snatching 20 minutes here and there for spare-time projects</a>. Much of the time these tests serve as scaffolding to keep me on the straight and narrow while I bootstrap some functionality. Unfortunately, once they&#8217;ve served this purpose they just sit there like a ball and chain round my leg, slowing down any future change of direction.</p>
<p>These days I&#8217;ve got into the habit of converting these tests into more stable blackbox functional tests once there&#8217;s enough actual functionality to support it. Or I just delete them. Life&#8217;s too short to be worrying about breaking brittle old tests.</p>
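<p>To make the distinction concrete, here&#8217;s a minimal Python sketch with a made-up <code>WordCounter</code> class. The whitebox test pins down the private <code>_counts</code> data structure, so it breaks when the internals change even if the behaviour doesn&#8217;t. The blackbox test only goes through the public <code>add</code> and <code>most_common</code> methods, so it survives that sort of refactoring. Both tests are plain pytest-style functions, and every name here is hypothetical.</p>

```python
from collections import Counter


class WordCounter:
    """Hypothetical class under test."""

    def __init__(self):
        # Internal detail: could later become a trie, a database table...
        self._counts = Counter()

    def add(self, text):
        self._counts.update(text.lower().split())

    def most_common(self, n):
        return [word for word, _ in self._counts.most_common(n)]


def test_whitebox_internal_counts():
    # Couples the test to the storage format: switch to, say, a sorted list
    # of (word, count) pairs and this fails although users see no change.
    wc = WordCounter()
    wc.add("a b a")
    assert wc._counts == Counter({"a": 2, "b": 1})


def test_blackbox_most_common():
    # Only uses the public interface, so it keeps passing across refactors.
    wc = WordCounter()
    wc.add("The cat saw the dog")
    assert wc.most_common(1) == ["the"]
```

<p>Run them with <code>pytest</code>, or just call the two functions directly.</p>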
]]></content:encoded>
			<wfw:commentRss>http://www.phildawes.net/blog/2008/03/06/whitebox-unit-tests-slow-you-down/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
