Searching arrays in X86 assembler with a bloom filter

I'm currently writing what amounts to an olap database in factor in my spare time, which is a pretty interesting project. Actually spare time is limited now that I have a child but factor is affording me a pretty good productivity compared to python and scheme so it's probably net-par compared to what I was doing a couple of years ago. I should probably write a bit more about that some time.

Anyway I recently made the bizarro discovery that in the world of factor for ease of deployment it makes more sense to write low level code in assembler rather than C. This is because factor effectively has a built in macro assembler DSL for each of the chipsets it supports, whereas shipping C code requires having a C compiler setup on the target machine which is a pita for windows platforms. Also X86 is fast becoming the only server processor that matters. Crazy world.

Ideally I'd not have to write any low-level code at all and I'd do it all in factor. To be honest factor is pretty bloody fast, and could conceivably compete well with raw assembler in the not-too-distant future given enough tweaking of the code generator. For the time being though C and assembler rule the performance roost in this environment.

Ok, so the interesting problem I have is:

I have a set of 32bit integers, and I want to search a large unsorted 32bit array accumulating the array indices of values matching those from the set. To solve this in factor I've been putting the input set in a hashtable and then sequentially looking up each element from the array, which algorithmically is pretty optimal O(n+m) but the ever present constant is pretty high. This takes about 4 seconds on a processor pegged at 800mhz for a 48MB array.

Ideally I want this performance down to under 100ms. This is going to be very tight because a 48mb file contains ~12M elements, so if I'm aiming for 100ms that means about 6 cycles per element on an 800mhz chip. Probably a bit too ambitious.

I'm thinking I'll create a bloom filter with a simple hashing function (like e.g. 'mod' ) and iterate the large array in assembler checking the filter for each element and then escaping back to factor to confirm and accumulate matches.

Incidentally, I've noticed that my motivation for writing about my programming ideas drops after I've implemented them, so this time I thought I'd experiment with using my blog to think through the idea before I attempt it. Can anybody see a better way to solve this problem given the constraints? I'll post again with the results...