Feed on
Posts
Comments

One of the most useful bits of veudas’ functionality is being able to search the knowledgebase for labels (and subproperties) containing a particular string. Unfortunately this is also the slowest operation.
Currently veudas’ ifp store does a table scan over the nodes table for regex searches - very slow (mysql doesnt support a general regex index).

I’m currently only handling a few million triples in my store, so I’m not ready to invest a large amount of time writing e.g. a suffix tree implementation. I do need a performance boost though, since some text searches take a number of seconds; Given this, I thought I’d compare database performance with flat-file searching using the Wordnet data (222439 literals). Here’s the comparison data between (1) mysql table scan and (2) grepping a flat file.

1) put all the literals in 1 table and do a regex table scan

select id from labelcache where value LIKE “%football%”
.
> 146 matches in 0.32 seconds

2) put all the literals in a file and do a grep and pull results into
python

mysql -D ifptest1 -u root -e ’select id,value from nodes where literal = “1″‘ > nodes

and then…

t = time.time()
f = os.popen(”grep -i football nodes”)
ids = [l[:l.index(’t')] for l in f]
print len(ids),”matches in”,time.time() - t,”seconds”
.
> 146 matches in 0.042 seconds

I found this quite surprising - grepping a file and piping to python is an order of magnitude faster than mysql doing a table scan(!)

The rest of the query (once the results have been reduced to ids) takes 0.02 seconds. It looks like grepping a file may be a quick and easy way of getting a ~x10 performance boost (at least in the short term).

close Reblog this comment
blog comments powered by Disqus