<?xml version="1.0" encoding="utf-8"?><!-- generator="wordpress/2.1.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Suffix array performance problems</title>
	<link>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/</link>
	<description>Mostly programming with a few bits of other stuff</description>
	<pubDate>Fri, 21 Nov 2008 01:23:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1.1</generator>

	<item>
		<title>By: Phil Dawes</title>
		<link>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-252</link>
		<author>Phil Dawes</author>
		<pubDate>Sat, 01 Jan 2005 12:37:18 +0000</pubDate>
		<guid>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-252</guid>
					<description>Cool - think I've cracked it. The solution was to remove the original suffix and links tables, and replace them with a single suffixes table that just maps char(4) suffixes to literals.
This reduces the accuracy of the suffix match: substring searches longer than 4 chars need to join with the nodes table (which contains the literals) and filter on the full literal. The width of the suffix field could be increased if this becomes a bottleneck, trading extra space for speed.

Have got the query down to 30ms on my laptop (down from 82.61 seconds on the original scheme). Other queries are looking good as well (mostly in the 10-30ms range). Should fly on the 10GB 8 proc box at work :-)

The reason this is so much faster is that the MySQL query optimiser gets a better estimate of how many literals each substring will match, and so correctly identifies the best order in which to apply the matches. Also, by indexing the new suffix table both ways (suffix-literal and literal-suffix), the query analyser can choose to impose other constraints in between the substring filters (e.g. reducing the literal matches to those that are rdfs:labels).

Now I just need to update the veudastore code to use the new scheme...</description>
		<content:encoded><![CDATA[<p>Cool - think I&#8217;ve cracked it. The solution was to remove the original suffix and links tables, and replace them with a single suffixes table that just maps char(4) suffixes to literals.<br />
This reduces the accuracy of the suffix match: substring searches longer than 4 chars need to join with the nodes table (which contains the literals) and filter on the full literal. The width of the suffix field could be increased if this becomes a bottleneck, trading extra space for speed.</p>
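<p>A minimal sketch of the char(4) scheme, assuming the Python sqlite3 module in place of MySQL. The table and column names (nodes, suffixes, node_id), the build() and search() helpers and the sample literals are all hypothetical. Every suffix of each literal is stored truncated to 4 chars. A search term of up to 4 chars is answered from the suffixes table alone, while a longer term is matched on its first 4 chars and then filtered against the full literal in nodes.</p>

```python
import sqlite3

WIDTH = 4  # width of the suffix column (the char(4) field)


def build(conn, literals):
    """Create hypothetical nodes/suffixes tables and index each literal."""
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, literal TEXT)")
    conn.execute("CREATE TABLE suffixes (suffix TEXT, node_id INTEGER)")
    for lit in literals:
        node_id = conn.execute(
            "INSERT INTO nodes (literal) VALUES (?)", (lit,)).lastrowid
        # Every suffix of the literal, truncated to WIDTH chars; the set
        # drops duplicate rows when two truncated suffixes coincide.
        rows = {(lit[i:i + WIDTH], node_id) for i in range(len(lit))}
        conn.executemany("INSERT INTO suffixes VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX suffix_literal ON suffixes (suffix, node_id)")


def search(conn, term):
    """Return the literals containing `term`, in sorted order."""
    head = term[:WIDTH]
    # The suffix column only holds WIDTH chars, so match on the head of
    # the term first; this alone is exact when len(term) <= WIDTH.
    sql = ("SELECT DISTINCT n.literal FROM suffixes s "
           "JOIN nodes n ON n.id = s.node_id "
           "WHERE substr(s.suffix, 1, ?) = ?")
    params = [len(head), head]
    if len(term) > WIDTH:
        # Longer terms: filter the candidates against the full literal.
        sql += " AND instr(n.literal, ?) > 0"
        params.append(term)
    sql += " ORDER BY n.literal"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn, ["suffix array", "suffix table", "prefix"])
    print(search(conn, "fix"))       # ['prefix', 'suffix array', 'suffix table']
    print(search(conn, "suffix t"))  # ['suffix table']
```

<p>substr() keeps the sketch free of LIKE wildcard escaping; on MySQL a prefix match such as suffix LIKE 'abc%' is the form that can use the index on the suffix column.</p>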
<p>Have got the query down to 30ms on my laptop (down from 82.61 seconds on the original scheme). Other queries are looking good as well (mostly in the 10-30ms range). Should fly on the 10GB 8 proc box at work :-)</p>
<p>The reason this is so much faster is that the MySQL query optimiser gets a better estimate of how many literals each substring will match, and so correctly identifies the best order in which to apply the matches. Also, by indexing the new suffix table both ways (suffix-literal and literal-suffix), the query analyser can choose to impose other constraints in between the substring filters (e.g. reducing the literal matches to those that are rdfs:labels).</p>
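<p>A minimal sketch of the two-way indexing, assuming the Python sqlite3 module in place of MySQL. The nodes, suffixes and triples tables, the index names, and the load() and label_search() helpers are hypothetical. With one index on (suffix, node_id) and another on (node_id, suffix), the planner can start from whichever constraint looks most selective (a substring or the rdfs:label predicate) and then probe the suffixes table from the literal side for the remaining terms. Which order a given planner actually picks depends on its statistics.</p>

```python
import sqlite3

WIDTH = 4  # width of the suffix column

SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, literal TEXT);
CREATE TABLE suffixes (suffix TEXT, node_id INTEGER);
CREATE TABLE triples (subject TEXT, predicate TEXT, object_id INTEGER);
-- The suffix table indexed both ways: suffix-literal and literal-suffix.
CREATE INDEX suffix_literal ON suffixes (suffix, node_id);
CREATE INDEX literal_suffix ON suffixes (node_id, suffix);
CREATE INDEX triple_object ON triples (object_id, predicate);
"""


def load(conn, triples):
    """Store hypothetical (subject, predicate, literal) triples and suffixes."""
    conn.executescript(SCHEMA)
    for subject, predicate, literal in triples:
        node_id = conn.execute(
            "INSERT INTO nodes (literal) VALUES (?)", (literal,)).lastrowid
        rows = {(literal[i:i + WIDTH], node_id) for i in range(len(literal))}
        conn.executemany("INSERT INTO suffixes VALUES (?, ?)", rows)
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)",
                     (subject, predicate, node_id))


def label_search(conn, terms, predicate="rdfs:label"):
    """Subjects whose `predicate` literal contains every term.

    Each term must be at most WIDTH chars; longer terms would also need a
    filter against nodes.literal, which this sketch leaves out.
    """
    if any(len(t) > WIDTH for t in terms):
        raise ValueError("terms longer than WIDTH are not handled here")
    joins, where, params = [], ["t.predicate = ?"], [predicate]
    for k, term in enumerate(terms):
        # One join on the suffix table per term; the planner is free to
        # order these joins and the predicate filter as it sees fit.
        joins.append(f"JOIN suffixes s{k} ON s{k}.node_id = t.object_id")
        where.append(f"substr(s{k}.suffix, 1, ?) = ?")
        params += [len(term), term]
    sql = ("SELECT DISTINCT t.subject FROM triples t "
           + " ".join(joins)
           + " WHERE " + " AND ".join(where)
           + " ORDER BY t.subject")
    return [row[0] for row in conn.execute(sql, params)]
```

<p>Running EXPLAIN QUERY PLAN (SQLite) or EXPLAIN (MySQL) on the generated SQL shows which index and join order the planner actually chose.</p>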
<p>Now I just need to update the veudastore code to use the new scheme&#8230;</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Phil Dawes</title>
		<link>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-256</link>
		<author>Phil Dawes</author>
		<pubDate>Sat, 01 Jan 2005 21:32:59 +0000</pubDate>
		<guid>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-256</guid>
					<description>Have implemented the changes in veudastore and released &lt;a href="http://www.sf.net/projects/veudas"&gt;version 0.3&lt;/a&gt;.
To use it with veudas, replace the cgi-bin/veudas-0.6/veudastore directory with this release.</description>
		<content:encoded><![CDATA[<p>Have implemented the changes in veudastore and released <a href="http://www.sf.net/projects/veudas">version 0.3</a>.<br />
To use it with veudas, replace the cgi-bin/veudas-0.6/veudastore directory with this release.</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Ed</title>
		<link>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-567</link>
		<author>Ed</author>
		<pubDate>Thu, 10 Feb 2005 21:38:12 +0000</pubDate>
		<guid>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-567</guid>
					<description>Hi!

Looking for a suffix index that would tell me which programs open files with which suffix: for example, which program created, or which programs can open, files with a .jpg suffix.

Probably totally in the wrong place, but could you help?

Ed.</description>
		<content:encoded><![CDATA[<p>Hi !</p>
<p>Looking for a suffix index that would tell me what programs open files with what suffix.  Such that it would tell me what program created on what programs would open files say with .jpg suffix.</p>
<p>Probably totally in the wrong place, but could you help?</p>
<p>Ed.</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: HoangThanh</title>
		<link>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-11744</link>
		<author>HoangThanh</author>
		<pubDate>Sat, 14 Oct 2006 01:58:36 +0000</pubDate>
		<guid>http://phildawes.net/blog/2005/01/01/suffix-array-performance-problems/#comment-11744</guid>
					<description>I'm HMT</description>
		<content:encoded><![CDATA[<p>I&#8217;m HMT</p>
]]></content:encoded>
				</item>
</channel>
</rss>
