<?xml version="1.0" encoding="utf-8"?><!-- generator="wordpress/2.1.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Indexing structured data (again)</title>
	<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/</link>
	<description>Mostly programming with a few bits of other stuff</description>
	<pubDate>Wed, 07 Jan 2009 03:24:07 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1.1</generator>

	<item>
		<title>By: Nick Johnson</title>
		<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-30525</link>
		<author>Nick Johnson</author>
		<pubDate>Sun, 29 Apr 2007 21:05:33 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-30525</guid>
					<description>I'd consider using an n-ary structure for your indexes instead of just a sorted list: Sort your index, then take every nth item (sized so the number of items returned just fits in a single disk block), and create a node from them. Recurse for each subdivision thus created. This will give you _much_ better locality of reference when you access the index, so for an infrequently used index, you won't have to read nearly as much into memory.</description>
		<content:encoded><![CDATA[<p>I&#8217;d consider using an n-ary structure for your indexes instead of just a sorted list: Sort your index, then take every nth item (sized so the number of items returned just fits in a single disk block), and create a node from them. Recurse for each subdivision thus created. This will give you _much_ better locality of reference when you access the index, so for an infrequently used index, you won&#8217;t have to read nearly as much into memory.</p>
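<p>A minimal sketch of the layout described above, under some assumptions: the node lives in memory rather than in a real disk block, and the names Node, build, lookup and block_items are made up for illustration. It takes every nth key of the sorted index to form a node, recurses into each gap between the chosen keys, and counts how many nodes a lookup touches (a stand-in for block reads).</p>

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical in-memory stand-in for one disk block.
    keys: list                                     # sorted, at most block_items entries
    children: list = field(default_factory=list)   # empty for a leaf


def build(sorted_keys, block_items):
    """Build a static n-ary search tree from an already sorted list."""
    n = len(sorted_keys)
    if n == 0:
        return None
    if block_items >= n:
        # Everything fits in one block: this is a leaf.
        return Node(list(sorted_keys))
    # Pick every step-th key; step is chosen so that at most
    # block_items keys are picked for this node.
    step = -(-(n + 1) // (block_items + 1))  # ceiling division
    positions = list(range(step - 1, n, step))
    node = Node([sorted_keys[p] for p in positions])
    # Recurse into each subdivision between the picked keys.
    start = 0
    for p in positions:
        node.children.append(build(sorted_keys[start:p], block_items))
        start = p + 1
    node.children.append(build(sorted_keys[start:], block_items))
    return node


def lookup(node, key):
    """Return (found, nodes_read) for key, starting from the root node."""
    reads = 0
    while node is not None:
        reads += 1
        i = bisect_left(node.keys, key)
        if i != len(node.keys) and node.keys[i] == key:
            return True, reads
        if not node.children:
            return False, reads
        node = node.children[i]
    return False, reads


if __name__ == "__main__":
    index = list(range(0, 2000, 2))          # 1000 sorted keys
    root = build(index, block_items=16)
    print(lookup(root, 1234))                # (found, nodes_read)
    print(lookup(root, 1235))
```

<p>In this sketch a lookup reads one node per level, so the number of blocks touched grows with the depth of the tree rather than with the size of the index. A real implementation would size block_items from the block size and the key width, and would write each node out as one block.</p>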
]]></content:encoded>
				</item>
	<item>
		<title>By: carmen</title>
		<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-30726</link>
		<author>carmen</author>
		<pubDate>Mon, 30 Apr 2007 17:46:45 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-30726</guid>
					<description>are there any stores that don't support named-graphs/quads/contexts and instead focus on providing niceties for a single 'shared context', as you put it?

my biggest problem has been stuff like - how do you grab all the children of http://mysite/cars without either bypassing the RDF lib and doing "select * from resources where id like 'http://mysite.com/cars%'", since doing the same in SPARQL is just waaaay too slow, or doing a triple like:
  

is pretty insane. don't get me started on count/group_by etc..</description>
		<content:encoded><![CDATA[<p>are there any stores that don&#8217;t support named-graphs/quads/contexts and instead focus on providing niceties for a single &#8216;shared context&#8217;, as you put it?</p>
<p>my biggest problem has been stuff like - how do you grab all the children of <a href="http://mysite/cars" rel="nofollow">http://mysite/cars</a> without either bypassing the RDF lib and doing &#8220;select * from resources where id like &#8216;http://mysite.com/cars%&#8217;&#8221;, since doing the same in SPARQL is just waaaay too slow, or doing a triple like:</p>
<p>is pretty insane. don&#8217;t get me started on count/group_by etc..</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Phil Dawes&#8217; Stuff &#187; Blog Archive &#187; Some ideas for static triple indexing</title>
		<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-30728</link>
		<author>Phil Dawes&#8217; Stuff &#187; Blog Archive &#187; Some ideas for static triple indexing</author>
		<pubDate>Mon, 30 Apr 2007 20:54:31 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-30728</guid>
					<description>[...] wrote a bit about representing structured data in the last post. Here&#8217;s some ideas for how I plan to index the [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] wrote a bit about representing structured data in the last post. Here&#8217;s some ideas for how I plan to index the [&#8230;]</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Semantic Web Blog</title>
		<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-44428</link>
		<author>Semantic Web Blog</author>
		<pubDate>Mon, 20 Aug 2007 17:48:04 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-44428</guid>
					<description>[...] Indexing structured data (again) Now that I&#8217;m up and running and starting to get productive on Gambit-C, I&#8217;ve turned my attention back to indexing structured data. I&#8217;ve modified the tagtriples idea a bit to reflect my experience on importing data in other formats. I still think the most effective approach is not to try and &#8230; [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Indexing structured data (again) Now that I&#8217;m up and running and starting to get productive on Gambit-C, I&#8217;ve turned my attention back to indexing structured data. I&#8217;ve modified the tagtriples idea a bit to reflect my experience on importing data in other formats. I still think the most effective approach is not to try and &#8230; [&#8230;]</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Jonathan</title>
		<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-56361</link>
		<author>Jonathan</author>
		<pubDate>Tue, 20 Nov 2007 17:46:12 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-56361</guid>
					<description>check out www.coppereye.com

already doing this...</description>
		<content:encoded><![CDATA[<p>check out <a href="http://www.coppereye.com" rel="nofollow">www.coppereye.com</a></p>
<p>already doing this&#8230;</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Laura Taylor</title>
		<link>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-74618</link>
		<author>Laura Taylor</author>
		<pubDate>Mon, 10 Mar 2008 14:52:37 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/04/29/indexing-structured-data-again/#comment-74618</guid>
					<description>... or SAND Technology. Their DNA Access product not only indexes every value, it also deduplicates, compacts and compresses - in some instances the resulting footprint is less than 2% of original data (so 100 Terabytes reduced to 2 Terabytes - but still queryable with SQL).</description>
		<content:encoded><![CDATA[<p>&#8230; or SAND Technology. Their DNA Access product not only indexes every value, it also deduplicates, compacts and compresses - in some instances the resulting footprint is less than 2% of original data (so 100 Terabytes reduced to 2 Terabytes - but still queryable with SQL).</p>
]]></content:encoded>
				</item>
</channel>
</rss>
