<?xml version="1.0" encoding="utf-8"?><!-- generator="wordpress/2.1.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Indexes, Hashes &#038; Compression</title>
	<link>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/</link>
	<description>Mostly programming with a few bits of other stuff</description>
	<pubDate>Tue, 06 Jan 2009 19:22:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1.1</generator>

	<item>
		<title>By: pigalle</title>
		<link>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-41758</link>
		<author>pigalle</author>
		<pubDate>Thu, 26 Jul 2007 21:39:18 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-41758</guid>
					<description>re: optimal storage / read efficiency - have you tried reiser4? it does a wonderful job of not wasting disk space. a 'du -k' inside a dir used roughly the same amount of total space as an N3 serialization of the same data. the same ~30 mb of data took up 230 mb on ext3. and about one in every 5 triples is a blog post / news story with a 5K chunk of text - the difference would be even more absurd if not for that. also read back is much faster than your numbers would suggest - it's nowhere near 10 ms per call. what kind of drive are you using, a 423 mb thing you found in a discarded PC on the street?

as for 'in memory' - the kernel disk cache is great for 'in memory', especially in the concurrency department - 10 mongrels can all benefit from it w/o a separate memcached.

as for indexing - i haven't thought about it much yet - my query engine takes about 0.1 seconds for a basic 'fetch the content, title, author, date, abstract of ___ resources sorted by ascending date'. hopefully that can be shaved down once i learn some stuff, and your previous post is my jumping off point - thanks!

oh ya. where's your source? mine's http://whats-your.name/yard</description>
		<content:encoded><![CDATA[<p>re: optimal storage / read efficiency - have you tried reiser4? it does a wonderful job of not wasting disk space. a &#8216;du -k&#8217; inside a dir used roughly the same amount of total space as a n3 serialization of the same data. said ~30 mb of data took up 230 mb on ext3. and, about one in every 5 triples is a blog post / news story text where theres a 5K chunk of text - the difference would be even more absurd if not for that. also read back is much faster than your numbers would suggest - its nowhere near 10 ms per call. what kind of drive are you using a 423 mb thing you found in a discared PC on the street?</p>
<p>as for &#8216;in memory&#8217; - the kernel disk cache is great for &#8216;in memory&#8217;, especially in the concurrency department - 10 mongrels can all benefit from it w/o a separate memcached.</p>
<p>as for indexing - i haven&#8217;t thought about it much yet - my query engine takes about 0.1 seconds for a basic &#8216;fetch the content, title, author, date, abstract of ___ resources sorted by ascending date&#8217;. hopefully that can be shaved down once i learn some stuff, and your previous post is my jumping off point - thanks!</p>
<p>oh ya. where&#8217;s your source? mine&#8217;s <a href="http://whats-your.name/yard" rel="nofollow">http://whats-your.name/yard</a></p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Seth Ladd</title>
		<link>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-41765</link>
		<author>Seth Ladd</author>
		<pubDate>Fri, 27 Jul 2007 01:02:51 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-41765</guid>
					<description>I've had good experience storing my data in columns.  If I sort the data in each column, I'll get very good compression.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve had good experience storing my data in columns.  If I sort the data in each column, I&#8217;ll get very good compression.</p>
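<p>A minimal sketch of why sorting helps, assuming run-length encoding as the compression scheme (the comment does not say which one is used): once a column is sorted, equal values sit next to each other and collapse into a few (value, count) runs. The column contents are hypothetical.</p>
<pre>
```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


# Hypothetical 'predicate' column from a triple store, in insertion order.
column = ["title", "author", "title", "date", "author", "title", "date", "date"]

unsorted_runs = run_length_encode(column)
sorted_runs = run_length_encode(sorted(column))

print(len(unsorted_runs), len(sorted_runs))
print(sorted_runs)
```
</pre>
<p>Here the unsorted column needs 7 runs and the sorted one only 3. If the columns must stay aligned row by row, sorting one column means reordering the others with the same permutation, so they will not all end up fully sorted.</p>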
]]></content:encoded>
				</item>
	<item>
		<title>By: Phil Dawes</title>
		<link>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-41786</link>
		<author>Phil Dawes</author>
		<pubDate>Fri, 27 Jul 2007 09:27:34 +0000</pubDate>
		<guid>http://phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-41786</guid>
					<description>@pigalle: thanks for the comments - I'll take a look at reiser4.
The ~10ms latency is for a disk seek, not a read. What sort of timings are you getting?

@Seth: Cool - I'm planning on doing the same thing (have you read the research papers on C-Store?).</description>
		<content:encoded><![CDATA[<p>@pigalle: thanks for the comments - I&#8217;ll take a look at reiser4.<br />
The ~10ms latency is for a disk seek, not a read. What sort of timings are you getting?</p>
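<p>A rough back-of-the-envelope sketch of what a ~10ms seek means for random lookups. It assumes each lookup that misses the cache costs a fixed number of seeks and that cache hits (for example, pages already in the kernel page cache pigalle mentions) cost nothing. The function name and parameters are illustrative only.</p>
<pre>
```python
SEEK_MS = 10.0  # rough per-seek latency quoted in the comment above


def expected_lookup_ms(lookups, cache_hit_ratio, seeks_per_miss=1, seek_ms=SEEK_MS):
    """Estimate disk time for random lookups when only cache misses hit the disk.

    Cache hits are treated as free, which ignores CPU and memory time;
    each miss pays seeks_per_miss full seeks (e.g. an index block plus a data block).
    """
    misses = lookups * (1.0 - cache_hit_ratio)
    return misses * seeks_per_miss * seek_ms


# Hypothetical workload: 100 random lookups, cold cache versus a mostly warm cache.
print(expected_lookup_ms(100, 0.0))
print(expected_lookup_ms(100, 0.95))
```
</pre>
<p>With no cache hits, 100 lookups spend about a second seeking; at a 95% hit rate the same 100 lookups spend about 50ms on seeks, which is one way cached reads can come in well under 10ms per call.</p>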
<p>@Seth: Cool - I&#8217;m planning on doing the same thing (have you read the research papers on C-Store?).</p>
]]></content:encoded>
				</item>
</channel>
</rss>
