Feed on
Posts
Comments

I’ve made some modifications to the table layout used by my experimental smushing store. In particular, I’ve ditched the hashes table, and lumped node hashing in with the literals table. This improves smushing performance as it reduces the number of tables to update (the current biggest overhead when smushing resources on ifp).

Here’s the new layout:

CREATE TABLE `triples` (
`graph` integer NOT NULL default ‘0′,
`subject` integer NOT NULL default ‘0′,
`predicate` integer NOT NULL default ‘0′,
`object` integer NOT NULL default ‘0′,
`literal` tinyint(1) NOT NULL default ‘0′,
`inferred` tinyint(1) NOT NULL default ‘0′,
UNIQUE KEY `spog` (`subject`,`predicate`,`object`,`graph`),
KEY `posg` (`predicate`,`object`,`subject`,`graph`),
KEY `ospg` (`object`,`subject`,`predicate`,`graph`),
KEY `gspo` (`graph`,`subject`,`predicate`,`object`),
KEY `gpos` (`graph`,`predicate`,`object`,`subject`),
KEY `gosp` (`graph`,`object`,`subject`,`predicate`)
) TYPE=MyISAM

CREATE TABLE nodes (
id integer NOT NULL AUTO_INCREMENT,
hash bigint(20) NOT NULL,
value text,
PRIMARY KEY (hash),
KEY (id)
) TYPE=MyISAM

CREATE TABLE graphs (
id integer NOT NULL,
UNIQUE KEY (id)
) TYPE=MyISAM

N.B. I added a lot more indexing to the triples table a while back. This pretty much copies the kowari indexing from Paul’s post in May. (note that he wrongly assumes that mysql doesnt support use of compound indexes in his post. I’m sure somebody must have corrected him by now).

The cool thing about this is that all the relational queries can now be done by mysql solely in the indexes (no need crossref with the data file), which speeds things up a bit. Haven’t done any profiling with really large datasets yet, but performs well with a couple of million statements or so.

Most queries (once optimised by the mysql query engine) boil down to:

  1. Translate the literals/uris into IDs via the nodes table (using an md5 hash to rapidly look up the values).
  2. Do the triple pattern joins on the triples table (all within the indexes)
  3. translate the result ids back into values (uris or literals) by joining back to the nodes table.

Ideally I would have got mysql to index the ‘value’ column in the nodes table directly, and lost the md5 hash column altogether. Unfortunately mysql doesnt appear to have hash index functionality for text/blobs. (It indexes the first n bytes of the field - this doesnt work very well with URIs, and means I can’t put a UNIQUE constraint on the index.)

Any comments/suggestions for improvement gratefully received

Viewing 1 Comment

    • ^
    • v
    Nope, nobody corrected me. :-) I tend to operate in a vacuum, and so a lot of the time I could just be plain wrong in my assertions.
    In some ways using MySQL would have been the way to go, since it provides all the transaction support, remote APIs, etc, that we had to build into Kowari. On the other hand, we could never get it to load data very quickly, even without extensive indexing. I've also found that joining a table onto itself numerous times starts to make most relational databases slow down (including MySQL). Tracing a transitive predicate is an extreme example of this.
    Basically It would be great to get all the MySQL features for free, but having a datastore that is purpose built for triples (or quads) seems to be the only way to get really high performance on both loads and queries. Depends on how big you need to scale I suppose.
    Out of curiosity, have you considered different models in your system? Are you doing this with different "databases", tables, or something else?
close Reblog this comment
blog comments powered by Disqus

generic acomplia purchase cialis overnight delivery cheap acomplia online buy generic clomid buy cialis low price viagra without prescription where to buy cialis lowest price levitra where to buy propecia cheap cialis from canada lasix no prescription viagra without rx cheap accutane tablets viagra online without prescription viagra no rx buying cialis online zithromax viagra in uk free cialis cialis us where to buy acomplia find cialis online buy viagra lowest price accutane prescription buy cheap accutane online cialis buy buy generic cialis online acomplia order propecia online lowest price synthroid synthroid without a prescription synthroid online buy propecia online cheap levitra online where to buy levitra cialis online review synthroid prices cialis generic cialis buy drug buy viagra on line viagra pharmacy cialis for order price of levitra zithromax online where to buy synthroid soma generic generic clomid propecia online stores viagra cheap drug cheap generic soma cialis cheap zithromax online cheap order accutane online purchase zithromax online purchase viagra online buy cheap clomid cheap generic propecia zithromax pharmacy online pharmacy cialis cheapest acomplia cost of cialis no prescription viagra free viagra purchase lasix online cialis from india viagra from india order discount cialis soma online stores find no rx cialis cialis no rx required find viagra without prescription approved cialis pharmacy lasix discount