
Archive for the 'tagtriples' Category

I’m trying to work out if it’s possible to get the index searching performance I want using a disk-backed store, or whether I need to focus on optimising the indexes to fit totally in memory. The problem is that the optimisation strategies are somewhat different:
Storing indexes on disk:
- Increase redundancy, trading space for better locality […]

Read Full Post »

The new triplestore is coming along. It can do substring text searches (using a suffix array) and has a basic relational query engine. It doesn’t optimise the query plans yet, but if you enter the queries in a good order (most selective clauses first) then you get good performance.
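As a rough illustration of substring search over a suffix array, here is a minimal Python sketch. It is not the triplestore's actual code: the function names are hypothetical, and the naive construction is only suitable for small inputs.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted suffix order."""
    # Naive construction (sorts full suffix copies); fine for a sketch only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_substring(text, sa, pattern):
    """Return the sorted offsets at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
matches = find_substring(text, sa, "ana")
```

Because all suffixes sharing a prefix sit next to each other in the sorted array, each search is two binary searches rather than a scan of the whole text.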
A few things have changed in my […]

Read Full Post »

I wrote a bit about representing structured data in the last post. Here are some ideas for how I plan to index the data.
Indexing graphs as subject ranges
In indexing triples I need to provide indexed lookups for all 6 of the possible triple query patterns:
s->po
sp->o
p->os
po->s
o->sp
os->p
(s=subject p=property/predicate o=object)
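One way to see how the six patterns can be served is the Python sketch below. It is an assumption for illustration, not necessarily the layout the post goes on to describe: it keeps three sorted orderings (spo, pos, osp), and each ordering answers two of the patterns by prefix lookup. The class and method names are hypothetical.

```python
from bisect import bisect_left

# Positions of (s, p, o) in key order for each sorted ordering.
ORDERINGS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleIndex:
    def __init__(self, triples):
        # One sorted list of re-ordered tuples per ordering.
        self.indexes = {
            name: sorted(tuple(t[i] for i in order) for t in set(triples))
            for name, order in ORDERINGS.items()
        }

    def lookup(self, ordering, *prefix):
        """Return the unbound remainder of every entry that starts with prefix."""
        rows = self.indexes[ordering]
        # A shorter tuple sorts before any longer tuple it prefixes,
        # so this lands on the first matching row (if there is one).
        start = bisect_left(rows, prefix)
        out = []
        for row in rows[start:]:
            if row[:len(prefix)] != prefix:
                break
            out.append(row[len(prefix):])
        return out


idx = TripleIndex([
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "knows", "carol"),
])
s_to_po = idx.lookup("spo", "alice")           # s->po
sp_to_o = idx.lookup("spo", "alice", "knows")  # sp->o
p_to_os = idx.lookup("pos", "knows")           # p->os
os_to_p = idx.lookup("osp", "bob", "alice")    # os->p
```

The po->s and o->sp patterns work the same way, using a two-element prefix on "pos" and a one-element prefix on "osp" respectively.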
Most mature triplestores also index a 4th query element […]

Read Full Post »

Now that I’m up and running and starting to get productive on Gambit-C, I’ve turned my attention back to indexing structured data.
I’ve modified the tagtriples idea a bit to reflect my experience of importing data in other formats. I still think the most effective approach is not to try to define an interchange format, but […]

Read Full Post »

I tried to comment on Seth’s post, but I think the comments on his blog are a bit broken at the moment (the captcha question wasn’t rendering, so I couldn’t answer it!). I guess I’ll trackback instead:
The path from specificity to usefulness that Seth describes was exactly the trip I took attempting to implement semantic […]

Read Full Post »

I haven’t said anything much about semantic web stuff for a while as I’ve been occupied with other things. However, Jim Hendler’s ‘Tales from the Dark Side’ piece in IEEE Intelligent Systems reawoke an old interest. In short: I still think the RDF people have got it wrong with URIs, and so far nobody’s convinced […]

Read Full Post »

I couldn’t find a way to comment on Benjamin’s post, so I’ve stuck it here:
What indexing are you using? My tagtriples store schema is basically a table with 4 ids which joins to an (ID,String) table. When it used to be an RDF store this held both literals and URIs.
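The schema described above might look roughly like the following sketch, written in Python with SQLite for brevity. The table and column names are hypothetical, and the meaning of the four ids (assumed here to be graph, subject, property and object) is a guess that the excerpt does not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- (ID, String) table: every distinct string is stored once.
    CREATE TABLE symbols (
        id    INTEGER PRIMARY KEY,
        value TEXT UNIQUE NOT NULL
    );
    -- Statement table of four ids (column meanings are an assumption).
    CREATE TABLE triples (
        graph_id    INTEGER NOT NULL REFERENCES symbols(id),
        subject_id  INTEGER NOT NULL REFERENCES symbols(id),
        property_id INTEGER NOT NULL REFERENCES symbols(id),
        object_id   INTEGER NOT NULL REFERENCES symbols(id)
    );
""")


def intern(conn, value):
    """Return the id for value, inserting it into symbols if it is new."""
    conn.execute("INSERT OR IGNORE INTO symbols(value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM symbols WHERE value = ?", (value,))
    return row.fetchone()[0]


def add_triple(conn, graph, s, p, o):
    """Store one statement as four symbol ids."""
    ids = tuple(intern(conn, v) for v in (graph, s, p, o))
    conn.execute("INSERT INTO triples VALUES (?, ?, ?, ?)", ids)


def objects_of(conn, s, p):
    """Return the object strings for a given subject and property."""
    rows = conn.execute(
        """
        SELECT o.value
        FROM triples t
        JOIN symbols s ON s.id = t.subject_id
        JOIN symbols p ON p.id = t.property_id
        JOIN symbols o ON o.id = t.object_id
        WHERE s.value = ? AND p.value = ?
        ORDER BY o.value
        """,
        (s, p),
    )
    return [r[0] for r in rows]


add_triple(conn, "g1", "alice", "knows", "bob")
add_triple(conn, "g1", "alice", "knows", "carol")
known = objects_of(conn, "alice", "knows")
```

Storing each string once keeps the statement table narrow, at the cost of a join back to the string table for every value that has to be displayed or matched.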
I found the key to getting […]

Read Full Post »

microqueries

I’ve recently been experimenting with ways to provide simpler structured searching/querying to ‘normal’ web users (i.e. not techies). SPARQL/SQL querying doesn’t cut it here - we need something simpler.
One approach I’ve been trying is allowing simple query constraints in with the text search facility. Using the proximity searching capability JAM*VAT then finds a collection of […]

Read Full Post »

Using a relational database as a triplestore backend has a number of advantages - one of which is leveraging features of the backend SQL support with very little effort.
I’ve recently added a whole bunch of functionality to ttql (the experimental query language that JAM*VAT uses for querying). These include:
SQL (MySQL) numeric and string functions […]

Read Full Post »

Danny! Here’s another

Read Full Post »

It’s just occurred to me that I never posted about the proximity search capability that I built into JAM*VAT about 3 months ago.
It works by looking for symbols in close proximity. E.g. searching for ‘Danny Ayers Blog’ yields an answer ‘raw’, even though the word isn’t in the search string. This is because the ‘raw’ […]

Read Full Post »

JAM*VAT is now mature enough that it handles relational operations over large amounts of aggregated structured data quickly and scalably, and also provides very fast regex text search operations (due to its inbuilt suffix array implementation).
However, one area where it doesn’t perform very well is in handling dates and numbers. E.g. if you aggregated 10000 […]

Read Full Post »

I’m quite excited about this release - it includes new POST functionality that accepts HTTP-POSTed content interpreted via mimetype. The upshot of which is that people can cut-n-paste xml chunks into JAM*VAT (which is a compelling way to demonstrate the technology).
You can try it via the online demo - click on the ‘Post Data’ link […]

Read Full Post »

With all the buzz around the possibility of an ‘RDF-Lite’, I feel compelled to list a few barriers that I think URIs raise for a new user trying to get to grips with RDF metadata creation.
Here they are, in no particular order:
(1) URIs don’t allow you to use existing identity schemes.
Apart from existing web […]

Read Full Post »

In the last post I mentioned importing XML into the JAM*VAT tagtriples store. One of JAM*VAT’s main features is that it can translate *any* XML into a tagtriples representation using some simple heuristics. I thought I’d better elaborate on this, especially as I haven’t documented the heuristics anywhere.
Before starting, I ought to point out that […]

Read Full Post »

Ian Davies has been discussing the complexity of RDF and considering the possibility of an RDF Lite. Danny Ayers also picked it up here.
Readers of this blog will already know that I struggled with teaching RDF’s complexity when attempting to promote and deploy it at work. This prompted my research and creation of a simpler […]

Read Full Post »

I’ve just put up v0.7.5 of JAM*VAT. This improves the RDF uri-to-symbol heuristics and adds some stuff that we needed at work - mainly arithmetic comparisons in structured queries. Have also put the new release on the demo site.

Read Full Post »

I hardly got any response to the launch of my JAM*VAT structured aggregator tool. Either that means that nobody’s got a use for it, or they just don’t understand what it does. I’m hoping it’s the latter, so I thought I’d post some things to try with the demo installation. Here’s a first stab:
Getting a […]

Read Full Post »

Added some more to the tagtriples site. The main change is a model and semantics page.
Please let me know if there are any glaring holes or mistakes!

Read Full Post »

This is the first BETA release of JAM*VAT - my open-source structured data aggregator software. With it you can import information from various structured-data formats including xml, rss, atom, rdf and csv, and then browse and query across the aggregated data.
There’s a demo of the software here (imports capped at 5000 triples). Try importing […]

Read Full Post »
