Ian Davies has been discussing the complexity of RDF and considering the possibility of an RDF Lite. Danny Ayers has also picked up the thread.

Regular readers of this blog will know that I struggled to teach RDF’s complexity when trying to promote and deploy it at work. That experience prompted me to research and build a simpler model at home in my spare time: tagtriples.

Tagtriples is an attempt to create a structured metadata format and model with properties similar to RDF’s (e.g. it encodes graphs and is trivially mergeable/aggregatable), but much simpler, because any symbol can be an identifier, not just a URI.

This ‘freedom’ of identifier symbols raises the question: without the precise framework of URIs or namespaces to handle identity*, how do you describe or refer to something with any degree of precision?

Well, tagtriples enables precise identification by enforcing a simple rule: if you use a symbol in a ‘graph’ (e.g. in a document), all other occurrences of that symbol in the graph are assumed to denote the same thing. This lets you describe things, and thus facilitates ‘identity by description’, which I think is the key to solving the identity problem on the web in a scalable, multi-context-compatible manner.
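To make the rule concrete, here is a minimal Python sketch. It assumes a graph is just a list of (subject, predicate, object) strings, and it models the rule by pairing each symbol with the id of the graph it appears in, so repeated symbols within a graph collapse to one node while graphs can still be merged by plain union. The names `load` and `describe`, and the graph-id pairing itself, are illustrative assumptions rather than JAM*VAT's actual code.

```python
from collections import defaultdict

# Hypothetical representation: a node is (graph id, symbol), so every
# occurrence of a symbol inside one graph is the same node, while the same
# symbol in another graph is NOT automatically treated as the same thing.
# Predicates are left as plain strings for brevity.
Node = tuple[str, str]
Statement = tuple[Node, str, Node]


def load(graph_id: str, triples: list[tuple[str, str, str]]) -> set[Statement]:
    """Turn plain (subject, predicate, object) symbols into graph-scoped nodes."""
    return {((graph_id, s), p, (graph_id, o)) for s, p, o in triples}


def describe(statements: set[Statement], node: Node) -> dict[str, list[str]]:
    """Collect everything said about one node: its 'identity by description'."""
    description = defaultdict(list)
    for s, p, o in sorted(statements):
        if s == node:
            description[p].append(o[1])
    return dict(description)


doc1 = load("doc1", [("Ora Lassila", "tag", "Person"),
                     ("Ora Lassila", "mbox", "ora@example.com")])
doc2 = load("doc2", [("Ora Lassila", "tag", "Person")])

merged = doc1 | doc2  # merging graphs is a plain set union
print(describe(merged, ("doc1", "Ora Lassila")))
# {'mbox': ['ora@example.com'], 'tag': ['Person']}
```

Keeping the graph id on each node means a merge never silently conflates two graphs' uses of the same text; deciding when they really are the same thing is left to a separate step.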

To take the Dublin Core example, a document’s creator could be described in tagtriples using just his or her name:

http://www.w3.org/Home/Lassila creator "Ora Lassila"

But you could also elaborate on what you mean by “Ora Lassila” by adding more statements:

http://www.w3.org/Home/Lassila creator "Ora Lassila"
"Ora Lassila" tag Person
"Ora Lassila" name "Ora Lassila"[2]
"Ora Lassila" mbox ora@example.com

(I’ve suffixed the name ‘Ora Lassila’ with [2] because we’ve already used the symbol once in the graph to denote the person Ora Lassila, and now we’re using it to denote the name. This is how tagtriples lets multiple things share the same symbol.)
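Below is a minimal Python sketch of how a reader might parse statements like the ones above into triples of symbols. It assumes whitespace-separated symbols, double quotes around symbols that contain spaces, and an optional `[n]` suffix; the token grammar and the `tokenize`/`parse` names are illustrative assumptions, not the tagtriples specification. Each symbol becomes a `(text, index)` pair, so `"Ora Lassila"` and `"Ora Lassila"[2]` are distinct nodes.

```python
import re

# Hypothetical token grammar: a symbol is either a "quoted string" or a run
# of non-space characters, optionally followed by a [n] suffix that
# distinguishes a second, third... thing sharing the same text.
TOKEN = re.compile(r'"([^"]*)"(?:\[(\d+)\])?|(\S+?)(?:\[(\d+)\])?(?=\s|$)')

Symbol = tuple[str, int]  # (text, occurrence index); no suffix means index 1


def tokenize(line: str) -> list[Symbol]:
    symbols = []
    for m in TOKEN.finditer(line):
        quoted_text, quoted_idx, bare_text, bare_idx = m.groups()
        text = quoted_text if quoted_text is not None else bare_text
        idx = quoted_idx or bare_idx
        symbols.append((text, int(idx) if idx else 1))
    return symbols


def parse(doc: str) -> list[tuple[Symbol, Symbol, Symbol]]:
    triples = []
    for lineno, line in enumerate(doc.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        symbols = tokenize(line)
        if len(symbols) != 3:
            raise ValueError(f"line {lineno}: expected 3 symbols, got {len(symbols)}")
        triples.append(tuple(symbols))
    return triples


print(parse('"Ora Lassila" name "Ora Lassila"[2]'))
# [(('Ora Lassila', 1), ('name', 1), ('Ora Lassila', 2))]
```

Treating the suffix as part of the symbol's identity keeps the same-symbol-same-thing rule intact: two occurrences are the same node exactly when both text and index match.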

JAM*VAT (an open-source tagtriples aggregator) can extract tagtriples from any XML using a handful of simple heuristics, so the graphs above could be encoded as:

<document>
  <url>http://www.w3.org/Home/Lassila</url>
  <creator>Ora Lassila</creator>
</document>

...which yields the statements in the first graph (and a few others), or

<document>
  <url>http://www.w3.org/Home/Lassila</url>
  <creator>
     <Person>
         <name>Ora Lassila</name>
         <mbox>ora@example.com</mbox>
     </Person>
  </creator>
</document>

...which yields the second.
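The extraction heuristics themselves are not described here, so the following Python sketch uses a small set of made-up rules (listed in its comments) purely to show how nested XML can map onto triples. It relies only on the standard `xml.etree.ElementTree` module; the rules, the `extract`/`_walk` names, and the extra `tag document` statement are assumptions, not JAM*VAT's behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical heuristics, NOT JAM*VAT's actual rules:
#  1. the root's <url> child names the thing being described;
#  2. the root is tagged with its own element name;
#  3. a leaf <p>text</p> yields (subject, p, text);
#  4. a wrapper <p><Type>...</Type></p> yields a node named by the inner
#     element's <name> text, tagged with Type and described by its children.


def extract(xml_text: str) -> list[tuple[str, str, str]]:
    root = ET.fromstring(xml_text)
    subject = root.findtext("url", default="").strip()  # rule 1
    triples = [(subject, "tag", root.tag)]              # rule 2
    _walk(subject, root, triples, skip={"url"})
    return triples


def _walk(subject, elem, triples, skip=frozenset()):
    for child in elem:
        if child.tag in skip:
            continue
        inner_elems = list(child)
        if not inner_elems:                              # rule 3
            triples.append((subject, child.tag, (child.text or "").strip()))
            continue
        for inner in inner_elems:                        # rule 4
            node = inner.findtext("name", default=inner.tag).strip()
            triples.append((subject, child.tag, node))
            triples.append((node, "tag", inner.tag))
            _walk(node, inner, triples)


DOC1 = """<document>
  <url>http://www.w3.org/Home/Lassila</url>
  <creator>Ora Lassila</creator>
</document>"""

DOC2 = """<document>
  <url>http://www.w3.org/Home/Lassila</url>
  <creator>
     <Person>
         <name>Ora Lassila</name>
         <mbox>ora@example.com</mbox>
     </Person>
  </creator>
</document>"""

for triple in extract(DOC2):
    print(triple)
```

For brevity this sketch does not add the `[2]` suffix for the name, so the person node and its name end up as the same symbol; a fuller version would disambiguate them as the graph above does.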

This is very powerful because both XML documents will yield the same result for the following tagtriples query:

select ?person
where (http://www.w3.org/Home/Lassila creator ?person)

thus neatly solving the problem Ian Davies is having with RDF.
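A single-pattern query like this one is easy to evaluate. The Python sketch below matches one `(s, p, o)` pattern against a list of string triples, treating terms that start with `?` as variables; the `match`/`select` names are hypothetical, and the sketch covers only this one-pattern case, not the full query language. Writing the suffixed name as the string `"Ora Lassila[2]"` is likewise a simplification.

```python
def match(pattern, triples):
    """Yield one variable binding per triple matching an (s, p, o) pattern.
    Terms starting with '?' are variables; anything else must match exactly."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # one variable cannot take two values in a triple
            elif term != value:
                break
        else:
            yield binding


def select(var, pattern, triples):
    """Return the sorted, distinct values bound to `var`."""
    return sorted({binding[var] for binding in match(pattern, triples)})


LASSILA = "http://www.w3.org/Home/Lassila"
doc1 = [(LASSILA, "creator", "Ora Lassila")]
doc2 = doc1 + [
    ("Ora Lassila", "tag", "Person"),
    ("Ora Lassila", "name", "Ora Lassila[2]"),  # suffix kept inside the string
    ("Ora Lassila", "mbox", "ora@example.com"),
]

query = (LASSILA, "creator", "?person")
print(select("?person", query, doc1))  # ['Ora Lassila']
print(select("?person", query, doc2))  # ['Ora Lassila']
```

The extra descriptive statements in the second graph hang off the same `Ora Lassila` symbol, so they refine what `?person` means without changing which symbol the query returns.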

* Actually I’ve found that forcing a high level of precision onto metadata producers has its own problems - identity is tightly bound up with context, and as the context varies, so does the identity. But that’s another post entirely.

7 Comments

    This description capability even allows JAM*VAT to import RDF with no loss of information (i.e. you could export it back into RDF if required). It does this by tagging each human-readable symbol with the URI used in the RDF.

    For example, the URI <http://phildawes.net/phil> gets translated into the symbol 'phil' plus the statement:


    phil tag http://phildawes.net/phil


    (This tags the symbol 'phil' with the URI). The JAM*VAT aggregator uses the 'tag' property to help manage identity between graphs. 'Tag' in JAM*VAT fulfils a similar role to a tag in del.icio.us - i.e. another symbol that can be used to categorise the target.
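    A rough Python sketch of that round trip follows. The naming rule in `symbol_for` (fragment, else last path segment, else host) is an assumption chosen because it reproduces the 'phil' example; clashes get an `[n]` suffix borrowing the convention from the post. The function names and the `example.org` URI in the usage line are hypothetical, and every RDF term is treated as a URI for simplicity.

```python
from urllib.parse import urlparse


def symbol_for(uri: str) -> str:
    """Hypothetical naming rule: fragment, else last path segment, else host."""
    parsed = urlparse(uri)
    if parsed.fragment:
        return parsed.fragment
    segments = [seg for seg in parsed.path.split("/") if seg]
    return segments[-1] if segments else parsed.netloc


def import_rdf(rdf_triples):
    """Replace URIs with readable symbols and record each URI with a 'tag' statement."""
    symbol_of = {}  # URI -> symbol
    uses = {}       # base symbol -> number of distinct URIs mapped to it

    def sym(uri):
        if uri not in symbol_of:
            base = symbol_for(uri)
            uses[base] = uses.get(base, 0) + 1
            # A clash gets an [n] suffix, as with "Ora Lassila"[2] in the post.
            symbol_of[uri] = base if uses[base] == 1 else f"{base}[{uses[base]}]"
        return symbol_of[uri]

    triples = [(sym(s), sym(p), sym(o)) for s, p, o in rdf_triples]
    return triples + [(symbol, "tag", uri) for uri, symbol in symbol_of.items()]


def export_rdf(tag_triples):
    """Map symbols back to URIs via their 'tag' statements.
    Assumes every 'tag' statement was produced by import_rdf."""
    uri_of = {s: o for s, p, o in tag_triples if p == "tag"}
    return [(uri_of[s], uri_of[p], uri_of[o])
            for s, p, o in tag_triples if p != "tag"]


rdf = [("http://phildawes.net/phil",
        "http://xmlns.com/foaf/0.1/knows",
        "http://example.org/people/phil")]  # hypothetical second 'phil'
for triple in import_rdf(rdf):
    print(triple)
```

    Because each symbol keeps its URI in a `tag` statement, exporting recovers the original triples exactly, which is the "no loss of information" property described above.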

    URIs are not appropriate for the Semantic Web; Tim Berners-Lee wrote that himself back in 2001:

    http://www.w3.org/DesignIssues/HTTP-URI.html

    Section 2.5, "Extra info with URI":

    "Effectively, the URI scheme has now failed to identify anything by itself."

    This issue and others have not been addressed since, although context was forced into RDF store implementations and its semantics patched onto RDF/XML with Named Graphs.

    As a data model for the Semantic Web, the RDF triple is just broken: it is too simple.

    Have a look at:

    http://laurentszyster.be/blog/public-names/

    and

    http://laurentszyster.be/blog/public-rdf/

    then tell me what you think.

    Hi Laurent,

    I've read your pages, but I'm afraid I'm not sure if I understand what you are getting at.

    Am I correct in thinking that you want to replace a URI with a string of associated words, and that systems then disambiguate the meaning via the connected set of words?

    e.g. 5:apple,8:computer ?

    Also, why bother with the netstring numbers? - for the sake of simplicity, why not (apple,computer) or something?

    "Also, why bother with the netstring numbers? - for the sake of simplicity, why not (apple,computer) or something?"

    Because articulated text is not made of a sequence of byte strings but, as you understood, of *sets* of text. For instance, the sentence:

    Steve Jobs is the creator of the Apple Computer

    may be articulated (considering "the", "is" and "of" as articulators, and with CRLF added for readability) as the Public Name:

    15:
    5:Steve,
    4:Jobs,
    ,
    19:
    5:Apple,
    8:Computer,
    ,
    7:creator,

    This sorted sequence of netstrings effectively represents the three sets of text:

    ((Steve, Jobs) creator (Apple, Computer))

    and actually preserves the semantics between those sets expressed by the original text articulation.

    Public Names have many other interesting properties: they can be used as URIs, can be used to build fast indexes, and can encode any 8-bit byte string.
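    The encoding above can be reproduced with a few lines of Python. A netstring is D. J. Bernstein's `<decimal length>:<bytes>,` format; a set of words is encoded here as a netstring wrapping its members' netstrings. The `public_name` function is a hypothetical sketch: it assumes only the top-level netstrings are sorted bytewise while inner sets keep the order given, which matches the example but may not be the full Public Names rule.

```python
def netstring(data: bytes) -> bytes:
    # A netstring is "<decimal length>:<bytes>," (D. J. Bernstein's format).
    return str(len(data)).encode("ascii") + b":" + data + b","


def encode(item) -> bytes:
    """Encode a word (str) as one netstring, or a set of words (list) as a
    netstring wrapping the concatenated netstrings of its members."""
    if isinstance(item, str):
        return netstring(item.encode("utf-8"))
    return netstring(b"".join(encode(member) for member in item))


def public_name(articulation) -> bytes:
    # Assumption: only the top-level netstrings are sorted bytewise; the
    # inner sets keep the order given, which reproduces the example above.
    return b"".join(sorted(encode(item) for item in articulation))


name = public_name([["Steve", "Jobs"], ["Apple", "Computer"], "creator"])
print(name.decode())  # 15:5:Steve,4:Jobs,,19:5:Apple,8:Computer,,7:creator,
```

    The length prefixes are what let a set nest inside another set unambiguously: a parser always knows how many bytes to read, so no escaping of commas or parentheses is ever needed, which a plain `(apple,computer)` notation would require.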

    Perhaps an appropriate, concrete subset of the SW and open systems should be outlined. A closed system (one that controls the production of identifiers, by URI or any alternative means, as well as content) which does *not* automatically apply entailment rules (not even the 'simple' entailment rules defined in RDF-MT) could be such a subset. Then, finally, simplify the model to consist only of the following parts:

    Graph
    Context (or collection of statements)
    Statement
    Identifiers (URIs)
    Literals

    Voilà, you have RDF-Lite (clearly distinguished from the SW). The syntax for representing such a subset would also be much simpler, but that is an issue orthogonal to the underlying model.
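    As a rough illustration of how small such a model is, here is a Python sketch of the five parts listed above using dataclasses. The class names and the `add`/`find` methods are hypothetical, and `http://example.org/doc1` is a made-up context name. The point it shows is that lookup returns only explicitly asserted statements: there is no entailment step.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Statement:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


@dataclass
class Context:
    """A named collection of statements, e.g. everything from one source."""
    name: URI
    statements: set[Statement] = field(default_factory=set)


@dataclass
class Graph:
    contexts: dict[URI, Context] = field(default_factory=dict)

    def add(self, context: URI, statement: Statement) -> None:
        self.contexts.setdefault(context, Context(context)).statements.add(statement)

    def find(self, subject: Optional[URI] = None, predicate: Optional[URI] = None,
             obj: Optional[Union[URI, Literal]] = None) -> Iterator[tuple]:
        """Yield (context name, statement) pairs matching the pattern.
        No entailment: only explicitly asserted statements are returned."""
        for context in self.contexts.values():
            for st in context.statements:
                if ((subject is None or st.subject == subject)
                        and (predicate is None or st.predicate == predicate)
                        and (obj is None or st.obj == obj)):
                    yield context.name, st


g = Graph()
doc = URI("http://example.org/doc1")  # hypothetical context name
lassila = URI("http://www.w3.org/Home/Lassila")
creator = URI("http://purl.org/dc/elements/1.1/creator")
g.add(doc, Statement(lassila, creator, Literal("Ora Lassila")))
print(list(g.find(predicate=creator)))
```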

    I’m thinking about RDF and Topic Maps. See also

    When thinking about the general direction my (rather slow moving) Semantic Tagging stuff should head into, it became obvious pretty quickly that moving from facet-value-pairs to RDF-like subject-property-object-triples is the direction of choice. Vapou…