can we remove the need for ordered collections?

The tagtriples thing has also let me experiment with ways to get around some of the not-so-beautiful areas of the RDF spec. One of these areas is ordered collections.

It occurred to me (well, actually it occurred to Julian Bond and he mentioned it to me a while ago in a conversation about RSS after foaf-galway, but it re-occurred to me the other day) that the requirement for a hacky ordered collection construct could be reduced if the order of asserted statements in graphs were maintained (as it is in XML documents). I'm thinking of adding a column to the triples table so that query results can be sorted in the order they were asserted.
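A rough sketch of what that extra column might look like, using sqlite3 purely for illustration. The table layout, the column names and the example data are all made up for the sketch and aren't taken from the real store:

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        seq       INTEGER PRIMARY KEY AUTOINCREMENT,  -- assertion order
        graph     TEXT NOT NULL,
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    )
    """
)

def assert_triple(graph, s, p, o):
    """Store a triple; seq records the order in which it was asserted."""
    conn.execute(
        "INSERT INTO triples (graph, subject, predicate, object) "
        "VALUES (?, ?, ?, ?)",
        (graph, s, p, o),
    )

def objects_in_order(graph, s, p):
    """Return the objects of matching triples in assertion order."""
    rows = conn.execute(
        "SELECT object FROM triples "
        "WHERE graph = ? AND subject = ? AND predicate = ? "
        "ORDER BY seq",
        (graph, s, p),
    )
    return [row[0] for row in rows]

# Made-up example data: three items asserted in a deliberate order.
for item in ["item3", "item1", "item2"]:
    assert_triple("feed", "myfeed", "hasItem", item)

print(objects_in_order("feed", "myfeed", "hasItem"))
```

The items come back in the order they were asserted, so no separate list construct is needed to hold the ordering.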

Disambiguating tags in tagtriples

The tag triples idea is going quite well. Have built a simple prototype to test the idea, and have tried populating it with data from work. It's interesting working with a sort of triples-soup rather than well structured RDF. Things link when you don't expect them to (because the tags match) - this is especially handy with literals. (e.g. 'Phil Dawes' literal matches other mentions of my name because everything is a tag in tag-triples)

The big problem is when you have different things with the same tags in localised context and you want to be precise about which one you mean. AFAICS this doesn't happen often when you author the tags yourself (because you choose to make them unambiguous), but it happened today when I exported tag-triples from an SQL database. One such triple was:

BondTrader prodEnvironment BondTrader

(the subject being BondTrader the application, the object being BondTrader the production unix environment).

This is obviously a bit of a problem. Exporting from a database is also a common use case at work, and so I'd like to get this sorted.

Possible Solutions:

1) Leave the system as it is, and force people to disambiguate their tags in the context of a single graph. e.g. BondTraderApplication prodEnvironment BondTraderProdEnvironment

2) Add some sort of namespacing thing

Application:BondTrader prodEnvironment Environment:BondTrader

3) Make more use of context tags

BondTrader prodEnvironment BondTrader (environment)

(where the 2nd BondTrader is described in a document with a tag 'environment')

4) Do something logic based with the property

prodEnvironment range Environment
prodEnvironment domain Application
Environment distinctFrom Application

5) Separate the notion of resources from tags, and define a resource as being identified by n tags.

(application BondTrader) prodEnvironment (environment BondTrader)


Analysis:

1: doesn't solve the problem, although it keeps the model and implementation simple. Would need to change data in database, or have some export mapping.

2: Namespacing is interesting, although it increases complexity. Take namespaces to their natural conclusion and you pretty much end up with RDF. Also you don't get serendipity in the data, as people have to know the namespace to match the tag. Could enable and disable namespace disambiguation in the tool to get serendipity back. hmmm...

3: This is also interesting. It would mean removing the ability to use ids to match tags internally, since you'd have multiple 'senses' of a tag.

4: This is just too complicated. It might be possible to reduce the complexity a bit, but I can't see people understanding this immediately (which is sort of the point)

5: This is really interesting, if a bit computationally expensive. Also some tags are more important to the resource than others - e.g. BondTrader is more important than Environment. (What happens when you want to describe the type "Environment"? You'd have to use '(environment type)' or something.) Can't help thinking this is just an extension of describing the thing.
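To make option 5 a bit more concrete, here's a minimal sketch in which each resource is just a set of tags and a lookup returns every resource containing the tags you give it. The names RESOURCES and resolve are invented for the sketch, and lower-casing the tags is my own assumption:

```python
# Each resource is identified by a set of tags (option 5).
# Illustrative data only: the two senses of BondTrader.
RESOURCES = [
    frozenset({"application", "bondtrader"}),
    frozenset({"environment", "bondtrader"}),
]

def resolve(*tags):
    """Return every known resource whose tag set contains all given tags."""
    wanted = {t.lower() for t in tags}
    # Linear scan over all resources: simple, but this is where the
    # computational expense would show up in a large store.
    return [r for r in RESOURCES if wanted <= r]

print(len(resolve("BondTrader")))                 # ambiguous: both senses match
print(len(resolve("environment", "BondTrader")))  # one extra tag pins it down
```

One tag alone matches both senses, while adding a second tag narrows it to one. The sketch treats all tags as equal, so it doesn't address the point that some tags matter more to a resource than others.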

TagTriples software

Made some software to explore the tag-triples idea a bit further. It doesn't attempt to do anything with context yet (although it does handle 'graphs'). It does illustrate the 'ease of metadata creation' that I want.

I've installed a copy here for demo purposes (this is a temporary URL and may disappear at some point). BTW, I've noticed that the python executable on the hosted server keeps crashing - not sure why. If you get a 500 just refresh.

I've also uploaded a release of the software at sourceforge. I don't expect anybody to want it, but it's good discipline for me to make things releasable.

Tags and Triples

Ok, I think I've refined the 'tags for structured-metadata' idea to the point that I'm ready to start a prototype web-based store. Here's the basic gist:

The syntax

Basic syntax is s,p,o triples of tags:

PhilDawes a person
PhilDawes age 23
PhilDawes worksFor drkw

To elaborate on the meaning of a term, you can add stuff in brackets after the term.

b2421 a DellLaptop (laptop computer)
b2421 price 1500.24 (ukpounds)

If a term needs to contain spaces (e.g. it's some text or something), you can put it in quotes.

PhilDawes fullname "Philip Leslie Arthur Dawes"

The model

Same as above, but stuff in brackets is shortcut for more triples.

b2421 a DellLaptop
DellLaptop _tag laptop
DellLaptop _tag computer
b2421 price 1500.24 
1500.24 _tag ukpounds
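
The bracket expansion could be sketched roughly like this. It's a toy parser written for illustration, not the prototype's code: the function name expand and the error handling are my own assumptions, and it also handles the quoted terms from the syntax section:

```python
import re

# A token is a double-quoted string, a bracketed group of tags, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|\(([^)]*)\)|(\S+)')

def expand(line):
    """Expand one line of the shorthand syntax into plain (s, p, o) triples."""
    terms = []  # the subject, predicate and object, in order
    extra = []  # _tag triples generated from bracketed groups
    for m in TOKEN.finditer(line):
        quoted, group, word = m.groups()
        if group is not None:
            if not terms:
                raise ValueError("bracketed tags must follow a term")
            # Each tag in the brackets describes the term just before it.
            for tag in group.split():
                extra.append((terms[-1], "_tag", tag))
        else:
            terms.append(quoted if quoted is not None else word)
    if len(terms) != 3:
        raise ValueError("expected exactly three terms: %r" % line)
    return [tuple(terms)] + extra

for triple in expand("b2421 a DellLaptop (laptop computer)"):
    print(triple)
```

Running it on the first example line gives the main triple followed by one _tag triple per bracketed tag.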

Triples are scoped in named graphs. Named graphs are super-important in this model because of the requirement to be able to disambiguate through use of tags.

Notes

  • I'm not currently distinguishing between literals and resources. Will think about this more when I hit problems.
  • I haven't decided whether to include blank-node functionality
  • Tagging numbers could be problematic - it is common to want to use the same number in different contexts within a graph. If this pattern is unique to numbers then the store could just store each number separately (different internal id).
  • Am toying with the idea of making order implicit (like it is in xml). This would remove the need for cumbersome ordered-collection constructs that plague rdf, but at the expense of implementation complexity.

cut barbers

Popped in to cutbarbers at euston station on my way to work. I can't recommend this place enough: Cut costs 8 quid (you pay before by buying a ticket), they have a dynamic web page containing the waiting times, and there's even some crazy hoover technology shit to remove the cut hair from your head so you don't get an itchy collar.

folksonomy and *structured* metadata?

WARNING: this is a collection of ill thought-out ideas!

With all the talk of tags and folksonomies, I've started wondering whether it might be possible to pull a similar trick with structured metadata - i.e. reducing the precision and accuracy of the metadata in return for increasing the simplicity of metadata creation.

I suspect this idea would cause more problems than it would solve, but exploring it is interesting. For example, you could use the same s,p,o triples model as RDF, but with words(tags) instead of URIs.

E.g.

phildawes isa person
phildawes age 29
phildawes worksfor drkw
phildawes fullname "Phil Dawes"
phildawes plays frenchhorn
frenchhorn isa musicalinstrument
etc.

Of course we'd have to capture and rely a lot more on context information for ultimately differentiating between two different 'phildawes's etc., but the payoff for this increased ambiguity would be simpler creation of metadata and potentially a lot more of it.

At work our most important rdf use case is having an aggregated store of linked data that can be text-searched and navigated. Since the results are always parsed by humans, this tag approach would probably work quite well here.

<20 minutes later>

Ok, I've refined the idea a bit. Instead of a single tag for each resource, we could use a collection of tags to disambiguate it a bit. The collection of tags could be in brackets or something:

(d156126 dell desktop pc) price (749.99 uk pounds)
d156126 isa (desktoppc pc personalcomputer)
d156126 ram (512 megabytes)
d156126 harddrive (40 gigabytes)

(The convention in the above is that the first tag in the brackets is the one you will use to identify the resource in other statements.)

I suppose you could use a pattern-matching query language similar to sparql to search through the metadata, albeit with a lot less precision. Maybe

select ?a where
?a isa desktoppc
?a price ?b
?a ram ?c
AND ?b < 900
AND pound in ?b

(assuming some stemming for pound)

Hmmmm... Reducing the accuracy of the results would probably allow for a lot looser query syntax. I suppose if you take this to its natural conclusion you get a text search, but this is obviously a little more structured than that.
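A minimal sketch of that kind of pattern matching over a list of tag triples. Several things here are assumptions for the sake of the sketch: value tags are stored as separate '_tag' triples, the second machine (d200000) is invented test data, and a crude prefix test stands in for stemming:

```python
def match(patterns, triples, bindings=None):
    """Yield every variable binding that satisfies all the patterns.

    A pattern is an (s, p, o) tuple; terms starting with '?' are variables.
    """
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(bindings)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, new)

# Illustrative data; d200000 is made up to show a non-matching machine.
TRIPLES = [
    ("d156126", "isa", "desktoppc"),
    ("d156126", "price", "749.99"),
    ("749.99", "_tag", "uk"),
    ("749.99", "_tag", "pounds"),
    ("d156126", "ram", "512"),
    ("d200000", "isa", "desktoppc"),
    ("d200000", "price", "1200.00"),
    ("1200.00", "_tag", "uk"),
    ("1200.00", "_tag", "pounds"),
]

PATTERNS = [
    ("?a", "isa", "desktoppc"),
    ("?a", "price", "?b"),
    ("?b", "_tag", "?unit"),
]

# Desktop PCs priced under 900 whose price carries a 'pound'-like tag.
results = sorted(
    {
        b["?a"]
        for b in match(PATTERNS, TRIPLES)
        if float(b["?b"]) < 900 and b["?unit"].startswith("pound")
    }
)
print(results)
```

The matcher is a naive nested scan, which is fine for a toy but would need indexing in a real store.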

It's difficult for me to tell if this is worth pursuing further or not. Will sleep on it and think more tomorrow.

jythonservlet with tweaks

Fixed a bug in my tweaked jython servlet. From the comments:

 *
 * This is a copy of the PyServlet that comes with jython, with the
 * following enhancements (written by Phil Dawes - pdawes@users.sf.net):
 *  
 * - It will try appending .py to the URL - e.g. /foo/bah resolves to
 * the jythonservlet at /foo/bah.py
 *
 * - It can take a 'defaultservlet' init parameter to specify a
 * jythonservlet to invoke if it can't map the URL to a regular
 * servlet
 *
 * - It adds the directory of the jythonservlet to sys.path, to allow
 * nested servlets to import modules in their nested directory

Get it from here

wordpress files deleted again

Doh! Logged in to find that a load of files in my wordpress installation had been deleted. Luckily I've re-installed wordpress before so it was a simple process.

Not exactly sure how the files were deleted this time - last time I did the install I made sure that they were all only user-writable. hmmm...

Blog category tags too cumbersome

For me, tagging posts with categories is too cumbersome in wordpress (and probably in other blogging platforms). Ideally adding category tags would be an ad hoc thing you type after you've written your post, rather than something you have to 'set up'.

Time to deprecate RDF/XML?

It seems to me that the thing holding back widespread RDF adoption isn't RDF itself, but rather RDF/XML.

Personally I think RDF/XML fails as a human oriented syntax - it's just too complicated for the layman to bother with.

What's more, promoting RDF/XML as the default serialisation of RDF positions RDF as a direct competitor to vanilla XML for data interchange. History is showing that this is a competition rdf/xml cannot win, and generates lots of heat and bad publicity.

Before I go on, I ought to clarify that I don't necessarily think that RDF/XML is a bad implementation of a human-oriented XML profile of RDF, but rather that (in retrospect) the idea itself is bad. Any attempt to do this is doomed to failure.

The problem is that there is a fundamental mismatch between the RDF model (graphs/triples), and the XML model (ordered tree of elements with attributes). This mismatch makes RDF/XML painfully hard to write by hand, requiring an in-depth understanding of both the rdf model and of the complicated serialization syntax nuances required to make graphs fit into a tree model.

Vanilla XML on the other hand looks like the model it represents, and that makes it simpler for people to write by hand.

So, what to do?

I'm sort of proposing the following: (but could probably be easily persuaded otherwise by good arguments)

  • Deprecate RDF/XML as the default serialisation of RDF. Make it clear that it is tricky to write by hand (i.e. by putting this note in the W3C literature), and that if people want human-oriented xml interchange, they should use xml.
  • Develop tools to make it easy to specify a mapping between an xml dialect and RDF triples. For important web xml protocols (atom, rss2) specify some default xml to rdf triple mappings.
  • Promote turtle/n3 as the default human-oriented syntax
  • Reposition RDF as an information integration and knowledge management technology. It really excels at this, more so than any competing technologies (IMHO).
  • Promote one of the other triple-based xml serialisations for embedding rdf directly in XML documents. (e.g. trix or rxr)

N.B. I don't expect RDF/XML to disappear anytime soon, and that's good since it is the most widely implemented serialization of RDF in libraries, making it ideal for RDF interchange between applications. It's just that I think promoting it to humans is a lost cause, and my concern is that RDF/XML could bring the whole RDF stack down with it.

Does anybody agree?