How to disambiguate tag senses!
Mar 3rd, 2005 by Phil Dawes
Had some conversations about tag disambiguation with people at work and in the pub yesterday (that’s right folks - I’m lots of fun to go drinking with). Stu reminded me that in the del.icio.us tagging folksonomy world, people are able to choose meaning by social convention - i.e. they see which tags people are using for what sort of things, and then choose an appropriate one for their needs (maybe to get the biggest audience).
The problem with this approach for me was that I’m intending to use the tagtriples system for aggregating metadata from web-based sources (including data exported from databases etc.), so the author/generator isn’t involved in a feedback process and isn’t likely to change their tags to fit some convention.
It wasn’t until feeling poorly on the train home today that it occurred to me that there is somebody involved in the process - i.e. the person aggregating the data. There are also the people who need to use that data. So maybe a social mechanism for disambiguation is a possibility after all!
Ok, here’s where my thoughts are going at the moment:
Example: Say the aggregator ends up with the following statements from different sources:
‘Phil Dawes’ email pdawes@users.sf.net
and:
‘Phil Dawes’ email foo@bah.com
Somebody using the aggregator (maybe the agg-keeper) notices that these people are different (and not just the same person with two email addresses), and inserts a ‘disambiguation’ graph:
‘Phil Dawes’[1] email pdawes@users.sf.net
‘Phil Dawes’[2] email foo@bah.com
This basically says ‘the Phil Dawes with email pdawes@users.sf.net is different to Phil Dawes with email foo@bah.com’. The disambiguation graph is intended to contain statements that are true of one disambiguee, but not of the other (and vice versa).
Internally, the aggregator assigns them two different ids, and then backs out the existing statements in graphs containing the original ‘Phil Dawes’ tag. It then attempts to re-apply the statements to the new tag ‘senses’ by matching statements (i.e. smushing). E.g. graph ‘a’ contains a pdawes@users.sf.net statement about ‘Phil Dawes’, and so its ‘Phil Dawes’ matches the ‘Phil Dawes’[1] sense.
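A minimal sketch of that re-application step might look like the following. Everything here is an assumption for illustration: the graph names, the tuple representation of statements, and the helper names `resolve_sense` and `reapply` are hypothetical, not part of the tagtriples system.

```python
# Source graphs, each a set of (subject, property, value) statements.
# Graph names and layout are hypothetical.
graphs = {
    "a": {("Phil Dawes", "email", "pdawes@users.sf.net")},
    "b": {("Phil Dawes", "email", "foo@bah.com")},
    "c": {("MyApp1", "managed by", "Phil Dawes")},
}

# The disambiguation graph: sense id -> statements true of that sense only.
senses = {
    1: {("email", "pdawes@users.sf.net")},
    2: {("email", "foo@bah.com")},
}


def resolve_sense(graph, tag, senses):
    """Return the single sense whose distinguishing statements appear in
    this graph, or None if no sense (or more than one) matches."""
    about_tag = {(p, v) for (s, p, v) in graph if s == tag}
    matches = [sid for sid, stmts in senses.items() if stmts & about_tag]
    return matches[0] if len(matches) == 1 else None


def reapply(graphs, tag, senses):
    """Map each graph name to the sense its use of the tag resolves to."""
    return {name: resolve_sense(g, tag, senses) for name, g in graphs.items()}


assignment = reapply(graphs, "Phil Dawes", senses)
```

Graphs ‘a’ and ‘b’ resolve to senses 1 and 2, while graph ‘c’ carries no distinguishing statement and stays unresolved (`None`).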
So what about graphs not containing enough information to match the senses? E.g. the one-statement graph:
MyApp1 managed by Phil Dawes
My current thinking is that this should match both senses or none at the discretion of the queryer. This is easy when browsing - it just becomes a UI issue. For structured queries (a la sparql) I’m thinking maybe insertion of an extra ‘?’ or something indicates that you want a lexical tag match rather than a semantic one.
E.g. the query:
select ?person, ?email
where MyApp1 managed by ?person
?person email ?email
Would match nothing, but the query:
select ??person, ?email
where MyApp1 managed by ??person
??person email ?email
should return both people (with senses indicated so you don’t think they’re the same person with 2 email addresses), and an indication that some results are likely to be false.
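The difference between the two queries can be sketched roughly as below. This is only an illustration under my own assumptions: tags are modelled as hypothetical `(text, sense)` pairs with `None` for an unresolved sense, and `managers_with_email` with its `lexical` flag stands in for the ‘?’ versus ‘??’ distinction rather than for any real query engine.

```python
# Tags are (text, sense) pairs; sense is None when it could not be resolved.
# This representation is an assumption made for the sketch.
statements = [
    (("Phil Dawes", 1), "email", "pdawes@users.sf.net"),
    (("Phil Dawes", 2), "email", "foo@bah.com"),
    (("MyApp1", None), "managed by", ("Phil Dawes", None)),
]


def managers_with_email(statements, app, lexical=False):
    """Find (person, email, uncertain) rows for whoever manages `app`.

    lexical=False mimics '?person': the person tag must match on sense too.
    lexical=True mimics '??person': matching on the tag text is enough, and
    rows that only matched lexically are flagged as uncertain.
    """
    rows = []
    for subj, prop, person in statements:
        if subj != (app, None) or prop != "managed by":
            continue
        for subj2, prop2, email in statements:
            if prop2 != "email":
                continue
            if lexical:
                matched = subj2[0] == person[0]
            else:
                matched = subj2 == person
            if matched:
                rows.append((subj2, email, subj2 != person))
    return rows


strict = managers_with_email(statements, "MyApp1")
loose = managers_with_email(statements, "MyApp1", lexical=True)
```

Here `strict` comes back empty, while `loose` holds one row per sense, each keeping its sense number and an uncertainty flag.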

I think the only way to “disambiguate” a string is to do it in some context. Keep the context, and you keep the identity. After you lose the context, trying to recover it by logical means is, imho, going to be a losing proposition. This is why quads (context, subject, verb, object; see the URL in the website field of this comment) are a far better structure than just triples.