Is Identity in the eye of the beholder?
Apr 20th, 2005 by Phil Dawes
A commonly held principle of identity within RDF circles is that the owner of the URI gets to pick what it identifies. That sounds perfectly reasonable in theory, but unfortunately my experience with using RDF at work suggests that in practice the meaning of a URI tends to skew a bit with usage and context.
For example, at work we exported the LDAP directory of employees as RDF, generating a series of URIs in the process. These URIs were then used in other graphs to connect other data to employees. In this data, sometimes the URI is used to identify the user-entry in the directory, and sometimes to denote the person themselves. (e.g. app:BondTrader probably isn’t administered by the person:Frank LDAP directory entry; it’s administered by the person that entry denotes).
This is of course a matter of precision, and in theory we could have one set of URIs to denote the people, and another set to denote the directory entries. But the problem is that, to a greater or lesser extent, this seems to happen all the time.
E.g. we’ve had similar happenings with dns-name vs server - i.e. sometimes the concept of dnsname blurs with the server it points to (e.g. if you’re in an app team), and sometimes it doesn’t (if you look after DNS). The properties of dnsname make sense in either context - e.g. you can think of an alias in terms of a server quite easily.
Off the top of my head, others include:
- application vs monitoring-configuration-entry
- windowslogin vs person
- application (software) vs project
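To make the "precise" option concrete, here is a minimal sketch of minting one URI for the person and a separate one for the directory entry. It uses plain Python tuples rather than an RDF library, and every namespace and property name in it (the example.org URIs, "describes", "administeredBy") is a made-up assumption for illustration, not what our export actually produced.

```python
# Hypothetical namespaces: none of these URIs come from the real export.
PERSON = "http://example.org/person/"
ENTRY = "http://example.org/ldap-entry/"
APP = "http://example.org/app/"


def mint(uid):
    """Return (person_uri, entry_uri) for one directory uid."""
    return PERSON + uid, ENTRY + uid


def export_employee(uid, name):
    """Emit triples that keep the entry and the person it denotes apart."""
    person, entry = mint(uid)
    return [
        (entry, "describes", person),  # the entry is *about* the person
        (entry, "uid", uid),           # a fact about the directory entry
        (person, "name", name),        # a fact about the person
    ]


triples = export_employee("frank", "Frank")
frank_person, frank_entry = mint("frank")

# The application is administered by the person, not by the directory entry.
triples.append((APP + "BondTrader", "administeredBy", frank_person))
```

The cost is visible even at this size: every author who links to Frank now has to know which of the two URIs they mean, which is exactly the discipline that tends to slip in practice.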
So what’s the solution? Should we be murderously precise about what we mint, or do these vague overlapping concepts have their place?
This is especially interesting to me because in tagtriples the concept of identity is a little blurry anyway. Identity tends to get built up by description rather than by minting an ‘id’, and so skews according to the context it’s being used in.
This blurriness has led me to consider putting the final responsibility of identity on the shoulders of the ‘aggregator’ (i.e. the person doing the aggregating) rather than the author of the data. It’s a compelling solution since the person doing the aggregating is collecting and merging data for a particular purpose under a particular context.
E.g. my tagtriples aggregator at work is used to collect data for the purpose of managing applications in DRKW, and thus all the data is specific to DRKW. If, say, DRKW was ever merged with another investment bank I’d need to reconcile this data, and so would probably add ‘drkw’ tags to all of the existing application data in order to manage collision with the new data.
Using this mechanism I could also get to choose (to some degree) whether the dnsname ‘ab35622abc’ is different to the server ‘ab35622abc’ for the context I’m aggregating in.
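A rough sketch of what aggregator-side identity could look like follows. This is not how my tagtriples aggregator is actually implemented; the statement shape (kind, symbol, property, value), the `merge_kinds` policy, and the sample property values are all assumptions invented for the illustration. The point is only that the same input data yields different identities depending on the policy the aggregator picks.

```python
def aggregate(statements, merge_kinds, context_tag=None):
    """Group statements into nodes under an aggregator-chosen identity policy.

    statements:  iterable of (kind, symbol, prop, value) tuples (assumed shape)
    merge_kinds: kinds this aggregator treats as denoting the same thing
    context_tag: optional tag (e.g. 'drkw') added to every node key so the
                 data cannot collide with data from another context
    """
    merged_label = "+".join(sorted(merge_kinds))
    graph = {}
    for kind, symbol, prop, value in statements:
        # Kinds the aggregator chose to merge share one label, so they
        # collapse onto the same node whenever the symbol matches.
        label = merged_label if kind in merge_kinds else kind
        key = (context_tag, label, symbol)
        graph.setdefault(key, {}).setdefault(prop, set()).add(value)
    return graph


# Hypothetical data: the property values are made up.
statements = [
    ("dnsname", "ab35622abc", "alias", "bondtrader-prod"),
    ("server", "ab35622abc", "os", "solaris"),
]

# An app team does not care about the distinction: one node.
app_view = aggregate(statements, {"dnsname", "server"}, context_tag="drkw")

# Whoever looks after DNS keeps them apart: two nodes.
dns_view = aggregate(statements, set(), context_tag="drkw")
```

Here `app_view` holds a single node carrying both the alias and the os property, while `dns_view` holds two separate nodes, and in both cases the ‘drkw’ tag is part of the key.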

The formal logic which has been specified in the RDF semantics only works where identity is not sensitive to context. But identity is always sensitive to context. The W3C cannot legislate that away by writing specifications. I’m thinking that Semantic Web enthusiasts will slowly recognize that a strong logic, requiring the excluded middle, will never work on a global world scale. Oh well …
I think this is true. More and more people are saying this: Identity (and meaning as well) are in Agents[1] and are context dependent. I now think all ontologies are personal and are, in fact, problem specific. I have been trying to think through a system without “class” or “type” - only properties. Any notion of “concept” or “class” or “type” could only be calculated ad hoc on the basis of a sample of items with similar properties. Such ephemeral “concepts” could be shattered at any moment by the addition of incompatible properties - only to be replaced momentarily with a new set of “classes” that fit the new dataset.
1. http://kashori.com/2004/12/it-takes-agent-to-be-semantic.html
[…] orkable strategies to overcome the symbol-ambiguity problem, both in RDF (which can easily suffer from context skew), and also in my TagTriples scheme whic […]
Thanks. Nicely put and plenty to think about. I wonder if the real responsibility of the Aggregator isn’t to collect the identity information in such a way that the collection of data has a minimum of ambiguities by design? From any richly understood identity, one can probably derive RDF representations of several “masks” worn by the same person or thing.
My recent projects in address book directory design for mixed business and personal use have me circling this subject and I appreciated the viewpoint you offer here.