I haven’t said much about semantic web stuff for a while, as I’ve been occupied with other things. However, Jim Hendler’s ‘Tales from the Dark Side’ piece in IEEE Intelligent Systems reawoke an old interest. In short: I still think the RDF people have got it wrong with URIs, and so far nobody’s convinced me otherwise.

My (same old) argument: URIs are bad for large-scale interoperability. The alternative: just use words and symbols occurring in real life, and use the context inherent in the communication to disambiguate meaning.

The interesting thing about the Hendler piece is that it pretty much walks through the arguments I make for dropping URIs, but then avoids the conclusion:

“If you and I decide that we will use the term ‘http://www.cs.rpi.edu/~hendler/elephant’ to designate some particular entity, then it really doesn’t matter what the other blind men think it is, they won’t be confused when they use the natural language term ‘Elephant’ which is not even close, lexigraphically, to the longer term you and I are using. And if they choose to use their own URI, ‘http://www.other.blind.guys.org/elephant’ it won’t get confused with ours.

“The trick comes, of course, as we try to make these things more interoperable. It would be nice if someone from outside could figure out, and even better assert in some machine-readable way, that these two URIs were really designating the same thing - or different things, or different parts of the same thing, or … ooops, notice how quickly we’re on that slippery slope.”

And this neatly sums up the situation with URIs. The low chance of collision represents a tradeoff: You get a high level of semantic precision - it’s extremely unlikely that two parties will use the same URI to mean two totally unconnected things. You also get a very low level of semantic interoperability: it’s equally unlikely that two unconnected parties will use the same URI to denote (even parts of!) the same thing.

Now I think the precision part is overrated - disambiguation of natural language terms can be tractably (and often trivially) achieved using contextual cues. However, interoperability of data from unconnected sources is *really* hard, and that’s why I think this is a bad tradeoff.

Anyway, the crux of the Hendler piece is that for all the high level work going on in Semantic Web land (ontology languages, description logic), it’s currently simple interoperability mechanisms that gain most traction and add the most value: ‘a little semantics goes a long way’.

The piece implies (afaics) that this is where effort should be directed, and cites the example of matching FOAF data using email addresses as an illustration of the potential success of this approach. The matching heuristic is: if two FOAF resources are describing people with the same email address, they’re very likely to be about the same person.
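
To make that heuristic concrete, here is a minimal sketch in Python. The record layout (plain dicts with `source`, `name` and `mbox` keys) and the function name `merge_by_mbox` are assumptions made up for illustration; they are not part of FOAF or of any RDF library. The light normalisation (dropping a `mailto:` prefix, ignoring case) is also an assumption, and it is itself a heuristic.

```python
from collections import defaultdict


def merge_by_mbox(records):
    """Group person descriptions that share an email address.

    Each record is a hypothetical dict; records without an 'mbox'
    value cannot be matched and are returned as singleton groups.
    """
    groups = defaultdict(list)
    unmatched = []
    for record in records:
        mbox = record.get("mbox")
        if mbox:
            # Normalise lightly: ignore case and a leading 'mailto:'.
            key = mbox.strip().lower()
            if key.startswith("mailto:"):
                key = key[len("mailto:"):]
            groups[key].append(record)
        else:
            unmatched.append([record])
    return list(groups.values()) + unmatched


people = [
    {"source": "site-a", "name": "Phil Dawes", "mbox": "mailto:phil@example.com"},
    {"source": "site-b", "name": "P. Dawes", "mbox": "phil@example.com"},
    {"source": "site-c", "name": "Someone Else", "mbox": "other@example.com"},
]
merged = merge_by_mbox(people)
for group in merged:
    print([record["source"] for record in group])
```

Note that the names differ between the first two records; only the shared mailbox ties them together, which is the whole point of the heuristic.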

My experience concurs with the ‘a little semantics goes a long way’ sentiment, but personally I think FOAF has succeeded (for some measure of success) not because of RDF but in spite of it. I’d argue that the only reason the email matching works on a large scale is because email addresses are already concrete symbols grounded in the real world. FOAF didn’t create them, it just provided a context for their use. FOAF’s formal semantics certainly didn’t create this interoperability - the largest example of FOAF data is scraped from LiveJournal’s databases, where the users creating the data have little concept of the ramifications of the ‘http://xmlns.com/foaf/0.1/mbox’ property.

If FOAF had to rely on artificial URIs as the sole means for identifying people it would struggle to gain any traction in the messy real world of the web.

However, on the flip side I think FOAF would work just as well (and gain a lot more traction) if its underlying model didn’t employ URIs at all and instead just used triples of words/symbols. Semantic web software would still be able to identify and index FOAF data: i.e. the symbol ‘FOAF’ is pretty unambiguous on its own, but even if it weren’t, the juxtaposition of the symbol FOAF with properties like ‘mbox’, ‘surname’ etc. would suffice for pretty accurate disambiguation.
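
As a rough sketch of what disambiguation by juxtaposition could look like, the Python below scores a URI-free description against a couple of vocabulary profiles. Everything here is an assumption for illustration: the `PROFILES` table, the ‘mail server config’ profile, the function name `guess_vocabulary` and the scoring rule are invented, not taken from any existing semantic web software.

```python
# Hypothetical vocabulary profiles: the property symbols an application
# expects to see together in each kind of description.
PROFILES = {
    "FOAF person": {"name", "surname", "mbox", "homepage", "knows"},
    "mail server config": {"mbox", "quota", "host", "port"},
}


def guess_vocabulary(triples):
    """Return (profile, score) pairs, best match first.

    triples is a list of (subject, property, value) tuples of plain
    symbols. The score is the fraction of the description's distinct
    properties that appear in the profile.
    """
    properties = {prop for _, prop, _ in triples}
    scores = []
    for profile, expected in PROFILES.items():
        overlap = len(properties & expected)
        score = overlap / len(properties) if properties else 0.0
        scores.append((profile, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


description = [
    ("me", "name", "Phil Dawes"),
    ("me", "surname", "Dawes"),
    ("me", "mbox", "phil@example.com"),
]
ranking = guess_vocabulary(description)
print(ranking[0][0])
```

The symbol ‘mbox’ alone is ambiguous between the two profiles; it is its appearance alongside ‘name’ and ‘surname’ that tips the ranking towards the FOAF reading.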

Viewing 5 Comments

    Phil, I disagree. In a model you suggest, where there are only ambiguous names, it is *impossible* to ever know if two documents are referring to the same thing. You can only have heuristics and best guesses and probabilistic answers and such. That's fine if you don't *ever* want certainty, but sometimes certainty is desired (I think it is even *expected* of any computer application, at least today) and sometimes it is not costly to produce, and in those cases it ought to be possible to have it.

    For sure there are going to be some applications that need some degree of certainty. Say an RDF-based application that manages your home's security. You ask it, "Has anyone broken in?" and it responds, "Now just what do you mean by broken in?" Or, "There's a 90% chance no one has broken in."

    On the flip side, a model that allows you to say things with certainty doesn't force you to do so. So if people have chosen non-colliding URIs for roughly the same thing, while RDF may say that they are not certainly the same thing, you can still apply the same heuristics (whatever they are) to get a probabilistic answer. Often entities have an rdfs:label anyway that you could use for that.

    Your example with FOAF actually relies on the fact that the foaf:mbox property is reliably interpreted by applications as what it is. Applications don't have to first ask "is the foaf:mbox property here the same as the foaf:mbox property there?"

    So what you're really saying is that you want a system where it's impossible to say anything with certainty. You could have that if you want, but then that system is just solving a different set of problems than RDF solves. Plus, compare that to the system we have now, where it's possible to say things with certainty if people want to do so, but there's nothing preventing heuristic comparisons either (as you pointed out with FOAF).

    Plus, speaking as a linguistics grad student, I think the idea that "disambiguation of natural language terms can be tractably (and often trivially) achieved" is really unlikely. On a large scale, I would have to see it to believe it.
    Hi Josh, I think you're missing something here: the only way you can ever be 100% sure that two documents are talking about exactly the same thing is through shared knowledge about the context and provenance of the documents. Nothing stops somebody from inadvertently using a URI to mean something slightly different to the original author.

    In the FOAF case you're relying on the client to have understood that 'http://xmlns.com/foaf/0.1/mbox' is an inverse functional property (IFP) requiring a unique personal mailbox and not one e.g. shared with a spouse. So we're talking about sliding scales of confidence here, not absolutes.

    Using a description framework à la RDF allows you to disambiguate terms through their relationship to other terms. This is a proven technique in natural language and translates well to software: applications have knowledge of the problem domain they're operating in and the combination of terms they're expecting to operate with. That combination of terms provides a trivial way to disambiguate data from disparate sources. Besides, you can always add disambiguation metadata to your descriptions:

        <> type FoafPerson
        <> usesTermsFrom http://www.foaf-project.org/
        <> name "Phil Dawes"
        <> surname "Dawes"
        <> homepage http://www.phildawes.net/
        <> mbox phil@example.com

    Also you tend to find that where there are global areas of ambiguity humans tend to invent names and schemes which have a low chance of collision. Email addresses, vehicle number plates, URLs and names like 'FOAF' are examples of these. These are already grounded in real life, widely shared, and are ripe for use in data exchange.
    Phil,

    I see your point about even URIs being on a fuzzy confidence scale. But there's a difference. The uncertainty in URIs is about whether the author has typed in what he intended to convey. That probability is fairly high in general, and it's an easy and useful assumption to ignore the possibility that an author simply made a mistake. In practice, we can get away with that pretty well.

    But if we have to rely on heuristics to tie together every name, there are a lot more parameters to consider besides the one parameter for probability of a mistake. How exactly do you compute the probability that the "Phil Dawes" on this page is the same one mentioned elsewhere? At the very least you need to know how many Phil Dawes there are in the world. These aren't parameters that are easily abstracted away.

    So I guess what I'm saying is, of course everything is fuzzy in the real world, but there are some types of fuzziness that we often want to ignore for the sake of building an application that is going to give us a yes/no answer, or an answer whose certainty is only contingent on whether we think the authors of the data were competent enough to have written it out correctly. We can deal with that type of contingency, but that's not available to us if the whole system is ambiguous.
    I agree here with Phil. And I'm glad he spotted those statements in Jim Hendler's article and pointed them out to us.

    I believe this is the major problem remaining to be solved for the semantic web. Sure, you can mint any URI, and by definition of the formal system - whatever it is that is denoted by that URI by me shall be the very same concept that is denoted by you. The problem is that it is still necessary for me to communicate to you and our machines, unambiguously if possible, what exactly I intend to denote by that URI. How am I supposed to do that? Or, if I am the audience, how am I supposed to know how to interpret that URI so that I understand the same concept you had when you published it?

    And it's not just a matter of mistakes, carelessness, or incompetence. Nor is it really a failing of URIs or RDF. The real problem is that the world itself does not come neatly divided into conceptual categories. It presents as a very nearly continuous whole, and we each parse it up as best we can, depending on our context, our history, and many other factors. I have posted about this before in 'It Takes an Agent to be Semantic'. Before we can really use that URI to communicate, then, we must somehow bring our interpretations into alignment; we must establish a common ground of interpretation. I have proposed an approach to this in 'Creating a Common Ground for URI Meaning'. Lately I've come to think of common ground as synonymous with utterance (or statement) context, and now think that the meaning of a URI is its location in the context of the semantic web.

    What I'm looking for now is how to represent this context of common ground in OWL. I'm trying to build an ontology of context.
    Would it be possible, in a future blog entry, to talk about semantic approaches from an industry vertical perspective, and also to cover what standards bodies such as ACORD (www.acord.org) and others need to do in order to embrace semantic approaches?

    Thanks in advance.