<?xml version="1.0" encoding="utf-8"?><!-- generator="wordpress/2.1.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Dark side of the semantic web</title>
	<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/</link>
	<description>Mostly programming with a few bits of other stuff</description>
	<pubDate>Tue, 07 Oct 2008 08:35:11 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1.1</generator>

	<item>
		<title>By: Josh Tauberer</title>
		<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14797</link>
		<author>Josh Tauberer</author>
		<pubDate>Tue, 19 Dec 2006 12:37:37 +0000</pubDate>
		<guid>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14797</guid>
					<description>Phil, I disagree.  In a model you suggest, where there are only ambiguous names, it is *impossible* to ever know if two documents are referring to the same thing.  You can only have heuristics and best guesses and probabilistic answers and such. That's fine if you don't *ever* want certainty, but sometimes certainty is desired (I think it is even *expected* of any computer application, at least today) and sometimes it is not costly to produce, and in those cases it ought to be possible to have it.

For sure there are going to be some applications that need some degree of certainty. Say an RDF-based application that manages your home's security.  You ask it, "Has anyone broken in?" and it responds, "Now just what do you mean by broken in?"  Or, "There's a 90% chance no one has broken in."

On the flip side, a model that allows you to say things with certainty doesn't force you to do so. So if people have chosen distinct URIs for roughly the same thing, RDF won't tell you they are certainly the same thing, but you can still apply the same heuristics (whatever they are) to get a probabilistic answer. Entities often have an rdfs:label anyway that you could use for that.

Your example with FOAF actually relies on the fact that the foaf:mailbox property is reliably interpreted by applications as what it is.  Applications don't have to first ask "is the foaf:mailbox property here the same as the foaf:mailbox property there?"

So what you're really saying is that you want a system where it's impossible to say anything with certainty.  You could have that if you want, but then that system is just solving a different set of problems than RDF solves.  Plus, compare that to the system we have now where it's possible to say things with certainty if people want to do so, but there's nothing preventing heuristic comparisons either (as you pointed out with FOAF).

Plus, as a linguistics grad student, I think the idea that "disambiguation of natural language terms can be tractably (and often trivially) achieved" is really unlikely.  On a large scale, I would have to see it to believe it.</description>
		<content:encoded><![CDATA[<p>Phil, I disagree.  In a model you suggest, where there are only ambiguous names, it is *impossible* to ever know if two documents are referring to the same thing.  You can only have heuristics and best guesses and probabilistic answers and such. That&#8217;s fine if you don&#8217;t *ever* want certainty, but sometimes certainty is desired (I think it is even *expected* of any computer application, at least today) and sometimes it is not costly to produce, and in those cases it ought to be possible to have it.</p>
<p>For sure there are going to be some applications that need some degree of certainty. Say an RDF-based application that manages your home&#8217;s security.  You ask it, &#8220;Has anyone broken in?&#8221; and it responds, &#8220;Now just what do you mean by broken in?&#8221;  Or, &#8220;There&#8217;s a 90% chance no one has broken in.&#8221;</p>
<p>On the flip side, a model that allows you to say things with certainty doesn&#8217;t force you to do so. So if people have chosen distinct URIs for roughly the same thing, RDF won&#8217;t tell you they are certainly the same thing, but you can still apply the same heuristics (whatever they are) to get a probabilistic answer. Entities often have an rdfs:label anyway that you could use for that.</p>
<p>Your example with FOAF actually relies on the fact that the foaf:mailbox property is reliably interpreted by applications as what it is.  Applications don&#8217;t have to first ask &#8220;is the foaf:mailbox property here the same as the foaf:mailbox property there?&#8221;</p>
<p>So what you&#8217;re really saying is that you want a system where it&#8217;s impossible to say anything with certainty.  You could have that if you want, but then that system is just solving a different set of problems than RDF solves.  Plus, compare that to the system we have now where it&#8217;s possible to say things with certainty if people want to do so, but there&#8217;s nothing preventing heuristic comparisons either (as you pointed out with FOAF).</p>
<p>Plus, as a linguistics grad student, I think the idea that &#8220;disambiguation of natural language terms can be tractably (and often trivially) achieved&#8221; is really unlikely.  On a large scale, I would have to see it to believe it.</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: manuel</title>
		<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14802</link>
		<author>manuel</author>
		<pubDate>Tue, 19 Dec 2006 13:21:27 +0000</pubDate>
		<guid>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14802</guid>
					<description>&lt;strong&gt;uris schmuris...&lt;/strong&gt;

Unlike del.icio.us, which uses simple words to connect pieces of information, the Semantic Web proposes to use URIs so that you really know when two people mean the same thing. Phil Dawes echoes my feelings on this design issue: I still...</description>
		<content:encoded><![CDATA[<p><strong>uris schmuris&#8230;</strong></p>
<p>Unlike del.icio.us, which uses simple words to connect pieces of information, the Semantic Web proposes to use URIs so that you really know when two people mean the same thing. Phil Dawes echoes my feelings on this design issue: I still&#8230;</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Phil Dawes</title>
		<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14808</link>
		<author>Phil Dawes</author>
		<pubDate>Tue, 19 Dec 2006 14:52:53 +0000</pubDate>
		<guid>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14808</guid>
					<description>Hi Josh, I think you're missing something here: the only way you can ever be 100% sure that two documents are talking about exactly the same thing is through shared knowledge about the context and provenance of the documents. Nothing stops somebody from inadvertently using a URI to mean something slightly different to the original author.

In the foaf case you're relying on the client to have understood that 'http://xmlns.com/foaf/0.1/mbox' is an IFP property requiring a unique personal mailbox and not, e.g., one shared with a spouse. So we're talking about sliding scales of confidence here, not absolutes.

Using a description framework à la RDF allows you to disambiguate terms through their relationship to other terms. This is a proven technique in natural language and translates well to software: Applications have knowledge of the problem domain they're operating in and the combination of terms they're expecting to operate with. That combination of terms provides a trivial way to disambiguate data from disparate sources. Besides, you can always add disambiguation metadata to your descriptions:
&lt;pre&gt;
&lt;code&gt;
&#60;&gt; type FoafPerson
&#60;&gt; usesTermsFrom http://www.foaf-project.org/
&#60;&gt; name "Phil Dawes"
&#60;&gt; surname "Dawes"
&#60;&gt; homepage http://www.phildawes.net/
&#60;&gt; mbox phil@example.com
&lt;/code&gt;
&lt;/pre&gt;

Also, you tend to find that where there are global areas of ambiguity, humans invent names and schemes which have a low chance of collision. Email addresses, vehicle number plates, URLs and names like 'FOAF' are examples of these. These are already grounded in real life, widely shared, and are ripe for use in data exchange.</description>
		<content:encoded><![CDATA[<p>Hi Josh, I think you&#8217;re missing something here: the only way you can ever be 100% sure that two documents are talking about the exactly the same thing is through shared knowledge about the context and provenance of the documents. Nothing stops somebody from inadvertantly using a URI to mean something slightly different to the original author. </p>
<p>In the foaf case you&#8217;re relying on the client to have understood that &#8216;http://xmlns.com/foaf/0.1/mbox&#8217; is an IFP property requiring a unique personal mailbox and not, e.g., one shared with a spouse. So we&#8217;re talking about sliding scales of confidence here, not absolutes.</p>
<p>Using a description framework à la RDF allows you to disambiguate terms through their relationship to other terms. This is a proven technique in natural language and translates well to software: Applications have knowledge of the problem domain they&#8217;re operating in and the combination of terms they&#8217;re expecting to operate with. That combination of terms provides a trivial way to disambiguate data from disparate sources. Besides, you can always add disambiguation metadata to your descriptions:</p>
<pre>
<code>
&lt;> type FoafPerson
&lt;> usesTermsFrom <a href="http://www.foaf-project.org/" rel="nofollow">http://www.foaf-project.org/</a>
&lt;> name "Phil Dawes"
&lt;> surname "Dawes"
&lt;> homepage <a href="http://www.phildawes.net/" rel="nofollow">http://www.phildawes.net/</a>
&lt;> mbox <a href="mailto:phil@example.com">phil@example.com</a>
</code>
</pre>
<p>Also, you tend to find that where there are global areas of ambiguity, humans invent names and schemes which have a low chance of collision. Email addresses, vehicle number plates, URLs and names like &#8216;FOAF&#8217; are examples of these. These are already grounded in real life, widely shared, and are ripe for use in data exchange.</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Josh Tauberer</title>
		<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14811</link>
		<author>Josh Tauberer</author>
		<pubDate>Tue, 19 Dec 2006 16:33:54 +0000</pubDate>
		<guid>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14811</guid>
					<description>Phil,

I see your point about even URIs being on a fuzzy confidence scale.  But there's a difference.  The uncertainty in URIs is about whether the author has typed in what he intended to convey.  The probability that he has is fairly high in general, and it's easy and useful to ignore the possibility that an author simply made a mistake.  In practice, we can get away with that pretty well.

But if we have to rely on heuristics to tie together every name, there are a lot more parameters to consider besides the one parameter for probability of a mistake. How exactly do you compute the probability that the "Phil Dawes" on this page is the same one mentioned elsewhere?    At the very least you need to know how many Phil Dawes there are in the world.  These aren't parameters that are easily abstracted away.

So I guess what I'm saying is, of course everything is fuzzy in the real world, but there are some types of fuzziness that we often want to ignore for the sake of building an application that is going to give us a yes/no answer, or an answer whose certainty is only contingent on whether we think the authors of the data were competent enough to have written it out correctly. We can deal with that type of contingency, but that's not available to us if the whole system is ambiguous.</description>
		<content:encoded><![CDATA[<p>Phil,</p>
<p>I see your point about even URIs being on a fuzzy confidence scale.  But there&#8217;s a difference.  The uncertainty in URIs is about whether the author has typed in what he intended to convey.  The probability that he has is fairly high in general, and it&#8217;s easy and useful to ignore the possibility that an author simply made a mistake.  In practice, we can get away with that pretty well.</p>
<p>But if we have to rely on heuristics to tie together every name, there are a lot more parameters to consider besides the one parameter for probability of a mistake. How exactly do you compute the probability that the &#8220;Phil Dawes&#8221; on this page is the same one mentioned elsewhere?    At the very least you need to know how many Phil Dawes there are in the world.  These aren&#8217;t parameters that are easily abstracted away.</p>
<p>So I guess what I&#8217;m saying is, of course everything is fuzzy in the real world, but there are some types of fuzziness that we often want to ignore for the sake of building an application that is going to give us a yes/no answer, or an answer whose certainty is only contingent on whether we think the authors of the data were competent enough to have written it out correctly. We can deal with that type of contingency, but that&#8217;s not available to us if the whole system is ambiguous.</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: John Black</title>
		<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14923</link>
		<author>John Black</author>
		<pubDate>Thu, 21 Dec 2006 14:12:57 +0000</pubDate>
		<guid>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-14923</guid>
					<description>I agree here with Phil. And I'm glad he spotted those statements in Jim Hendler's article and pointed them out to us.

I believe this is the major problem remaining to be solved for the semantic web. Sure, you can mint any URI, and by definition of the formal system, whatever it is that I denote by that URI &lt;em&gt;shall be&lt;/em&gt; the very same concept that you denote by it. The problem is that it is still necessary for me to communicate to you and our machines, unambiguously if possible, what exactly I intend to denote by that URI. How am I supposed to do that? Or, if I am the audience, how am I supposed to know how to interpret that URI so that I understand the same concept you had when you published it?

And it's not just a matter of mistakes, carelessness, or incompetence. Nor is it really a failing of URIs or RDF. The real problem is that the world itself does not come neatly divided into conceptual categories. It presents as a very nearly continuous whole, and we each parse it up as best we can, depending on our context, our history, and many other factors. I have posted about this before, &lt;a href="http://kashori.com/2004/12/it-takes-agent-to-be-semantic.html" rel="nofollow"&gt;It Takes an Agent to be Semantic&lt;/a&gt;. Before we can really use that URI to communicate, then, we must somehow bring our interpretations into alignment; we must establish a common ground of interpretation. I have proposed an approach to this in &lt;a href="http://kashori.com/2006/06/creating-common-ground-for-uri-meaning.html" rel="nofollow"&gt;Creating a Common Ground for URI Meaning&lt;/a&gt;. Lately I've come to think of common ground as synonymous with utterance (or statement) context, and I now think that the &lt;a href="http://kashori.com/2006/07/words-or-uri-as-locations-in-fabric-of.html" rel="nofollow"&gt;meaning of a URI is its location in the context of the semantic web&lt;/a&gt;.

What I'm looking for now is how to represent this context of common ground in OWL. I'm trying to build an ontology of context.</description>
		<content:encoded><![CDATA[<p>I agree here with Phil. And I&#8217;m glad he spotted those statements in Jim Hendler&#8217;s article and pointed them out to us.</p>
<p>I believe this is the major problem remaining to be solved for the semantic web. Sure, you can mint any URI, and by definition of the formal system, whatever it is that I denote by that URI <em>shall be</em> the very same concept that you denote by it. The problem is that it is still necessary for me to communicate to you and our machines, unambiguously if possible, what exactly I intend to denote by that URI. How am I supposed to do that? Or, if I am the audience, how am I supposed to know how to interpret that URI so that I understand the same concept you had when you published it?</p>
<p>And it&#8217;s not just a matter of mistakes, carelessness, or incompetence. Nor is it really a failing of URIs or RDF. The real problem is that the world itself does not come neatly divided into conceptual categories. It presents as a very nearly continuous whole, and we each parse it up as best we can, depending on our context, our history, and many other factors. I have posted about this before, <a href="http://kashori.com/2004/12/it-takes-agent-to-be-semantic.html" rel="nofollow">It Takes an Agent to be Semantic</a>. Before we can really use that URI to communicate, then, we must somehow bring our interpretations into alignment; we must establish a common ground of interpretation. I have proposed an approach to this in <a href="http://kashori.com/2006/06/creating-common-ground-for-uri-meaning.html" rel="nofollow">Creating a Common Ground for URI Meaning</a>. Lately I&#8217;ve come to think of common ground as synonymous with utterance (or statement) context, and I now think that the <a href="http://kashori.com/2006/07/words-or-uri-as-locations-in-fabric-of.html" rel="nofollow">meaning of a URI is its location in the context of the semantic web</a>.</p>
<p>What I&#8217;m looking for now is how to represent this context of common ground in OWL. I&#8217;m trying to build an ontology of context.</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: James</title>
		<link>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-15074</link>
		<author>James</author>
		<pubDate>Sun, 24 Dec 2006 22:27:04 +0000</pubDate>
		<guid>http://phildawes.net/blog/2006/12/19/dark-side-of-the-semantic-web/#comment-15074</guid>
					<description>Would it be possible, in a future blog entry, to talk about semantic approaches from an industry vertical perspective, including what standards bodies such as ACORD (www.acord.org) and others need to do in order to embrace semantic approaches?

Thanks in advance.</description>
		<content:encoded><![CDATA[<p>Would it be possible if in a future blog entry you could talk about semantic approaches from an industry vertical perspective and also include what standards bodies such as ACORD (www.acord.org) and others need to do in order to embrace semantic approaches.</p>
<p>Thanks in advance.</p>
]]></content:encoded>
				</item>
</channel>
</rss>
