rdf: changeset 1087:fe8fbf908d32

--- a/rdf-dataset/index.html	Mon Aug 19 14:42:04 2013 -0700
+++ b/rdf-dataset/index.html	Tue Sep 17 09:13:18 2013 +0200
@@ -110,13 +110,13 @@
   <body>
 
 <section id="abstract">
-  <p>RDF defines the concept of RDF datasets, a structure composed of a distinguished RDF graph and zero or more named graphs, being pairs comprising an IRI and an RDF graph. While RDF graphs have a formal model-theoretic semantics that determines what arrangements of the world make an RDF graph true, no agreed formal semantics exists for RDF datasets. This document presents the issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF 1.1 Working Group, and specify several semantics in terms of model theory, each corresponding to a certain design choice for RDF datasets.</p>
+  <p>RDF defines the concept of RDF datasets, a structure composed of a distinguished RDF graph and zero or more named graphs, being pairs comprising an IRI or blank node and an RDF graph. While RDF graphs have a formal model-theoretic semantics that determines what arrangements of the world make an RDF graph true, no agreed formal semantics exists for RDF datasets. This document presents the issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF 1.1 Working Group, and specifies several semantics in terms of model theory, each corresponding to a certain design choice for RDF datasets.</p>
 </section>
 
 <section id="sec-introduction">
     <h2>Introduction</h2>
 
-    <p>The <a href="http://www.w3.org/TR/rdf11-concepts/">Resource Description Framework (RDF)</a> version 1.1 defines the concept of RDF datasets, a notion introduced first by the SPARQL specification [[RDF-SPARQL-QUERY]].  A dataset is defined as a collection of <a title="RDF graph">RDF graphs</a> where all but one are <a title="named graph">named graphs</a> associated with an <a>IRI</a>, and the unnamed default graph [[RDF-CONCEPTS]].  Given that RDF is a data model equiped with a formal semantics [[RDF-MT]], it is natural to try and define what the semantics of datasets should be.</p>
+    <p>The <a href="http://www.w3.org/TR/rdf11-concepts/">Resource Description Framework (RDF)</a> version 1.1 defines the concept of RDF datasets, a notion first introduced by the SPARQL specification [[RDF-SPARQL-QUERY]].  An RDF dataset is defined as a collection of <a title="RDF graph">RDF graphs</a> where all but one are <a title="named graph">named graphs</a> associated with an <a>IRI</a> or <a>blank node</a> (the <a>graph name</a>), and the unnamed default graph [[RDF11-CONCEPTS]].  Given that RDF is a data model equipped with a formal semantics [[RDF11-MT]], it is natural to try and define what the semantics of datasets should be.</p>
 
     <p>The RDF 1.1 Working Group was initially chartered to provide such semantics in its recommendation:</p>
     <blockquote cite="http://www.w3.org/2011/01/rdf-wg-charter">
@@ -126,17 +126,17 @@
 
 	<p>However, discussions within the Working Group revealed that practitioners currently hold very different assumptions, each using RDF datasets according to their own intuition of what the datasets mean.  Defining the semantics of RDF datasets requires an understanding of the two following issues:</p>
 	<ul>
-		<li>what the named graph IRIs denote;</li>
+		<li>what the graph names (IRIs or blank nodes) denote;</li>
 		<li>how the triples in the named graph influence the meaning of the dataset.</li>
 	</ul>
 	
-	<p>Possible choices for the denotation of graph IRIs are:</p>
+	<p>Possible choices for the denotation of graph names are:</p>
 	<ul>
 		<li>it denotes the RDF graph in the (name,graph) pair;</li>
 		<li>it denotes the pair itself;</li>
 		<li>it denotes a supergraph of the graph inside the pair;</li>
 		<li>it denotes a container for the RDF graph, that is, a mutable element;</li>
-		<li>it denotes the information resource that can be obtained by dereferencing the IRI (if such resource exists);</li>
+		<li>it denotes the information resource that can be obtained by dereferencing the graph name, when it is an IRI and if such resource exists;</li>
 		<li>it denotes an arbitrary resource that is constrained to be in a special relationship with the graph inside the pair;</li>
 		<li>it denotes an unconstrained resource.</li>
 	</ul>
@@ -159,7 +159,7 @@
 
 	<p>We first take a look at existing specifications that could shed light on how the semantics of datasets should be defined. There are three important documents that closely relate to the issue:</p>
 	<ul>
-		<li>the RDF semantics, as standardised in 2004 [[RDF-MT]];</li>
+		<li>the RDF semantics, as standardised in 2004 [[RDF-MT]] and its revision in 2013 [[RDF11-MT]];</li>
 		<li>the article <i>Named Graphs</i> by Carroll et al., which first introduced the term "named graph" and contains a section on formal semantics;</li>
 		<li>the SPARQL specification [[RDF-SPARQL-QUERY]], which defines RDF datasets and how to query them.</li>
 	</ul>
@@ -167,23 +167,25 @@
 	<section id="rdf-semantics">
 		<h3>The RDF semantics</h3>
 		
-		<p>The RDF semantics defines the meaning of a set of RDF graphs: <q cite="http://www.w3.org/TR/rdf-mt/#entail">a set of graphs can be treated as equivalent to its merge, i.e. a single graph, as far as the model theory is concerned</q>.</p>
-		<p>So, a first intuition could be that an RDF dataset, being presented as a collection of graph, should mean exactly what the set of its named graphs and default graph means. However, this has both formal drawbacks and conceptual drawbacks.</p>
-		<p>Formally, the semantics of RDF defines a notion of interpretation for a set of triples (i.e., an RDF graph), which then extends easily to a set of RDF graphs. A dataset is neither a set of triples nor a set of RDF graphs. It is a set of <em>pairs</em> (name,graph) together with a distinguish RDF graph. Consequently, defining interpretation and entailement for RDF datasets would require at least an extension of the RDF semantics.</p>
+		<p class="issue">Part of what follows is somewhat subjective.</p>
+		
+		<p>The first version of RDF semantics defined the meaning of a set of RDF graphs: <q cite="http://www.w3.org/TR/rdf-mt/#entail">a set of graphs can be treated as equivalent to its merge, that is, a single graph, as far as the model theory is concerned</q>.</p>
+		<p>So, a first intuition could be that an RDF dataset, being presented as a collection of graphs, should mean exactly what the set of its named graphs and default graph means. However, this completely leaves out the meaning of graph names, which could be valuable indicators for the truth of a dataset.</p>
+		<p>Formally, the semantics of RDF defines a notion of interpretation for a set of triples (i.e., an RDF graph), which then can extend to a set of RDF graphs. A dataset is neither a set of triples nor a set of RDF graphs. It is a set of <em>pairs</em> (name,graph) together with a distinguished RDF graph. Consequently, defining interpretation and entailment for RDF datasets would require at least an extension of the RDF semantics.</p>
 		<p>Conceptually, it is problematic since one of the reasons for separating triples into distinct (named) graphs is to avoid propagating the knowledge of one graph to the entire triple base. Sometimes, contradicting graphs need to coexist in a store. Sometimes named graphs are not endorsed by the system as a whole, they are merely quoted.</p>
 	</section>
 	
 	<section id="named-graph-paper">
 		<h3>The Named Graphs paper</h3>
 		
-		<p>In Carrol et al., a named graph is simply defined as a pair comprising an IRI and an RDF graph. The notion of RDF interpretation is extended to named graphs by saying that the graph IRI in the pair must denote the pair itself. This non-ambiguously answers the question of what the graph IRI denotes. Additionally, ...</p>
+		<p>In Carroll et al., a named graph is simply defined as a pair comprising an IRI and an RDF graph. The notion of RDF interpretation is extended to named graphs by saying that the graph IRI in the pair must denote the pair itself. This unambiguously answers the question of what the graph IRI denotes. This can then be used to define a proper dataset semantics, as shown in Section ?.</p>
 	</section>
 	
 	<section id="sparql">
 		<h3>The SPARQL specification</h3>
 
-		<p>RDF 1.1 defines the notion of RDF dataset identically to SPARQL, which introduced it first. So, in order to understand the semantics of dataset, it is worth looking at how SPARQL uses datasets. SPARQL defines what are answers to queries posed against a dataset, but it never defines the notions that are key to a model theoretic formal semantics: it neither presents interpretations nor entailment. Still, it is worth noticing that a ASK query that only contains a basic graph pattern without variables yields the same result as asking whether the RDF graph in the query is entailed by the default graph. Based on this observation, one may extrapolate that a ASK query containing no variables and only GRAPH graph patterns would yield the same result as dataset entailment.</p>
-		<p>This can be used to define a formal semantics for datasets, as can be seen in Section ?.</p>
+		<p>RDF 1.1 defines the notion of RDF dataset identically to SPARQL, which introduced it first. So, in order to understand the semantics of datasets, it is worth looking at how SPARQL uses them. SPARQL defines what the answers to queries posed against a dataset are, but it never defines the notions that are key to a model-theoretic formal semantics: it presents neither interpretations nor entailment. Still, it is worth noticing that an ASK query that only contains a basic graph pattern without variables yields the same result as asking whether the RDF graph in the query is entailed by the default graph. Based on this observation, one may extrapolate that an ASK query containing no variables and only GRAPH graph patterns would yield the same result as dataset entailment.</p>
+		<p>This can be used as a guide for formalizing the semantics of datasets, as can be seen in Section ?.</p>
 	</section>
 	
 </section>
@@ -201,58 +203,170 @@
 		<li>they define notions of interpretation and entailment in terms of the corresponding notions in RDF Semantics.</li>
 	</ul>
 
-	<p>In fact, the dependency on RDF semantics is such that most of the dataset semantics below reuse RDF semantics as a black box.  The purpose of a formal semantics for datasets is to determine under what circumstances a dataset can be said to be true or false.  The formalisation below indicates that the truth of an RDF dataset can be determined in function of the truth of an RDF graph, no matter how the latter is determined.  Therefore, instead of defining a precise definition of RDF graph interpretations and entailment, we use the more abstract notion of <a>entailment regime</a>.  In fact, RDF Semantics does not define a single formal semantics, but multiple ones, depending on what standard vocabularies are endorsed by an application.  Consequently, we will parameterize most of the definitions below with an unspecified entailment regime E.  RDF 1.1 defines the following entailment regimes: simple entailment, LV entailment, RDFS-entailment, D-entailment.  Additionally, OWL defines two other entailment regimes, based on the OWL 2 direct semantics and the OWL 2 RDF-based semantics.</p>
-	<p>For an entailment regime E, we will say E-interpretation, E-entailment, E-equivalence, E-consistency to describe the notions of interpretations, entailment, equivalence and consistency associated with the regime E. Similarly, we will use the terms dataset-interpretation, dataset-entailment, dataset-equivalence, dataset-consistency for the corresponding notions in dataset semantics.</p>
+	<p>In fact, the dependency on RDF semantics is such that most of the dataset semantics below reuse RDF semantics as a black box.  The purpose of a formal semantics for datasets is to determine under what circumstances a dataset can be said to be true or false.  The formalisation below indicates that the truth of an RDF dataset can be determined as a function of the truth of an RDF graph, no matter how the latter is determined.  Therefore, instead of giving a precise definition of RDF graph interpretations and entailment, we use the more abstract notion of <a>entailment regime</a>.  In fact, RDF Semantics does not define a single formal semantics, but multiple ones, depending on what standard vocabularies are endorsed by an application.  Consequently, we will parameterize most of the definitions below with an unspecified entailment regime <var>E</var>.  RDF 1.1 defines the following entailment regimes: simple entailment, D-entailment, RDF-entailment, RDFS-entailment.  Additionally, OWL defines two other entailment regimes, based on the OWL 2 direct semantics [[OWL2-Direct-Semantics]] and the OWL 2 RDF-based semantics [[OWL2-RDF-based-Semantics]].</p>
+	<p>For an entailment regime <var>E</var>, we will say <var>E</var>-interpretation, <var>E</var>-entailment, <var>E</var>-equivalence, <var>E</var>-consistency to describe the notions of interpretations, entailment, equivalence and consistency associated with the regime <var>E</var>. Similarly, we will use the terms dataset-interpretation, dataset-entailment, dataset-equivalence, dataset-consistency for the corresponding notions in dataset semantics.</p>
 
 	<section>
 		<h3>Named graphs have no meaning</h3>
 		<p>The simplest semantics defines an interpretation of a dataset as an RDF interpretation of the default graph. The dataset is true, according to the interpretation, if and only if the default graph is true. In this case, any datasets that have equivalent default graphs are dataset-equivalent.</p>
 		<p>This means that the named graphs in a dataset are irrelevant to determining the truth of a dataset. Therefore, arbitrary modifications of the named graphs in a graph store always yield an equivalent dataset, according to this semantics.</p>
 		<h4 class="formal">Formalization</h4>
-		<p>Considering an entailment regime E, a dataset-interpretation of a vocabulary V with respect to E is an E-interpretation of V. Given an interpretation I of V and a dataset D = (G, NG), I(D) is true if and only if I(G).</p>
+		<p>Considering an entailment regime <var>E</var>, a dataset-interpretation with respect to <var>E</var> is an <var>E</var>-interpretation. Given an interpretation <var>I</var> and a dataset <var>D</var> having default graph <var>G</var> and named graphs <var>NG</var>, <var>I(D)</var> is true if and only if <var>I(G)</var> is true.</p>
 
 		<h4 class="ex">Examples of entailments and non-entailments</h4>
 		<p>Consider the following dataset:</p>
-		<pre class="example">{ :s :p :o . }
-:g1 { :a :b :c }</pre>
-		<p>does not entail:</p>
-		<pre class="example">{ :s :p :o .
-:a :b :c .}</pre>
-		<p>but entails:</p>
+		<pre class="example">{ :s  :p  :o . }
+:g1 { :a  :b  :c }</pre>
+		<p>does not dataset-entail:</p>
+		<pre class="example">{ :s  :p  :o .
+:a  :b  :c .}</pre>
+		<p>but dataset-entails:</p>
 		<pre class="example">{}  # empty default graph
-:g2 { :x :y :z }</pre>
+:g2 { :x  :y  :z }</pre>
+		<p>Since graph names are not particularly constrained, one can use them in triples, for instance:</p>
+		<pre class="example">{ :g1  :author  :Bob .
+ :g1  :created  "2013-09-17"^^xsd:date .}
+:g1 { :a  :b  :c }</pre>
+		<p>but, because named graphs are ignored, it would still dataset-entail:</p>
+		<pre class="example">{ :g1  :author  :Bob .
+ :g1  :created  "2013-09-17"^^xsd:date .}
+:g1 { :x  :y  :z }</pre>
+
+		<h4 class="prop">Properties of this dataset semantics</h4>
+		<p>Assuming this semantics is convenient since it merely ignores named graphs in a dataset. As a result, datasets can be simply treated as regular RDF graphs by extracting the default graph. Named graphs can still be used to preserve useful information, but they bear no more meaning than a comment in program source code.</p>
+		<p>The obvious disadvantage is that, since named graphs are completely disregarded, there is no added value in using RDF datasets rather than regular RDF graphs.</p>
 	</section>
 
 	<section>
-		<h3>Dafault graph as union or as merge</h3>
-		<p>It is sometimes assumed that named graphs are simply a convenient way of sorting the triples but all the triples participte in a united knowledge base that takes the place of the default graph.  More precisely, a dataset is considered to be true if all the triples in all the graphs, named or default, are true together.  This description allows two formalization of dataset semantics, depending on how blank nodes spanning several named graphs are treated.</p>
+		<h3>Default graph as union or as merge</h3>
+		<p>It is sometimes assumed that named graphs are simply a convenient way of sorting the triples, but that all the triples participate in a united knowledge base that takes the place of the default graph.  More precisely, a dataset is considered to be true if all the triples in all the graphs, named or default, are true together.  This description allows two formalizations of dataset semantics, depending on how blank nodes spanning several named graphs are treated.</p>
 
 		<h4 class="formal">Formalization: first version</h4>
-		<p>We define a dataset-interpretation of a vocabulary V with respect to an entailment regime E as an E-interpretation of V. Given a dataset-interpretation I and a dataset D = (G, NG), I(D) is true if and only if I(G) is true and for all ng in NG, I(ng) is true.</p>
-		<p>This is equivalent to I(D) is true if I(H) is true where H is the merge of all the RDF graphs, named or default, appearing in D.</p>
+		<p>We define a dataset-interpretation with respect to an entailment regime <var>E</var> as an <var>E</var>-interpretation. Given a dataset-interpretation <var>I</var> and a dataset <var>D</var> having default graph <var>G</var> and named graphs <var>NG</var>, <var>I(D)</var> is true if and only if <var>I(G)</var> is true and for all <var>ng</var> in <var>NG</var>, <var>I(ng)</var> is true.</p>
+		<p>This is equivalent to saying that <var>I(D)</var> is true if and only if <var>I(H)</var> is true, where <var>H</var> is the <a>merge</a> of all the RDF graphs, named or default, appearing in <var>D</var>.</p>
 		
 		<h4 class="formal">Formalization: second version</h4>
-		<p>We define a dataset-interpretation of a vocabulary V with respect to an entailment regime E as an E-interpretation of V. Given a dataset-interpretation I and a dataset D = (G, NG), I(D) is true if and only if I(H) is true where H is the union of all the RDF graphs, named or default, appearing in D.</p>
-		<p>An alternative presentation of this variant is the following: define I+A to be an extended interpretation which is like I except that it uses A to give the interpretation of blank nodes; define blank(D) to be the set of blank nodes in D. Then I(D) = true if [I+A](D) = true for some mapping A from blank(D) to the set of resources in I, otherwise I(D)= false.</p>
+		<p>We define a dataset-interpretation with respect to an entailment regime <var>E</var> as an <var>E</var>-interpretation. Given a dataset-interpretation <var>I</var> and a dataset <var>D</var> having default graph <var>G</var> and named graphs <var>NG</var>, <var>I(D)</var> is true if and only if <var>I(H)</var> is true where <var>H</var> is the union of all the RDF graphs, named or default, appearing in <var>D</var>.</p>
+		<p>An alternative presentation of this variant is the following: define <var>I+A</var> to be an extended interpretation which is like <var>I</var> except that it uses <var>A</var> to give the interpretation of blank nodes; define <var>blank(D)</var> to be the set of blank nodes in <var>D</var>. Then <var>I(D)</var> is true if and only if <var>[I+A](D)</var> is true for some mapping <var>A</var> from <var>blank(D)</var> to the set of resources in <var>I</var>.</p>
 
 		<h4 class="ex">Examples</h4>
-		
+		<p>Consider the following dataset:</p>
+		<pre class="example">{ :s  :p  :o . }  # default graph
+:g1 { :a  :b  :c }</pre>
+		<p>dataset-entails:</p>
+		<pre class="example">{ :s  :p  :o .
+:a  :b  :c .}</pre>
+		<p>If the entailment regime <var>E</var> is RDFS with the recognized datatype <code>xsd:integer</code>, then the following RDF dataset is RDFS-dataset-inconsistent:</p>
+		<pre class="example">{ }  # empty default graph
+:g1 { :age  rdfs:range  xsd:integer . }
+:g2 { :bob  :age  "twenty" .}</pre>
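+		<p>As a hedged illustration of how the two formalizations differ (this example is an addition, not part of the Working Group discussion, and assumes simple entailment), consider a dataset in which the blank node <code>_:x</code> is shared by two named graphs:</p>
+		<pre class="example">{ }  # empty default graph
+:g1 { _:x  :p  :a }
+:g2 { _:x  :q  :b }</pre>
+		<p>Under the second version (union), it dataset-entails the following dataset. Under the first version (merge), it does not, because merging standardizes the blank nodes of the two graphs apart:</p>
+		<pre class="example">{ _:y  :p  :a .
+_:y  :q  :b . }</pre>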
+
+		<h4 class="prop">Properties of this dataset semantics</h4>
+		<p>This semantics allows one to partition the triples of an RDF graph into multiple named graphs for easier data management, yet retaining the meaning of the overall RDF graph. Note that this choice of semantics does not impact the way graph names are interpreted: it is possible to further constrain the graph names to denote the RDF graph associated with it, or other possible constraints. The possible interpretations of graph names, and their consequences, are presented in the next sections.</p>
+		<p>This semantics is implicitly assumed by existing graph store implementations. The OWLIM RDF database management system implements reasoning techniques over RDF datasets that materialize inferred statements into the database [[citation needed]]. This is done by taking the union of the graphs in the named graphs, applying standard entailment regimes over this RDF graph and putting the inferred triples into the default graph.</p>
+		<p>The main drawback of this dataset semantics is that all triples in the named graphs contribute to a global knowledge that must be consistent. In situations where named graphs are used to store RDF graphs obtained from various sources on the open Web, inconsistencies or contradictions can easily occur. Notably, Web crawlers of search engines harvest all RDF documents, and it is a known fact that the Web contains documents serializing inconsistent RDF graphs as well as documents that are mutually contradictory yet consistent on their own.</p>
 	</section>
 
 	<section>
-		<h3>The graph IRI denotes the associated graph</h3>
-		<p></p>
+		<h3>The graph name denotes the named graph or the graph</h3>
+		<p>It is common to use the graph name as a way to identify the RDF graph inside the named graphs, or rather, to identify a particular occurrence of the graph. This allows one to describe the graph or the graph source in triples. For instance, one may want to say who is the creator of a particular occurrence of a graph. Assuming this semantics for graph names amounts to saying that each named graph pair is an assertion that sets the <a>referent</a> of the graph name to be the associated graph.</p>
+		<p>Intuitively, this semantics can be seen as quoting the RDF graphs inside the named graphs. In this sense, <code>:alice {:bob  :is  :smart}</code> has to be understood as <q>Alice said: "Bob is smart"</q> which does not entail <q>Alice said: "Bob is intelligent"</q> even though "smart" and "intelligent" can be understood as equivalent.</p>
+
 		<h4 class="formal">Formalization</h4>
-		<h4>Examples</h4>
+		<p>We reuse the notation presented in [[RDF11-MT]]:</p>
+		<blockquote>Suppose I is an interpretation and A is a mapping from a set of blank nodes to the universe IR of I. Define the mapping [I+A] to be I on names, and A on blank nodes on the set: [I+A](x)=I(x) when x is a name and [I+A](x)=A(x) when x is a blank node; and extend this mapping to triples and RDF graphs using the rules given above for ground graphs.</blockquote>
+		<p>A dataset-interpretation <var>I</var> with respect to an entailment regime <var>E</var> is an <var>E</var>-interpretation extended to named graphs and datasets as follows:</p>
+		<ul><li>if <var>(n,g)</var> is a named graph where the graph name is an IRI, then <var>I(n,g)</var> is true if and only if <var>I(n)</var> = <var>(n,g)</var>;</li>
+		<li>if <var>D</var> is a dataset comprising default graph <var>DG</var> and named graphs <var>NG</var>, then <var>I(D)</var> is true if and only if there exists a mapping <var>A</var> from blank nodes to the universe <var>IR</var> of <var>I</var> such that <var>[I+A](DG)</var> is true and, for every named graph <var>(n,g)</var> in <var>NG</var>, <var>[I+A](n)</var> = <var>(n,g)</var>.</li>
+		</ul>
+
+		<h4 class="ex">Examples</h4>
+		<p>Consider the following dataset:</p>
+		<pre class="example">{ }  # empty default graph
+:g1 { :a  :b  :c }
+:g2 { :x  :y  :z }</pre>
+		<p>dataset-entails:</p>
+		<pre class="example">{ }
+_:b { :a  :b  :c }
+:g2 { :x  :y  :z }</pre>
+		<p>but does not dataset-entail:</p>
+		<pre class="example">{ }
+:g1 { []  :b  :c }
+:g2 { :x  :y  :z }</pre>
+		<p>nor:</p>
+		<pre class="example">{ }
+:g1 {  }</pre>
+		<p>If the entailment regime <var>E</var> is RDFS with the recognized datatype <code>xsd:integer</code>, then the following RDF dataset is RDFS-dataset-inconsistent:</p>
+		<pre class="example">{ :age  rdfs:range  xsd:integer .
+:me  :age  :g1 . }  # default graph
+:g1 { :s  :p  :o }</pre>
+		<p>The graph name can be used in triples to attach metadata (here <code>:entails</code> is a custom term that does not enforce a formal constraint, so it is up to the implementation to decide how to treat it):</p>
+		<pre class="example">{ :g1  :published  "2013-08-26"^^xsd:date .
+ :g1  :entails  :g2 .}
+:g1 { :s1  :p1  :o1 .
+      :s2  :p2  :o2 }
+:g2 { :s1  :p1  :o1 }</pre>
+		
+		<h4 class="prop">Properties of this dataset semantics</h4>
+		<p>There are important implications with this semantics. First, the presence of blank nodes as graph names can be problematic because a named graph entails an infinity of other named graphs where only the graph name is changed to a different blank node. Second, graph names have to be handled almost like literals. Unlike that of other IRIs or blank nodes, their denotation is strictly fixed, as is the denotation of literals. Therefore, any entailment regime that recognizes datatypes and uses this semantics has to be able to distinguish graphs from, e.g., integers and strings. Combined with RDFS semantics, it can lead to inconsistencies, as in the last example above.</p>
+		<p>A variant of this dataset semantics imposes that the graph name denotes the RDF graph itself, rather than the pair. This means that two occurrences of the same graph in different named graph pairs actually identify the same thing. Thus, the graph names associated with the same RDF graphs are interchangeable in any triple in this case.</p>
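+		<p>As a hedged sketch of the difference between the two readings (an illustration added here, not taken from the Working Group discussion), consider a dataset in which two graph names are paired with the same RDF graph:</p>
+		<pre class="example">{ :g1  :author  :Bob . }
+:g1 { :a  :b  :c }
+:g2 { :a  :b  :c }</pre>
+		<p>Under the variant where the graph name denotes the RDF graph itself, <code>:g1</code> and <code>:g2</code> denote the same graph, so the dataset dataset-entails the following one. Under the formalization where each name denotes its own (name,graph) pair, it does not:</p>
+		<pre class="example">{ :g2  :author  :Bob . }</pre>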
 	</section>
 
 	<section>
+		<h3>Each named graph defines its own context</h3>
+		<p>Named graphs in RDF datasets are sometimes used to delimit a context in which the triples of the named graphs are true. From the truth of these triples, it is possible to infer knowledge that it is convenient to make part of the named graph. An example of such a situation occurs when one wants to keep track of the evolution of the data with time. Another example is when one wants to allow different viewpoints to be expressed and reasoned with, without creating a conflict or inconsistency. By having inferences done at the named graph level, one can prevent, for instance, triples coming from untrusted parties from influencing trusted knowledge. Yet it does not disallow reasoning with and drawing conclusions from untrusted information.</p>
+		<p>Intuitively, this semantics can be seen as interpreting the RDF graphs inside the named graphs. In this sense, <code>:alice {:bob  :is  :smart}</code> has to be understood as <q>Alice said that Bob is smart</q> which entails <q>Alice said that Bob is intelligent</q> because the two sentences mean the same thing. Neither sentence means that Alice used these actual words.</p>
+
+		<h4 class="formal">Formalization</h4>
+		<p>There are several possible formalizations of this. One way is to interpret the graph name as denoting a graph that represents all that is true in the context of the named graph. In this case, a dataset-interpretation with respect to an entailment regime <var>E</var> is an <var>E</var>-interpretation such that:</p>
+		<ul>
+			<li>for each named graph pair <var>ng</var> = <var>(n,G)</var>, <var>I(ng)</var> is true if <var>I(n)</var> is an RDF graph that <var>E</var>-entails <var>G</var>;</li>
+			<li>for a dataset <var>D</var> = <var>(DG,NG)</var>, <var>I(D)</var> is true if <var>I(DG)</var> is true and, for every named graph <var>ng</var> in <var>NG</var>, <var>I(ng)</var> is true;</li>
+			<li><var>I(D)</var> is false otherwise.</li>
+		</ul>
+
+		<h4 class="ex">Examples</h4>
+		<p>Consider the following dataset:</p>
+		<pre class="example">{ }  # empty default graph
+:g1 { :YoutubeEmployee  rdfs:subClassOf  :GoogleEmployee .
+:steveChen  rdf:type  :YoutubeEmployee . }
+:g2 { :chadHurley  rdf:type  :YoutubeEmployee }</pre>
+		<p>RDFS-dataset-entails:</p>
+		<pre class="example">{ }
+:g1 { :steveChen  rdf:type  :GoogleEmployee }</pre>
+		<p>but does not RDFS-dataset-entail:</p>
+		<pre class="example">{ }
+:g2 { :chadHurley  rdf:type  :GoogleEmployee }</pre>
+		<p>With this semantics too, graph names can be used in triples:</p>
+		<pre class="example">{ :g1  :validAfter  "2006"^^xsd:gYear .
+ :g1  :published  "2013-08-26"^^xsd:date .
+ :g2  :validAt  "2005"^^xsd:gYear .}
+:g1 { :YoutubeEmployee  rdfs:subClassOf  :GoogleEmployee .
+:steveChen  rdf:type  :YoutubeEmployee . }
+:g2 { :chadHurley  rdf:type  :YoutubeEmployee }</pre>
+		<p>(here, <code>:validAfter</code> and <code>:validAt</code> are custom terms that do not enforce a formal constraint, but may be used internally for, e.g., checking the temporal validity of triples in the named graph).</p>
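+		<p>As a further sketch (an illustration added here under the formalization above, not an example from the Working Group discussion), the following dataset contains named graphs whose contents contradict each other when the datatype <code>xsd:integer</code> is recognized. It is nevertheless not dataset-inconsistent under this semantics, since each graph name only has to denote some RDF graph that entails the graph it is paired with:</p>
+		<pre class="example">{ }  # empty default graph
+:g1 { :age  rdfs:range  xsd:integer . }
+:g2 { :bob  :age  "twenty" .}</pre>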
+
+		<h4 class="prop">Properties of this dataset semantics</h4>
+		<p>This semantics assumes that the truth of named graphs is preserved when replacing the RDF graphs inside named graphs with equivalent graphs. This means in particular, that one can normalise literals and still preserve the truth of a named graph. This means too that standard RDF inferences that can be drawn from the RDF graphs inside named graphs can be added to the graph associated with the graph name without impacting the truth of the RDF dataset.</p>
+		<p>While this semantics does not guarantee that reasoning with RDF datasets will preserve the exact triples of an original dataset, it is semantically valid to store both the original and any entailed datasets.</p>
+		<p>An example implementation of such a context-based semantics is Sindice [[Delbru-et-al]].</p>
+		
+		<h4 class="other">Variants of this dataset semantics</h4>
+		<p>There are several variants of this type of dataset semantics:</p>
+		<ul>
+			<li>The default graph is interpreted as universal truth, that is, for a named graph <var>(n,G)</var>, <var>I(n)</var> <var>E</var>-entails the default graph.</li>
+			<li>The graph name does not denote an RDF graph but a resource associated with an RDF graph. This is similar to saying that the name is interpreted as the intension of the graph, and the actual RDF graph is its extension.</li>
+			<li>Each named graph could be associated with its own distinct <var>E</var>-interpretation, with every interpretation required to make its corresponding graph true in order for the dataset to be true.</li>
+		</ul>
+	</section>
+
+	<!--<section>
 		<h3>Each named graph defines its own "context"</h3>
 		<p>Sometimes, the separation of triples into different named graphs is used to indicate truth in different contexts. Each graph describes a "world".</p>
 		<p>In substance, the formalization says that each RDF graph in a dataset is interpreted separately.  This models the fact that different RDF graphs may hold in different contexts.  This way, graphs that have been put in different "named graph pairs" can contradict with each other without making the dataset inconsistent.</p>
 	
 		<h4 class="formal">Formalization</h4>
-		<p>Like RDF interpretations, a dataset-interpretation is relative to a vocabulary V.  Moreover, dataset interpretations are defined with respect to an entailment regime E.  Let KE be the set of all E-interpretations.  The dataset-interpretation of a vocabulary V is a pair (IG,Con) where IG is an E-interpretation of V and Con is a mapping from V to KE.</p>
+		<p>For any entailment regime <var>E</var>, let <var>K(E)</var> be the set of all <var>E</var>-interpretations. A dataset-interpretation with respect to an entailment regime <var>E</var> is a pair <var>(IG,Con)</var> where <var>IG</var> is an <var>E</var>-interpretation and <var>Con</var> is a partial mapping from to <var>K(E)</var>.</p>
 		<p>The truth of a dataset for a dataset-interpretation I = (IG,Con) is defined as follows:</p>
 		<ul>
 			<li>for a named graph pair ng = (n,G), I(ng) is true if Con(n) is defined and Con(n)(G) is true;</li>
@@ -260,33 +374,16 @@
 			<li>I(D) is false otherwise.</li>
 		</ul>
 		<p>Following standard definitions, we say that a dataset D1 entails a dataset D2 if every dataset-interpretation I that makes D1 true also makes D2 true.</p>
-	</section>
-	
-	<section>
-		<h3>Each named graph is a hypothetical theory</h3>
-		<p></p>
-		<h4 class="formal">Formalization</h4>
-		<p>A dataset-interpretation of a vocabulary V is a pair (IG,IGEXT) where IG is an E-interpretation of V and IGEXT is a mapping from V to the set of RDF graphs.</p>
-		<p>The truth of a dataset for a dataset-interpretation I = (IG,IGEXT) is defined as follows:</p>
-		<ul>
-			<li>for a named graph pair ng = (n,G), I(ng) is true if IGEXT(n) is defined and IGEXT(n) E-entails G;</li>
-			<li>for a dataset D = (DG,G), I(D) is true if IG(G) is true and for all named graph ng in NG, I(ng) is true;
-			<li>I(D) is false otherwise.</li>
-		</ul>
-		<p></p>
+	</section>-->
 
-		<h4 class="ex">Examples</h4>
-	</section>
-
-
-	<section>
+	<!--<section>
 		<h3>Named graphs as contexts, and the default graph is universal truth</h3>
 		<p>In this case, the named graphs are used to hold statements that are only true in certain circumstances, while the default graph holds in all cases. For instance, terminological knowledge (such as a hierarchy of classes or properties) may be considered universal, while assertional knowledge (facts about instances) changes over time or from source to source.</p>
 
 		<h4 class="formal">Formalization</h4>
 		
 		<h4 class="ex">Examples</h4>
-	</section>
+	</section>-->
 	
 	<section>
 		<h3>Quad semantics</h3>
@@ -294,37 +391,70 @@
 		<p>This semantics extends the semantics of RDF rather than simply reusing it.</p>
 
 		<h4 class="formal">Formalization</h4>
-		<p>A quad-interpretration of a vocabulary V is a tuple (IR,IP,IEXT,IS,IL,LV) where IR, IP, IS, IL and LV are defined as in RDF and IEXT is a mapping from IP into the powerset of IR &times; IR union IR &times; IR &times; IR.</p>
+		<p>A quad-interpretation is a tuple <var>(IR,IP,IEXT,IS,IL,LV)</var> where <var>IR</var>, <var>IP</var>, <var>IS</var>, <var>IL</var> and <var>LV</var> are defined as in RDF and <var>IEXT</var> is a mapping from <var>IP</var> into the powerset of <var>IR &times; IR union IR &times; IR &times; IR</var>.</p>
 
-		<p>Since this option modifies the notion of simple-interpretation, instead of simply referring to it, which is the basis for all E-interpretations in any entailment regime E, it is not clear how it can be extended to arbitrary entailment regimes.</p>
+		<p>Since this option modifies the notion of simple-interpretation, which is the basis for all <var>E</var>-interpretations in any entailment regime <var>E</var>, it is not clear how it can be extended to arbitrary entailment regimes. For instance, does the following quad set:</p>
+		<pre class="example">:a  rdf:type  :c  :x .
+:c  rdfs:subClassOf  :d  :x .</pre>
+		<p>RDFS-dataset-entail the quad below?</p>
+		<pre class="example">:a  rdf:type  :d  :x .</pre>
 		
-		<h4 class="ex">Examples</h4>
+		<h4 class="prop">Properties of this dataset semantics</h4>
+		<p>With this semantics, all inferences that are valid with normal RDF triples are preserved, but it is necessary to extend RDFS in order to accommodate ternary relations. There are several existing proposals that extend this quad semantics by dealing with a specific "dimension", such as time, uncertainty, or provenance. For instance, temporal RDF [[TEMPORAL-RDF]] uses the fourth element to denote a time frame, and reasoning can be performed per time frame. Special semantic rules allow one to combine triples in overlapping time frames. Fuzzy RDF extends the semantics to deal with uncertainty. stRDF extends temporal RDF to deal with spatial information. Annotated RDF generalizes the previous proposals.</p>
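+		<p>As a purely illustrative sketch, and not an answer endorsed by this document, one possible extension lifts the RDFS subclass rule (rdfs9) to quads under the assumption that both premises must share the same fourth element. The function name <code>lift_rdfs9</code> and the tuple encoding of quads are hypothetical choices made for this example only.</p>

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def lift_rdfs9(quads):
    """Close a set of (s, p, o, fourth) quads under a quad-lifted rdfs9 rule.

    Assumption: the rule only fires when both premises share the fourth element.
    """
    closed = set(quads)
    while True:
        # Derive (s type d, g) from (s type c, g) and (c subClassOf d, g).
        derived = {
            (s, RDF_TYPE, d, g)
            for (s, p, c, g) in closed
            if p == RDF_TYPE
            for (c2, p2, d, g2) in closed
            if p2 == SUBCLASS_OF and c2 == c and g2 == g
        }
        if derived.issubset(closed):
            return closed
        closed |= derived


quads = {
    (":a", RDF_TYPE, ":c", ":x"),
    (":c", SUBCLASS_OF, ":d", ":x"),
}
print((":a", RDF_TYPE, ":d", ":x") in lift_rdfs9(quads))
```

+		<p>Under this assumed rule the example quad set would entail the third quad; a different lifting of RDFS to ternary relations could give a different answer.</p>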
+	</section>
+
+	<section>
+		<h3>Quoted graphs</h3>
+		<p>Quoted graphs are a way to associate information with a specific RDF graph without constraining the relationship between a graph name and the graph associated with it in a dataset. An RDF graph is "quoted" by using a literal whose lexical form is a syntactic expression of the graph. For instance:</p>
+		<pre class="example">{ :g  :quotes  ":a  :b  []"^^:turtle . }
+:g { :b  rdf:type  rdf:Property .
+ :a  :b  _:x . }</pre>
+		<p>This technique allows one to assume a dataset semantics of contexts (as in Section ?) and still preserve an initial version of a graph. However, quoting big graphs may be cumbersome and would require a custom datatype to be recognized.</p>
 
 	</section>
 
 	<section>
+		<h3>Relationship with SPARQL entailment regime</h3>
+		<p>There is a strong relationship between SPARQL ASK queries with an entailment regime [[SPARQL_ER]] and inferences in the regime. If an ASK query does not contain variables and its WHERE clause only contains a basic graph pattern, then the query can be seen as an RDF graph. If such a graph query <var>Q</var> returns <code>true</code> when issued against an RDF graph <var>G</var> with entailment regime <var>E</var>, then <var>G</var> <var>E</var>-entails <var>Q</var>. If it returns <code>false</code>, then <var>G</var> does not <var>E</var>-entail <var>Q</var>.</p>
+		<p>A dataset semantics can also be compared to what ASK queries return when they do not contain variables but may contain basic graph patterns or <code>GRAPH</code> graph patterns. For instance, consider the dataset:</p>
+		<pre class="example">{ }
+:g1 { :x  rdf:type  :c .
+ :c  rdfs:subClassOf  :d . }
+:g2 { :y  rdf:type  :c . }</pre>
+		<p>Then the query:</p>
+		<pre class="example">ASK WHERE {
+    GRAPH :g1 { :x  rdf:type  :d }
+}</pre>
+		<p>with RDFS entailment regime would answer <code>true</code>, but the query:</p>
+		<pre class="example">ASK WHERE {
+    GRAPH :g1 { :x  rdf:type  :d }
+    GRAPH :g2 { :y  rdf:type  :d }
+}</pre>
+		<p>would answer <code>false</code>.</p>
+		<p>This can lead to a classification of dataset semantics in terms of whether they are compatible with SPARQL ASK queries or not. It can be noted that a semantics where each named graph defines its own context is "SPARQL-ASK-compatible", while a semantics where the graph name denotes the graph or named graph is not compatible in this sense.</p>
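+		<p>The behaviour of the two queries above can be reproduced with a minimal sketch, assuming that each <code>GRAPH</code> pattern is matched only against the closure of its own named graph and that only the rdfs9 subclass rule is modelled. The helper names <code>close_graph</code> and <code>ask</code>, and the dictionary encoding of the dataset, are hypothetical and not part of SPARQL.</p>

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def close_graph(graph):
    """Close a set of (s, p, o) triples under the rdfs9 subclass rule."""
    closed = set(graph)
    while True:
        derived = {
            (s, RDF_TYPE, d)
            for (s, p, c) in closed
            if p == RDF_TYPE
            for (c2, p2, d) in closed
            if p2 == SUBCLASS_OF and c2 == c
        }
        if derived.issubset(closed):
            return closed
        closed |= derived


def ask(dataset, patterns):
    """Evaluate a variable-free ASK query given as {graph name: set of triples}.

    Each GRAPH pattern is checked against the closure of its own graph only,
    so triples in one named graph never contribute to another.
    """
    for name, triples in patterns.items():
        if not triples.issubset(close_graph(dataset.get(name, set()))):
            return False
    return True


dataset = {
    ":g1": {(":x", RDF_TYPE, ":c"), (":c", SUBCLASS_OF, ":d")},
    ":g2": {(":y", RDF_TYPE, ":c")},
}
query1 = {":g1": {(":x", RDF_TYPE, ":d")}}
query2 = {":g1": {(":x", RDF_TYPE, ":d")}, ":g2": {(":y", RDF_TYPE, ":d")}}

print(ask(dataset, query1))  # True
print(ask(dataset, query2))  # False
```

+		<p>The second query fails because the subclass axiom lives only in <code>:g1</code> and is not visible when matching against <code>:g2</code>.</p>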
 	</section>
 </section>
 
 <section id="declaring">
 	<h2>Declaring the intended semantics</h2>
 	
-	<p>In spite of the RDF Working Group's mission to define a semantics for a multiple graph data model, none semantics presented before could obtained consensus. Choosing one or another of the propositions before would have gone against deployed implementations. Therefore, the Working Group discussed the possibility to define several semantics, among  which an implementation could choose, and provide the means to declare which semantics is adopted.</p>
+	<p>The RDF Working Group did not define a formal semantics for a multiple graph data model because none of the semantics presented above could obtain consensus. Choosing one or another of these proposals would have gone against some deployed implementations. Therefore, the Working Group discussed the possibility of defining several semantics, among which an implementation could choose, and of providing the means to declare which semantics is adopted.</p>
 	<p>This approach was eventually not retained, because of a lack of experience and possibly a lack of utility, so there is no definitive option for it. Nonetheless, for completeness, we describe possible solutions here.</p>
 
 	<h3>Using vocabularies</h3>
 	<p>A dataset can be described in RDF using vocabularies like voiD [[VOID]] and the SPARQL service description vocabulary. VoiD is used to describe how a collection of RDF triples is organized in a web site or across web sites, giving information about the size of the datasets, the location of the dump files, the IRI of the query endpoints, and so on. The notion of dataset in voiD is more informal and broader than that of RDF dataset. However, an RDF dataset and the graphs in it can be described as voiD datasets, and the information can be completed with the SPARQL service description vocabulary:</p>
-	<pre class="example"><script type="text/turtle">@prefix er: <http://www.w3.org/ns/entailment> .
-@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
+	<pre class="example">@prefix er: &lt;http://www.w3.org/ns/entailment/&gt; .
+@prefix sd: &lt;http://www.w3.org/ns/sparql-service-description#&gt; .
+[]  a sd:Dataset;
+    sd:defaultEntailmentRegime er:RDF;
+    sd:namedGraph [
+        sd:name &lt;http://example.com/ng1&gt;;
+        sd:entailmentRegime er:RDFS
+    ] .</pre>
+	<p>A vocabulary specifically tailored for describing the intended dataset semantics could be defined in a future specification.</p>
 
-[]
-    a sd:Dataset ;
-    sd:defaultEntailmentRegime er:RDF;
-	sd:namedGraph [
-		sd:name "http://example.com/ng1";
-		sd:entailmentRegime er:RDFS
-	] .</pre>
-	
+	<h3>Using other mechanisms</h3>
+	<p>Communication of the intended semantics could be performed in various ways, from having the author tell the consumers directly, to inventing a protocol for this. Use of the HTTP protocol and content negotiation could be a possible way too. Special syntactic markers in the concrete serialization of datasets could convey the intended meaning. All of those are solutions that do not follow current practices.</p>
 </section>
 
 <section id="references">
@@ -334,7 +464,8 @@
 <section class="appendix informative" id="changes">
   <h2>Changes</h2>
   <ul>
-    <li>2013-01-28:  Initial editor's draft.</li>
+    <li>2013-09-17:  All sections revised. Second published editor's draft.</li>
+	<li>2013-01-28:  Initial editor's draft.</li>
   </ul>
 </section>
author	AZ
	Tue, 17 Sep 2013 09:13:18 +0200
changeset 1087	fe8fbf908d32
parent 1005	9391a2bc14c1
child 1089	afba5f8bebc5