Finished draft of well-formed checks
author Dave Reynolds <dave@epimorphics.com>
Tue, 05 Mar 2013 11:19:32 +0000
changeset 337 308652c0b060
parent 336 b4ffa5812d09
child 338 fb948cc514ed
data-cube/index.html
--- a/data-cube/index.html	Tue Mar 05 00:21:52 2013 +0000
+++ b/data-cube/index.html	Tue Mar 05 11:19:32 2013 +0000
@@ -862,7 +862,7 @@
 
 
 <section id="dsd-mm-dim">
-<h4>Measure dimension</h4>
+<h4><dfn>Measure dimension</dfn></h4>
   
 <p>This approach restricts observations to having a single measured value but allows
   a data set to carry multiple measures by adding an extra dimension, a <em>measure dimension</em>.
@@ -1661,6 +1661,14 @@
 <section id="wf">
 <h2>Well-formed cubes</h2>
 
+<div class="note">
+This section is At Risk. The working group believes these criteria to
+be correct and compatible with earlier versions of the RDF Data Cube vocabulary.
+However, as a new addition they have not received as much
+scrutiny as other parts of the specification. If problems are uncovered
+during the Last Call process the working group may retract all or part of this section. 
+</div>
+
 <p>An instance of an RDF Data Cube should conform to a set of
   integrity constraints which we define in this section.</p>
 
@@ -1671,72 +1679,533 @@
 <p>A <dfn>well-formed abbreviated</dfn> RDF Data Cube is an RDF
   graph which, when expanded using
   the <a href="#normalize-algorithm">normalization algorithm</a>
-yields a <a>well-formed RDF Data Cube</a>.</p>
+  yields a <a>well-formed RDF Data Cube</a>.</p>
 
 <section id="wf-rules">
 <h3>Integrity constraints</h3>
 
 <p>Each integrity constraint is expressed as narrative prose and, where possible, a SPARQL
-  [[!SPARQL-QUERY-11]] ASK query which will return <em>true</em>
-  if the constraint has been violated. Using SPARQL queries to
-  define the integrity constraints does not imply that integrity
+  [[!SPARQL-QUERY-11]] ASK query. If the ASK query is applied to an RDF graph then it
+  will return <em>true</em> if that graph contains one or more RDF Data Cube instances which
+  violate the corresponding constraint.
+  Using SPARQL queries to
+  express the integrity constraints does not imply that integrity
   checking must be performed this way. Implementations are free
   to use alternative query formulations or alternative implementation
-  techniques to perform equivalent checks. For example the queries given
-  here may not be practically scalable to large cubes.</p>
+  techniques to perform equivalent checks.</p>
+
+<p>Each integrity constraint query assumes the following set of prefix bindings:</p>
+<pre>
+PREFIX rdf:     &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+PREFIX rdfs:    &lt;http://www.w3.org/2000/01/rdf-schema#>
+PREFIX skos:    &lt;http://www.w3.org/2004/02/skos/core#>
+PREFIX qb:      &lt;http://purl.org/linked-data/cube#>
+PREFIX xsd:     &lt;http://www.w3.org/2001/XMLSchema#>
+PREFIX owl:     &lt;http://www.w3.org/2002/07/owl#>
+</pre>
 
 <p>The complete set of constraints is listed below.</p>
 
-<table id="ic-0" class="bordered-table">
-  <thead>
-    <tr>
-      <th>IC-0</th>
-      <th>Datatype consistency</th>
-    </tr>
-  </thead>
+<h4 id="ic-0">IC-0. Datatype consistency</h4>
+<p>
+The RDF graph must be consistent under RDF D-entailment [[!RDF-MT]]
+using a datatype map containing all the datatypes used within the graph. 
+</p>
+
+<h3 id="ic-1">IC-1. Unique DataSet</h3>
+<p>
+Every <code><a>qb:Observation</a></code> has exactly one associated <code><a>qb:DataSet</a></code>.
+</p> 
+<table class="bordered-table">
   <tbody>
-    <tr><td colspan="2">
-The RDF graph must be consistent under RDF D-entailment [[!RDF-MT]]
-using a datatype map containing all the datatypes used within the graph.
-    </td></tr>
+    <tr><td><pre>
+ASK {
+  {
+    # Check observation has a data set
+    ?obs a qb:Observation .
+    FILTER NOT EXISTS { ?obs qb:dataSet ?dataset1 . }
+  } UNION {
+    # Check has just one data set
+    ?obs a qb:Observation ;
+       qb:dataSet ?dataset1, ?dataset2 .
+    FILTER (?dataset1 != ?dataset2)
+  }
+}
+    </pre></td></tr>
+  </tbody>
+</table> 
+
+<h3 id="ic-2">IC-2. Unique DSD</h3>
+<p>
+Every <code><a>qb:DataSet</a></code> has exactly one associated <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+<table class="bordered-table">
+  <tbody> 
+    <tr><td><pre>
+ASK {
+  {
+    # Check dataset has a dsd
+    ?dataset a qb:DataSet .
+    FILTER NOT EXISTS { ?dataset qb:structure ?dsd . }
+  } UNION { 
+    # Check has just one dsd
+    ?dataset a qb:DataSet ;
+       qb:structure ?dsd1, ?dsd2 .
+    FILTER (?dsd1 != ?dsd2)
+  }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-3">IC-3. DSD includes measure</h3>
+<p>
+Every <code><a>qb:DataStructureDefinition</a></code> must include at least one declared measure.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  ?dsd a qb:DataStructureDefinition .
+  FILTER NOT EXISTS { ?dsd qb:component [qb:componentProperty [a qb:MeasureProperty]] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-4">IC-4. Dimensions have range</h3>
+<p>
+Every dimension declared in a <code><a>qb:DataStructureDefinition</a></code> must have a declared <code><a>rdfs:range</a></code>.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  ?dim a qb:DimensionProperty .
+  FILTER NOT EXISTS { ?dim rdfs:range ?range }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-5">IC-5. Concept dimensions have code lists</h3>
+<p>
+Every dimension with range <code><a>skos:Concept</a></code> must have a <code><a>qb:codeList</a></code>.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  ?dim a qb:DimensionProperty ;
+       rdfs:range skos:Concept .
+  FILTER NOT EXISTS { ?dim qb:codeList [] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-6">IC-6. Only attributes may be optional</h3>
+<p>
+The only components of
+a <code><a>qb:DataStructureDefinition</a></code> that may be marked as
+optional, using <code><a>qb:componentRequired</a> false</code>, are attributes.
+</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  ?dsd qb:component ?componentSpec .
+  ?componentSpec qb:componentRequired "false"^^xsd:boolean ;
+                 qb:componentProperty ?component .
+  FILTER NOT EXISTS { ?component a qb:AttributeProperty }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-7">IC-7. Slice Keys must be declared</h3>
+<p>
+Every <code><a>qb:SliceKey</a></code> must be associated with a <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?sliceKey a qb:SliceKey .
+    FILTER NOT EXISTS { [a qb:DataStructureDefinition] qb:sliceKey ?sliceKey }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-8">IC-8. Slice Keys consistent with DSD</h3>
+<p>
+Every <code><a>qb:componentProperty</a></code> on a <code><a>qb:SliceKey</a></code> must also be declared as a <code><a>qb:component</a></code> of the associated <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  ?slicekey a qb:SliceKey;
+      qb:componentProperty ?prop .
+  ?dsd qb:sliceKey ?slicekey .
+  FILTER NOT EXISTS { ?dsd qb:component [qb:componentProperty ?prop] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-9">IC-9. Unique slice structure</h3>
+<p>
+Each <code><a>qb:Slice</a></code> must have exactly one associated <code><a>qb:sliceStructure</a></code>.
+</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  {
+    # Slice has a key
+    ?slice a qb:Slice .
+    FILTER NOT EXISTS { ?slice qb:sliceStructure ?key }
+  } UNION {
+    # Slice has just one key
+    ?slice a qb:Slice ;
+           qb:sliceStructure ?key1, ?key2;
+    FILTER (?key1 != ?key2)
+  }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+
+<h3 id="ic-10">IC-10. Slice dimensions complete</h3>
+<p>
+Every <code><a>qb:Slice</a></code> must have a value for every dimension declared in its <code><a>qb:sliceStructure</a></code>.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  ?slice qb:sliceStructure [qb:componentProperty ?dim] .
+  FILTER NOT EXISTS { ?slice ?dim [] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-11">IC-11. All dimensions required</h3>
+<p>
+Every <code><a>qb:Observation</a></code> has a value for each dimension declared in its associated <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+    ?dim a qb:DimensionProperty;
+    FILTER NOT EXISTS { ?obs ?dim [] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-12">IC-12. No duplicate observations</h3>
+<p>
+No two <code><a>qb:Observation</a></code>s in the same <code><a>qb:DataSet</a></code> may have the same value for all dimensions.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  FILTER( ?allEqual )
+  {
+    # For each pair of observations test if all the dimension values are the same
+    SELECT (MIN(?equal) AS ?allEqual) WHERE {
+        ?obs1 qb:dataSet ?dataset .
+        ?obs2 qb:dataSet ?dataset .
+        FILTER (?obs1 != ?obs2)
+        ?dataset qb:structure/qb:component/qb:componentProperty ?dim .
+        ?dim a qb:DimensionProperty .
+        ?obs1 ?dim ?value1 .
+        ?obs2 ?dim ?value2 .
+        BIND( ?value1 = ?value2 AS ?equal)
+    } GROUP BY ?obs1 ?obs2
+  }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-13">IC-13. Required attributes</h3>
+<p>
+Every <code><a>qb:Observation</a></code> has a value for each declared attribute that is not explicitly marked as optional.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?obs qb:dataSet/qb:structure/qb:component ?component .
+    ?component qb:componentRequired "true"^^xsd:boolean ;
+               qb:componentProperty ?attr .
+    FILTER NOT EXISTS { ?obs ?attr [] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-14">IC-14. All measures present</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which does not use a <a>Measure dimension</a> then each individual <code><a>qb:Observation</a></code> must have a value for every declared measure.
+</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    # Observation in a non-measureType cube
+    ?obs qb:dataSet/qb:structure ?dsd .
+    FILTER NOT EXISTS { ?dsd qb:component/qb:componentProperty qb:measureType }
+
+    # verify every measure is present
+    ?dsd qb:component/qb:componentProperty ?measure .
+    ?measure a qb:MeasureProperty;
+    FILTER NOT EXISTS { ?obs ?measure [] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-15">IC-15. Measure dimension consistent</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which uses a <a>Measure dimension</a> then each <code><a>qb:Observation</a></code> must have a value for the measure corresponding to its given <code><a>qb:measureType</a></code>.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    # Observation in a measureType-cube
+    ?obs qb:dataSet/qb:structure ?dsd ;
+         qb:measureType ?measure .
+    ?dsd qb:component/qb:componentProperty qb:measureType .
+    # Must have value for its measureType
+    FILTER NOT EXISTS { ?obs ?measure [] }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-16">IC-16. Single measure on measure dimension observation</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which uses a <a>Measure dimension</a> then each <code><a>qb:Observation</a></code> must only have a value for one measure (by IC-15 this will be the measure corresponding to its <code><a>qb:measureType</a></code>).
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    # Observation with measureType
+    ?obs qb:dataSet/qb:structure ?dsd ;
+         qb:measureType ?measure ;
+         ?omeasure [] .
+    # Any measure on the observation
+    ?dsd qb:component/qb:componentProperty qb:measureType ;
+         qb:component/qb:componentProperty ?omeasure .
+    ?omeasure a qb:MeasureProperty .
+    # Must be the same as the measureType
+    FILTER (?omeasure != ?measure)
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-17">IC-17. All measures present in measure dimension cube</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which uses a <a>Measure dimension</a> then if there is an Observation for some combination of non-measure dimensions then there must be other Observations with the same non-measure dimension values for each of the declared measures.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+  {
+      # Count number of other measures found at each point 
+      SELECT ?numMeasures (COUNT(?obs2) AS ?count) WHERE {
+          {
+              # Find the DSDs and check how many measures they have
+              SELECT ?dsd (COUNT(?m) AS ?numMeasures) WHERE {
+                  ?dsd qb:component/qb:componentProperty ?m.
+                  ?m a qb:MeasureProperty .
+              } GROUP BY ?dsd
+          }
+        
+          # Observation in measureType cube
+          ?obs1 qb:dataSet/qb:structure ?dsd;
+                qb:dataSet ?dataset ;
+                qb:measureType ?m1 .
+    
+          # Other observation at same dimension value
+          ?obs2 qb:dataSet ?dataset ;
+                qb:measureType ?m2 .
+          FILTER NOT EXISTS { 
+              ?dsd qb:component/qb:componentProperty ?dim .
+              FILTER (?dim != qb:measureType)
+              ?dim a qb:DimensionProperty .
+              ?obs1 ?dim ?v1 . 
+              ?obs2 ?dim ?v2. 
+              FILTER (?v1 != ?v2)
+          }
+          
+      } GROUP BY ?obs1 ?numMeasures
+        HAVING (?count != ?numMeasures)
+  }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-18">IC-18. Consistent data set links</h3>
+<p>
+If a <code><a>qb:DataSet</a></code> D has a <code><a>qb:slice</a></code> S, and S has a <code><a>qb:observation</a></code> O, then the <code><a>qb:dataSet</a></code> corresponding to O must be D.
+</p>
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?dataset qb:slice       ?slice .
+    ?slice   qb:observation ?obs .
+    FILTER NOT EXISTS { ?obs qb:dataSet ?dataset . }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-19">IC-19. Codes from code list</h3>
+<p>
+If a dimension property has a <code><a>qb:codeList</a></code>, then the value of the dimension property on every <code><a>qb:Observation</a></code> must be in the code list.
+</p>
+<p>The following integrity check queries must be applied to an RDF graph which contains the 
+definition of the code list as well as the RDF Data Cube to be checked. In the case
+of a <code>skos:ConceptScheme</code> then each concept must be linked to the scheme using
+<code>skos:inScheme</code>. In the case of a <code>skos:Collection</code> then the
+collection must link to each concept using <code>skos:member</code> (i.e. if the
+collection uses <code>skos:memberList</code> then the entailment of <code>skos:member</code>
+values defined by <a href="http://www.w3.org/TR/2009/REC-skos-reference-20090818/#S36">S36</a>
+in [[!SKOS-REFERENCE]] must be materialized).</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+    ?dim a qb:DimensionProperty ;
+        qb:codeList ?list .
+    ?list a skos:ConceptScheme .
+    ?obs ?dim ?v .
+    FILTER NOT EXISTS { ?v skos:inScheme ?list }
+}
+
+ASK {
+    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+    ?dim a qb:DimensionProperty ;
+        qb:codeList ?list .
+    ?list a skos:Collection .
+    ?obs ?dim ?v .
+    FILTER NOT EXISTS { ?list skos:member ?v }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-20">IC-20. Codes from hierarchy</h3>
+<p>
+If a dimension property has
+a <code><a>qb:HierarchicalCodeList</a></code> with a non-blank <code><a>qb:parentChildProperty</a></code> then the value of that dimension property on every <code><a>qb:Observation</a></code> must be reachable from a root of the hierarchy using zero or more hops along the <code><a>qb:parentChildProperty</a></code> links.
+</p>
+<p>
+This check cannot be made by a simple fixed SPARQL query. Instead a
+query template is supplied. 
+An instance of the template should be generated
+for each <code><a>qb:HierarchicalCodeList</a></code> which has an IRI
+value for its  <code><a>qb:parentChildProperty</a></code>.
+That is for each binding of <code>?p</code> in the following
+instantiation query:</p>
+<pre>
+SELECT ?p WHERE {
+    ?hierarchy a qb:HierarchicalCodeList ;
+               qb:parentChildProperty ?p .
+    FILTER ( isIRI(?p) )
+}
+</pre>
+
+<p>The template is then instantiated by replacing the
+  string <code>$p</code> by the IRI found by the
+  instantiation query. The template is:</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+    ?dim a qb:DimensionProperty ;
+        qb:codeList ?list .
+    ?list a qb:HierarchicalCodeList .
+    ?obs ?dim ?v .
+    FILTER NOT EXISTS { ?list qb:hierarchyRoot/&lt;$p>* ?v }
+}
+    </pre></td></tr>
+  </tbody>
+</table>
+
+<h3 id="ic-21">IC-21. Codes from hierarchy (inverse)</h3>
+<p>
+If a dimension property has a <code><a>qb:HierarchicalCodeList</a></code> with an inverse <code><a>qb:parentChildProperty</a></code> then the value of that dimension property on every <code><a>qb:Observation</a></code> must be reachable from a root of the hierarchy using zero or more hops along the inverse <code><a>qb:parentChildProperty</a></code> links.
+</p>
+
+<p>
+This check cannot be made by a simple fixed SPARQL query. Instead a
+query template is supplied. 
+An instance of the template should be generated
+for each <code><a>qb:HierarchicalCodeList</a></code> which has a
+blank-node
+value for its  <code><a>qb:parentChildProperty</a></code>, with an 
+associated inverse property.
+That is for each binding of <code>?p</code> in the following
+instantiation query:</p>
+<pre>
+SELECT ?p WHERE {
+    ?hierarchy a qb:HierarchicalCodeList;
+               qb:parentChildProperty ?pcp .
+    FILTER( isBlank(?pcp) )
+    ?pcp  owl:inverseOf ?p .
+    FILTER( isIRI(?p) )
+}
+</pre>
+
+<p>The template is then instantiated by replacing the
+  string <code>$p</code> by the IRI found by the
+  instantiation query. The template is:</p>
+
+<table class="bordered-table">
+  <tbody>
+    <tr><td><pre>
+ASK {
+    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+    ?dim a qb:DimensionProperty ;
+        qb:codeList ?list .
+    ?list a qb:HierarchicalCodeList .
+    ?obs ?dim ?v .
+    FILTER NOT EXISTS { ?list qb:hierarchyRoot/(^&lt;$p>)* ?v }
+}
+    </pre></td></tr>
   </tbody>
 </table>
 
 </section>
-<pre>
-
-1. Every Observation has a unique associated DataSet
-
-2. Every DataSet has a unique associated DataStructureDefinition
-3. Every DSD must include a measure
-4. Every Dimension must have a declared range
-5. Every Dimension with range skos:concept must have a codeList
-6. Only attributes may be marked optional
-
-7. Every SliceKey must be associated with a DataStructureDefinition
-8. SliceKey components must be subset of the DSD's component
-9. Every Slice must have exactly one sliceStructure
-
-10. Every Slice must have a value for every dimension in its sliceStructure
-
-11. Every observation has a value for each declared dimension
-12. No two observations in the same cube may have the same value for all dimensions
-13. Every observation has a value for each non-optional attribute
-14. Every observation in a non-measureType cube must have a value for every measure
-15. Every observation in a measureType cube must have a measure value corresponding to its measureType
-16. Every observation in a measureType cube must have a value for only one measure
-17. In a measureType cube if there is an observation for one measure, there must be a corresponding observation for all other measures at the same dimension values
-
-18. if A qb:slice B and B qb:observation C then C qb:dataSet A
-
-19. If a dimension property has a qb:codeList, then the value of the dimension property on every observation must be in the code list
-
-20. If a dimension property has a hierarchical code list with a parentChildProperty then the value of that dimension property on every observation must be reachable from a root of hierarchy using zero or more hops along the parentChildProperty links.
-21. If a dimension property has a hierarchical code list with an inverse parentChildProperty then the value of that dimension property on every observation must be reachable from a root of hierarchy using zero or more hops along the inverse parentChildProperty links.
-
-</pre>
-
-Note that 19-21 need access to code list, with skos:inScheme for schemes and skos:member for collections (unpack ordered members if necessary)
 
 </section>
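
The template instantiation that IC-20 and IC-21 describe (replace the string `$p` with each IRI returned by the instantiation query, and assume the shared prefix bindings) could be sketched roughly as below. This is only an illustration in Python, not part of the changeset: the function name `instantiate`, the IRI validity check, and the use of `skos:narrower` as a sample property are assumptions made for the example. Running the resulting query against a graph is left to whatever SPARQL engine is in use.

```python
# Prefix bindings that the section says every constraint query assumes.
PREFIXES = """\
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX qb:      <http://purl.org/linked-data/cube#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:     <http://www.w3.org/2002/07/owl#>
"""

# IC-20 template as given in the section; "$p" is the placeholder.
IC20_TEMPLATE = """\
ASK {
    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
    ?dim a qb:DimensionProperty ;
        qb:codeList ?list .
    ?list a qb:HierarchicalCodeList .
    ?obs ?dim ?v .
    FILTER NOT EXISTS { ?list qb:hierarchyRoot/<$p>* ?v }
}
"""

# IC-21 template: same shape, but follows the inverse of the property.
IC21_TEMPLATE = IC20_TEMPLATE.replace("<$p>*", "(^<$p>)*")

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\')


def instantiate(template: str, property_iri: str) -> str:
    """Return a complete query: prefix bindings plus the template with
    "$p" replaced by property_iri (hypothetical helper, not from the spec)."""
    # Assumption: reject IRIs that would break out of the <...> delimiters.
    if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in property_iri):
        raise ValueError(f"not usable as a SPARQL IRI: {property_iri!r}")
    return PREFIXES + template.replace("$p", property_iri)


if __name__ == "__main__":
    # Sample binding of ?p; in practice it comes from the instantiation query.
    print(instantiate(IC20_TEMPLATE, "http://www.w3.org/2004/02/skos/core#narrower"))
```

One query is produced per binding of `?p`, so a graph with several hierarchical code lists yields several ASK queries, and the constraint is violated if any of them returns true.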