--- a/data-cube/index.html Tue Mar 05 00:21:52 2013 +0000
+++ b/data-cube/index.html Tue Mar 05 11:19:32 2013 +0000
@@ -862,7 +862,7 @@
<section id="dsd-mm-dim">
-<h4>Measure dimension</h4>
+<h4><dfn>Measure dimension</dfn></h4>
<p>This approach restricts observations to having a single measured value but allows
a data set to carry multiple measures by adding an extra dimension, a <em>measure dimension</em>.
@@ -1661,6 +1661,14 @@
<section id="wf">
<h2>Well-formed cubes</h2>
+<div class="note">
+This section is At Risk. The working group believes these criteria to
+be correct and compatible with earlier versions of the RDF Data Cube vocabulary.
+However, as a new addition they have not received as much
+scrutiny as other parts of the specification. If problems are uncovered
+during the Last Call process the working group may retract all or part of this section.
+</div>
+
<p>An instance of an RDF Data Cube should conform to a set of
integrity constraints which we define in this section.</p>
@@ -1671,72 +1679,533 @@
<p>A <dfn>well-formed abbreviated</dfn> RDF Data Cube is an RDF
graph which, when expanded using
the <a href="#normalize-algorithm">normalization algorithm</a>
-yields a <a>well-formed RDF Data Cube</a>.</p>
+ yields a <a>well-formed RDF Data Cube</a>.</p>
<section id="wf-rules">
<h3>Integrity constraints</h3>
<p>Each integrity constraint is expressed as narrative prose and, where possible, a SPARQL
- [[!SPARQL-QUERY-11]] ASK query which will return <em>true</em>
- if the constraint has been violated. Using SPARQL queries to
- define the integrity constraints does not imply that integrity
+ [[!SPARQL-QUERY-11]] ASK query. If the ASK query is applied to an RDF graph then it
+ will return <em>true</em> if that graph contains one or more RDF Data Cube instances which
+ violate the corresponding constraint.
+ Using SPARQL queries to
+ express the integrity constraints does not imply that integrity
checking must be performed this way. Implementations are free
to use alternative query formulations or alternative implementation
- techniques to perform equivalent checks. For example the queries given
- here may not be practically scalable to large cubes.</p>
+ techniques to perform equivalent checks.</p>
+
+<p>Each integrity constraint query assumes the following set of prefix bindings:</p>
+<pre>
+PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
+PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
+PREFIX qb: <http://purl.org/linked-data/cube#>
+PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
+PREFIX owl: <http://www.w3.org/2002/07/owl#>
+</pre>
<p>The complete set of constraints is listed below.</p>
-<table id="ic-0" class="bordered-table">
- <thead>
- <tr>
- <th>IC-0</th>
- <th>Datatype consistency</th>
- </tr>
- </thead>
+
+<h3 id="ic-0">IC-0. Datatype consistency</h3>
+<p>
+The RDF graph must be consistent under RDF D-entailment [[!RDF-MT]]
+using a datatype map containing all the datatypes used within the graph.
+</p>
+
+<h3 id="ic-1">IC-1. Unique DataSet</h3>
+<p>
+Every <code><a>qb:Observation</a></code> has exactly one associated <code><a>qb:DataSet</a></code>.
+</p>
+<table class="bordered-table">
<tbody>
- <tr><td colspan="2">
-The RDF graph must be consistent under RDF D-entailment [[!RDF-MT]]
-using a datatype map containing all the datatypes used within the graph.
- </td></tr>
+ <tr><td><pre>
+ASK {
+ {
+ # Check observation has a data set
+ ?obs a qb:Observation .
+ FILTER NOT EXISTS { ?obs qb:dataSet ?dataset1 . }
+ } UNION {
+ # Check has just one data set
+ ?obs a qb:Observation ;
+ qb:dataSet ?dataset1, ?dataset2 .
+ FILTER (?dataset1 != ?dataset2)
+ }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
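Since implementations are free to use techniques other than SPARQL, the IC-1 check can also be sketched directly over a list of triples. This is a minimal illustration, not part of the specification: the function name `violates_ic1` and the representation of triples as tuples of plain strings are assumptions made for the example.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def violates_ic1(triples):
    """Return True if some qb:Observation has no data set or more than one.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    This tuple representation is an assumption of this sketch.
    """
    observations = set()
    datasets = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o == QB + "Observation":
            observations.add(s)
        elif p == QB + "dataSet":
            datasets.setdefault(s, set()).add(o)
    # Mirrors both branches of the ASK query: zero data sets, or two distinct ones.
    return any(len(datasets.get(obs, ())) != 1 for obs in observations)
```

Like the ASK query, the function returns true when the constraint is violated, so a well-formed graph yields `False`.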
+
+<h3 id="ic-2">IC-2. Unique DSD</h3>
+<p>
+Every <code><a>qb:DataSet</a></code> has exactly one associated <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ {
+ # Check dataset has a dsd
+ ?dataset a qb:DataSet .
+ FILTER NOT EXISTS { ?dataset qb:structure ?dsd . }
+ } UNION {
+ # Check has just one dsd
+ ?dataset a qb:DataSet ;
+ qb:structure ?dsd1, ?dsd2 .
+ FILTER (?dsd1 != ?dsd2)
+ }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-3">IC-3. DSD includes measure</h3>
+<p>
+Every <code><a>qb:DataStructureDefinition</a></code> must include at least one declared measure.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?dsd a qb:DataStructureDefinition .
+ FILTER NOT EXISTS { ?dsd qb:component [qb:componentProperty [a qb:MeasureProperty]] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-4">IC-4. Dimensions have range</h3>
+<p>
+Every dimension declared in a <code><a>qb:DataStructureDefinition</a></code> must have a declared <code><a>rdfs:range</a></code>.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?dim a qb:DimensionProperty .
+ FILTER NOT EXISTS { ?dim rdfs:range ?range }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-5">IC-5. Concept dimensions have code lists</h3>
+<p>
+Every dimension with range <code><a>skos:Concept</a></code> must have a <code><a>qb:codeList</a></code>.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?dim a qb:DimensionProperty ;
+ rdfs:range skos:Concept .
+ FILTER NOT EXISTS { ?dim qb:codeList [] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-6">IC-6. Only attributes may be optional</h3>
+<p>
+The only components of
+a <code><a>qb:DataStructureDefinition</a></code> that may be marked as
+optional, using <code><a>qb:componentRequired</a> false</code>, are attributes.
+</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?dsd qb:component ?componentSpec .
+ ?componentSpec qb:componentRequired "false"^^xsd:boolean ;
+ qb:componentProperty ?component .
+ FILTER NOT EXISTS { ?component a qb:AttributeProperty }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-7">IC-7. Slice Keys must be declared</h3>
+<p>
+Every <code><a>qb:SliceKey</a></code> must be associated with a <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?sliceKey a qb:SliceKey .
+ FILTER NOT EXISTS { [a qb:DataStructureDefinition] qb:sliceKey ?sliceKey }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-8">IC-8. Slice Keys consistent with DSD</h3>
+<p>
+Every <code><a>qb:componentProperty</a></code> on a <code><a>qb:SliceKey</a></code> must also be declared as a <code><a>qb:component</a></code> of the associated <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?sliceKey a qb:SliceKey;
+ qb:componentProperty ?prop .
+ ?dsd qb:sliceKey ?sliceKey .
+ FILTER NOT EXISTS { ?dsd qb:component [qb:componentProperty ?prop] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-9">IC-9. Unique slice structure</h3>
+<p>
+Each <code><a>qb:Slice</a></code> must have exactly one associated <code><a>qb:sliceStructure</a></code>.
+</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ {
+ # Slice has a key
+ ?slice a qb:Slice .
+ FILTER NOT EXISTS { ?slice qb:sliceStructure ?key }
+ } UNION {
+ # Slice has just one key
+ ?slice a qb:Slice ;
+ qb:sliceStructure ?key1, ?key2;
+ FILTER (?key1 != ?key2)
+ }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+
+<h3 id="ic-10">IC-10. Slice dimensions complete</h3>
+<p>
+Every <code><a>qb:Slice</a></code> must have a value for every dimension declared in its <code><a>qb:sliceStructure</a></code>.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?slice qb:sliceStructure [qb:componentProperty ?dim] .
+ FILTER NOT EXISTS { ?slice ?dim [] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-11">IC-11. All dimensions required</h3>
+<p>
+Every <code><a>qb:Observation</a></code> has a value for each dimension declared in its associated <code><a>qb:DataStructureDefinition</a></code>.
+</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+ ?dim a qb:DimensionProperty;
+ FILTER NOT EXISTS { ?obs ?dim [] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-12">IC-12. No duplicate observations</h3>
+<p>
+No two <code><a>qb:Observation</a></code>s in the same <code><a>qb:DataSet</a></code> may have the same value for all dimensions.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ FILTER( ?allEqual )
+ {
+ # For each pair of observations test if all the dimension values are the same
+ SELECT (MIN(?equal) AS ?allEqual) WHERE {
+ ?obs1 qb:dataSet ?dataset .
+ ?obs2 qb:dataSet ?dataset .
+ FILTER (?obs1 != ?obs2)
+ ?dataset qb:structure/qb:component/qb:componentProperty ?dim .
+ ?dim a qb:DimensionProperty .
+ ?obs1 ?dim ?value1 .
+ ?obs2 ?dim ?value2 .
+ BIND( ?value1 = ?value2 AS ?equal)
+ } GROUP BY ?obs1 ?obs2
+ }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
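The IC-12 query compares every pair of observations. One alternative technique, sketched here under stated assumptions, groups observations by their tuple of dimension values instead. The function name and the input shape (a dict from observation identifier to a dict of dimension values, all from one data set) are hypothetical. The sketch also assumes IC-11 already holds, that each dimension has a single value per observation, and that plain Python equality is an acceptable stand-in for the SPARQL `=` operator.

```python
from collections import defaultdict


def find_duplicate_observations(observations, dimensions):
    """Group observations of one data set by their dimension values.

    observations: dict mapping an observation id to a dict {dimension: value}
    dimensions:   list of the dimension properties declared in the DSD
    Returns a list of id groups that share the same value for every dimension.
    """
    groups = defaultdict(list)
    for obs_id, values in observations.items():
        # The key is the full tuple of dimension values for this observation.
        key = tuple(values.get(dim) for dim in dimensions)
        groups[key].append(obs_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

An empty result corresponds to the ASK query returning false, that is, no violation was found.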
+
+<h3 id="ic-13">IC-13. Required attributes</h3>
+<p>
+Every <code><a>qb:Observation</a></code> has a value for each declared attribute that is not explicitly marked as optional.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?obs qb:dataSet/qb:structure/qb:component ?component .
+ ?component qb:componentRequired "true"^^xsd:boolean ;
+ qb:componentProperty ?attr .
+ FILTER NOT EXISTS { ?obs ?attr [] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-14">IC-14. All measures present</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which does not use a <a>Measure dimension</a> then each individual <code><a>qb:Observation</a></code> must have a value for every declared measure.
+</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ # Observation in a non-measureType cube
+ ?obs qb:dataSet/qb:structure ?dsd .
+ FILTER NOT EXISTS { ?dsd qb:component/qb:componentProperty qb:measureType }
+
+ # verify every measure is present
+ ?dsd qb:component/qb:componentProperty ?measure .
+ ?measure a qb:MeasureProperty;
+ FILTER NOT EXISTS { ?obs ?measure [] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-15">IC-15. Measure dimension consistent</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which uses a <a>Measure dimension</a> then each <code><a>qb:Observation</a></code> must have a value for the measure corresponding to its given <code><a>qb:measureType</a></code>.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ # Observation in a measureType-cube
+ ?obs qb:dataSet/qb:structure ?dsd ;
+ qb:measureType ?measure .
+ ?dsd qb:component/qb:componentProperty qb:measureType .
+ # Must have value for its measureType
+ FILTER NOT EXISTS { ?obs ?measure [] }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-16">IC-16. Single measure on measure dimension observation</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which uses a <a>Measure dimension</a> then each <code><a>qb:Observation</a></code> must only have a value for one measure (by IC-15 this will be the measure corresponding to its <code><a>qb:measureType</a></code>).
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ # Observation with measureType
+ ?obs qb:dataSet/qb:structure ?dsd ;
+ qb:measureType ?measure ;
+ ?omeasure [] .
+ # Any measure on the observation
+ ?dsd qb:component/qb:componentProperty qb:measureType ;
+ qb:component/qb:componentProperty ?omeasure .
+ ?omeasure a qb:MeasureProperty .
+ # Must be the same as the measureType
+ FILTER (?omeasure != ?measure)
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-17">IC-17. All measures present in measure dimension cube</h3>
+<p>
+In a <code><a>qb:DataSet</a></code> which uses a <a>Measure dimension</a> then if there is an Observation for some combination of non-measure dimensions then there must be other Observations with the same non-measure dimension values for each of the declared measures.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ {
+ # Count number of other measures found at each point
+ SELECT ?numMeasures (COUNT(?obs2) AS ?count) WHERE {
+ {
+ # Find the DSDs and check how many measures they have
+ SELECT ?dsd (COUNT(?m) AS ?numMeasures) WHERE {
+ ?dsd qb:component/qb:componentProperty ?m.
+ ?m a qb:MeasureProperty .
+ } GROUP BY ?dsd
+ }
+
+ # Observation in measureType cube
+ ?obs1 qb:dataSet/qb:structure ?dsd;
+ qb:dataSet ?dataset ;
+ qb:measureType ?m1 .
+
+ # Other observation at same dimension value
+ ?obs2 qb:dataSet ?dataset ;
+ qb:measureType ?m2 .
+ FILTER NOT EXISTS {
+ ?dsd qb:component/qb:componentProperty ?dim .
+ FILTER (?dim != qb:measureType)
+ ?dim a qb:DimensionProperty .
+ ?obs1 ?dim ?v1 .
+ ?obs2 ?dim ?v2.
+ FILTER (?v1 != ?v2)
+ }
+
+ } GROUP BY ?obs1 ?numMeasures
+ HAVING (?count != ?numMeasures)
+ }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-18">IC-18. Consistent data set links</h3>
+<p>
+If a <code><a>qb:DataSet</a></code> D has a <code><a>qb:slice</a></code> S, and S has a <code><a>qb:observation</a></code> O, then the <code><a>qb:dataSet</a></code> corresponding to O must be D.
+</p>
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?dataset qb:slice ?slice .
+ ?slice qb:observation ?obs .
+ FILTER NOT EXISTS { ?obs qb:dataSet ?dataset . }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-19">IC-19. Codes from code list</h3>
+<p>
+If a dimension property has a <code><a>qb:codeList</a></code>, then the value of the dimension property on every <code><a>qb:Observation</a></code> must be in the code list.
+</p>
+<p>The following integrity check queries must be applied to an RDF graph which contains the
+definition of the code list as well as the RDF Data Cube to be checked. In the case
+of a <code>skos:ConceptScheme</code> then each concept must be linked to the scheme using
+<code>skos:inScheme</code>. In the case of a <code>skos:Collection</code> then the
+collection must link to each concept using <code>skos:member</code> (i.e. if the
+collection uses <code>skos:memberList</code> then the entailment of <code>skos:member</code>
+values defined by <a href="http://www.w3.org/TR/2009/REC-skos-reference-20090818/#S36">S36</a>
+in [[!SKOS-REFERENCE]] must be materialized).</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+ ?dim a qb:DimensionProperty ;
+ qb:codeList ?list .
+ ?list a skos:ConceptScheme .
+ ?obs ?dim ?v .
+ FILTER NOT EXISTS { ?v skos:inScheme ?list }
+}
+
+ASK {
+ ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+ ?dim a qb:DimensionProperty ;
+ qb:codeList ?list .
+ ?list a skos:Collection .
+ ?obs ?dim ?v .
+ FILTER NOT EXISTS { ?list skos:member ?v }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
+
+<h3 id="ic-20">IC-20. Codes from hierarchy</h3>
+<p>
+If a dimension property has
+a <code><a>qb:HierarchicalCodeList</a></code> with a non-blank <code><a>qb:parentChildProperty</a></code> then the value of that dimension property on every <code><a>qb:Observation</a></code> must be reachable from a root of the hierarchy using zero or more hops along the <code><a>qb:parentChildProperty</a></code> links.
+</p>
+<p>
+This check cannot be made by a simple fixed SPARQL query. Instead a
+query template is supplied.
+An instance of the template should be generated
+for each <code><a>qb:HierarchicalCodeList</a></code> which has an IRI
+value for its <code><a>qb:parentChildProperty</a></code>.
+That is for each binding of <code>?p</code> in the following
+instantiation query:</p>
+<pre>
+SELECT ?p WHERE {
+ ?hierarchy a qb:HierarchicalCodeList ;
+ qb:parentChildProperty ?p .
+ FILTER ( isIRI(?p) )
+}
+</pre>
+
+<p>The template is then instantiated by replacing the
+ string <code>$p</code> by the IRI found by the
+ instantiation query. The template is:</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+ ?dim a qb:DimensionProperty ;
+ qb:codeList ?list .
+ ?list a qb:HierarchicalCodeList .
+ ?obs ?dim ?v .
+ FILTER NOT EXISTS { ?list qb:hierarchyRoot/<$p>* ?v }
+}
+ </pre></td></tr>
+ </tbody>
+</table>
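The instantiation step described above is a plain string substitution, which can be sketched as follows. The helper name `instantiate` is hypothetical, and the list of property IRIs is assumed to come from running the instantiation query elsewhere. The character check is an extra precaution added for this sketch, based on the characters that the SPARQL `IRIREF` production excludes; it is not something the specification requires.

```python
IC20_TEMPLATE = """ASK {
    ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
    ?dim a qb:DimensionProperty ;
        qb:codeList ?list .
    ?list a qb:HierarchicalCodeList .
    ?obs ?dim ?v .
    FILTER NOT EXISTS { ?list qb:hierarchyRoot/<$p>* ?v }
}"""

# Characters that may not appear between '<' and '>' in a SPARQL IRI reference.
_FORBIDDEN = set('<>"{}|^`\\')


def instantiate(template, property_iris):
    """Return one query string per IRI, with every $p replaced by that IRI."""
    queries = []
    for iri in property_iris:
        if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in iri):
            raise ValueError("not usable as a SPARQL IRI reference: %r" % iri)
        queries.append(template.replace("$p", iri))
    return queries
```

Each resulting string still relies on the prefix bindings listed at the start of this section, which would need to be prepended before the query is executed.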
+
+<h3 id="ic-21">IC-21. Codes from hierarchy (inverse)</h3>
+<p>
+If a dimension property has a <code><a>qb:HierarchicalCodeList</a></code> with an inverse <code><a>qb:parentChildProperty</a></code> then the value of that dimension property on every <code><a>qb:Observation</a></code> must be reachable from a root of the hierarchy using zero or more hops along the inverse <code><a>qb:parentChildProperty</a></code> links.
+</p>
+
+<p>
+This check cannot be made by a simple fixed SPARQL query. Instead a
+query template is supplied.
+An instance of the template should be generated
+for each <code><a>qb:HierarchicalCodeList</a></code> which has a
+blank-node
+value for its <code><a>qb:parentChildProperty</a></code>, with an
+associated inverse property.
+That is for each binding of <code>?p</code> in the following
+instantiation query:</p>
+<pre>
+SELECT ?p WHERE {
+ ?hierarchy a qb:HierarchicalCodeList;
+ qb:parentChildProperty ?pcp .
+ FILTER( isBlank(?pcp) )
+ ?pcp owl:inverseOf ?p .
+ FILTER( isIRI(?p) )
+}
+</pre>
+
+<p>The template is then instantiated by replacing the
+ string <code>$p</code> by the IRI found by the
+ instantiation query. The template is:</p>
+
+<table class="bordered-table">
+ <tbody>
+ <tr><td><pre>
+ASK {
+ ?obs qb:dataSet/qb:structure/qb:component/qb:componentProperty ?dim .
+ ?dim a qb:DimensionProperty ;
+ qb:codeList ?list .
+ ?list a qb:HierarchicalCodeList .
+ ?obs ?dim ?v .
+ FILTER NOT EXISTS { ?list qb:hierarchyRoot/(^<$p>)* ?v }
+}
+ </pre></td></tr>
</tbody>
</table>
</section>
-<pre>
-
-1. Every Observation has a unique associated DataSet
-
-2. Every DataSet has a unique associated DataStructureDefinition
-3. Every DSD must include a measure
-4. Every Dimension must have a declared range
-5. Every Dimension with range skos:concept must have a codeList
-6. Only attributes may be marked optional
-
-7. Every SliceKey must be associated with a DataStructureDefinition
-8. SliceKey components must be subset of the DSD's component
-9. Every Slice must have exactly one sliceStructure
-
-10. Every Slice must have a value for every dimension in its sliceStructure
-
-11. Every observation has a value for each declared dimension
-12. No two observations in the same cube may have the same value for all dimensions
-13. Every observation has a value for each non-optional attribute
-14. Every observation in a non-measureType cube must have a value for every measure
-15. Every observation in a measureType cube must have a measure value corresponding to its measureType
-16. Every observation in a measureType cube must have a value for only one measure
-17. In a measureType cube if there is an observation for one measure, there must be a corresponding observation for all other measures at the same dimension values
-
-18. if A qb:slice B and B qb:observation C then C qb:dataSet A
-
-19. If a dimension property has a qb:codeList, then the value of the dimension property on every observation must be in the code list
-
-20. If a dimension property has a hierarchical code list with a parentChildProperty then the value of that dimension property on every observation must be reachable from a root of hierarchy using zero or more hops along the parentChildProperty links.
-21. If a dimension property has a hierarchical code list with an inverse parentChildProperty then the value of that dimension property on every observation must be reachable from a root of hierarchy using zero or more hops along the inverse parentChildProperty links.
-
-</pre>
-
-Note that 19-21 need access to code list, with skos:inScheme for schemes and skos:member for collections (unpack ordered members if necessary)
</section>