gld: changeset 451:7d57a7584cf8

Binary file data-cube/images/qb-fig1.png has changed

--- a/data-cube/index.html	Thu Apr 11 10:18:03 2013 -0400
+++ b/data-cube/index.html	Thu Apr 11 23:32:25 2013 +0100
@@ -53,9 +53,11 @@
 <section id="outline">
 <h2>Outline of the vocabulary</h2>
 
-<!-- <img src="images/qb-fig1.png" alt="UML-style block diagram of the terms in this vocabulary"/> -->
+<figure>
+  <img src="images/qb-fig1.png" alt="UML-style block diagram of the terms in this vocabulary"/>
+  <figcaption>Pictorial summary of key terms and their relationships</figcaption>
+</figure>
 
-<img src="images/qb-fig1-proposed.png" alt="UML-style block diagram of the terms in this vocabulary"/>
 
 <section id="index">
 <h3>Vocabulary index</h3>
@@ -194,7 +196,7 @@
 <h3>SDMX and related standards</h3>
 
 <p>The Statistical Data and Metadata Exchange (SDMX) Initiative
-was organised in 2001 by seven international organisations (BIS,
+was organised in 2001 by seven international organizations (BIS,
 ECB, Eurostat, IMF, OECD, World Bank and the UN) to
 realise greater efficiencies in statistical practice. These
 organisations all
@@ -230,7 +232,9 @@
 interoperability and comparability between datasets by providing a
 shared terminology between SDMX implementers [[COG]]. RDF versions of these
 terms are available separately for use along with the Data Cube
-vocabulary, see <a href="#dsd-cog">Content oriented guidelines</a>.
+vocabulary, see <a href="#dsd-cog">Content oriented guidelines</a> for
+  further details. These external resources do not form a normative part of the
+  Data Cube Vocabulary specification.
 </p>
 </section>
 
@@ -376,7 +380,7 @@
 Series</em> and to refer slices along non-time dimensions as <em>Sections</em>.
 Within the Data Cube vocabulary we allow arbitrary dimensionality
 slices and do not give different names to particular types of
-  slice. Such sub classes of slice could be added in extension vocabularies.</p>
+  slice. Such sub-classes of slice could be added in extension vocabularies.</p>
 
 </section>
 
@@ -529,8 +533,10 @@
    in particular helps with detection of incoherent sets obtained by 
    combining differently structured source data;</li>
   <li>it allows a consumer to easily determine what dimensions are available for query
-    and their presentational order, which in turn simplifies UI construction;</li>
-  <li>it supports transmission of the structure information in associated SDMX data flows.</li>
+    and their presentational order, which in turn simplifies data
+    consumption, for example for UI construction;</li>
+  <li>it supports transmission of the structure information in
+  associated SDMX data flows (see below).</li>
 </ul>
 
 <p>It is common, when publishing statistical data, to have a regular series of publications which
@@ -592,7 +598,7 @@
   about components - the role that they play within the structure definition. In particular, it is sometimes
   convenient for consumers to be able to easily identify which is the time dimension,
   which component is the primary measure and so forth. It turns out that such roles are intrinsic to
-  the concepts and so this information can encoded by providing subclasses of <code>skos:Concept</code>
+  the concepts and so this information can be encoded by providing subclasses of <code>skos:Concept</code>
   for each role. The particular choice of roles here is specific to the SDMX standard and so is not 
   included within the core Data Cube vocabulary.</p>
 
@@ -621,8 +627,8 @@
   </tbody>
 </table>
 
-<p>These resources are provided as a convenience and do not form part
-  of the Data Cube standard at this time. However, they are used
+<p>These community resources are provided as a convenience and do not form part
+  of the Data Cube specification. However, they are used
   by a number of existing Data Cube publications and so we will
   reference them within our worked examples.</p>
 
@@ -633,7 +639,7 @@
 <h3>Example</h3>
 
 <p>Turning to our example data set then we can see there are three dimensions to represent
-   - time period, region (unitary authority) and sex of the population. There is a single
+   - time period, region (unitary authority) and sex. There is a single
    (primary) measure which corresponds to the topic of the data set (life expectancy) and
   encodes a value in years. Hence, we need the following components.</p>
 
@@ -679,7 +685,7 @@
   This is defined using attributes which qualify the interpretation of the observed value.
   Specifically in this example we can use the predefined <code>sdmx-attribute:unitMeasure</code>
   which in turn corresponds to the COG concept of <code>UNIT_MEASURE</code>. To express
-  the value of this attribute we would typically us a common thesaurus of units of measure.
+  the value of this attribute we would typically use a common thesaurus of units of measure.
   For the sake of this simple example we will use the DBpedia resource <code>http://dbpedia.org/resource/Year</code>
   which corresponds to the topic of the Wikipedia page on "Years".</p>
 
@@ -704,7 +710,7 @@
 <ul>
   <li>Attributes may be declared as optional or required. If an
   attribute is required to be present for every observation then the specification should set 
-    <code><a>qb:componentRequired></a></code>. In the
+    <code><a>qb:componentRequired</a></code>. In the
     absence of such a declaration an attribute is assumed to be
     optional. The  <code><a>qb:componentRequired</a></code>
     declaration may only be applied to component specifications of
@@ -714,7 +720,7 @@
     appropriate user interfaces. It can also be useful in the publication chain to enable
     synthesis of appropriate URIs for observations.</li>
   <li>By default the values of all of the components will be attached to each individual observation,
-    well call this the <em><a>normalized</a></em> representation.
+    this is called the <em><a>normalized</a></em> representation.
     This allows such observations to stand alone, so that a SPARQL query to retrieve the observation
     can immediately locate the attributes which enable the observation to be interpreted. However,
     it is also permissible to attach attributes to the
@@ -722,7 +728,8 @@
     This reduces some of the redundancy in the encoding of the instance data. To declare such an
     abbreviated structure, the <code><a>qb:componentAttachment</a></code> property of the specification should
     reference the class corresponding to the attachment level (e.g. <code><a>qb:DataSet</a></code> for attributes
-    that will be attached to the overall data set).</li>
+    that will be attached to the overall data set). The classes 
+    which can be used as such attachment levels are all subclasses of <code><a>qb:Attachable</a></code>.</li>
 </ul>
 
 <p>In the case of our running example the dimensions can be usefully ordered. There is only one
@@ -735,15 +742,15 @@
 <pre id="attachment-example" class="example">
   eg:dsd-le a qb:DataStructureDefinition;
       # The dimensions
-      qb:component [qb:dimension eg:refArea;         qb:order 1];
-      qb:component [qb:dimension eg:refPeriod;       qb:order 2];
-      qb:component [qb:dimension sdmx-dimension:sex; qb:order 3];
+      qb:component [ qb:dimension eg:refArea;         qb:order 1 ];
+      qb:component [ qb:dimension eg:refPeriod;       qb:order 2 ];
+      qb:component [ qb:dimension sdmx-dimension:sex; qb:order 3 ];
       # The measure(s)
-      qb:component [qb:measure eg:lifeExpectancy];
+      qb:component [ qb:measure eg:lifeExpectancy ];
       # The attributes
-      qb:component [qb:attribute sdmx-attribute:unitMeasure; 
-                    qb:componentRequired "true"^^xsd:boolean;
-                    qb:componentAttachment qb:DataSet;] .</pre>
+      qb:component [ qb:attribute sdmx-attribute:unitMeasure; 
+                     qb:componentRequired "true"^^xsd:boolean;
+                     qb:componentAttachment qb:DataSet; ] .</pre>
 
 <p>Note that we have given the data structure definition (DSD) a URI since it will be
  reused across different datasets with the same structure. Similarly the component properties
@@ -763,24 +770,26 @@
   multiple different performance indicators for each region) or quite different (e.g. a data set
   on trades might provide quantity, value, weight for each trade).</p>
   
-<p>There are two approaches to representing multiple measures. In the SDMX information model, each 
-  observation can record a single observed value. In a data set with multiple observations then we 
-  add an additional dimension whose value indicates the measure. This is appropriate for applications
-  where the measures are separate aggregate statistics. In other domains such as a clinical statistics
-  or sensor networks then the term <em>observation</em> usually denotes an observation event which can include multiple
-  observed values.  Similarly in Business Intelligence applications and OLAP, a single "cell" in the data cube will 
-  typically contain values for multiple measures.
-</p>
-  
-<p>The data cube vocabulary permits either representation approach to be used though they cannot be mixed
+<p>There are two approaches to representing multiple measures supported by the Data Cube vocabulary.</p>
+
+<p>In the first approach each observation records a single observed value for one measure.
+We introduce an additional dimension whose value indicates the measure being conveyed by 
+each observation. This <em>measure dimension</em> approach is the one supported by the SDMX information model. </p>
+
+<p>In the second approach a single observation can provide values for multiple different measures.
+This is particularly appropriate in cases where each of those values relates to a single
+observational event such as a multi-spectral sensor measurement. This <em>multi-measure</em>
+approach is commonly used in applications such as Business Intelligence and OLAP.</p>
+
+<p>The Data Cube vocabulary permits either representation approach to be used though they cannot be mixed
   within the same data set. </p>
 
 <p>Both representation approaches require
   that, for every point in the space of dimensions for which there is
-  an observation, then a value must be given for every measure. In the
-  case of multi-measure observations then each measure must be
+  an observation, a value must be given for every measure. In the
+  case of multi-measure observations each measure must be
   present on each observation. In cubes which use a measure dimension
-  then there are sets of observations for each populated point in the
+  there are sets of observations for each populated point in the
   cube and within each of those sets there must be an observation giving each measure.</p>
   
 
@@ -831,9 +840,9 @@
 <p>This approach restricts observations to having a single measured value but allows
   a data set to carry multiple measures by adding an extra dimension, a <em>measure dimension</em>.
   The value of the measure dimension denotes which particular measure is being conveyed by the 
-  observation. This is the representation approach used within SDMX and the SMDX-in-RDF
-  extension vocabulary introduces a subclass of <code><a>qb:DataStructureDefinition</a></code> which is restricted
-  to using the <em>measure dimension</em> representation.</p>
+  observation. This is the representation approach used within SDMX, and an extension vocabulary
+  could introduce a sub-class of <code><a>qb:DataStructureDefinition</a></code> which enforces
+  such a single-measure restriction.</p>
   
 <p>To use this representation you declare an additional dimension within the data structure
   definition to play the role of the measure dimension. For use within the Data Cube vocabulary
@@ -847,7 +856,10 @@
   In the special case of using <code><a>qb:measureType</a></code> as the measure dimension, the set of allowed 
   measures is assumed to be those measures declared within the DSD. There is no need to 
   define a separate code list or enumerated class to duplicate this information. 
-  Thus, <code><a>qb:measureType</a></code> is a “magic” dimension property with an implicit code list.</p>
+  Thus, <code><a>qb:measureType</a></code> is a “magic” dimension
+  property with an implicit code list. This notion of an implicit
+  code list for <code><a>qb:measureType</a></code> is a small divergence
+  from SDMX usage.</p>
 
 <p>The data structure definition for our above example, using this representation approach, would then be:</p>
 <pre class="example">
@@ -902,8 +914,8 @@
 
 <dl>
   <dt>Observations</dt>
-  <dd>This is the actual data, the measured numbers. In a statistical table, the observations 
-       would be the numbers in the table cells.</dd>
+  <dd>This is the actual data, the measured values. In a statistical table, the observations 
+       would be the values in the table cells.</dd>
 
   <dt>Organizational structure</dt>
   <dd>To locate an observation within the hypercube, one has at least to know the value of each 
@@ -911,14 +923,14 @@
       Datasets can have additional organizational structure in the form of <em>slices</em> 
     as described earlier in <a href="#slices">section 7.2</a>.
 
-  <dt>Internal metadata</dt>
+  <dt>Structural metadata</dt>
   <dd>Having located an observation, we need certain metadata in order to be able to interpret it. 
     What is the unit of measurement? Is it a normal value or a series break? 
     Is the value measured or estimated? These metadata are provided as <em>attributes</em> and can 
     be attached to individual observations, or to higher levels as defined by the ComponentSpecification
     described earlier.</dd>
 
-  <dt>External metadata</dt>
+  <dt>Reference metadata</dt>
   <dd>This is metadata that describes the dataset as a whole, such as categorization of the 
        dataset, its publisher, and a SPARQL endpoint where it can be accessed. 
       External metadata is described in <a href="#metadata">section 9</a>.</dd>
@@ -986,7 +998,7 @@
   observations. To cater for situations like this the Data Cube vocabulary allows components
   to be attached at a high level in the nested structure. Indeed if we re-examine our
   original Data Structure Declaration we see that we declared the unit of measure to be
-  attached at the data set level. So an improved version of the example is:</p>
+  attached at the data set level. So a shortened version of the example is:</p>
 
 <pre class="example">
   eg:dataset-le1 a qb:DataSet;
@@ -1033,7 +1045,7 @@
 <section id="slices">
 <h2>Slices and groups of observations</h2>
 
-<p>Slices allow us to group subsets of observations together. This not intended
+<p>Slices allow us to group subsets of observations together. This is not intended
   to represent arbitrary selections from the observations but uniform slices
   through the cube in which one or more of the dimension values are fixed.</p>
   
@@ -1049,9 +1061,9 @@
  That will enable us to refer to e.g. "male life expectancy observations for 2004-2006" 
  and guide applications to present a comparative chart across regions. </p>
 
-<p>We first define the structure of the slices we want by associating a "slice key" which the
-   data structure definition. This is done by creating a <code><a>qb:SliceKey</a></code> which
-   lists the component properties (which must be dimensions) which will be fixed in the
+<p>We first define the structure of the slices we want by associating a "slice key" with the
+   data structure definition. This is done by creating a <code><a>qb:SliceKey</a></code> to
+   list the component properties (which must be dimensions) which will be fixed in the
    slice. The key is attached to the DSD using <code><a>qb:sliceKey</a></code>. For example: </p>
    
 <pre class="example">
@@ -1062,11 +1074,11 @@
       
   eg:dsd-le-slice1 a qb:DataStructureDefinition;
       qb:component 
-          [qb:dimension eg:refArea;         qb:order 1];
-          [qb:dimension eg:refPeriod;       qb:order 2];
-          [qb:dimension sdmx-dimension:sex; qb:order 3];
-          [qb:measure eg:lifeExpectancy];
-          [qb:attribute sdmx-attribute:unitMeasure; qb:componentAttachment qb:DataSet;] ;
+          [ qb:dimension eg:refArea;         qb:order 1 ],
+          [ qb:dimension eg:refPeriod;       qb:order 2 ],
+          [ qb:dimension sdmx-dimension:sex; qb:order 3 ],
+          [ qb:measure eg:lifeExpectancy ],
+          [ qb:attribute sdmx-attribute:unitMeasure; qb:componentAttachment qb:DataSet; ] ;
       qb:sliceKey eg:sliceByRegion .
 </pre>   
 
@@ -1126,11 +1138,11 @@
 <pre  class="example">
   eg:dsd-le-slice3 a qb:DataStructureDefinition;
       qb:component 
-          [qb:dimension eg:refArea;         qb:order 1];
-          [qb:dimension eg:refPeriod;       qb:order 2; qb:componentAttachment qb:Slice];
-          [qb:dimension sdmx-dimension:sex; qb:order 3; qb:componentAttachment qb:Slice];
-          [qb:measure eg:lifeExpectancy];
-          [qb:attribute sdmx-attribute:unitMeasure; qb:componentAttachment qb:DataSet;] ;
+          [ qb:dimension eg:refArea;         qb:order 1 ],
+          [ qb:dimension eg:refPeriod;       qb:order 2; qb:componentAttachment qb:Slice ],
+          [ qb:dimension sdmx-dimension:sex; qb:order 3; qb:componentAttachment qb:Slice ],
+          [ qb:measure eg:lifeExpectancy ],
+          [ qb:attribute sdmx-attribute:unitMeasure; qb:componentAttachment qb:DataSet; ] ;
       qb:sliceKey eg:sliceByRegion .
 
   eg:dataset-le3 a qb:DataSet;
@@ -1273,7 +1285,8 @@
 used in SDMX when the data cube includes aggregations of data values 
 (e.g. aggregating a measure across geographic regions).
 Hierarchical code lists SHOULD be represented using the 
-<code>skos:narrower</code> relationship to link from the <code>skos:hasTopConcept</code>
+<code>skos:narrower</code> relationship, or a sub-property of it,
+to link from the <code>skos:hasTopConcept</code>
 codes down through the tree or lattice of child codes. 
 In some publishing tool chains the corresponding transitive closure 
 <code>skos:narrowerTransitive</code> will be automatically inferred. 
@@ -1287,7 +1300,7 @@
 
 <p>It is sometimes convenient to be able to specify a hierarchical arrangement of 
 concepts other than through the use of the SKOS relation <code>skos:narrower</code>. 
-There are several situations where this is useful:</p>
+There are several situations where this is useful, for example:</p>
 
 <ul>
 <li>In some cases publishers wish to be able to reuse existing reference data as their
@@ -1296,8 +1309,6 @@
 <li>Where such maintained reference data is to be reused there can be multiple hierarchies which relate
 the same codes. In particular a set of geographic entities may participate in both a geographic-containment hierarchy
 and an administrative hierarchy which do not precisely align. </li>
-<li>The SKOS relations do not define when the child concepts are disjoint (mutually exclusive) or when they form 
-a complete cover of the parent concept (exhaustive).</li>
 </ul>
 
 <p>The Data Cube vocabulary supports this situation through the <code><a>qb:HierarchicalCodeList</a></code> class.
@@ -1479,7 +1490,10 @@
 extract sets of observations, including from across multiple
 cubes. However, the verbosity of a fully normalized representation
 incurs overheads in transmission and storage of Data Cubes which may
-be problematic in some settings.
+be problematic in some settings. Note that the abbreviated form is
+  provided as an option and there is no requirement that it be used. In
+  many settings standard compression techniques can eliminate much of the
+  overhead of the normalized form.
 </p>
 
 <p>To address this the Data Cube vocabulary supports a notion of

author	Dave Reynolds <dave@epimorphics.com>
	Thu, 11 Apr 2013 23:32:25 +0100
changeset 451	7d57a7584cf8
parent 450	5ac9cc45d40a
child 452	f5641da02987

data-cube/images/qb-fig1.png
data-cube/index.html