--- a/data-cube-ucr/index.html Fri Jul 19 18:13:44 2013 +0100
+++ b/data-cube-ucr/index.html Sat Jul 20 15:44:03 2013 +0200
@@ -22,7 +22,7 @@
organizations in- and outside of the public sector, collect numeric
data and aggregate this data into statistics. There is a need to
publish these statistics in a standardized, machine-readable way on
- the web, so that they can be freely integrated and reused in consuming
+ the Web, so that they can be freely integrated and reused in consuming
applications.</p>
<p>
In this document, the <a href="http://www.w3.org/2011/gld/">W3C
@@ -113,7 +113,7 @@
statistics. It can describe statistics as observations. Observations
exhibit values (Measures) that depend on dimensions (Members of
Dimensions). Since the SDMX standard has proven applicable in many
- contexts, the vocabulary adopts the multidimensional model that
+ contexts, the Data Cube Vocabulary adopts the multidimensional model that
underlies SDMX and will be compatible with SDMX.
</p>
</section>
@@ -360,10 +360,10 @@
government-published data, so recording provenance is a key
requirement for the COINS data. For instance, the COINS project [<cite><a
href="#ref-COINS">COINS</a></cite>] has at least four perspectives on what
- they mean by “COINS” data: the abstract notion of “all of COINS”, the
- data for a particular year, the version of the data for a particular
- year released on a given date, and the constituent graphs which hold
- both the authoritative data translated from HMT’s own sources. Also,
+ they mean by “COINS” data: the abstract notion of “all of COINS”; the
+ data for a particular year; the version of the data for a particular
+ year released on a given date; and the constituent graphs which hold
+ both the authoritative data translated from HMT’s own sources and
additional supplementary information which they derive from the data,
for example by cross-linking to other datasets. This challenge
supports lesson: <a
@@ -412,15 +412,15 @@
<p>Another concrete example is the <a
href="http://ontowiki.net/Projects/Stats2RDF?show_comments=1">Stats2RDF</a>
- project that intends to publish biomedical statistical data that is
- represented as Excel sheets. Here, Excel files are first translated
+ project that intends to publish Excel sheets with biomedical statistical data.
+ Here, Excel files are first translated
into CSV and then translated into RDF using OntoWiki, a semantic wiki.
</p>
<h4>Benefits</h4>
<ul>
- <li>The goal in this use case is to to publish spreadsheet
- information in a machine-readable format on the web, e.g., so that
+ <li>The goal in this use case is to publish spreadsheet
+ information in a machine-readable format on the Web, e.g., so that
crawlers can find spreadsheets that use a certain column value. The
published data should represent and make available for queries the
most important information in the spreadsheets, e.g., rows, columns,
@@ -433,9 +433,10 @@
different attributes across different value points. This way a
harmonization among variables is performed around the measurement
points themselves.</li>
- <li>Novel visualization of census data</li>
- <li>Possible integration with provenance vocabularies, e.g.,
- PROV-O, for tracking of harmonization steps</li>
+ <li>Integration with provenance vocabularies, e.g.,
+ PROV-O, for tracking of harmonization steps becomes possible.</li>
+ <li>Once data representation and publication are standardized, consumers can focus on novel
+ visualizations and analysis interfaces for census data.</li>
<li>In historical research, until now, harmonization across
datasets is performed by hand, and in subsequent iterations of a
database: it is very hard to trace back the provenance of decisions
@@ -450,7 +451,7 @@
lesson <a
href="#declaringRel">Publishers
may need guidance in making transparent the pre-processing of
- aggregate statistics</a>
+ aggregate statistics</a>.
</li>
<li>Combining Data Cube with SKOS [<cite><a
href="#ref-skos">SKOS</a></cite>] to allow for cross-location and
@@ -480,7 +481,7 @@
<li>There may be many spreadsheets which supports lesson <a
href="#mechRec">Publishers
and consumers may need more guidance in efficiently processing data
- using the Data Cube Vocabulary</a></li>
+ using the Data Cube Vocabulary</a>.</li>
</ul>
@@ -546,7 +547,7 @@
unit, units of 1000 households is used.</p>
<p>In this use case, one wants to publish not only a dataset on the
- bottom most level, i.e. what are the number of households at each
+ bottom-most level, i.e., the number of households at each
Unitary Authority in each year, but also a dataset on more aggregated
levels. For instance, in order to publish a dataset with the number of
households at each Government Office Region per year, one needs to
@@ -577,7 +578,7 @@
<li>Importantly, one would like to maintain the relationship
between the resulting datasets, i.e., the levels and aggregation
functions. Again, this use case does not simply need a selection (or
- "dice" in OLAP context) where one fixes the time period dimension.
+ "dice" in an OLAP context) where one fixes the time period dimension, but also requires aggregation.
This supports lesson <a
href="#aggregations">Publishers
may need guidance in how to represent common analytical operations
@@ -817,7 +818,7 @@
href="http://eurostat.linked-statistics.org/">Linked Statistics
Eurostat Data</a> intend to publish <a
href="http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/">Eurostat
- SDMX data</a> as <a href="http://5stardata.info/">5-star Linked Open
+ SDMX data</a> as <a href="http://www.w3.org/TR/ld-glossary/#x5-star-linked-open-data">5 Star Linked Open
Data</a>. Eurostat data is partly published as SDMX, partly as tabular
data (TSV, similar to CSV). Eurostat provides a <a
href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">TOC
@@ -856,10 +857,6 @@
<h4>Challenges</h4>
<ul>
- <li>New Eurostat datasets are added regularly to Eurostat. The
- Linked Data representation should automatically provide access to the
- most-up-to-date data.</li>
-
<li>There is a large number of Eurostat datasets, each possibly
containing a large number of columns (dimensions) and rows
(observations). Eurostat publishes more than 5200 datasets, which,
@@ -880,18 +877,6 @@
processing data using the Data Cube Vocabulary.</a>
</li>
- <li>Provide a useful interface for browsing and visualizing the
- data. One problem is that the data sets have too high dimensionality
- to be displayed directly. Instead, one could visualize slices of time
- series data. However, for that, one would need to either fix most
- other dimensions (e.g., sex) or aggregate over them (e.g., via
- average). The selection of useful slices from the large number of
- possible slices is a challenge. This supports lesson <a
- href="#clarify">
- Publishers may need more guidance in creating and managing slices or
- arbitrary groups of observations</a>.
- </li>
-
<li>Each dimension used by a dataset has a range of permitted
values that need to be described.</li>
@@ -904,7 +889,9 @@
operations such as Slice, Dice, Rollup on data cubes</a>.
</li>
- <li>Updates to the data
+ <li>New datasets are added regularly to Eurostat. The
+ Linked Data representation should automatically provide access to the
+ most up-to-date data:
<ul>
<li>Eurostat Linked Data pulls in changes from the original
@@ -941,8 +928,21 @@
execution to resolve the ds.</li>
</ul>
- <li>Browsing and visualizing interface:
+ <li>Providing a useful interface for browsing and visualizing the
+ data:
<ul>
+
+ <li>One problem is that the datasets have too many dimensions
+ to be displayed directly. Instead, one could visualize slices of time
+ series data. However, for that, one would need to either fix most
+ other dimensions (e.g., sex) or aggregate over them (e.g., via
+ average). The selection of useful slices from the large number of
+ possible slices is a challenge. This supports lesson <a
+ href="#clarify">
+ Publishers may need more guidance in creating and managing slices or
+ arbitrary groups of observations</a>.
+ </li>
+
<li>Eurostat Linked Data Wrapper provides for each dataset an
HTML page showing a JavaScript-based visualization of the data.
This also supports lesson <a
@@ -969,7 +969,7 @@
A newer version of SDMX, SDMX Standards, Version 2.1, is available which might be used by
Eurostat in the future which supports lesson <a
href="#putative">
- There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it</a></li>
+ There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it</a>.</li>
</ul>
</section> <section>
@@ -1206,16 +1206,16 @@
roll-up), and filter it for specific information (slice, dice).
</p>
- <p>OLAP systems that first use ETL pipelines to
- Extract-Load-Transform relevant data for efficient storage and queries
- in a data warehouse and then allows interfaces to issue OLAP queries
- on the data are commonly used in industry to analyze statistical data
- on a regular basis.</p>
+ <p>OLAP systems are commonly used in industry to analyze statistical data
+ on a regular basis. OLAP systems first use ETL pipelines to
+ extract, transform, and load relevant data
+ into a data warehouse and then allow interfaces to efficiently issue OLAP queries
+ on the data.</p>
<p>
The goal in this use case is to allow analysis of published
statistical data with common OLAP systems [<cite><a
- href="#ref-OLAP4LD">OLAP4LD</a></cite>]
+ href="#ref-OLAP4LD">OLAP4LD</a></cite>].
</p>
<p>For that a multidimensional model of the data needs to be
@@ -1228,12 +1228,12 @@
An example scenario of this use case is the Financial Information
Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>],
where XBRL data provided by the SEC on the Web is re-published as
- Linked Data and made possible to explore and analyse by stakeholders
- in a web-based OLAP client Saiku.
+ Linked Data, which stakeholders can explore and analyze
+ in the Web-based OLAP client Saiku.
</p>
<p>The following figure shows an example of using FIOS. Here, for
- three different companies, the cost of goods sold as disclosed in XBRL
+ three different companies, the Cost of Goods Sold as disclosed in XBRL
documents are analyzed. As cell values either the number of
disclosures or — if only one available — the actual number in USD is
given:</p>
@@ -1352,7 +1352,7 @@
href="#mechRec">Publishers
may need guidance in communicating the availability of published
statistical data to external parties and to allow automatic
- discovery of statistical data</a>
+ discovery of statistical data</a>.
</li>
<li>Define mapping between the Data Cube Vocabulary and data catalog
descriptions. If data catalogs contain statistics, they do not expose
@@ -1406,8 +1406,8 @@
<ul>
<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/33">http://www.w3.org/2011/gld/track/issues/33</a></li>
- <li>Since there are no use cases for qb:subslice, the vocabulary
- should clarify or drop the use of qb:subslice; issue: <a
+ <li>Since there are no known use cases for <code>qb:subslice</code>, the vocabulary
+ should clarify or drop the use of <code>qb:subslice</code>; issue: <a
href="http://www.w3.org/2011/gld/track/issues/34">http://www.w3.org/2011/gld/track/issues/34</a>
</li>
</ul>
@@ -1502,7 +1502,7 @@
id="relToSO19156">Modelers
using ISO19156 - Observations & Measurements may need clarification
regarding the relationship to the Data Cube Vocabulary</h3>
- <p>An number of organizations, particularly in the Climate and
+ <p>A number of organizations, particularly in the Climate and
Meteorological area, already have some commitment to the OGC
"Observations and Measurements" (O&M) logical data model, also
published as ISO 19156. Are there any statements about compatibility
@@ -1713,12 +1713,12 @@
<dt id="ref-cog">[COG]</dt>
<dd>
SDMX Content Oriented Guidelines, <a
- href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
+ href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>.
</dd>
<dt id="ref-COGS">[COGS]</dt>
<dd>
- Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S.,&Curry, E.
+ Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., & Curry, E.
(2012). Representing Interoperable Provenance Descriptions for ETL
Workflows. ESWC 2012 Workshop Highlights (pp. 1–15). Springer Verlag,
2012 (in press). (Extended Paper published in Conf. Proceedings.). <a
@@ -1729,7 +1729,7 @@
<dd>
Ian Dickinson et al., COINS as Linked Data <a
href="http://data.gov.uk/resources/coins">http://data.gov.uk/resources/coins</a>,
- last visited on Jan 9 2013
+ last visited on Jan 9 2013.
</dd>
<dt id="ref-FIOS">[FIOS]</dt>
@@ -1747,44 +1747,44 @@
<dt id="ref-linked-data">[LOD]</dt>
<dd>
- Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+ Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>.
</dd>
<dt id="ref-OLAP">[OLAP]</dt>
<dd>
Online Analytical Processing Data Cubes, <a
- href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
+ href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>.
</dd>
<dt id="ref-OLAP4LD">[OLAP4LD]</dt>
<dd>
Kämpgen, B. and Harth, A. (2011). Transforming Statistical Linked
Data for Use in OLAP Systems. I-Semantics 2011. <a
- href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>
+ href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>.
</dd>
<dt id="ref-QB-2010">[QB-2010]</dt>
<dd>
RDF Data Cube vocabulary, <a
- href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>
+ href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>.
</dd>
<dt id="ref-QB-2013">[QB-2013]</dt>
<dd>
RDF Data Cube vocabulary, <a
- href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>
+ href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>.
</dd>
<dt id="ref-QB4OLAP">[QB4OLAP]</dt>
<dd>
Etcheverry, Vaismann. QB4OLAP : A New Vocabulary for OLAP Cubes on
the Semantic Web. <a
- href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>
+ href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>.
</dd>
<dt id="ref-rdf">[RDF]</dt>
<dd>
- Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>
+ Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>.
</dd>
<dt id="ref-scovo">[SCOVO]</dt>
@@ -1792,13 +1792,13 @@
The Statistical Core Vocabulary, <a
href="http://sw.joanneum.at/scovo/schema.html">http://sw.joanneum.at/scovo/schema.html</a>
<br /> SCOVO: Using Statistics on the Web of data, <a
- href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>
+ href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>.
</dd>
<dt id="ref-skos">[SKOS]</dt>
<dd>
Simple Knowledge Organization System, <a
- href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>
+ href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>.
</dd>
<dt id="ref-SDMX">[SMDX]</dt>
@@ -1818,7 +1818,7 @@
<dt id="ref-xkos">[XKOS]</dt>
<dd>
Extended Knowledge Organization System (XKOS), <a
- href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a>
+ href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a>.
</dd>
</dl>