* Last read-through with minor changes.
author bkaempge
Sat, 20 Jul 2013 15:44:03 +0200
changeset 593 de46b18e2385
parent 592 4d2a833e1552
child 594 3de17a9de4c5
data-cube-ucr/index.html
--- a/data-cube-ucr/index.html	Fri Jul 19 18:13:44 2013 +0100
+++ b/data-cube-ucr/index.html	Sat Jul 20 15:44:03 2013 +0200
@@ -22,7 +22,7 @@
 		organizations in- and outside of the public sector, collect numeric
 		data and aggregate this data into statistics. There is a need to
 		publish these statistics in a standardized, machine-readable way on
-		the web, so that they can be freely integrated and reused in consuming
+		the Web, so that they can be freely integrated and reused in consuming
 		applications.</p>
 	<p>
 		In this document, the <a href="http://www.w3.org/2011/gld/">W3C
@@ -113,7 +113,7 @@
 		statistics. It can describe statistics as observations. Observations
 		exhibit values (Measures) that depend on dimensions (Members of
 		Dimensions). Since the SDMX standard has proven applicable in many
-		contexts, the vocabulary adopts the multidimensional model that
+		contexts, the Data Cube Vocabulary adopts the multidimensional model that
 		underlies SDMX and will be compatible with SDMX.
 	</p>
 	</section>
@@ -360,10 +360,10 @@
 			government-published data, so recording provenance is a key
 			requirement for the COINS data. For instance, the COINS project [<cite><a
 				href="#ref-COINS">COINS</a></cite>] has at least four perspectives on what
-			they mean by “COINS” data: the abstract notion of “all of COINS”, the
-			data for a particular year, the version of the data for a particular
-			year released on a given date, and the constituent graphs which hold
-			both the authoritative data translated from HMT’s own sources. Also,
+			they mean by “COINS” data: the abstract notion of “all of COINS”; the
+			data for a particular year; the version of the data for a particular
+			year released on a given date; and the constituent graphs which hold
+			both the authoritative data translated from HMT’s own sources and 
 			additional supplementary information which they derive from the data,
 			for example by cross-linking to other datasets. This challenge
 			supports lesson: <a
@@ -412,15 +412,15 @@
 
 	<p>Another concrete example is the <a
 			href="http://ontowiki.net/Projects/Stats2RDF?show_comments=1">Stats2RDF</a>
-		project that intends to publish biomedical statistical data that is
-		represented as Excel sheets. Here, Excel files are first translated
+		project that intends to publish Excel sheets with biomedical statistical data. 
+		Here, Excel files are first translated 
 		into CSV and then translated into RDF using OntoWiki, a semantic wiki.
 	</p>
 
 	<h4>Benefits</h4>
 	<ul>
-		<li>The goal in this use case is to to publish spreadsheet
-			information in a machine-readable format on the web, e.g., so that
+		<li>The goal in this use case is to publish spreadsheet
+			information in a machine-readable format on the Web, e.g., so that
 			crawlers can find spreadsheets that use a certain column value. The
 			published data should represent and make available for queries the
 			most important information in the spreadsheets, e.g., rows, columns,
@@ -433,9 +433,10 @@
 			different attributes across different value points. This way a
 			harmonization among variables is performed around the measurement
 			points themselves.</li>
-		<li>Novel visualization of census data</li>
-		<li>Possible integration with provenance vocabularies, e.g.,
-			PROV-O, for tracking of harmonization steps</li>
+		<li>Integration with provenance vocabularies, e.g.,
+			PROV-O, for tracking of harmonization steps becomes possible.</li>
+		<li>Once data representation and publication is standardised, consumers can focus on novel 
+		visualizations and analysis interfaces of census data.</li>
 		<li>In historical research, until now, harmonization across
 			datasets is performed by hand, and in subsequent iterations of a
 			database: it is very hard to trace back the provenance of decisions
@@ -450,7 +451,7 @@
 			lesson <a
 			href="#declaringRel">Publishers
 				may need guidance in making transparent the pre-processing of
-				aggregate statistics</a>
+				aggregate statistics</a>.
 		</li>
 		<li>Combining Data Cube with SKOS [<cite><a
 				href="#ref-skos">SKOS</a></cite>] to allow for cross-location and
@@ -480,7 +481,7 @@
 		<li>There may be many spreadsheets which supports lesson <a
 			href="#mechRec">Publishers
 				and consumers may need more guidance in efficiently processing data
-				using the Data Cube Vocabulary</a></li>
+				using the Data Cube Vocabulary</a>.</li>
 
 	</ul>
 
@@ -546,7 +547,7 @@
 		unit, units of 1000 households is used.</p>
 
 	<p>In this use case, one wants to publish not only a dataset on the
-		bottom most level, i.e. what are the number of households at each
+		bottom most level, i.e., what are the number of households at each
 		Unitary Authority in each year, but also a dataset on more aggregated
 		levels. For instance, in order to publish a dataset with the number of
 		households at each Government Office Region per year, one needs to
@@ -577,7 +578,7 @@
 		<li>Importantly, one would like to maintain the relationship
 			between the resulting datasets, i.e., the levels and aggregation
 			functions. Again, this use case does not simply need a selection (or
-			"dice" in OLAP context) where one fixes the time period dimension.
+			"dice" in OLAP context) where one fixes the time period dimension, but includes aggregation. 
 			This supports lesson <a
 			href="#aggregations">Publishers
 				may need guidance in how to represent common analytical operations
@@ -817,7 +818,7 @@
 			href="http://eurostat.linked-statistics.org/">Linked Statistics
 			Eurostat Data</a> intend to publish <a
 			href="http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/">Eurostat
-			SDMX data</a> as <a href="http://5stardata.info/">5-star Linked Open
+			SDMX data</a> as <a href="http://www.w3.org/TR/ld-glossary/#x5-star-linked-open-data">5 Star Linked Open
 			Data</a>. Eurostat data is partly published as SDMX, partly as tabular
 		data (TSV, similar to CSV). Eurostat provides a <a
 			href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">TOC
@@ -856,10 +857,6 @@
 	<h4>Challenges</h4>
 
 	<ul>
-		<li>New Eurostat datasets are added regularly to Eurostat. The
-			Linked Data representation should automatically provide access to the
-			most-up-to-date data.</li>
-
 		<li>There is a large number of Eurostat datasets, each possibly
 			containing a large number of columns (dimensions) and rows
 			(observations). Eurostat publishes more than 5200 datasets, which,
@@ -880,18 +877,6 @@
 				processing data using the Data Cube Vocabulary.</a>
 		</li>
 
-		<li>Provide a useful interface for browsing and visualizing the
-			data. One problem is that the data sets have too high dimensionality
-			to be displayed directly. Instead, one could visualize slices of time
-			series data. However, for that, one would need to either fix most
-			other dimensions (e.g., sex) or aggregate over them (e.g., via
-			average). The selection of useful slices from the large number of
-			possible slices is a challenge. This supports lesson <a
-			href="#clarify">
-				Publishers may need more guidance in creating and managing slices or
-				arbitrary groups of observations</a>.
-		</li>
-
 		<li>Each dimension used by a dataset has a range of permitted
 			values that need to be described.</li>
 
@@ -904,7 +889,9 @@
 				operations such as Slice, Dice, Rollup on data cubes</a>.
 		</li>
 
-		<li>Updates to the data
+		<li>New Eurostat datasets are added regularly to Eurostat. The
+			Linked Data representation should automatically provide access to the
+			most-up-to-date data:
 
 			<ul>
 				<li>Eurostat Linked Data pulls in changes from the original
@@ -941,8 +928,21 @@
 				execution to resolve the ds.</li>
 		</ul>
 
-		<li>Browsing and visualizing interface:
+		<li>Providing a useful interface for browsing and visualizing the
+			data:
 			<ul>
+			
+					<li>One problem is that the data sets have too high dimensionality
+			to be displayed directly. Instead, one could visualize slices of time
+			series data. However, for that, one would need to either fix most
+			other dimensions (e.g., sex) or aggregate over them (e.g., via
+			average). The selection of useful slices from the large number of
+			possible slices is a challenge. This supports lesson <a
+			href="#clarify">
+				Publishers may need more guidance in creating and managing slices or
+				arbitrary groups of observations</a>.
+		</li>
+			
 				<li>Eurostat Linked Data Wrapper provides for each dataset an
 					HTML page showing a JavaScript-based visualization of the data.
 					This also supports lesson <a
@@ -969,7 +969,7 @@
 A newer version of SDMX, SDMX Standards, Version 2.1, is available which might be used by 
 Eurostat in the future which supports lesson <a
 					href="#putative">
-						There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it</a></li>
+						There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it</a>.</li>
 	</ul>
 
 	</section> <section>
@@ -1206,16 +1206,16 @@
 		roll-up), and filter it for specific information (slice, dice).
 	</p>
 
-	<p>OLAP systems that first use ETL pipelines to
-		Extract-Load-Transform relevant data for efficient storage and queries
-		in a data warehouse and then allows interfaces to issue OLAP queries
-		on the data are commonly used in industry to analyze statistical data
-		on a regular basis.</p>
+	<p>OLAP systems are commonly used in industry to analyze statistical data
+		on a regular basis. OLAP systems first use ETL pipelines to
+		extract-load-transform relevant data 
+		in a data warehouse and then allow interfaces to efficiently issue OLAP queries
+		on the data.</p>
 
 	<p>
 		The goal in this use case is to allow analysis of published
 		statistical data with common OLAP systems [<cite><a
-			href="#ref-OLAP4LD">OLAP4LD</a></cite>]
+			href="#ref-OLAP4LD">OLAP4LD</a></cite>].
 	</p>
 
 	<p>For that a multidimensional model of the data needs to be
@@ -1228,12 +1228,12 @@
 		An example scenario of this use case is the Financial Information
 		Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>],
 		where XBRL data provided by the SEC on the Web is re-published as
-		Linked Data and made possible to explore and analyse by stakeholders
-		in a web-based OLAP client Saiku.
+		Linked Data and made possible to explore and analyze by stakeholders
+		in a Web-based OLAP client Saiku.
 	</p>
 
 	<p>The following figure shows an example of using FIOS. Here, for
-		three different companies, the cost of goods sold as disclosed in XBRL
+		three different companies, the Cost of Goods Sold as disclosed in XBRL
 		documents are analyzed. As cell values either the number of
 		disclosures or &mdash; if only one available &mdash; the actual number in USD is
 		given:</p>
@@ -1352,7 +1352,7 @@
 			href="#mechRec">Publishers
 				may need guidance in communicating the availability of published
 				statistical data to external parties and to allow automatic
-				discovery of statistical data</a>
+				discovery of statistical data</a>.
 		</li>
 		<li>Define mapping between the Data Cube Vocabulary and data catalog
 			descriptions. If data catalogs contain statistics, they do not expose
@@ -1406,8 +1406,8 @@
 	<ul>
 		<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/33">http://www.w3.org/2011/gld/track/issues/33</a></li>
 
-		<li>Since there are no use cases for qb:subslice, the vocabulary
-			should clarify or drop the use of qb:subslice; issue: <a
+		<li>Since there are no known use cases for <code>qb:subslice</code>, the vocabulary
+			should clarify or drop the use of <code>qb:subslice</code>; issue: <a
 			href="http://www.w3.org/2011/gld/track/issues/34">http://www.w3.org/2011/gld/track/issues/34</a>
 		</li>
 	</ul>
@@ -1502,7 +1502,7 @@
 		id="relToSO19156">Modelers
 		using ISO19156 - Observations &amp; Measurements may need clarification
 		regarding the relationship to the Data Cube Vocabulary</h3>
-	<p>An number of organizations, particularly in the Climate and
+	<p>A number of organizations, particularly in the Climate and
 		Meteorological area, already have some commitment to the OGC
 		"Observations and Measurements" (O&amp;M) logical data model, also
 		published as ISO 19156. Are there any statements about compatibility
@@ -1713,12 +1713,12 @@
 		<dt id="ref-cog">[COG]</dt>
 		<dd>
 			SDMX Content Oriented Guidelines, <a
-				href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
+				href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>.
 		</dd>
 
 		<dt id="ref-COGS">[COGS]</dt>
 		<dd>
-			Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S.,&amp;Curry, E.
+			Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., &amp; Curry, E.
 			(2012). Representing Interoperable Provenance Descriptions for ETL
 			Workflows. ESWC 2012 Workshop Highlights (pp. 1–15). Springer Verlag,
 			2012 (in press). (Extended Paper published in Conf. Proceedings.). <a
@@ -1729,7 +1729,7 @@
 		<dd>
 			Ian Dickinson et al., COINS as Linked Data <a
 				href="http://data.gov.uk/resources/coins">http://data.gov.uk/resources/coins</a>,
-			last visited on Jan 9 2013
+			last visited on Jan 9 2013.
 		</dd>
 
 		<dt id="ref-FIOS">[FIOS]</dt>
@@ -1747,44 +1747,44 @@
 
 		<dt id="ref-linked-data">[LOD]</dt>
 		<dd>
-			Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+			Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>.
 		</dd>
 
 		<dt id="ref-OLAP">[OLAP]</dt>
 		<dd>
 			Online Analytical Processing Data Cubes, <a
-				href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
+				href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>.
 		</dd>
 
 		<dt id="ref-OLAP4LD">[OLAP4LD]</dt>
 		<dd>
 			Kämpgen, B. and Harth, A. (2011). Transforming Statistical Linked
 			Data for Use in OLAP Systems. I-Semantics 2011. <a
-				href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>
+				href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>.
 		</dd>
 
 		<dt id="ref-QB-2010">[QB-2010]</dt>
 		<dd>
 			RDF Data Cube vocabulary, <a
-				href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>
+				href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>.
 		</dd>
 
 		<dt id="ref-QB-2013">[QB-2013]</dt>
 		<dd>
 			RDF Data Cube vocabulary, <a
-				href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>
+				href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>.
 		</dd>
 
 		<dt id="ref-QB4OLAP">[QB4OLAP]</dt>
 		<dd>
 			Etcheverry, Vaismann. QB4OLAP : A New Vocabulary for OLAP Cubes on
 			the Semantic Web. <a
-				href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>
+				href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>.
 		</dd>
 
 		<dt id="ref-rdf">[RDF]</dt>
 		<dd>
-			Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>
+			Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>.
 		</dd>
 
 		<dt id="ref-scovo">[SCOVO]</dt>
@@ -1792,13 +1792,13 @@
 			The Statistical Core Vocabulary, <a
 				href="http://sw.joanneum.at/scovo/schema.html">http://sw.joanneum.at/scovo/schema.html</a>
 			<br /> SCOVO: Using Statistics on the Web of data, <a
-				href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>
+				href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>.
 		</dd>
 
 		<dt id="ref-skos">[SKOS]</dt>
 		<dd>
 			Simple Knowledge Organization System, <a
-				href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>
+				href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>.
 		</dd>
 
 		<dt id="ref-SDMX">[SMDX]</dt>
@@ -1818,7 +1818,7 @@
 		<dt id="ref-xkos">[XKOS]</dt>
 		<dd>
 			Extended Knowledge Organization System (XKOS), <a
-				href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a>
+				href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a>.
 		</dd>
 
 	</dl>