--- a/data-cube-ucr/index.html Mon Feb 25 16:52:05 2013 +0100
+++ b/data-cube-ucr/index.html Mon Feb 25 16:52:18 2013 +0100
@@ -1,68 +1,148 @@
<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
- "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml">
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+
<head>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Use Cases and Requirements for the Data Cube Vocabulary</title>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<script type="text/javascript" src='../respec/respec3/builds/respec-w3c-common.js' class='remove'></script>
+
+<script type="text/javascript"
+ src='../respec/respec3/builds/respec-w3c-common.js' class='remove'></script>
<script src="respec-ref.js"></script>
<script src="respec-config.js"></script>
<link rel="stylesheet" type="text/css" href="local-style.css" />
</head>
+
<body>
<section id="abstract">
<p>Many national, regional and local governments, as well as other
- organizations inside and outside of the public sector, create
- statistics. There is a need to publish those statistics in a
- standardized, machine-readable way on the web, so that statistics can
- be freely integrated and reused in consuming applications. This
- document is a collection of use cases for a standard vocabulary to
- publish statistics as Linked Data.</p>
+		organizations inside and outside of the public sector collect numeric
+		data and aggregate this data into statistics. There is a need to
+		publish these statistics in a standardised, machine-readable way on
+		the web, so that they can be freely integrated and reused in consuming
+		applications.</p>
+ <p>
+ This document presents the preparatory work for a W3C recommendation
+ of the RDF Data Cube Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>].
+ It lists representative use cases, which were partly obtained from
+ existing deployments of an earlier version of the vocabulary [<cite><a
+ href="#ref-QB-2010">QB-2010</a></cite>] and partly obtained from discussions
+ within the working group. This document also features a set of
+ requirements that have been derived from the use cases and are
+ considered in the specification.
+ </p>
</section>
<section id="sotd">
<p>
- This is a working document of the <a
- href="http://www.w3.org/2011/gld/wiki/Data_Cube_Vocabulary">Data
- Cube Vocabulary project</a> within the <a
- href="http://www.w3.org/2011/gld/">W3C Government Linked Data
- Working Group</a>. Feedback is welcome and should be sent to the <a
- href="mailto:public-gld-comments@w3.org">public-gld-comments@w3.org
- mailing list</a>.
+ This document is an editorial update to an Editor's Draft of the "Use
+ Cases and Requirements for the Data Cube Vocabulary" developed by the
+ <a href="http://www.w3.org/2011/gld/">W3C Government Linked Data
+ Working Group</a>.
+ </p>
+ <p>
+ Comments on this document may be sent to <a
+ href="mailto:public-gld-comments@w3.org">mailto:public-gld-comments@w3.org</a>;
+ please include the text "[QB] UCR comment" in the subject line. All
+ messages received at this address are viewable in a <a
+ href="http://lists.w3.org/Archives/Public/public-gld-comments/">public
+ archive</a>.
</p>
</section>
<section>
- <h2>Introduction</h2>
+ <h2 id="introduction">Introduction</h2>
+	<p>The aim of this document is to present use cases (rather than general
+	scenarios) that benefit from a standard vocabulary to publish
+	statistics as Linked Data. These use cases are used to derive and
+	justify requirements for the specification. Use cases do not necessarily
+	need to be implemented; their main purpose is to document and
+	illustrate design decisions.</p>
- <p>Many national, regional and local governments, as well as other
- organizations inside and outside of the public sector, create
- statistics. There is a need to publish those statistics in a
- standardized, machine-readable way on the web, so that statistics can
- be freely linked, integrated and reused in consuming applications.
- This document is a collection of use cases for a standard vocabulary
- to publish statistics as Linked Data.</p>
+		<p>In the following, we describe the challenges that an RDF vocabulary
+			for publishing statistics as Linked Data has to address.</p>
+ <p>Publishing statistics - collected and aggregated numeric data -
+ is challenging for the following reasons:</p>
+ <ul>
+			<li>Representing statistics requires more complex modeling, as
+				discussed by Martin Fowler [<cite><a href="#ref-Fowler1997">Fowler1997</a></cite>]:
+				recording a statistic simply as an attribute of an object (e.g., the
+				fact that a person weighs 185 pounds) fails to represent
+				important concepts such as quantity, measurement, and unit. Instead,
+				a statistic is modeled as a distinguishable object, an observation.
+			</li>
+			<li>The object describes an observation of a value, e.g., a
+				numeric value (e.g., 185) in the case of a measurement or a categorical
+				value (e.g., "blood group A") in the case of a categorical observation.</li>
+			<li>To allow correct interpretation of the value, the object can
+				be further described by "dimensions", e.g., the specific phenomenon
+				"weight" observed and the unit "pounds". Given background
+				information, e.g., arithmetical and comparative operations, humans
+				and machines can appropriately visualize such observations or
+				convert between different quantities.</li>
+			<li>Also, an observation separates a value from the actual event
+				at which it was collected; for instance, one can describe the person
+				who collected the observation and the time at which the observation
+				was collected.</li>
+ </ul>
+		<p>The following figure illustrates this specificity of modelling in a
+			class diagram:</p>
+
+ <p class="caption">Figure demonstrating specificity of modelling a
+ statistic</p>
+
+ <p align="center">
+ <img alt="specificity of modelling a
+ statistic"
+ src="./figures/modeling_quantity_measurement_observation.png"></img>
+ </p>
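+
+	<p>As a purely illustrative sketch (all terms in the
+		<code>ex:</code> namespace are invented for this example and belong to
+		no vocabulary discussed in this document), the contrast between
+		recording a statistic as a plain attribute and modelling it as an
+		observation can be expressed in Turtle as follows:</p>
+	<pre>
+@prefix ex:  &lt;http://example.org/&gt; .
+@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+
+# A naive attribute-style triple loses the unit, the phenomenon observed,
+# and the circumstances of collection:
+ex:alice ex:weight 185 .
+
+# Modelling the statistic as a distinguishable observation keeps them explicit:
+ex:observation1 a ex:Observation ;
+    ex:observedProperty ex:weight ;
+    ex:value            185 ;
+    ex:unit             ex:pound ;
+    ex:collectedBy      ex:nurseJones ;
+    ex:collectedAt      "2013-01-09"^^xsd:date .
+</pre>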
+
+ <p>
+ The Statistical Data and Metadata eXchange [<cite><a
+ href="#ref-SDMX">SDMX</a></cite>] - the ISO standard for exchanging and
+ sharing of statistical data and metadata among organizations - uses
+ "multidimensional model" that caters for the specificity of modelling
+ statistics. It allows to describe statistics as observations.
+ Observations exhibit values (Measures) that depend on dimensions
+ (Members of Dimensions).
+ </p>
+ <p>Since the SDMX standard has proven applicable in many contexts,
+ the vocabulary adopts the multidimensional model that underlies SDMX
+		and will be compatible with SDMX.</p>
+	<p>We use the name "data cube vocabulary" throughout this document
+		when referring to the vocabulary.</p>
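+	<p>The following Turtle sketch is not normative: the dimension,
+		measure, and attribute properties in the <code>ex:</code> namespace are
+		invented for illustration, while the <code>qb:</code> terms refer to the
+		vocabulary under development [<cite><a href="#ref-QB-2013">QB-2013</a></cite>].
+		It shows how a single observation of the multidimensional model might be
+		represented: a measure value that depends on three dimensions, with a
+		unit attached as an attribute:</p>
+	<pre>
+@prefix qb:  &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix ex:  &lt;http://example.org/&gt; .
+@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+
+# One observation: the unemployment rate for a given area, period, and sex.
+ex:obs-ie-2012-female a qb:Observation ;
+    qb:dataSet          ex:unemploymentDataSet ;  # the data set it belongs to
+    ex:refArea          ex:Ireland ;              # dimension
+    ex:refPeriod        "2012"^^xsd:gYear ;       # dimension
+    ex:sex              ex:female ;               # dimension
+    ex:unemploymentRate 14.2 ;                    # measure
+    ex:unit             ex:percent .              # attribute
+</pre>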
</section>
<section>
- <h2>Terminology</h2>
+ <h2 id="terminology">Terminology</h2>
<p>
<dfn>Statistics</dfn>
is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of
- the collection, organization, analysis, and interpretation of data. A
- statistic is a statistical dataset.
+ the collection, organization, analysis, and interpretation of data.
+ Statistics comprise statistical data.
</p>
<p>
- A
- <dfn>statistical dataset</dfn>
- comprises multidimensional data - a set of observed values organized
- along a group of dimensions, together with associated metadata. Basic
- structure of (aggregated) statistical data is a multidimensional table
- (also called a cube) <a href="#ref-SDMX">[SDMX]</a>.
+
+ The basic structure of
+ <dfn>statistical data</dfn>
+ is a multidimensional table (also called a data cube) [<cite><a
+ href="#ref-SDMX">SDMX</a></cite>], i.e., a set of observed values organized
+		along a group of dimensions, together with associated metadata. If
+		aggregated, we refer to statistical data as "macro-data"; if
+		not, we refer to it as "micro-data".
+ </p>
+ <p>
+ Statistical data can be collected in a
+		<dfn>dataset</dfn>,
+		typically published and maintained by an organisation [<cite><a
+ href="#ref-SDMX">SDMX</a></cite>]. The dataset contains metadata, e.g.,
+ about the time of collection and publication or about the maintaining
+ and publishing organisation.
</p>
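+	<p>As an illustration only (this document does not fix the terms for
+		such metadata; Dublin Core properties are used here merely as one
+		plausible choice), the metadata of a data set might look as follows:</p>
+	<pre>
+@prefix dcterms: &lt;http://purl.org/dc/terms/&gt; .
+@prefix ex:      &lt;http://example.org/&gt; .
+@prefix xsd:     &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+
+# Hypothetical metadata attached to a statistical data set.
+ex:unemploymentDataSet
+    dcterms:title     "Unemployment rates by area, period and sex" ;
+    dcterms:publisher ex:NationalStatisticsOffice ;
+    dcterms:issued    "2013-01-09"^^xsd:date .
+</pre>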
<p>
@@ -96,506 +176,88 @@
<dfn>consumer</dfn>
is a person or agent that uses Linked Data from the Web.
</p>
-
+ <p>
+ A
+ <dfn>registry</dfn>
+		collects metadata about statistical data through a registration process.
+ </p>
</section>
<section>
- <h2>Use cases</h2>
- <p>
- This section presents scenarios that would be enabled by the existence
- of a standard vocabulary for the representation of statistics as
- Linked Data. Since a draft of the specification of the cube vocabulary
- has been published, and the vocabulary already is in use, we will call
- this standard vocabulary after its current name RDF Data Cube
- vocabulary (short <a href="#ref-QB">[QB]</a>) throughout the document.
- </p>
- <p>We distinguish between use cases of publishing statistical data,
- and use cases of consuming statistical data since requirements for
- publishers and consumers of statistical data differ.</p>
- <section>
- <h3>Publishing statistical data</h3>
+ <h2 id="usecases">Use cases</h2>
+ <p>This section presents scenarios that are enabled by the
+		existence of a vocabulary for the representation of statistics as
+ Linked Data.</p>
<section>
- <h4>Publishing general statistics in a machine-readable and
- application-independent way (UC 1)</h4>
- <p>More and more organizations want to publish statistics on the
- web, for reasons such as increasing transparency and trust. Although
- in the ideal case, published data can be understood by both humans and
- machines, data often is simply published as CSV, PDF, XSL etc.,
- lacking elaborate metadata, which makes free usage and analysis
- difficult.</p>
-
- <p>The goal in this use case is to use a machine-readable and
- application-independent description of common statistics with use of
- open standards. The use case is fulfilled if QB will be a Linked Data
- vocabulary for encoding statistical data that has a hypercube
- structure and as such can describe common statistics in a
- machine-readable and application-independent way.</p>
-
+ <h3 id="SDMXWebDisseminationUseCase">SDMX Web Dissemination Use
+ Case</h3>
<p>
- An example scenario of this use case has been to publish the Combined
- Online Information System (<a
- href="http://data.gov.uk/resources/coins">COINS</a>). There, HM
- Treasury, the principal custodian of financial data for the UK
- government, released previously restricted information from its
- Combined Online Information System (COINS). Five data files were
- released containing between 3.3 and 4.9 million rows of data. The
- COINS dataset was translated into RDF for two reasons:
+ <span style="font-size: 10pt">(Use case taken from SDMX Web
+ Dissemination Use Case [<cite><a href="#ref-SDMX-21">SDMX
+ 2.1</a></cite>]
+ </span>
</p>
+ <p>Since we have adopted the multidimensional model that underlies
+				SDMX, we also adopt the "Web Dissemination Use Case", which is the
+				prime use case for SDMX, as it is an increasingly popular use of SDMX
+ and enables organisations to build a self-updating dissemination
+ system.</p>
+			<p>The Web Dissemination Use Case contains three actors: a
+ structural metadata web service (registry) that collects metadata
+ about statistical data in a registration fashion, a data web service
+ (publisher) that publishes statistical data and its metadata as
+ registered in the structural metadata web service, and a data
+ consumption application (consumer) that first discovers data from the
+ registry, then queries data from the corresponding publisher of
+ selected data, and then visualises the data.</p>
+ <p>Abstracted from the SDMX specificities, this use case contains
+				the following processes, also illustrated in a process flow diagram by
+				SDMX and described in more detail below:</p>
- <ol>
- <li>To publish statistics (e.g., as data files) are too large to
- load into widely available analysis tools such as Microsoft Excel, a
- common tool-of-choice for many data investigators.</li>
- <li>COINS is a highly technical information source, requiring
- both domain and technical skills to make useful applications around
- the data.</li>
- </ol>
- <p>Publishing statistics is challenging for the several reasons:</p>
- <p>
- Representing observations and measurements requires more complex
- modeling as discussed by Martin Fowler <a href="#Fowler1997">[Fowler,
- 1997]</a>: Recording a statistic simply as an attribute to an object
- (e.g., a the fact that a person weighs 185 pounds) fails with
- representing important concepts such as quantity, measurement, and
- observation.
+ <p class="caption">
+ Process flow diagram by SDMX [<cite><a href="#ref-SDMX-21">SDMX
+ 2.1</a></cite>]
</p>
- <p>Quantity comprises necessary information to interpret the value,
- e.g., the unit and arithmetical and comparative operations; humans and
- machines can appropriately visualize such quantities or have
- conversions between different quantities.</p>
-
- <p>A Measurement separates a quantity from the actual event at
- which it was collected; a measurement assigns a quantity to a specific
- phenomenon type (e.g., strength). Also, a measurement can record
- metadata such as who did the measurement (person), and when was it
- done (time).</p>
-
- <p>Observations, eventually, abstract from measurements only
- recording numeric quantities. An Observation can also assign a
- category observation (e.g., blood group A) to an observation. Figure
- demonstrates this relationship.</p>
- <p>
- <div class="fig">
- <a href="figures/modeling_quantity_measurement_observation.png"><img
- src="figures/modeling_quantity_measurement_observation.png"
- alt="Modeling quantity, measurement, observation" /> </a>
- <div>Modeling quantity, measurement, observation</div>
- </div>
- </div>
+ <p align="center">
+ <img alt="SDMX Web Dissemination Use Case"
+ src="./figures/SDMX_Web_Dissemination_Use_Case.png"></img>
</p>
+ <p>Benefits:</p>
+ <p>A structural metadata source (registry) collects metadata about
+ statistical data.</p>
+ <p>A data web service (publisher) registers statistical data in a
+ registry, and provides statistical data from a database and metadata
+ from a metadata repository for consumers. For that, the publisher
+				creates database tables (see 1 in the figure), and loads statistical data
+				into a database and metadata into a metadata repository.</p>
+ <p>A consumer discovers data from a registry (3) and creates a
+ query to the publisher for selected statistical data (4).</p>
+			<p>The publisher translates the query into queries against its database
+				(5) and its metadata repository (6) and returns the statistical
+ data and metadata.</p>
+ <p>The consumer visualises the returned statistical data and
+ metadata.</p>
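+			<p>This document does not prescribe how a registry entry is
+				represented. Purely as a sketch, and assuming (hypothetically, for this
+				example) that DCAT and Dublin Core terms are used, the record a consumer
+				discovers in step (3) might resemble the following; the access URL is a
+				placeholder:</p>
+			<pre>
+@prefix dcat:    &lt;http://www.w3.org/ns/dcat#&gt; .
+@prefix dcterms: &lt;http://purl.org/dc/terms/&gt; .
+@prefix ex:      &lt;http://example.org/&gt; .
+
+# A registry record pointing the consumer to the publisher's data service.
+ex:registryEntry-unemployment a dcat:Dataset ;
+    dcterms:title     "Unemployment rates by area, period and sex" ;
+    dcterms:publisher ex:NationalStatisticsOffice ;
+    dcat:distribution [
+        a dcat:Distribution ;
+        dcat:accessURL &lt;http://example.org/data/unemployment&gt;
+    ] .
+</pre>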
- <p>QB deploys the multidimensional model (made of observations with
- Measures depending on Dimensions and Dimension Members, and further
- contextualized by Attributes) and should cater for these complexity in
- modelling.</p>
- <p>Another challenge is that for brevity reasons and to avoid
- repetition, it is useful to have abbreviation mechanisms such as
- assigning overall valid properties of observations at the dataset or
- slice level, and become implicitly part of each observation. For
- instance, in the case of COINS, all of the values are in thousands of
- pounds sterling. However, one of the use cases for the linked data
- version of COINS is to allow others to link to individual
- observations, which suggests that these observations should be
- standalone and self-contained – and should therefore have explicit
- multipliers and units on each observation. One suggestion is to author
- data without the duplication, but have the data publication tools
- "flatten" the compact representation into standalone observations
- during the publication process.</p>
- <p>A further challenge is related to slices of data. Slices of data
- group observations that are of special interest, e.g., slices
- unemployment rates per year of a specific gender are suitable for
- direct visualization in a line diagram. However, depending on the
- number of Dimensions, the number of possible slices can become large
- which makes it difficult to select all interesting slices. Therefore,
- and because of their additional complexity, not many publishers create
- slices. In fact, it is somewhat unclear at this point which slices
- through the data will be useful to (COINS-RDF) users.</p>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
+ <p>Requirements:</p>
- </section> <section>
- <h4>Publishing one or many MS excel spreadsheet files with
- statistical data on the web (UC 2)</h4>
- <p>Not only in government, there is a need to publish considerable
- amounts of statistical data to be consumed in various (also
- unexpected) application scenarios. Typically, Microsoft Excel sheets
- are made available for download. Those excel sheets contain single
- spreadsheets with several multidimensional data tables, having a name
- and notes, as well as column values, row values, and cell values.</p>
- <p>The goal in this use case is to to publish spreadsheet
- information in a machine-readable format on the web, e.g., so that
- crawlers can find spreadsheets that use a certain column value. The
- published data should represent and make available for queries the
- most important information in the spreadsheets, e.g., rows, columns,
- and cell values. QB should provide the level of detail that is needed
- for such a transformation in order to fulfil this use case.</p>
- <p>In a possible use case scenario an institution wants to develop
- or use a software that transforms their excel sheets into the
- appropriate format.</p>
-
- <p class="editorsnote">@@TODO: Concrete example needed.</p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>Excel sheets provide much flexibility in arranging
- information. It may be necessary to limit this flexibility to allow
- automatic transformation.</li>
- <li>There may be many spreadsheets.</li>
- <li>Semi-structured information, e.g., notes about lineage of
- data cells, may not be possible to be formalized.</li>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>
- Existing Work (optional): Stats2RDF uses OntoWiki to translate CSV
- into QB <a href="http://aksw.org/Projects/Stats2RDF">[Stats2RDF]</a>.
- </p>
+ <p>The SDMX Web Dissemination Use Case can be concretised by
+ several sub-use cases, detailed in the following sections.</p>
</section> <section>
- <h4>Publishing SDMX as Linked Data (UC 3)</h4>
- <p>The ISO standard for exchanging and sharing statistical data and
- metadata among organizations is Statistical Data and Metadata eXchange
- (SDMX). Since this standard has proven applicable in many contexts, QB
- is designed to be compatible with the multidimensional model that
- underlies SDMX.</p>
- <p class="editorsnote">@@TODO: The QB spec should maybe also use
- the term "multidimensional model" instead of the less clear "cube
- model" term.</p>
- <p>Therefore, it should be possible to re-publish SDMX data using
- QB.</p>
- <p>
- The scenario for this use case is Eurostat <a
- href="http://epp.eurostat.ec.europa.eu/">[EUROSTAT]</a>, which
- publishes large amounts of European statistics coming from a data
- warehouse as SDMX and other formats on the web. Eurostat also provides
- an interface to browse and explore the datasets. However, linking such
- multidimensional data to related data sets and concepts would require
- download of interesting datasets and manual integration.
- </p>
- <p>The goal of this use case is to improve integration with other
- datasets; Eurostat data should be published on the web in a
- machine-readable format, possible to be linked with other datasets,
- and possible to be freeley consumed by applications. This use case is
- fulfilled if QB can be used for publishing the data from Eurostat as
- Linked Data for integration.</p>
- <p>A publisher wants to make available Eurostat data as Linked
- Data. The statistical data shall be published as is. It is not
- necessary to represent information for validation. Data is read from
- tsv only. There are two concrete examples of this use case: Eurostat
- Linked Data Wrapper (http://estatwrap.ontologycentral.com/), and
- Linked Statistics Eurostat Data
- (http://eurostat.linked-statistics.org/). They have slightly different
- focus (e.g., with respect to completeness, performance, and agility).
- </p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>There are large amounts of SDMX data; the Eurostat dataset
- comprises 350 GB of data. This may influence decisions about toolsets
- and architectures to use. One important task is to decide whether to
- structure the data in separate datasets.</li>
- <li>Again, the question comes up whether slices are useful.</li>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </section> <section>
- <h4>Publishing sensor data as statistics (UC 4)</h4>
- <p>Typically, multidimensional data is aggregated. However, there
- are cases where non-aggregated data needs to be published, e.g.,
- observational, sensor network and forecast data sets. Such raw data
- may be available in RDF, already, but using a different vocabulary.</p>
- <p>The goal of this use case is to demonstrate that publishing of
- aggregate values or of raw data should not make much of a difference
- in QB.</p>
- <p>
- For example the Environment Agency uses it to publish (at least
- weekly) information on the quality of bathing waters around England
- and Wales <A
- href="http://www.epimorphics.com/web/wiki/bathing-water-quality-structure-published-linked-data">[EnvAge]</A>.
- In another scenario DERI tracks from measurements about printing for a
- sustainability report. In the DERI scenario, raw data (number of
- printouts per person) is collected, then aggregated on a unit level,
- and then modelled using QB.
- </p>
- <p>Problems and Limitations:</p>
- <ul>
- <li>This use case also shall demonstrate how to link statistics
- with other statistics or non-statistical data (metadata).</li>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
+ <h3 id="COINS">Publisher Use Case: UK government financial data
+ from Combined Online Information System (COINS)</h3>
<p>
- Existing Work (optional): Semantic Sensor Network ontology <A
- href="http://purl.oclc.org/NET/ssnx/ssn">[SSN]</A> already provides a
- way to publish sensor information. SSN data provides statistical
- Linked Data and grounds its data to the domain, e.g., sensors that
- collect observations (e.g., sensors measuring average of temperature
- over location and time). A number of organizations, particularly in
- the Climate and Meteorological area already have some commitment to
- the OGC "Observations and Measurements" (O&M) logical data model, also
- published as ISO 19156. The QB spec should maybe also prefer the term
- "multidimensional model" instead of the less clear "cube model" term.
-
-
-
- <p class="editorsnote">@@TODO: Are there any statements about
- compatibility and interoperability between O&M and Data Cube that can
- be made to give guidance to such organizations?</p>
- </p>
- </section> <section>
- <h4>Registering statistical data in dataset catalogs (UC 5)</h4>
- <p>
- After statistics have been published as Linked Data, the question
- remains how to communicate the publication and let users find the
- statistics. There are catalogs to register datasets, e.g., CKAN, <a
- href="http://www.datacite.org/datacite.org">datacite.org</a>, <a
- href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
- href="http://pangaea.de/">Pangea</a>. Those catalogs require specific
- configurations to register statistical data.
+ <span style="font-size: 10pt">(This use case has been
+ summarised from Ian Dickinson et al. (COINS as Linked Data.
+					http://data.gov.uk/resources/coins. Last visited on Jan 9 2013).)</span>
</p>
- <p>The goal of this use case is to demonstrate how to expose and
- distribute statistics after modeling using QB. For instance, to allow
- automatic registration of statistical data in such catalogs, for
- finding and evaluating datasets. To solve this issue, it should be
- possible to transform QB data into formats that can be used by data
- catalogs.</p>
-
- <p class="editorsnote">@@TODO: Find specific use case scenario or
- ask how other publishers of QB data have dealt with this issue Maybe
- relation to DCAT?</p>
- <p>Problems and Limitations: -</p>
- <p>Unanticipated Uses (optional): If data catalogs contain
- statistics, they do not expose those using Linked Data but for
- instance using CSV or HTML (Pangea [11]). It could also be a use case
- to publish such data using QB.</p>
- <p>Existing Work (optional): -</p>
- </section> <section>
- <h4>Making transparent transformations on or different versions of
- statistical data (UC 6)</h4>
- <p>Statistical data often is used and further transformed for
- analysis and reporting. There is the risk that data has been
- incorrectly transformed so that the result is not interpretable any
- more. Therefore, if statistical data has been derived from other
- statistical data, this should be made transparent.</p>
- <p>The goal of this use case is to describe provenance and
- versioning around statistical data, so that the history of statistics
- published on the web becomes clear. This may also relate to the issue
- of having relationships between datasets published using QB. To fulfil
- this use case QB should recommend specific approaches to transforming
- and deriving of datasets which can be tracked and stored with the
- statistical data.</p>
-
- <p>A simple specific use case is that the Welsh Assembly government
- publishes a variety of population datasets broken down in different
- ways. For many uses then population broken down by some category (e.g.
- ethnicity) is expressed as a percentage. Separate datasets give the
- actual counts per category and aggregate counts. In such cases it is
- common to talk about the denominator (often DENOM) which is the
- aggregate count against which the percentages can be interpreted.</p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>Operations on statistical data result in new statistical
- data, depending on the operation. For intance, in terms of Data Cube,
- operations such as slice, dice, roll-up, drill-down will result in
- new Data Cubes. This may require representing general relationships
- between cubes (as discussed here: [12]).</li>
- <li>Should Data Cube support explicit declaration of such
- relationships either between separated qb:DataSets or between
- measures with a single qb:DataSet (e.g. ex:populationCount and
- ex:populationPercent)?</li>
- <li>If so should that be scoped to simple, common relationships
- like DENOM or allow expression of arbitrary mathematical relations?</li>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): Possible relation to Best Practices
- part on Versioning [13], where it is specified how to publish data
- which has multiple versions.</p>
-
-
- </section></section> <section>
- <h3>Consuming published statistical data</h3>
+ </section> </section>
<section>
- <h4>Simple chart visualizations of (integrated) published
- statistical datasets (UC 7)</h4>
- <p>Data that is published on the Web is typically visualized by
- transforming it manually into CSV or Excel and then creating a
- visualization on top of these formats using Excel, Tableau,
- RapidMiner, Rattle, Weka etc.</p>
- <p>This use case shall demonstrate how statistical data published
- on the web can be directly visualized, without using commercial or
- highly-complex tools. This use case is fulfilled if data that is
- published in QB can be directly visualized inside a webpage.</p>
- <p>An example scenario is environmental research done within the
- SMART research project (http://www.iwrm-smart.org/). Here, statistics
- about environmental aspects (e.g., measurements about the climate in
- the Lower Jordan Valley) shall be visualized for scientists and
- decision makers. Statistics should also be possible to be integrated
- and displayed together. The data is available as XML files on the web.
- On a separate website, specific parts of the data shall be queried and
- visualized in simple charts, e.g., line diagrams. The following figure
- shows the wanted display of an environmental measure over time for
- three regions in the lower Jordan valley; displayed inside a web page:</p>
-
- <p>
- <div class="fig">
- <a href="figures/Level_above_msl_3_locations.png"><img
- width="800px" src="figures/Level_above_msl_3_locations.png"
- alt="Line chart visualization of QB data" /> </a>
- <div>Line chart visualization of QB data</div>
- </div>
- </div>
- </p>
-
- <p>The following figure shows the same measures in a pivot table.
- Here, the aggregate COUNT of measures per cell is given.</p>
-
- <p>
- <div class="fig">
- <a href="figures/pivot_analysis_measurements.PNG"><img
- src="figures/pivot_analysis_measurements.PNG"
- alt="Pivot analysis measurements" /> </a>
- <div>Pivot analysis measurements</div>
- </div>
- </div>
- </p>
-
- <p>The use case uses Google App Engine, Qcrumb.com, and Spark. An
- example of a line diagram is given at [14] (some loading time needed).
- Current work tries to integrate current datasets with additional data
- sources, and then having queries that take data from both datasets and
- display them together.</p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>The difficulties lay in structuring the data appropriately so
- that the specific information can be queried.</li>
- <li>Also, data shall be published with having potential
- integration in mind. Therefore, e.g., units of measurements need to
- be represented.</li>
- <li>Integration becomes much more difficult if publishers use
- different measures, dimensions.</li>
-
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </section> <section>
- <h4>Uploading published statistical data in Google Public Data
- Explorer (UC 8)</h4>
- <p>Google Public Data Explorer (GPDE -
- http://code.google.com/apis/publicdata/) provides an easy possibility
- to visualize and explore statistical data. Data needs to be in the
- Dataset Publishing Language (DSPL -
- https://developers.google.com/public-data/overview) to be uploaded to
- the data explorer. A DSPL dataset is a bundle that contains an XML
- file, the schema, and a set of CSV files, the actual data. Google
- provides a tutorial to create a DSPL dataset from your data, e.g., in
- CSV. This requires a good understanding of XML, as well as a good
- understanding of the data that shall be visualized and explored.</p>
- <p>In this use case, it shall be demonstrate how to take any
- published QB dataset and to transform it automatically into DSPL for
- visualization and exploration. A dataset that is published conforming
- to QB will provide the level of detail that is needed for such a
- transformation.</p>
- <p>In an example scenario, a publisher P has published data using
- QB. There are two different ways to fulfil this use case: 1) A
- customer C is downloading this data into a triple store; SPARQL
- queries on this data can be used to transform the data into DSPL and
- uploaded and visualized using GPDE. 2) or, one or more XLST
- transformation on the RDF/XML transforms the data into DSPL.</p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>The technical challenges for the consumer here lay in knowing
- where to download what data and how to get it transformed into DSPL
- without knowing the data.</li>
- <p>Unanticipated Uses (optional): DSPL is representative for using
- statistical data published on the web in available tools for
- analysis. Similar tools that may be automatically covered are: Weka
- (arff data format), Tableau, etc.</p>
- <p>Existing Work (optional): -</p>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </section> <section>
- <h4>Allow Online Analytical Processing on published datasets of
- statistical data (UC 9)</h4>
- <p>Online Analytical Processing [15] is an analysis method on
- multidimensional data. It is an explorative analysis methode that
- allows users to interactively view the data on different angles
- (rotate, select) or granularities (drill-down, roll-up), and filter it
- for specific information (slice, dice).</p>
- <p>The multidimensional model used in QB to model statistics should
- be usable by OLAP systems. More specifically, data that conforms to QB
- can be used to define a Data Cube within an OLAP engine and can then
- be queries by OLAP clients.</p>
- <p>An example scenario of this use case is the Financial
- Information Observation System (FIOS) [16], where XBRL data has been
- re-published using QB and made analysable for stakeholders in a
- web-based OLAP client. The following figure shows an example of using
- FIOS. Here, for three different companies, cost of goods sold as
- disclosed in XBRL documents are analysed. As cell values either the
- number of disclosures or - if only one available - the actual number
- in USD is given:</p>
-
- <p>
- <div class="fig">
- <a href="figures/FIOS_example.PNG"><img
- src="figures/FIOS_example.PNG" alt="OLAP of QB data" /> </a>
- <div>OLAP of QB data</div>
- </div>
- </div>
- </p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>A problem lies in the strict separation between queries for
- the structure of data, and queries for actual aggregated values.</li>
- <li>Another problem lies in defining Data Cubes without greater
- insight in the data beforehand.</li>
- <li>Depending on the expressivity of the OLAP queries (e.g.,
- aggregation functions, hierarchies, ordering), performance plays an
- important role.</li>
- <li>QB allows flexibility in describing statistics, e.g., in
- order to reduce redundancy of information in single observations.
- These alternatives make general consumption of QB data more complex.
- Also, it is not clear, what "conforms" to QB means, e.g., is a
- qb:DataStructureDefinition required?</li>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </section> <section>
- <h4>Transforming published statistics into XBRL (UC 10)</h4>
- <p>XBRL is a standard data format for disclosing financial
- information. Typically, financial data is not managed within the
- organization using XBRL but instead, internal formats such as excel or
- relational databases are used. If different data sources are to be
- summarized in XBRL data formats to be published, an internally-used
- standard format such as QB could help integrate and transform the data
- into the appropriate format.</p>
- <p>In this use case data that is available as data conforming to QB
- should also be possible to be automatically transformed into such XBRL
- data format. This use case is fulfilled if QB contains necessary
- information to derive XBRL data.</p>
- <p>In an example scenario, DERI has had a use case to publish
- sustainable IT information as XBRL to the Global Reporting Initiative
- (GRI - https://www.globalreporting.org/). Here, raw data (number of
- printouts per person) is collected, then aggregated on a unit level
- and modelled using QB. QB data shall then be used directly to fill-in
- XBRL documents that can be published to the GRI.</p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>So far, QB data has been transformed into semantic XBRL, a
- vocabulary closer to XBRL. There is the chance that certain
- information required in a GRI XBRL document cannot be encoded using a
- vocabulary as general as QB. In this case, QB could be used in
- concordance with semantic XBRL.</li>
- </ul>
- <p class="editorsnote">@@TODO: Add link to semantic XBRL.</p>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
-
- </section> </section></section>
- <section>
- <h2>Requirements</h2>
+ <h2 id="requirements">Requirements</h2>
<p>The use cases presented in the previous section give rise to the
following requirements for a standard representation of statistics.
@@ -684,6 +346,41 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
</pre>
<p>What is the best way (in the context of the RDF/Data Cube/SDMX
approach) to express that the values for the England/Scotland/Wales/
@@ -793,7 +490,7 @@
<p>Required by: UC7, UC8, UC9, UC10</p>
</section> </section> </section>
<section class="appendix">
- <h2>Acknowledgments</h2>
+ <h2 id="acknowledgements">Acknowledgements</h2>
<p>The editors are very thankful for comments and suggestions ...</p>
</section>
@@ -802,18 +499,32 @@
<dl>
<dt id="ref-SDMX">[SMDX]</dt>
<dd>
- SMDX - User Guide 2009, <a
- href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>
+			SDMX User Guide Version 2009.1, <a
+ href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>,
+ last visited Jan 8 2013.
</dd>
- <dt id="ref-SDMX">[Fowler1997]</dt>
+ <dt id="ref-SDMX-21">[SMDX 2.1]</dt>
+ <dd>
+			SDMX 2.1 User Guide, Version 0.1 - 19/09/2012. <a
+ href="http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf">http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf</a>.
+ Last visited on 8 Jan 2013.
+ </dd>
+
+ <dt id="ref-Fowler1997">[Fowler1997]</dt>
<dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object
Models. Addison-Wesley. ISBN 0201895420.</dd>
- <dt id="ref-QB">[QB]</dt>
+ <dt id="ref-QB">[QB-2010]</dt>
<dd>
RDF Data Cube vocabulary, <a
- href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html</a>
+ href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>
+ </dd>
+
+ <dt id="ref-QB">[QB-2013]</dt>
+ <dd>
+ RDF Data Cube vocabulary, <a
+ href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>
</dd>
<dt id="ref-OLAP">[OLAP]</dt>