--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/vocab-data-cube-use-cases-20130720/index.html Sat Jul 20 15:56:01 2013 +0200
@@ -0,0 +1,1952 @@
+<!--?xml version="1.0" encoding="UTF-8"?-->
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"><head>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8">
+<title>Use Cases and Lessons for the Data Cube Vocabulary</title>
+
+
+<script src="index_files/respec-ref.js"></script>
+<script src="index_files/respec-config.js"></script>
+<link rel="stylesheet" type="text/css" href="index_files/local-style.css">
+<style>/*****************************************************************
+ * ReSpec 3 CSS
+ * Robin Berjon - http://berjon.com/
+ *****************************************************************/
+
+/* --- INLINES --- */
+em.rfc2119 {
+ text-transform: lowercase;
+ font-variant: small-caps;
+ font-style: normal;
+ color: #900;
+}
+
+h1 acronym, h2 acronym, h3 acronym, h4 acronym, h5 acronym, h6 acronym, a acronym,
+h1 abbr, h2 abbr, h3 abbr, h4 abbr, h5 abbr, h6 abbr, a abbr {
+ border: none;
+}
+
+dfn {
+ font-weight: bold;
+}
+
+a.internalDFN {
+ color: inherit;
+ border-bottom: 1px solid #99c;
+ text-decoration: none;
+}
+
+a.externalDFN {
+ color: inherit;
+ border-bottom: 1px dotted #ccc;
+ text-decoration: none;
+}
+
+a.bibref {
+ text-decoration: none;
+}
+
+cite .bibref {
+ font-style: normal;
+}
+
+code {
+ color: #ff4500;
+}
+
+
+/* --- --- */
+ol.algorithm { counter-reset:numsection; list-style-type: none; }
+ol.algorithm li { margin: 0.5em 0; }
+ol.algorithm li:before { font-weight: bold; counter-increment: numsection; content: counters(numsection, ".") ") "; }
+
+/* --- TOC --- */
+.toc a, .tof a {
+ text-decoration: none;
+}
+
+a .secno, a .figno {
+ color: #000;
+}
+
+ul.tof, ol.tof {
+ list-style: none outside none;
+}
+
+.caption {
+ margin-top: 0.5em;
+ font-style: italic;
+}
+
+/* --- TABLE --- */
+table.simple {
+ border-spacing: 0;
+ border-collapse: collapse;
+ border-bottom: 3px solid #005a9c;
+}
+
+.simple th {
+ background: #005a9c;
+ color: #fff;
+ padding: 3px 5px;
+ text-align: left;
+}
+
+.simple th[scope="row"] {
+ background: inherit;
+ color: inherit;
+ border-top: 1px solid #ddd;
+}
+
+.simple td {
+ padding: 3px 10px;
+ border-top: 1px solid #ddd;
+}
+
+.simple tr:nth-child(even) {
+ background: #f0f6ff;
+}
+
+/* --- DL --- */
+.section dd > p:first-child {
+ margin-top: 0;
+}
+
+.section dd > p:last-child {
+ margin-bottom: 0;
+}
+
+.section dd {
+ margin-bottom: 1em;
+}
+
+.section dl.attrs dd, .section dl.eldef dd {
+ margin-bottom: 0;
+}
+</style><link href="index_files/W3C-WG-NOTE.css" rel="stylesheet"></head>
+
+<body><div class="head">
+ <p>
+
+ <a href="http://www.w3.org/"><img src="index_files/w3c_home.png" alt="W3C" height="48" width="72"></a>
+
+ </p>
+ <h1 class="title" id="title">Use Cases and Lessons for the Data Cube Vocabulary</h1>
+
+ <h2 id="w3c-working-group-note-20-july-2013"><abbr title="World Wide Web Consortium">W3C</abbr> Working Group Note 20 July 2013</h2>
+ <dl>
+
+ <dt>This version:</dt>
+ <dd><a href="http://www.w3.org/TR/2013/NOTE-data-cube-ucr-20130720/">http://www.w3.org/TR/2013/NOTE-data-cube-ucr-20130720/</a></dd>
+ <dt>Latest published version:</dt>
+ <dd><a href="http://www.w3.org/TR/data-cube-ucr/">http://www.w3.org/TR/data-cube-ucr/</a></dd>
+
+
+ <dt>Latest editor's draft:</dt>
+ <dd><a href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html</a></dd>
+
+
+
+
+
+
+
+ <dt>Editors:</dt>
+ <dd><a href="http://www.aifb.kit.edu/web/Benedikt_K%C3%A4mpgen/en">Benedikt Kämpgen</a>, <a href="http://www.fzi.de/index.php/en">FZI Karlsruhe</a></dd>
+<dd><a href="http://richard.cyganiak.de/">Richard Cyganiak</a>, <a href="http://www.deri.ie/">DERI, NUI Galway</a></dd>
+
+
+ </dl>
+
+
+
+
+
+ <p class="copyright">
+ <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> ©
+ 2013
+
+ <a href="http://www.w3.org/"><abbr title="World Wide Web Consortium">W3C</abbr></a><sup>®</sup>
+ (<a href="http://www.csail.mit.edu/"><abbr title="Massachusetts Institute of Technology">MIT</abbr></a>,
+ <a href="http://www.ercim.eu/"><abbr title="European Research Consortium for Informatics and Mathematics">ERCIM</abbr></a>,
+ <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved.
+ <abbr title="World Wide Web Consortium">W3C</abbr> <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
+ <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and
+ <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.
+ </p>
+
+
+ <hr>
+</div>
+
+ <section class="introductory" id="abstract"><h2>Abstract</h2>
+ <p>Many national, regional and local governments, as well as other
+ organizations in- and outside of the public sector, collect numeric
+ data and aggregate this data into statistics. There is a need to
+ publish these statistics in a standardized, machine-readable way on
+ the Web, so that they can be freely integrated and reused in consuming
+ applications.</p>
+ <p>
+ In this document, the <a href="http://www.w3.org/2011/gld/"><abbr title="World Wide Web Consortium">W3C</abbr>
+ Government Linked Data Working Group</a> presents use cases and lessons
+ supporting a recommendation of the RDF Data Cube Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. We describe case studies of
+ existing deployments of an earlier version of the Data Cube Vocabulary
+ [<cite><a href="#ref-QB-2010">QB-2010</a></cite>] as well as other
+ possible use cases that would benefit from using the vocabulary. In
+ particular, we identify benefits and challenges in using a vocabulary
+ for representing statistics. Also, we derive lessons that can be used
+ for future work on the vocabulary as well as for useful tools
+ complementing the vocabulary.
+ </p>
+ </section><section id="sotd" class="introductory"><h2>Status of This Document</h2>
+
+
+
+ <p>
+ <em>This section describes the status of this document at the time of its publication. Other
+ documents may supersede this document. A list of current <abbr title="World Wide Web Consortium">W3C</abbr> publications and the latest revision
+ of this technical report can be found in the <a href="http://www.w3.org/TR/"><abbr title="World Wide Web Consortium">W3C</abbr> technical reports
+ index</a> at http://www.w3.org/TR/.</em>
+ </p>
+
+ <p>
+ This document is an editorial update to an Editor's Draft of the "Use
+ Cases and Requirements for the Data Cube Vocabulary" developed by the
+ <a href="http://www.w3.org/2011/gld/"><abbr title="World Wide Web Consortium">W3C</abbr> Government Linked Data
+ Working Group</a>.
+ </p>
+
+ <p>
+ This document was published by the <a href="http://www.w3.org/2011/gld/">Government Linked Data Working Group</a> as a Working Group Note.
+
+
+ If you wish to make comments regarding this document, please send them to
+ <a href="mailto:public-gld-comments@w3.org">public-gld-comments@w3.org</a>
+ (<a href="mailto:public-gld-comments-request@w3.org?subject=subscribe">subscribe</a>,
+ <a href="http://lists.w3.org/Archives/Public/public-gld-comments/">archives</a>).
+
+
+
+
+ All comments are welcome.
+
+
+ </p><p>
+ Publication as a Working Group Note does not imply endorsement by the <abbr title="World Wide Web Consortium">W3C</abbr> Membership.
+ This is a draft document and may be updated, replaced or obsoleted by other documents at
+ any time. It is inappropriate to cite this document as other than work in progress.
+ </p>
+
+
+ <p>
+
+ This document was produced by a group operating under the
+ <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 <abbr title="World Wide Web Consortium">W3C</abbr> Patent Policy</a>.
+
+
+
+
+ <abbr title="World Wide Web Consortium">W3C</abbr> maintains a <a href="" rel="disclosure">public list of any patent disclosures</a>
+
+ made in connection with the deliverables of the group; that page also includes instructions for
+ disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains
+ <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the
+ information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section
+ 6 of the <abbr title="World Wide Web Consortium">W3C</abbr> Patent Policy</a>.
+
+
+ </p>
+
+
+
+
+</section><section id="toc"><h2 class="introductory">Table of Contents</h2><ul class="toc"><li class="tocline"><a class="tocxref" href="#introduction-1"><span class="secno">1. </span>Introduction</a></li><li class="tocline"><a class="tocxref" href="#terminology-1"><span class="secno">2. </span>Terminology</a></li><li class="tocline"><a class="tocxref" href="#use-cases"><span class="secno">3. </span>Use cases</a><ul class="toc"><li class="tocline"><a class="tocxref" href="#sdmx-web-dissemination-use-case"><span class="secno">3.1 </span>SDMX Web Dissemination Use
+ Case</a></li><li class="tocline"><a class="tocxref" href="#publisher-case-study-uk-government-financial-data-from-combined-online-information-system-coins"><span class="secno">3.2 </span>Publisher
+ Case Study: UK government financial data from Combined Online
+ Information System (COINS)</a></li><li class="tocline"><a class="tocxref" href="#publisher-use-case-publishing-excel-spreadsheets-about-dutch-historical-census-data-as-linked-data"><span class="secno">3.3 </span>Publisher Use
+ Case: Publishing Excel Spreadsheets about Dutch historical census data
+ as Linked Data</a></li><li class="tocline"><a class="tocxref" href="#publisher-use-case-publishing-hierarchically-structured-data-from-statswales-and-open-data-communities"><span class="secno">3.4 </span>Publisher
+ Use Case: Publishing hierarchically structured data from StatsWales
+ and Open Data Communities</a></li><li class="tocline"><a class="tocxref" href="#publisher-case-study-publishing-observational-data-sets-about-uk-bathing-water-quality"><span class="secno">3.5 </span>Publisher
+ Case Study: Publishing Observational Data Sets about UK Bathing Water
+ Quality</a></li><li class="tocline"><a class="tocxref" href="#publisher-case-study-site-specific-weather-forecasts-from-met-office-the-uk-s-national-weather-service"><span class="secno">3.6 </span>Publisher Case Study: Site specific
+ weather forecasts from Met Office, the UK's National Weather Service</a></li><li class="tocline"><a class="tocxref" href="#publisher-case-study-eurostat-sdmx-as-linked-data"><span class="secno">3.7 </span>Publisher Case Study: Eurostat
+ SDMX as Linked Data</a></li><li class="tocline"><a class="tocxref" href="#publisher-case-study-improving-trust-in-published-sustainability-information-at-the-digital-enterprise-research-institute-deri"><span class="secno">3.8 </span>Publisher
+ Case Study: Improving trust in published sustainability information at
+ the Digital Enterprise Research Institute (DERI)</a></li><li class="tocline"><a class="tocxref" href="#consumer-case-study-simple-chart-visualizations-of-integrated-published-climate-sensor-data"><span class="secno">3.9 </span>Consumer
+ Case Study: Simple chart visualizations of (integrated) published
+ climate sensor data</a></li><li class="tocline"><a class="tocxref" href="#consumer-use-case-visualizing-published-statistical-data-in-google-public-data-explorer"><span class="secno">3.10 </span>Consumer
+ Use Case: Visualizing published statistical data in Google Public Data
+ Explorer</a></li><li class="tocline"><a class="tocxref" href="#consumer-case-study-analyzing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems"><span class="secno">3.11 </span>Consumer
+ Case Study: Analyzing published financial (XBRL) data from the SEC
+ with common OLAP systems</a></li><li class="tocline"><a class="tocxref" href="#registry-use-case-registering-published-statistical-data-in-data-catalogs"><span class="secno">3.12 </span>Registry
+ Use Case: Registering published statistical data in data catalogs</a></li></ul></li><li class="tocline"><a class="tocxref" href="#lessons"><span class="secno">4. </span>Lessons</a><ul class="toc"><li class="tocline"><a class="tocxref" href="#there-is-a-putative-requirement-to-update-to-sdmx-2.1-if-there-are-specific-use-cases-that-demand-it"><span class="secno">4.1 </span>There is
+ a putative requirement to update to SDMX 2.1 if there are specific use
+ cases that demand it</a></li><li class="tocline"><a class="tocxref" href="#publishers-may-need-more-guidance-in-creating-and-managing-slices-or-arbitrary-groups-of-observations"><span class="secno">4.2 </span>Publishers
+ may need more guidance in creating and managing slices or arbitrary
+ groups of observations</a></li><li class="tocline"><a class="tocxref" href="#publishers-may-need-more-guidance-to-decide-which-representation-of-hierarchies-is-most-suitable-for-their-use-case"><span class="secno">4.3 </span>Publishers
+ may need more guidance to decide which representation of hierarchies
+ is most suitable for their use case</a></li><li class="tocline"><a class="tocxref" href="#modelers-using-iso19156---observations-measurements-may-need-clarification-regarding-the-relationship-to-the-data-cube-vocabulary"><span class="secno">4.4 </span>Modelers
+ using ISO19156 - Observations &amp; Measurements may need clarification
+ regarding the relationship to the Data Cube Vocabulary</a></li><li class="tocline"><a class="tocxref" href="#publishers-may-need-guidance-in-how-to-represent-common-analytical-operations-such-as-slice-dice-rollup-on-data-cubes"><span class="secno">4.5 </span>Publishers
+ may need guidance in how to represent common analytical operations
+ such as Slice, Dice, Rollup on data cubes</a></li><li class="tocline"><a class="tocxref" href="#publishers-may-need-guidance-in-making-transparent-the-pre-processing-of-aggregate-statistics"><span class="secno">4.6 </span>Publishers
+ may need guidance in making transparent the pre-processing of
+ aggregate statistics</a></li><li class="tocline"><a class="tocxref" href="#publishers-and-consumers-may-need-guidance-in-checking-and-making-use-of-well-formedness-of-published-data-using-data-cube"><span class="secno">4.7 </span>Publishers
+ and consumers may need guidance in checking and making use of
+ well-formedness of published data using data cube</a></li><li class="tocline"><a class="tocxref" href="#publishers-may-need-guidance-in-conversions-from-common-statistical-representations-such-as-csv-excel-arff-etc."><span class="secno">4.8 </span>Publishers
+ may need guidance in conversions from common statistical
+ representations such as CSV, Excel, ARFF etc.</a></li><li class="tocline"><a class="tocxref" href="#consumers-may-need-guidance-in-conversions-into-formats-that-can-easily-be-displayed-and-further-investigated-in-tools-such-as-google-data-explorer-r-weka-etc."><span class="secno">4.9 </span>Consumers
+ may need guidance in conversions into formats that can easily be
+ displayed and further investigated in tools such as Google Data
+ Explorer, R, Weka etc.</a></li><li class="tocline"><a class="tocxref" href="#publishers-and-consumers-may-need-more-guidance-in-efficiently-processing-data-using-the-data-cube-vocabulary"><span class="secno">4.10 </span>Publishers
+ and consumers may need more guidance in efficiently processing data
+ using the Data Cube Vocabulary</a></li><li class="tocline"><a class="tocxref" href="#publishers-may-need-guidance-in-communicating-the-availability-of-published-statistical-data-to-external-parties-and-to-allow-automatic-discovery-of-statistical-data"><span class="secno">4.11 </span>Publishers
+ may need guidance in communicating the availability of published
+ statistical data to external parties and to allow automatic discovery
+ of statistical data</a></li></ul></li><li class="tocline"><a class="tocxref" href="#acknowledgements-1"><span class="secno">A. </span>Acknowledgements</a></li></ul></section>
+
+
+
+ <section id="introduction-1">
+ <!--OddPage--><h2 id="introduction"><span class="secno">1. </span>Introduction</h2>
+ <p>The aim of this document is to present concrete use cases and lessons
+ for a vocabulary to publish statistics as Linked Data. An earlier
+ version of the Data Cube Vocabulary [<cite><a href="#ref-QB-2010">QB-2010</a></cite>] has existed for some time and has
+ proven applicable in <a href="http://wiki.planet-data.eu/web/Datasets">several
+ deployments</a>. The <a href="http://www.w3.org/2011/gld/"><abbr title="World Wide Web Consortium">W3C</abbr>
+ Government Linked Data Working Group</a> intends to develop the Data
+ Cube Vocabulary into a <abbr title="World Wide Web Consortium">W3C</abbr> Recommendation, the RDF Data Cube
+ Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. In this
+ document, we describe use cases that would benefit from using the
+ vocabulary. In particular, we identify possible benefits and challenges
+ in using such a vocabulary for representing statistics. We also derive
+ lessons that can motivate future work on the vocabulary as well as on
+ associated tools or services complementing the vocabulary.</p>
+
+ <p>The rest of this document is structured as follows. We first
+ give a short introduction to modeling statistics and define the
+ terminology used. Then, we describe use cases that have been derived
+ from existing deployments or from feedback on the earlier version of
+ the Data Cube Vocabulary, including their possible benefits and
+ challenges. Finally, we describe lessons derived from the use cases.</p>
+
+ <p>We use the term "Data Cube Vocabulary" throughout the document
+ when referring to the vocabulary.</p>
+
+ <p>In the following, we describe the challenge of authoring an RDF
+ vocabulary for publishing statistics as Linked Data. Describing
+ statistics — collected and aggregated numeric data — is challenging
+ for the following reasons:</p>
+ <ul>
+ <li>Representing statistics requires more complex modeling, as
+ discussed by Martin Fowler [<cite><a href="#ref-FOWLER97">FOWLER97</a></cite>]:
+ recording a statistic simply as an attribute of an object (e.g., the
+ fact that a person weighs 185 pounds) fails to represent important
+ concepts such as quantity, measurement, and unit. Instead, a
+ statistic is modeled as a distinguishable object, an observation.
+ </li>
+ <li>The observation records a value: a numeric value (e.g., 185)
+ in the case of a measurement, or a categorical value (e.g., "blood
+ group A") in the case of a categorical observation.</li>
+ <li>To allow correct interpretation of the value, the observation
+ needs to be further described by "dimensions" such as the specific
+ phenomenon, e.g., "weight", the time for which the observation is
+ valid, e.g., "January 2013", or the location where the observation
+ was made, e.g., "New York".</li>
+ <li>To further aid interpretation of the value, observations can be
+ given attributes, such as presentational information (e.g., a series
+ title "COINS 2010 to 2013") or information critical to understanding
+ the data (e.g., the unit of measure "miles").</li>
+ <li>Given background information, e.g., about arithmetical and
+ comparative operations, humans and machines can appropriately
+ visualize such observations or perform conversions between different
+ quantities.</li>
+ </ul>
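+ <p>As an illustration of the points above, the following minimal Python sketch models the 185-pound example as an observation with dimensions, a measure, and a unit attribute, and uses the unit as background information to convert the value to kilograms. It is not part of the Data Cube Vocabulary; the class, its field names, and the conversion helper are hypothetical.</p>
```python
from dataclasses import dataclass, field

# Hypothetical, simplified model for illustration only; it is not the
# RDF Data Cube Vocabulary, just the concepts described above.
@dataclass
class Observation:
    dimensions: dict   # e.g. phenomenon, time, location
    measure: float     # the observed value
    attributes: dict = field(default_factory=dict)  # e.g. unit of measure

# Exact conversion factor of the international avoirdupois pound.
KG_PER_POUND = 0.45359237


def pounds_to_kilograms(obs: Observation) -> Observation:
    """Return a new observation whose measure is expressed in kilograms.

    The unit attribute is what makes the conversion safe: without it,
    the bare number 185 could not be interpreted.
    """
    if obs.attributes.get("unit") != "pound":
        raise ValueError("expected an observation measured in pounds")
    return Observation(
        dimensions=dict(obs.dimensions),
        measure=obs.measure * KG_PER_POUND,
        attributes={**obs.attributes, "unit": "kilogram"},
    )


weight = Observation(
    dimensions={"phenomenon": "weight", "time": "January 2013",
                "location": "New York"},
    measure=185,
    attributes={"unit": "pound"},
)
in_kg = pounds_to_kilograms(weight)
print(round(in_kg.measure, 2), in_kg.attributes["unit"])  # 83.91 kilogram
```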
+
+ <p>
+ The Statistical Data and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>], the ISO standard for exchanging and
+ sharing statistical data and metadata among organizations, uses a
+ "multidimensional model" to meet the above challenges in modeling
+ statistics. In this model, statistics are described as observations,
+ whose values (Measures) depend on dimensions (Members of
+ Dimensions). Since the SDMX standard has proven applicable in many
+ contexts, the Data Cube Vocabulary adopts the multidimensional model that
+ underlies SDMX and is intended to be compatible with SDMX.
+ </p>
+ </section>
+
+ <section id="terminology-1">
+ <!--OddPage--><h2 id="terminology"><span class="secno">2. </span>Terminology</h2>
+ <p>
+ <dfn id="dfn-statistics">Statistics</dfn>
+ is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of
+ the collection, organization, analysis, and interpretation of data.
+ Statistics comprise statistical data.
+ </p>
+
+ <p>The basic structure of
+ <dfn id="dfn-statistical-data">statistical data</dfn>
+ is a multidimensional table (also called a data cube) [<cite><a href="#ref-SDMX">SDMX</a></cite>], i.e., a set of observed values organized
+ along a group of dimensions, together with associated metadata. We
+ refer to aggregated statistical data as "macro-data" and unaggregated
+ statistical data as "micro-data".
+ </p>
+ <p>
+ Statistical data can be collected in a
+ <dfn id="dfn-dataset">dataset</dfn>,
+ typically published and maintained by an organization [<cite><a href="#ref-SDMX">SDMX</a></cite>]. The dataset contains metadata, e.g.,
+ about the time of collection and publication or about the maintaining
+ and publishing organization.
+ </p>
+
+ <p>
+ <dfn id="dfn-source-data">Source data</dfn>
+ is data from data stores such as relational databases or spreadsheets
+ that acts as a source for the Linked Data publishing process.
+ </p>
+
+ <p>
+ <dfn id="dfn-metadata">Metadata</dfn>
+ about statistics defines the data structure and gives contextual
+ information about the statistics.
+ </p>
+
+ <p>
+ A format is
+ <dfn id="dfn-machine-readable">machine-readable</dfn>
+ if it is amenable to automated processing by a machine, as opposed to
+ presentation to a human user.
+ </p>
+
+ <p>
+ A
+ <dfn id="dfn-publisher">publisher</dfn>
+ is a person or organization that exposes source data as Linked Data on
+ the Web.
+ </p>
+
+ <p>
+ A
+ <dfn id="dfn-consumer">consumer</dfn>
+ is a person or agent that uses Linked Data from the Web.
+ </p>
+ <p>
+ A
+ <dfn id="dfn-registry">registry</dfn>
+ allows a publisher to announce that data or metadata exists and to add
+ information about how to obtain that data [<cite><a href="#ref-SDMX-21">SDMX 2.1</a></cite>].
+ </p>
+ </section>
+
+
+ <section id="use-cases">
+ <!--OddPage--><h2 id="usecases"><span class="secno">3. </span>Use cases</h2>
+ <p>This section presents scenarios that are enabled by the
+ existence of a standard vocabulary for the representation of
+ statistics as Linked Data.</p>
+
+ <section id="sdmx-web-dissemination-use-case">
+ <h3 id="SDMXWebDisseminationUseCase"><span class="secno">3.1 </span>SDMX Web Dissemination Use
+ Case</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from SDMX Web
+ Dissemination Use Case [<cite><a href="#ref-SDMX-21">SDMX
+ 2.1</a></cite>])
+ </span>
+ </p>
+ <p>Since the Data Cube Vocabulary adopts the multidimensional model
+ that underlies SDMX, we also adopt the "Web Dissemination Use Case",
+ the prime use case for SDMX: it is an increasingly popular use of SDMX
+ and enables organizations to build a self-updating dissemination
+ system.</p>
+ <p>The Web Dissemination Use Case involves three actors: a
+ structural metadata Web service (registry) that collects metadata
+ about statistical data through registration; a data Web service
+ (publisher) that publishes statistical data and its metadata as
+ registered in the structural metadata Web service; and a data
+ consumption application (consumer) that first discovers data from the
+ registry, then queries the corresponding publisher for the selected
+ data, and finally visualizes the data.</p>
+
+ <h3 id="benefits">Benefits</h3>
+ <ul>
+ <li>A structural metadata Web service (registry) can collect
+ metadata about statistical data.</li>
+
+ <li>A data Web service (publisher) can register statistical data
+ in a registry, and can provide consumers with statistical data from a
+ database and metadata from a metadata repository. To do so, the
+ publisher creates database tables and loads statistical data into a
+ database and metadata into a metadata repository.</li>
+
+ <li>A consumer can discover data in a registry and can
+ automatically create a query to the publisher for selected
+ statistical data.</li>
+
+ <li>The publisher can translate the query into queries against its
+ database and metadata repository, and return the statistical
+ data and metadata.</li>
+
+ <li>The consumer can visualize the returned statistical data and
+ metadata.</li>
+ </ul>
+
+ <h3 id="challenges">Challenges</h3>
+ <ul>
+ <li>This use case is too abstract on its own. The SDMX Web
+ Dissemination Use Case can be made concrete by several sub-use
+ cases, detailed in the following sections.</li>
+ <li>In particular, this use case requires a recommended way to
+ advertise published statistical datasets, which supports the
+ following lesson: <a href="#pubGuidance">Publishers
+ may need guidance in communicating the availability of published
+ statistical data to external parties and to allow automatic
+ discovery of statistical data</a>.
+ </li>
+
+ </ul>
+
+ </section> <section id="publisher-case-study-uk-government-financial-data-from-combined-online-information-system-coins">
+ <h3 id="UKgovernmentfinancialdatafromCombinedOnlineInformationSystem"><span class="secno">3.2 </span>Publisher
+ Case Study: UK government financial data from Combined Online
+ Information System (COINS)</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has been
+ summarized from Ian Dickinson et al. [<cite><a href="#ref-COINS">COINS</a></cite>])
+ </span>
+ </p>
+ <p>More and more organizations want to publish statistics on the
+ Web, for reasons such as increasing transparency and trust. Although
+ published data should ideally be understandable by both humans and
+ machines, data is often simply published as CSV, PDF, XLS, etc.,
+ lacking elaborate metadata, which makes free usage and analysis
+ difficult.</p>
+ <p>
+ Therefore, the goal in this scenario is to use a machine-readable and
+ application-independent description of common statistics, expressed using
+ open standards, to foster usage of and innovation on the published data.
+ In the "COINS as Linked Data" project [<cite><a href="#ref-COINS">COINS</a></cite>], the Combined Online Information System
+ (COINS) is to be published using a standard Linked Data vocabulary.
+ Via COINS, <a href="http://www.hm-treasury.gov.uk/psr_coins_data.htm">HM
+ Treasury</a>, the principal custodian of financial data for the UK
+ government, releases previously restricted financial information about
+ government spending.
+ </p>
+
+ <p>The COINS data has a hypercube structure. It describes financial
+ transactions using seven independent dimensions (time, data-type,
+ department, etc.) and one dependent measure (value). In addition, it
+ allows thirty-three attributes that may further describe each
+ transaction. COINS is one of the more complex statistical datasets
+ being published via data.gov.uk.</p>
+ <p>Part of the complexity of COINS arises from the nature of the
+ data being released:</p>
+ <p>The published COINS datasets cover expenditure related to five
+ different years (2005–06 to 2009–10). The actual COINS database at HM
+ Treasury is updated daily. In principle at least, multiple snapshots
+ of the COINS data could be released throughout the year.</p>
+ <p>The actual data and its hypercube structure are to be
+ represented separately so that an application can first examine the
+ structure before deciding whether to download the actual data, i.e.,
+ the transactions. For each dimension and attribute, the hypercube
+ structure also defines a range of permitted values, which also needs
+ to be represented.</p>
+ <p>An access or query interface to the COINS data, e.g., via a
+ SPARQL endpoint or the Linked Data API, is planned. Queries that are
+ expected to be interesting include "spending for one department",
+ "total spending by department", and "retrieving all data for a given
+ observation".</p>
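+ <p>The first two of these queries can be sketched in Python over a hypothetical in-memory list of observations. The department names, dimension keys, and values are invented for illustration, and only three of the seven dimensions are shown; a real deployment would more likely express these queries in SPARQL against the planned endpoint.</p>
```python
from collections import defaultdict

# Hypothetical COINS-like observations: dimension values plus one measure.
# Only three of the seven dimensions are shown; all values are invented.
OBSERVATIONS = [
    {"time": "2008-09", "department": "Department A", "value": 100.0},
    {"time": "2009-10", "department": "Department A", "value": 50.0},
    {"time": "2009-10", "department": "Department B", "value": 40.0},
]


def spending_for_department(observations, department):
    """'Spending for one department': keep observations for one department."""
    return [o for o in observations if o["department"] == department]


def total_spending_by_department(observations):
    """'Total spending by department': sum the measure per department."""
    totals = defaultdict(float)
    for o in observations:
        totals[o["department"]] += o["value"]
    return dict(totals)


print(len(spending_for_department(OBSERVATIONS, "Department A")))  # 2
print(total_spending_by_department(OBSERVATIONS))
# {'Department A': 150.0, 'Department B': 40.0}
```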
+
+ <h3 id="benefits-1">Benefits</h3>
+ <p>According to the COINS as Linked Data project, the reasons for
+ publishing COINS as Linked Data are threefold:</p>
+
+ <ul>
+ <li>using an open standard representation makes it easier to work
+ with the data using available technologies and promises innovative
+ third-party tools and usages;</li>
+ <li>individual transactions and groups of transactions are given
+ an identity, and so can be referenced by Web address (URL), to allow
+ them to be discussed, annotated, or listed as source data for
+ articles or visualizations;</li>
+ <li>cross-links between linked-data datasets allow for much
+ richer exploration of related datasets.</li>
+ </ul>
+
+ <h3 id="challenges-1">Challenges</h3>
+
+ <p>The COINS use case leads to the following challenges:</p>
+ <ul>
+ <li>Although it was not originally designed for this purpose, the
+ Data Cube Vocabulary could be used successfully for publishing
+ financial data, not just statistics. This has also been shown by the <a href="http://data.gov.uk/resources/payments">Payments Ontology</a>.
+ </li>
+ <li>Also, the publisher favors a representation that is both as
+ self-descriptive as possible, i.e., others can link to and download
+ fully-described individual transactions, and as compact as possible,
+ i.e., information is not unnecessarily repeated. This challenge
+ supports the lesson: <a href="#criteriaForWell">Publishers
+ and consumers may need guidance in checking and making use of
+ well-formedness of published data using data cube</a>.
+ </li>
+ <li>Moreover, the publisher is considering the possible benefit of
+ publishing slices of the data, e.g., subsets that fix all dimensions
+ except the time dimension. Such slices could be particularly
+ interesting for visualizations or comments. However, depending on the
+ number of dimensions, the number of possible slices can become large,
+ which makes it difficult to select all interesting slices
+ semi-automatically. This challenge supports the lesson: <a href="#clarify">Publishers
+ may need more guidance in creating and managing slices or arbitrary
+ groups of observations</a>.
+ </li>
+ <li>An important benefit of Linked Data is that we are able to
+ annotate data, at a fine-grained level of detail, to record
+ information about the data itself. This includes where it came from
+ (the provenance of the data), but could also include annotations from
+ reviewers, links to other useful resources, etc. Being able to trust
+ that data is correct and reliable is a central value for
+ government-published data, so recording provenance is a key
+ requirement for the COINS data. For instance, the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>] has at least four perspectives on what
+ it means by “COINS” data: the abstract notion of “all of COINS”; the
+ data for a particular year; the version of the data for a particular
+ year released on a given date; and the constituent graphs which hold
+ both the authoritative data translated from HM Treasury’s own sources
+ and additional supplementary information which the project derives from
+ the data, for example by cross-linking to other datasets. This challenge
+ supports the lesson: <a href="#declaringRel">Publishers
+ may need guidance in making transparent the pre-processing of
+ aggregate statistics</a>.
+ </li>
+ <li>Another challenge is the size of the data, especially since it
+ is updated regularly. Five data files already contain between 3.3 and
+ 4.9 million rows of data. This challenge supports the lesson: <a href="#mechRec">Publishers
+ and consumers may need more guidance in efficiently processing data
+ using the Data Cube Vocabulary</a>.
+ </li>
+ </ul>
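+ <p>This minimal Python sketch makes the growth in the number of slices concrete. It computes two counts. The first is the number of slices obtained by fixing every dimension except time, which is the product of the other dimensions' code-list sizes. The second is the number of possible slice keys, meaning the subsets of dimensions that a slice could fix. The dimension names and sizes are hypothetical and are not taken from the COINS data.</p>
```python
from math import prod

# Hypothetical dimensions and code-list sizes (illustrative only, not COINS figures).
dimension_sizes = {
    "time": 12,
    "department": 20,
    "account": 50,
    "programme": 30,
}


def count_time_series_slices(sizes, free_dim="time"):
    """Number of slices that fix every dimension except `free_dim`.

    Each such slice is identified by one value per fixed dimension,
    so the count is the product of the fixed dimensions' sizes.
    """
    return prod(size for name, size in sizes.items() if name != free_dim)


def count_slice_keys(sizes):
    """Number of ways to choose which dimensions a slice fixes (any subset)."""
    return 2 ** len(sizes)


print(count_time_series_slices(dimension_sizes))  # 20 * 50 * 30 = 30000
print(count_slice_keys(dimension_sizes))  # 2 ** 4 = 16
```
+ <p>With only four modest hypothetical dimensions there are already 30000 time-series slices. Each additional dimension multiplies that number by its code-list size.</p>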
+
+ </section> <section id="publisher-use-case-publishing-excel-spreadsheets-about-dutch-historical-census-data-as-linked-data">
+ <h3 id="PublishingExcelSpreadsheetsasLinkedData"><span class="secno">3.3 </span>Publisher Use
+ Case: Publishing Excel Spreadsheets about Dutch historical census data
+ as Linked Data</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has been
+ contributed by Rinke Hoekstra. See <a href="http://ehumanities.nl/ceda_r/">CEDA_R</a> and <a href="http://www.data2semantics.org/">Data2Semantics</a> for more
+ information.)
+ </span>
+ </p>
+ <p>Not only governments need to publish considerable amounts of
+ statistical data to be consumed in various (including unexpected)
+ application scenarios. Typically, such data is made available for
+ download as Microsoft Excel sheets.</p>
+ <p>
+ For instance, in the <a href="http://ehumanities.nl/ceda_r/">CEDA_R</a>
+ and <a href="http://www.data2semantics.org/">Data2Semantics</a>
+ projects, one goal is to publish and harmonize Dutch historical census
+ data (from 1795 onwards). These censuses are currently available only as
+ Excel spreadsheets (obtained by data entry) that closely mimic the way
+ in which the data was originally published; they are to be published as
+ Linked Data.
+ </p>
+
+ <p>Each of these Excel files contains a single spreadsheet with several
+ multidimensional data tables, each having a name and notes, as well as
+ column values, row values, and cell values.</p>
+
+ <p>Another concrete example is the <a href="http://ontowiki.net/Projects/Stats2RDF?show_comments=1">Stats2RDF</a>
+ project that intends to publish Excel sheets with biomedical statistical data.
+ Here, Excel files are first translated
+ into CSV and then translated into RDF using OntoWiki, a semantic wiki.
+ </p>
+
+ <h3 id="benefits-2">Benefits</h3>
+ <ul>
+ <li>The goal in this use case is to publish spreadsheet
+ information in a machine-readable format on the Web, e.g., so that
+ crawlers can find spreadsheets that use a certain column value. The
+ published data should represent and make available for queries the
+ most important information in the spreadsheets, e.g., rows, columns,
+ and cell values.</li>
+ <li>All context, and hence all meaning, of a measurement point is
+ expressed by means of dimensions: the bare number is the center of an
+ ego-network of attributes or dimensions. In an RDF representation it
+ is then straightforward to define hierarchical relationships between
+ the dimensions (which can be elaborated further) as well as to map
+ different attributes across different value points. In this way,
+ variables are harmonized around the measurement points
+ themselves.</li>
+ <li>Integration with provenance vocabularies, e.g.,
+ PROV-O, for tracking of harmonization steps becomes possible.</li>
+ <li>Once data representation and publication are standardized, consumers can focus on novel
+ visualizations and analysis interfaces of census data.</li>
+ <li>In historical research, until now, harmonization across
+ datasets has been performed by hand, and in subsequent iterations of a
+ database: it is very hard to trace back the provenance of decisions
+ made during the harmonization procedure. Publishing the census data
+ as Linked Data may allow (semi-)automatic harmonization.</li>
+ </ul>
+ <h3 id="challenges-2">Challenges</h3>
+
+ <ul>
+ <li>Semi-structured information, e.g., notes about the lineage of
+ data cells, may not be possible to formalize. This supports
+ lesson <a href="#declaringRel">Publishers
+ may need guidance in making transparent the pre-processing of
+ aggregate statistics</a>.
+ </li>
+ <li>Combining Data Cube with SKOS [<cite><a href="#ref-skos">SKOS</a></cite>] to allow for cross-location and
+ cross-time historical analysis, supporting lesson <a href="#heirarchic">Publishers
+ may need more guidance to decide which representation of hierarchies
+ is most suitable for their use case</a>.
+ </li>
+ <li>These challenges may seem to be particular to the field of
+ historical research, but in fact apply to government information at
+ large. Government is not a single body that publishes information at
+ a single point in time. Government consists of multiple (changing)
+ bodies, scattered across multiple levels, jurisdictions and areas.
+ Publishing government information in a consistent, integrated manner
+ requires exactly the type of harmonization required in this use case.</li>
+ <li>A mapping between Excel and the Data Cube Vocabulary needs to be defined.
+ Excel spreadsheets are representative of other common representation
+ formats for statistics such as CSV, XBRL, ARFF, which supports lesson
+ <a href="#excelCSV">Publishers
+ may need guidance in conversions from common statistical
+ representations such as CSV, Excel, ARFF etc.</a>
+ </li>
+ <li>Excel sheets provide a great deal of flexibility in arranging
+ information. It may be necessary to limit this flexibility to allow
+ automatic transformation.</li>
+ <li>There may be many spreadsheets, which supports lesson <a href="#mechRec">Publishers
+ and consumers may need more guidance in efficiently processing data
+ using the Data Cube Vocabulary</a>.</li>
+
+ </ul>
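+ <p>One simple form of the mapping between Excel and the Data Cube Vocabulary called for above treats a two-dimensional table as a cube with one dimension per header axis. Each non-empty cell becomes one observation, and its dimension values come from its row header and column header. This Python sketch assumes the table has already been exported to CSV (as in the Stats2RDF workflow) and has exactly one header row and one header column. The place names, figures, and <code>ex:</code> property names are hypothetical.</p>
```python
import csv
import io

# Hypothetical census-style table: rows are places, columns are census years.
CSV_TEXT = """place,1795,1830
place-a,10,12
place-b,4,
"""


def table_to_observations(text, row_dim, col_dim, measure):
    """Unpivot a CSV table into one observation dict per non-empty cell.

    Assumes the first row holds column-dimension values and the first
    column holds row-dimension values; empty cells yield no observation.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    observations = []
    for row in reader:
        row_value, cells = row[0], row[1:]
        for col_value, cell in zip(header[1:], cells):
            if cell.strip() == "":
                continue  # missing value: emit no observation
            observations.append(
                {row_dim: row_value, col_dim: col_value, measure: int(cell)}
            )
    return observations


for obs in table_to_observations(CSV_TEXT, "ex:place", "ex:year", "ex:population"):
    print(obs)
```
+ <p>Real spreadsheets with notes, merged cells or several tables per sheet need more elaborate handling. This is the flexibility that, as noted above, may need to be limited to allow automatic transformation.</p>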
+
+ </section> <section id="publisher-use-case-publishing-hierarchically-structured-data-from-statswales-and-open-data-communities">
+ <h3 id="PublishinghierarchicallystructureddatafromStatsWalesandOpenDataCommunities"><span class="secno">3.4 </span>Publisher
+ Use Case: Publishing hierarchically structured data from StatsWales
+ and Open Data Communities</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case has been taken from [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>] and from discussions at <a href="http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f">publishing-statistical-data
+ mailing list</a>)
+ </span>
+ </p>
+
+ <p>Statistical data often contains some kind of 'overall' figure,
+ which is then broken down into parts.</p>
+
+ <p>Example (in Turtle, with prefix declarations omitted; <code>ex:</code> is an example namespace):</p>
+ <pre>ex:obs1
+  sdmx-dimension:refArea ex:uk ;
+  sdmx-dimension:refPeriod "2011"^^xsd:gYear ;
+  ex:population 60 .
+ex:obs2
+  sdmx-dimension:refArea ex:england ;
+  sdmx-dimension:refPeriod "2011"^^xsd:gYear ;
+  ex:population 50 .
+ex:obs3
+  sdmx-dimension:refArea ex:scotland ;
+  sdmx-dimension:refPeriod "2011"^^xsd:gYear ;
+  ex:population 5 .
+ex:obs4
+  sdmx-dimension:refArea ex:wales ;
+  sdmx-dimension:refPeriod "2011"^^xsd:gYear ;
+  ex:population 3 .
+ex:obs5
+  sdmx-dimension:refArea ex:northernireland ;
+  sdmx-dimension:refPeriod "2011"^^xsd:gYear ;
+  ex:population 2 . </pre>
+
+ <p>
+ We are looking for the best way (in the context of the RDF/Data
+ Cube/SDMX approach) to express that the values for
+ England, Scotland, Wales and Northern Ireland ought to add up to the value
+ for the UK and constitute a more detailed breakdown of the overall UK
+ figure. Since we might also have population figures for France,
+ Germany, EU28 etc., it is not as simple as just taking a
+ <code>qb:Slice</code> where you fix the time period and the measure.
+ </p>
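+ <p>Whatever representation is eventually chosen, a consumer who knows which areas make up a breakdown can check whether the figures are consistent. This sketch assumes a hypothetical <code>breakdown</code> mapping from the UK to its four constituent countries. That mapping is the kind of relationship this use case asks how to express. The sketch then checks the observations from the example above.</p>
```python
# Population observations from the example, keyed by (area, period).
observations = {
    ("uk", "2011"): 60,
    ("england", "2011"): 50,
    ("scotland", "2011"): 5,
    ("wales", "2011"): 3,
    ("northernireland", "2011"): 2,
}

# Hypothetical breakdown relation: the whole and the parts that should sum to it.
breakdown = {"uk": ["england", "scotland", "wales", "northernireland"]}


def check_breakdown(obs, whole, parts, period):
    """Return (total, sum_of_parts, consistent) for one period."""
    total = obs[(whole, period)]
    parts_sum = sum(obs[(part, period)] for part in parts)
    return total, parts_sum, total == parts_sum


print(check_breakdown(observations, "uk", breakdown["uk"], "2011"))  # (60, 60, True)
```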
+
+ <p>Similarly, Etcheverry and Vaisman [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>]
+ present the use case of publishing household data from <a href="http://statswales.wales.gov.uk/index.htm">StatsWales</a> and <a href="http://opendatacommunities.org/doc/dataset/housing/household-projections">Open
+ Data Communities</a>.
+ </p>
+
+ <p>Each fact in this multidimensional data has a time dimension
+ with a single level, Year, and a location dimension with the levels
+ Unitary Authority, Government Office Region, Country, and ALL. The
+ unit of measure is 1000 households.</p>
+
+ <p>In this use case, one wants to publish not only a dataset on the
+ bottom-most level, i.e., the number of households in each Unitary
+ Authority in each year, but also datasets on more aggregated
+ levels. For instance, in order to publish a dataset with the number of
+ households in each Government Office Region per year, one needs to
+ aggregate the measure of all facts with the same Government Office
+ Region and year using the SUM function.</p>
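+ <p>The roll-up from Unitary Authority to Government Office Region described above groups facts by the coarser level and sums the measure. Here is a minimal Python sketch of that step. It assumes a hypothetical parent mapping between the two levels and uses made-up figures in units of 1000 households.</p>
```python
from collections import defaultdict

# Hypothetical facts: (unitary authority, year) -> households (in 1000s).
facts = {
    ("ua-1", 2011): 60.0,
    ("ua-2", 2011): 45.5,
    ("ua-3", 2011): 80.0,
    ("ua-1", 2012): 61.0,
}

# Hypothetical level mapping: unitary authority -> government office region.
region_of = {"ua-1": "region-a", "ua-2": "region-a", "ua-3": "region-b"}


def roll_up(facts, parent_of):
    """Aggregate facts to the parent level with SUM, keeping the year."""
    totals = defaultdict(float)
    for (member, year), value in facts.items():
        totals[(parent_of[member], year)] += value
    return dict(totals)


print(roll_up(facts, region_of))
```
+ <p>Applying the same function with a region-to-country mapping would produce the next level up. Keeping such mappings and the aggregation function alongside the published datasets is what the challenges below ask for.</p>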
+
+ <p>Similarly, for many uses, population broken down by some
+ category (e.g., ethnicity) is expressed as a percentage. Separate
+ datasets give the actual counts per category and the aggregate counts. In
+ such cases it is common to talk about the denominator (often DENOM),
+ which is the aggregate count against which the percentages can be
+ interpreted.</p>
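+ <p>Given per-category counts and the corresponding denominator, each percentage is the count divided by DENOM, times 100. Conversely, a consumer holding only percentages needs DENOM to recover the counts. The category names and counts in this sketch are hypothetical.</p>
```python
def to_percentages(counts, denom):
    """Express each category count as a percentage of the denominator."""
    if denom <= 0:
        raise ValueError("denominator must be positive")
    return {category: 100.0 * count / denom for category, count in counts.items()}


def to_counts(percentages, denom):
    """Recover counts from percentages, given the same denominator."""
    return {category: pct * denom / 100.0 for category, pct in percentages.items()}


counts = {"group-a": 250, "group-b": 750}  # hypothetical per-category counts
denom = 1000  # aggregate count (DENOM)
print(to_percentages(counts, denom))  # {'group-a': 25.0, 'group-b': 75.0}
```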
+
+ <h3 id="benefits-3">Benefits</h3>
+ <ul>
+ <li>Expressing aggregation relationships would allow query
+ engines to automatically derive statistics on higher aggregation
+ levels.</li>
+ <li>Vice versa, representing further aggregated datasets would
+ allow the answering of queries with a simple lookup instead of computations
+ which may be more time consuming or require specific features of the
+ query engine (e.g., SPARQL 1.1).</li>
+ </ul>
+
+
+ <h3 id="challenges-3">Challenges</h3>
+ <ul>
+ <li>Importantly, one would like to maintain the relationship
+ between the resulting datasets, i.e., the levels and aggregation
+ functions. Again, this use case does not simply need a selection (or
+ "dice" in OLAP terms) in which one fixes the time period dimension; it also requires aggregation.
+ This supports lesson <a href="#aggregations">Publishers
+ may need guidance in how to represent common analytical operations
+ such as Slice, Dice, Rollup on data cubes</a>.
+ </li>
+ <li>Literals that are used in observations cannot be used as
+ subjects in triples. Hence, no hierarchies can be defined that would, for
+ example, link integer years to months via <code>skos:narrower</code>. This supports
+ lesson <a href="#heirarchic">Publishers
+ may need more guidance to decide which representation of hierarchies
+ is most suitable for their use case</a>.
+ </li>
+ </ul>
+
+ </section> <section id="publisher-case-study-publishing-observational-data-sets-about-uk-bathing-water-quality">
+ <h3 id="PublishingslicesofdataaboutUKBathingWaterQuality"><span class="secno">3.5 </span>Publisher
+ Case Study: Publishing Observational Data Sets about UK Bathing Water
+ Quality</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case has been provided by
+ Epimorphics Ltd, in their <a href="http://www.epimorphics.com/web/projects/bathing-water-quality">UK
+ Bathing Water Quality</a> deployment)
+ </span>
+ </p>
+ <p>
+ As part of their work with data.gov.uk and the UK Location Programme,
+ Epimorphics Ltd have been working to pilot the publication of both
+ current and historic bathing water quality information from the <a href="http://www.environment-agency.gov.uk/">UK Environment
+ Agency</a> as Linked Data.
+ </p>
+ <p>The UK has a number of areas, typically beaches, that are
+ designated as bathing waters where people routinely enter the water.
+ The Environment Agency monitors and reports on the quality of the
+ water at these bathing waters.</p>
+ <p>The Environment Agency's data can be thought of as structured
+ in 3 groups:</p>
+ <ul>
+ <li>basic reference data describing the bathing waters
+ and sampling points;</li>
+ <li>"Annual Compliance Assessment Dataset"
+ giving the rating for each bathing water for each year it has been
+ monitored;</li>
+ <li>"In-Season Sample Assessment Dataset"
+ giving the detailed weekly sampling results for each bathing water.</li>
+ </ul>
+ <p>The most important dimensions of the data are bathing water,
+ sampling point, and compliance classification.</p>
+
+ <h3 id="benefits-4">Benefits</h3>
+ <ul>
+ <li>The bathing-water dataset is structured
+ around the use of the Data Cube Vocabulary and fronted by a linked
+ data API configuration which makes the data available for re-use in
+ additional formats such as JSON and CSV.</li>
+ <li>Publishing bathing-water quality information in this way will
+ (1) enable the Environment Agency to meet the needs of its many data
+ consumers in a uniform way rather than through diverse pair-wise
+ arrangements, (2) preempt requests for specific data, and (3) enable a
+ larger community of Web and mobile application developers and
+ value-added information aggregators to use and re-use bathing-water
+ quality information sourced by the Environment Agency.</li>
+ </ul>
+
+ <h3 id="challenges-4">Challenges</h3>
+ <ul>
+ <li>Observations may exhibit a number of attributes, e.g.,
+ whether there was an abnormal weather exception.</li>
+ <li>Relevant slices of both datasets are to be created, which
+ supports lesson <a href="#clarify">Publishers
+ may need more guidance in creating and managing slices or arbitrary
+ groups of observations</a>:
+ <ul>
+ <li>Annual Compliance Assessment Dataset: all the observations
+ for a specific sampling point, all the observations for a specific
+ year.</li>
+ <li>In-Season Sample Assessment Dataset: samples for a given
+ sampling point, samples for a given week, samples for a given year,
+ samples for a given year and sampling point, latest samples for
+ each sampling point.</li>
+ <li>The use case suggests more arbitrary subsets of the
+ observations, e.g., collecting all the "latest" observations in a
+ continuously updated data set.</li>
+ </ul>
+ </li>
+ <li>In this use case, observation and measurement data, which <i>per se</i>
+ is not aggregated statistics, is to be published. The <a href="http://purl.oclc.org/NET/ssnx/ssn">Semantic Sensor Network
+ ontology</a> (SSN) already provides a way to publish sensor information.
+ SSN data provides statistical Linked Data and grounds its data to the
+ domain, e.g., sensors that collect observations (e.g., sensors
+ measuring average of temperature over location and time). Still, this
+ case study has shown that the Data Cube Vocabulary may be a useful
+ alternative and can be successfully used for observation and
+ measurement data, as well as statistical data.
+ </li>
+ </ul>
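+ <p>Most of the bathing-water slices listed above can be derived from the observations by fixing one or more dimension values. The "latest samples" subset is different: it selects by a property of the data rather than by a fixed value. This sketch illustrates both kinds of selection. The sampling-point identifiers, dates and results are hypothetical, and the code does not use any Data Cube or Epimorphics API.</p>
```python
from datetime import date

# Hypothetical in-season samples: one dict per observation.
samples = [
    {"point": "sp-1", "date": date(2012, 5, 14), "result": "pass"},
    {"point": "sp-1", "date": date(2012, 5, 21), "result": "fail"},
    {"point": "sp-2", "date": date(2012, 5, 14), "result": "pass"},
    {"point": "sp-2", "date": date(2011, 9, 5), "result": "pass"},
]


def slice_by(observations, **fixed):
    """Observations whose dimension values match all fixed values.

    The pseudo-dimension 'year' is derived from the sample date.
    """
    def value(obs, dim):
        return obs["date"].year if dim == "year" else obs[dim]

    return [o for o in observations if all(value(o, d) == v for d, v in fixed.items())]


def latest_per_point(observations):
    """The most recent observation for each sampling point."""
    latest = {}
    for o in observations:
        current = latest.get(o["point"])
        if current is None or o["date"] > current["date"]:
            latest[o["point"]] = o
    return latest


print(len(slice_by(samples, point="sp-1")))  # 2
print(len(slice_by(samples, year=2012, point="sp-2")))  # 1
print({p: o["date"].isoformat() for p, o in latest_per_point(samples).items()})
```
+ <p>The "latest" subset changes whenever new samples arrive, so it cannot be defined once as a static slice key. This is one reason the use case calls for more arbitrary groups of observations.</p>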
+ </section> <section id="publisher-case-study-site-specific-weather-forecasts-from-met-office-the-uk-s-national-weather-service">
+ <h3 id="MetOfficeCaseStudy"><span class="secno">3.6 </span>Publisher Case Study: Site specific
+ weather forecasts from Met Office, the UK's National Weather Service</h3>
+ <span style="font-size: 10pt">(This section contributed by Dave
+ Reynolds)</span>
+
+ <p>The Met Office, the UK's National Weather Service, provides a
+ range of weather forecast products including openly available
+ site-specific forecasts for the UK. The site-specific forecasts cover
+ over 5000 forecast points; each forecast predicts 10 parameters and
+ spans a 5-day window at 3-hourly intervals, and the whole forecast is
+ updated every hour. A proof-of-concept project investigated the
+ challenge of publishing this information as Linked Data using the Data
+ Cube Vocabulary.</p>
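+ <p>These figures can be cross-checked against the data volume reported below. This quick calculation assumes 40 forecast time steps per site (a 5-day window at 3-hourly intervals, i.e., 5 × 24 / 3) and gives the number of values in each hourly update.</p>
```python
SITES = 5000  # "over 5000 forecast points", used here as a lower bound
PARAMETERS = 10  # forecast parameters per site
DAYS, STEP_HOURS = 5, 3  # 5-day window at 3-hourly intervals

time_steps = DAYS * 24 // STEP_HOURS  # 40 steps, assuming no extra end point
values_per_update = SITES * PARAMETERS * time_steps
print(time_steps, values_per_update)  # 40 2000000
```
+ <p>The real number of sites is somewhat above 5000, so each update holds a little over 2 million values. This agrees with the data volume quoted for each hourly update.</p>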
+
+ <h3 id="benefits-5">Benefits</h3>
+ <ul>
+ <li>Explicit metadata describing the forecast process, coverage
+ and phenomena being forecast, making the data self-describing.</li>
+
+ <li>Linking to other linked data resources (particularly
+ geographic regions and named places associated with the forecast
+ locations) enabling discovery of related data.</li>
+
+ <li>Ability to define slices through the data for convenient
+ consumption by applications.</li>
+ </ul>
+
+ <h3 id="challenges-5">Challenges</h3>
+
+ <p>This weather forecasts case study leads to the following
+ challenges:</p>
+
+ <h3 id="iso19156-compatibility">ISO19156 compatibility</h3>
+
+ <p>
+ The World Meteorological Organization (WMO) develops and recommends
+ data interchange standards, and within that community compatibility with
+ ISO19156 <em>"Geographic information — Observations and
+ measurements"</em> (O&M) is regarded as important. Thus, this supports
+ lesson <a href="#relToSO19156">Modelers
+ using ISO19156 - Observations & Measurements may need
+ clarification regarding the relationship to the Data Cube Vocabulary</a>.
+ </p>
+ <b>Solution in this case study:</b>
+ <p>O&M provides a data model for an Observation with associated
+ Phenomenon, measurement ProcessUsed, Domain (feature of interest) and
+ Result. Prototype vocabularies developed at CSIRO and extended within
+ this project allow this data model to be represented in RDF. For the
+ site-specific forecasts, a 5-day forecast for all 5000+ sites is
+ regarded as a single O&M Observation.</p>
+ <p>
+ To represent the forecast data itself (the Result in the O&M
+ model), the relevant standard is ISO19123 <em>"Geographic
+ information — Schema for coverage geometry and functions"</em>. This
+ provides a data model for a Coverage which can represent a set of
+ values across some space. It defines different types of Coverage
+ including a DiscretePointCoverage suited to representing site-specific
+ forecast results.
+ </p>
+ <p>It turns out that it is straightforward to treat an RDF Data
+ Cube as a particular concrete representation of the
+ DiscretePointCoverage logical model. The cube has dimensions
+ corresponding to the forecast time and location and the measure is a
+ record representing the forecast values of the 10 phenomena. Slices by
+ time and location provide subsets of the data that directly match the
+ data packages supported by an existing on-line service.</p>
+ <p>
+ Note that in this situation an <em>observation</em> in the sense of
+ <code>qb:Observation</code>
+ and an <em>observation</em> in the sense of ISO19156 Observations and
+ Measurements are different things. The O&M Observation is the
+ whole forecast whereas each
+ <code>qb:Observation</code>
+ corresponds to a single GeometryValuePair within the forecast results
+ Coverage.
+ </p>
+
+ <h3 id="data-volume">Data volume</h3>
+ <p>
+ Each hourly update comprises over 2 million data points and forecast
+ data is requested by a large number of data consumers. Bandwidth costs
+ are thus a key consideration and the apparent verbosity of RDF in
+ general, and Data Cube specifically, was a concern. This supports
+ lesson <a href="#mechRec">
+ Publishers and consumers may need more guidance in efficiently
+ processing data using the Data Cube Vocabulary</a>.
+ </p>
+ <b>Solution in this case study:</b>
+ <p>Regarding bandwidth costs, the key is not raw data volume
+ but compressibility, since such data is transmitted in compressed
+ form. A Turtle representation of a non-abbreviated data cube
+ compressed to within 15-20% of the size of compressed, handcrafted XML
+ and JSON representations, thus obviating the need for abbreviations or
+ custom serialization.</p>
+
+ </section> <section id="publisher-case-study-eurostat-sdmx-as-linked-data">
+ <h3 id="EurostatSDMXasLinkedData"><span class="secno">3.7 </span>Publisher Case Study: Eurostat
+ SDMX as Linked Data</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has been taken
+ from <a href="http://estatwrap.ontologycentral.com/">Eurostat
+ Linked Data Wrapper</a> and <a href="http://eurostat.linked-statistics.org/">Linked Statistics
+ Eurostat Data</a>, both deployments for publishing Eurostat SDMX as
+ Linked Data using the draft version of the Data Cube Vocabulary)
+ </span>
+ </p>
+
+ <p>
+ As mentioned already, the ISO standard for exchanging and sharing
+ statistical data and metadata among organizations is Statistical Data
+ and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>].
+ Since this standard has proven applicable in many contexts, we adopt
+ the multidimensional model that underlies SDMX and intend the standard
+ vocabulary to be compatible with SDMX. Therefore, in this use case we
+ explain the benefits and challenges of publishing SDMX data as Linked
+ Data.
+ </p>
+
+ <p>
+ As one of the main adopters of SDMX, <a href="http://epp.eurostat.ec.europa.eu/">Eurostat</a> publishes large
+ amounts of European statistics coming from a data warehouse as SDMX
+ and other formats on the Web. Eurostat also provides an interface to
+ browse and explore the datasets. However, linking such
+ multidimensional data to related data sets and concepts would require
+ downloading of interesting datasets and manual integration. The goal
+ here is to improve integration with other datasets; Eurostat data
+ should be published on the Web in a machine-readable format, possibly
+ to be linked with other datasets, and possibly to be freely consumed
+ by applications. Both <a href="http://estatwrap.ontologycentral.com/">Eurostat
+ Linked Data Wrapper</a> and <a href="http://eurostat.linked-statistics.org/">Linked Statistics
+ Eurostat Data</a> intend to publish <a href="http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/">Eurostat
+ SDMX data</a> as <a href="http://www.w3.org/TR/ld-glossary/#x5-star-linked-open-data">5 Star Linked Open
+ Data</a>. Eurostat data is partly published as SDMX, partly as tabular
+ data (TSV, similar to CSV). Eurostat provides a <a href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">TOC
+ of published datasets</a> as well as a feed of modified and new datasets.
+
+ Eurostat also provides a list of the code lists it uses, i.e., <a href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&dir=dic">ranges
+ of permitted dimension values</a>. Eurostat datasets contain varying
+ sets of dimensions (e.g., date, geo, obs_status, sex, unit) as
+ well as measures (a generic value whose content is specified by the dataset,
+ e.g., GDP per capita in PPS, Total population, Employment rate by
+ sex).
+ </p>
+
+
+ <h3 id="benefits-6">Benefits</h3>
+
+ <ul>
+ <li>Possible implementation of ETL pipelines based on Linked Data
+ technologies (e.g., <a href="http://code.google.com/p/ldspider/">LDSpider</a>)
+ to effectively load the data into a data warehouse for analysis.
+ </li>
+
+ <li>Allows useful queries to the data, e.g., comparison of
+ statistical indicators across EU countries.</li>
+
+ <li>Allows one to attach contextual information to statistics during
+ the interpretation process.</li>
+
+ <li>Allows one to reuse single observations from the data.</li>
+
+ <li>Linking to information from other data sources, e.g., for
+ the geo-spatial dimension.</li>
+ </ul>
+
+ <h3 id="challenges-6">Challenges</h3>
+
+ <ul>
+ <li>There is a large number of Eurostat datasets, each possibly
+ containing a large number of columns (dimensions) and rows
+ (observations). Eurostat publishes more than 5200 datasets, which,
+ when converted into RDF, require more than 350GB of disk space,
+ yielding a dataspace with some 8 billion triples. This supports
+ lesson <a href="#mechRec">
+ Publishers and consumers may need more guidance in efficiently
+ processing data using the Data Cube Vocabulary.</a>
+ </li>
+
+ <li>In the Eurostat Linked Data Wrapper, there is a timeout for
+ transforming SDMX to Linked Data, since Google App Engine is used.
+ Mechanisms to reduce the amount of data that needs to be translated
+ would be needed, again supporting lesson <a href="#mechRec">
+ Publishers and consumers may need more guidance in efficiently
+ processing data using the Data Cube Vocabulary.</a>
+ </li>
+
+ <li>Each dimension used by a dataset has a range of permitted
+ values that need to be described.</li>
+
+ <li>The Eurostat SDMX as Linked Data use case provides data on a
+ gender level and on a level aggregating over the gender level. This
+ suggests a need to have time lines on data aggregating over the gender
+ dimension, supporting the lesson: <a href="#aggregations">
+ Publishers may need guidance in how to represent common analytical
+ operations such as Slice, Dice, Rollup on data cubes</a>.
+ </li>
+
+ <li>New Eurostat datasets are added regularly to Eurostat. The
+ Linked Data representation should automatically provide access to the
+ most-up-to-date data:
+
+ <ul>
+ <li>Eurostat Linked Data pulls in changes from the original
+ Eurostat dataset on a weekly basis and the conversion process runs
+ every Saturday at noon taking into account new datasets along with
+ updates to existing datasets.</li>
+ <li>Eurostat Linked Data Wrapper translates Eurostat
+ datasets into RDF on the fly, so that the most current data is always used. The
+ only remaining problem is to point users towards the URIs of Eurostat
+ datasets: Estatwrap provides a feed of modified and new <a href="http://estatwrap.ontologycentral.com/feed.rdf">datasets</a>.
+ Also, it provides a <a href="http://estatwrap.ontologycentral.com/table_of_contents.html">TOC</a>
+ that could be automatically updated from the <a href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">Eurostat
+ TOC</a>.
+ </li>
+ </ul>
+
+
+ </li>
+
+ <li>Query interface:
+
+ <ul>
+ <li>Eurostat Linked Data provides a SPARQL endpoint for the
+ metadata (not the observations).</li>
+ <li>Eurostat Linked Data Wrapper provides resolvable URIs to
+ datasets (ds) that return all observations of the dataset. Also,
+ every dataset serves the URI of its data structure definition (dsd).
+ The dsd URI returns all RDF describing the dataset. Separating
+ information resources for dataset and data structure definition
+ allows one, for example, to first gather the dsd and to resolve the ds
+ only for actual query execution.</li>
+ </ul>
+ </li>
+
+ <li>Providing a useful interface for browsing and visualizing the
+ data:
+ <ul>
+
+ <li>One problem is that the datasets have too many dimensions
+ to be displayed directly. Instead, one could visualize slices of time
+ series data. However, for that, one would need to either fix most
+ other dimensions (e.g., sex) or aggregate over them (e.g., via
+ average). The selection of useful slices from the large number of
+ possible slices is a challenge. This supports lesson <a href="#clarify">
+ Publishers may need more guidance in creating and managing slices or
+ arbitrary groups of observations</a>.
+ </li>
+
+ <li>Eurostat Linked Data Wrapper provides for each dataset an
+ HTML page showing a JavaScript-based visualization of the data.
+ This also supports lesson <a href="#consumers">
+ Consumers may need guidance in conversions into formats that can
+ easily be displayed and further investigated in tools such as
+ Google Data Explorer, R, Weka etc.</a>
+ </li>
+ </ul>
+
+
+ </li>
+ <li>One possible application would run validation checks over
+ Eurostat data. However, the Data Cube Vocabulary is designed to publish
+ statistical data as-is and is not intended to represent information
+ for validation (similar to business rules).</li>
+ <li>An application could try to automatically match elements of
+ the geo-spatial dimension to elements of other data sources, e.g.,
+ NUTS, GADM. In the Eurostat Linked Data Wrapper this is done by simple
+ URI guessing from external data sources. Automatically linking datasets,
+ or linking datasets with metadata, is not part of the Data Cube
+ Vocabulary.</li>
+ <li>The draft version of the Data Cube Vocabulary builds upon SDMX Standards Version 2.0.
+A newer version, SDMX Standards Version 2.1, is available and might be used by
+Eurostat in the future; this supports lesson <a href="#putative">
+ There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it</a>.</li>
+ </ul>
+
+ </section> <section id="publisher-case-study-improving-trust-in-published-sustainability-information-at-the-digital-enterprise-research-institute-deri">
+ <h3 id="Representingrelationshipsbetweenstatisticaldata"><span class="secno">3.8 </span>Publisher
+ Case Study: Improving trust in published sustainability information at
+ the Digital Enterprise Research Institute (DERI)</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has mainly been
+ taken from [<cite><a href="#ref-COGS">COGS</a></cite>])
+ </span>
+ </p>
+
+ <p>In several applications, relationships between statistical data
+ need to be represented.</p>
+
+ <p>The goal of this use case is to describe provenance,
+ transformations, and versioning around statistical data, so that the
+ history of statistics published on the Web becomes clear. This may
+ also relate to the issue of having relationships between published
+ datasets.</p>
+
+ <p>
+ A concrete example is given by Freitas et al. [<cite><a href="#ref-COGS">COGS</a></cite>], where transformations on financial
+ datasets, e.g., the addition of derived measures, conversion of units,
+ aggregations, OLAP operations, and enrichment of statistical data are
+ executed on statistical data before showing them in a Web-based
+ report.
+ </p>
+
+ <p>
+ See <a href="http://treo.deri.ie/cogs/example/swpm2012.htm">SWPM
+ 2012 Provenance Example</a> for screenshots about this use case.
+ </p>
+
+ <h3 id="benefits-7">Benefits</h3>
+
+ <p>Making transparent the transformations a dataset has undergone
+ increases trust in the data.</p>
+
+ <h3 id="challenges-7">Challenges</h3>
+
+ <ul>
+ <li>Operations on statistical data result in new statistical
+ data, depending on the operation. For instance, in terms of the Data
+ Cube Vocabulary, operations such as slice, dice, roll-up, drill-down will result
+ in new data cubes. This may require representing general
+ relationships between cubes (as discussed in the <a href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data
+ mailing list</a>).
+ </li>
+ <li>Should the Data Cube Vocabulary support explicit declaration of such
+ relationships either between separate <code>qb:DataSet</code>s or between
+ measures within a single <code>qb:DataSet</code> (e.g., <code>ex:populationCount</code>
+ and <code>ex:populationPercent</code>)?
+ </li>
+ <li>If so, should that be scoped to simple, common relationships
+ like DENOM or allow expression of arbitrary mathematical relations?</li>
+
+ <li>This use case opens up questions regarding versioning of
+ statistical Linked Data. Thus, there is a possible relation to the <a href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary#Versioning">Versioning</a>
+ part of the GLD Best Practices document, which specifies how to
+ publish data which has multiple versions.
+ </li>
+ <li>In this use case, the <a href="http://sites.google.com/site/cogsvocab/">COGS</a> vocabulary [<cite><a href="#ref-COGS">COGS</a></cite>] has shown to complement the Data Cube
+ Vocabulary with respect to representing ETL pipelines processing statistics.
+ This supports lesson <a href="#declaringRel">
+ Publishers may need guidance in making transparent the
+ pre-processing of aggregate statistics</a>.
+ </li>
+ </ul>
+
+ </section> <section id="consumer-case-study-simple-chart-visualizations-of-integrated-published-climate-sensor-data">
+ <h3 id="Simplechartvisualisationsofpublishedstatisticaldata"><span class="secno">3.9 </span>Consumer
+ Case Study: Simple chart visualizations of (integrated) published
+ climate sensor data</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a href="http://www.iwrm-smart.org/">SMART natural sciences research
+ project</a>)
+ </span>
+ </p>
+
+ <p>Data that is published on the Web is typically visualized by
+ transforming it manually into CSV or Excel and then creating a
+ visualization on top of these formats using Excel, Tableau,
+ RapidMiner, Rattle, Weka etc.</p>
+ <p>This use case shall demonstrate how statistical data published
+ on the Web can be visualized inside a webpage with little effort and without
+ using commercial or highly-complex tools.</p>
+ <p>
+ An example scenario is environmental research done within the <a href="http://www.iwrm-smart.org/">SMART research project</a>. Here,
+ statistics about environmental aspects (e.g., measurements about the
+ climate in the Lower Jordan Valley) shall be visualized for scientists
+ and decision makers. It should also be possible to integrate different
+ statistics and display them together. The data is available as XML files
+ on the Web which are re-published as Linked Data using the Data Cube
+ Vocabulary. On a separate website, specific parts of the data shall be
+ queried and visualized in simple charts, e.g., line diagrams.
+ </p>
+
+ <p class="caption">Figure 1: HTML embedded line chart of an
+ environmental measure over time for three regions in the lower Jordan
+ valley</p>
+
+ <p align="center">
+ <img alt="display of an environmental measure over time for three regions in the lower Jordan valley" src="index_files/Level_above_msl_3_locations.png" width="1000px">
+ </p>
+
+ <p class="caption">Figure 2: Showing the same data in a pivot table
+ aggregating to single months. Here, the aggregate COUNT of measures
+ per cell is given.</p>
+ <p align="center">
+ <img alt="Figure: Showing the same data in a pivot
+ table aggregating to single months. Here, the aggregate COUNT of measures per cell is given." src="index_files/pivot_analysis_measurements.PNG">
+ </p>
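+ <p>The COUNT aggregation behind a pivot table like the one in Figure 2 can be sketched in a few lines. This sketch assumes the observations have already been queried from the published data and turned into plain records. The keys <code>region</code>, <code>date</code> and <code>value</code> and the sample values are hypothetical. A real consumer would obtain them from the Data Cube resources, for example with SPARQL.</p>

```python
from collections import Counter
from datetime import date

# Hypothetical observations for two regions; keys and values are illustrative.
observations = [
    {"region": "R1", "date": date(2012, 1, 3), "value": 12.1},
    {"region": "R1", "date": date(2012, 1, 17), "value": 12.4},
    {"region": "R2", "date": date(2012, 1, 9), "value": 8.0},
    {"region": "R2", "date": date(2012, 2, 1), "value": 7.6},
]


def count_per_cell(obs):
    """Count observations per (region, month) cell, as in a COUNT pivot table."""
    return Counter((o["region"], o["date"].strftime("%Y-%m")) for o in obs)


pivot = count_per_cell(observations)
for (region, month), n in sorted(pivot.items()):
    print(region, month, n)
```

+ <p>Replacing the counter with a sum or mean over <code>value</code> would give the other usual pivot-table aggregates.</p>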
+ <h3 id="benefits-8">Benefits</h3>
+ <p>Easy, flexible and powerful visualizations of published
+ statistical data.</p>
+
+ <h3 id="challenges-8">Challenges</h3>
+ <ul>
+ <li>The difficulties lie in structuring the data appropriately so
+ that the specific information can be queried. This supports lesson <a href="#criteriaForWell">
+ Publishers and consumers may need guidance in checking and making
+ use of well-formedness of published data using data cube</a>.
+ </li>
+ <li>Data should also be published with potential
+ integration in mind; for example, units of measurement need to
+ be represented.</li>
+ <li>Integration becomes much more difficult if publishers use
+ different measures/dimensions.</li>
+ </ul>
+
+ </section> <section id="consumer-use-case-visualizing-published-statistical-data-in-google-public-data-explorer">
+ <h3 id="consumer-use-case-visualising-published-statistical-data-in-google-public-data-explorer"><span class="secno">3.10 </span>Consumer
+ Use Case: Visualizing published statistical data in Google Public Data
+ Explorer</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a href="http://code.google.com/apis/publicdata/">Google Public Data
+ Explorer (GPDE)</a>)
+ </span>
+ </p>
+ <p>
+ <a href="http://code.google.com/apis/publicdata/">Google Public
+ Data Explorer</a> (GPDE) provides an easy way to visualize and
+ explore statistical data. Data must be in the <a href="https://developers.google.com/public-data/overview">Dataset
+ Publishing Language</a> (DSPL) before it can be uploaded to the data explorer. A
+ DSPL dataset is a bundle that contains an XML file (the schema) and a
+ set of CSV files (the actual data). Google provides a tutorial on how to
+ create a DSPL dataset from your data, e.g., from CSV. This requires a
+ good understanding of XML as well as of the data
+ that shall be visualized and explored.
+ </p>
+ <p>In this use case, the goal is to take statistical data published
+ as Linked Data re-using the Data Cube Vocabulary and to transform it
+ into DSPL for visualization and exploration using GPDE with as little
+ effort as possible.</p>
+ <p>For instance, the following figure shows Eurostat data about the unemployment rate,
+ downloaded from the Web and visualized in GPDE:</p>
+
+ <p class="caption">Figure 3: An interactive chart in GPDE for
+ visualizing Eurostat data described with DSPL</p>
+ <p align="center">
+ <img alt="An interactive chart in GPDE for visualising Eurostat data in the DSPL" src="index_files/Eurostat_GPDE_Example.png" width="1000px">
+ </p>
+
+ <p>There are different possible approaches, each with advantages
+ and disadvantages: 1) a consumer C downloads the data into a
+ triple store, where SPARQL queries transform it into DSPL, which is then
+ uploaded to and visualized using GPDE; or 2) one or
+ more XSLT transformations on the RDF/XML transform the data into DSPL.</p>
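+ <p>Whichever approach is chosen, the observations must eventually be written as the CSV tables that a DSPL bundle contains. The following is a minimal sketch of that step only. It assumes the observations have already been extracted (for example, from the bindings of a SPARQL SELECT query) into records with hypothetical column names and values. It does not generate the DSPL XML schema, whose structure is defined in Google's DSPL documentation.</p>

```python
import csv
import io

# Hypothetical rows, e.g. bindings of a SPARQL SELECT over qb:Observation
# resources: one key per dimension plus one for the measure.
rows = [
    {"geo": "R1", "time": "2011", "unemployment_rate": 1.5},
    {"geo": "R2", "time": "2011", "unemployment_rate": 2.5},
]


def observations_to_csv(records, columns):
    """Serialize records as CSV text with a header row in the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(observations_to_csv(rows, ["geo", "time", "unemployment_rate"]))
```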
+
+ <h3 id="benefits-9">Benefits</h3>
+ <ul>
+ <li>Easy to visualize statistics published using the Data Cube Vocabulary.</li>
+ <li>There could be a process of first transforming data into RDF
+ for further preprocessing and integration and then of loading it into
+ GPDE for visualization.</li>
+ <li>Linked Data could provide the way to automatically load data
+ from a data source whereas GPDE is only for visualization.</li>
+ </ul>
+ <h3 id="challenges-9">Challenges</h3>
+ <ul>
+ <li>The technical challenges for the consumer here lie in knowing
+ where to download what data and how to get it transformed into DSPL
+ without knowing the data. This supports lesson <a href="#criteriaForWell">
+ Publishers and consumers may need guidance in checking and making
+ use of well-formedness of published data using data cube</a>.
+ </li>
+ <li>Define a mapping between Data Cube and DSPL. DSPL is
+ representative of the formats needed to use statistical data published on the Web in
+ available analysis tools. Similar tools that may additionally be
+ covered are: Weka (ARFF data format), Tableau, SPSS, STATA, PC-Axis
+ etc. This supports lesson <a href="#consumers">
+ Consumers may need guidance in conversions into formats that can
+ easily be displayed and further investigated in tools such as Google
+ Data Explorer, R, Weka etc.</a>.
+ </li>
+ </ul>
+
+ </section> <section id="consumer-case-study-analyzing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">
+ <h3 id="AnalysingpublishedstatisticaldatawithcommonOLAPsystems"><span class="secno">3.11 </span>Consumer
+ Case Study: Analyzing published financial (XBRL) data from the SEC
+ with common OLAP systems</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a href="http://xbrl.us/research/appdev/Pages/275.aspx">Financial
+ Information Observation System (FIOS)</a>)
+ </span>
+ </p>
+
+ <p>
+ Online Analytical Processing (OLAP) [<cite><a href="#ref-OLAP">OLAP</a></cite>]
+ is an analysis method on multidimensional data. It is an explorative
+ analysis method that allows users to interactively view the data from
+ different angles (rotate, select) or at different granularities (drill-down,
+ roll-up), and filter it for specific information (slice, dice).
+ </p>
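+ <p>To make two of these operations concrete, the following sketch represents a small cube as a mapping from tuples of dimension members to a measure value. It implements slice (fixing one dimension to a single member) and dice (restricting several dimensions to subsets of members). The data layout, the sample data and the function names are assumptions for illustration. They are not part of the Data Cube Vocabulary or of any OLAP engine's API.</p>

```python
# A cube as {(company, year, concept): value}; all data is hypothetical.
DIMENSIONS = ("company", "year", "concept")

cube = {
    ("C1", 2010, "CostOfGoodsSold"): 100.0,
    ("C1", 2011, "CostOfGoodsSold"): 120.0,
    ("C2", 2010, "CostOfGoodsSold"): 80.0,
    ("C2", 2011, "Revenue"): 300.0,
}


def slice_cube(data, dimension, member):
    """Fix one dimension to a member and drop it from the resulting keys."""
    i = DIMENSIONS.index(dimension)
    return {
        key[:i] + key[i + 1:]: value
        for key, value in data.items()
        if key[i] == member
    }


def dice_cube(data, selection):
    """Keep cells whose members lie in the allowed set for each selected dimension."""
    positions = {DIMENSIONS.index(d): allowed for d, allowed in selection.items()}
    return {
        key: value
        for key, value in data.items()
        if all(key[i] in allowed for i, allowed in positions.items())
    }


print(slice_cube(cube, "year", 2010))
print(dice_cube(cube, {"company": {"C1"}, "concept": {"CostOfGoodsSold"}}))
```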
+
+ <p>OLAP systems are commonly used in industry to analyze statistical data
+ on a regular basis. OLAP systems first use ETL pipelines to
+ extract, transform, and load relevant data
+ into a data warehouse and then provide interfaces to efficiently issue OLAP queries
+ on the data.</p>
+
+ <p>
+ The goal in this use case is to allow analysis of published
+ statistical data with common OLAP systems [<cite><a href="#ref-OLAP4LD">OLAP4LD</a></cite>].
+ </p>
+
+ <p>For that, a multidimensional model of the data needs to be
+ generated. A multidimensional model consists of facts summarized in
+ data cubes. Facts exhibit measures depending on members of dimensions.
+ Members of dimensions can be further structured along hierarchies of
+ levels.</p>
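+ <p>Roll-up along such a hierarchy can be sketched as follows. Each member is mapped to its parent at the next level, and the measure is aggregated over all facts that share the same parent. The sketch assumes SUM as the default aggregation function and uses a hypothetical month-to-quarter hierarchy with made-up values. As the challenges of this use case note, published data does not necessarily state which aggregation function is appropriate.</p>

```python
from collections import defaultdict

# Facts: (region, month) -> measure value; hypothetical numbers.
facts = {
    ("R1", "2012-01"): 3.0,
    ("R1", "2012-02"): 4.0,
    ("R1", "2012-04"): 5.0,
    ("R2", "2012-03"): 1.0,
}

# One step of a hypothetical time hierarchy: month level -> quarter level.
month_to_quarter = {
    "2012-01": "2012-Q1",
    "2012-02": "2012-Q1",
    "2012-03": "2012-Q1",
    "2012-04": "2012-Q2",
}


def roll_up(data, position, parent_of, aggregate=sum):
    """Replace the member at `position` by its parent and aggregate grouped values."""
    groups = defaultdict(list)
    for key, value in data.items():
        parent_key = key[:position] + (parent_of[key[position]],) + key[position + 1:]
        groups[parent_key].append(value)
    return {key: aggregate(values) for key, values in groups.items()}


print(roll_up(facts, 1, month_to_quarter))
```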
+
+ <p>
+ An example scenario of this use case is the Financial Information
+ Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>],
+ where XBRL data provided by the SEC on the Web is re-published as
+ Linked Data so that stakeholders can explore and analyze it
+ in the Web-based OLAP client Saiku.
+ </p>
+
+ <p>The following figure shows an example of using FIOS. Here, for
+ three different companies, the Cost of Goods Sold as disclosed in XBRL
+ documents is analyzed. Each cell shows either the number of
+ disclosures or, if only one disclosure is available, the actual value in USD:</p>
+
+
+ <p class="caption">Figure 4: Example of using FIOS for OLAP
+ operations on financial data</p>
+ <p align="center">
+ <img alt="Example of using FIOS for OLAP operations on financial data" src="index_files/FIOS_example.PNG">
+ </p>
+
+ <h3 id="benefits-10">Benefits</h3>
+
+ <ul>
+ <li>Data cube model well-known to many people in industry.</li>
+ <li>OLAP operations cover typical business requirements, e.g.,
+ slice, dice, drill-down and can be issued via intuitive, interactive,
+ explorative, fast OLAP frontends.</li>
+ <li>OLAP functionality provided by many tools that may be reused</li>
+ </ul>
+
+ <h3 id="challenges-10">Challenges</h3>
+ <ul>
+ <li>Define a mapping between XBRL and the Data Cube Vocabulary.
+ XBRL is representative of other common representation formats for
+ statistics such as CSV, Excel, and ARFF, which supports lesson <a href="#excelCSV">Publishers
+ may need guidance in conversions from common statistical
+ representations such as CSV, Excel, ARFF etc.</a>
+ </li>
+ <li>An ETL pipeline needs to automatically populate a data
+ warehouse. Common OLAP systems use relational databases with a star
+ schema. This supports lesson <a href="#criteriaForWell">
+ Publishers and consumers may need guidance in checking and making
+ use of well-formedness of published data using data cube</a>.
+ </li>
+ <li>A problem lies in the strict separation between queries for
+ the structure of data (metadata queries), and queries for actual
+ aggregated values (OLAP operations).</li>
+ <li>Define a mapping between OLAP operations and operations on
+ data using the Data Cube Vocabulary. This supports lesson <a href="#aggregations">
+ Publishers may need guidance in how to represent common analytical
+ operations such as Slice, Dice, Rollup on data cubes</a>.
+ </li>
+ <li>Another problem lies in defining data cubes without deeper
+ insight into the data beforehand. Thus, OLAP systems have to cater for
+ possibly missing information (e.g., the aggregation function or a
+ human-readable label).</li>
+ <li>Depending on the expressivity of the OLAP queries (e.g.,
+ aggregation functions, hierarchies, ordering), performance plays an
+ important role. This supports lesson <a href="#mechRec">
+ Publishers and consumers may need more guidance in efficiently
+ processing data using the Data Cube Vocabulary</a>.
+ </li>
+ </ul>
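+ <p>The load step of such an ETL pipeline into a star schema can be sketched as follows. It assumes the observations have already been extracted as flat records. Each distinct dimension member receives a surrogate key in its dimension table, and each fact row references those keys next to the measure value. Table and column names are hypothetical. A real pipeline would write to a relational database rather than to in-memory dictionaries.</p>

```python
def build_star_schema(observations, dimensions, measure):
    """Split flat observation records into dimension tables and a fact table.

    Returns (dimension_tables, fact_rows); each dimension table maps a
    member to an integer surrogate key.
    """
    dimension_tables = {d: {} for d in dimensions}
    fact_rows = []
    for obs in observations:
        row = {}
        for d in dimensions:
            table = dimension_tables[d]
            member = obs[d]
            if member not in table:
                table[member] = len(table) + 1  # next surrogate key
            row[d + "_key"] = table[member]
        row[measure] = obs[measure]
        fact_rows.append(row)
    return dimension_tables, fact_rows


# Hypothetical observations, e.g. Cost of Goods Sold ("cogs") per company and year.
observations = [
    {"company": "C1", "year": 2010, "cogs": 100.0},
    {"company": "C2", "year": 2010, "cogs": 80.0},
    {"company": "C1", "year": 2011, "cogs": 120.0},
]
dims, facts = build_star_schema(observations, ["company", "year"], "cogs")
print(dims)
print(facts)
```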
+
+ </section> <section id="registry-use-case-registering-published-statistical-data-in-data-catalogs">
+ <h3 id="Registeringpublishedstatisticaldataindatacatalogs"><span class="secno">3.12 </span>Registry
+ Use Case: Registering published statistical data in data catalogs</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case motivated by <a href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a>
+ and <a href="http://wiki.planet-data.eu/web/Datasets">RDF Data
+ Cube Vocabulary datasets</a> in the PlanetData Wiki)
+ </span>
+ </p>
+
+ <p>
+ After statistics have been published as Linked Data, the question
+ remains how to communicate the publication and to let users discover
+ the statistics. There are catalogs to register datasets, e.g., CKAN, <a href="http://www.datacite.org/">datacite.org</a>, <a href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a href="http://pangaea.de/">PANGAEA</a>. Those catalogs require specific
+ configurations to register statistical data.
+ </p>
+
+ <p>The goal of this use case is to demonstrate how to expose and
+ distribute statistics after publication, for instance, to allow
+ automatic registration of statistical data in such catalogs so that
+ datasets can be found and evaluated. To solve this issue, it should be
+ possible to transform the published statistical data into formats that
+ can be used by data catalogs.</p>
+
+ <p>
+ A concrete use case is the structured collection of <a href="http://wiki.planet-data.eu/web/Datasets">RDF Data Cube
+ Vocabulary datasets</a> in the PlanetData Wiki. This list is supposed to
+ describe statistical datasets on a higher level, for easy discovery
+ and selection, and to provide a useful overview of RDF Data Cube
+ deployments in the Linked Data cloud.
+ </p>
+ <h3 id="benefits-11">Benefits</h3>
+ <ul>
+ <li>Datasets may automatically be discovered by Web or data
+ crawlers.</li>
+ <li>Potential consumers will be pointed to published statistics
+ in search engine results when searching for related information.</li>
+ <li>Users can use keyword search or structured queries for
+ specific datasets they may be interested in.</li>
+ <li>Applications and users are told about licenses, download
+ capabilities etc. of datasets.</li>
+ </ul>
+
+ <h3 id="challenges-11">Challenges</h3>
+ <ul>
+ <li>Define a mapping between DCAT and the Data Cube Vocabulary. The <a href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a>
+ (DCAT) is strongly related to this use case since it may complement
+ the standard vocabulary for representing statistics in the case of
+ registering data in a data catalog. This supports lesson <a href="#pubGuidance">Publishers
+ may need guidance in communicating the availability of published
+ statistical data to external parties and to allow automatic
+ discovery of statistical data</a>.
+ </li>
+ <li>Define a mapping between the Data Cube Vocabulary and data catalog
+ descriptions. If data catalogs contain statistics, they do not expose
+ them as Linked Data but, for instance, as CSV, HTML (e.g.,
+ PANGAEA), or XML (e.g., DDI, the Data Documentation Initiative format).
+ Therefore, publishing such data using the
+ Data Cube Vocabulary could be a further use case.</li>
+ </ul>
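+ <p>As an illustration of how a publisher might expose a Data Cube dataset to catalogs, the following sketch emits a Turtle description. In it, the same resource is typed as both <code>qb:DataSet</code> and <code>dcat:Dataset</code> and is given a title, a license and a download distribution. Co-typing the resource is only one possible mapping; neither vocabulary prescribes it. The dataset URI and metadata values in the example call are hypothetical.</p>

```python
# Namespace IRIs of the Data Cube, DCAT and Dublin Core terms vocabularies.
PREFIXES = (
    "@prefix qb:   <http://purl.org/linked-data/cube#> .\n"
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
)


def catalog_record(dataset_uri, title, license_uri, download_url):
    """Return Turtle that describes a Data Cube dataset as a DCAT dataset.

    Illustration only: the title is assumed to contain no quotes or
    backslashes, so no string escaping is performed.
    """
    return PREFIXES + (
        f"\n<{dataset_uri}> a qb:DataSet, dcat:Dataset ;\n"
        f'    dct:title "{title}" ;\n'
        f"    dct:license <{license_uri}> ;\n"
        "    dcat:distribution [\n"
        "        a dcat:Distribution ;\n"
        f"        dcat:downloadURL <{download_url}>\n"
        "    ] .\n"
    )


print(catalog_record(
    "http://example.org/dataset/unemployment",
    "Unemployment rate",
    "http://example.org/licence",
    "http://example.org/dataset/unemployment.ttl",
))
```

+ <p>A catalog crawler that understands DCAT could then pick up the dataset without knowing the Data Cube Vocabulary, while Data Cube consumers still see a <code>qb:DataSet</code>.</p>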
+
+ </section> </section>
+
+ <section id="lessons">
+ <!--OddPage--><h2 id="requirements"><span class="secno">4. </span>Lessons</h2>
+
+ <p>The use cases presented in the previous section give rise to the
+ following lessons that can motivate future work on the vocabulary as
+ well as associated tools or services complementing the vocabulary.</p>
+
+ <section id="there-is-a-putative-requirement-to-update-to-sdmx-2.1-if-there-are-specific-use-cases-that-demand-it">
+ <h3 id="putative"><span class="secno">4.1 </span>There is
+ a putative requirement to update to SDMX 2.1 if there are specific use
+ cases that demand it</h3>
+ <p>
+ The draft version of the vocabulary builds upon <a href="http://sdmx.org/?page_id=16">SDMX Standards Version 2.0</a>. A
+ newer version of SDMX, <a href="http://sdmx.org/?p=899">SDMX
+ Standards, Version 2.1</a>, is available.
+ </p>
+ <p>The requirement is to build at least upon Version 2.0; if
+ specific use cases derived from Version 2.1 become available, the
+ working group may consider building upon Version 2.1.</p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/37">http://www.w3.org/2011/gld/track/issues/37</a></li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-case-study-eurostat-sdmx-as-linked-data">Publisher
+ Case Study: Eurostat SDMX as Linked Data</a></li>
+ </ul>
+ </section> <section id="publishers-may-need-more-guidance-in-creating-and-managing-slices-or-arbitrary-groups-of-observations">
+ <h3 id="clarify"><span class="secno">4.2 </span>Publishers
+ may need more guidance in creating and managing slices or arbitrary
+ groups of observations</h3>
+ <p>There should be a consensus on the issue of flattening or
+ abbreviating data; one suggestion is to author data without the
+ duplication, but have the data publication tools "flatten" the compact
+ representation into standalone observations during the publication
+ process.</p>
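+ <p>The suggested flattening step can be sketched as follows. Dimension values that the compact form states once, at the level of a slice, are copied onto every observation in that slice, so that each observation becomes self-contained. Slices and observations are modelled here as plain dictionaries with hypothetical property names and values rather than as RDF; a real publication tool would operate on the triples. The sketch treats an observation that contradicts its slice as an error.</p>

```python
def flatten(slices):
    """Copy each slice's fixed dimension values onto each of its observations.

    `slices` is a list of {"fixed": {...}, "observations": [{...}, ...]}.
    Raises ValueError if an observation contradicts a value fixed by its slice.
    """
    flat = []
    for sl in slices:
        fixed = sl["fixed"]
        for obs in sl["observations"]:
            clash = sorted(k for k in obs if k in fixed and obs[k] != fixed[k])
            if clash:
                raise ValueError(f"observation contradicts slice on {clash}")
            flat.append({**fixed, **obs})
    return flat


# Hypothetical compact data: area and period are stated once per slice.
compact = [
    {
        "fixed": {"refArea": "R1", "refPeriod": "2012"},
        "observations": [
            {"sex": "F", "population": 10},
            {"sex": "M", "population": 12},
        ],
    }
]
for obs in flatten(compact):
    print(obs)
```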
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/33">http://www.w3.org/2011/gld/track/issues/33</a></li>
+
+ <li>Since there are no known use cases for <code>qb:subslice</code>, the vocabulary
+ should clarify or drop the use of <code>qb:subslice</code>; issue: <a href="http://www.w3.org/2011/gld/track/issues/34">http://www.w3.org/2011/gld/track/issues/34</a>
+ </li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+
+ <li><a href="#publisher-case-study-uk-government-financial-data-from-combined-online-information-system-coins">Publisher
+ Case Study: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+
+ <li><a href="#publisher-case-study-publishing-observational-data-sets-about-uk-bathing-water-quality">Publisher
+ Case Study: Publishing Observational Data Sets about UK Bathing
+ Water Quality</a></li>
+
+ <li><a href="#publisher-case-study-eurostat-sdmx-as-linked-data">Publisher
+ Case Study: Eurostat SDMX as Linked Data</a></li>
+
+ </ul>
+ </section> <section id="publishers-may-need-more-guidance-to-decide-which-representation-of-hierarchies-is-most-suitable-for-their-use-case">
+ <h3 id="heirarchic"><span class="secno">4.3 </span>Publishers
+ may need more guidance to decide which representation of hierarchies
+ is most suitable for their use case</h3>
+ <p>
+ First, hierarchical code lists may be supported via SKOS [<cite><a href="#ref-skos">SKOS</a></cite>]. This would allow for cross-location and cross-time
+ analysis of statistical datasets.
+ </p>
+ <p>
+ Second, one can think of non-SKOS hierarchical code lists, e.g., if
+ simple <code>skos:narrower</code>/<code>skos:broader</code>
+ relationships are not sufficient or if a vocabulary uses specific
+ hierarchical properties such as <code>geo:containedIn</code>.
+ </p>
+ <p>
+ Also, the use of hierarchy levels needs to be clarified. It has been
+ suggested to allow instances of <code>skos:Collection</code> as values of
+ <code>qb:codeList</code>.
+ </p>
+ <p>
+ Richard Cyganiak gave a summary of different options for specifying
+ the allowed dimension values of a coded property, possibly including
+ hierarchies (see <a href="http://lists.w3.org/Archives/Public/public-gld-wg/2013Mar/0108.html">mail</a>):
+ </p>
+
+ <ol>
+ <li>All instances of a given <code>rdfs:Class</code> (via <code>rdf:type</code>).</li>
+ <li>All <code>skos:Concept</code>s in a given <code>skos:ConceptScheme</code> (via
+ <code>skos:inScheme</code>).</li>
+ <li>All <code>skos:Concept</code>s in a given <code>skos:Collection</code> or its
+ subcollections (via <code>skos:member</code>).</li>
+ <li>All resources that are roots, or children of a root, of a
+ <code>qb:HierarchicalCodeList</code>.</li>
+ </ol>
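+ <p>The following sketch shows how a consumer might compute the allowed values under options 2 and 4 above. It works over a small, hypothetical set of triples held as Python tuples, with prefixed names as plain strings. For option 4, it starts from the roots given by <code>qb:hierarchyRoot</code> and follows the property named by <code>qb:parentChildProperty</code>, here transitively. The example codes are made up, and the sketch ignores inverse properties.</p>

```python
# Hypothetical triples: (subject, predicate, object) as prefixed-name strings.
triples = {
    ("ex:c1", "skos:inScheme", "ex:scheme"),
    ("ex:c2", "skos:inScheme", "ex:scheme"),
    ("ex:hcl", "qb:hierarchyRoot", "ex:world"),
    ("ex:hcl", "qb:parentChildProperty", "skos:narrower"),
    ("ex:world", "skos:narrower", "ex:europe"),
    ("ex:europe", "skos:narrower", "ex:germany"),
}


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def concepts_in_scheme(scheme):
    """Option 2: all concepts linked to the scheme via skos:inScheme."""
    return {s for (s, p, o) in triples if p == "skos:inScheme" and o == scheme}


def hierarchy_members(code_list):
    """Option 4: the roots plus everything reachable via the parent-child property."""
    (prop,) = objects(code_list, "qb:parentChildProperty")
    found = set()
    frontier = list(objects(code_list, "qb:hierarchyRoot"))
    while frontier:
        node = frontier.pop()
        if node not in found:
            found.add(node)
            frontier.extend(objects(node, prop))
    return found


print(sorted(concepts_in_scheme("ex:scheme")))
print(sorted(hierarchy_members("ex:hcl")))
```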
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/39">http://www.w3.org/2011/gld/track/issues/39</a>
+ </li>
+ <li>Discussion at publishing-statistical-data mailing list: <a href="http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f">http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f</a></li>
+ <li>Part of the requirement is met by the work on an ISO
+ Extension to SKOS [<cite><a href="#ref-xkos">XKOS</a></cite>]
+ </li>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/59">http://www.w3.org/2011/gld/track/issues/59</a></li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-use-case-publishing-excel-spreadsheets-about-dutch-historical-census-data-as-linked-data">Publisher
+ Use Case: Publishing Excel Spreadsheets about Dutch historical
+ census data as Linked Data</a></li>
+ <li><a href="#publisher-use-case-publishing-hierarchically-structured-data-from-statswales-and-open-data-communities">Publisher
+ Use Case: Publishing hierarchically structured data from StatsWales
+ and Open Data Communities</a></li>
+ </ul>
+ </section> <section id="modelers-using-iso19156---observations-measurements-may-need-clarification-regarding-the-relationship-to-the-data-cube-vocabulary">
+ <h3 id="relToSO19156"><span class="secno">4.4 </span>Modelers
+ using ISO19156 - Observations &amp; Measurements may need clarification
+ regarding the relationship to the Data Cube Vocabulary</h3>
+ <p>A number of organizations, particularly in the climate and
+ meteorological domain, already have some commitment to the OGC
+ "Observations and Measurements" (O&amp;M) logical data model, also
+ published as ISO 19156. Are there any statements about compatibility
+ and interoperability between O&amp;M and Data Cube that can be made to
+ give guidance to such organizations?</p>
+
+ <p>
+ This is partly addressed by the description of the <a href="#publisher-case-study-site-specific-weather-forecasts-from-met-office-the-uk-s-national-weather-service">Publisher
+ Case Study: Site specific weather forecasts from Met Office, the UK's
+ National Weather Service</a>.
+ </p>
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-case-study-site-specific-weather-forecasts-from-met-office-the-uk-s-national-weather-service">Publisher
+ Case Study: Site specific weather forecasts from Met Office, the
+ UK's National Weather Service</a></li>
+ </ul>
+ </section> <section id="publishers-may-need-guidance-in-how-to-represent-common-analytical-operations-such-as-slice-dice-rollup-on-data-cubes">
+ <h3 id="aggregations"><span class="secno">4.5 </span>Publishers
+ may need guidance in how to represent common analytical operations
+ such as Slice, Dice, Rollup on data cubes</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-use-case-publishing-hierarchically-structured-data-from-statswales-and-open-data-communities">Publisher
+ Use Case: Publishing hierarchically structured data from StatsWales
+ and Open Data Communities</a></li>
+ <li><a href="#publisher-case-study-eurostat-sdmx-as-linked-data">Publisher
+ Case Study: Eurostat SDMX as Linked Data</a></li>
+ <li><a href="#consumer-case-study-analyzing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
+ Case Study: Analyzing published financial (XBRL) data from the SEC
+ with common OLAP systems</a></li>
+ </ul>
+ </section> <section id="publishers-may-need-guidance-in-making-transparent-the-pre-processing-of-aggregate-statistics">
+ <h3 id="declaringRel"><span class="secno">4.6 </span>Publishers
+ may need guidance in making transparent the pre-processing of
+ aggregate statistics</h3>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/30">http://www.w3.org/2011/gld/track/issues/30</a></li>
+ <li>Discussion in <a href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data
+ mailing list</a>
+ </li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-case-study-uk-government-financial-data-from-combined-online-information-system-coins">Publisher
+ Case Study: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#publisher-use-case-publishing-excel-spreadsheets-about-dutch-historical-census-data-as-linked-data">Publisher
+ Use Case: Publishing Excel Spreadsheets about Dutch historical
+ census data as Linked Data</a></li>
+ <li><a href="#publisher-case-study-improving-trust-in-published-sustainability-information-at-the-digital-enterprise-research-institute-deri">Publisher
+ Case Study: Improving trust in published sustainability information
+ at the Digital Enterprise Research Institute (DERI)</a></li>
+ </ul>
+ </section> <section id="publishers-and-consumers-may-need-guidance-in-checking-and-making-use-of-well-formedness-of-published-data-using-data-cube">
+ <h3 id="criteriaForWell"><span class="secno">4.7 </span>Publishers
+ and consumers may need guidance in checking and making use of
+ well-formedness of published data using data cube</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a></li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-case-study-uk-government-financial-data-from-combined-online-information-system-coins">Publisher
+ Case Study: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#consumer-case-study-simple-chart-visualizations-of-integrated-published-climate-sensor-data">Consumer
+ Case Study: Simple chart visualizations of (integrated) published
+ climate sensor data</a></li>
+ <li><a href="#consumer-use-case-visualizing-published-statistical-data-in-google-public-data-explorer">Consumer
+ Use Case: Visualizing published statistical data in Google Public
+ Data Explorer</a></li>
+ <li><a href="#consumer-case-study-analyzing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
+ Case Study: Analyzing published financial (XBRL) data from the SEC
+ with common OLAP systems</a></li>
+ </ul>
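+ <p>Such guidance could, for example, take the form of automated checks. The sketch below tests two properties in the spirit of the integrity constraints defined in the Data Cube Vocabulary specification [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. First, every observation must have a value for every declared dimension. Second, no two observations may share the same dimension values. The sketch operates on plain records with hypothetical keys and values. The specification itself states its constraints as SPARQL queries over the RDF data.</p>

```python
def check_well_formed(observations, dimensions):
    """Return a list of human-readable problems found in the observations."""
    problems = []
    seen = {}  # dimension-value tuple -> index of first observation with it
    for index, obs in enumerate(observations):
        missing = [d for d in dimensions if d not in obs]
        if missing:
            problems.append(f"observation {index}: missing {', '.join(missing)}")
            continue
        key = tuple(obs[d] for d in dimensions)
        if key in seen:
            problems.append(f"observation {index}: duplicates observation {seen[key]}")
        else:
            seen[key] = index
    return problems


data = [
    {"refArea": "R1", "refPeriod": "2012", "value": 1.0},
    {"refArea": "R1", "refPeriod": "2012", "value": 2.0},
    {"refArea": "R2", "value": 3.0},
]
for problem in check_well_formed(data, ["refArea", "refPeriod"]):
    print(problem)
```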
+ </section> <section id="publishers-may-need-guidance-in-conversions-from-common-statistical-representations-such-as-csv-excel-arff-etc.">
+ <h3 id="excelCSV"><span class="secno">4.8 </span>Publishers
+ may need guidance in conversions from common statistical
+ representations such as CSV, Excel, ARFF etc.</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>None.</li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-use-case-publishing-excel-spreadsheets-about-dutch-historical-census-data-as-linked-data">Publisher
+ Use Case: Publishing Excel Spreadsheets about Dutch historical
+ census data as Linked Data</a></li>
+ <li><a href="#consumer-case-study-analyzing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
+ Case Study: Analyzing published financial (XBRL) data from the SEC
+ with common OLAP systems</a></li>
+ </ul>
+ </section> <section id="consumers-may-need-guidance-in-conversions-into-formats-that-can-easily-be-displayed-and-further-investigated-in-tools-such-as-google-data-explorer-r-weka-etc.">
+ <h3 id="consumers"><span class="secno">4.9 </span>Consumers
+ may need guidance in conversions into formats that can easily be
+ displayed and further investigated in tools such as Google Data
+ Explorer, R, Weka etc.</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>None.</li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-case-study-eurostat-sdmx-as-linked-data">Publisher
+ Case Study: Eurostat SDMX as Linked Data</a></li>
+ <li><a href="#consumer-use-case-visualizing-published-statistical-data-in-google-public-data-explorer">Consumer
+ Use Case: Visualizing published statistical data in Google Public
+ Data Explorer</a></li>
+ </ul>
+ </section> <section id="publishers-and-consumers-may-need-more-guidance-in-efficiently-processing-data-using-the-data-cube-vocabulary">
+ <h3 id="mechRec"><span class="secno">4.10 </span>Publishers
+ and consumers may need more guidance in efficiently processing data
+ using the Data Cube Vocabulary</h3>
+ <p>Background information:</p>
+ <ul>
+ <li>Related issue regarding abbreviations <a href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a>
+ </li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#publisher-case-study-uk-government-financial-data-from-combined-online-information-system-coins">Publisher
+ Case Study: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#publisher-use-case-publishing-excel-spreadsheets-about-dutch-historical-census-data-as-linked-data">Publisher
+ Use Case: Publishing Excel Spreadsheets about Dutch historical
+ census data as Linked Data</a></li>
+ <li><a href="#publisher-case-study-site-specific-weather-forecasts-from-met-office-the-uk-s-national-weather-service">Publisher
+ Case Study: Site specific weather forecasts from Met Office, the
+ UK's National Weather Service</a></li>
+ <li><a href="#publisher-case-study-eurostat-sdmx-as-linked-data">Publisher
+ Case Study: Eurostat SDMX as Linked Data</a></li>
+ <li><a href="#consumer-case-study-analyzing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
+ Case Study: Analyzing published financial (XBRL) data from the SEC
+ with common OLAP systems</a></li>
+ </ul>
+ </section> <section id="publishers-may-need-guidance-in-communicating-the-availability-of-published-statistical-data-to-external-parties-and-to-allow-automatic-discovery-of-statistical-data">
+ <h3 id="pubGuidance"><span class="secno">4.11 </span>Publishers
+ may need guidance in communicating the availability of published
+ statistical data to external parties and to allow automatic discovery
+ of statistical data</h3>
+ <p>Clarify the relationship between DCAT and QB.</p>
+ <p>Background information:</p>
+ <ul>
+ <li>None.</li>
+ </ul>
+ <p>Supporting use cases:</p>
+ <ul>
+ <li><a href="#sdmx-web-dissemination-use-case">SDMX Web
+ Dissemination Use Case</a></li>
+ <li><a href="#registry-use-case-registering-published-statistical-data-in-data-catalogs">Registry
+ Use Case: Registering published statistical data in data catalogs</a></li>
+ </ul>
+
+ </section> </section>
+ <section id="acknowledgements-1" class="appendix">
+ <!--OddPage--><h2 id="acknowledgements"><span class="secno">A. </span>Acknowledgements</h2>
+ <p>We thank Phil Archer, John Erickson, Rinke Hoekstra, Bernadette
+ Hyland, Aftab Iqbal, James McKinney, Dave Reynolds, Biplav Srivastava,
+ Boris Villazón-Terrazas for feedback and input.</p>
+ </section>
+
+ <h2 id="references">References</h2>
+
+ <dl>
+
+ <dt id="ref-cog">[COG]</dt>
+ <dd>
+ SDMX Content Oriented Guidelines, <a href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>.
+ </dd>
+
+ <dt id="ref-COGS">[COGS]</dt>
+ <dd>
+ Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., &amp; Curry, E.
+ (2012). Representing Interoperable Provenance Descriptions for ETL
+ Workflows. ESWC 2012 Workshop Highlights, pp. 1–15. Springer Verlag,
+ 2012 (in press; extended paper published in conference proceedings). <a href="http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf">http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf</a>.
+ </dd>
+
+ <dt id="ref-COINS">[COINS]</dt>
+ <dd>
+ Ian Dickinson et al. COINS as Linked Data. <a href="http://data.gov.uk/resources/coins">http://data.gov.uk/resources/coins</a>,
+ last visited on 9 Jan 2013.
+ </dd>
+
+ <dt id="ref-FIOS">[FIOS]</dt>
+ <dd>
+ Andreas Harth, Sean O'Riain, Benedikt Kämpgen. Submission XBRL
+ Challenge 2011. <a href="http://xbrl.us/research/appdev/Pages/275.aspx">http://xbrl.us/research/appdev/Pages/275.aspx</a>.
+ </dd>
+
+
+ <dt id="ref-FOWLER97">[FOWLER97]</dt>
+ <dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object
+ Models. Addison-Wesley. ISBN 0201895420.</dd>
+
+
+ <dt id="ref-linked-data">[LOD]</dt>
+ <dd>
+ Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>.
+ </dd>
+
+ <dt id="ref-OLAP">[OLAP]</dt>
+ <dd>
+ Online Analytical Processing Data Cubes, <a href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>.
+ </dd>
+
+ <dt id="ref-OLAP4LD">[OLAP4LD]</dt>
+ <dd>
+ Kämpgen, B. and Harth, A. (2011). Transforming Statistical Linked
+ Data for Use in OLAP Systems. I-Semantics 2011. <a href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>.
+ </dd>
+
+ <dt id="ref-QB-2010">[QB-2010]</dt>
+ <dd>
+ RDF Data Cube vocabulary, <a href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>.
+ </dd>
+
+ <dt id="ref-QB-2013">[QB-2013]</dt>
+ <dd>
+ RDF Data Cube vocabulary, <a href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>.
+ </dd>
+
+ <dt id="ref-QB4OLAP">[QB4OLAP]</dt>
+ <dd>
+ Etcheverry, Vaisman. QB4OLAP: A New Vocabulary for OLAP Cubes on
+ the Semantic Web. <a href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>.
+ </dd>
+
+ <dt id="ref-rdf">[RDF]</dt>
+ <dd>
+ Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>.
+ </dd>
+
+ <dt id="ref-scovo">[SCOVO]</dt>
+ <dd>
+ The Statistical Core Vocabulary, <a href="http://sw.joanneum.at/scovo/schema.html">http://sw.joanneum.at/scovo/schema.html</a>
+ <br> SCOVO: Using Statistics on the Web of data, <a href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>.
+ </dd>
+
+ <dt id="ref-skos">[SKOS]</dt>
+ <dd>
+ Simple Knowledge Organization System, <a href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>.
+ </dd>
+
+ <dt id="ref-SDMX">[SDMX]</dt>
+ <dd>
+ SDMX User Guide Version 2009.1, <a href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>,
+ last visited on 8 Jan 2013.
+ </dd>
+
+ <dt id="ref-SDMX-21">[SDMX 2.1]</dt>
+ <dd>
+ SDMX 2.1 User Guide, Version 0.1, 19/09/2012. <a href="http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf">http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf</a>,
+ last visited on 8 Jan 2013.
+ </dd>
+
+ <dt id="ref-xkos">[XKOS]</dt>
+ <dd>
+ Extended Knowledge Organization System (XKOS), <a href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a>.
+ </dd>
+
+ </dl>
+
+
+</body></html>
\ No newline at end of file