Binary file data-cube-ucr/data-cube-ucr-20120222/figures/Eurostat_GPDE_Example.png has changed
Binary file data-cube-ucr/data-cube-ucr-20120222/figures/FIOS_example.PNG has changed
Binary file data-cube-ucr/data-cube-ucr-20120222/figures/Level_above_msl_3_locations.png has changed
Binary file data-cube-ucr/data-cube-ucr-20120222/figures/Relationships_Statistical_Data_Cogs_Example.png has changed
Binary file data-cube-ucr/data-cube-ucr-20120222/figures/SDMX_Web_Dissemination_Use_Case.png has changed
Binary file data-cube-ucr/data-cube-ucr-20120222/figures/modeling_quantity_measurement_observation.png has changed
Binary file data-cube-ucr/data-cube-ucr-20120222/figures/pivot_analysis_measurements.PNG has changed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20120222/index.html Thu Feb 28 10:06:00 2013 -0500
@@ -0,0 +1,860 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
+ "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<title>Use Cases and Requirements for the Data Cube Vocabulary</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<script type="text/javascript"
+ src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
+<script src="respec-ref.js"></script>
+<script src="respec-config.js"></script>
+<link rel="stylesheet" type="text/css" href="local-style.css" />
+</head>
+<body>
+
+ <section id="abstract">
+ <p>Many national, regional and local governments, as well as other
+ organizations inside and outside of the public sector, create
+ statistics. There is a need to publish those statistics in a
+ standardized, machine-readable way on the web, so that statistics can
+ be freely integrated and reused in consuming applications. This
+ document is a collection of use cases for a standard vocabulary to
+ publish statistics as Linked Data.</p>
+ </section>
+
+ <section id="sotd">
+ <p>
+ This is a working document of the <a
+ href="http://www.w3.org/2011/gld/wiki/Data_Cube_Vocabulary">Data
+ Cube Vocabulary project</a> within the <a
+ href="http://www.w3.org/2011/gld/">W3C Government Linked Data
+ Working Group</a>. Feedback is welcome and should be sent to the <a
+ href="mailto:public-gld-comments@w3.org">public-gld-comments@w3.org
+ mailing list</a>.
+ </p>
+ </section>
+
+ <section>
+ <h2>Introduction</h2>
+
+ <p>Many national, regional and local governments, as well as other
+ organizations inside and outside of the public sector, create
+ statistics. There is a need to publish those statistics in a
+ standardized, machine-readable way on the web, so that statistics can
+ be freely linked, integrated and reused in consuming applications.
+ This document is a collection of use cases for a standard vocabulary
+ to publish statistics as Linked Data.</p>
+ </section>
+
+
+ <section>
+ <h2>Terminology</h2>
+ <p>
+ <dfn>Statistics</dfn>
+ is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of
+ the collection, organization, analysis, and interpretation of data. In
+ this document, a statistic refers to a statistical dataset.
+ </p>
+
+ <p>
+ A
+ <dfn>statistical dataset</dfn>
+ comprises multidimensional data - a set of observed values organized
+ along a group of dimensions, together with associated metadata. Basic
+ structure of (aggregated) statistical data is a multidimensional table
+ (also called a cube) <a href="#ref-SDMX">[SDMX]</a>.
+ </p>
+
+ <p>
+ <dfn>Source data</dfn>
+ is data from datastores such as RDBs or spreadsheets that acts as a
+ source for the Linked Data publishing process.
+ </p>
+
+ <p>
+ <dfn>Metadata</dfn>
+ about statistics defines the data structure and gives contextual
+ information about the statistics.
+ </p>
+
+ <p>
+ A format is
+ <dfn>machine-readable</dfn>
+ if it is amenable to automated processing by a machine, as opposed to
+ presentation to a human user.
+ </p>
+
+ <p>
+ A
+ <dfn>publisher</dfn>
+ is a person or organization that exposes source data as Linked Data on
+ the Web.
+ </p>
+
+ <p>
+ A
+ <dfn>consumer</dfn>
+ is a person or agent that uses Linked Data from the Web.
+ </p>
+
+ </section>
+
+
+ <section>
+ <h2>Use cases</h2>
+ <p>
+ This section presents scenarios that would be enabled by the existence
+ of a standard vocabulary for the representation of statistics as
+ Linked Data. Since a draft of the specification of the cube vocabulary
+ has been published, and the vocabulary is already in use, we will
+ refer to this standard vocabulary throughout the document by its
+ current name, the RDF Data Cube vocabulary (short <a
+ href="#ref-QB">[QB]</a>).
+ </p>
+ <p>We distinguish between use cases of publishing statistical data,
+ and use cases of consuming statistical data since requirements for
+ publishers and consumers of statistical data differ.</p>
+ <section>
+ <h3>Publishing statistical data</h3>
+
+ <section>
+ <h4>Publishing general statistics in a machine-readable and
+ application-independent way (UC 1)</h4>
+ <p>More and more organizations want to publish statistics on the
+ web, for reasons such as increasing transparency and trust. Although
+ in the ideal case, published data can be understood by both humans and
+ machines, data often is simply published as CSV, PDF, XLS etc.,
+ lacking elaborate metadata, which makes free usage and analysis
+ difficult.</p>
+
+ <p>The goal in this use case is to use a machine-readable and
+ application-independent description of common statistics with use of
+ open standards. The use case is fulfilled if QB is a Linked Data
+ vocabulary for encoding statistical data that has a hypercube
+ structure and as such can describe common statistics in a
+ machine-readable and application-independent way.</p>
+
+ <p>
+ An example scenario of this use case has been to publish the Combined
+ Online Information System (<a
+ href="http://data.gov.uk/resources/coins">COINS</a>). There, HM
+ Treasury, the principal custodian of financial data for the UK
+ government, released previously restricted information from COINS.
+ Five data files were
+ released containing between 3.3 and 4.9 million rows of data. The
+ COINS dataset was translated into RDF for two reasons:
+ </p>
+
+ <ol>
+ <li>To publish statistics that (e.g., as data files) are too
+ large to load into widely available analysis tools such as Microsoft
+ Excel, a common tool-of-choice for many data investigators.</li>
+ <li>COINS is a highly technical information source, requiring
+ both domain and technical skills to make useful applications around
+ the data.</li>
+ </ol>
+ <p>Publishing statistics is challenging for several reasons:</p>
+ <p>
+ Representing observations and measurements requires more complex
+ modeling as discussed by Martin Fowler <a href="#Fowler1997">[Fowler,
+ 1997]</a>: Recording a statistic simply as an attribute to an object
+ (e.g., a the fact that a person weighs 185 pounds) fails with
+ representing important concepts such as quantity, measurement, and
+ observation.
+ </p>
+
+ <p>Quantity comprises necessary information to interpret the value,
+ e.g., the unit and arithmetical and comparative operations; humans and
+ machines can appropriately visualize such quantities or have
+ conversions between different quantities.</p>
+
+
+ <p>A Measurement separates a quantity from the actual event at
+ which it was collected; a measurement assigns a quantity to a specific
+ phenomenon type (e.g., strength). Also, a measurement can record
+ metadata such as who did the measurement (person) and when it was
+ done (time).</p>
+
+ <p>Observations, finally, abstract from measurements, which only
+ record numeric quantities. An Observation can also assign a
+ category (e.g., blood group A) to an observation. The following
+ figure demonstrates this relationship.</p>
+ <p>
+ <div class="fig">
+ <a href="figures/modeling_quantity_measurement_observation.png"><img
+ src="figures/modeling_quantity_measurement_observation.png"
+ alt="Modeling quantity, measurement, observation" /> </a>
+ <div>Modeling quantity, measurement, observation</div>
+ </div>
+ </p>
+
+ <p>QB deploys the multidimensional model (made of observations with
+ Measures depending on Dimensions and Dimension Members, and further
+ contextualized by Attributes) and should cater for this complexity in
+ modelling.</p>
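+ <p>The following pseudo-turtle sketch (names are illustrative, not
+ taken from COINS) shows how these concepts map onto QB: dimensions
+ locate an observation in the cube, the measure carries the value, and
+ an attribute qualifies its interpretation:</p>
+ <pre>
+ex:obs1 a qb:Observation;
+ qb:dataSet ex:dataset1;            # the containing cube
+ sdmx:refArea &lt;UK&gt;;                # dimension
+ sdmx:refPeriod "2011";             # dimension
+ ex:expenditure "12000";            # measure (the observed value)
+ sdmx:unitMeasure ex:gbpThousands . # attribute (unit)
+ </pre>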
+ <p>Another challenge is that for brevity reasons and to avoid
+ repetition, it is useful to have abbreviation mechanisms such as
+ assigning overall valid properties of observations at the dataset or
+ slice level, which then become implicitly part of each observation. For
+ instance, in the case of COINS, all of the values are in thousands of
+ pounds sterling. However, one of the use cases for the linked data
+ version of COINS is to allow others to link to individual
+ observations, which suggests that these observations should be
+ standalone and self-contained – and should therefore have explicit
+ multipliers and units on each observation. One suggestion is to author
+ data without the duplication, but have the data publication tools
+ "flatten" the compact representation into standalone observations
+ during the publication process.</p>
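+ <p>A pseudo-turtle sketch of this abbreviation mechanism (property
+ names are illustrative): the unit and multiplier are stated once at
+ the dataset level and implicitly apply to every observation; a
+ publication tool could "flatten" this by copying them onto each
+ observation:</p>
+ <pre>
+ex:coins a qb:DataSet;
+ sdmx:unitMeasure ex:poundsSterling;  # stated once for all observations
+ sdmx:unitMult "3" .                  # values are in thousands
+
+ex:obs1 a qb:Observation;
+ qb:dataSet ex:coins;
+ sdmx:refPeriod "2010";
+ ex:expenditure "250" .  # implicitly thousands of pounds sterling
+ </pre>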
+ <p>A further challenge is related to slices of data. Slices of data
+ group observations that are of special interest; e.g., slices of
+ unemployment rates per year for a specific gender are suitable for
+ direct visualization in a line diagram. However, depending on the
+ number of Dimensions, the number of possible slices can become large
+ which makes it difficult to select all interesting slices. Therefore,
+ and because of their additional complexity, not many publishers create
+ slices. In fact, it is somewhat unclear at this point which slices
+ through the data will be useful to (COINS-RDF) users.</p>
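+ <p>A minimal pseudo-turtle sketch of such a slice (names are
+ illustrative): the gender dimension is fixed for the whole slice, so
+ that only the time dimension varies across the grouped observations:</p>
+ <pre>
+ex:slice1 a qb:Slice;
+ ex:gender ex:female;                     # fixed dimension
+ qb:observation ex:obs2010, ex:obs2011 .  # unemployment rates per year
+ </pre>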
+ <p>Unanticipated Uses (optional): -</p>
+ <p>Existing Work (optional): -</p>
+
+ </section> <section>
+ <h4>Publishing one or many MS Excel spreadsheet files with
+ statistical data on the web (UC 2)</h4>
+ <p>Not only in government, there is a need to publish considerable
+ amounts of statistical data to be consumed in various (also
+ unexpected) application scenarios. Typically, Microsoft Excel sheets
+ are made available for download. Those Excel sheets contain single
+ spreadsheets with several multidimensional data tables, having a name
+ and notes, as well as column values, row values, and cell values.</p>
+ <p>The goal in this use case is to publish spreadsheet
+ information in a machine-readable format on the web, e.g., so that
+ crawlers can find spreadsheets that use a certain column value. The
+ published data should represent and make available for queries the
+ most important information in the spreadsheets, e.g., rows, columns,
+ and cell values. QB should provide the level of detail that is needed
+ for such a transformation in order to fulfil this use case.</p>
+ <p>In a possible use case scenario an institution wants to develop
+ or use software that transforms its Excel sheets into the
+ appropriate format.</p>
+
+ <p class="editorsnote">@@TODO: Concrete example needed.</p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>Excel sheets provide much flexibility in arranging
+ information. It may be necessary to limit this flexibility to allow
+ automatic transformation.</li>
+ <li>There may be many spreadsheets.</li>
+ <li>Semi-structured information, e.g., notes about the lineage of
+ data cells, may be impossible to formalize.</li>
+ </ul>
+ <p>Unanticipated Uses (optional): -</p>
+ <p>
+ Existing Work (optional): Stats2RDF uses OntoWiki to translate CSV
+ into QB <a href="http://aksw.org/Projects/Stats2RDF">[Stats2RDF]</a>.
+ </p>
+
+ </section> <section>
+ <h4>Publishing SDMX as Linked Data (UC 3)</h4>
+ <p>The ISO standard for exchanging and sharing statistical data and
+ metadata among organizations is Statistical Data and Metadata eXchange
+ (SDMX). Since this standard has proven applicable in many contexts, QB
+ is designed to be compatible with the multidimensional model that
+ underlies SDMX.</p>
+ <p class="editorsnote">@@TODO: The QB spec should maybe also use
+ the term "multidimensional model" instead of the less clear "cube
+ model" term.</p>
+ <p>Therefore, it should be possible to re-publish SDMX data using
+ QB.</p>
+ <p>
+ The scenario for this use case is Eurostat <a
+ href="http://epp.eurostat.ec.europa.eu/">[EUROSTAT]</a>, which
+ publishes large amounts of European statistics coming from a data
+ warehouse as SDMX and other formats on the web. Eurostat also provides
+ an interface to browse and explore the datasets. However, linking such
+ multidimensional data to related data sets and concepts would require
+ download of interesting datasets and manual integration.
+ </p>
+ <p>The goal of this use case is to improve integration with other
+ datasets; Eurostat data should be published on the web in a
+ machine-readable format, so that it can be linked with other datasets
+ and freely consumed by applications. This use case is
+ fulfilled if QB can be used for publishing the data from Eurostat as
+ Linked Data for integration.</p>
+ <p>A publisher wants to make Eurostat data available as Linked
+ Data. The statistical data shall be published as is. It is not
+ necessary to represent information for validation. Data is read from
+ TSV files only. There are two concrete examples of this use case: Eurostat
+ Linked Data Wrapper (http://estatwrap.ontologycentral.com/), and
+ Linked Statistics Eurostat Data
+ (http://eurostat.linked-statistics.org/). They have slightly different
+ focus (e.g., with respect to completeness, performance, and agility).
+ </p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>There are large amounts of SDMX data; the Eurostat dataset
+ comprises 350 GB of data. This may influence decisions about toolsets
+ and architectures to use. One important task is to decide whether to
+ structure the data in separate datasets.</li>
+ <li>Again, the question comes up whether slices are useful.</li>
+ </ul>
+ <p>Unanticipated Uses (optional): -</p>
+ <p>Existing Work (optional): -</p>
+ </section> <section>
+ <h4>Publishing sensor data as statistics (UC 4)</h4>
+ <p>Typically, multidimensional data is aggregated. However, there
+ are cases where non-aggregated data needs to be published, e.g.,
+ observational, sensor network and forecast data sets. Such raw data
+ may already be available in RDF, but using a different vocabulary.</p>
+ <p>The goal of this use case is to demonstrate that publishing of
+ aggregate values or of raw data should not make much of a difference
+ in QB.</p>
+ <p>
+ For example, the Environment Agency uses QB to publish (at least
+ weekly) information on the quality of bathing waters around England
+ and Wales <A
+ href="http://www.epimorphics.com/web/wiki/bathing-water-quality-structure-published-linked-data">[EnvAge]</A>.
+ In another scenario, DERI tracks measurements about printing for a
+ sustainability report. In the DERI scenario, raw data (number of
+ printouts per person) is collected, then aggregated on a unit level,
+ and then modelled using QB.
+ </p>
+ <p>Problems and Limitations:</p>
+ <ul>
+ <li>This use case shall also demonstrate how to link statistics
+ with other statistics or non-statistical data (metadata).</li>
+ </ul>
+ <p>Unanticipated Uses (optional): -</p>
+ <p>
+ Existing Work (optional): Semantic Sensor Network ontology <A
+ href="http://purl.oclc.org/NET/ssnx/ssn">[SSN]</A> already provides a
+ way to publish sensor information. SSN data provides statistical
+ Linked Data and grounds its data to the domain, e.g., sensors that
+ collect observations (e.g., sensors measuring average of temperature
+ over location and time). A number of organizations, particularly in
+ the Climate and Meteorological area already have some commitment to
+ the OGC "Observations and Measurements" (O&amp;M) logical data model,
+ also published as ISO 19156.
+ </p>
+ <p class="editorsnote">@@TODO: Are there any statements about
+ compatibility and interoperability between O&amp;M and Data Cube that
+ can be made to give guidance to such organizations?</p>
+ </section> <section>
+ <h4>Registering statistical data in dataset catalogs (UC 5)</h4>
+ <p>
+ After statistics have been published as Linked Data, the question
+ remains how to communicate the publication and let users find the
+ statistics. There are catalogs to register datasets, e.g., CKAN, <a
+ href="http://www.datacite.org/datacite.org">datacite.org</a>, <a
+ href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
+ href="http://pangaea.de/">Pangaea</a>. Those catalogs require specific
+ configurations to register statistical data.
+ </p>
+ <p>The goal of this use case is to demonstrate how to expose and
+ distribute statistics after modeling using QB, for instance, to allow
+ automatic registration of statistical data in such catalogs for
+ finding and evaluating datasets. To solve this issue, it should be
+ possible to transform QB data into formats that can be used by data
+ catalogs.</p>
+
+ <p class="editorsnote">@@TODO: Find specific use case scenario or
+ ask how other publishers of QB data have dealt with this issue.
+ Maybe a relation to DCAT?</p>
+ <p>Problems and Limitations: -</p>
+ <p>Unanticipated Uses (optional): If data catalogs contain
+ statistics, they expose those not as Linked Data but, for
+ instance, as CSV or HTML (Pangaea [11]). It could also be a use case
+ to publish such data using QB.</p>
+ <p>Existing Work (optional): -</p>
+ </section> <section>
+ <h4>Making transformations on, or different versions of,
+ statistical data transparent (UC 6)</h4>
+ <p>Statistical data often is used and further transformed for
+ analysis and reporting. There is the risk that data has been
+ incorrectly transformed so that the result is not interpretable any
+ more. Therefore, if statistical data has been derived from other
+ statistical data, this should be made transparent.</p>
+ <p>The goal of this use case is to describe provenance and
+ versioning around statistical data, so that the history of statistics
+ published on the web becomes clear. This may also relate to the issue
+ of having relationships between datasets published using QB. To fulfil
+ this use case QB should recommend specific approaches to transforming
+ and deriving datasets which can be tracked and stored with the
+ statistical data.</p>
+ <p class="editorsnote">@@TODO: Add concrete example use case
+ scenario.</p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>Operations on statistical data result in new statistical
+ data, depending on the operation. For instance, in terms of Data Cube,
+ operations such as slice, dice, roll-up, drill-down will result in
+ new Data Cubes. This may require representing general relationships
+ between cubes (as discussed here: [12]).</li>
+ </ul>
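+ <p>One possible way to convey such a relationship, sketched here in
+ pseudo-turtle with the W3C provenance vocabulary (whether QB itself
+ should define a dedicated property is an open question):</p>
+ <pre>
+ex:rolledUpCube a qb:DataSet;
+ prov:wasDerivedFrom ex:detailedCube .  # result of a roll-up operation
+ </pre>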
+ <p>Unanticipated Uses (optional): -</p>
+ <p>Existing Work (optional): Possible relation to Best Practices
+ part on Versioning [13], where it is specified how to publish data
+ which has multiple versions.</p>
+
+
+ </section></section> <section>
+ <h3>Consuming published statistical data</h3>
+
+ <section>
+ <h4>Simple chart visualizations of (integrated) published
+ statistical datasets (UC 7)</h4>
+ <p>Data that is published on the Web is typically visualized by
+ transforming it manually into CSV or Excel and then creating a
+ visualization on top of these formats using Excel, Tableau,
+ RapidMiner, Rattle, Weka etc.</p>
+ <p>This use case shall demonstrate how statistical data published
+ on the web can be directly visualized, without using commercial or
+ highly-complex tools. This use case is fulfilled if data that is
+ published in QB can be directly visualized inside a webpage.</p>
+ <p>An example scenario is environmental research done within the
+ SMART research project (http://www.iwrm-smart.org/). Here, statistics
+ about environmental aspects (e.g., measurements about the climate in
+ the Lower Jordan Valley) shall be visualized for scientists and
+ decision makers. It should also be possible to integrate and display
+ statistics together. The data is available as XML files on the web.
+ On a separate website, specific parts of the data shall be queried and
+ visualized in simple charts, e.g., line diagrams. The following figure
+ shows the desired display of an environmental measure over time for
+ three regions in the lower Jordan valley; displayed inside a web page:</p>
+
+ <p>
+ <div class="fig">
+ <a href="figures/Level_above_msl_3_locations.png"><img
+ width="800px" src="figures/Level_above_msl_3_locations.png"
+ alt="Line chart visualization of QB data" /> </a>
+ <div>Line chart visualization of QB data</div>
+ </div>
+ </p>
+
+ <p>The following figure shows the same measures in a pivot table.
+ Here, the aggregate COUNT of measures per cell is given.</p>
+
+ <p>
+ <div class="fig">
+ <a href="figures/pivot_analysis_measurements.PNG"><img
+ src="figures/pivot_analysis_measurements.PNG"
+ alt="Pivot analysis measurements" /> </a>
+ <div>Pivot analysis measurements</div>
+ </div>
+ </p>
+
+ <p>The use case uses Google App Engine, Qcrumb.com, and Spark. An
+ example of a line diagram is given at [14] (some loading time needed).
+ Current work tries to integrate current datasets with additional data
+ sources, so that queries can take data from both datasets and
+ display them together.</p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>The difficulties lie in structuring the data appropriately so
+ that the specific information can be queried.</li>
+ <li>Also, data shall be published with potential
+ integration in mind. Therefore, e.g., units of measurement need to
+ be represented.</li>
+ <li>Integration becomes much more difficult if publishers use
+ different measures or dimensions.</li>
+
+ </ul>
+ <p>Unanticipated Uses (optional): -</p>
+ <p>Existing Work (optional): -</p>
+ </section> <section>
+ <h4>Uploading published statistical data in Google Public Data
+ Explorer (UC 8)</h4>
+ <p>Google Public Data Explorer (GPDE -
+ http://code.google.com/apis/publicdata/) provides an easy possibility
+ to visualize and explore statistical data. Data needs to be in the
+ Dataset Publishing Language (DSPL -
+ https://developers.google.com/public-data/overview) to be uploaded to
+ the data explorer. A DSPL dataset is a bundle that contains an XML
+ file, the schema, and a set of CSV files, the actual data. Google
+ provides a tutorial to create a DSPL dataset from your data, e.g., in
+ CSV. This requires a good understanding of XML, as well as a good
+ understanding of the data that shall be visualized and explored.</p>
+ <p>In this use case, it shall be demonstrated how to take any
+ published QB dataset and to transform it automatically into DSPL for
+ visualization and exploration. A dataset that is published conforming
+ to QB will provide the level of detail that is needed for such a
+ transformation.</p>
+ <p>In an example scenario, a publisher P has published data using
+ QB. There are two different ways to fulfil this use case: 1) A
+ customer C downloads this data into a triple store; SPARQL queries on
+ this data can be used to transform the data into DSPL, which is then
+ uploaded and visualized using GPDE. 2) One or more XSLT
+ transformations on the RDF/XML transform the data into DSPL.</p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>The technical challenges for the consumer here lie in knowing
+ where to download what data and how to get it transformed into DSPL
+ without knowing the data.</li>
+ </ul>
+ <p>Unanticipated Uses (optional): DSPL is representative for using
+ statistical data published on the web in available tools for
+ analysis. Similar tools that may be automatically covered are: Weka
+ (arff data format), Tableau, etc.</p>
+ <p>Existing Work (optional): -</p>
+ </section> <section>
+ <h4>Allow Online Analytical Processing on published datasets of
+ statistical data (UC 9)</h4>
+ <p>Online Analytical Processing [15] is an analysis method on
+ multidimensional data. It is an explorative analysis method that
+ allows users to interactively view the data from different angles
+ (rotate, select) or granularities (drill-down, roll-up), and filter it
+ for specific information (slice, dice).</p>
+ <p>The multidimensional model used in QB to model statistics should
+ be usable by OLAP systems. More specifically, data that conforms to QB
+ can be used to define a Data Cube within an OLAP engine and can then
+ be queried by OLAP clients.</p>
+ <p>An example scenario of this use case is the Financial
+ Information Observation System (FIOS) [16], where XBRL data has been
+ re-published using QB and made analysable for stakeholders in a
+ web-based OLAP client. The following figure shows an example of using
+ FIOS. Here, for three different companies, cost of goods sold as
+ disclosed in XBRL documents is analysed. As cell values, either the
+ number of disclosures or, if only one is available, the actual number
+ in USD is given:</p>
+
+ <p>
+ <div class="fig">
+ <a href="figures/FIOS_example.PNG"><img
+ src="figures/FIOS_example.PNG" alt="OLAP of QB data" /> </a>
+ <div>OLAP of QB data</div>
+ </div>
+ </p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>A problem lies in the strict separation between queries for
+ the structure of data, and queries for actual aggregated values.</li>
+ <li>Another problem lies in defining Data Cubes without greater
+ insight in the data beforehand.</li>
+ <li>Depending on the expressivity of the OLAP queries (e.g.,
+ aggregation functions, hierarchies, ordering), performance plays an
+ important role.</li>
+ <li>QB allows flexibility in describing statistics, e.g., in
+ order to reduce redundancy of information in single observations.
+ These alternatives make general consumption of QB data more complex.
+ Also, it is not clear what "conforming" to QB means, e.g., is a
+ qb:DataStructureDefinition required?</li>
+ </ul>
+ <p>Unanticipated Uses (optional): -</p>
+ <p>Existing Work (optional): -</p>
+ </section> <section>
+ <h4>Transforming published statistics into XBRL (UC 10)</h4>
+ <p>XBRL is a standard data format for disclosing financial
+ information. Typically, financial data is not managed within the
+ organization using XBRL; instead, internal formats such as Excel or
+ relational databases are used. If different data sources are to be
+ summarized in XBRL data formats to be published, an internally-used
+ standard format such as QB could help integrate and transform the data
+ into the appropriate format.</p>
+ <p>In this use case, it should be possible to automatically
+ transform data conforming to QB into the XBRL data format. This use
+ case is fulfilled if QB contains the necessary
+ information to derive XBRL data.</p>
+ <p>In an example scenario, DERI has had a use case to publish
+ sustainable IT information as XBRL to the Global Reporting Initiative
+ (GRI - https://www.globalreporting.org/). Here, raw data (number of
+ printouts per person) is collected, then aggregated on a unit level
+ and modelled using QB. QB data shall then be used directly to fill-in
+ XBRL documents that can be published to the GRI.</p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>So far, QB data has been transformed into semantic XBRL, a
+ vocabulary closer to XBRL. There is the chance that certain
+ information required in a GRI XBRL document cannot be encoded using a
+ vocabulary as general as QB. In this case, QB could be used in
+ concordance with semantic XBRL.</li>
+ </ul>
+ <p class="editorsnote">@@TODO: Add link to semantic XBRL.</p>
+ <p>Unanticipated Uses (optional): -</p>
+ <p>Existing Work (optional): -</p>
+
+ </section> </section></section>
+ <section>
+ <h2>Requirements</h2>
+
+ <p>The use cases presented in the previous section give rise to the
+ following requirements for a standard representation of statistics.
+ Requirements are cross-linked with the use cases that motivate them.
+ Requirements are similarly categorized as deriving from publishing or
+ consuming use cases.</p>
+
+ <section>
+ <h3>Publishing requirements</h3>
+
+ <section>
+ <h4>Machine-readable and application-independent representation of
+ statistics</h4>
+ <p>It should be possible to add abstraction, multiple levels of
+ description, and summaries of statistics.</p>
+
+ <p>Required by: UC1, UC2, UC3, UC4</p>
+ </section> <section>
+ <h4>Representing statistics from various sources</h4>
+ <p>It should be possible to translate statistics from various
+ source data into QB. QB should be very general and should be usable for
+ other data sets such as survey data, spreadsheets and OLAP data cubes.
+ What kinds of statistics are described: simple CSV tables (UC 1), Excel
+ (UC 2) and more complex SDMX (UC 3) data about government statistics
+ or other public-domain relevant data.</p>
+
+ <p>Required by: UC1, UC2, UC3</p>
+ </section> <section>
+ <h4>Communicating, exposing statistics on the web</h4>
+ <p>It should become clear how to make statistical data available on
+ the web, including how to expose it, and how to distribute it.</p>
+
+ <p>Required by: UC5</p>
+ </section> <section>
+ <h4>Coverage of typical statistics metadata</h4>
+ <p>It should be possible to add metainformation to statistics as
+ found in typical statistics or statistics catalogs.</p>
+
+ <p>Required by: UC1, UC2, UC3, UC4, UC5</p>
+ </section> <section>
+ <h4>Expressing hierarchies</h4>
+ <p>It should be possible to express hierarchies on Dimensions of
+ statistics. Some of this requirement is met by the work on ISO
+ Extension to SKOS [17].</p>
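+ <p>A minimal sketch of expressing such a hierarchy with SKOS (the
+ area URIs are illustrative):</p>
+ <pre>
+&lt;England&gt; skos:broader &lt;UK&gt; .
+&lt;Scotland&gt; skos:broader &lt;UK&gt; .
+&lt;Wales&gt; skos:broader &lt;UK&gt; .
+ </pre>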
+
+ <p>Required by: UC3, UC9</p>
+ </section> <section>
+ <h4>Expressing aggregation relationships in Data Cube</h4>
+ <p>Based on [18]: It often comes up in statistical data that you
+ have some kind of 'overall' figure, which is then broken down into
+ parts. Supposing I have a set of population observations, expressed
+ with the Data Cube vocabulary - something like (in pseudo-turtle):</p>
+ <pre>
+ex:obs1
+ sdmx:refArea &lt;UK&gt;;
+ sdmx:refPeriod "2011";
+ ex:population "60" .
+
+ex:obs2
+ sdmx:refArea &lt;England&gt;;
+ sdmx:refPeriod "2011";
+ ex:population "50" .
+
+ex:obs3
+ sdmx:refArea &lt;Scotland&gt;;
+ sdmx:refPeriod "2011";
+ ex:population "5" .
+
+ex:obs4
+ sdmx:refArea &lt;Wales&gt;;
+ sdmx:refPeriod "2011";
+ ex:population "3" .
+
+ex:obs5
+ sdmx:refArea &lt;NorthernIreland&gt;;
+ sdmx:refPeriod "2011";
+ ex:population "2" .
+ </pre>
+ <p>What is the best way (in the context of the RDF/Data Cube/SDMX
+ approach) to express that the values for England/Scotland/Wales/
+ Northern Ireland ought to add up to the value for the UK and
+ constitute a more detailed breakdown of the overall UK figure? I might
+ also have population figures for France, Germany, EU27, etc...so it's
+ not as simple as just taking a qb:Slice where you fix the time period
+ and the measure.</p>
+ <p>Some of this requirement is met by the work on ISO Extension to
+ SKOS [19].</p>
+
+
+ <p>Required by: UC1, UC2, UC3, UC9</p>
+ </section> <section>
+ <h4>Scale - how to publish large amounts of statistical data</h4>
+			<p>Publishers that are constrained by the size of the
+			statistics they publish shall have ways to reduce the size or
+			to remove redundant information. Scalability issues can arise
+			both from the effort required of people and from the
+			performance of applications.</p>
+
+ <p>Required by: UC1, UC2, UC3, UC4</p>
+ </section> <section>
+ <h4>Compliance-levels or criteria for well-formedness</h4>
+			<p>The formal RDF Data Cube vocabulary expresses few formal
+			semantic constraints. Furthermore, in RDF the omission of
+			otherwise-expected properties on resources does not lead to any
+			formal inconsistencies. However, to build reliable software to
+			process Data Cubes, data consumers need to know what
+			assumptions they can make about a dataset purporting to be a
+			Data Cube.</p>
+			<p>What <em>well-formedness</em> criteria should Data Cube
+			publishers conform to? Specific areas which may need explicit
+			clarification in the well-formedness criteria include (but may
+			not be limited to):</p>
+ <ul>
+ <li>use of abbreviated data layout based on attachment levels</li>
+				<li>use of qb:Slice (completeness; is an explicit
+					qb:SliceKey required?)</li>
+ <li>avoiding mixing two approaches to handling multiple-measures
+ </li>
+ <li>optional triples (e.g. type triples)</li>
+ </ul>
+
+ <p>Required by all use cases.</p>
+ </section> <section>
+ <h4>Declaring relations between Cubes</h4>
+ <p>In some situations statistical data sets are used to derive
+ further datasets. Should Data Cube be able to explicitly convey these
+ relationships?</p>
+			<p>A simple specific use case is that the Welsh Assembly
+			Government publishes a variety of population datasets broken
+			down in different ways. For many uses, population broken down
+			by some category (e.g. ethnicity) is expressed as a percentage.
+			Separate datasets give the actual counts per category and the
+			aggregate counts. In such cases it is common to talk about the
+			denominator (often DENOM), which is the aggregate count against
+			which the percentages can be interpreted.</p>
+			<p>Should Data Cube support explicit declaration of such
+			relationships, either between separate qb:DataSets or between
+			measures within a single qb:DataSet (e.g. ex:populationCount
+			and ex:populationPercent)?</p>
+			<p>If so, should that be scoped to simple, common relationships
+			like DENOM, or allow expression of arbitrary mathematical
+			relations?</p>
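+			<p>For illustration only, a declaration between two datasets
+			might be sketched as follows (ex:denominatorDataset is a
+			hypothetical property, not part of the vocabulary):</p>
+			<pre>
+ex:ethnicityPercentages a qb:DataSet ;
+  ex:denominatorDataset ex:populationCounts .
+			</pre>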
+			<p>Note that there has been some work towards this within the
+			SDMX community, as indicated at <a
+				href="http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d">http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d</a>.</p>
+
+ <p>Required by: UC6</p>
+ </section> </section> <section>
+ <h3>Consumption requirements</h3>
+
+ <section>
+ <h4>Finding statistical data</h4>
+			<p>Finding statistical data should be possible, perhaps through
+			an authoritative service.</p>
+
+ <p>Required by: UC5</p>
+ </section> <section>
+			<h4>Retrieval of fine-grained statistics</h4>
+			<p>Query formulation and execution mechanisms: it should be
+			possible to use SPARQL to query for fine-grained statistics.</p>
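+			<p>As a sketch only, reusing the pseudo-Turtle terms from the
+			population example above (sdmx:refArea, sdmx:refPeriod and
+			ex:population are illustrative), such a query could look
+			like:</p>
+			<pre>
+PREFIX qb: &lt;http://purl.org/linked-data/cube#&gt;
+SELECT ?obs ?period ?population
+WHERE {
+  ?obs a qb:Observation ;
+       sdmx:refArea &lt;UK&gt; ;
+       sdmx:refPeriod ?period ;
+       ex:population ?population .
+}
+			</pre>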
+
+ <p>Required by: UC1, UC2, UC3, UC4, UC5, UC6, UC7</p>
+ </section> <section>
+ <h4>Understanding - End user consumption of statistical data</h4>
+			<p>Must allow presentation and visualization of statistical
+			data.</p>
+
+ <p>Required by: UC7, UC8, UC9, UC10</p>
+ </section> <section>
+ <h4>Comparing and trusting statistics</h4>
+			<p>Must allow finding what the statistics of two or more
+			datasets have in common. This requirement also deals with
+			information quality (assessing statistical datasets) and trust
+			(making trust judgements on statistical data).</p>
+
+ <p>Required by: UC5, UC6, UC9</p>
+ </section> <section>
+ <h4>Integration of statistics</h4>
+			<p>Interoperability - combining statistics produced by multiple
+			different systems. It should be possible to combine two
+			statistical datasets that contain related data and that were
+			possibly published independently. It should also be possible to
+			implement value conversions.</p>
+
+ <p>Required by: UC1, UC3, UC4, UC7, UC9, UC10</p>
+ </section> <section>
+ <h4>Scale - how to consume large amounts of statistical data</h4>
+ <p>Consumers that want to access large amounts of statistical data
+ need guidance.</p>
+
+ <p>Required by: UC7, UC9</p>
+ </section> <section>
+ <h4>Common internal representation of statistics, to be exported
+ in other formats</h4>
+			<p>It should be possible to transform QB data into data formats
+			such as XBRL which are required by certain institutions.</p>
+
+ <p>Required by: UC10</p>
+ </section> <section>
+ <h4>Dealing with imperfect statistics</h4>
+ <p>Imperfections - reasoning about statistical data that is not
+ complete or correct.</p>
+
+ <p>Required by: UC7, UC8, UC9, UC10</p>
+ </section> </section> </section>
+ <section class="appendix">
+ <h2>Acknowledgments</h2>
+ <p>The editors are very thankful for comments and suggestions ...</p>
+ </section>
+
+ <h2 id="references">References</h2>
+
+ <dl>
+		<dt id="ref-SDMX">[SDMX]</dt>
+		<dd>
+			SDMX User Guide 2009, <a
+				href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>
+		</dd>
+
+		<dt id="ref-Fowler1997">[Fowler1997]</dt>
+		<dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object
+			Models. Addison-Wesley. ISBN 0201895420.</dd>
+
+ <dt id="ref-QB">[QB]</dt>
+ <dd>
+ RDF Data Cube vocabulary, <a
+ href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html</a>
+ </dd>
+
+ <dt id="ref-OLAP">[OLAP]</dt>
+ <dd>
+ Online Analytical Processing Data Cubes, <a
+ href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
+ </dd>
+
+ <dt id="ref-linked-data">[LOD]</dt>
+ <dd>
+ Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+ </dd>
+
+ <dt id="ref-rdf">[RDF]</dt>
+ <dd>
+ Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>
+ </dd>
+
+ <dt id="ref-scovo">[SCOVO]</dt>
+ <dd>
+ The Statistical Core Vocabulary, <a
+ href="http://sw.joanneum.at/scovo/schema.html">http://sw.joanneum.at/scovo/schema.html</a>
+ <br /> SCOVO: Using Statistics on the Web of data, <a
+ href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>
+ </dd>
+
+ <dt id="ref-skos">[SKOS]</dt>
+ <dd>
+ Simple Knowledge Organization System, <a
+ href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>
+ </dd>
+
+ <dt id="ref-cog">[COG]</dt>
+ <dd>
+ SDMX Content Oriented Guidelines, <a
+ href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
+ </dd>
+
+ </dl>
+</body>
+</html>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20120222/local-style.css Thu Feb 28 10:06:00 2013 -0500
@@ -0,0 +1,167 @@
+
+.ldhcode {
+margin: 0px;
+padding: 10px;
+background: #ffffee;
+border: 1px solid #ffff88;
+}
+.turtlecode {
+margin: 0px;
+padding: 10px;
+background: #eeffee;
+border: 1px solid #88dd88;
+}
+.fig {
+text-align: center;
+}
+.fig img {
+border-bottom: 1px solid #bebebe;
+padding: 20px;
+margin-top: 20px;
+}
+.fig div {
+padding: 5px;
+}
+.fig div span {
+font-weight: bold;
+}
+.xsec h3 {
+font-size: 16px;
+text-align: left;
+margin-bottom: 5px;
+font-weight: bold;
+color: black;
+}
+.bc {
+text-align: left;
+border: 1px solid #e0e0e0;
+background: #ffffff url("http://upload.wikimedia.org/wikipedia/commons/d/db/Crystal_Clear_mimetype_vcard.png") no-repeat right -16px;
+padding: 20px 50px 20px 10px;
+margin: 0px;
+margin-top: 0px;
+}
+
+.todo {
+border: 3px solid #ff0;
+margin: 0 0 0 20px;
+padding: 10px;
+}
+
+.issue {
+border: 3px solid #f30;
+margin: 0 0 0 20px;
+padding: 10px;
+}
+
+.responsible {
+border: 3px solid #6a6;
+margin: 0 0 0 20px;
+padding: 10px;
+}
+
+
+ol.prereq li {
+padding-bottom: 10px;
+}
+ul.checklist-toc {
+margin-left: 20px;
+width: 650px;
+}
+ul.checklist-toc li {
+margin: 5px;
+padding: 10px;
+border: 1px solid #8f8f8f;
+list-style: none;
+}
+ul.inline-opt {
+margin-left: 20px;
+}
+ul.inline-opt li {
+margin: 5px;
+padding: 10px;
+}
+dl.decl dd {
+padding-bottom: 1em;
+}
+dl.refs {
+margin: 10px;
+padding: 10px;
+}
+dl.refs dt {
+padding-bottom: 5px;
+}
+dl.refs dd {
+padding-bottom: 10px;
+margin-left: 15px;
+}
+dl.decl {
+border: 1px dashed black;
+padding: 10px;
+margin-left: 100px;
+margin-right: 100px;
+}
+dl.decl dt {
+padding-bottom: 5px;
+}
+dl.decl dd {
+padding-bottom: 10px;
+}
+dl tt {
+font-size: 110%;
+}
+table.example {
+border: 0px solid #9e9e9e;
+border-bottom: 0px;
+width: 100%;
+padding: 0px;
+margin-top: 20px;
+}
+table.example th {
+border-bottom: 1px solid #bebebe;
+border-top: 0px solid #bebebe;
+}
+table.example td {
+vertical-align: top;
+padding: 10px;
+padding-top: 10px;
+}
+table.example caption {
+border-top: 1px solid #bebebe;
+padding: 5px;
+caption-side: bottom;
+margin-bottom: 30px;
+}
+table.example caption span {
+font-weight: bold;
+}
+table.xtab {
+width: 100%;
+padding: 2px;
+background: #d0d0d0;
+}
+table.xtab th {
+border: 0px;
+border-bottom: 1px solid #fefefe;
+text-align: left;
+padding: 2px;
+padding-bottom: 1px;
+}
+
+.diff { font-weight:bold; color:#0a3; }
+
+.editorsnote::before {
+ content: "Editor's Note";
+ display: block;
+ width: 150px;
+ background: #ff0;
+ color: #fff;
+ margin: -1.5em 0 0.5em 0;
+ font-weight: bold;
+ border: 1px solid #ff0;
+ padding: 3px 1em;
+}
+.editorsnote {
+ margin: 1em 0em 1em 1em;
+ padding: 1em;
+ border: 2px solid #ff0;
+}
\ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20120222/respec-config.js Thu Feb 28 10:06:00 2013 -0500
@@ -0,0 +1,96 @@
+var respecConfig = {
+ // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
+ specStatus: "ED",
+ //copyrightStart: "2010",
+
+ // the specification's short name, as in http://www.w3.org/TR/short-name/
+ shortName: "data-cube-ucr",
+ //subtitle: "",
+ // if you wish the publication date to be other than today, set this
+ publishDate: "2012-02-22",
+
+ // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
+ // and its maturity status
+ //previousPublishDate: "2012-02-22",
+ //previousMaturity: "ED",
+ //previousDiffURI: "http://dvcs.w3.org/hg/gld/bp/",
+ //diffTool: "http://www.aptest.com/standards/htmldiff/htmldiff.pl",
+
+    // if there is a publicly available Editor's Draft, this is the link
+ //edDraftURI: "http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html",
+
+ // if this is a LCWD, uncomment and set the end of its review period
+ // lcEnd: "2009-08-05",
+
+ // if you want to have extra CSS, append them to this list
+ // it is recommended that the respec.css stylesheet be kept
+ extraCSS: [
+ "http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css"
+ ],
+
+ // editors, add as many as you like
+ // only "name" is required
+ editors: [
+ { name: "Benedikt Kämpgen", url: "http://www.aifb.kit.edu/web/Benedikt_K%C3%A4mpgen/en", company: "FZI Karlsruhe", companyURL: "http://www.fzi.de/index.php/en" },
+ { name: "Richard Cyganiak", url: "http://richard.cyganiak.de/", company: "DERI, NUI Galway", companyURL: "http://www.deri.ie/" },
+ ],
+
+ // authors, add as many as you like.
+ // This is optional, uncomment if you have authors as well as editors.
+ // only "name" is required. Same format as editors.
+
+ //authors: [],
+
+ // name of the WG
+ wg: "Government Linked Data Working Group",
+
+ // URI of the public WG page
+ wgURI: "http://www.w3.org/2011/gld/",
+
+ // name of the public mailing to which comments are due
+ wgPublicList: "public-gld-comments",
+
+ // URI of the patent status for this WG, for Rec-track documents
+ // !!!! IMPORTANT !!!!
+ // This is important for Rec-track documents, do not copy a patent URI from a random
+ // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
+ // Team Contact.
+ wgPatentURI: "",
+ maxTocLevel: 3,
+ preProcess: [ preProc ]
+ //alternateFormats: [ {uri: "diff-20110507.html", label: "diff to previous version"} ],
+};
+
+function updateExample(doc, content) {
+ // perform transformations to make it render and prettier
+ content = content.replace(/<!--/, '');
+ content = content.replace(/-->/, '');
+ content = doc._esc(content);
+ content = content.replace(/\*\*\*\*([^*]*)\*\*\*\*/g, '<span class="diff">$1</span>') ;
+ return content ;
+}
+
+function updateDTD(doc, content) {
+ // perform transformations to
+ // make it render and prettier
+ content = '<pre class="dtd">' + doc._esc(content) + '</pre>';
+ content = content.replace(/!ENTITY % ([^ \t\r\n]*)/g, '!ENTITY <span class="entity">% $1</span>');
+ content = content.replace(/!ELEMENT ([^ \t$]*)/mg, '!ELEMENT <span class="element">$1</span>');
+ return content;
+}
+
+function updateSchema(doc, content) {
+ // perform transformations to
+ // make it render and prettier
+ content = '<pre class="dtd">' + doc._esc(content) + '</pre>';
+ content = content.replace(/<xs:element\s+name="([^&]*)"/g, '<xs:element name="<span class="element" id="schema_element_$1">$1</span>"') ;
+ return content;
+}
+
+function updateTTL(doc, content) {
+ // perform transformations to
+ // make it render and prettier
+ content = '<pre class="sh_sourceCode">' + doc._esc(content) + '</pre>';
+ content = content.replace(/@prefix/g, '<span class="sh_keyword">@prefix</span>');
+ return content;
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20120222/respec-ref.js Thu Feb 28 10:06:00 2013 -0500
@@ -0,0 +1,127 @@
+var preProc = {
+ apply: function(c) {
+ // extend the bibliography entries
+ berjon.biblio["MICRODATA"] = "<cite><a href=\"http://www.w3.org/TR/microdata/\">Microdata</a></cite> Ian Hickson; et al. 04 March 2010. W3C Working Draft. URL: http://www.w3.org/TR/microdata/ ";
+ berjon.biblio["XHTML-RDFA"] = "<cite><a href=\"http://www.w3.org/TR/xhtml-rdfa/\">XHTML+RDFa</a></cite> Manu Sporny; et al. 31 March 2011. W3C Working Draft. URL: http://www.w3.org/TR/xhtml-rdfa/ ";
+ berjon.biblio["HTML-RDFA"] = "<cite><a href=\"http://dev.w3.org/html5/rdfa/\">HTML+RDFa</a></cite> Manu Sporny; et al. 24 May 2011. W3C Working Draft. URL: http://dev.w3.org/html5/rdfa/ ";
+ berjon.biblio["HOWTO-LODP"] = "<cite><a href=\"http://linkeddata.org/docs/how-to-publish\">How to Publish Linked Data on the Web</a></cite>, C. Bizer, R. Cyganiak, and Tom Heath, Community Tutorial 17 July 2008. URL: http://linkeddata.org/docs/how-to-publish";
+ berjon.biblio["COOL-SWURIS"] = "<cite><a href=\"http://www.w3.org/TR/cooluris/\">Cool URIs for the Semantic Web</a></cite>, L. Sauermann and R. Cyganiak, W3C Interest Group Note 03 December 2008. URL: http://www.w3.org/TR/cooluris/";
+ berjon.biblio["VOID-GUIDE"] = "<cite><a href=\"http://www.w3.org/TR/void/\">Describing Linked Datasets with the VoID Vocabulary</a></cite>, K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao, W3C Interest Group Note 03 March 2011. URL: http://www.w3.org/TR/void/";
+ berjon.biblio["RDFA-CORE-PROFILE"] = "<cite><a href=\"http://www.w3.org/profile/rdfa-1.1\">RDFa Core Default Profile</a></cite>, I. Herman, W3C RDF Web Applications Working Group 02 June 2011. URL: http://www.w3.org/profile/rdfa-1.1";
+ berjon.biblio["XHTML-RDFA-PROFILE"] = "<cite><a href=\"http://www.w3.org/profile/html-rdfa-1.1\">HTML+RDFa Core Default Profile</a></cite>, I. Herman, W3C RDF Web Applications Working Group 24 May 2011. URL: http://www.w3.org/profile/html-rdfa-1.1";
+ berjon.biblio["RFC2616"] = "<cite><a href=\"http://www.w3.org/Protocols/rfc2616/rfc2616.html\">Hypertext Transfer Protocol -- HTTP/1.1</a></cite>, R. Fielding; et al. June 1999. Internet RFC 2616. URL: http://www.w3.org/Protocols/rfc2616/rfc2616.html."
+
+ // process the document before anything else is done
+ var refs = document.querySelectorAll('adef') ;
+ for (var i = 0; i < refs.length; i++) {
+ var item = refs[i];
+ var p = item.parentNode ;
+ var con = item.innerHTML ;
+ var sp = document.createElement( 'dfn' ) ;
+ var tit = item.getAttribute('title') ;
+ if (!tit) {
+ tit = con;
+ }
+ sp.className = 'adef' ;
+ sp.title=tit ;
+ sp.innerHTML = con ;
+ p.replaceChild(sp, item) ;
+ }
+ refs = document.querySelectorAll('aref') ;
+ for (var i = 0; i < refs.length; i++) {
+ var item = refs[i];
+ var p = item.parentNode ;
+ var con = item.innerHTML ;
+ var sp = document.createElement( 'a' ) ;
+ sp.className = 'aref' ;
+ sp.setAttribute('title', con);
+ sp.innerHTML = '@'+con ;
+ p.replaceChild(sp, item) ;
+ }
+ // local datatype references
+ refs = document.querySelectorAll('ldtref') ;
+ for (var i = 0; i < refs.length; i++) {
+ var item = refs[i];
+ if (!item) continue ;
+ var p = item.parentNode ;
+ var con = item.innerHTML ;
+ var ref = item.getAttribute('title') ;
+ if (!ref) {
+ ref = item.textContent ;
+ }
+ if (ref) {
+ ref = ref.replace(/\n/g, '_') ;
+ ref = ref.replace(/\s+/g, '_') ;
+ }
+ var sp = document.createElement( 'a' ) ;
+ sp.className = 'datatype';
+ sp.title = ref ;
+ sp.innerHTML = con ;
+ p.replaceChild(sp, item) ;
+ }
+ // external datatype references
+ refs = document.querySelectorAll('dtref') ;
+ for (var i = 0; i < refs.length; i++) {
+ var item = refs[i];
+ if (!item) continue ;
+ var p = item.parentNode ;
+ var con = item.innerHTML ;
+ var ref = item.getAttribute('title') ;
+ if (!ref) {
+ ref = item.textContent ;
+ }
+ if (ref) {
+ ref = ref.replace(/\n/g, '_') ;
+ ref = ref.replace(/\s+/g, '_') ;
+ }
+ var sp = document.createElement( 'a' ) ;
+ sp.className = 'externalDFN';
+ sp.title = ref ;
+ sp.innerHTML = con ;
+ p.replaceChild(sp, item) ;
+ }
+ // now do terms
+ refs = document.querySelectorAll('tdef') ;
+ for (var i = 0; i < refs.length; i++) {
+ var item = refs[i];
+ if (!item) continue ;
+ var p = item.parentNode ;
+ var con = item.innerHTML ;
+ var ref = item.getAttribute('title') ;
+ if (!ref) {
+ ref = item.textContent ;
+ }
+ if (ref) {
+ ref = ref.replace(/\n/g, '_') ;
+ ref = ref.replace(/\s+/g, '_') ;
+ }
+ var sp = document.createElement( 'dfn' ) ;
+ sp.title = ref ;
+ sp.innerHTML = con ;
+ p.replaceChild(sp, item) ;
+ }
+ // now term references
+ refs = document.querySelectorAll('tref') ;
+ for (var i = 0; i < refs.length; i++) {
+ var item = refs[i];
+ if (!item) continue ;
+ var p = item.parentNode ;
+ var con = item.innerHTML ;
+ var ref = item.getAttribute('title') ;
+ if (!ref) {
+ ref = item.textContent ;
+ }
+ if (ref) {
+ ref = ref.replace(/\n/g, '_') ;
+ ref = ref.replace(/\s+/g, '_') ;
+ }
+
+ var sp = document.createElement( 'a' ) ;
+ sp.className = 'tref' ;
+ sp.title = ref ;
+ sp.innerHTML = con ;
+ p.replaceChild(sp, item) ;
+ }
+ }
+ } ;
\ No newline at end of file
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/Eurostat_GPDE_Example.png has changed
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/FIOS_example.PNG has changed
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/Level_above_msl_3_locations.png has changed
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/Relationships_Statistical_Data_Cogs_Example.png has changed
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/SDMX_Web_Dissemination_Use_Case.png has changed
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/modeling_quantity_measurement_observation.png has changed
Binary file data-cube-ucr/data-cube-ucr-20130227/figures/pivot_analysis_measurements.PNG has changed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20130227/index.html Thu Feb 28 10:06:00 2013 -0500
@@ -0,0 +1,709 @@
+<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'
+ 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
+<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
+<head>
+<meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<title>Use Cases and Requirements for the Data Cube Vocabulary</title>
+
+<script src="respec-ref.js" type="text/javascript">
+</script>
+<script src="respec-config.js" type="text/javascript">
+</script>
+<link rel="stylesheet" type="text/css" href="local-style.css" />
+<style type="text/css">
+/*<![CDATA[*/
+/*****************************************************************
+ * ReSpec 3 CSS
+ * Robin Berjon - http://berjon.com/
+ *****************************************************************/
+
+/* --- INLINES --- */
+em.rfc2119 {
+ text-transform: lowercase;
+ font-variant: small-caps;
+ font-style: normal;
+ color: #900;
+}
+
+h1 acronym, h2 acronym, h3 acronym, h4 acronym, h5 acronym, h6 acronym, a acronym,
+h1 abbr, h2 abbr, h3 abbr, h4 abbr, h5 abbr, h6 abbr, a abbr {
+ border: none;
+}
+
+dfn {
+ font-weight: bold;
+}
+
+a.internalDFN {
+ color: inherit;
+ border-bottom: 1px solid #99c;
+ text-decoration: none;
+}
+
+a.externalDFN {
+ color: inherit;
+ border-bottom: 1px dotted #ccc;
+ text-decoration: none;
+}
+
+a.bibref {
+ text-decoration: none;
+}
+
+cite .bibref {
+ font-style: normal;
+}
+
+code {
+ color: #ff4500;
+}
+
+
+/* --- --- */
+ol.algorithm { counter-reset:numsection; list-style-type: none; }
+ol.algorithm li { margin: 0.5em 0; }
+ol.algorithm li:before { font-weight: bold; counter-increment: numsection; content: counters(numsection, ".") ") "; }
+
+/* --- TOC --- */
+.toc a, .tof a {
+ text-decoration: none;
+}
+
+a .secno, a .figno {
+ color: #000;
+}
+
+ul.tof, ol.tof {
+ list-style: none outside none;
+}
+
+.caption {
+ margin-top: 0.5em;
+ font-style: italic;
+}
+
+/* --- TABLE --- */
+table.simple {
+ border-spacing: 0;
+ border-collapse: collapse;
+ border-bottom: 3px solid #005a9c;
+}
+
+.simple th {
+ background: #005a9c;
+ color: #fff;
+ padding: 3px 5px;
+ text-align: left;
+}
+
+.simple th[scope="row"] {
+ background: inherit;
+ color: inherit;
+ border-top: 1px solid #ddd;
+}
+
+.simple td {
+ padding: 3px 10px;
+ border-top: 1px solid #ddd;
+}
+
+.simple tr:nth-child(even) {
+ background: #f0f6ff;
+}
+
+/* --- DL --- */
+.section dd > p:first-child {
+ margin-top: 0;
+}
+
+.section dd > p:last-child {
+ margin-bottom: 0;
+}
+
+.section dd {
+ margin-bottom: 1em;
+}
+
+.section dl.attrs dd, .section dl.eldef dd {
+ margin-bottom: 0;
+}
+/*]]>*/
+</style>
+<link rel="stylesheet" href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE" type="text/css" /><!--[if lt IE 9]><script src='http://www.w3.org/2008/site/js/html5shiv.js'></script><![endif]-->
+<style type="text/css">
+/*<![CDATA[*/
+ li.c3 {list-style: none; display: inline}
+ span.c2 {font-size: 10pt}
+ p.c1 {text-align: center}
+/*]]>*/
+</style>
+</head>
+<body>
+<div class="head">
+<p><a href="http://www.w3.org/"><img width="72" height="48" src="http://www.w3.org/Icons/w3c_home" alt="W3C" /></a></p>
+<h1 class="title" id="title">Use Cases and Requirements for the Data Cube Vocabulary</h1>
+<h2 id="w3c-working-group-note-27-february-2013"><abbr title="World Wide Web Consortium">W3C</abbr> Working Group Note 27 February 2013</h2>
+<dl>
+<dt>This version:</dt>
+<dd><a href="http://www.w3.org/TR/2013/NOTE-data-cube-ucr-20130227/">http://www.w3.org/TR/2013/NOTE-data-cube-ucr-20130227/</a></dd>
+<dt>Latest published version:</dt>
+<dd><a href="http://www.w3.org/TR/data-cube-ucr/">http://www.w3.org/TR/data-cube-ucr/</a></dd>
+<dt>Latest editor's draft:</dt>
+<dd><a href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html</a></dd>
+<dt>Previous version:</dt>
+<dd><a href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html</a></dd>
+<dt>Editors:</dt>
+<dd><a href="http://www.aifb.kit.edu/web/Benedikt_K%C3%A4mpgen/en">Benedikt Kämpgen</a>, <a href="http://www.fzi.de/index.php/en">FZI Karlsruhe</a></dd>
+<dd><a href="http://richard.cyganiak.de/">Richard Cyganiak</a>, <a href="http://www.deri.ie/">DERI, NUI Galway</a></dd>
+</dl>
+<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2013 <a href="http://www.w3.org/"><abbr title="World Wide Web Consortium">W3C</abbr></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><abbr title="Massachusetts Institute of Technology">MIT</abbr></a>, <a href="http://www.ercim.eu/"><abbr title="European Research Consortium for Informatics and Mathematics">ERCIM</abbr></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. <abbr title="World Wide Web Consortium">W3C</abbr> <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
+<hr /></div>
+<h2>Abstract</h2>
+<p>Many national, regional and local governments, as well as other organisations inside and outside of the public sector, collect numeric data and aggregate this data into statistics. There is a need to publish these statistics in a standardised, machine-readable way on the web, so that they can be freely integrated and reused in consuming applications.</p>
+<p>In this document, the <a href="http://www.w3.org/2011/gld/"><abbr title="World Wide Web Consortium">W3C</abbr> Government Linked Data Working Group</a> presents use cases and requirements supporting a recommendation of the RDF Data Cube Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. The group obtained use cases from existing deployments of and experiences with an earlier version of the data cube vocabulary [<cite><a href="#ref-QB-2010">QB-2010</a></cite>]. The group also describes a set of requirements derived from the use cases and to be considered in the recommendation.</p>
+<h2>Status of This Document</h2>
+<p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current <abbr title="World Wide Web Consortium">W3C</abbr> publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/"><abbr title="World Wide Web Consortium">W3C</abbr> technical reports index</a> at http://www.w3.org/TR/.</em></p>
+<p>This document is an editorial update to an Editor's Draft of the "Use Cases and Requirements for the Data Cube Vocabulary" developed by the <a href="http://www.w3.org/2011/gld/"><abbr title="World Wide Web Consortium">W3C</abbr> Government Linked Data Working Group</a>.</p>
+<p>This document was published by the <a href="http://www.w3.org/2011/gld/">Government Linked Data Working Group</a> as a Working Group Note. If you wish to make comments regarding this document, please send them to <a href="mailto:public-gld-comments@w3.org">public-gld-comments@w3.org</a> (<a href="mailto:public-gld-comments-request@w3.org?subject=subscribe">subscribe</a>, <a href="http://lists.w3.org/Archives/Public/public-gld-comments/">archives</a>). All comments are welcome.</p>
+<p>Publication as a Working Group Note does not imply endorsement by the <abbr title="World Wide Web Consortium">W3C</abbr> Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>
+<p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 <abbr title="World Wide Web Consortium">W3C</abbr> Patent Policy</a>. <abbr title="World Wide Web Consortium">W3C</abbr> maintains a <a href="" rel="disclosure">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the <abbr title="World Wide Web Consortium">W3C</abbr> Patent Policy</a>.</p>
+<h2 id="introductory" class="introductory">Table of Contents</h2>
+<ul class="toc">
+<li class="tocline"><a href="#introduction" class="tocxref"><span class="secno">1.</span> Introduction</a>
+<ul class="toc">
+<li class="tocline"><a href="#describingstatistics" class="tocxref"><span class="secno">1.1</span> Describing statistics</a></li>
+</ul>
+</li>
+<li class="tocline"><a href="#terminology" class="tocxref"><span class="secno">2.</span> Terminology</a></li>
+<li class="tocline"><a href="#usecases" class="tocxref"><span class="secno">3.</span> Use cases</a>
+<ul class="toc">
+<li class="tocline"><a href="#SDMXWebDisseminationUseCase" class="tocxref"><span class="secno">3.1</span> SDMX Web Dissemination Use Case</a></li>
+<li class="tocline"><a href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem" class="tocxref"><span class="secno">3.2</span> Publisher Use Case: UK government financial data from Combined Online Information System (COINS)</a></li>
+<li class="tocline"><a href="#PublishingExcelSpreadsheetsasLinkedData" class="tocxref"><span class="secno">3.3</span> Publisher Use Case: Publishing Excel Spreadsheets as Linked Data</a></li>
+<li class="tocline"><a href="#PublishinghierarchicallystructureddatafromStatsWalesandOpenDataCommunities" class="tocxref"><span class="secno">3.4</span> Publisher Use Case: Publishing hierarchically structured data from StatsWales and Open Data Communities</a></li>
+<li class="tocline"><a href="#PublishingslicesofdataaboutUKBathingWaterQuality" class="tocxref"><span class="secno">3.5</span> Publisher Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+<li class="tocline"><a href="#EurostatSDMXasLinkedData" class="tocxref"><span class="secno">3.6</span> Publisher Use Case: Eurostat SDMX as Linked Data</a></li>
+<li class="tocline"><a href="#Representingrelationshipsbetweenstatisticaldata" class="tocxref"><span class="secno">3.7</span> Publisher Use Case: Representing relationships between statistical data</a></li>
+<li class="tocline"><a href="#Simplechartvisualisationsofpublishedstatisticaldata" class="tocxref"><span class="secno">3.8</span> Consumer Use Case: Simple chart visualisations of (integrated) published statistical data</a></li>
+<li class="tocline"><a href="#VisualisingpublishedstatisticaldatainGooglePublicDataExplorer" class="tocxref"><span class="secno">3.9</span> Consumer Use Case: Visualising published statistical data in Google Public Data Explorer</a></li>
+<li class="tocline"><a href="#AnalysingpublishedstatisticaldatawithcommonOLAPsystems" class="tocxref"><span class="secno">3.10</span> Consumer Use Case: Analysing published statistical data with common OLAP systems</a></li>
+<li class="tocline"><a href="#Registeringpublishedstatisticaldataindatacatalogs" class="tocxref"><span class="secno">3.11</span> Registry Use Case: Registering published statistical data in data catalogs</a></li>
+</ul>
+</li>
+<li class="tocline"><a href="#requirements" class="tocxref"><span class="secno">4.</span> Requirements</a>
+<ul class="toc">
+<li class="tocline"><a href="#VocabularyshouldbuildupontheSDMXinformationmodel" class="tocxref"><span class="secno">4.1</span> Vocabulary should build upon the SDMX information model</a></li>
+<li class="tocline"><a href="#Vocabularyshouldclarifytheuseofsubsetsofobservations" class="tocxref"><span class="secno">4.2</span> Vocabulary should clarify the use of subsets of observations</a></li>
+<li class="tocline"><a href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists" class="tocxref"><span class="secno">4.3</span> Vocabulary should recommend a mechanism to support hierarchical code lists</a></li>
+<li class="tocline"><a href="#VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements" class="tocxref"><span class="secno">4.4</span> Vocabulary should define relationship to ISO19156 - Observations &amp; Measurements</a></li>
+<li class="tocline"><a href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions" class="tocxref"><span class="secno">4.5</span> There should be a recommended mechanism to allow for publication of aggregates which cross multiple dimensions</a></li>
+<li class="tocline"><a href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes" class="tocxref"><span class="secno">4.6</span> There should be a recommended way of declaring relations between cubes</a></li>
+<li class="tocline"><a href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata" class="tocxref"><span class="secno">4.7</span> There should be criteria for well-formedness and assumptions consumers can make about published data</a></li>
+<li class="tocline"><a href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata" class="tocxref"><span class="secno">4.8</span> There should be mechanisms and recommendations regarding publication and consumption of large amounts of statistical data</a></li>
+<li class="tocline"><a href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata" class="tocxref"><span class="secno">4.9</span> There should be a recommended way to communicate the availability of published statistical data to external parties and to allow automatic discovery of statistical data</a></li>
+</ul>
+</li>
+<li class="tocline"><a href="#acknowledgements" class="tocxref"><span class="secno">A.</span> Acknowledgements</a></li>
+</ul>
+<!--OddPage-->
+<h2 id="introduction"><span class="secno">1.</span> Introduction</h2>
+The aim of this document is to present concrete use cases and requirements for a vocabulary to publish statistics as Linked Data. An earlier version of the data cube vocabulary [<cite><a href="#ref-QB-2010">QB-2010</a></cite>] has existed for some time and has proven applicable in <a href="http://wiki.planet-data.eu/web/Datasets">several deployments</a>. The <a href="http://www.w3.org/2011/gld/"><abbr title="World Wide Web Consortium">W3C</abbr> Government Linked Data Working Group</a> intends to advance the data cube vocabulary to a <abbr title="World Wide Web Consortium">W3C</abbr> recommendation, the RDF Data Cube Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. This document describes use cases and requirements derived from existing data cube deployments in order to document and illustrate the design decisions that have driven the work.
+<p>The rest of this document is structured as follows. We will first give a short introduction of the specificities of modelling statistics. Then, we will describe use cases that have been derived from existing deployments or feedback to the earlier data cube vocabulary version. In particular, we describe possible benefits and challenges of use cases. Afterwards, we will describe concrete requirements that were derived from those use cases and that have been taken into account for the specification.</p>
+<p>We use the name data cube vocabulary throughout the document when referring to the vocabulary.</p>
+<h3 id="describingstatistics"><span class="secno">1.1</span> Describing statistics</h3>
+<p>In the following, we describe the challenges of designing an RDF vocabulary for publishing statistics as Linked Data.</p>
+<p>Describing statistics - collected and aggregated numeric data - is challenging for the following reasons:</p>
+<ul>
+<li>Representing statistics requires more complex modeling, as discussed by Martin Fowler [<cite><a href="#ref-FOWLER97">FOWLER97</a></cite>]: recording a statistic simply as an attribute of an object (e.g., the fact that a person weighs 185 pounds) fails to represent important concepts such as quantity, measurement, and unit. Instead, a statistic is modeled as a distinguishable object, an observation.</li>
+<li>The object describes an observation of a value, e.g., a numeric value (e.g., 185) in case of a measurement or a categorical value (e.g., "blood group A") in case of a categorical observation.</li>
+<li>To allow correct interpretation of the value, the object can be further described by "dimensions", e.g., the specific phenomenon "weight" observed and the unit "pounds". Given background information, e.g., arithmetical and comparative operations, humans and machines can appropriately visualize such observations or convert between different quantities.</li>
+<li>Also, an observation separates a value from the actual event at which it was collected; for instance, one can describe the "Person" that collected the observation and the "Time" the observation was collected.</li>
+</ul>
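+<p>In pseudo-Turtle, the modelling distinction above might be sketched as follows (all <code>ex:</code> terms here are purely illustrative and not part of any vocabulary):</p>
+<pre>
+ex:obs1 a ex:Observation ;
+ ex:phenomenon ex:weight ;      # what was observed
+ ex:value "185" ;               # the bare numeric value
+ ex:unit ex:pound ;             # unit needed for interpretation
+ ex:collectedBy ex:person1 ;    # who collected the observation
+ ex:collectedAt "2012-02-22" .  # when it was collected
+</pre>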
+The following figure illustrates this specificity of modelling in a class diagram:
+<p class="caption">Figure: Illustration of specificities in modelling of a statistic</p>
+<p class="c1"><img alt="specificity of modelling a statistic" src="./figures/modeling_quantity_measurement_observation.png" /></p>
+<p>The Statistical Data and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>] - the ISO standard for exchanging and sharing statistical data and metadata among organisations - uses a "multidimensional model" that caters for the specificities of modelling statistics. It allows statistics to be described as observations. Observations exhibit values (Measures) that depend on dimensions (Members of Dimensions).</p>
+<p>Since the SDMX standard has proven applicable in many contexts, the vocabulary adopts the multidimensional model that underlies SDMX and will be compatible with SDMX.</p>
+<!--OddPage-->
+<h2 id="terminology"><span class="secno">2.</span> Terminology</h2>
+<p><dfn id="dfn-statistics">Statistics</dfn> is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of the collection, organisation, analysis, and interpretation of data. Statistics comprise statistical data.</p>
+<p>The basic structure of <dfn id="dfn-statistical-data">statistical data</dfn> is a multidimensional table (also called a data cube) [<cite><a href="#ref-SDMX">SDMX</a></cite>], i.e., a set of observed values organized along a group of dimensions, together with associated metadata. Aggregated statistical data is referred to as "macro-data"; non-aggregated data as "micro-data".</p>
+<p>Statistical data can be collected in a <dfn id="dfn-dataset">dataset</dfn>, typically published and maintained by an organisation [<cite><a href="#ref-SDMX">SDMX</a></cite>]. The dataset contains metadata, e.g., about the time of collection and publication or about the maintaining and publishing organisation.</p>
+<p><dfn id="dfn-source-data">Source data</dfn> is data from datastores such as RDBs or spreadsheets that acts as a source for the Linked Data publishing process.</p>
+<p><dfn id="dfn-metadata">Metadata</dfn> about statistics defines the data structure and gives contextual information about the statistics.</p>
+<p>A format is <dfn id="dfn-machine-readable">machine-readable</dfn> if it is amenable to automated processing by a machine, as opposed to presentation to a human user.</p>
+<p>A <dfn id="dfn-publisher">publisher</dfn> is a person or organisation that exposes source data as Linked Data on the Web.</p>
+<p>A <dfn id="dfn-consumer">consumer</dfn> is a person or agent that uses Linked Data from the Web.</p>
+<p>A <dfn id="dfn-registry">registry</dfn> collects metadata about statistical data in a registration fashion.</p>
+<!--OddPage-->
+<h2 id="usecases"><span class="secno">3.</span> Use cases</h2>
+<p>This section presents scenarios that are enabled by the existence of a standard vocabulary for the representation of statistics as Linked Data.</p>
+<h3 id="SDMXWebDisseminationUseCase"><span class="secno">3.1</span> SDMX Web Dissemination Use Case</h3>
+<p><span class="c2">(Use case taken from SDMX Web Dissemination Use Case [<cite><a href="#ref-SDMX-21">SDMX 2.1</a></cite>])</span></p>
+<p>Since we have adopted the multidimensional model that underlies SDMX, we also adopt the "Web Dissemination Use Case", the prime use case for SDMX, since it is an increasingly popular use of SDMX and enables organisations to build a self-updating dissemination system.</p>
+<p>The Web Dissemination Use Case involves three actors: a structural metadata web service (registry) that collects metadata about statistical data in a registration fashion; a data web service (publisher) that publishes statistical data and its metadata as registered in the structural metadata web service; and a data consumption application (consumer) that first discovers data from the registry, then queries data from the corresponding publisher of the selected data, and finally visualises the data.</p>
+<p>In the following, we illustrate the processes from this use case in a flow diagram by SDMX and describe what activities are enabled in this use case by having statistics described in a machine-readable format.</p>
+<p class="caption">Figure: Process flow diagram by SDMX [<cite><a href="#ref-SDMX-21">SDMX 2.1</a></cite>]</p>
+<p class="c1"><img alt="SDMX Web Dissemination Use Case" src="./figures/SDMX_Web_Dissemination_Use_Case.png" width="1000px" /></p>
+<p>Benefits:</p>
+<ul>
+<li>A structural metadata source (registry) can collect metadata about statistical data.</li>
+<li>A data web service (publisher) can register statistical data in a registry, and can provide statistical data from a database and metadata from a metadata repository for consumers. For that, the publisher creates database tables (see 1 in figure), and loads statistical data in a database and metadata in a metadata repository.</li>
+<li>A consumer can discover data from a registry (3) and automatically can create a query to the publisher for selected statistical data (4).</li>
+<li>The publisher can translate the query to a query to its database (5) as well as metadata repository (6) and return the statistical data and metadata.</li>
+<li>The consumer can visualise the returned statistical data and metadata.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There should be a recommended way to communicate the availability of published statistical data to external parties and to allow automatic discovery of statistical data</a></li>
+</ul>
+<p>The SDMX Web Dissemination Use Case can be concretised by several sub-use cases, detailed in the following sections.</p>
+<h3 id="UKgovernmentfinancialdatafromCombinedOnlineInformationSystem"><span class="secno">3.2</span> Publisher Use Case: UK government financial data from Combined Online Information System (COINS)</h3>
+<p><span class="c2">(This use case has been summarised from Ian Dickinson et al. [<cite><a href="#ref-COINS">COINS</a></cite>])</span></p>
+<p>More and more organisations want to publish statistics on the web, for reasons such as increasing transparency and trust. Although in the ideal case published data can be understood by both humans and machines, data is often simply published as CSV, PDF, XLS, etc., lacking elaborate metadata, which makes free usage and analysis difficult.</p>
+<p>Therefore, the goal in this use case is to use a machine-readable and application-independent description of common statistics with use of open standards, to foster usage and innovation on the published data.</p>
+<p>In the "COINS as Linked Data" project [<cite><a href="#ref-COINS">COINS</a></cite>], the Combined Online Information System (COINS) shall be published using a standard Linked Data vocabulary.</p>
+<p>Via the Combined Online Information System (COINS), <a href="http://www.hm-treasury.gov.uk/psr_coins_data.htm">HM Treasury</a>, the principal custodian of financial data for the UK government, releases previously restricted financial information about government spending.</p>
+<p>According to the COINS as Linked Data project, the reasons for publishing COINS as Linked Data are threefold:</p>
+<ul>
+<li>using open standard representations makes it easier to work with the data using available technologies and promises innovative third-party tools and usages</li>
+<li>individual transactions and groups of transactions are given an identity, and so can be referenced by web address (URL), to allow them to be discussed, annotated, or listed as source data for articles or visualizations</li>
+<li>cross-links between linked-data datasets allow for much richer exploration of related datasets</li>
+</ul>
+<p>Notable characteristics of the COINS data include:</p>
+<ul>
+<li>The COINS data has a hypercube structure. It describes financial transactions using seven independent dimensions (time, data-type, department, etc.) and one dependent measure (value). Also, it allows thirty-three attributes that may further describe each transaction. For further information, see the "COINS as Linked Data" project website.</li>
+<li>COINS is an example of one of the more complex statistical datasets being published via data.gov.uk.</li>
+<li>Part of the complexity of COINS arises from the nature of the data being released.</li>
+<li>The published COINS datasets cover expenditure related to five different years (2005-06 to 2009-10). The actual COINS database at HM Treasury is updated daily. In principle at least, multiple snapshots of the COINS data could be released through the year.</li>
+</ul>
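+<p>The hypercube structure of COINS might be declared roughly as follows, using terms from the draft data cube vocabulary (the <code>coins:</code> identifiers are illustrative, not the actual ones used by the project):</p>
+<pre>
+coins:dsd a qb:DataStructureDefinition ;
+ # a selection of the seven independent dimensions
+ qb:component [ qb:dimension coins:time ] ,
+              [ qb:dimension coins:dataType ] ,
+              [ qb:dimension coins:department ] ;
+ # the single dependent measure
+ qb:component [ qb:measure coins:value ] ;
+ # one of the thirty-three optional attributes
+ qb:component [ qb:attribute coins:description ] .
+</pre>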
+<p>The COINS use case leads to the following challenges:</p>
+<ul>
+<li>The actual data and its hypercube structure are to be represented separately so that an application first can examine the structure before deciding to download the actual data, i.e., the transactions. The hypercube structure also defines for each dimension and attribute a range of permitted values that are to be represented.</li>
+<li>An access or query interface to the COINS data, e.g., via a SPARQL endpoint or the Linked Data API, is planned. Queries that are expected to be of interest include: "spending for one department", "total spending by department", and "retrieving all data for a given observation".</li>
+<li>Also, the publisher favours a representation that is both as self-descriptive as possible, i.e., others can link to and download fully-described individual transactions, and as compact as possible, i.e., information is not unnecessarily repeated.</li>
+<li>Moreover, the publisher is considering the possible benefit of publishing slices of the data, e.g., datasets that fix all dimensions but the time dimension. For instance, such slices could be particularly interesting for visualisations or comments. However, depending on the number of dimensions, the number of possible slices can become large, which makes it difficult to select all interesting slices.</li>
+<li>An important benefit of linked data is that we are able to annotate data, at a fine-grained level of detail, to record information about the data itself. This includes where it came from - the provenance of the data - but could include annotations from reviewers, links to other useful resources, etc. Being able to trust that data to be correct and reliable is a central value for government-published data, so recording provenance is a key requirement for the COINS data.</li>
+<li>The size of the data is also a challenge, especially since it is updated regularly. Five data files already contain between 3.3 and 4.9 million rows of data.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary should clarify the use of subsets of observations</a></li>
+</ul>
+<h3 id="PublishingExcelSpreadsheetsasLinkedData"><span class="secno">3.3</span> Publisher Use Case: Publishing Excel Spreadsheets as Linked Data</h3>
+<p><span class="c2">(Part of this use case has been contributed by Rinke Hoekstra. See <a href="http://ehumanities.nl/ceda_r/">CEDA_R</a> and <a href="http://www.data2semantics.org/">Data2Semantics</a> for more information.)</span></p>
+<p>Not only in government is there a need to publish considerable amounts of statistical data to be consumed in various (also unexpected) application scenarios. Typically, Microsoft Excel sheets are made available for download. These Excel sheets contain single spreadsheets with several multidimensional data tables, having a name and notes, as well as column values, row values, and cell values.</p>
+<p>Benefits:</p>
+<ul>
+<li>The goal in this use case is to publish spreadsheet information in a machine-readable format on the web, e.g., so that crawlers can find spreadsheets that use a certain column value. The published data should represent the most important information in the spreadsheets, e.g., rows, columns, and cell values, and make it available for queries.</li>
+<li>For instance, in the <a href="http://ehumanities.nl/ceda_r/">CEDA_R</a> and <a href="http://www.data2semantics.org/">Data2Semantics</a> projects, publishing and harmonizing Dutch historical census data (from 1795 onwards) is a goal. These censuses are currently available only as Excel spreadsheets (obtained by data entry) that closely mimic the way in which the data was originally published, and shall be published as Linked Data.</li>
+</ul>
+<p>Challenges in this use case:</p>
+<ul>
+<li>All context, and so all meaning, of a measurement point is expressed by means of dimensions. The bare number is the centre of an ego-network of attributes or dimensions. In an RDF representation it is then easily possible to define hierarchical relationships between the dimensions (which can be exemplified further) as well as to map different attributes across different value points. This way, harmonization among variables is performed around the measurement points themselves.</li>
+<li>In historical research, until now, harmonization across datasets is performed by hand, and in subsequent iterations of a database: it is very hard to trace back the provenance of decisions made during the harmonization procedure.</li>
+<li>Combining Data Cube with SKOS [<cite><a href="#ref-skos">SKOS</a></cite>] to allow for cross-location and cross-time historical analysis</li>
+<li>Novel visualisation of census data</li>
+<li>Integration with provenance vocabularies, e.g., PROV-O, for tracking of harmonization steps</li>
+<li>These challenges may seem to be particular to the field of historical research, but in fact apply to government information at large. Government is not a single body that publishes information at a single point in time. Government consists of multiple (altering) bodies, scattered across multiple levels, jurisdictions and areas. Publishing government information in a consistent, integrated manner requires exactly the type of harmonization required in this use case.</li>
+<li>Excel sheets provide much flexibility in arranging information. It may be necessary to limit this flexibility to allow automatic transformation.</li>
+<li>There are many spreadsheets.</li>
+<li>Semi-structured information, e.g., notes about the lineage of data cells, may be impossible to formalize.</li>
+</ul>
+<p>Existing work:</p>
+<ul>
+<li>Another concrete example is the <a href="http://ontowiki.net/Projects/Stats2RDF?show_comments=1">Stats2RDF</a> project that intends to publish biomedical statistical data that is represented as Excel sheets. Here, Excel files are first translated into CSV and then translated into RDF.</li>
+<li>Some of the challenges are met by the work on an ISO Extension to SKOS [<cite><a href="#ref-xkos">XKOS</a></cite>].</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary should recommend a mechanism to support hierarchical code lists</a></li>
+<li><a href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There should be a recommended way of declaring relations between cubes</a></li>
+</ul>
+<h3 id="PublishinghierarchicallystructureddatafromStatsWalesandOpenDataCommunities"><span class="secno">3.4</span> Publisher Use Case: Publishing hierarchically structured data from StatsWales and Open Data Communities</h3>
+<p><span class="c2">(Use case has been taken from [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>] and from discussions at <a href="http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f">publishing-statistical-data mailing list</a>)</span></p>
+<p>It often comes up in statistical data that you have some kind of 'overall' figure, which is then broken down into parts.</p>
+<p>Example (in pseudo-turtle RDF):</p>
+<pre>
+ex:obs1
+ sdmx:refArea ex:UK ;
+ sdmx:refPeriod "2011";
+ ex:population "60" .
+ex:obs2
+ sdmx:refArea ex:England ;
+ sdmx:refPeriod "2011";
+ ex:population "50" .
+ex:obs3
+ sdmx:refArea ex:Scotland ;
+ sdmx:refPeriod "2011";
+ ex:population "5" .
+ex:obs4
+ sdmx:refArea ex:Wales ;
+ sdmx:refPeriod "2011";
+ ex:population "3" .
+ex:obs5
+ sdmx:refArea ex:NorthernIreland ;
+ sdmx:refPeriod "2011";
+ ex:population "2" .
+
+</pre>
+<p>We are looking for the best way (in the context of the RDF/Data Cube/SDMX approach) to express that the values for England, Scotland, Wales, and Northern Ireland ought to add up to the value for the UK and constitute a more detailed breakdown of the overall UK figure. Since we might also have population figures for France, Germany, or the EU27, it is not as simple as just taking a <code>qb:Slice</code> where you fix the time period and the measure.</p>
+<p>Similarly, Etcheverry and Vaisman [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>] present the use case to publish household data from <a href="http://statswales.wales.gov.uk/index.htm">StatsWales</a> and <a href="http://opendatacommunities.org/doc/dataset/housing/household-projections">Open Data Communities</a>.</p>
+<p>This multidimensional data contains for each fact a time dimension with one level Year and a location dimension with levels Unitary Authority, Government Office Region, Country, and ALL.</p>
+<p>The unit of measure is 1000 households.</p>
+<p>In this use case, one wants to publish not only a dataset at the bottom-most level, i.e., the number of households in each Unitary Authority in each year, but also datasets at more aggregated levels.</p>
+<p>For instance, in order to publish a dataset with the number of households at each Government Office Region per year, one needs to aggregate the measure of each fact having the same Government Office Region using the SUM function.</p>
+<p>Importantly, one would like to maintain the relationship between the resulting datasets, i.e., the levels and aggregation functions.</p>
+<p>Again, this use case does not simply need a selection (or "dice" in OLAP context) where one fixes the time period dimension.</p>
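+<p>One candidate mechanism is to organise the members of the location dimension in a hierarchical code list, e.g., using SKOS, and to record how an aggregated dataset was derived. A sketch (the <code>ex:</code> terms below are illustrative only, not a proposed vocabulary):</p>
+<pre>
+ex:England skos:broader ex:UK .
+ex:Scotland skos:broader ex:UK .
+ex:Wales skos:broader ex:UK .
+ex:NorthernIreland skos:broader ex:UK .
+
+# hypothetical link between the aggregated and the detailed dataset
+ex:datasetPerRegion ex:aggregatedFrom ex:datasetPerUnitaryAuthority ;
+ ex:aggregationFunction ex:SUM .
+</pre>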
+<p>Requirements:</p>
+<ul>
+<li><a href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary should recommend a mechanism to support hierarchical code lists</a></li>
+</ul>
+<h3 id="PublishingslicesofdataaboutUKBathingWaterQuality"><span class="secno">3.5</span> Publisher Use Case: Publishing slices of data about UK Bathing Water Quality</h3>
+<p><span class="c2">(Use case has been provided by Epimorphics Ltd, in their <a href="http://www.epimorphics.com/web/projects/bathing-water-quality">UK Bathing Water Quality</a> deployment)</span></p>
+<p>As part of their work with data.gov.uk and the UK Location Programme, Epimorphics Ltd have been working to pilot the publication of both current and historic bathing water quality information from the <a href="http://www.environment-agency.gov.uk/">UK Environment Agency</a> as Linked Data.</p>
+<p>The UK has a number of areas, typically beaches, that are designated as bathing waters where people routinely enter the water. The Environment Agency monitors and reports on the quality of the water at these bathing waters.</p>
+<p>The Environment Agency's data can be thought of as structured in three groups:</p>
+<ul>
+<li>There is basic reference data describing the bathing waters and sampling points</li>
+<li>There is a data set "Annual Compliance Assessment Dataset" giving the rating for each bathing water for each year it has been monitored</li>
+<li>There is a data set "In-Season Sample Assessment Dataset" giving the detailed weekly sampling results for each bathing water</li>
+</ul>
+<p>The most important dimensions of the data are bathing water, sampling point, and compliance classification.</p>
+<p>Challenges:</p>
+<ul>
+<li>Observations may exhibit a number of attributes, e.g., whether there was an abnormal weather exception.</li>
+<li>Relevant slices of both datasets are to be created:
+<ul>
+<li>Annual Compliance Assessment Dataset: all the observations for a specific sampling point, all the observations for a specific year.</li>
+<li>In-Season Sample Assessment Dataset: samples for a given sampling point, samples for a given week, samples for a given year, samples for a given year and sampling point, latest samples for each sampling point.</li>
+<li>The use case suggests more arbitrary subsets of the observations, e.g., collecting all the "latest" observations in a continuously updated data set.</li>
+</ul>
+</li>
+</ul>
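+<p>A slice such as "all observations for a specific sampling point" might be represented roughly as follows, using <code>qb:Slice</code> from the draft vocabulary (the <code>bw:</code> identifiers are illustrative, not the actual ones used in the deployment):</p>
+<pre>
+bw:slice1 a qb:Slice ;
+ qb:sliceStructure bw:sliceBySamplingPoint ;  # fixes the sampling-point dimension
+ bw:samplingPoint bw:point1 ;
+ qb:observation bw:obs1 , bw:obs2 , bw:obs3 .
+</pre>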
+<p>Existing Work:</p>
+<ul>
+<li>The <a href="http://purl.oclc.org/NET/ssnx/ssn">Semantic Sensor Network ontology</a> (SSN) already provides a way to publish sensor information. SSN data is statistical Linked Data grounded in the domain, e.g., describing the sensors that collect the observations (e.g., sensors measuring the average temperature at a location over time).</li>
+<li>A number of organisations, particularly in the climate and meteorological area, already have some commitment to the OGC "Observations and Measurements" (O&amp;M) logical data model, also published as ISO 19156.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Vocabulary should define relationship to ISO19156 - Observations &amp; Measurements</a></li>
+<li><a href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary should clarify the use of subsets of observations</a></li>
+</ul>
+<h3 id="EurostatSDMXasLinkedData"><span class="secno">3.6</span> Publisher Use Case: Eurostat SDMX as Linked Data</h3>
+<p><span class="c2">(This use case has been taken from <a href="http://estatwrap.ontologycentral.com/">Eurostat Linked Data Wrapper</a> and <a href="http://eurostat.linked-statistics.org/">Linked Statistics Eurostat Data</a>, both deployments for publishing Eurostat SDMX as Linked Data using the draft version of the data cube vocabulary)</span></p>
+<p>As mentioned already, the ISO standard for exchanging and sharing statistical data and metadata among organisations is the Statistical Data and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>]. Since this standard has proven applicable in many contexts, we adopt the multidimensional model that underlies SDMX and intend the standard vocabulary to be compatible with SDMX.</p>
+<p>Therefore, in this use case we intend to explain the benefits and challenges of publishing SDMX data as Linked Data. As one of the main adopters of SDMX, <a href="http://epp.eurostat.ec.europa.eu/">Eurostat</a> publishes large amounts of European statistics from a data warehouse as SDMX and other formats on the web, and also provides an interface to browse and explore the datasets. However, linking such multidimensional data to related datasets and concepts would require downloading the datasets of interest and integrating them manually. The goal here is to improve integration with other datasets: Eurostat data should be published on the web in a machine-readable format that can be linked with other datasets and freely consumed by applications.</p>
+<p>Both the <a href="http://estatwrap.ontologycentral.com/">Eurostat Linked Data Wrapper</a> and <a href="http://eurostat.linked-statistics.org/">Linked Statistics Eurostat Data</a> intend to publish <a href="http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/">Eurostat SDMX data</a> as <a href="http://5stardata.info/">5-star Linked Open Data</a>. Eurostat data is published partly as SDMX and partly as tabular data (TSV, similar to CSV). Eurostat provides a <a href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&amp;file=table_of_contents_en.xml">TOC of published datasets</a> as well as a feed of modified and new datasets, and a list of the code lists used, i.e., the <a href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&amp;dir=dic">range of permitted dimension values</a>. Each Eurostat dataset contains a varying set of dimensions (e.g., date, geo, obs_status, sex, unit) as well as measures (a generic value whose content is specified by the dataset, e.g., GDP per capita in PPS, total population, employment rate by sex).</p>
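+<p>An individual observation from such a dataset might then look roughly as follows (a sketch only; the <code>estat:</code> identifiers and the value are illustrative, not the actual output of either deployment):</p>
+<pre>
+estat:obs1 a qb:Observation ;
+ qb:dataSet estat:dataset1 ;      # e.g., a "Total population" dataset
+ estat:geo estat:DE ;
+ estat:date "2011" ;
+ estat:unit estat:THS ;           # thousands
+ sdmx-measure:obsValue "81800" .  # illustrative value
+</pre>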
+<p>Benefits:</p>
+<ul>
+<li>Possible implementation of ETL pipelines based on Linked Data technologies (e.g., <a href="http://code.google.com/p/ldspider/">LDSpider</a>) to effectively load the data into a data warehouse for analysis</li>
+<li>Allows useful queries over the data, e.g., comparison of statistical indicators across EU countries.</li>
+<li>Allows contextual information to be attached to statistics during the interpretation process.</li>
+<li>Allows single observations from the data to be reused.</li>
+<li>Linking to information from other data sources, e.g., for geo-spatial dimension.</li>
+</ul>
+<p>Challenges:</p>
+<ul>
+<li>New datasets are regularly added to Eurostat. The Linked Data representation should automatically provide access to the most up-to-date data.</li>
+<li>How to match elements of the geo-spatial dimension to elements of other data sources, e.g., NUTS, GADM.</li>
+<li>There is a large number of Eurostat datasets, each possibly containing a large number of columns (dimensions) and rows (observations). Eurostat publishes more than 5200 datasets which, when converted into RDF, require more than 350GB of disk space, yielding a dataspace with some 8 billion triples.</li>
+<li>In the Eurostat Linked Data Wrapper, there is a timeout for transforming SDMX to Linked Data, since Google App Engine is used. Mechanisms to reduce the amount of data that needs to be translated would be needed.</li>
+<li>Provide a useful interface for browsing and visualising the data. One problem is that the datasets have too high dimensionality to be displayed directly. Instead, one could visualise slices of time-series data. However, for that, one would need to either fix most other dimensions (e.g., sex) or aggregate over them (e.g., via an average). Selecting useful slices from the large number of possible slices is a challenge.</li>
+<li>Each dimension used by a dataset has a range of permitted values that need to be described.</li>
+<li>The Eurostat SDMX as Linked Data use case suggests providing time lines both at the gender level and at a level aggregating over the gender dimension.</li>
+<li>Updates to the data
+<ul>
+<li>Eurostat - Linked Data pulls in changes from the original Eurostat dataset on a weekly basis; the conversion process runs every Saturday at noon, taking into account new datasets along with updates to existing datasets.</li>
+<li>The Eurostat Linked Data Wrapper translates Eurostat datasets into RDF on the fly, so that the most current data is always used. The remaining problem is to point users towards the URIs of Eurostat datasets: Estatwrap provides a feed of modified and new <a href="http://estatwrap.ontologycentral.com/feed.rdf">datasets</a>. Also, it provides a <a href="http://estatwrap.ontologycentral.com/table_of_contents.html">TOC</a> that could be automatically updated from the <a href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&amp;file=table_of_contents_en.xml">Eurostat TOC</a>.</li>
+</ul>
+</li>
+<li>Query interface:
+<ul>
+<li>Eurostat - Linked Data provides a SPARQL endpoint for the metadata (not the observations).</li>
+<li>Eurostat Linked Data Wrapper allows and demonstrates how to use Qcrumb.com to query the data.</li>
+</ul>
+</li>
+<li>Browsing and visualising interface:
+<ul>
+<li>Eurostat Linked Data Wrapper provides for each dataset an HTML page showing a visualisation of the data.</li>
+</ul>
+</li>
+</ul>
+<p>Non-requirements:</p>
+<ul>
+<li>One possible application would run validation checks over Eurostat data. The intended standard vocabulary is meant to publish the Eurostat data as-is and not to represent information for validation (similar to business rules).</li>
+<li>Information on how to match elements of the geo-spatial dimension to elements of other data sources, e.g., NUTS or GADM, is not part of a vocabulary recommendation.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">There should be mechanisms and recommendations regarding publication and consumption of large amounts of statistical data</a></li>
+<li><a href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">There should be a recommended mechanism to allow for publication of aggregates which cross multiple dimensions</a></li>
+</ul>
+<h3 id="Representingrelationshipsbetweenstatisticaldata"><span class="secno">3.7</span> Publisher Use Case: Representing relationships between statistical data</h3>
+<p><span class="c2">(This use case has mainly been taken from the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>])</span></p>
+<p>In several applications, relationships between statistical data need to be represented.</p>
+<p>The goal of this use case is to describe provenance, transformations, and versioning around statistical data, so that the history of statistics published on the web becomes clear. This may also relate to the issue of having relationships between datasets published.</p>
+<p>For instance, the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>] has at least four perspectives on what is meant by "COINS" data: the abstract notion of "all of COINS", the data for a particular year, the version of the data for a particular year released on a given date, and the constituent graphs which hold both the authoritative data translated from HMT's own sources and additional supplementary information derived from the data, for example by cross-linking to other datasets.</p>
+<p>Another specific use case is that the Welsh Assembly government publishes a variety of population datasets broken down in different ways. For many uses, population broken down by some category (e.g., ethnicity) is expressed as a percentage. Separate datasets give the actual counts per category and aggregate counts. In such cases it is common to talk about the denominator (often DENOM), which is the aggregate count against which the percentages can be interpreted.</p>
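+<p>A possible declaration of such a denominator relationship is sketched below in Turtle; the property <code>ex:denominator</code> and the dataset URIs are purely illustrative and not part of any existing vocabulary:</p>
+<pre class="turtlecode">
+@prefix qb: &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix ex: &lt;http://example.org/def/&gt; .
+
+# percentages per category, interpreted against the aggregate counts
+ex:populationPercent a qb:DataSet ;
+    ex:denominator ex:populationCount .
+
+# the aggregate counts (the DENOM)
+ex:populationCount a qb:DataSet .
+</pre>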
+<p>Another example for representing relationships between statistical data are transformations on datasets, e.g., addition of derived measures, conversion of units, aggregations, OLAP operations, and enrichment of statistical data. A concrete example is given by Freitas et al. [<cite><a href="#ref-COGS">COGS</a></cite>] and illustrated in the following figure.</p>
+<p class="caption">Figure: Illustration of ETL workflows to process statistics</p>
+<p class="c1"><img alt="COGS relationships between statistics example" src="./figures/Relationships_Statistical_Data_Cogs_Example.png" /></p>
+<p>Here, numbers in a sustainability report have been created by a number of transformations of statistical data. Different numbers (e.g., 600 for year 2009 and 503 for year 2010) might have been created differently, leading to different reliabilities when comparing the two numbers.</p>
+<p>Benefits:</p>
+<p>Making transparent the transformations a dataset has been exposed to increases trust in the data.</p>
+<p>Challenges:</p>
+<ul>
+<li>Operations on statistical data result in new statistical data, depending on the operation. For instance, in terms of Data Cube, operations such as slice, dice, roll-up, drill-down will result in new Data Cubes. This may require representing general relationships between cubes (as discussed in the <a href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data mailing list</a>).</li>
+<li>Should Data Cube support explicit declaration of such relationships, either between separate <code>qb:DataSet</code>s or between measures within a single <code>qb:DataSet</code> (e.g. <code>ex:populationCount</code> and <code>ex:populationPercent</code>)?</li>
+<li>If so, should that be scoped to simple, common relationships like DENOM, or allow expression of arbitrary mathematical relations?</li>
+</ul>
+<p>Existing Work:</p>
+<ul>
+<li>Possible relation to <a href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary#Versioning">Versioning</a> part of GLD Best Practices Document, where it is specified how to publish data which has multiple versions.</li>
+<li>The <a href="http://sites.google.com/site/cogsvocab/">COGS</a> vocabulary [<cite><a href="#ref-COGS">COGS</a></cite>] is related to this use case since it may complement the standard vocabulary for representing ETL pipelines processing statistics.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There should be a recommended way of declaring relations between cubes</a></li>
+</ul>
+<h3 id="Simplechartvisualisationsofpublishedstatisticaldata"><span class="secno">3.8</span> Consumer Use Case: Simple chart visualisations of (integrated) published statistical data</h3>
+<p><span class="c2">(Use case taken from <a href="http://www.iwrm-smart.org/">SMART research project</a>)</span></p>
+<p>Data that is published on the Web is typically visualized by transforming it manually into CSV or Excel and then creating a visualization on top of these formats using Excel, Tableau, RapidMiner, Rattle, Weka etc.</p>
+<p>This use case shall demonstrate how statistical data published on the web can be visualised inside a webpage with little effort, without using commercial or highly complex tools.</p>
+<p>An example scenario is environmental research done within the <a href="http://www.iwrm-smart.org/">SMART research project</a>. Here, statistics about environmental aspects (e.g., measurements about the climate in the Lower Jordan Valley) shall be visualised for scientists and decision makers. It should also be possible to integrate statistics and display them together. The data is available as XML files on the web. On a separate website, specific parts of the data shall be queried and visualised in simple charts, e.g., line diagrams.</p>
+<p class="caption">Figure: HTML embedded line chart of an environmental measure over time for three regions in the lower Jordan valley</p>
+<p class="c1"><img alt="display of an environmental measure over time for three regions in the lower Jordan valley" src="./figures/Level_above_msl_3_locations.png" width="1000" /></p>
+<p class="caption">Figure: Showing the same data in a pivot table. Here, the aggregate COUNT of measures per cell is given.</p>
+<p class="c1"><img alt="Figure: Showing the same data in a pivot table. Here, the aggregate COUNT of measures per cell is given." src="./figures/pivot_analysis_measurements.PNG" /></p>
+<p>Challenges of this use case are:</p>
+<ul>
+<li>The difficulties lie in structuring the data appropriately so that the specific information can be queried.</li>
+<li>Also, data shall be published with potential integration in mind. Therefore, e.g., units of measurement need to be represented.</li>
+<li>Integration becomes much more difficult if publishers use different measures or dimensions.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There should be criteria for well-formedness and assumptions consumers can make about published data</a></li>
+</ul>
+<h3 id="VisualisingpublishedstatisticaldatainGooglePublicDataExplorer"><span class="secno">3.9</span> Consumer Use Case: Visualising published statistical data in Google Public Data Explorer</h3>
+<p><span class="c2">(Use case taken from <a href="http://code.google.com/apis/publicdata/">Google Public Data Explorer (GPDE)</a>)</span></p>
+<p><a href="http://code.google.com/apis/publicdata/">Google Public Data Explorer</a> (GPDE) provides an easy possibility to visualize and explore statistical data. Data needs to be in the <a href="https://developers.google.com/public-data/overview">Dataset Publishing Language</a> (DSPL) to be uploaded to the data explorer. A DSPL dataset is a bundle that contains an XML file, the schema, and a set of CSV files, the actual data. Google provides a tutorial to create a DSPL dataset from your data, e.g., in CSV. This requires a good understanding of XML, as well as a good understanding of the data that shall be visualized and explored.</p>
+<p>In this use case, the goal is to take statistical data published on the web and to transform it into DSPL for visualisation and exploration with as little effort as possible.</p>
+<p>For instance, Eurostat data about the unemployment rate can be downloaded from the web and visualised as shown in the following figure:</p>
+<p class="caption">Figure: An interactive chart in GPDE for visualising Eurostat data described with DSPL</p>
+<p class="c1"><img alt="An interactive chart in GPDE for visualising Eurostat data in the DSPL" src="./figures/Eurostat_GPDE_Example.png" width="1000" /></p>
+<p>Benefits:</p>
+<ul>
+<li>If a standard Linked Data vocabulary is used, new data that is already represented using this vocabulary can easily be visualised and explored using GPDE.</li>
+<li>Datasets can first be integrated using Linked Data technology and then analysed using GPDE.</li>
+</ul>
+<p>Challenges of this use case are:</p>
+<ul>
+<li>There are different possible approaches, each having advantages and disadvantages: 1) a consumer downloads the data into a triple store, and SPARQL queries on this data are used to transform it into DSPL, which is then uploaded and visualised using GPDE; 2) one or more XSLT transformations transform the RDF/XML directly into DSPL.</li>
+<li>The technical challenges for the consumer lie in knowing where to download which data and how to transform it into DSPL without prior knowledge of the data.</li>
+</ul>
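+<p>For approach 1), the consumer might extract the observation values with a SPARQL query along the following lines and then serialise the result table as DSPL CSV; the dimension and measure URIs are illustrative:</p>
+<pre class="turtlecode">
+PREFIX qb: &lt;http://purl.org/linked-data/cube#&gt;
+PREFIX ex: &lt;http://example.org/def/&gt;
+
+SELECT ?time ?geo ?value
+WHERE {
+  ?obs a qb:Observation ;
+       qb:dataSet ex:unemploymentRate ;
+       ex:refPeriod ?time ;
+       ex:refArea ?geo ;
+       ex:rate ?value .
+}
+</pre>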
+<p>Non-requirements:</p>
+<ul>
+<li>DSPL is representative of data formats used by tools for analysing statistical data published on the web. Similar tools that may be automatically covered are: Weka (ARFF data format), Tableau, SPSS, STATA, PC-Axis, etc.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There should be criteria for well-formedness and assumptions consumers can make about published data</a></li>
+</ul>
+<h3 id="AnalysingpublishedstatisticaldatawithcommonOLAPsystems"><span class="secno">3.10</span> Consumer Use Case: Analysing published statistical data with common OLAP systems</h3>
+<p><span class="c2">(Use case taken from <a href="http://xbrl.us/research/appdev/Pages/275.aspx">Financial Information Observation System (FIOS)</a>)</span></p>
+<p>Online Analytical Processing (OLAP) [<cite><a href="#ref-OLAP">OLAP</a></cite>] is an analysis method for multidimensional data. It is an explorative analysis method that allows users to interactively view the data from different angles (rotate, select) or granularities (drill-down, roll-up), and to filter it for specific information (slice, dice).</p>
+<p>OLAP systems, which first use ETL (Extract-Transform-Load) pipelines to load relevant data into a data warehouse for efficient storage and querying, and then offer interfaces to issue OLAP queries on the data, are commonly used in industry to analyse statistical data on a regular basis.</p>
+<p>The goal in this use case is to allow analysis of published statistical data with common OLAP systems [<cite><a href="#ref-OLAP4LD">OLAP4LD</a></cite>].</p>
+<p>For that, a multidimensional model of the data needs to be generated. A multidimensional model consists of facts summarised in data cubes. Facts exhibit measures depending on members of dimensions. Members of dimensions can be further structured along hierarchies of levels.</p>
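+<p>In terms of the draft vocabulary, such a multidimensional model corresponds to a <code>qb:DataStructureDefinition</code>; a minimal sketch, with illustrative component URIs:</p>
+<pre class="turtlecode">
+@prefix qb: &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix ex: &lt;http://example.org/def/&gt; .
+
+ex:dsd a qb:DataStructureDefinition ;
+    qb:component [ qb:dimension ex:refPeriod ] ,
+                 [ qb:dimension ex:company ] ,
+                 [ qb:measure   ex:costOfGoodsSold ] .
+</pre>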
+<p>An example scenario of this use case is the Financial Information Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>], where XBRL data provided by the SEC on the web is re-published as Linked Data and made available for exploration and analysis by stakeholders in the web-based OLAP client Saiku.</p>
+<p>The following figure shows an example of using FIOS. Here, for three different companies, the cost of goods sold as disclosed in XBRL documents is analysed. As cell values, either the number of disclosures or, if only one is available, the actual number in USD is given:</p>
+<p class="caption">Figure: Example of using FIOS for OLAP operations on financial data</p>
+<p class="c1"><img alt="Example of using FIOS for OLAP operations on financial data" src="./figures/FIOS_example.PNG" /></p>
+<p>Benefits:</p>
+<ul>
+<li>OLAP operations cover typical business requirements, e.g., slice, dice, drill-down.</li>
+<li>OLAP frontends are intuitive, interactive, explorative, and fast, and their interfaces are well known to many people in industry.</li>
+<li>OLAP functionality is provided by many tools that may be reused.</li>
+</ul>
+<p>Challenges:</p>
+<ul>
+<li>An ETL pipeline needs to automatically populate a data warehouse. Common OLAP systems use relational databases with a star schema.</li>
+<li>A problem lies in the strict separation between queries for the structure of the data (metadata queries) and queries for actual aggregated values (OLAP operations).</li>
+<li>Another problem lies in defining Data Cubes without greater insight into the data beforehand.</li>
+<li>Depending on the expressivity of the OLAP queries (e.g., aggregation functions, hierarchies, ordering), performance plays an important role.</li>
+<li>OLAP systems have to cater for possibly missing information (e.g., the aggregation function or a human-readable label).</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There should be criteria for well-formedness and assumptions consumers can make about published data</a></li>
+</ul>
+<h3 id="Registeringpublishedstatisticaldataindatacatalogs"><span class="secno">3.11</span> Registry Use Case: Registering published statistical data in data catalogs</h3>
+<p><span class="c2">(Use case motivated by <a href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a>)</span></p>
+<p>After statistics have been published as Linked Data, the question remains how to communicate the publication and let users discover the statistics. There are catalogs to register datasets, e.g., CKAN, <a href="http://www.datacite.org/">datacite.org</a>, <a href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a href="http://pangaea.de/">Pangaea</a>. Those catalogs require specific configurations to register statistical data.</p>
+<p>The goal of this use case is to demonstrate how to expose and distribute statistics after publication, for instance by allowing automatic registration of statistical data in such catalogs for finding and evaluating datasets. To solve this issue, it should be possible to transform the published statistical data into formats that data catalogs can use.</p>
+<p>A concrete use case is the structured collection of <a href="http://wiki.planet-data.eu/web/Datasets">RDF Data Cube Vocabulary datasets</a> in the PlanetData Wiki. This list is supposed to describe statistical datasets at a higher level, for easy discovery and selection, and to provide a useful overview of RDF Data Cube deployments in the Linked Data cloud.</p>
+<p>Unanticipated Uses:</p>
+<ul>
+<li>Data catalogs that contain statistics often do not expose them as Linked Data but, for instance, as CSV or HTML (e.g., Pangaea). Publishing such data using the Data Cube vocabulary could be a further use case.</li>
+</ul>
+<p>Existing Work:</p>
+<ul>
+<li>The <a href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a> (DCAT) is strongly related to this use case since it may complement the standard vocabulary for representing statistics in the case of registering data in a data catalog.</li>
+</ul>
+<p>Requirements:</p>
+<ul>
+<li><a href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There should be a recommended way to communicate the availability of published statistical data to external parties and to allow automatic discovery of statistical data</a></li>
+</ul>
+<!--OddPage-->
+<h2 id="requirements"><span class="secno">4.</span> Requirements</h2>
+<p>The use cases presented in the previous section give rise to the following requirements for a standard representation of statistics. Requirements are cross-linked with the use cases that motivate them.</p>
+<h3 id="VocabularyshouldbuildupontheSDMXinformationmodel"><span class="secno">4.1</span> Vocabulary should build upon the SDMX information model</h3>
+<p>The draft version of the vocabulary builds upon <a href="http://sdmx.org/?page_id=16">SDMX Standards Version 2.0</a>. A newer version of SDMX, <a href="http://sdmx.org/?p=899">SDMX Standards, Version 2.1</a>, is available.</p>
+<p>The requirement is to at least build upon Version 2.0; if specific use cases derived from Version 2.1 become available, the working group may consider building upon Version 2.1.</p>
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/37">http://www.w3.org/2011/gld/track/issues/37</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#SDMXWebDisseminationUseCase">SDMX Web Dissemination Use Case</a></li>
+<li><a href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher Use Case: UK government financial data from Combined Online Information System (COINS)</a></li>
+<li><a href="#EurostatSDMXasLinkedData">Publisher Use Case: Eurostat SDMX as Linked Data</a></li>
+</ul>
+<h3 id="Vocabularyshouldclarifytheuseofsubsetsofobservations"><span class="secno">4.2</span> Vocabulary should clarify the use of subsets of observations</h3>
+<p>There should be a consensus on the issue of flattening or abbreviating data; one suggestion is to author data without the duplication, but have the data publication tools "flatten" the compact representation into standalone observations during the publication process.</p>
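+<p>For illustration, in the abbreviated form an observation states only the varying dimension and inherits the remaining dimensions from its slice, while the flattened form repeats all dimensions on the observation itself (all URIs and values are illustrative):</p>
+<pre class="turtlecode">
+# abbreviated: ex:refArea is stated once, on the slice
+ex:slice1 a qb:Slice ;
+    ex:refArea ex:UK ;
+    qb:observation ex:obs1 .
+ex:obs1 ex:refPeriod "2010" ;
+    ex:measure 42 .
+
+# flattened: the same observation as a standalone resource
+ex:obs1 a qb:Observation ;
+    ex:refArea ex:UK ;
+    ex:refPeriod "2010" ;
+    ex:measure 42 .
+</pre>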
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/33">http://www.w3.org/2011/gld/track/issues/33</a></li>
+<li>Since there are no use cases for qb:subslice, the vocabulary should clarify or drop the use of qb:subslice; issue: <a href="http://www.w3.org/2011/gld/track/issues/34">http://www.w3.org/2011/gld/track/issues/34</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher Use Case: UK government financial data from Combined Online Information System (COINS)</a></li>
+<li><a href="#PublishingslicesofdataaboutUKBathingWaterQuality">Publisher Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+</ul>
+<h3 id="Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists"><span class="secno">4.3</span> Vocabulary should recommend a mechanism to support hierarchical code lists</h3>
+<p>First, hierarchical code lists may be supported via SKOS [<cite><a href="#ref-skos">SKOS</a></cite>]. This would allow for cross-location and cross-time analysis of statistical datasets.</p>
+<p>Second, one can think of non-SKOS hierarchical code lists, e.g., if simple <code>skos:narrower</code> / <code>skos:broader</code> relationships are not sufficient, or if a vocabulary uses specific hierarchical properties such as <code>geo:containedIn</code>.</p>
+<p>Also, the use of hierarchy levels needs to be clarified. It has been suggested to allow <code>skos:Collection</code> as the value of <code>qb:codeList</code>.</p>
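+<p>A minimal sketch of a SKOS-based hierarchical code list attached to a dimension (the URIs are illustrative):</p>
+<pre class="turtlecode">
+@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
+@prefix qb:   &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix ex:   &lt;http://example.org/def/&gt; .
+
+ex:refArea a qb:DimensionProperty ;
+    qb:codeList ex:regions .
+ex:regions a skos:ConceptScheme .
+ex:wales a skos:Concept ;
+    skos:inScheme ex:regions ;
+    skos:broader ex:uk .
+</pre>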
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/39">http://www.w3.org/2011/gld/track/issues/39</a></li>
+<li>Discussion at publishing-statistical-data mailing list: <a href="http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f">http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f</a></li>
+<li>Part of the requirement is met by the work on an ISO Extension to SKOS [<cite><a href="#ref-xkos">XKOS</a></cite>]</li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#PublishingExcelSpreadsheetsasLinkedData">Publisher Use Case: Publishing Excel Spreadsheets as Linked Data</a></li>
+</ul>
+<h3 id="VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements"><span class="secno">4.4</span> Vocabulary should define relationship to ISO 19156 - Observations &amp; Measurements</h3>
+<p>A number of organisations, particularly in the climate and meteorological area, already have some commitment to the OGC "Observations and Measurements" (O&amp;M) logical data model, also published as ISO 19156. Are there any statements about compatibility and interoperability between O&amp;M and Data Cube that can be made to give guidance to such organisations?</p>
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#PublishingslicesofdataaboutUKBathingWaterQuality">Publisher Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+</ul>
+<h3 id="Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions"><span class="secno">4.5</span> There should be a recommended mechanism to allow for publication of aggregates which cross multiple dimensions</h3>
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li>E.g., the Eurostat SDMX as Linked Data use case suggests providing time lines aggregating over the gender dimension: <a href="#EurostatSDMXasLinkedData">Publisher Use Case: Eurostat SDMX as Linked Data</a></li>
+<li>Another possible use case could be provided by the <a href="http://data.gov.uk/resources/payments">Payment Ontology</a>.</li>
+</ul>
+<h3 id="Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes"><span class="secno">4.6</span> There should be a recommended way of declaring relations between cubes</h3>
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/30">http://www.w3.org/2011/gld/track/issues/30</a></li>
+<li>Discussion in <a href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data mailing list</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#Representingrelationshipsbetweenstatisticaldata">Publisher Use Case: Representing relationships between statistical data</a></li>
+</ul>
+<h3 id="Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata"><span class="secno">4.7</span> There should be criteria for well-formedness and assumptions consumers can make about published data</h3>
+<p>Background information:</p>
+<ul>
+<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#Simplechartvisualisationsofpublishedstatisticaldata">Consumer Use Case: Simple chart visualisations of (integrated) published statistical data</a></li>
+<li><a href="#VisualisingpublishedstatisticaldatainGooglePublicDataExplorer">Consumer Use Case: Visualising published statistical data in Google Public Data Explorer</a></li>
+<li><a href="#AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer Use Case: Analysing published statistical data with common OLAP systems</a></li>
+</ul>
+<h3 id="Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata"><span class="secno">4.8</span> There should be mechanisms and recommendations regarding publication and consumption of large amounts of statistical data</h3>
+<p>Background information:</p>
+<ul>
+<li>Related issue regarding abbreviations <a href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a></li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#EurostatSDMXasLinkedData">Publisher Use Case: Eurostat SDMX as Linked Data</a></li>
+</ul>
+<h3 id="Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata"><span class="secno">4.9</span> There should be a recommended way to communicate the availability of published statistical data to external parties and to allow automatic discovery of statistical data</h3>
+<p>Clarify the relationship between DCAT and QB.</p>
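+<p>One conceivable pattern, sketched here with illustrative URIs, is to type a published cube both as a DCAT dataset and a Data Cube dataset in a catalog record; whether this is the recommended relationship is exactly what needs clarification:</p>
+<pre class="turtlecode">
+@prefix dcat: &lt;http://www.w3.org/ns/dcat#&gt; .
+@prefix qb:   &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix ex:   &lt;http://example.org/&gt; .
+
+ex:catalog a dcat:Catalog ;
+    dcat:dataset ex:unemploymentRate .
+ex:unemploymentRate a dcat:Dataset , qb:DataSet .
+</pre>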
+<p>Background information:</p>
+<ul>
+<li>None.</li>
+</ul>
+<p>Required by:</p>
+<ul>
+<li><a href="#SDMXWebDisseminationUseCase">SDMX Web Dissemination Use Case</a></li>
+<li><a href="#Registeringpublishedstatisticaldataindatacatalogs">Registry Use Case: Registering published statistical data in data catalogs</a></li>
+</ul>
+<!--OddPage-->
+<h2 id="acknowledgements"><span class="secno">A.</span> Acknowledgements</h2>
+<p>We thank Rinke Hoekstra, Dave Reynolds, Bernadette Hyland, Biplav Srivastava, John Erickson, and Villazón-Terrazas for feedback and input.</p>
+<h2 id="references">References</h2>
+<dl>
+<dt id="ref-cog">[COG]</dt>
+<dd>SDMX Content Oriented Guidelines, <a href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a></dd>
+<dt id="ref-COGS">[COGS]</dt>
+<dd>Freitas, A., Kämpgen, B., Oliveira, J. G., O'Riain, S., & Curry, E. (2012). Representing Interoperable Provenance Descriptions for ETL Workflows. ESWC 2012 Workshop Highlights (pp. 1-15). Springer Verlag, 2012 (in press). (Extended Paper published in Conf. Proceedings.). <a href="http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf">http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf</a>.</dd>
+<dt id="ref-COINS">[COINS]</dt>
+<dd>Ian Dickinson et al., COINS as Linked Data <a href="http://data.gov.uk/resources/coins">http://data.gov.uk/resources/coins</a>, last visited on Jan 9 2013</dd>
+<dt id="ref-FIOS">[FIOS]</dt>
+<dd>Andreas Harth, Sean O'Riain, Benedikt Kämpgen. Submission XBRL Challenge 2011. <a href="http://xbrl.us/research/appdev/Pages/275.aspx">http://xbrl.us/research/appdev/Pages/275.aspx</a>.</dd>
+<dt id="ref-FOWLER97">[FOWLER97]</dt>
+<dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object Models. Addison-Wesley. ISBN 0201895420.</dd>
+<dt id="ref-linked-data">[LOD]</dt>
+<dd>Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a></dd>
+<dt id="ref-OLAP">[OLAP]</dt>
+<dd>Online Analytical Processing Data Cubes, <a href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a></dd>
+<dt id="ref-OLAP4LD">[OLAP4LD]</dt>
+<dd>Kämpgen, B. and Harth, A. (2011). Transforming Statistical Linked Data for Use in OLAP Systems. I-Semantics 2011. <a href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a></dd>
+<dt id="ref-QB-2010">[QB-2010]</dt>
+<dd>RDF Data Cube vocabulary, <a href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a></dd>
+<dt id="ref-QB-2013">[QB-2013]</dt>
+<dd>RDF Data Cube vocabulary, <a href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a></dd>
+<dt id="ref-QB4OLAP">[QB4OLAP]</dt>
+<dd>Etcheverry, Vaisman. QB4OLAP: A New Vocabulary for OLAP Cubes on the Semantic Web. <a href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a></dd>
+<dt id="ref-rdf">[RDF]</dt>
+<dd>Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a></dd>
+<dt id="ref-scovo">[SCOVO]</dt>
+<dd>The Statistical Core Vocabulary, <a href="http://sw.joanneum.at/scovo/schema.html">http://sw.joanneum.at/scovo/schema.html</a><br />
+SCOVO: Using Statistics on the Web of data, <a href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a></dd>
+<dt id="ref-skos">[SKOS]</dt>
+<dd>Simple Knowledge Organization System, <a href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a></dd>
+<dt id="ref-SDMX">[SDMX]</dt>
+<dd>SDMX User Guide Version 2009.1, <a href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>, last visited Jan 8 2013.</dd>
+<dt id="ref-SDMX-21">[SDMX 2.1]</dt>
+<dd>SDMX 2.1 User Guide. Version 0.1 - 19/09/2012. <a href="http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf">http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf</a>. Last visited on 8 Jan 2013.</dd>
+<dt id="ref-xkos">[XKOS]</dt>
+<dd>Extended Knowledge Organization System (XKOS), <a href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a></dd>
+</dl>
+</body>
+</html>
\ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20130227/local-style.css Thu Feb 28 10:06:00 2013 -0500
@@ -0,0 +1,167 @@
+
+.ldhcode {
+margin: 0px;
+padding: 10px;
+background: #ffffee;
+border: 1px solid #ffff88;
+}
+.turtlecode {
+margin: 0px;
+padding: 10px;
+background: #eeffee;
+border: 1px solid #88dd88;
+}
+.fig {
+text-align: center;
+}
+.fig img {
+border-bottom: 1px solid #bebebe;
+padding: 20px;
+margin-top: 20px;
+}
+.fig div {
+padding: 5px;
+}
+.fig div span {
+font-weight: bold;
+}
+.xsec h3 {
+font-size: 16px;
+text-align: left;
+margin-bottom: 5px;
+font-weight: bold;
+color: black;
+}
+.bc {
+text-align: left;
+border: 1px solid #e0e0e0;
+background: #ffffff url("http://upload.wikimedia.org/wikipedia/commons/d/db/Crystal_Clear_mimetype_vcard.png") no-repeat right -16px;
+padding: 20px 50px 20px 10px;
+margin: 0px;
+margin-top: 0px;
+}
+
+.todo {
+border: 3px solid #ff0;
+margin: 0 0 0 20px;
+padding: 10px;
+}
+
+.issue {
+border: 3px solid #f30;
+margin: 0 0 0 20px;
+padding: 10px;
+}
+
+.responsible {
+border: 3px solid #6a6;
+margin: 0 0 0 20px;
+padding: 10px;
+}
+
+
+ol.prereq li {
+padding-bottom: 10px;
+}
+ul.checklist-toc {
+margin-left: 20px;
+width: 650px;
+}
+ul.checklist-toc li {
+margin: 5px;
+padding: 10px;
+border: 1px solid #8f8f8f;
+list-style: none;
+}
+ul.inline-opt {
+margin-left: 20px;
+}
+ul.inline-opt li {
+margin: 5px;
+padding: 10px;
+}
+dl.decl dd {
+padding-bottom: 1em;
+}
+dl.refs {
+margin: 10px;
+padding: 10px;
+}
+dl.refs dt {
+padding-bottom: 5px;
+}
+dl.refs dd {
+padding-bottom: 10px;
+margin-left: 15px;
+}
+dl.decl {
+border: 1px dashed black;
+padding: 10px;
+margin-left: 100px;
+margin-right: 100px;
+}
+dl.decl dt {
+padding-bottom: 5px;
+}
+dl.decl dd {
+padding-bottom: 10px;
+}
+dl tt {
+font-size: 110%;
+}
+table.example {
+border: 0px solid #9e9e9e;
+border-bottom: 0px;
+width: 100%;
+padding: 0px;
+margin-top: 20px;
+}
+table.example th {
+border-bottom: 1px solid #bebebe;
+border-top: 0px solid #bebebe;
+}
+table.example td {
+vertical-align: top;
+padding: 10px;
+padding-top: 10px;
+}
+table.example caption {
+border-top: 1px solid #bebebe;
+padding: 5px;
+caption-side: bottom;
+margin-bottom: 30px;
+}
+table.example caption span {
+font-weight: bold;
+}
+table.xtab {
+width: 100%;
+padding: 2px;
+background: #d0d0d0;
+}
+table.xtab th {
+border: 0px;
+border-bottom: 1px solid #fefefe;
+text-align: left;
+padding: 2px;
+padding-bottom: 1px;
+}
+
+.diff { font-weight:bold; color:#0a3; }
+
+.editorsnote::before {
+ content: "Editor's Note";
+ display: block;
+ width: 150px;
+ background: #ff0;
+ color: #fff;
+ margin: -1.5em 0 0.5em 0;
+ font-weight: bold;
+ border: 1px solid #ff0;
+ padding: 3px 1em;
+}
+.editorsnote {
+ margin: 1em 0em 1em 1em;
+ padding: 1em;
+ border: 2px solid #ff0;
+}
\ No newline at end of file
Binary file data-cube-ucr/figures/Eurostat_GPDE_Example.png has changed
Binary file data-cube-ucr/figures/Relationships_Statistical_Data_Cogs_Example.png has changed
Binary file data-cube-ucr/figures/SDMX_Web_Dissemination_Use_Case.png has changed
--- a/data-cube-ucr/index.html Thu Feb 28 09:25:17 2013 -0500
+++ b/data-cube-ucr/index.html Thu Feb 28 10:06:00 2013 -0500
@@ -1,68 +1,159 @@
<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
- "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml">
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+
<head>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Use Cases and Requirements for the Data Cube Vocabulary</title>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<script type="text/javascript" src='../respec/respec3/builds/respec-w3c-common.js' class='remove'></script>
+
+<script type="text/javascript"
+ src='../respec/respec3/builds/respec-w3c-common.js' class='remove'></script>
<script src="respec-ref.js"></script>
<script src="respec-config.js"></script>
<link rel="stylesheet" type="text/css" href="local-style.css" />
</head>
+
<body>
<section id="abstract">
<p>Many national, regional and local governments, as well as other
- organizations inside and outside of the public sector, create
- statistics. There is a need to publish those statistics in a
- standardized, machine-readable way on the web, so that statistics can
- be freely integrated and reused in consuming applications. This
- document is a collection of use cases for a standard vocabulary to
- publish statistics as Linked Data.</p>
+	organisations inside and outside the public sector, collect numeric
+	data and aggregate it into statistics. There is a need to
+	publish these statistics in a standardised, machine-readable way on
+	the web, so that they can be freely integrated and reused in consuming
+	applications.</p>
+ <p>
+ In this document, the <a href="http://www.w3.org/2011/gld/">W3C
+ Government Linked Data Working Group</a> presents use cases and
+ requirements supporting a recommendation of the RDF Data Cube
+ Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. The
+ group obtained use cases from existing deployments of and experiences
+ with an earlier version of the data cube vocabulary [<cite><a
+	href="#ref-QB-2010">QB-2010</a></cite>]. The group also describes a set of
+	requirements derived from the use cases that are to be considered in the
+	recommendation.
+ </p>
</section>
<section id="sotd">
<p>
- This is a working document of the <a
- href="http://www.w3.org/2011/gld/wiki/Data_Cube_Vocabulary">Data
- Cube Vocabulary project</a> within the <a
- href="http://www.w3.org/2011/gld/">W3C Government Linked Data
- Working Group</a>. Feedback is welcome and should be sent to the <a
- href="mailto:public-gld-comments@w3.org">public-gld-comments@w3.org
- mailing list</a>.
+ This document is an editorial update to an Editor's Draft of the "Use
+ Cases and Requirements for the Data Cube Vocabulary" developed by the
+ <a href="http://www.w3.org/2011/gld/">W3C Government Linked Data
+ Working Group</a>.
</p>
</section>
<section>
- <h2>Introduction</h2>
+ <h2 id="introduction">Introduction</h2>
+	<p>The aim of this document is to present concrete use cases and
+	requirements for a vocabulary to publish statistics as Linked Data. An
+	earlier version of the data cube vocabulary [<cite><a
+	href="#ref-QB-2010">QB-2010</a></cite>] has existed for some time and
+	has proven applicable in <a
+	href="http://wiki.planet-data.eu/web/Datasets">several deployments</a>.
+	The <a href="http://www.w3.org/2011/gld/">W3C Government Linked
+	Data Working Group</a> intends to transform the data cube vocabulary into
+	a W3C recommendation of the RDF Data Cube Vocabulary [<cite><a
+	href="#ref-QB-2013">QB-2013</a></cite>]. This document describes use cases
+	and requirements derived from existing data cube deployments in order
+	to document and illustrate design decisions that have driven the work.</p>
- <p>Many national, regional and local governments, as well as other
- organizations inside and outside of the public sector, create
- statistics. There is a need to publish those statistics in a
- standardized, machine-readable way on the web, so that statistics can
- be freely linked, integrated and reused in consuming applications.
- This document is a collection of use cases for a standard vocabulary
- to publish statistics as Linked Data.</p>
- </section>
+	<p>The rest of this document is structured as follows. We first
+	give a short introduction to the specifics of modelling
+	statistics. Then, we describe use cases that have been derived
+	from existing deployments of, or feedback on, the earlier data cube
+	vocabulary version; in particular, we describe the possible benefits and
+	challenges of each use case. Afterwards, we describe concrete
+	requirements that were derived from those use cases and that have been
+	taken into account for the specification.</p>
+	<p>Throughout this document, we use the name "data cube
+	vocabulary" when referring to the vocabulary.</p>
<section>
- <h2>Terminology</h2>
+ <h3 id="describingstatistics">Describing statistics</h3>
+	<p>In the following, we describe the challenges of designing an RDF
+	vocabulary for publishing statistics as Linked Data.</p>
+	<p>Describing statistics - collected and aggregated numeric data -
+	is challenging for the following reasons:</p>
+ <ul>
+ <li>Representing statistics requires more complex modeling as
+ discussed by Martin Fowler [<cite><a href="#ref-FOWLER97">FOWLER97</a></cite>]:
+	Recording a statistic simply as an attribute of an object (e.g., the
+	fact that a person weighs 185 pounds) fails to represent
+	important concepts such as quantity, measurement, and unit. Instead,
+	a statistic is modeled as a distinguishable object, an observation.
+	</li>
+ <li>The object describes an observation of a value, e.g., a
+ numeric value (e.g., 185) in case of a measurement or a categorical
+ value (e.g., "blood group A") in case of a categorical observation.</li>
+ <li>To allow correct interpretation of the value, the object can
+ be further described by "dimensions", e.g., the specific phenomenon
+ "weight" observed and the unit "pounds". Given background
+	information, e.g., arithmetical and comparative operations, humans
+	and machines can appropriately visualise such observations or
+	convert between different quantities.</li>
+ <li>Also, an observation separates a value from the actual event
+ at which it was collected; for instance, one can describe the
+ "Person" that collected the observation and the "Time" the
+ observation was collected.</li>
+ </ul>
+	<p>The following figure illustrates this specificity of modelling in a
+	class diagram:</p>
+
+ <p class="caption">Figure: Illustration of specificities in
+ modelling of a statistic</p>
+
+ <p align="center">
+ <img alt="specificity of modelling a
+ statistic"
+ src="./figures/modeling_quantity_measurement_observation.png"></img>
+ </p>
+
+	<p>
+		The Statistical Data and Metadata eXchange [<cite><a
+		href="#ref-SDMX">SDMX</a></cite>] - the ISO standard for exchanging and
+		sharing of statistical data and metadata among organisations - uses a
+		"multidimensional model" that caters for the specifics of modelling
+		statistics. It allows statistics to be described as observations.
+		Observations exhibit values (Measures) that depend on dimensions
+		(Members of Dimensions).
+	</p>
+	<p>Since the SDMX standard has proven applicable in many contexts,
+	the vocabulary adopts the multidimensional model that underlies SDMX
+	and will be compatible with SDMX.</p>
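+	<p>As an illustration, a single statistic expressed in the
+	multidimensional model might be sketched as follows (a hypothetical
+	pseudo-turtle example; the property and code names are illustrative
+	only, not normative):</p>
+	<pre>
+ex:obs a qb:Observation ;
+    ex:refPerson   ex:person1 ;  # dimension: who was observed
+    ex:refPeriod   "2012" ;      # dimension: when the observation was made
+    ex:weight      185 ;         # measure: the observed value
+    ex:unit        ex:pound .    # attribute: unit needed for interpretation
+	</pre>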
+
+ </section> </section>
+
+ <section>
+ <h2 id="terminology">Terminology</h2>
<p>
<dfn>Statistics</dfn>
is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of
- the collection, organization, analysis, and interpretation of data. A
- statistic is a statistical dataset.
+ the collection, organisation, analysis, and interpretation of data.
+ Statistics comprise statistical data.
</p>
<p>
- A
- <dfn>statistical dataset</dfn>
- comprises multidimensional data - a set of observed values organized
- along a group of dimensions, together with associated metadata. Basic
- structure of (aggregated) statistical data is a multidimensional table
- (also called a cube) <a href="#ref-SDMX">[SDMX]</a>.
+
+ The basic structure of
+ <dfn>statistical data</dfn>
+ is a multidimensional table (also called a data cube) [<cite><a
+ href="#ref-SDMX">SDMX</a></cite>], i.e., a set of observed values organized
+	along a group of dimensions, together with associated metadata.
+	Aggregated statistical data is referred to as "macro-data", whereas
+	unaggregated statistical data is referred to as "micro-data".
+	</p>
+ <p>
+ Statistical data can be collected in a
+ <dfn>dataset</dfn>
+		<dfn>dataset</dfn>, typically published and maintained by an
+		organisation [<cite><a
+ about the time of collection and publication or about the maintaining
+ and publishing organisation.
</p>
<p>
@@ -87,7 +178,7 @@
<p>
A
<dfn>publisher</dfn>
- is a person or organization that exposes source data as Linked Data on
+ is a person or organisation that exposes source data as Linked Data on
the Web.
</p>
@@ -96,374 +187,795 @@
<dfn>consumer</dfn>
is a person or agent that uses Linked Data from the Web.
</p>
-
+ <p>
+ A
+ <dfn>registry</dfn>
+ collects metadata about statistical data in a registration fashion.
+ </p>
</section>
<section>
- <h2>Use cases</h2>
- <p>
- This section presents scenarios that would be enabled by the existence
- of a standard vocabulary for the representation of statistics as
- Linked Data. Since a draft of the specification of the cube vocabulary
- has been published, and the vocabulary already is in use, we will call
- this standard vocabulary after its current name RDF Data Cube
- vocabulary (short <a href="#ref-QB">[QB]</a>) throughout the document.
- </p>
- <p>We distinguish between use cases of publishing statistical data,
- and use cases of consuming statistical data since requirements for
- publishers and consumers of statistical data differ.</p>
- <section>
- <h3>Publishing statistical data</h3>
+ <h2 id="usecases">Use cases</h2>
+ <p>This section presents scenarios that are enabled by the
+ existence of a standard vocabulary for the representation of
+ statistics as Linked Data.</p>
<section>
- <h4>Publishing general statistics in a machine-readable and
- application-independent way (UC 1)</h4>
- <p>More and more organizations want to publish statistics on the
+ <h3 id="SDMXWebDisseminationUseCase">SDMX Web Dissemination Use
+ Case</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from SDMX Web
+ Dissemination Use Case [<cite><a href="#ref-SDMX-21">SDMX
+ 2.1</a></cite>])
+ </span>
+ </p>
+ <p>Since we have adopted the multidimensional model that underlies
+	SDMX, we also adopt the "Web Dissemination Use Case", which is the
+	prime use case for SDMX since it is an increasingly popular use of SDMX
+ and enables organisations to build a self-updating dissemination
+ system.</p>
+	<p>The Web Dissemination Use Case involves three actors: a
+	structural metadata web service (registry) that collects metadata
+	about statistical data in a registration fashion; a data web service
+	(publisher) that publishes statistical data and its metadata as
+	registered in the structural metadata web service; and a data
+	consumption application (consumer) that first discovers data from the
+	registry, then queries data from the corresponding publisher of
+	selected data, and finally visualises the data.</p>
+ <p>In the following, we illustrate the processes from this use case
+ in a flow diagram by SDMX and describe what activities are enabled in
+ this use case by having statistics described in a machine-readable
+ format.</p>
+
+ <p class="caption">
+ Figure: Process flow diagram by SDMX [<cite><a
+ href="#ref-SDMX-21">SDMX 2.1</a></cite>]
+ </p>
+
+ <p align="center">
+ <img alt="SDMX Web Dissemination Use Case"
+		src="./figures/SDMX_Web_Dissemination_Use_Case.png" width="1000"></img>
+ </p>
+ <p>Benefits:</p>
+ <ul>
+ <li>A structural metadata source (registry) can collect metadata
+ about statistical data.</li>
+
+ <li>A data web service (publisher) can register statistical data
+ in a registry, and can provide statistical data from a database and
+ metadata from a metadata repository for consumers. For that, the
+		publisher creates database tables (see 1 in the figure) and loads
+		statistical data into a database and metadata into a metadata repository.</li>
+
+		<li>A consumer can discover data from a registry (3) and can
+		automatically create a query to the publisher for selected
+ statistical data (4).</li>
+
+		<li>The publisher can translate the query into queries against its
+		database (5) and metadata repository (6) and return the
+		statistical data and metadata.</li>
+
+ <li>The consumer can visualise the returned statistical data and
+ metadata.</li>
+ </ul>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There
+ should be a recommended way to communicate the availability of
+ published statistical data to external parties and to allow
+ automatic discovery of statistical data</a></li>
+ </ul>
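+	<p>One conceivable way to meet this discovery requirement is to
+	register a small metadata description of each dataset, for instance
+	along the lines of the VoID vocabulary (a hypothetical sketch; the
+	dataset and endpoint names are illustrative only):</p>
+	<pre>
+ex:populationDataset a void:Dataset ;
+    dcterms:title       "Population statistics" ;
+    dcterms:publisher   ex:statisticsOffice ;
+    void:sparqlEndpoint &lt;http://example.org/sparql&gt; .
+	</pre>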
+
+
+ <p>The SDMX Web Dissemination Use Case can be concretised by
+ several sub-use cases, detailed in the following sections.</p>
+
+ </section> <section>
+ <h3 id="UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
+ Use Case: UK government financial data from Combined Online
+ Information System (COINS)</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has been
+ summarised from Ian Dickinson et al. [<cite><a
+ href="#ref-COINS">COINS</a></cite>])
+ </span>
+ </p>
+ <p>More and more organisations want to publish statistics on the
web, for reasons such as increasing transparency and trust. Although
in the ideal case, published data can be understood by both humans and
machines, data often is simply published as CSV, PDF, XSL etc.,
lacking elaborate metadata, which makes free usage and analysis
difficult.</p>
-
- <p>The goal in this use case is to use a machine-readable and
- application-independent description of common statistics with use of
- open standards. The use case is fulfilled if QB will be a Linked Data
- vocabulary for encoding statistical data that has a hypercube
- structure and as such can describe common statistics in a
- machine-readable and application-independent way.</p>
-
+ <p>Therefore, the goal in this use case is to use a
+ machine-readable and application-independent description of common
+	statistics using open standards, to foster usage and innovation
+ on the published data.</p>
<p>
- An example scenario of this use case has been to publish the Combined
- Online Information System (<a
- href="http://data.gov.uk/resources/coins">COINS</a>). There, HM
- Treasury, the principal custodian of financial data for the UK
- government, released previously restricted information from its
- Combined Online Information System (COINS). Five data files were
- released containing between 3.3 and 4.9 million rows of data. The
- COINS dataset was translated into RDF for two reasons:
+ In the "COINS as Linked Data" project [<cite><a
+ href="#ref-COINS">COINS</a></cite>], the Combined Online Information System
+ (COINS) shall be published using a standard Linked Data vocabulary.
</p>
-
- <ol>
- <li>To publish statistics (e.g., as data files) are too large to
- load into widely available analysis tools such as Microsoft Excel, a
- common tool-of-choice for many data investigators.</li>
- <li>COINS is a highly technical information source, requiring
- both domain and technical skills to make useful applications around
- the data.</li>
- </ol>
- <p>Publishing statistics is challenging for the several reasons:</p>
<p>
- Representing observations and measurements requires more complex
- modeling as discussed by Martin Fowler <a href="#Fowler1997">[Fowler,
- 1997]</a>: Recording a statistic simply as an attribute to an object
- (e.g., a the fact that a person weighs 185 pounds) fails with
- representing important concepts such as quantity, measurement, and
- observation.
+ Via the Combined Online Information System (COINS), <a
+ href="http://www.hm-treasury.gov.uk/psr_coins_data.htm">HM
+ Treasury</a>, the principal custodian of financial data for the UK
+ government, releases previously restricted financial information about
+	government spending.
</p>
- <p>Quantity comprises necessary information to interpret the value,
- e.g., the unit and arithmetical and comparative operations; humans and
- machines can appropriately visualize such quantities or have
- conversions between different quantities.</p>
-
- <p>A Measurement separates a quantity from the actual event at
- which it was collected; a measurement assigns a quantity to a specific
- phenomenon type (e.g., strength). Also, a measurement can record
- metadata such as who did the measurement (person), and when was it
- done (time).</p>
- <p>Observations, eventually, abstract from measurements only
- recording numeric quantities. An Observation can also assign a
- category observation (e.g., blood group A) to an observation. Figure
- demonstrates this relationship.</p>
+	<p>According to the COINS as Linked Data project, the reasons for
+	publishing COINS as Linked Data are threefold:</p>
+	<ul>
+		<li>using an open standard representation makes it easier to work
+		with the data with available technologies and promises innovative
+		third-party tools and usages</li>
+		<li>individual transactions and groups of transactions are
+		given an identity, and so can be referenced by web address (URL),
+		to allow them to be discussed, annotated, or listed as source data
+		for articles or visualizations</li>
+		<li>cross-links between linked-data datasets allow for much
+		richer exploration of related datasets</li>
+	</ul>
+	<p>Several characteristics of the COINS data are worth noting:</p>
+	<ul>
+		<li>The COINS data has a hypercube structure. It describes
+		financial transactions using seven independent dimensions (time,
+		data-type, department etc.) and one dependent measure (value). Also,
+		it allows thirty-three attributes that may further describe each
+		transaction. For further information, see the "COINS as Linked Data"
+		project website.</li>
+		<li>COINS is an example of one of the more complex statistical
+		datasets being published via data.gov.uk. Part of the complexity
+		arises from the nature of the data being released.</li>
+		<li>The published COINS datasets cover expenditure related to
+		five different years (2005–06 to 2009–10). The actual COINS database
+		at HM Treasury is updated daily. In principle at least, multiple
+		snapshots of the COINS data could be released through the year.</li>
+	</ul>
+
+ <p>The COINS use case leads to the following challenges:</p>
+ <ul>
+ <li>The actual data and its hypercube structure are to be
+ represented separately so that an application first can examine the
+ structure before deciding to download the actual data, i.e., the
+ transactions. The hypercube structure also defines for each dimension
+ and attribute a range of permitted values that are to be represented.</li>
+		<li>An access or query interface to the COINS data, e.g., via a
+		SPARQL endpoint or the Linked Data API, is planned. Queries that are
+		expected to be interesting are: "spending for one department", "total
+		spending by department", and "retrieving all data for a given
+		observation".</li>
+		<li>Also, the publisher favours a representation that is both as
+		self-descriptive as possible, i.e., others can link to and download
+		fully-described individual transactions, and as compact as possible,
+		i.e., information is not unnecessarily repeated.</li>
+ <li>Moreover, the publisher is thinking about the possible
+ benefit of publishing slices of the data, e.g., datasets that fix all
+ dimensions but the time dimension. For instance, such slices could be
+ particularly interesting for visualisations or comments. However,
+ depending on the number of Dimensions, the number of possible slices
+ can become large which makes it difficult to select all interesting
+ slices.</li>
+ <li>An important benefit of linked data is that we are able to
+ annotate data, at a fine-grained level of detail, to record
+ information about the data itself. This includes where it came from –
+ the provenance of the data – but could include annotations from
+ reviewers, links to other useful resources, etc. Being able to trust
+ that data to be correct and reliable is a central value for
+ government-published data, so recording provenance is a key
+ requirement for the COINS data.</li>
+ <li>A challenge also is the size of the data, especially since it
+ is updated regularly. Five data files already contain between 3.3 and
+ 4.9 million rows of data.</li>
+ </ul>
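+	<p>As an illustration of the slicing idea mentioned above, a slice
+	that fixes every dimension except time could be sketched as follows
+	(hypothetical pseudo-turtle; these are not actual COINS identifiers):</p>
+	<pre>
+ex:slice1 a qb:Slice ;
+    ex:department  ex:dept-health ;                       # fixed dimension
+    ex:dataType    ex:outturn ;                           # fixed dimension
+    qb:observation ex:obs2005, ex:obs2006, ex:obs2007 .   # varies over time
+	</pre>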
+	<p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary
+ should clarify the use of subsets of observations</a></li>
+ </ul>
+
+ </section> <section>
+ <h3 id="PublishingExcelSpreadsheetsasLinkedData">Publisher Use
+ Case: Publishing Excel Spreadsheets as Linked Data</h3>
<p>
- <div class="fig">
- <a href="figures/modeling_quantity_measurement_observation.png"><img
- src="figures/modeling_quantity_measurement_observation.png"
- alt="Modeling quantity, measurement, observation" /> </a>
- <div>Modeling quantity, measurement, observation</div>
- </div>
- </div>
+ <span style="font-size: 10pt">(Part of this use case has been
+ contributed by Rinke Hoekstra. See <a
+ href="http://ehumanities.nl/ceda_r/">CEDA_R</a> and <a
+ href="http://www.data2semantics.org/">Data2Semantics</a> for more
+ information.)
+ </span>
</p>
- <p>QB deploys the multidimensional model (made of observations with
- Measures depending on Dimensions and Dimension Members, and further
- contextualized by Attributes) and should cater for these complexity in
- modelling.</p>
- <p>Another challenge is that for brevity reasons and to avoid
- repetition, it is useful to have abbreviation mechanisms such as
- assigning overall valid properties of observations at the dataset or
- slice level, and become implicitly part of each observation. For
- instance, in the case of COINS, all of the values are in thousands of
- pounds sterling. However, one of the use cases for the linked data
- version of COINS is to allow others to link to individual
- observations, which suggests that these observations should be
- standalone and self-contained – and should therefore have explicit
- multipliers and units on each observation. One suggestion is to author
- data without the duplication, but have the data publication tools
- "flatten" the compact representation into standalone observations
- during the publication process.</p>
- <p>A further challenge is related to slices of data. Slices of data
- group observations that are of special interest, e.g., slices
- unemployment rates per year of a specific gender are suitable for
- direct visualization in a line diagram. However, depending on the
- number of Dimensions, the number of possible slices can become large
- which makes it difficult to select all interesting slices. Therefore,
- and because of their additional complexity, not many publishers create
- slices. In fact, it is somewhat unclear at this point which slices
- through the data will be useful to (COINS-RDF) users.</p>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
-
- </section> <section>
- <h4>Publishing one or many MS excel spreadsheet files with
- statistical data on the web (UC 2)</h4>
<p>Not only in government, there is a need to publish considerable
amounts of statistical data to be consumed in various (also
unexpected) application scenarios. Typically, Microsoft Excel sheets
are made available for download. Those excel sheets contain single
spreadsheets with several multidimensional data tables, having a name
and notes, as well as column values, row values, and cell values.</p>
- <p>The goal in this use case is to to publish spreadsheet
- information in a machine-readable format on the web, e.g., so that
- crawlers can find spreadsheets that use a certain column value. The
- published data should represent and make available for queries the
- most important information in the spreadsheets, e.g., rows, columns,
- and cell values. QB should provide the level of detail that is needed
- for such a transformation in order to fulfil this use case.</p>
- <p>In a possible use case scenario an institution wants to develop
- or use a software that transforms their excel sheets into the
- appropriate format.</p>
+ <p>Benefits:</p>
+ <ul>
+		<li>The goal in this use case is to publish spreadsheet
+ information in a machine-readable format on the web, e.g., so that
+ crawlers can find spreadsheets that use a certain column value. The
+ published data should represent and make available for queries the
+ most important information in the spreadsheets, e.g., rows, columns,
+ and cell values.</li>
+ <li>For instance, in the <a href="http://ehumanities.nl/ceda_r/">CEDA_R</a>
+ and <a href="http://www.data2semantics.org/">Data2Semantics</a>
+ projects publishing and harmonizing Dutch historical census data
+ (from 1795 onwards) is a goal. These censuses are now only available
+ as Excel spreadsheets (obtained by data entry) that closely mimic the
+ way in which the data was originally published and shall be published
+ as Linked Data.
+ </li>
+ </ul>
+ <p>Challenges in this use case:</p>
- <p class="editorsnote">@@TODO: Concrete example needed.</p>
- <p>Challenges of this use case are:</p>
<ul>
+		<li>All context, and so all meaning, of a measurement point is
+		expressed by means of dimensions. The pure number is the star of an
+		ego-network of attributes or dimensions. In an RDF representation it
+		is then easily possible to define hierarchical relationships between
+		the dimensions (that can be exemplified further) as well as to map
+		different attributes across different value points. This way a
+		harmonization among variables is performed around the measurement
+		points themselves.</li>
+ <li>In historical research, until now, harmonization across
+ datasets is performed by hand, and in subsequent iterations of a
+ database: it is very hard to trace back the provenance of decisions
+ made during the harmonization procedure.</li>
+ <li>Combining Data Cube with SKOS [<cite><a
+ href="#ref-skos">SKOS</a></cite>] to allow for cross-location and
+ cross-time historical analysis
+ </li>
+ <li>Novel visualisation of census data</li>
+ <li>Integration with provenance vocabularies, e.g., PROV-O, for
+ tracking of harmonization steps</li>
+ <li>These challenges may seem to be particular to the field of
+ historical research, but in fact apply to government information at
+ large. Government is not a single body that publishes information at
+ a single point in time. Government consists of multiple (altering)
+ bodies, scattered across multiple levels, jurisdictions and areas.
+ Publishing government information in a consistent, integrated manner
+ requires exactly the type of harmonization required in this use case.</li>
<li>Excel sheets provide much flexibility in arranging
information. It may be necessary to limit this flexibility to allow
automatic transformation.</li>
- <li>There may be many spreadsheets.</li>
+ <li>There are many spreadsheets.</li>
<li>Semi-structured information, e.g., notes about lineage of
data cells, may not be possible to be formalized.</li>
</ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>
- Existing Work (optional): Stats2RDF uses OntoWiki to translate CSV
- into QB <a href="http://aksw.org/Projects/Stats2RDF">[Stats2RDF]</a>.
- </p>
+ <p>Existing work:</p>
+ <ul>
+ <li>Another concrete example is the <a
+ href="http://ontowiki.net/Projects/Stats2RDF?show_comments=1">Stats2RDF</a>
+ project that intends to publish biomedical statistical data that is
+ represented as Excel sheets. Here, Excel files are first translated
+		into CSV and then into RDF.
+ </li>
+ <li>Some of the challenges are met by the work on an ISO
+ Extension to SKOS [<cite><a href="#ref-xkos">XKOS</a></cite>].
+ </li>
+ </ul>
+
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary
+ should recommend a mechanism to support hierarchical code lists</a></li>
+ <li><a
+ href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There
+ should be a recommended way of declaring relations between cubes</a></li>
+ </ul>
+
</section> <section>
- <h4>Publishing SDMX as Linked Data (UC 3)</h4>
- <p>The ISO standard for exchanging and sharing statistical data and
- metadata among organizations is Statistical Data and Metadata eXchange
- (SDMX). Since this standard has proven applicable in many contexts, QB
- is designed to be compatible with the multidimensional model that
- underlies SDMX.</p>
- <p class="editorsnote">@@TODO: The QB spec should maybe also use
- the term "multidimensional model" instead of the less clear "cube
- model" term.</p>
- <p>Therefore, it should be possible to re-publish SDMX data using
- QB.</p>
+ <h3
+ id="PublishinghierarchicallystructureddatafromStatsWalesandOpenDataCommunities">Publisher
+ Use Case: Publishing hierarchically structured data from StatsWales
+ and Open Data Communities</h3>
<p>
- The scenario for this use case is Eurostat <a
- href="http://epp.eurostat.ec.europa.eu/">[EUROSTAT]</a>, which
+ <span style="font-size: 10pt">(Use case has been taken from [<cite><a
+ href="#ref-QB4OLAP">QB4OLAP</a></cite>] and from discussions at <a
+ href="http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f">publishing-statistical-data
+ mailing list</a>)
+ </span>
+ </p>
+
+	<p>Statistical data often contains some kind of 'overall' figure,
+	which is then broken down into parts.</p>
+
+ <p>Example (in pseudo-turtle RDF):</p>
+	<pre>
+ex:obs1
+    sdmx:refArea &lt;uk&gt; ;
+    sdmx:refPeriod "2011" ;
+    ex:population "60" .
+ex:obs2
+    sdmx:refArea &lt;england&gt; ;
+    sdmx:refPeriod "2011" ;
+    ex:population "50" .
+ex:obs3
+    sdmx:refArea &lt;scotland&gt; ;
+    sdmx:refPeriod "2011" ;
+    ex:population "5" .
+ex:obs4
+    sdmx:refArea &lt;wales&gt; ;
+    sdmx:refPeriod "2011" ;
+    ex:population "3" .
+ex:obs5
+    sdmx:refArea &lt;northernireland&gt; ;
+    sdmx:refPeriod "2011" ;
+    ex:population "2" .
+	</pre>
+
+ <p>
+ We are looking for the best way (in the context of the RDF/Data
+ Cube/SDMX approach) to express that the values for
+ England/Scotland/Wales/Northern Ireland ought to add up to the value
+ for the UK and constitute a more detailed breakdown of the overall UK
+ figure. Since we might also have population figures for France,
+ Germany, and the EU27, it is not as simple as just taking a
+ <code>qb:Slice</code>
+ where you fix the time period and the measure.
+ </p>
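+
+ <p>One conceivable approach, shown here purely as an illustrative
+ sketch (using SKOS; this is not a recommendation of a specific
+ mechanism), is to relate the area codes themselves in a hierarchical
+ code list:</p>
+ <pre>
+&lt;uk&gt; a skos:Concept ;
+  skos:narrower &lt;england&gt;, &lt;scotland&gt;, &lt;wales&gt;, &lt;northernireland&gt; .
+ </pre>
+ <p>A consumer could then recognise the observations for the narrower
+ areas as a breakdown of the observation for the broader area; note,
+ however, that SKOS alone does not assert that the values must add up.</p>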
+
+ <p>
+ Similarly, Etcheverry and Vaisman [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>]
+ present the use case to publish household data from <a
+ href="http://statswales.wales.gov.uk/index.htm">StatsWales</a> and <a
+ href="http://opendatacommunities.org/doc/dataset/housing/household-projections">Open
+ Data Communities</a>.
+ </p>
+
+ <p>This multidimensional data contains, for each fact, a time
+ dimension with a single level (Year) and a location dimension with
+ the levels Unitary Authority, Government Office Region, Country, and
+ ALL.</p>
+
+ <p>The unit of measurement is 1000 households.</p>
+
+ <p>In this use case, one wants to publish not only a dataset at the
+ bottommost level, i.e., the number of households at each Unitary
+ Authority in each year, but also datasets at more aggregated
+ levels.</p>
+
+ <p>For instance, in order to publish a dataset with the number of
+ households at each Government Office Region per year, one needs to
+ aggregate the measure of each fact having the same Government Office
+ Region using the SUM function.</p>
+
+ <p>Importantly, one would like to maintain the relationship between
+ the resulting datasets, i.e., the levels and aggregation functions.</p>
+
+ <p>Again, this use case does not simply need a selection (or "dice"
+ in OLAP context) where one fixes the time period dimension.</p>
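+
+ <p>The aggregation described above can be sketched as a SPARQL 1.1
+ query (the property names <code>ex:refArea</code>,
+ <code>ex:refPeriod</code>, <code>ex:households</code>, and
+ <code>ex:inGovernmentOfficeRegion</code> are assumptions made for
+ illustration only):</p>
+ <pre>
+SELECT ?region ?year (SUM(?h) AS ?households)
+WHERE {
+  ?obs ex:refArea ?ua ;
+       ex:refPeriod ?year ;
+       ex:households ?h .
+  ?ua ex:inGovernmentOfficeRegion ?region .
+}
+GROUP BY ?region ?year
+ </pre>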
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary
+ should recommend a mechanism to support hierarchical code lists</a></li>
+ </ul>
+
+
+ </section> <section>
+ <h3 id="PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
+ Use Case: Publishing slices of data about UK Bathing Water Quality</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case has been provided by
+ Epimorphics Ltd, in their <a
+ href="http://www.epimorphics.com/web/projects/bathing-water-quality">UK
+ Bathing Water Quality</a> deployment)
+ </span>
+ </p>
+ <p>
+ As part of their work with data.gov.uk and the UK Location Programme,
+ Epimorphics Ltd have been working to pilot the publication of both
+ current and historic bathing water quality information from the <a
+ href="http://www.environment-agency.gov.uk/">UK Environment
+ Agency</a> as Linked Data.
+ </p>
+ <p>The UK has a number of areas, typically beaches, that are
+ designated as bathing waters where people routinely enter the water.
+ The Environment Agency monitors and reports on the quality of the
+ water at these bathing waters.</p>
+ <p>The Environment Agency's data can be thought of as structured
+ in three groups:</p>
+ <ul>
+ <li>There is basic reference data describing the bathing waters
+ and sampling points</li>
+ <li>There is a data set "Annual Compliance Assessment Dataset"
+ giving the rating for each bathing water for each year it has been
+ monitored</li>
+ <li>There is a data set "In-Season Sample Assessment Dataset"
+ giving the detailed weekly sampling results for each bathing water</li>
+ </ul>
+ <p>The most important dimensions of the data are bathing water,
+ sampling point, and compliance classification.</p>
+ <p>Challenges:</p>
+ <ul>
+ <li>Observations may exhibit a number of attributes, e.g.,
+ whether there was an abnormal weather exception.</li>
+ <li>Relevant slices of both datasets are to be created:
+ <ul>
+ <li>Annual Compliance Assessment Dataset: all the observations
+ for a specific sampling point, all the observations for a specific
+ year.</li>
+ <li>In-Season Sample Assessment Dataset: samples for a given
+ sampling point, samples for a given week, samples for a given year,
+ samples for a given year and sampling point, latest samples for
+ each sampling point.</li>
+ <li>The use case suggests more arbitrary subsets of the
+ observations, e.g., collecting all the "latest" observations in a
+ continuously updated data set.</li>
+ </ul>
+
+
+ </li>
+ </ul>
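+ <p>A slice fixing a single sampling point could, as an illustrative
+ sketch (all URIs below are assumptions), be declared as follows:</p>
+ <pre>
+ex:slice-sp1 a qb:Slice ;
+  qb:sliceStructure ex:sliceBySamplingPoint ;
+  ex:samplingPoint &lt;sampling-point-1&gt; ;
+  qb:observation ex:obs1, ex:obs2 .
+ </pre>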
+ <p>Existing Work:</p>
+ <ul>
+ <li>The <a href="http://purl.oclc.org/NET/ssnx/ssn">Semantic
+ Sensor Network ontology</a> (SSN) already provides a way to publish
+ sensor information. SSN data provides statistical Linked Data and
+ grounds its data to the domain, e.g., sensors that collect
+ observations (e.g., sensors measuring average of temperature over
+ location and time).
+ </li>
+ <li>A number of organisations, particularly in the Climate and
+ Meteorological area, already have some commitment to the OGC
+ "Observations and Measurements" (O&amp;M) logical data model, also
+ published as ISO 19156.</li>
+ </ul>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Vocabulary
+ should define relationship to ISO19156 - Observations &amp; Measurements</a></li>
+ <li><a
+ href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary
+ should clarify the use of subsets of observations</a></li>
+ </ul>
+
+
+ </section> <section>
+ <h3 id="EurostatSDMXasLinkedData">Publisher Use Case: Eurostat
+ SDMX as Linked Data</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has been taken
+ from <a href="http://estatwrap.ontologycentral.com/">Eurostat
+ Linked Data Wrapper</a> and <a
+ href="http://eurostat.linked-statistics.org/">Linked Statistics
+ Eurostat Data</a>, both deployments for publishing Eurostat SDMX as
+ Linked Data using the draft version of the data cube vocabulary)
+ </span>
+ </p>
+
+ <p>
+ As mentioned already, the ISO standard for exchanging and sharing
+ statistical data and metadata among organisations is Statistical Data
+ and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>].
+ Since this standard has proven applicable in many contexts, we adopt
+ the multidimensional model that underlies SDMX and intend the standard
+ vocabulary to be compatible to SDMX.
+ </p>
+
+ <p>
+ Therefore, in this use case we intend to explain the benefit and
+ challenges of publishing SDMX data as Linked Data. As one of the main
+ adopters of SDMX, <a href="http://epp.eurostat.ec.europa.eu/">Eurostat</a>
publishes large amounts of European statistics coming from a data
warehouse as SDMX and other formats on the web. Eurostat also provides
an interface to browse and explore the datasets. However, linking such
multidimensional data to related data sets and concepts would require
- download of interesting datasets and manual integration.
- </p>
- <p>The goal of this use case is to improve integration with other
- datasets; Eurostat data should be published on the web in a
- machine-readable format, possible to be linked with other datasets,
- and possible to be freeley consumed by applications. This use case is
- fulfilled if QB can be used for publishing the data from Eurostat as
- Linked Data for integration.</p>
- <p>A publisher wants to make available Eurostat data as Linked
- Data. The statistical data shall be published as is. It is not
- necessary to represent information for validation. Data is read from
- tsv only. There are two concrete examples of this use case: Eurostat
- Linked Data Wrapper (http://estatwrap.ontologycentral.com/), and
- Linked Statistics Eurostat Data
- (http://eurostat.linked-statistics.org/). They have slightly different
- focus (e.g., with respect to completeness, performance, and agility).
+ downloading of interesting datasets and manual integration. The goal
+ here is to improve integration with other datasets; Eurostat data
+ should be published on the web in a machine-readable format that can
+ be linked with other datasets and freely consumed
+ by applications. Both <a href="http://estatwrap.ontologycentral.com/">Eurostat
+ Linked Data Wrapper</a> and <a
+ href="http://eurostat.linked-statistics.org/">Linked Statistics
+ Eurostat Data</a> intend to publish <a
+ href="http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/">Eurostat
+ SDMX data</a> as <a href="http://5stardata.info/">5-star Linked Open
+ Data</a>. Eurostat data is partly published as SDMX, partly as tabular
+ data (TSV, similar to CSV). Eurostat provides a <a
+ href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&amp;file=table_of_contents_en.xml">TOC
+ of published datasets</a> as well as a feed of modified and new datasets.
+
+ Eurostat provides a list of used codelists, i.e., <a
+ href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&amp;dir=dic">range
+ of permitted dimension values</a>. Any Eurostat dataset contains a
+ varying set of dimensions (e.g., date, geo, obs_status, sex, unit) as
+ well as measures (generic value, content is specified by dataset,
+ e.g., GDP per capita in PPS, Total population, Employment rate by
+ sex).
</p>
- <p>Challenges of this use case are:</p>
- <ul>
- <li>There are large amounts of SDMX data; the Eurostat dataset
- comprises 350 GB of data. This may influence decisions about toolsets
- and architectures to use. One important task is to decide whether to
- structure the data in separate datasets.</li>
- <li>Again, the question comes up whether slices are useful.</li>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </section> <section>
- <h4>Publishing sensor data as statistics (UC 4)</h4>
- <p>Typically, multidimensional data is aggregated. However, there
- are cases where non-aggregated data needs to be published, e.g.,
- observational, sensor network and forecast data sets. Such raw data
- may be available in RDF, already, but using a different vocabulary.</p>
- <p>The goal of this use case is to demonstrate that publishing of
- aggregate values or of raw data should not make much of a difference
- in QB.</p>
- <p>
- For example the Environment Agency uses it to publish (at least
- weekly) information on the quality of bathing waters around England
- and Wales <A
- href="http://www.epimorphics.com/web/wiki/bathing-water-quality-structure-published-linked-data">[EnvAge]</A>.
- In another scenario DERI tracks from measurements about printing for a
- sustainability report. In the DERI scenario, raw data (number of
- printouts per person) is collected, then aggregated on a unit level,
- and then modelled using QB.
- </p>
- <p>Problems and Limitations:</p>
- <ul>
- <li>This use case also shall demonstrate how to link statistics
- with other statistics or non-statistical data (metadata).</li>
- </ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>
- Existing Work (optional): Semantic Sensor Network ontology <A
- href="http://purl.oclc.org/NET/ssnx/ssn">[SSN]</A> already provides a
- way to publish sensor information. SSN data provides statistical
- Linked Data and grounds its data to the domain, e.g., sensors that
- collect observations (e.g., sensors measuring average of temperature
- over location and time). A number of organizations, particularly in
- the Climate and Meteorological area already have some commitment to
- the OGC "Observations and Measurements" (O&M) logical data model, also
- published as ISO 19156. The QB spec should maybe also prefer the term
- "multidimensional model" instead of the less clear "cube model" term.
-
- <p class="editorsnote">@@TODO: Are there any statements about
- compatibility and interoperability between O&M and Data Cube that can
- be made to give guidance to such organizations?</p>
- </p>
- </section> <section>
- <h4>Registering statistical data in dataset catalogs (UC 5)</h4>
- <p>
- After statistics have been published as Linked Data, the question
- remains how to communicate the publication and let users find the
- statistics. There are catalogs to register datasets, e.g., CKAN, <a
- href="http://www.datacite.org/datacite.org">datacite.org</a>, <a
- href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
- href="http://pangaea.de/">Pangea</a>. Those catalogs require specific
- configurations to register statistical data.
- </p>
- <p>The goal of this use case is to demonstrate how to expose and
- distribute statistics after modeling using QB. For instance, to allow
- automatic registration of statistical data in such catalogs, for
- finding and evaluating datasets. To solve this issue, it should be
- possible to transform QB data into formats that can be used by data
- catalogs.</p>
+ <p>Benefits:</p>
- <p class="editorsnote">@@TODO: Find specific use case scenario or
- ask how other publishers of QB data have dealt with this issue Maybe
- relation to DCAT?</p>
- <p>Problems and Limitations: -</p>
- <p>Unanticipated Uses (optional): If data catalogs contain
- statistics, they do not expose those using Linked Data but for
- instance using CSV or HTML (Pangea [11]). It could also be a use case
- to publish such data using QB.</p>
- <p>Existing Work (optional): -</p>
+ <ul>
+ <li>Possible implementation of ETL pipelines based on Linked Data
+ technologies (e.g., <a href="http://code.google.com/p/ldspider/">LDSpider</a>)
+ to effectively load the data into a data warehouse for analysis
+ </li>
+
+ <li>Allows useful queries over the data, e.g., comparison of
+ statistical indicators across EU countries.</li>
+
+ <li>Allows attaching contextual information to statistics during
+ the interpretation process.</li>
+
+ <li>Allows reusing individual observations from the data.</li>
+
+ <li>Allows linking to information from other data sources, e.g.,
+ for the geo-spatial dimension.</li>
+ </ul>
+
+ <p>Challenges:</p>
+
+ <ul>
+ <li>New datasets are added to Eurostat regularly. The
+ Linked Data representation should automatically provide access to the
+ most up-to-date data.</li>
+
+ <li>How to match elements of the geo-spatial dimension to
+ elements of other data sources, e.g., NUTS, GADM.</li>
+
+ <li>There is a large number of Eurostat datasets, each possibly
+ containing a large number of columns (dimensions) and rows
+ (observations). Eurostat publishes more than 5200 datasets which,
+ when converted into RDF, require more than 350 GB of disk space,
+ yielding a dataspace with some 8 billion triples.</li>
+
+ <li>In the Eurostat Linked Data Wrapper, there is a timeout for
+ transforming SDMX to Linked Data, since Google App Engine is used.
+ Mechanisms to reduce the amount of data that needs to be translated
+ would be needed.</li>
+
+ <li>Provide a useful interface for browsing and visualising the
+ data. One problem is that the datasets have too high a
+ dimensionality to be displayed directly. Instead, one could
+ visualise slices of time series data. However, for that, one would
+ need to either fix most other dimensions (e.g., sex) or aggregate
+ over them (e.g., via average). The selection of useful slices from
+ the large number of possible slices is a challenge.</li>
+
+ <li>Each dimension used by a dataset has a range of permitted
+ values that need to be described.</li>
+
+ <li>The Eurostat SDMX as Linked Data use case suggests having
+ time lines on data aggregating over the gender dimension.</li>
+
+ <li>The Eurostat SDMX as Linked Data use case suggests providing
+ data both at the gender level and at a level aggregating over the
+ gender dimension.</li>
+
+ <li>Updates to the data
+
+ <ul>
+ <li>Eurostat - Linked Data pulls in changes from the original
+ Eurostat dataset on a weekly basis; the conversion process runs every
+ Saturday at noon, taking into account new datasets along with
+ updates to existing datasets.</li>
+ <li>Eurostat Linked Data Wrapper translates Eurostat datasets
+ into RDF on the fly so that the most current data is always used. The
+ remaining problem is to point users towards the URIs of Eurostat
+ datasets: Estatwrap provides a feed of modified and new <a
+ href="http://estatwrap.ontologycentral.com/feed.rdf">datasets</a>.
+ Also, it provides a <a
+ href="http://estatwrap.ontologycentral.com/table_of_contents.html">TOC</a>
+ that could be automatically updated from the <a
+ href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&amp;file=table_of_contents_en.xml">Eurostat
+ TOC</a>.
+ </li>
+ </ul>
+
+
+ </li>
+
+ <li>Query interface:
+ <ul>
+ <li>Eurostat - Linked Data provides a SPARQL endpoint for the
+ metadata (not the observations).</li>
+ <li>Eurostat Linked Data Wrapper allows and demonstrates how to
+ use Qcrumb.com to query the data.</li>
+ </ul>
+ </li>
+
+ <li>Browsing and visualising interface:
+ <ul>
+ <li>Eurostat Linked Data Wrapper provides for each dataset an
+ HTML page showing a visualisation of the data.</li>
+ </ul>
+
+
+ </li>
+ </ul>
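+
+ <p>A time-series slice of the kind discussed above could, as a
+ sketch (the dimension URIs below are assumptions), fix every
+ dimension except the reference period:</p>
+ <pre>
+ex:slice-unemployment-de-total a qb:Slice ;
+  qb:sliceStructure ex:sliceByTime ;
+  ex:sex ex:sex-T ;
+  ex:geo ex:DE .
+ </pre>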
+
+ <p>Non-requirements:</p>
+ <ul>
+ <li>One possible application would run validation checks over
+ Eurostat data. The standard vocabulary is intended to publish the
+ Eurostat data as-is; it is not intended to represent information for
+ validation (similar to business rules).</li>
+ <li>Information of how to match elements of the geo-spatial
+ dimension to elements of other data sources, e.g., NUTS, GADM, is not
+ part of a vocabulary recommendation.</li>
+ </ul>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a href="#VocabularyshouldbuildupontheSDMXinformationmodel">There
+ should be mechanisms and recommendations regarding publication and
+ consumption of large amounts of statistical data</a></li>
+ <li><a
+ href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">There
+ should be a recommended mechanism to allow for publication of
+ aggregates which cross multiple dimensions</a></li>
+ </ul>
</section> <section>
- <h4>Making transparent transformations on or different versions of
- statistical data (UC 6)</h4>
- <p>Statistical data often is used and further transformed for
- analysis and reporting. There is the risk that data has been
- incorrectly transformed so that the result is not interpretable any
- more. Therefore, if statistical data has been derived from other
- statistical data, this should be made transparent.</p>
- <p>The goal of this use case is to describe provenance and
- versioning around statistical data, so that the history of statistics
- published on the web becomes clear. This may also relate to the issue
- of having relationships between datasets published using QB. To fulfil
- this use case QB should recommend specific approaches to transforming
- and deriving of datasets which can be tracked and stored with the
- statistical data.</p>
+ <h3 id="Representingrelationshipsbetweenstatisticaldata">Publisher
+ Use Case: Representing relationships between statistical data</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has mainly been
+ taken from the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>])
+ </span>
+ </p>
- <p>A simple specific use case is that the Welsh Assembly government
+ <p>In several applications, relationships between statistical data
+ need to be represented.</p>
+
+ <p>The goal of this use case is to describe provenance,
+ transformations, and versioning around statistical data, so that the
+ history of statistics published on the web becomes clear. This may
+ also relate to the issue of having relationships between datasets
+ published.</p>
+
+ <p>
+ For instance, the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>]
+ has at least four perspectives on what they mean by “COINS” data: the
+ abstract notion of “all of COINS”, the data for a particular year, the
+ version of the data for a particular year released on a given date,
+ and the constituent graphs which hold both the authoritative data
+ translated from HMT’s own sources and additional supplementary
+ information derived from the data, for example by
+ cross-linking to other datasets.
+ </p>
+
+ <p>Another specific use case is that the Welsh Assembly government
publishes a variety of population datasets broken down in different
ways. For many uses, the population broken down by some category (e.g.
ethnicity) is expressed as a percentage. Separate datasets give the
actual counts per category and aggregate counts. In such cases it is
common to talk about the denominator (often DENOM) which is the
aggregate count against which the percentages can be interpreted.</p>
- <p>Challenges of this use case are:</p>
+
+ <p>
+ Another example for representing relationships between statistical
+ data are transformations on datasets, e.g., addition of derived
+ measures, conversion of units, aggregations, OLAP operations, and
+ enrichment of statistical data. A concrete example is given by Freitas
+ et al. [<cite><a href="#ref-COGS">COGS</a></cite>] and illustrated in
+ the following figure.
+ </p>
+
+ <p class="caption">Figure: Illustration of ETL workflows to process
+ statistics</p>
+
+ <p align="center">
+ <img alt="COGS relationships between statistics example"
+ src="./figures/Relationships_Statistical_Data_Cogs_Example.png"></img>
+ </p>
+
+ <p>Here, numbers from a sustainability report have been created by
+ a number of transformations applied to statistical data. Different
+ numbers (e.g., 600 for year 2009 and 503 for year 2010) might have
+ been created differently, leading to different reliability when
+ comparing the two numbers.</p>
+ <p>Benefits:</p>
+
+ <p>Making transparent the transformations a dataset has undergone
+ increases trust in the data.</p>
+
+ <p>Challenges:</p>
+
<ul>
<li>Operations on statistical data result in new statistical
- data, depending on the operation. For intance, in terms of Data Cube,
- operations such as slice, dice, roll-up, drill-down will result in
- new Data Cubes. This may require representing general relationships
- between cubes (as discussed here: [12]).</li>
+ data, depending on the operation. For instance, in terms of Data
+ Cube, operations such as slice, dice, roll-up, drill-down will result
+ in new Data Cubes. This may require representing general
+ relationships between cubes (as discussed in the <a
+ href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data
+ mailing list</a>).
+ </li>
<li>Should Data Cube support explicit declaration of such
relationships either between separated qb:DataSets or between
- measures with a single qb:DataSet (e.g. ex:populationCount and
- ex:populationPercent)?</li>
+ measures with a single <code>qb:DataSet</code> (e.g. <code>ex:populationCount</code>
+ and <code>ex:populationPercent</code>)?
+ </li>
<li>If so should that be scoped to simple, common relationships
like DENOM or allow expression of arbitrary mathematical relations?</li>
</ul>
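+
+ <p>A scoped declaration of the DENOM relationship could, purely as a
+ sketch with a hypothetical property <code>ex:denominator</code>, look
+ as follows:</p>
+ <pre>
+ex:populationPercent a qb:MeasureProperty ;
+  ex:denominator ex:populationCount .
+ </pre>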
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): Possible relation to Best Practices
- part on Versioning [13], where it is specified how to publish data
- which has multiple versions.</p>
-
- </section></section> <section>
- <h3>Consuming published statistical data</h3>
+ <p>Existing Work:</p>
+ <ul>
+ <li>Possible relation to <a
+ href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary#Versioning">Versioning</a>
+ part of the GLD Best Practices Document, which specifies how to
+ publish data that has multiple versions.
+ </li>
+ <li>The <a href="http://sites.google.com/site/cogsvocab/">COGS</a>
+ vocabulary [<cite><a href="#ref-COGS">COGS</a></cite>] is related to
+ this use case since it may complement the standard vocabulary for
+ representing ETL pipelines processing statistics.
+ </li>
+ </ul>
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There
+ should be a recommended way of declaring relations between cubes</a></li>
+ </ul>
- <section>
- <h4>Simple chart visualizations of (integrated) published
- statistical datasets (UC 7)</h4>
+ </section> <section>
+ <h3 id="Simplechartvisualisationsofpublishedstatisticaldata">Consumer
+ Use Case: Simple chart visualisations of (integrated) published
+ statistical data</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a
+ href="http://www.iwrm-smart.org/">SMART research project</a>)
+ </span>
+ </p>
+
<p>Data that is published on the Web is typically visualized by
transforming it manually into CSV or Excel and then creating a
visualization on top of these formats using Excel, Tableau,
RapidMiner, Rattle, Weka etc.</p>
<p>This use case shall demonstrate how statistical data published
- on the web can be directly visualized, without using commercial or
- highly-complex tools. This use case is fulfilled if data that is
- published in QB can be directly visualized inside a webpage.</p>
- <p>An example scenario is environmental research done within the
- SMART research project (http://www.iwrm-smart.org/). Here, statistics
- about environmental aspects (e.g., measurements about the climate in
- the Lower Jordan Valley) shall be visualized for scientists and
- decision makers. Statistics should also be possible to be integrated
- and displayed together. The data is available as XML files on the web.
- On a separate website, specific parts of the data shall be queried and
- visualized in simple charts, e.g., line diagrams. The following figure
- shows the wanted display of an environmental measure over time for
- three regions in the lower Jordan valley; displayed inside a web page:</p>
-
+ on the web can be visualized inside a webpage with little effort,
+ without using commercial or highly complex tools.</p>
<p>
- <div class="fig">
- <a href="figures/Level_above_msl_3_locations.png"><img
- width="800px" src="figures/Level_above_msl_3_locations.png"
- alt="Line chart visualization of QB data" /> </a>
- <div>Line chart visualization of QB data</div>
- </div>
- </div>
+ An example scenario is environmental research done within the <a
+ href="http://www.iwrm-smart.org/">SMART research project</a>. Here,
+ statistics about environmental aspects (e.g., measurements about the
+ climate in the Lower Jordan Valley) shall be visualized for scientists
+ and decision makers. It should also be possible to integrate and
+ display statistics together. The data is available as XML files
+ on the web. On a separate website, specific parts of the data shall be
+ queried and visualized in simple charts, e.g., line diagrams.
</p>
- <p>The following figure shows the same measures in a pivot table.
- Here, the aggregate COUNT of measures per cell is given.</p>
+ <p class="caption">Figure: HTML embedded line chart of an
+ environmental measure over time for three regions in the lower Jordan
+ valley</p>
- <p>
- <div class="fig">
- <a href="figures/pivot_analysis_measurements.PNG"><img
- src="figures/pivot_analysis_measurements.PNG"
- alt="Pivot analysis measurements" /> </a>
- <div>Pivot analysis measurements</div>
- </div>
- </div>
+ <p align="center">
+ <img
+ alt="display of an environmental measure over time for three regions in the lower Jordan valley"
+ src="./figures/Level_above_msl_3_locations.png" width="1000"></img>
</p>
- <p>The use case uses Google App Engine, Qcrumb.com, and Spark. An
- example of a line diagram is given at [14] (some loading time needed).
- Current work tries to integrate current datasets with additional data
- sources, and then having queries that take data from both datasets and
- display them together.</p>
+ <p class="caption">Figure: Showing the same data in a pivot table.
+ Here, the aggregate COUNT of measures per cell is given.</p>
+ <p align="center">
+ <img
+ alt="Figure: Showing the same data in a pivot
+ table. Here, the aggregate COUNT of measures per cell is given."
+ src="./figures/pivot_analysis_measurements.PNG"></img>
+ </p>
<p>Challenges of this use case are:</p>
<ul>
<li>The difficulties lie in structuring the data appropriately so
@@ -473,347 +985,533 @@
be represented.</li>
<li>Integration becomes much more difficult if publishers use
different measures, dimensions.</li>
-
</ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</a></li>
+ </ul>
+
</section> <section>
- <h4>Uploading published statistical data in Google Public Data
- Explorer (UC 8)</h4>
- <p>Google Public Data Explorer (GPDE -
- http://code.google.com/apis/publicdata/) provides an easy possibility
- to visualize and explore statistical data. Data needs to be in the
- Dataset Publishing Language (DSPL -
- https://developers.google.com/public-data/overview) to be uploaded to
- the data explorer. A DSPL dataset is a bundle that contains an XML
- file, the schema, and a set of CSV files, the actual data. Google
- provides a tutorial to create a DSPL dataset from your data, e.g., in
- CSV. This requires a good understanding of XML, as well as a good
- understanding of the data that shall be visualized and explored.</p>
- <p>In this use case, it shall be demonstrate how to take any
- published QB dataset and to transform it automatically into DSPL for
- visualization and exploration. A dataset that is published conforming
- to QB will provide the level of detail that is needed for such a
- transformation.</p>
- <p>In an example scenario, a publisher P has published data using
- QB. There are two different ways to fulfil this use case: 1) A
- customer C is downloading this data into a triple store; SPARQL
- queries on this data can be used to transform the data into DSPL and
- uploaded and visualized using GPDE. 2) or, one or more XLST
- transformation on the RDF/XML transforms the data into DSPL.</p>
+ <h3 id="VisualisingpublishedstatisticaldatainGooglePublicDataExplorer">Consumer
+ Use Case: Visualising published statistical data in Google Public Data
+ Explorer</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a
+ href="http://code.google.com/apis/publicdata/">Google Public Data
+ Explorer (GPDE)</a>)
+ </span>
+ </p>
+ <p>
+ <a href="http://code.google.com/apis/publicdata/">Google Public
+ Data Explorer</a> (GPDE) provides an easy way to visualize and
+ explore statistical data. Data needs to be in the <a
+ href="https://developers.google.com/public-data/overview">Dataset
+ Publishing Language</a> (DSPL) to be uploaded to the data explorer. A
+ DSPL dataset is a bundle that contains an XML file, the schema, and a
+ set of CSV files, the actual data. Google provides a tutorial to
+ create a DSPL dataset from your data, e.g., in CSV. This requires a
+ good understanding of XML, as well as a good understanding of the data
+ that shall be visualized and explored.
+ </p>
+ <p>In this use case, the goal is to take statistical data published
+ on the web and to transform it into DSPL for visualization and
+ exploration with as little effort as possible.</p>
+ <p>For instance, Eurostat data about the unemployment rate can be
+ downloaded from the web and visualised as shown in the following
+ figure:</p>
+
+ <p class="caption">Figure: An interactive chart in GPDE for
+ visualising Eurostat data described with DSPL</p>
+ <p align="center">
+ <img
+ alt="An interactive chart in GPDE for visualising Eurostat data in the DSPL"
+ src="./figures/Eurostat_GPDE_Example.png" width="1000"></img>
+ </p>
+
+ <p>Benefits:</p>
+ <ul>
+ <li>If a standard Linked Data vocabulary is used, visualising and
+ exploring new data that is already represented using this vocabulary
+ can easily be done using GPDE.</li>
+ <li>Datasets can first be integrated using Linked Data technology
+ and then analysed using GPDE.</li>
+ </ul>
<p>Challenges of this use case are:</p>
<ul>
+ <li>There are different possible approaches, each having
+ advantages and disadvantages: 1) a consumer downloads the data into
+ a triple store and uses SPARQL queries to transform the data into
+ DSPL, which is then uploaded and visualized using GPDE; or 2) one or
+ more XSLT transformations on the RDF/XML transform the data into
+ DSPL.</li>
<li>The technical challenges for the consumer here lie in knowing
where to download what data and how to get it transformed into DSPL
without knowing the data.</li>
- <p>Unanticipated Uses (optional): DSPL is representative for using
- statistical data published on the web in available tools for
- analysis. Similar tools that may be automatically covered are: Weka
- (arff data format), Tableau, etc.</p>
- <p>Existing Work (optional): -</p>
</ul>
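+
+ <p>As a sketch of approach 1), a SPARQL query along the following
+ lines could extract a flat table for a DSPL CSV file (illustrative
+ only; it assumes data modelled with the vocabulary using SDMX-style
+ dimension and measure properties):</p>
+ <pre>
+PREFIX qb: &lt;http://purl.org/linked-data/cube#&gt;
+PREFIX sdmx-dimension: &lt;http://purl.org/linked-data/sdmx/2009/dimension#&gt;
+PREFIX sdmx-measure: &lt;http://purl.org/linked-data/sdmx/2009/measure#&gt;
+
+# One result row per observation: area, period, value
+SELECT ?area ?period ?value
+WHERE {
+  ?obs a qb:Observation ;
+       sdmx-dimension:refArea ?area ;
+       sdmx-dimension:refPeriod ?period ;
+       sdmx-measure:obsValue ?value .
+}
+ORDER BY ?area ?period
+ </pre>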
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
+
+
+ <p>Non-requirements:</p>
+ <ul>
+ <li>DSPL is representative of formats required to use statistical
+ data published on the web in available analysis tools. Similar
+ tools that may automatically be covered are: Weka (ARFF data
+ format), Tableau, SPSS, STATA, PC-Axis, etc.</li>
+ </ul>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</a></li>
+ </ul>
</section> <section>
- <h4>Allow Online Analytical Processing on published datasets of
- statistical data (UC 9)</h4>
- <p>Online Analytical Processing [15] is an analysis method on
+ <h3 id="AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer
+ Use Case: Analysing published statistical data with common OLAP
+ systems</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a
+ href="http://xbrl.us/research/appdev/Pages/275.aspx">Financial
+ Information Observation System (FIOS)</a>)
+ </span>
+ </p>
+
+ <p>Online Analytical Processing (OLAP) [<cite><a
+ href="#ref-OLAP">OLAP</a></cite>] is an analysis method for
multidimensional data. It is an explorative analysis method that
allows users to interactively view the data from different angles
(rotate, select) or granularities (drill-down, roll-up), and filter it
for specific information (slice, dice).</p>
- <p>The multidimensional model used in QB to model statistics should
- be usable by OLAP systems. More specifically, data that conforms to QB
- can be used to define a Data Cube within an OLAP engine and can then
- be queries by OLAP clients.</p>
- <p>An example scenario of this use case is the Financial
- Information Observation System (FIOS) [16], where XBRL data has been
- re-published using QB and made analysable for stakeholders in a
- web-based OLAP client. The following figure shows an example of using
- FIOS. Here, for three different companies, cost of goods sold as
- disclosed in XBRL documents are analysed. As cell values either the
- number of disclosures or - if only one available - the actual number
- in USD is given:</p>
+
+ <p>OLAP systems are commonly used in industry to analyse statistical
+ data on a regular basis. Such systems first use ETL
+ (Extract-Transform-Load) pipelines to load relevant data into a data
+ warehouse for efficient storage and querying, and then provide
+ interfaces for issuing OLAP queries on the data.</p>
<p>
- <div class="fig">
- <a href="figures/FIOS_example.PNG"><img
- src="figures/FIOS_example.PNG" alt="OLAP of QB data" /> </a>
- <div>OLAP of QB data</div>
- </div>
- </div>
+ The goal in this use case is to allow analysis of published
+ statistical data with common OLAP systems [<cite><a
+ href="#ref-OLAP4LD">OLAP4LD</a></cite>].
</p>
- <p>Challenges of this use case are:</p>
+
+ <p>For that, a multidimensional model of the data needs to be
+ generated. A multidimensional model consists of facts summarised in
+ data cubes. Facts exhibit measures depending on members of dimensions.
+ Members of dimensions can be further structured along hierarchies of
+ levels.</p>
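+
+ <p>For illustration, such a multidimensional model could be expressed
+ with the draft vocabulary roughly as follows (a hypothetical sketch;
+ the dataset, dimension and measure names are invented, not taken from
+ FIOS):</p>
+ <pre>
+@prefix qb: &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix ex: &lt;http://example.org/&gt; .
+
+# Cube structure: one dimension, one measure
+ex:costStructure a qb:DataStructureDefinition ;
+  qb:component [ qb:dimension ex:company ] ,
+               [ qb:measure ex:costOfGoodsSold ] .
+
+ex:costs a qb:DataSet ;
+  qb:structure ex:costStructure .
+
+# A fact: a measure value depending on a dimension member
+ex:obs1 a qb:Observation ;
+  qb:dataSet ex:costs ;
+  ex:company ex:ACME ;
+  ex:costOfGoodsSold 1000000 .
+ </pre>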
+
+ <p>
+ An example scenario of this use case is the Financial Information
+ Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>],
+ where XBRL data provided by the SEC on the web is re-published
+ as Linked Data and made explorable and analysable by
+ stakeholders in the web-based OLAP client Saiku.
+ </p>
+
+ <p>The following figure shows an example of using FIOS. Here, for
+ three different companies, the cost of goods sold as disclosed in XBRL
+ documents is analysed. Each cell value gives either the number of
+ disclosures or, if only one is available, the actual number in USD:</p>
+
+
+ <p class="caption">Figure: Example of using FIOS for OLAP
+ operations on financial data</p>
+ <p align="center">
+ <img alt="Example of using FIOS for OLAP operations on financial data"
+ src="./figures/FIOS_example.PNG" />
+ </p>
+
+ <p>Benefits:</p>
+
<ul>
+ <li>OLAP operations cover typical business requirements, e.g.,
+ slice, dice, drill-down.</li>
+ <li>OLAP frontends are intuitive, interactive, explorative, and
+ fast; such interfaces are well known to many people in
+ industry.</li>
+ <li>OLAP functionality is provided by many tools that may be
+ reused.</li>
+ </ul>
+
+ <p>Challenges:</p>
+ <ul>
+ <li>An ETL pipeline needs to automatically populate a data
+ warehouse. Common OLAP systems use relational databases with a star
+ schema.</li>
<li>A problem lies in the strict separation between queries for
- the structure of data, and queries for actual aggregated values.</li>
+ the structure of data (metadata queries), and queries for actual
+ aggregated values (OLAP operations).</li>
<li>Another problem lies in defining Data Cubes without greater
insight into the data beforehand.</li>
<li>Depending on the expressivity of the OLAP queries (e.g.,
aggregation functions, hierarchies, ordering), performance plays an
important role.</li>
- <li>QB allows flexibility in describing statistics, e.g., in
- order to reduce redundancy of information in single observations.
- These alternatives make general consumption of QB data more complex.
- Also, it is not clear, what "conforms" to QB means, e.g., is a
- qb:DataStructureDefinition required?</li>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
+ <li>OLAP systems have to cater for possibly missing information
+ (e.g., the aggregation function or a human-readable label).</li>
</ul>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
+
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</a></li>
+ </ul>
</section> <section>
- <h4>Transforming published statistics into XBRL (UC 10)</h4>
- <p>XBRL is a standard data format for disclosing financial
- information. Typically, financial data is not managed within the
- organization using XBRL but instead, internal formats such as excel or
- relational databases are used. If different data sources are to be
- summarized in XBRL data formats to be published, an internally-used
- standard format such as QB could help integrate and transform the data
- into the appropriate format.</p>
- <p>In this use case data that is available as data conforming to QB
- should also be possible to be automatically transformed into such XBRL
- data format. This use case is fulfilled if QB contains necessary
- information to derive XBRL data.</p>
- <p>In an example scenario, DERI has had a use case to publish
- sustainable IT information as XBRL to the Global Reporting Initiative
- (GRI - https://www.globalreporting.org/). Here, raw data (number of
- printouts per person) is collected, then aggregated on a unit level
- and modelled using QB. QB data shall then be used directly to fill-in
- XBRL documents that can be published to the GRI.</p>
- <p>Challenges of this use case are:</p>
+ <h3 id="Registeringpublishedstatisticaldataindatacatalogs">Registry
+ Use Case: Registering published statistical data in data catalogs</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case motivated by <a
+ href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a>)
+ </span>
+ </p>
+
+ <p>
+ After statistics have been published as Linked Data, the question
+ remains how to communicate the publication and let users discover the
+ statistics. There are catalogs to register datasets, e.g., CKAN, <a
+ href="http://www.datacite.org/">datacite.org</a>, <a
+ href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
+ href="http://pangaea.de/">Pangaea</a>. Those catalogs require specific
+ configurations to register statistical data.
+ </p>
+
+ <p>The goal of this use case is to demonstrate how to expose and
+ distribute statistics after publication, for instance by allowing
+ automatic registration of statistical data in such catalogs for
+ finding and evaluating datasets. To solve this issue, it should be
+ possible to transform the published statistical data into formats that
+ can be used by data catalogs.</p>
+
+ <p>
+ A concrete use case is the structured collection of <a
+ href="http://wiki.planet-data.eu/web/Datasets">RDF Data Cube
+ Vocabulary datasets</a> in the PlanetData Wiki. This list is supposed to
+ describe statistical datasets on a higher level - for easy discovery
+ and selection - and to provide a useful overview of RDF Data Cube
+ deployments in the Linked Data cloud.
+ </p>
+
+ <p>Unanticipated Uses:</p>
+
<ul>
- <li>So far, QB data has been transformed into semantic XBRL, a
- vocabulary closer to XBRL. There is the chance that certain
- information required in a GRI XBRL document cannot be encoded using a
- vocabulary as general as QB. In this case, QB could be used in
- concordance with semantic XBRL.</li>
+ <li>Data catalogs that contain statistics typically do not expose
+ them as Linked Data but, for instance, as CSV or HTML (e.g.,
+ Pangaea). It could also be a use case to publish such data using
+ the Data Cube vocabulary.</li>
</ul>
- <p class="editorsnote">@@TODO: Add link to semantic XBRL.</p>
- <p>Unanticipated Uses (optional): -</p>
- <p>Existing Work (optional): -</p>
- </section> </section></section>
+ <p>Existing Work:</p>
+ <ul>
+ <li>The <a href="http://www.w3.org/TR/vocab-dcat/">Data
+ Catalog vocabulary</a> (DCAT) is strongly related to this use case since
+ it may complement the standard vocabulary for representing statistics
+ in the case of registering data in a data catalog.
+ </li>
+ </ul>
+
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There
+ should be a recommended way to communicate the availability of
+ published statistical data to external parties and to allow
+ automatic discovery of statistical data</a></li>
+ </ul>
+ </section> </section>
+
<section>
- <h2>Requirements</h2>
+ <h2 id="requirements">Requirements</h2>
<p>The use cases presented in the previous section give rise to the
following requirements for a standard representation of statistics.
- Requirements are cross-linked with the use cases that motivate them.
- Requirements are similarly categorized as deriving from publishing or
- consuming use cases.</p>
-
- <section>
- <h3>Publishing requirements</h3>
-
- <section>
- <h4>Machine-readable and application-independent representation of
- statistics</h4>
- <p>It should be possible to add abstraction, multiple levels of
- description, summaries of statistics.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4</p>
- </section> <section>
- <h4>Representing statistics from various resource</h4>
- <p>Statistics from various resource data should be possible to be
- translated into QB. QB should be very general and should be usable for
- other data sets such as survey data, spreadsheets and OLAP data cubes.
- What kind of statistics are described: simple CSV tables (UC 1), excel
- (UC 2) and more complex SDMX (UC 3) data about government statistics
- or other public-domain relevant data.</p>
-
- <p>Required by: UC1, UC2, UC3</p>
- </section> <section>
- <h4>Communicating, exposing statistics on the web</h4>
- <p>It should become clear how to make statistical data available on
- the web, including how to expose it, and how to distribute it.</p>
-
- <p>Required by: UC5</p>
- </section> <section>
- <h4>Coverage of typical statistics metadata</h4>
- <p>It should be possible to add metainformation to statistics as
- found in typical statistics or statistics catalogs.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4, UC5</p>
- </section> <section>
- <h4>Expressing hierarchies</h4>
- <p>It should be possible to express hierarchies on Dimensions of
- statistics. Some of this requirement is met by the work on ISO
- Extension to SKOS [17].</p>
-
- <p>Required by: UC3, UC9</p>
- </section> <section>
- <h4>Machine-readable and application-independent representation of
- statistics</h4>
- <p>It should be possible to add abstraction, multiple levels of
- description, summaries of statistics.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4</p>
- </section> <section>
- <h4>Expressing aggregation relationships in Data Cube</h4>
- <p>Based on [18]: It often comes up in statistical data that you
- have some kind of 'overall' figure, which is then broken down into
- parts. To Supposing I have a set of population observations, expressed
- with the Data Cube vocabulary - something like (in pseudo-turtle):</p>
- <pre>
-ex:obs1
- sdmx:refArea <UK>;
- sdmx:refPeriod "2011";
- ex:population "60" .
-
-ex:obs2
- sdmx:refArea <England>;
- sdmx:refPeriod "2011";
- ex:population "50" .
-
-ex:obs3
- sdmx:refArea <Scotland>;
- sdmx:refPeriod "2011";
- ex:population "5" .
-
-ex:obs4
- sdmx:refArea <Wales>;
- sdmx:refPeriod "2011";
- ex:population "3" .
-
-ex:obs5
- sdmx:refArea <NorthernIreland>;
- sdmx:refPeriod "2011";
- ex:population "2" .
-
-
-
-
- </pre>
- <p>What is the best way (in the context of the RDF/Data Cube/SDMX
- approach) to express that the values for the England/Scotland/Wales/
- Northern Ireland ought to add up to the value for the UK and
- constitute a more detailed breakdown of the overall UK figure? I might
- also have population figures for France, Germany, EU27, etc...so it's
- not as simple as just taking a qb:Slice where you fix the time period
- and the measure.</p>
- <p>Some of this requirement is met by the work on ISO Extension to
- SKOS [19].</p>
+ Requirements are cross-linked with the use cases that motivate them.</p>
- <p>Required by: UC1, UC2, UC3, UC9</p>
- </section> <section>
- <h4>Scale - how to publish large amounts of statistical data</h4>
- <p>Publishers that are restricted by the size of the statistics
- they publish, shall have possibilities to reduce the size or remove
- redundant information. Scalability issues can both arise with
- peoples's effort and performance of applications.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4</p>
- </section> <section>
- <h4>Compliance-levels or criteria for well-formedness</h4>
- <p>The formal RDF Data Cube vocabulary expresses few formal
- semantic constraints. Furthermore, in RDF then omission of
- otherwise-expected properties on resources does not lead to any formal
- inconsistencies. However, to build reliable software to process Data
- Cubes then data consumers need to know what assumptions they can make
- about a dataset purporting to be a Data Cube.</p>
- <p>What *well-formedness* criteria should Data Cube publishers
- conform to? Specific areas which may need explicit clarification in
- the well-formedness criteria include (but may not be limited to):</p>
+ <section>
+ <h3 id="VocabularyshouldbuildupontheSDMXinformationmodel">Vocabulary
+ should build upon the SDMX information model</h3>
+ <p>
+ The draft version of the vocabulary builds upon <a
+ href="http://sdmx.org/?page_id=16">SDMX Standards Version 2.0</a>. A
+ newer version of SDMX, <a href="http://sdmx.org/?p=899">SDMX
+ Standards, Version 2.1</a>, is available.
+ </p>
+ <p>The requirement is to at least build upon Version 2.0; if
+ specific use cases derived from Version 2.1 become available, the
+ working group may consider building upon Version 2.1.</p>
+ <p>Background information:</p>
<ul>
- <li>use of abbreviated data layout based on attachment levels</li>
- <li>use of qb:Slice when (completeness, requirements for an
- explicit qb:SliceKey?)</li>
- <li>avoiding mixing two approaches to handling multiple-measures
- </li>
- <li>optional triples (e.g. type triples)</li>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/37">http://www.w3.org/2011/gld/track/issues/37</a></li>
</ul>
- <p>Required by all use cases.</p>
- </section> <section>
- <h4>Declaring relations between Cubes</h4>
- <p>In some situations statistical data sets are used to derive
- further datasets. Should Data Cube be able to explicitly convey these
- relationships?</p>
- <p>Note that there has been some work towards this within the SDMX
- community as indicated here:
- http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d</p>
-
- <p>Required by: UC6</p>
- </section> </section> <section>
- <h3>Consumption requirements</h3>
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#SDMXWebDisseminationUseCase">SDMX Web
+ Dissemination Use Case</a></li>
+ <li><a
+ href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
+ Use Case: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#EurostatSDMXasLinkedData">Publisher Use Case:
+ Eurostat SDMX as Linked Data</a></li>
+ </ul>
- <section>
- <h4>Finding statistical data</h4>
- <p>Finding statistical data should be possible, perhaps through an
- authoritative service</p>
-
- <p>Required by: UC5</p>
- </section> <section>
- <h4>Retrival of fine grained statistics</h4>
- <p>Query formulation and execution mechanisms. It should be
- possible to use SPARQL to query for fine grained statistics.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4, UC5, UC6, UC7</p>
- </section> <section>
- <h4>Understanding - End user consumption of statistical data</h4>
- <p>Must allow presentation, visualization .</p>
-
- <p>Required by: UC7, UC8, UC9, UC10</p>
</section> <section>
- <h4>Comparing and trusting statistics</h4>
- <p>Must allow finding what's in common in the statistics of two or
- more datasets. This requirement also deals with information quality -
- assessing statistical datasets - and trust - making trust judgements
- on statistical data.</p>
+ <h3 id="Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary
+ should clarify the use of subsets of observations</h3>
+ <p>There should be a consensus on the issue of flattening or
+ abbreviating data; one suggestion is to author data without the
+ duplication, but have the data publication tools "flatten" the compact
+ representation into standalone observations during the publication
+ process.</p>
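+
+ <p>As an illustration of the abbreviation issue (a hypothetical
+ sketch with invented names), an attribute such as the unit of measure
+ may be authored once at the dataset level and then be repeated on
+ each standalone observation by the publication tools:</p>
+ <pre>
+@prefix qb: &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix sdmx-attribute: &lt;http://purl.org/linked-data/sdmx/2009/attribute#&gt; .
+@prefix ex: &lt;http://example.org/&gt; .
+
+# Compact, authored form: unit stated once for the whole dataset
+ex:dataset a qb:DataSet ;
+  sdmx-attribute:unitMeasure ex:euro .
+
+# "Flattened" form after publication: the unit is repeated
+# on each standalone observation
+ex:obs1 a qb:Observation ;
+  qb:dataSet ex:dataset ;
+  sdmx-attribute:unitMeasure ex:euro .
+ </pre>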
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/33">http://www.w3.org/2011/gld/track/issues/33</a></li>
- <p>Required by: UC5, UC6, UC9</p>
- </section> <section>
- <h4>Integration of statistics</h4>
- <p>Interoperability - combining statistics produced by multiple
- different systems. It should be possible to combine two statistics
- that contain related data, and possibly were published independently.
- It should be possible to implement value conversions.</p>
+ <li>Since there are no use cases for qb:subslice, the vocabulary
+ should clarify or drop the use of qb:subslice; issue: <a
+ href="http://www.w3.org/2011/gld/track/issues/34">http://www.w3.org/2011/gld/track/issues/34</a>
+ </li>
+ </ul>
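+
+ <p>For orientation, the draft vocabulary currently groups subsets of
+ observations with <code>qb:Slice</code>, fixing one dimension at the
+ slice level (an illustrative sketch with invented names):</p>
+ <pre>
+@prefix qb: &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix sdmx-dimension: &lt;http://purl.org/linked-data/sdmx/2009/dimension#&gt; .
+@prefix ex: &lt;http://example.org/&gt; .
+
+# The slice key names the dimension fixed by slices of this kind
+ex:sliceByPeriod a qb:SliceKey ;
+  qb:componentProperty sdmx-dimension:refPeriod .
+
+# A slice fixing the period and grouping the observations
+ex:slice2011 a qb:Slice ;
+  qb:sliceStructure ex:sliceByPeriod ;
+  sdmx-dimension:refPeriod "2011" ;
+  qb:observation ex:obs1 , ex:obs2 .
+ </pre>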
- <p>Required by: UC1, UC3, UC4, UC7, UC9, UC10</p>
+ <p>Required by:</p>
+ <ul>
+ <li><a
+ href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
+ Use Case: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
+ Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+ </ul>
+
</section> <section>
- <h4>Scale - how to consume large amounts of statistical data</h4>
- <p>Consumers that want to access large amounts of statistical data
- need guidance.</p>
-
- <p>Required by: UC7, UC9</p>
- </section> <section>
- <h4>Common internal representation of statistics, to be exported
- in other formats</h4>
- <p>QB data should be possible to be transformed into data formats
- such as XBRL which are required by certain institutions.</p>
+ <h3
+ id="Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary
+ should recommend a mechanism to support hierarchical code lists</h3>
+ <p>
+ First, hierarchical code lists may be supported via SKOS [<cite><a
+ href="#ref-skos">SKOS</a></cite>]. This allows for cross-location and
+ cross-time analysis of statistical datasets.
+ </p>
+ <p>
+ Second, one can think of non-SKOS hierarchical code lists, e.g., if
+ simple <code>skos:narrower</code>/<code>skos:broader</code>
+ relationships are not sufficient or if a vocabulary uses specific
+ hierarchical properties such as <code>geo:containedIn</code>.
+ </p>
+ <p>
+ Also, the use of hierarchy levels needs to be clarified. It has been
+ suggested to allow <code>skos:Collection</code> as a value of
+ <code>qb:codeList</code>.
+ </p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/39">http://www.w3.org/2011/gld/track/issues/39</a>
+ </li>
+ <li>Discussion at publishing-statistical-data mailing list: <a
+ href="http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f">http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f</a></li>
+ <li>Part of the requirement is met by the work on an ISO
+ Extension to SKOS [<cite><a href="#ref-xkos">XKOS</a></cite>]
+ </li>
+ </ul>
- <p>Required by: UC10</p>
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#PublishingExcelSpreadsheetsasLinkedData">Publisher
+ Use Case: Publishing Excel Spreadsheets as Linked Data</a></li>
+ </ul>
+
</section> <section>
- <h4>Dealing with imperfect statistics</h4>
- <p>Imperfections - reasoning about statistical data that is not
- complete or correct.</p>
+ <h3
+ id="VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Vocabulary
+ should define relationship to ISO 19156 - Observations &amp; Measurements</h3>
+ <p>A number of organisations, particularly in the climate and
+ meteorological area, already have some commitment to the OGC
+ "Observations and Measurements" (O&amp;M) logical data model, also
+ published as ISO 19156. Are there any statements about compatibility
+ and interoperability between O&amp;M and Data Cube that can be made to
+ give guidance to such organisations?</p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>
+ </ul>
- <p>Required by: UC7, UC8, UC9, UC10</p>
- </section> </section> </section>
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
+ Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+ </ul>
+
+ </section> <section>
+ <h3
+ id="Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">There
+ should be a recommended mechanism to allow for publication of
+ aggregates which cross multiple dimensions</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li>E.g., the Eurostat SDMX as Linked Data use case suggests
+ having time series of data aggregated over the gender dimension: <a
+ href="#EurostatSDMXasLinkedData">Publisher Use Case: Eurostat
+ SDMX as Linked Data</a>
+ </li>
+ <li>Another possible use case could be provided by the <a
+ href="http://data.gov.uk/resources/payments">Payment Ontology</a>.
+ </li>
+ </ul>
+
+ </section> <section>
+ <h3 id="Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There
+ should be a recommended way of declaring relations between cubes</h3>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/30">http://www.w3.org/2011/gld/track/issues/30</a></li>
+ <li>Discussion in <a
+ href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data
+ mailing list</a>
+ </li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#Representingrelationshipsbetweenstatisticaldata">Publisher
+ Use Case: Representing relationships between statistical data</a></li>
+ </ul>
+
+ </section> <section>
+ <h3
+ id="Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a></li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a
+ href="#Simplechartvisualisationsofpublishedstatisticaldata">Consumer
+ Use Case: Simple chart visualisations of (integrated) published
+ statistical data</a></li>
+ <li><a
+ href="#VisualisingpublishedstatisticaldatainGooglePublicDataExplorer">Consumer
+ Use Case: Visualising published statistical data in Google Public
+ Data Explorer</a></li>
+ <li><a
+ href="#AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer
+ Use Case: Analysing published statistical data with common OLAP
+ systems</a></li>
+ </ul>
+
+ </section> <section>
+ <h3 id="Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">There
+ should be mechanisms and recommendations regarding publication and
+ consumption of large amounts of statistical data</h3>
+ <p>Background information:</p>
+ <ul>
+ <li>Related issue regarding abbreviations: <a
+ href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a>
+ </li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#EurostatSDMXasLinkedData">Publisher Use Case:
+ Eurostat SDMX as Linked Data</a></li>
+ </ul>
+
+ </section> <section>
+ <h3
+ id="Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There
+ should be a recommended way to communicate the availability of
+ published statistical data to external parties and to allow automatic
+ discovery of statistical data</h3>
+ <p>Clarify the relationship between DCAT and QB.</p>
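+
+ <p>For instance, a catalog record for a published statistical dataset
+ might be described with DCAT roughly as follows (an illustrative
+ sketch; names and URLs are invented):</p>
+ <pre>
+@prefix dcat:    &lt;http://www.w3.org/ns/dcat#&gt; .
+@prefix dcterms: &lt;http://purl.org/dc/terms/&gt; .
+@prefix ex:      &lt;http://example.org/&gt; .
+
+# Catalog entry pointing at a distribution of the published cube
+ex:unemployment a dcat:Dataset ;
+  dcterms:title "Unemployment rate" ;
+  dcat:distribution [ a dcat:Distribution ;
+    dcat:accessURL &lt;http://example.org/data/unemployment.rdf&gt; ] .
+ </pre>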
+ <p>Background information:</p>
+ <ul>
+ <li>None.</li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#SDMXWebDisseminationUseCase">SDMX Web
+ Dissemination Use Case</a></li>
+ <li><a href="#Registeringpublishedstatisticaldataindatacatalogs">Registry
+ Use Case: Registering published statistical data in data catalogs</a></li>
+ </ul>
+
+ </section> </section>
<section class="appendix">
- <h2>Acknowledgments</h2>
- <p>The editors are very thankful for comments and suggestions ...</p>
+ <h2 id="acknowledgements">Acknowledgements</h2>
+ <p>We thank Rinke Hoekstra, Dave Reynolds, Bernadette Hyland,
+ Biplav Srivastava, John Erickson, and Villazón-Terrazas for
+ feedback and input.</p>
</section>
<h2 id="references">References</h2>
<dl>
- <dt id="ref-SDMX">[SMDX]</dt>
+
+ <dt id="ref-cog">[COG]</dt>
<dd>
- SMDX - User Guide 2009, <a
- href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>
+ SDMX Content Oriented Guidelines, <a
+ href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
</dd>
- <dt id="ref-SDMX">[Fowler1997]</dt>
+ <dt id="ref-COGS">[COGS]</dt>
+ <dd>
+ Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., &amp; Curry, E.
+ (2012). Representing Interoperable Provenance Descriptions for ETL
+ Workflows. ESWC 2012 Workshop Highlights (pp. 1–15). Springer Verlag,
+ 2012 (in press). (Extended Paper published in Conf. Proceedings.). <a
+ href="http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf">http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf</a>.
+ </dd>
+
+ <dt id="ref-COINS">[COINS]</dt>
+ <dd>
+ Ian Dickinson et al., COINS as Linked Data <a
+ href="http://data.gov.uk/resources/coins">http://data.gov.uk/resources/coins</a>,
+ last visited on Jan 9 2013
+ </dd>
+
+ <dt id="ref-FIOS">[FIOS]</dt>
+ <dd>
+ Andreas Harth, Sean O'Riain, Benedikt Kämpgen. Submission XBRL
+ Challenge 2011. <a
+ href="http://xbrl.us/research/appdev/Pages/275.aspx">http://xbrl.us/research/appdev/Pages/275.aspx</a>.
+ </dd>
+
+
+ <dt id="ref-FOWLER97">[FOWLER97]</dt>
<dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object
Models. Addison-Wesley. ISBN 0201895420.</dd>
- <dt id="ref-QB">[QB]</dt>
+
+ <dt id="ref-linked-data">[LOD]</dt>
<dd>
- RDF Data Cube vocabulary, <a
- href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html</a>
+ Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
</dd>
<dt id="ref-OLAP">[OLAP]</dt>
@@ -822,9 +1520,30 @@
href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
</dd>
- <dt id="ref-linked-data">[LOD]</dt>
+ <dt id="ref-OLAP4LD">[OLAP4LD]</dt>
<dd>
- Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+ Kämpgen, B. and Harth, A. (2011). Transforming Statistical Linked
+ Data for Use in OLAP Systems. I-Semantics 2011. <a
+ href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>
+ </dd>
+
+ <dt id="ref-QB-2010">[QB-2010]</dt>
+ <dd>
+ RDF Data Cube vocabulary, <a
+ href="http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html">http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</a>
+ </dd>
+
+ <dt id="ref-QB-2013">[QB-2013]</dt>
+ <dd>
+ RDF Data Cube vocabulary, <a
+ href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>
+ </dd>
+
+ <dt id="ref-QB4OLAP">[QB4OLAP]</dt>
+ <dd>
+ Etcheverry, Vaisman. QB4OLAP: A New Vocabulary for OLAP Cubes on
+ the Semantic Web. <a
+ href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>
</dd>
<dt id="ref-rdf">[RDF]</dt>
@@ -846,10 +1565,24 @@
href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>
</dd>
- <dt id="ref-cog">[COG]</dt>
+ <dt id="ref-SDMX">[SDMX]</dt>
<dd>
- SDMX Content Oriented Guidelines, <a
- href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
+ SDMX User Guide Version 2009.1, <a
+ href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>,
+ last visited Jan 8 2013.
+ </dd>
+
+ <dt id="ref-SDMX-21">[SDMX 2.1]</dt>
+ <dd>
+ SDMX 2.1 User Guide, Version 0.1, 19/09/2012, <a
+ href="http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf">http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf</a>,
+ last visited Jan 8 2013.
+ </dd>
+
+ <dt id="ref-xkos">[XKOS]</dt>
+ <dd>
+ Extended Knowledge Organization System (XKOS), <a
+ href="https://github.com/linked-statistics/xkos">https://github.com/linked-statistics/xkos</a>
</dd>
</dl>
--- a/data-cube-ucr/respec-config.js Thu Feb 28 09:25:17 2013 -0500
+++ b/data-cube-ucr/respec-config.js Thu Feb 28 10:06:00 2013 -0500
@@ -1,23 +1,23 @@
var respecConfig = {
// specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
- specStatus: "ED",
+ specStatus: "WG-NOTE",
//copyrightStart: "2010",
// the specification's short name, as in http://www.w3.org/TR/short-name/
shortName: "data-cube-ucr",
//subtitle: "",
// if you wish the publication date to be other than today, set this
- // publishDate: "2009-08-06",
+ publishDate: "2013-02-27",
// if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
// and its maturity status
- //previousPublishDate: "2011-06-26",
+ //previousPublishDate: "2012-02-22",
//previousMaturity: "ED",
//previousDiffURI: "http://dvcs.w3.org/hg/gld/bp/",
//diffTool: "http://www.aptest.com/standards/htmldiff/htmldiff.pl",
// if there a publicly available Editor's Draft, this is the link
- edDraftURI: "http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/index.html",
+ edDraftURI: "http://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/data-cube-ucr-20120222/index.html",
// if this is a LCWD, uncomment and set the end of its review period
// lcEnd: "2009-08-05",
--- a/data-cube/index.html Thu Feb 28 09:25:17 2013 -0500
+++ b/data-cube/index.html Thu Feb 28 10:06:00 2013 -0500
@@ -3,11 +3,11 @@
<head>
<title>The RDF Data Cube Vocabulary</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
- <script type="text/javascript" src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
- <script type="text/javascript" src="respec-ref.js" class="remove"></script>
+ <script type="text/javascript" src='../respec/respec3/builds/respec-w3c-common.js' class='remove'></script>
+<!-- <script type="text/javascript" src="respec-ref.js" class="remove"></script> -->
<script type="text/javascript" src="../respec/gld-bib.js" class="remove"></script>
<script type="text/javascript" src="respec-config.js" class="remove"></script>
- <script type="text/javascript" src="../respec/gld-config.js" class="remove"></script>
+<!-- <script type="text/javascript" src="../respec/gld-config.js" class="remove"></script> -->
<style type="text/css">
.todo { background-color: #fdd; border: 1px solid #800; margin: 1em 0em; padding: 1em; page-break-inside: avoid ; font-style: italic; }
@@ -1733,6 +1733,10 @@
<section id="acknowledgements" class="appendix">
<h2>Acknowledgements</h2>
+<p>Jeni Tennison was co-developer of the original Data Cube vocabulary on
+which this specification is based. This vocabulary would not exist without her
+insight, energy and hard work.</p>
+
<p>This work is based on a collaboration that was initiated in a
workshop on Publishing statistical datasets in SDMX and the semantic
web, hosted by ONS in Sunningdale, United Kingdom in February 2010 and
@@ -1822,6 +1826,17 @@
</section>
+<section id="change-history" class="appendix">
+<h2>Change history</h2>
+
+<p>Changes since <a href="http://www.w3.org/TR/2012/WD-vocab-data-cube-20120405/">W3C Working Draft 5 April 2012</a>:</p>
+
+<ul>
+ <li>Moved Jeni Tennison from being listed as an author to the acknowledgements section.</li>
+</ul>
+
+
+</section>
<section id="references_section" class="appendix">
<p class="todo">Bring all references into W3C style</p>
--- a/data-cube/respec-config.js Thu Feb 28 09:25:17 2013 -0500
+++ b/data-cube/respec-config.js Thu Feb 28 10:06:00 2013 -0500
@@ -32,17 +32,17 @@
// only "name" is required
editors: [
{ name: "Richard Cyganiak", url: "http://richard.cyganiak.de/", company: "DERI, NUI Galway", companyURL: "http://www.deri.ie/" },
- { name: "Dave Reynolds", company: "Epimorphics Ltd", companyURL: "http://www.epimorphics.com/" },
+ { name: "Dave Reynolds", company: "Epimorphics Ltd", companyURL: "http://www.epimorphics.com/", mailto: "dave@epimorphics.com" },
],
- authors: [
- { name: "Jeni Tennison",
- url: "http://www.jenitennison.com/blog/",
- company: "TSO",
- companyURL: "http://www.tso.co.uk/"
- //mailto: "xx@yy",
- //note: "xxx",
- },
- ],
+// authors: [
+// { name: "Jeni Tennison",
+// url: "http://www.jenitennison.com/blog/",
+// company: "TSO",
+// companyURL: "http://www.tso.co.uk/"
+// //mailto: "xx@yy",
+// //note: "xxx",
+// },
+// ],
// authors, add as many as you like.
// This is optional, uncomment if you have authors as well as editors.