gld: changeset 292:4f2617cbdab3

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/data-cube-ucr/data-cube-ucr-20120222/index.html	Wed Feb 27 23:44:50 2013 +0100
@@ -0,0 +1,860 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
+                      "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<title>Use Cases and Requirements for the Data Cube Vocabulary</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<script type="text/javascript"
+	src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
+<script src="respec-ref.js"></script>
+<script src="respec-config.js"></script>
+<link rel="stylesheet" type="text/css" href="local-style.css" />
+</head>
+<body>
+
+	<section id="abstract">
+	<p>Many national, regional and local governments, as well as other
+		organizations inside and outside of the public sector, create
+		statistics. There is a need to publish those statistics in a
+		standardized, machine-readable way on the web, so that statistics can
+		be freely integrated and reused in consuming applications. This
+		document is a collection of use cases for a standard vocabulary to
+		publish statistics as Linked Data.</p>
+	</section>
+
+	<section id="sotd">
+	<p>
+		This is a working document of the <a
+			href="http://www.w3.org/2011/gld/wiki/Data_Cube_Vocabulary">Data
+			Cube Vocabulary project</a> within the <a
+			href="http://www.w3.org/2011/gld/">W3C Government Linked Data
+			Working Group</a>. Feedback is welcome and should be sent to the <a
+			href="mailto:public-gld-comments@w3.org">public-gld-comments@w3.org
+			mailing list</a>.
+	</p>
+	</section>
+
+	<section>
+	<h2>Introduction</h2>
+
+	<p>Many national, regional and local governments, as well as other
+		organizations inside and outside of the public sector, create
+		statistics. There is a need to publish those statistics in a
+		standardized, machine-readable way on the web, so that statistics can
+		be freely linked, integrated and reused in consuming applications.
+		This document is a collection of use cases for a standard vocabulary
+		to publish statistics as Linked Data.</p>
+	</section>
+
+
+	<section>
+	<h2>Terminology</h2>
+	<p>
+		<dfn>Statistics</dfn>
+		is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of
+		the collection, organization, analysis, and interpretation of data. A
+		statistic is a statistical dataset.
+	</p>
+
+	<p>
+		A
+		<dfn>statistical dataset</dfn>
+		comprises multidimensional data: a set of observed values organized
+		along a group of dimensions, together with associated metadata. The
+		basic structure of (aggregated) statistical data is a multidimensional
+		table (also called a cube) <a href="#ref-SDMX">[SDMX]</a>.
+	</p>
+
+	<p>
+		<dfn>Source data</dfn>
+		is data from datastores such as RDBs or spreadsheets that acts as a
+		source for the Linked Data publishing process.
+	</p>
+
+	<p>
+		<dfn>Metadata</dfn>
+		about statistics defines the data structure and gives contextual
+		information about the statistics.
+	</p>
+
+	<p>
+		A format is
+		<dfn>machine-readable</dfn>
+		if it is amenable to automated processing by a machine, as opposed to
+		presentation to a human user.
+	</p>
+
+	<p>
+		A
+		<dfn>publisher</dfn>
+		is a person or organization that exposes source data as Linked Data on
+		the Web.
+	</p>
+
+	<p>
+		A
+		<dfn>consumer</dfn>
+		is a person or agent that uses Linked Data from the Web.
+	</p>
+
+	</section>
+
+
+	<section>
+	<h2>Use cases</h2>
+	<p>
+		This section presents scenarios that would be enabled by the existence
+		of a standard vocabulary for the representation of statistics as
+		Linked Data. Since a draft specification of the cube vocabulary has
+		been published and the vocabulary is already in use, we refer to this
+		standard vocabulary throughout the document by its current name, RDF
+		Data Cube vocabulary (QB for short, <a href="#ref-QB">[QB]</a>).
+	</p>
+	<p>We distinguish between use cases for publishing statistical data
+		and use cases for consuming statistical data, since the requirements
+		of publishers and consumers of statistical data differ.</p>
+	<section>
+	<h3>Publishing statistical data</h3>
+
+	<section>
+	<h4>Publishing general statistics in a machine-readable and
+		application-independent way (UC 1)</h4>
+	<p>More and more organizations want to publish statistics on the
+		web, for reasons such as increasing transparency and trust. Ideally,
+		published data can be understood by both humans and machines; in
+		practice, data is often simply published as CSV, PDF, XLS, etc.,
+		without elaborate metadata, which makes free use and analysis
+		difficult.</p>
+
+	<p>The goal in this use case is to use a machine-readable and
+		application-independent description of common statistics based on
+		open standards. The use case is fulfilled if QB is a Linked Data
+		vocabulary for encoding statistical data that has a hypercube
+		structure and, as such, can describe common statistics in a
+		machine-readable and application-independent way.</p>
+
+	<p>
+		An example scenario for this use case is the publication of the
+		Combined Online Information System (<a
+			href="http://data.gov.uk/resources/coins">COINS</a>). HM Treasury,
+		the principal custodian of financial data for the UK government,
+		released previously restricted information from COINS. Five data
+		files were released, containing between 3.3 and 4.9 million rows of
+		data. The COINS dataset was translated into RDF for two reasons:
+	</p>
+
+	<ol>
+		<li>The published statistics (e.g., the data files) are too large
+			to load into widely available analysis tools such as Microsoft
+			Excel, a common tool of choice for many data investigators.</li>
+		<li>COINS is a highly technical information source, requiring
+			both domain and technical skills to make useful applications around
+			the data.</li>
+	</ol>
+	<p>Publishing statistics is challenging for several reasons:</p>
+	<p>
+		Representing observations and measurements requires more complex
+		modeling, as discussed by Martin Fowler <a href="#Fowler1997">[Fowler,
+			1997]</a>: recording a statistic simply as an attribute of an object
+		(e.g., the fact that a person weighs 185 pounds) fails to represent
+		important concepts such as quantity, measurement, and observation.
+	</p>
+
+	<p>A quantity comprises the information necessary to interpret a
+		value, e.g., its unit, and supports arithmetic and comparison
+		operations; with this information, humans and machines can visualize
+		quantities appropriately or convert between different quantities.</p>
+
+	<p>A measurement separates a quantity from the actual event at
+		which it was collected; a measurement assigns a quantity to a specific
+		phenomenon type (e.g., strength). A measurement can also record
+		metadata such as who made the measurement (person) and when it was
+		made (time).</p>
+
+	<p>Finally, observations generalize measurements, which record only
+		numeric quantities: an observation can also be a category
+		observation that assigns a category (e.g., blood group A) to the
+		observed subject. The following figure demonstrates this
+		relationship.</p>
+	<div class="fig">
+		<a href="figures/modeling_quantity_measurement_observation.png"><img
+			src="figures/modeling_quantity_measurement_observation.png"
+			alt="Modeling quantity, measurement, observation" /> </a>
+		<div>Modeling quantity, measurement, observation</div>
+	</div>
+
+	<p>QB uses the multidimensional model (observations with Measures
+		that depend on Dimensions and Dimension Members, further
+		contextualized by Attributes) and should cater for this modeling
+		complexity.</p>
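+	<p>As a minimal sketch of how these parts of the multidimensional
+		model map onto QB terms, the following Turtle declares a
+		qb:DataStructureDefinition with two dimensions, one measure and one
+		attribute, and a single qb:Observation that gives a value for each
+		of them. The ex: names, the unit and the figure are hypothetical; the
+		sdmx-dimension, sdmx-measure and sdmx-attribute terms come from the
+		SDMX-RDF vocabularies commonly used together with QB.</p>
+	<pre>
+@prefix qb:             &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix sdmx-dimension: &lt;http://purl.org/linked-data/sdmx/2009/dimension#&gt; .
+@prefix sdmx-measure:   &lt;http://purl.org/linked-data/sdmx/2009/measure#&gt; .
+@prefix sdmx-attribute: &lt;http://purl.org/linked-data/sdmx/2009/attribute#&gt; .
+@prefix xsd:            &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+@prefix ex:             &lt;http://example.org/&gt; .   # hypothetical namespace
+
+# Structure: the dimensions identify an observation, the measure carries
+# the observed value, and the attribute contextualizes it.
+ex:dsd a qb:DataStructureDefinition ;
+    qb:component [ qb:dimension sdmx-dimension:refArea ] ,
+                 [ qb:dimension sdmx-dimension:refPeriod ] ,
+                 [ qb:measure   sdmx-measure:obsValue ] ,
+                 [ qb:attribute sdmx-attribute:unitMeasure ] .
+
+ex:dataset a qb:DataSet ;
+    qb:structure ex:dsd .
+
+# One observation with a value for every component (illustrative figure).
+ex:obs1 a qb:Observation ;
+    qb:dataSet                 ex:dataset ;
+    sdmx-dimension:refArea     ex:Wales ;
+    sdmx-dimension:refPeriod   "2011"^^xsd:gYear ;
+    sdmx-measure:obsValue      3 ;
+    sdmx-attribute:unitMeasure ex:millionPersons .
+	</pre>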
+	<p>Another challenge is that, for brevity and to avoid repetition,
+		it is useful to have abbreviation mechanisms, such as assigning
+		properties that hold for all observations at the dataset or slice
+		level, from where they implicitly become part of each observation. For
+		instance, in the case of COINS, all of the values are in thousands of
+		pounds sterling. However, one of the use cases for the Linked Data
+		version of COINS is to allow others to link to individual
+		observations, which suggests that these observations should be
+		standalone and self-contained, and should therefore have explicit
+		multipliers and units on each observation. One suggestion is to author
+		data without the duplication, but have the data publication tools
+		"flatten" the compact representation into standalone observations
+		during the publication process.</p>
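+	<p>The compact and the flattened representation can be sketched in
+		Turtle as follows, assuming QB's attachment mechanism
+		(qb:componentAttachment) and the SDMX-RDF unit attributes. The ex:
+		names, including the multiplier code ex:thousand, and the value are
+		hypothetical, and the COINS dimensions are omitted for brevity. In the
+		compact form, the unit and multiplier are stated once on the data set;
+		in the flattened form, which a publication tool could generate, each
+		observation carries them itself and can be linked to on its own.</p>
+	<pre>
+@prefix qb:             &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix sdmx-measure:   &lt;http://purl.org/linked-data/sdmx/2009/measure#&gt; .
+@prefix sdmx-attribute: &lt;http://purl.org/linked-data/sdmx/2009/attribute#&gt; .
+@prefix ex:             &lt;http://example.org/&gt; .   # hypothetical namespace
+
+# Compact form: unit and multiplier are attached at the data set level.
+ex:coinsDsd a qb:DataStructureDefinition ;
+    qb:component [ qb:measure sdmx-measure:obsValue ] ,
+                 [ qb:attribute sdmx-attribute:unitMeasure ;
+                   qb:componentAttachment qb:DataSet ] ,
+                 [ qb:attribute sdmx-attribute:unitMult ;
+                   qb:componentAttachment qb:DataSet ] .
+
+ex:coins a qb:DataSet ;
+    qb:structure               ex:coinsDsd ;
+    sdmx-attribute:unitMeasure ex:GBP ;
+    sdmx-attribute:unitMult    ex:thousand .
+
+# Flattened form: the same attributes repeated on a standalone observation.
+ex:obs1 a qb:Observation ;
+    qb:dataSet                 ex:coins ;
+    sdmx-measure:obsValue      1250 ;
+    sdmx-attribute:unitMeasure ex:GBP ;
+    sdmx-attribute:unitMult    ex:thousand .
+	</pre>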
+	<p>A further challenge is related to slices of data. Slices of data
+		group observations that are of special interest, e.g., slices of
+		unemployment rates per year for a specific gender are suitable for
+		direct visualization in a line diagram. However, depending on the
+		number of Dimensions, the number of possible slices can become large,
+		which makes it difficult to select all interesting slices. Therefore,
+		and because of their additional complexity, not many publishers create
+		slices. In fact, it is somewhat unclear at this point which slices
+		through the data will be useful to (COINS-RDF) users.</p>
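+	<p>A slice of the kind mentioned above might be declared as in the
+		following sketch. The qb:SliceKey fixes the sex and area dimensions,
+		so the observations in the slice vary only by year and could be drawn
+		directly as a line. The ex: names are hypothetical, and the
+		observations ex:obs2010 and ex:obs2011 are not shown.</p>
+	<pre>
+@prefix qb:             &lt;http://purl.org/linked-data/cube#&gt; .
+@prefix sdmx-dimension: &lt;http://purl.org/linked-data/sdmx/2009/dimension#&gt; .
+@prefix sdmx-code:      &lt;http://purl.org/linked-data/sdmx/2009/code#&gt; .
+@prefix ex:             &lt;http://example.org/&gt; .   # hypothetical namespace
+
+# Slice key: the dimensions whose values are fixed within each slice.
+ex:bySexAndArea a qb:SliceKey ;
+    qb:componentProperty sdmx-dimension:sex , sdmx-dimension:refArea .
+
+ex:unemploymentDsd a qb:DataStructureDefinition ;
+    qb:component [ qb:dimension sdmx-dimension:refArea ] ,
+                 [ qb:dimension sdmx-dimension:refPeriod ] ,
+                 [ qb:dimension sdmx-dimension:sex ] ,
+                 [ qb:measure   ex:unemploymentRate ] ;
+    qb:sliceKey ex:bySexAndArea .
+
+# One slice: female unemployment in one area, one observation per year.
+ex:femaleWales a qb:Slice ;
+    qb:sliceStructure      ex:bySexAndArea ;
+    sdmx-dimension:sex     sdmx-code:sex-F ;
+    sdmx-dimension:refArea ex:Wales ;
+    qb:observation         ex:obs2010 , ex:obs2011 .
+
+ex:unemployment a qb:DataSet ;
+    qb:structure ex:unemploymentDsd ;
+    qb:slice     ex:femaleWales .
+	</pre>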
+	<p>Unanticipated Uses (optional): -</p>
+	<p>Existing Work (optional): -</p>
+
+	</section> <section>
+	<h4>Publishing one or many Microsoft Excel spreadsheet files with
+		statistical data on the web (UC 2)</h4>
+	<p>In government and elsewhere, there is a need to publish
+		considerable amounts of statistical data to be consumed in various
+		(including unexpected) application scenarios. Typically, Microsoft
+		Excel sheets are made available for download. Those Excel files
+		contain single spreadsheets with several multidimensional data
+		tables, each having a name and notes, as well as column values, row
+		values, and cell values.</p>
+	<p>The goal in this use case is to publish spreadsheet
+		information in a machine-readable format on the web, e.g., so that
+		crawlers can find spreadsheets that use a certain column value. The
+		published data should represent and make available for queries the
+		most important information in the spreadsheets, e.g., rows, columns,
+		and cell values. QB should provide the level of detail that is needed
+		for such a transformation in order to fulfil this use case.</p>
+	<p>In a possible use case scenario, an institution wants to develop
+		or use software that transforms its Excel sheets into the
+		appropriate format.</p>
+
+	<p class="editorsnote">@@TODO: Concrete example needed.</p>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>Excel sheets provide much flexibility in arranging
+			information. It may be necessary to limit this flexibility to allow
+			automatic transformation.</li>
+		<li>There may be many spreadsheets.</li>
+		<li>Semi-structured information, e.g., notes about the lineage of
+			data cells, may be impossible to formalize.</li>
+	</ul>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>
+		Existing Work (optional): Stats2RDF uses OntoWiki to translate CSV
+		into QB <a href="http://aksw.org/Projects/Stats2RDF">[Stats2RDF]</a>.
+	</p>
+
+	</section> <section>
+	<h4>Publishing SDMX as Linked Data (UC 3)</h4>
+	<p>The ISO standard for exchanging and sharing statistical data and
+		metadata among organizations is Statistical Data and Metadata eXchange
+		(SDMX). Since this standard has proven applicable in many contexts, QB
+		is designed to be compatible with the multidimensional model that
+		underlies SDMX.</p>
+	<p class="editorsnote">@@TODO: The QB spec should maybe also use
+		the term "multidimensional model" instead of the less clear "cube
+		model" term.</p>
+	<p>Therefore, it should be possible to re-publish SDMX data using
+		QB.</p>
+	<p>
+		The scenario for this use case is Eurostat <a
+			href="http://epp.eurostat.ec.europa.eu/">[EUROSTAT]</a>, which
+		publishes large amounts of European statistics coming from a data
+		warehouse as SDMX and other formats on the web. Eurostat also provides
+		an interface to browse and explore the datasets. However, linking such
+		multidimensional data to related datasets and concepts would require
+		downloading the datasets of interest and integrating them manually.
+	</p>
+	<p>The goal of this use case is to improve integration with other
+		datasets; Eurostat data should be published on the web in a
+		machine-readable format, linkable with other datasets, and freely
+		consumable by applications. This use case is fulfilled if QB can be
+		used for publishing the data from Eurostat as Linked Data for
+		integration.</p>
+	<p>A publisher wants to make Eurostat data available as Linked
+		Data. The statistical data shall be published as is. It is not
+		necessary to represent information for validation. Data is read from
+		TSV files only. There are two concrete examples of this use case: Eurostat
+		Linked Data Wrapper (http://estatwrap.ontologycentral.com/), and
+		Linked Statistics Eurostat Data
+		(http://eurostat.linked-statistics.org/). They have slightly different
+		focus (e.g., with respect to completeness, performance, and agility).
+	</p>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>There are large amounts of SDMX data; the Eurostat dataset
+			comprises 350 GB of data. This may influence decisions about toolsets
+			and architectures to use. One important task is to decide whether to
+			structure the data in separate datasets.</li>
+		<li>Again, the question comes up whether slices are useful.</li>
+	</ul>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>Existing Work (optional): -</p>
+	</section> <section>
+	<h4>Publishing sensor data as statistics (UC 4)</h4>
+	<p>Typically, multidimensional data is aggregated. However, there
+		are cases where non-aggregated data needs to be published, e.g.,
+		observational, sensor network and forecast data sets. Such raw data
+		may already be available in RDF, but using a different vocabulary.</p>
+	<p>The goal of this use case is to demonstrate that publishing
+		aggregated values or raw data should not make much of a difference
+		in QB.</p>
+	<p>
+		For example, the Environment Agency uses QB to publish (at least
+		weekly) information on the quality of bathing waters around England
+		and Wales <a
+			href="http://www.epimorphics.com/web/wiki/bathing-water-quality-structure-published-linked-data">[EnvAge]</a>.
+		In another scenario, DERI tracks measurements about printing for a
+		sustainability report. In the DERI scenario, raw data (number of
+		printouts per person) is collected, then aggregated on a unit level,
+		and then modelled using QB.
+	</p>
+	<p>Problems and Limitations:</p>
+	<ul>
+		<li>This use case shall also demonstrate how to link statistics
+			with other statistics or non-statistical data (metadata).</li>
+	</ul>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>
+		Existing Work (optional): The Semantic Sensor Network ontology <a
+			href="http://purl.oclc.org/NET/ssnx/ssn">[SSN]</a> already provides a
+		way to publish sensor information. SSN data provides statistical
+		Linked Data and grounds its data in the domain, e.g., sensors that
+		collect observations (e.g., sensors measuring the average temperature
+		over location and time). A number of organizations, particularly in
+		the climate and meteorology domains, already have some commitment to
+		the OGC "Observations and Measurements" (O&amp;M) logical data model, also
+		published as ISO 19156.
+	</p>
+	<p class="editorsnote">@@TODO: Are there any statements about
+		compatibility and interoperability between O&amp;M and Data Cube that can
+		be made to give guidance to such organizations?</p>
+	</section> <section>
+	<h4>Registering statistical data in dataset catalogs (UC 5)</h4>
+	<p>
+		After statistics have been published as Linked Data, the question
+		remains how to communicate the publication and let users find the
+		statistics. There are catalogs to register datasets, e.g., CKAN, <a
+			href="http://www.datacite.org/datacite.org">datacite.org</a>, <a
+			href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
+			href="http://pangaea.de/">PANGAEA</a>. Those catalogs require specific
+		configurations to register statistical data.
+	</p>
+	<p>The goal of this use case is to demonstrate how to expose and
+		distribute statistics after modeling them using QB, for instance, to
+		allow automatic registration of statistical data in such catalogs so
+		that datasets can be found and evaluated. To solve this issue, it
+		should be possible to transform QB data into formats that can be used
+		by data catalogs.</p>
+
+	<p class="editorsnote">@@TODO: Find a specific use case scenario or
+		ask how other publishers of QB data have dealt with this issue. Maybe
+		a relation to DCAT?</p>
+	<p>Problems and Limitations: -</p>
+	<p>Unanticipated Uses (optional): Where data catalogs contain
+		statistics, they expose them not as Linked Data but, for instance, as
+		CSV or HTML (PANGAEA [11]). It could also be a use case to publish
+		such data using QB.</p>
+	<p>Existing Work (optional): -</p>
+	</section> <section>
+	<h4>Making transformations on, or different versions of,
+		statistical data transparent (UC 6)</h4>
+	<p>Statistical data is often used and further transformed for
+		analysis and reporting. There is a risk that data has been
+		transformed incorrectly so that the result can no longer be
+		interpreted. Therefore, if statistical data has been derived from
+		other statistical data, this should be made transparent.</p>
+	<p>The goal of this use case is to describe provenance and
+		versioning around statistical data, so that the history of statistics
+		published on the web becomes clear. This may also relate to the issue
+		of having relationships between datasets published using QB. To fulfil
+		this use case, QB should recommend specific approaches to transforming
+		and deriving datasets which can be tracked and stored with the
+		statistical data.</p>
+	<p class="editorsnote">@@TODO: Add concrete example use case
+		scenario.</p>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>Operations on statistical data result in new statistical
+			data, depending on the operation. For instance, in terms of Data Cube,
+			operations such as slice, dice, roll-up, drill-down will result in
+			new Data Cubes. This may require representing general relationships
+			between cubes (as discussed here: [12]).</li>
+	</ul>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>Existing Work (optional): Possible relation to Best Practices
+		part on Versioning [13], where it is specified how to publish data
+		which has multiple versions.</p>
+
+
+	</section></section> <section>
+	<h3>Consuming published statistical data</h3>
+
+	<section>
+	<h4>Simple chart visualizations of (integrated) published
+		statistical datasets (UC 7)</h4>
+	<p>Data that is published on the Web is typically visualized by
+		transforming it manually into CSV or Excel and then creating a
+		visualization on top of these formats using Excel, Tableau,
+		RapidMiner, Rattle, Weka etc.</p>
+	<p>This use case shall demonstrate how statistical data published
+		on the web can be directly visualized, without using commercial or
+		highly-complex tools. This use case is fulfilled if data that is
+		published in QB can be directly visualized inside a webpage.</p>
+	<p>An example scenario is environmental research done within the
+		SMART research project (http://www.iwrm-smart.org/). Here, statistics
+		about environmental aspects (e.g., measurements about the climate in
+		the Lower Jordan Valley) shall be visualized for scientists and
+		decision makers. It should also be possible to integrate statistics
+		and display them together. The data is available as XML files on the
+		web. On a separate website, specific parts of the data shall be
+		queried and visualized in simple charts, e.g., line diagrams. The
+		following figure shows the desired display of an environmental
+		measure over time for three regions in the Lower Jordan Valley,
+		displayed inside a web page:</p>
+
+	<div class="fig">
+		<a href="figures/Level_above_msl_3_locations.png"><img
+			width="800px" src="figures/Level_above_msl_3_locations.png"
+			alt="Line chart visualization of QB data" /> </a>
+		<div>Line chart visualization of QB data</div>
+	</div>
+
+	<p>The following figure shows the same measures in a pivot table.
+		Here, the aggregate COUNT of measures per cell is given.</p>
+
+	<div class="fig">
+		<a href="figures/pivot_analysis_measurements.PNG"><img
+			src="figures/pivot_analysis_measurements.PNG"
+			alt="Pivot analysis measurements" /> </a>
+		<div>Pivot analysis measurements</div>
+	</div>
+
+	<p>The use case uses Google App Engine, Qcrumb.com, and Spark. An
+		example of a line diagram is given at [14] (some loading time needed).
+		Current work tries to integrate the current datasets with additional
+		data sources, and then to run queries that take data from both and
+		display them together.</p>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>The difficulties lie in structuring the data appropriately so
+			that the specific information can be queried.</li>
+		<li>Also, data shall be published with potential integration in
+			mind. Therefore, e.g., units of measurement need to be
+			represented.</li>
+		<li>Integration becomes much more difficult if publishers use
+			different measures or dimensions.</li>
+
+	</ul>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>Existing Work (optional): -</p>
+	</section> <section>
+	<h4>Uploading published statistical data in Google Public Data
+		Explorer (UC 8)</h4>
+	<p>Google Public Data Explorer (GPDE -
+		http://code.google.com/apis/publicdata/) provides an easy way
+		to visualize and explore statistical data. Data needs to be in the
+		Dataset Publishing Language (DSPL -
+		https://developers.google.com/public-data/overview) to be uploaded to
+		the data explorer. A DSPL dataset is a bundle that contains an XML
+		file, the schema, and a set of CSV files, the actual data. Google
+		provides a tutorial to create a DSPL dataset from your data, e.g., in
+		CSV. This requires a good understanding of XML, as well as a good
+		understanding of the data that shall be visualized and explored.</p>
+	<p>In this use case, it shall be demonstrated how to take any
+		published QB dataset and to transform it automatically into DSPL for
+		visualization and exploration. A dataset that is published conforming
+		to QB will provide the level of detail that is needed for such a
+		transformation.</p>
+	<p>In an example scenario, a publisher P has published data using
+		QB. There are two different ways to fulfil this use case: 1) a
+		customer C downloads this data into a triple store, and SPARQL
+		queries on this data are used to transform the data into DSPL, which
+		is then uploaded and visualized using GPDE; or 2) one or more XSLT
+		transformations on the RDF/XML transform the data into DSPL.</p>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>The technical challenges for the consumer here lie in knowing
+			where to download what data and how to get it transformed into DSPL
+			without knowing the data.</li>
+	</ul>
+	<p>Unanticipated Uses (optional): DSPL is representative of using
+		statistical data published on the web in available tools for
+		analysis. Similar tools that may be automatically covered are: Weka
+		(ARFF data format), Tableau, etc.</p>
+	<p>Existing Work (optional): -</p>
+	</section> <section>
+	<h4>Allow Online Analytical Processing on published datasets of
+		statistical data (UC 9)</h4>
+	<p>Online Analytical Processing [15] is an analysis method on
+		multidimensional data. It is an explorative analysis method that
+		allows users to interactively view the data from different angles
+		(rotate, select) or at different granularities (drill-down, roll-up),
+		and to filter it for specific information (slice, dice).</p>
+	<p>The multidimensional model used in QB to model statistics should
+		be usable by OLAP systems. More specifically, data that conforms to QB
+		can be used to define a Data Cube within an OLAP engine and can then
+		be queried by OLAP clients.</p>
+	<p>An example scenario of this use case is the Financial
+		Information Observation System (FIOS) [16], where XBRL data has been
+		re-published using QB and made analysable for stakeholders in a
+		web-based OLAP client. The following figure shows an example of using
+		FIOS. Here, the cost of goods sold, as disclosed in XBRL documents, is
+		analysed for three different companies. Each cell gives either the
+		number of disclosures or, if only one is available, the actual amount
+		in USD:</p>
+
+	<div class="fig">
+		<a href="figures/FIOS_example.PNG"><img
+			src="figures/FIOS_example.PNG" alt="OLAP of QB data" /> </a>
+		<div>OLAP of QB data</div>
+	</div>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>A problem lies in the strict separation between queries for
+			the structure of data, and queries for actual aggregated values.</li>
+		<li>Another problem lies in defining Data Cubes without greater
+			insight into the data beforehand.</li>
+		<li>Depending on the expressivity of the OLAP queries (e.g.,
+			aggregation functions, hierarchies, ordering), performance plays an
+			important role.</li>
+		<li>QB allows flexibility in describing statistics, e.g., in
+			order to reduce redundancy of information in single observations.
+			These alternatives make general consumption of QB data more complex.
+			Also, it is not clear what "conforms to QB" means, e.g., is a
+			qb:DataStructureDefinition required?</li>
+	</ul>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>Existing Work (optional): -</p>
+	</section> <section>
+	<h4>Transforming published statistics into XBRL (UC 10)</h4>
+	<p>XBRL is a standard data format for disclosing financial
+		information. Typically, financial data is not managed within the
+		organization using XBRL; instead, internal formats such as Excel or
+		relational databases are used. If different data sources are to be
+		summarized in XBRL data formats to be published, an internally-used
+		standard format such as QB could help integrate and transform the data
+		into the appropriate format.</p>
+	<p>In this use case, data conforming to QB should also be
+		automatically transformable into such an XBRL data format. This use
+		case is fulfilled if QB contains the information necessary to derive
+		XBRL data.</p>
+	<p>In an example scenario, DERI has had a use case to publish
+		sustainable IT information as XBRL to the Global Reporting Initiative
+		(GRI - https://www.globalreporting.org/). Here, raw data (number of
+		printouts per person) is collected, then aggregated on a unit level
+		and modelled using QB. QB data shall then be used directly to fill in
+		XBRL documents that can be published to the GRI.</p>
+	<p>Challenges of this use case are:</p>
+	<ul>
+		<li>So far, QB data has been transformed into semantic XBRL, a
+			vocabulary closer to XBRL. There is the chance that certain
+			information required in a GRI XBRL document cannot be encoded using a
+			vocabulary as general as QB. In this case, QB could be used in
+			combination with semantic XBRL.</li>
+	</ul>
+	<p class="editorsnote">@@TODO: Add link to semantic XBRL.</p>
+	<p>Unanticipated Uses (optional): -</p>
+	<p>Existing Work (optional): -</p>
+
+	</section> </section></section>
+	<section>
+	<h2>Requirements</h2>
+
+	<p>The use cases presented in the previous section give rise to the
+		following requirements for a standard representation of statistics.
+		Requirements are cross-linked with the use cases that motivate them.
+		Requirements are similarly categorized as deriving from publishing or
+		consuming use cases.</p>
+
+	<section>
+	<h3>Publishing requirements</h3>
+
+	<section>
+	<h4>Machine-readable and application-independent representation of
+		statistics</h4>
+	<p>It should be possible to add abstraction, multiple levels of
+		description, and summaries of statistics.</p>
+
+	<p>Required by: UC1, UC2, UC3, UC4</p>
+	</section> <section>
+	<h4>Representing statistics from various sources</h4>
+	<p>It should be possible to translate statistics from various
+		source data into QB. QB should be very general and should be usable for
+		other data sets such as survey data, spreadsheets and OLAP data cubes.
+		The statistics described include simple CSV tables (UC 1), Excel
+		spreadsheets (UC 2), and more complex SDMX data (UC 3) about
+		government statistics or other public-domain relevant data.</p>
+
+	<p>Required by: UC1, UC2, UC3</p>
+	</section> <section>
+	<h4>Communicating and exposing statistics on the web</h4>
+	<p>It should become clear how to make statistical data available on
+		the web, including how to expose it, and how to distribute it.</p>
+
+	<p>Required by: UC5</p>
+	</section> <section>
+	<h4>Coverage of typical statistics metadata</h4>
+	<p>It should be possible to add metadata to statistics as found in
+		typical statistics or statistics catalogs.</p>
+
+	<p>Required by: UC1, UC2, UC3, UC4, UC5</p>
+	</section> <section>
+	<h4>Expressing hierarchies</h4>
+	<p>It should be possible to express hierarchies on Dimensions of
+		statistics. Some of this requirement is met by the work on ISO
+		Extension to SKOS [17].</p>
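+	<p>As a minimal sketch, assuming the areas of the population example
+		below are modelled as SKOS concepts with hypothetical ex: URIs, a
+		hierarchy on the area dimension could be expressed with the standard
+		SKOS properties skos:broader and skos:narrower. This only states that
+		one area concept is more specific than another; it does not state
+		that the corresponding observation values add up.</p>
+	<pre>
+@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
+@prefix ex:   &lt;http://example.org/&gt; .   # hypothetical namespace
+
+ex:areas a skos:ConceptScheme ;
+    skos:hasTopConcept ex:UK .
+
+ex:UK a skos:Concept ;
+    skos:inScheme ex:areas ;
+    skos:narrower ex:England , ex:Scotland , ex:Wales , ex:NorthernIreland .
+
+# Each more specific area points to its broader area.
+ex:England         a skos:Concept ; skos:inScheme ex:areas ; skos:broader ex:UK .
+ex:Scotland        a skos:Concept ; skos:inScheme ex:areas ; skos:broader ex:UK .
+ex:Wales           a skos:Concept ; skos:inScheme ex:areas ; skos:broader ex:UK .
+ex:NorthernIreland a skos:Concept ; skos:inScheme ex:areas ; skos:broader ex:UK .
+	</pre>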
+
+	<p>Required by: UC3, UC9</p>
+	</section> <section>
+	<h4>Expressing aggregation relationships in Data Cube</h4>
+	<p>Based on [18]: statistical data often contains some kind of
+		'overall' figure, which is then broken down into parts. Suppose I
+		have a set of population observations, expressed with the Data Cube
+		vocabulary, something like the following (in Turtle):</p>
+	<pre>
+@prefix sdmx-dimension: &lt;http://purl.org/linked-data/sdmx/2009/dimension#&gt; .
+@prefix ex:             &lt;http://example.org/&gt; .
+
+ex:obs1
+  sdmx-dimension:refArea ex:UK ;
+  sdmx-dimension:refPeriod "2011" ;
+  ex:population 60 .
+
+ex:obs2
+  sdmx-dimension:refArea ex:England ;
+  sdmx-dimension:refPeriod "2011" ;
+  ex:population 50 .
+
+ex:obs3
+  sdmx-dimension:refArea ex:Scotland ;
+  sdmx-dimension:refPeriod "2011" ;
+  ex:population 5 .
+
+ex:obs4
+  sdmx-dimension:refArea ex:Wales ;
+  sdmx-dimension:refPeriod "2011" ;
+  ex:population 3 .
+
+ex:obs5
+  sdmx-dimension:refArea ex:NorthernIreland ;
+  sdmx-dimension:refPeriod "2011" ;
+  ex:population 2 .
+	</pre>
+	<p>What is the best way (in the context of the RDF/Data Cube/SDMX
+		approach) to express that the values for the England/Scotland/Wales/
+		Northern Ireland ought to add up to the value for the UK and
+		constitute a more detailed breakdown of the overall UK figure? I might
+		also have population figures for France, Germany, EU27, etc., so it is
+		not as simple as just taking a qb:Slice where you fix the time period
+		and the measure.</p>
+	<p>Some of this requirement is met by the work on ISO Extension to
+		SKOS [19].</p>
+
+
+	<p>Required by: UC1, UC2, UC3, UC9</p>
+	</section> <section>
+	<h4>Scale - how to publish large amounts of statistical data</h4>
+	<p>Publishers that are constrained by the size of the statistics
+		they publish should have ways to reduce that size or to remove
+		redundant information. Scalability issues can arise both in the effort
+		required from people and in the performance of applications.</p>
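+	<p>One way to remove redundant information is an abbreviated layout.
+		In this layout, values shared by every observation (here, the reference
+		period) are stated once at a higher level rather than repeated. The
+		following rough Python sketch factors out such values and expands them
+		again. It assumes observations are plain dictionaries, and the function
+		names are hypothetical.</p>

```python
def abbreviate(observations):
    """Split observations into (shared, residual). `shared` holds every
    property whose value is identical in all observations; each residual
    observation keeps only its remaining properties."""
    if not observations:
        return {}, []
    first, rest = observations[0], observations[1:]
    shared = {
        key: value
        for key, value in first.items()
        if all(key in obs and obs[key] == value for obs in rest)
    }
    residual = [
        {key: value for key, value in obs.items() if key not in shared}
        for obs in observations
    ]
    return shared, residual


def expand(shared, residual):
    """Inverse of abbreviate(): copy shared values back onto each observation."""
    return [{**shared, **obs} for obs in residual]


observations = [
    {"refArea": area, "refPeriod": "2011", "population": value}
    for area, value in [("UK", 60), ("England", 50), ("Scotland", 5),
                        ("Wales", 3), ("NorthernIreland", 2)]
]
shared, residual = abbreviate(observations)
print(shared)  # {'refPeriod': '2011'}
assert expand(shared, residual) == observations
```

+	<p>A real publisher would restrict this to components that the data
+		structure definition allows to be attached at the dataset or slice
+		level, rather than to any property that happens to be constant.</p>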
+
+	<p>Required by: UC1, UC2, UC3, UC4</p>
+	</section> <section>
+	<h4>Compliance-levels or criteria for well-formedness</h4>
+	<p>The RDF Data Cube vocabulary expresses few formal semantic
+		constraints. Furthermore, in RDF the omission of otherwise-expected
+		properties on resources does not lead to any formal inconsistency.
+		However, to build reliable software that processes Data Cubes, data
+		consumers need to know what assumptions they can make about a dataset
+		purporting to be a Data Cube.</p>
+	<p>What <em>well-formedness</em> criteria should Data Cube publishers
+		conform to? Specific areas which may need explicit clarification in
+		the well-formedness criteria include (but may not be limited to):</p>
+	<ul>
+		<li>use of abbreviated data layout based on attachment levels</li>
+		<li>when to use qb:Slice (completeness, whether an explicit
+			qb:SliceKey is required)</li>
+		<li>avoiding mixing the two approaches to handling multiple
+			measures</li>
+		<li>optional triples (e.g. type triples)</li>
+	</ul>
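+	<p>Machine-checkable criteria might look like the following Python
+		sketch. It tests two candidate criteria on observations represented as
+		dictionaries. First, every observation has a value for each dimension
+		and measure. Second, no two observations share the same dimension
+		values. The function name and the choice of criteria are assumptions,
+		not an agreed definition of well-formedness.</p>

```python
def well_formedness_problems(observations, dimensions, measures):
    """Check two candidate criteria on observation dictionaries:
    (1) every dimension and measure has a value, and
    (2) no two observations have the same dimension values."""
    problems = []
    first_seen = {}  # dimension-value tuple -> index of first observation
    for i, obs in enumerate(observations):
        missing = [p for p in [*dimensions, *measures] if p not in obs]
        if missing:
            problems.append(f"observation {i}: missing {', '.join(missing)}")
            continue
        key = tuple(obs[d] for d in dimensions)
        if key in first_seen:
            problems.append(
                f"observation {i}: same dimension values as "
                f"observation {first_seen[key]}"
            )
        else:
            first_seen[key] = i
    return problems


dims, measures = ["refArea", "refPeriod"], ["population"]
data = [
    {"refArea": "UK", "refPeriod": "2011", "population": 60},
    {"refArea": "UK", "refPeriod": "2011", "population": 61},
    {"refArea": "Wales", "population": 3},
]
for problem in well_formedness_problems(data, dims, measures):
    print(problem)
```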
+
+	<p>Required by all use cases.</p>
+	</section> <section>
+	<h4>Declaring relations between Cubes</h4>
+	<p>In some situations statistical datasets are used to derive
+		further datasets. Should Data Cube be able to convey these
+		relationships explicitly?</p>
+	<p>A simple, specific use case is that the Welsh Assembly Government
+		publishes a variety of population datasets broken down in different
+		ways. For many uses, population broken down by some category (e.g.
+		ethnicity) is expressed as a percentage. Separate datasets give the
+		actual counts per category and the aggregate counts. In such cases it
+		is common to talk about the denominator (often DENOM), which is the
+		aggregate count against which the percentages can be interpreted.</p>
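+	<p>The DENOM relationship itself is simple arithmetic: each
+		percentage is 100 * count / DENOM. The following minimal Python sketch
+		uses made-up category names and counts (no real Welsh figures).
+		The helper names are hypothetical.</p>

```python
def percentages(counts, denom):
    """Express each category count as a percentage of the DENOM value."""
    if denom <= 0:
        raise ValueError("DENOM must be positive")
    return {category: 100 * count / denom for category, count in counts.items()}


def covers_denominator(counts, denom):
    """True if the categories are exhaustive, i.e. their counts add up to DENOM."""
    return sum(counts.values()) == denom


# Made-up counts for two hypothetical categories.
counts = {"category A": 250, "category B": 750}
print(percentages(counts, 1000))         # {'category A': 25.0, 'category B': 75.0}
print(covers_denominator(counts, 1000))  # True
```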
+	<p>Should Data Cube support explicit declaration of such
+		relationships, either between separate qb:DataSets or between measures
+		within a single qb:DataSet (e.g. ex:populationCount and
+		ex:populationPercent)?</p>
+	<p>If so, should that be scoped to simple, common relationships such
+		as DENOM, or allow the expression of arbitrary mathematical relations?</p>
+	<p>Note that there has been some work towards this within the SDMX
+		community, as indicated here: <a
+			href="http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d">http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d</a>.</p>
+
+	<p>Required by: UC6</p>
+	</section> </section> <section>
+	<h3>Consumption requirements</h3>
+
+	<section>
+	<h4>Finding statistical data</h4>
+	<p>It should be possible to find statistical data, perhaps through
+		an authoritative service.</p>
+
+	<p>Required by: UC5</p>
+	</section> <section>
+	<h4>Retrieval of fine-grained statistics</h4>
+	<p>This requirement covers query formulation and execution
+		mechanisms. It should be possible to use SPARQL to query for
+		fine-grained statistics.</p>
+
+	<p>Required by: UC1, UC2, UC3, UC4, UC5, UC6, UC7</p>
+	</section> <section>
+	<h4>Understanding - End user consumption of statistical data</h4>
+	<p>It must be possible to present and visualize statistical data.</p>
+
+	<p>Required by: UC7, UC8, UC9, UC10</p>
+	</section> <section>
+	<h4>Comparing and trusting statistics</h4>
+	<p>It must be possible to find what the statistics of two or more
+		datasets have in common. This requirement also deals with information
+		quality (assessing statistical datasets) and trust (making trust
+		judgements about statistical data).</p>
+
+	<p>Required by: UC5, UC6, UC9</p>
+	</section> <section>
+	<h4>Integration of statistics</h4>
+	<p>Interoperability: combining statistics produced by multiple
+		different systems. It should be possible to combine two sets of
+		statistics that contain related data and were possibly published
+		independently. It should also be possible to implement value
+		conversions.</p>
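+	<p>Combining independently published statistics typically means
+		matching observations on shared dimension values and converting units
+		where they differ. The following minimal Python sketch assumes both
+		datasets have already been keyed by (refArea, refPeriod). It also
+		assumes that the second, hypothetical dataset reports values in
+		thousands while the first uses millions. The combine function is
+		hypothetical.</p>

```python
def combine(left, right, convert=lambda value: value):
    """Inner-join two datasets keyed by shared dimension values such as
    (refArea, refPeriod). `convert` is applied to the right-hand values so
    both sides use the same unit; keys present in only one dataset are
    dropped."""
    return {
        key: (left[key], convert(right[key]))
        for key in left.keys() & right.keys()
    }


# Population in millions, as in the example earlier in this document.
population = {("UK", "2011"): 60, ("Wales", "2011"): 3}
# Hypothetical second dataset, published independently, in thousands.
other = {("UK", "2011"): 26000, ("France", "2011"): 27000}

print(combine(population, other, convert=lambda thousands: thousands / 1000))
# {('UK', '2011'): (60, 26.0)}
```

+	<p>An inner join keeps the sketch simple. A consumer that needs to
+		see unmatched observations would use an outer join instead.</p>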
+
+	<p>Required by: UC1, UC3, UC4, UC7, UC9, UC10</p>
+	</section> <section>
+	<h4>Scale - how to consume large amounts of statistical data</h4>
+	<p>Consumers that want to access large amounts of statistical data
+		need guidance.</p>
+
+	<p>Required by: UC7, UC9</p>
+	</section> <section>
+	<h4>Common internal representation of statistics, to be exported
+		in other formats</h4>
+	<p>It should be possible to transform QB data into formats such as
+		XBRL that are required by certain institutions.</p>
+
+	<p>Required by: UC10</p>
+	</section> <section>
+	<h4>Dealing with imperfect statistics</h4>
+	<p>Imperfections: reasoning about statistical data that is
+		incomplete or incorrect.</p>
+
+	<p>Required by: UC7, UC8, UC9, UC10</p>
+	</section> </section> </section>
+	<section class="appendix">
+	<h2>Acknowledgments</h2>
+	<p>The editors are very thankful for comments and suggestions ...</p>
+	</section>
+
+	<h2 id="references">References</h2>
+
+	<dl>
+		<dt id="ref-SDMX">[SDMX]</dt>
+		<dd>
+			SDMX User Guide 2009, <a
+				href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>
+		</dd>
+
+		<dt id="ref-Fowler1997">[Fowler1997]</dt>
+		<dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object
+			Models. Addison-Wesley. ISBN 0201895420.</dd>
+
+		<dt id="ref-QB">[QB]</dt>
+		<dd>
+			RDF Data Cube vocabulary, <a
+				href="http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html">http://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html</a>
+		</dd>
+
+		<dt id="ref-OLAP">[OLAP]</dt>
+		<dd>
+			Online Analytical Processing Data Cubes, <a
+				href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
+		</dd>
+
+		<dt id="ref-linked-data">[LOD]</dt>
+		<dd>
+			Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+		</dd>
+
+		<dt id="ref-rdf">[RDF]</dt>
+		<dd>
+			Resource Description Framework, <a href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</a>
+		</dd>
+
+		<dt id="ref-scovo">[SCOVO]</dt>
+		<dd>
+			The Statistical Core Vocabulary, <a
+				href="http://sw.joanneum.at/scovo/schema.html">http://sw.joanneum.at/scovo/schema.html</a>
+			<br /> SCOVO: Using Statistics on the Web of data, <a
+				href="http://sw-app.org/pub/eswc09-inuse-scovo.pdf">http://sw-app.org/pub/eswc09-inuse-scovo.pdf</a>
+		</dd>
+
+		<dt id="ref-skos">[SKOS]</dt>
+		<dd>
+			Simple Knowledge Organization System, <a
+				href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>
+		</dd>
+
+		<dt id="ref-cog">[COG]</dt>
+		<dd>
+			SDMX Content Oriented Guidelines, <a
+				href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
+		</dd>
+
+	</dl>
+</body>
+</html>
author	bkaempge
	Wed, 27 Feb 2013 23:44:50 +0100
changeset 292	4f2617cbdab3
parent 291	5692f975418d
child 293	da1e6bfe3727