--- a/data-cube-ucr/index.html Mon Jun 10 11:25:29 2013 +0200
+++ b/data-cube-ucr/index.html Mon Jun 10 18:01:37 2013 +0200
@@ -28,13 +28,13 @@
In this document, the <a href="http://www.w3.org/2011/gld/">W3C
Government Linked Data Working Group</a> presents use cases and lessons
supporting a recommendation of the RDF Data Cube Vocabulary [<cite><a
- href="#ref-QB-2013">QB-2013</a></cite>]. The document describes case studies
+ href="#ref-QB-2013">QB-2013</a></cite>]. We describe case studies
of existing deployments of an earlier version of the data cube
vocabulary [<cite><a href="#ref-QB-2010">QB-2010</a></cite>] as well
as other possible use cases that would benefit from using the
- vocabulary. In particular, the document identifies benefits and
- challenges in using a vocabulary for representing statistics. Also, it
- derives lessons that can be used for future work on the vocabulary as
+ vocabulary. In particular, we identify benefits and
+ challenges in using a vocabulary for representing statistics. Also, we
+ derive lessons that can be used for future work on the vocabulary as
well as for useful tools complementing the vocabulary.
</p>
</section>
@@ -58,11 +58,11 @@
deployments</a>. The <a href="http://www.w3.org/2011/gld/">W3C
Government Linked Data Working Group</a> intends to transform the data
cube vocabulary into a W3C recommendation of the RDF Data Cube
- Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. The
- document describes use cases that would benefit from using the
- vocabulary. In particular, the document identifies possible benefits
+ Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. In this document, we
+ describe use cases that would benefit from using the
+ vocabulary. In particular, we identify possible benefits
and challenges in using such a vocabulary for representing statistics.
- Also, it derives lessons that can motivate future work on the
+ Also, we derive lessons that can motivate future work on the
vocabulary as well as associated tools or services complementing the
vocabulary.
@@ -76,12 +76,10 @@
<p>We use the term "data cube vocabulary" throughout the document
when referring to the vocabulary.</p>
- <section>
- <h3 id="describingstatistics">Describing statistics</h3>
- <p>In the following, we describe the the challenge of authoring an
- RDF vocabulary for publishing statistics as Linked Data.</p>
- <p>Describing statistics - collected and aggregated numeric data -
- is challenging for the following reasons:</p>
+ <p>In the following, we describe the challenge of authoring an RDF
+ vocabulary for publishing statistics as Linked Data. Describing
+ statistics - collected and aggregated numeric data - is challenging
+ for the following reasons:</p>
<ul>
<li>Representing statistics requires more complex modelling as
discussed by Martin Fowler [<cite><a href="#ref-FOWLER97">FOWLER97</a></cite>]:
@@ -91,7 +89,7 @@
statistic is modeled as a distinguishable object, an observation.
</li>
<li>The object describes an observation of a value, e.g., a
- numeric value (e.g., 185) in case of a measurement or a categorical
+ numeric value (e.g., 185) in case of a measurement and a categorical
value (e.g., "blood group A") in case of a categorical observation.</li>
<li>To allow correct interpretation of the value, the observation
needs to be further described by "dimensions" such as the specific
@@ -99,8 +97,8 @@
"January 2013" or a location the observation was done, e.g., "New
York".</li>
<li>To further improve interpretation of the value, attributes
- such as presentational information, e.g. a series title "COINS 2010
- to 2013" or critical information to understanding the data, e.g. the
+ such as presentational information, e.g., a series title "COINS 2010
+ to 2013" or critical information to understanding the data, e.g., the
unit of measure "miles" can be given to observations.</li>
<li>Given background information, e.g., arithmetical and
comparative operations, humans and machines can appropriately
@@ -115,13 +113,11 @@
"multidimensional model" to meet the above challenges in modelling
statistics. It can describe statistics as observations. Observations
exhibit values (Measures) that depend on dimensions (Members of
- Dimensions).
+ Dimensions). Since the SDMX standard has proven applicable in many
+ contexts, the vocabulary adopts the multidimensional model that
+ underlies SDMX and will be compatible with SDMX.
</p>
- <p>Since the SDMX standard has proven applicable in many contexts,
- the vocabulary adopts the multidimensional model that underlies SDMX
- and will be compatible with SDMX.</p>
-
- </section> </section>
+ </section>
<section>
<h2 id="terminology">Terminology</h2>
@@ -186,8 +182,8 @@
A
<dfn>registry</dfn>
allows a publisher to announce that data or metadata exists and to add
- information about how to obtain that data. [<cite><a
- href="#ref-SDMX-21">SDMX 2.1</a></cite>]
+ information about how to obtain that data [<cite><a
+ href="#ref-SDMX-21">SDMX 2.1</a></cite>].
</p>
</section>
@@ -255,14 +251,14 @@
href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">Publishers
may need guidance in communicating the availability of published
statistical data to external parties and to allow automatic
- discovery of statistical data</a>
+ discovery of statistical data</a>.
</li>
</ul>
</section> <section>
<h3 id="UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
- Use Case: UK government financial data from Combined Online
+ Case Study: UK government financial data from Combined Online
Information System (COINS)</h3>
<p>
<span style="font-size: 10pt">(This use case has been
@@ -276,16 +272,13 @@
machines, data often is simply published as CSV, PDF, XSL etc.,
lacking elaborate metadata, which makes free usage and analysis
difficult.</p>
- <p>Therefore, the goal in this use case is to use a
- machine-readable and application-independent description of common
- statistics with use of open standards, to foster usage and innovation
- on the published data.</p>
<p>
+ Therefore, the goal in this scenario is to use a machine-readable and
+ application-independent description of common statistics with use of
+ open standards, to foster usage and innovation on the published data.
In the "COINS as Linked Data" project [<cite><a
href="#ref-COINS">COINS</a></cite>], the Combined Online Information System
(COINS) shall be published using a standard Linked Data vocabulary.
- </p>
- <p>
Via the Combined Online Information System (COINS), <a
href="http://www.hm-treasury.gov.uk/psr_coins_data.htm">HM
Treasury</a>, the principal custodian of financial data for the UK
@@ -296,16 +289,15 @@
<p>The COINS data has a hypercube structure. It describes financial
transactions using seven independent dimensions (time, data-type,
department etc.) and one dependent measure (value). Also, it allows
- thirty-three attributes that may further describe each transaction.</p>
- <p>COINS is an example of one of the more complex statistical
- datasets being publishing via data.gov.uk.</p>
+ thirty-three attributes that may further describe each transaction.
+ COINS is an example of one of the more complex statistical datasets
+ being published via data.gov.uk.</p>
<p>Part of the complexity of COINS arises from the nature of the
- data being released.</p>
+ data being released:</p>
<p>The published COINS datasets cover expenditure related to five
different years (2005–06 to 2009–10). The actual COINS database at HM
Treasury is updated daily. In principle at least, multiple snapshots
of the COINS data could be released through the year.</p>
-
<p>The actual data and its hypercube structure are to be
represented separately so that an application first can examine the
structure before deciding to download the actual data, i.e., the
@@ -344,23 +336,23 @@
</li>
<li>Also, the publisher favours a representation that is both as
self-descriptive as possible, i.e., others can link to and download
- fully-described individual transactions and as compact as possible,
+ fully-described individual transactions, and as compact as possible,
i.e., information is not unnecessarily repeated. This challenge
supports lesson: <a
href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">Publishers
and consumers may need guidance in checking and making use of
- well-formedness of published data using data cube</a>
+ well-formedness of published data using data cube</a>.
</li>
<li>Moreover, the publisher is thinking about the possible
benefit of publishing slices of the data, e.g., datasets that fix all
dimensions but the time dimension. For instance, such slices could be
particularly interesting for visualisations or comments. However,
depending on the number of Dimensions, the number of possible slices
- can become large which makes it difficult to select all interesting
- slices. This challenge supports lesson: <a
+ can become large which makes it difficult to semi-automatically
+ select all interesting slices. This challenge supports lesson: <a
href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Publishers
may need more guidance in creating and managing slices or arbitrary
- groups of observations</a>
+ groups of observations</a>.
</li>
<li>An important benefit of linked data is that we are able to
annotate data, at a fine-grained level of detail, to record
@@ -380,14 +372,14 @@
supports lesson: <a
href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">Publishers
may need guidance in making transparent the pre-processing of
- aggregate statistics</a>
+ aggregate statistics</a>.
</li>
<li>A challenge also is the size of the data, especially since it
is updated regularly. Five data files already contain between 3.3 and
4.9 million rows of data. This challenge supports lesson: <a
href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">Publishers
and consumers may need more guidance in efficiently processing data
- using the data cube vocabulary</a>
+ using the data cube vocabulary</a>.
</li>
</ul>
@@ -563,7 +555,7 @@
Region using the SUM function.</p>
<p>Similarly, for many uses then population broken down by some
- category (e.g. ethnicity) is expressed as a percentage. Separate
+ category (e.g., ethnicity) is expressed as a percentage. Separate
datasets give the actual counts per category and aggregate counts. In
such cases it is common to talk about the denominator (often DENOM)
which is the aggregate count against which the percentages can be
@@ -571,12 +563,13 @@
<h4>Benefits</h4>
<ul>
- <li>Expressing aggregation relationships would allow engines to
- automatically derive statistics on higher aggregation levels.</li>
+ <li>Expressing aggregation relationships would allow query
+ engines to automatically derive statistics on higher aggregation
+ levels.</li>
<li>Vice versa, representing further aggregated datasets would
allow to answer queries with a simple lookup instead of computations
which may be more time consuming or require specific features of the
- query engine.</li>
+ query engine (e.g., SPARQL 1.1).</li>
</ul>
@@ -603,7 +596,7 @@
</section> <section>
<h3 id="PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
- Use Case: Publishing Observational Data Sets about UK Bathing Water
+ Case Study: Publishing Observational Data Sets about UK Bathing Water
Quality</h3>
<p>
<span style="font-size: 10pt">(Use case has been provided by
@@ -613,7 +606,7 @@
</span>
</p>
<p>
- As part of their work with data.gov.uk and the UK Location Programme
+ As part of their work with data.gov.uk and the UK Location Programme,
Epimorphics Ltd have been working to pilot the publication of both
current and historic bathing water quality information from the <a
href="http://www.environment-agency.gov.uk/">UK Environment
@@ -644,12 +637,12 @@
data API configuration which makes the data available for re-use in
additional formats such as JSON and CSV.</li>
<li>Publishing bathing-water quality information in this way will
- enable the Environment Agency to meet the needs of its many data
+ 1) enable the Environment Agency to meet the needs of its many data
consumers in a uniform way rather than through diverse pairwise
- arrangements; preempt requests for specific data; and enable a larger
- community of web and mobile application developers and value-added
- information aggregators to use and re-use bathing-water quality
- information sourced by the environment agency.</li>
+ arrangements, 2) preempt requests for specific data, and 3) enable a
+ larger community of web and mobile application developers and
+ value-added information aggregators to use and re-use bathing-water
+ quality information sourced by the Environment Agency.</li>
</ul>
<h4>Challenges</h4>
@@ -687,7 +680,7 @@
</li>
</ul>
</section> <section>
- <h3 id="MetOfficeCaseStudy">Publisher Case study: Site specific
+ <h3 id="MetOfficeCaseStudy">Publisher Case Study: Site specific
weather forecasts from Met Office, the UK's National Weather Service</h3>
<span style="font-size: 10pt">(This section contributed by Dave
Reynolds)</span>
@@ -715,28 +708,32 @@
</ul>
<h4>Challenges</h4>
+
+ <p>This weather forecast case study leads to the following
+ challenges:</p>
+
<h5>ISO19156 compatibility</h5>
<p>
The World Meteorological Organization (WMO) develops and recommends
data interchange standard and within that community compatibility with
ISO19156 <em>"Geographic information — Observations and
- measurements"</em> (O&M) is regarded as important. Thus, this supports
+ measurements"</em> (O&amp;M) is regarded as important. Thus, this supports
lesson <a
href="#VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Modelers
- using ISO19156 - Observations & Measurements may need clarification
- regarding the relationship to the data cube vocabulary</a>
+ using ISO19156 - Observations &amp; Measurements may need
+ clarification regarding the relationship to the data cube vocabulary</a>.
</p>
- <b>Solution:</b>
- <p>O&M provides a data model for an Observation with associated
+ <b>Solution in this case study:</b>
+ <p>O&amp;M provides a data model for an Observation with associated
Phenomenon, measurement ProcessUsed, Domain (feature of interest) and
Result. Prototype vocabularies developed at CSIRO and extended within
this project allow this data model to be represented in RDF. For the
site specific forecasts then a 5-day forecast for all 5000+ sites is
- regarded as a single O&M Observation.</p>
+ regarded as a single O&amp;M Observation.</p>
<p>
- To represent the forecast data itself, the Result in the O&M model,
- then the relevant standard is ISO19123 <em>"Geographic
+ To represent the forecast data itself, the Result in the O&amp;M
+ model, then the relevant standard is ISO19123 <em>"Geographic
information — Schema for coverage geometry and functions"</em>. This
provides a data model for a Coverage which can represent a set of
values across some space. It defines different types of Coverage
@@ -754,8 +751,8 @@
Note that in this situation an <em>observation</em> in the sense of
<code>qb:Observation</code>
and an <em>observation</em> in the sense of ISO19156 Observations and
- Measurements are different things. The O&M Observation is the whole
- forecast whereas each
+ Measurements are different things. The O&amp;M Observation is the
+ whole forecast whereas each
<code>qb:Observation</code>
corresponds to a single GeometryValuePair within the forecast results
Coverage.
@@ -771,18 +768,18 @@
lesson <a
href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">
Publishers and consumers may need more guidance in efficiently
- processing data using the data cube vocabulary.</a>
+ processing data using the data cube vocabulary</a>.
</p>
- <b>Solution:</b>
+ <b>Solution in this case study:</b>
<p>Regarding bandwidth costs then the key is not raw data volume
but compressibility, since such data is transmitted in compressed
- form. A Turtle representation of an non-abbreviated data cube
+ form. A Turtle representation of a non-abbreviated data cube
compressed to within 15-20% of the size of compressed, handcrafted XML
and JSON representations. Thus obviating the need for abbreviations or
custom serialization.</p>
</section> <section>
- <h3 id="EurostatSDMXasLinkedData">Publisher Use Case: Eurostat
+ <h3 id="EurostatSDMXasLinkedData">Publisher Case Study: Eurostat
SDMX as Linked Data</h3>
<p>
<span style="font-size: 10pt">(This use case has been taken
@@ -800,21 +797,22 @@
and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>].
Since this standard has proven applicable in many contexts, we adopt
the multidimensional model that underlies SDMX and intend the standard
- vocabulary to be compatible to SDMX.
+ vocabulary to be compatible with SDMX. Therefore, in this use case we
+ explain the benefit and challenges of publishing SDMX data as Linked
+ Data.
</p>
<p>
- Therefore, in this use case we intend to explain the benefit and
- challenges of publishing SDMX data as Linked Data. As one of the main
- adopters of SDMX, <a href="http://epp.eurostat.ec.europa.eu/">Eurostat</a>
- publishes large amounts of European statistics coming from a data
- warehouse as SDMX and other formats on the web. Eurostat also provides
- an interface to browse and explore the datasets. However, linking such
+ As one of the main adopters of SDMX, <a
+ href="http://epp.eurostat.ec.europa.eu/">Eurostat</a> publishes large
+ amounts of European statistics coming from a data warehouse as SDMX
+ and other formats on the web. Eurostat also provides an interface to
+ browse and explore the datasets. However, linking such
multidimensional data to related data sets and concepts would require
downloading of interesting datasets and manual integration.The goal
here is to improve integration with other datasets; Eurostat data
should be published on the web in a machine-readable format, possible
- to be linked with other datasets, and possible to be freeley consumed
+ to be linked with other datasets, and possible to be freely consumed
by applications. Both <a href="http://estatwrap.ontologycentral.com/">Eurostat
Linked Data Wrapper</a> and <a
href="http://eurostat.linked-statistics.org/">Linked Statistics
@@ -841,7 +839,7 @@
<ul>
<li>Possible implementation of ETL pipelines based on Linked Data
technologies (e.g., <a href="http://code.google.com/p/ldspider/">LDSpider</a>)
- to effectively load the data into a data warehouse for analysis
+ to effectively load the data into a data warehouse for analysis.
</li>
<li>Allows useful queries to the data, e.g., comparison of
@@ -877,7 +875,11 @@
<li>In the Eurostat Linked Data Wrapper, there is a timeout for
transforming SDMX to Linked Data, since Google App Engine is used.
Mechanisms to reduce the amount of data that needs to be translated
- would be needed.</li>
+ would be needed, again supporting lesson <a
+ href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">
+ Publishers and consumers may need more guidance in efficiently
+ processing data using the data cube vocabulary</a>.
+ </li>
<li>Provide a useful interface for browsing and visualising the
data. One problem is that the data sets have too high dimensionality
@@ -894,18 +896,15 @@
<li>Each dimension used by a dataset has a range of permitted
values that need to be described.</li>
- <li>The Eurostat SDMX as Linked Data use case suggests to have
- time lines on data aggregating over the gender dimension. This
- supports lesson <a
+ <li>The Eurostat SDMX as Linked Data use case provides data on a
+ gender level and on a level aggregating over the gender dimension. This
+ suggests having time lines on data aggregating over the gender
+ dimension, supporting lesson <a
href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">
Publishers may need guidance in how to represent common analytical
operations such as Slice, Dice, Rollup on data cubes</a>.
</li>
- <li>The Eurostat SDMX as Linked Data use case suggests to provide
- data on a gender level and on a level aggregating over the gender
- dimension.</li>
-
<li>Updates to the data
<ul>
@@ -935,12 +934,12 @@
<li>Eurostat - Linked Data provides SPARQL endpoint for the
metadata (not the observations).</li>
<li>Eurostat Linked Data Wrapper provides resolvable URIs to
- datasets that return all observations of the dataset. Also, every
- dataset serves the URI of its data structure definition (dsd). The
- dsd URI returns all RDF describing the dataset. Separating
+ datasets (ds) that return all observations of the dataset. Also,
+ every dataset serves the URI of its data structure definition (dsd).
+ The dsd URI returns all RDF describing the dataset. Separating
information resources for dataset and data structure definition
allows for example to first gather the dsd and only for actual query
- execution resolve ds URIs.</li>
+ execution resolve the ds.</li>
</ul>
<li>Browsing and visualising interface:
@@ -951,7 +950,7 @@
href="#Consumersmayneedguidanceinconversionsintoformats">
Consumers may need guidance in conversions into formats that can
easily be displayed and further investigated in tools such as
- Google Data Explorer, R, Weka etc.</a>.
+ Google Data Explorer, R, Weka etc.</a>
</li>
</ul>
@@ -999,15 +998,15 @@
<p>
See <a href="http://treo.deri.ie/cogs/example/swpm2012.htm">SWPM
- 2012 Provenance Example</a> for screenshots.
+ 2012 Provenance Example</a> for screenshots of this use case.
</p>
<h4>Benefits</h4>
<p>Making transparent the transformation a dataset has been exposed
- to. Increases trust in the data.</p>
+ to and thereby increasing trust in the data.</p>
- <p>Challenges:</p>
+ <h4>Challenges</h4>
<ul>
<li>Operations on statistical data result in new statistical
@@ -1020,7 +1019,7 @@
</li>
<li>Should Data Cube support explicit declaration of such
relationships either between separated qb:DataSets or between
- measures with a single <code>qb:DataSet</code> (e.g. <code>ex:populationCount</code>
+ measures with a single <code>qb:DataSet</code> (e.g., <code>ex:populationCount</code>
and <code>ex:populationPercent</code>)?
</li>
<li>If so should that be scoped to simple, common relationships
@@ -1045,11 +1044,12 @@
</section> <section>
<h3 id="Simplechartvisualisationsofpublishedstatisticaldata">Consumer
- Use Case: Simple chart visualisations of (integrated) published
- statistical data</h3>
+ Case Study: Simple chart visualisations of (integrated) published
+ climate sensor data</h3>
<p>
<span style="font-size: 10pt">(Use case taken from <a
- href="http://www.iwrm-smart.org/">SMART research project</a>)
+ href="http://www.iwrm-smart.org/">SMART natural sciences research
+ project</a>)
</span>
</p>
@@ -1067,7 +1067,8 @@
climate in the Lower Jordan Valley) shall be visualized for scientists
and decision makers. Statistics should also be possible to be
integrated and displayed together. The data is available as XML files
- on the web. On a separate website, specific parts of the data shall be
+ on the web which are re-published as Linked Data using the data cube
+ vocabulary. On a separate website, specific parts of the data shall be
queried and visualized in simple charts, e.g., line diagrams.
</p>
@@ -1132,8 +1133,9 @@
that shall be visualized and explored.
</p>
<p>In this use case, the goal is to take statistical data published
- on the web and to transform it into DSPL for visualization and
- exploration with as few effort as possible.</p>
+ as Linked Data re-using the data cube vocabulary and to transform it
+ into DSPL for visualization and exploration using GPDE with as little
+ effort as possible.</p>
<p>For instance, Eurostat data about Unemployment rate downloaded
from the web as shown in the following figure:</p>
@@ -1153,7 +1155,8 @@
<h4>Benefits</h4>
<ul>
- <li>Easy to visualise QB data.</li>
+ <li>Easy to visualise statistics published using the data cube
+ vocabulary.</li>
<li>There could be a process of first transforming data into RDF
for further preprocessing and integration and then of loading it into
GPDE for visualisation.</li>
@@ -1171,7 +1174,7 @@
</li>
<li>Define a mapping between data cube and DSPL. DSPL is
representative for using statistical data published on the web in
- available tools for analysis. Similar tools that may be automatically
+ available tools for analysis. Similar tools that may additionally be
covered are: Weka (arff data format), Tableau, SPSS, STATA, PC-Axis
etc. This supports lesson <a
href="#Consumersmayneedguidanceinconversionsintoformats">
@@ -1183,8 +1186,8 @@
</section> <section>
<h3 id="AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer
- Use Case: Analysing published statistical data with common OLAP
- systems</h3>
+ Case Study: Analysing published financial (XBRL) data from the SEC
+ with common OLAP systems</h3>
<p>
<span style="font-size: 10pt">(Use case taken from <a
href="http://xbrl.us/research/appdev/Pages/275.aspx">Financial
@@ -1195,7 +1198,7 @@
<p>
Online Analytical Processing (OLAP) [<cite><a href="#ref-OLAP">OLAP</a></cite>]
is an analysis method on multidimensional data. It is an explorative
- analysis methode that allows users to interactively view the data on
+ analysis method that allows users to interactively view the data on
different angles (rotate, select) or granularities (drill-down,
roll-up), and filter it for specific information (slice, dice).
</p>
@@ -1243,10 +1246,10 @@
<h4>Benefits</h4>
<ul>
+ <li>The data cube model is well known to many people in industry.</li>
<li>OLAP operations cover typical business requirements, e.g.,
- slice, dice, drill-down.</li>
- <li>OLAP frontends intuitive interactive, explorative, fast.
- Interfaces well-known to many people in industry.</li>
+ slice, dice, drill-down, and can be issued via intuitive, interactive,
+ explorative, fast OLAP frontends.</li>
<li>OLAP functionality provided by many tools that may be reused</li>
</ul>
@@ -1294,8 +1297,8 @@
<p>
After statistics have been published as Linked Data, the question
- remains how to communicate the publication and let users discover the
- statistics. There are catalogs to register datasets, e.g., CKAN, <a
+ remains how to communicate the publication and to let users discover
+ the statistics. There are catalogs to register datasets, e.g., CKAN, <a
href="http://www.datacite.org/">datacite.org</a>, <a
href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
href="http://pangaea.de/">Pangea</a>. Those catalogs require specific
@@ -1318,7 +1321,16 @@
deployments in the Linked Data cloud.
</p>
<h4>Benefits</h4>
- <p>Potential consumers will be pointed to published statistics.</p>
+ <ul>
+ <li>Datasets may automatically be discovered by web or data
+ crawlers.</li>
+ <li>Potential consumers will be pointed to published statistics
+ in search engines if searching for related information.</li>
+ <li>Users can use keyword search or structured queries for
+ specific datasets they may be interested in.</li>
+ <li>Applications and users are told about licenses, download
+ capabilities etc. of datasets.</li>
+ </ul>
<h4>Challenges</h4>
<ul>
@@ -1332,12 +1344,12 @@
statistical data to external parties and to allow automatic
discovery of statistical data</a>
</li>
- <li>Define mapping between data cube vocabulary and data
- catalogue descriptions. If data catalogs contain statistics, they do
- not expose those using Linked Data but for instance using CSV or HTML
- (e.g., Pangea). Therefore, it could also be a use case to publish
- such data using the data cube vocabulary. An example would be data
- described using the Data Documentation Initiative (DDI).</li>
+ <li>Define mapping between data cube vocabulary and data catalog
+ descriptions. If data catalogs contain statistics, they do not expose
+ those using Linked Data but for instance using CSV, HTML (e.g.,
+ Pangaea) or XML (e.g., DDI - Data Documentation Initiative).
+ Therefore, it could also be a use case to publish such data using the
+ data cube vocabulary.</li>
</ul>
</section> </section>
@@ -1415,6 +1427,23 @@
<code>qb:codeList</code>
.
</p>
+ <p>
+ Richard Cyganiak gave a summary of different options for specifying
+ the allowed dimension values of a coded property, possibly including
+ hierarchies (see <a
+ href="http://lists.w3.org/Archives/Public/public-gld-wg/2013Mar/0108.html">mail</a>):
+ </p>
+
+ <ol>
+ <li>All instances of a given rdfs:Class (via rdf:type).</li>
+ <li>All skos:Concepts in a given skos:ConceptScheme (via
+ skos:inScheme).</li>
+ <li>All skos:Concepts in a given skos:Collection or its
+ subcollections (via skos:member).</li>
+ <li>All resources that are roots, or children of a root, of a
+ qb:HierarchicalCodeList.</li>
+ </ol>
+
<p>Background information:</p>
<ul>
<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
@@ -1431,18 +1460,14 @@
</section> <section>
<h3
id="VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Modelers
- using ISO19156 - Observations & Measurements may need clarification
+ using ISO19156 - Observations &amp; Measurements may need clarification
regarding the relationship to the data cube vocabulary</h3>
<p>An number of organizations, particularly in the Climate and
Meteorological area already have some commitment to the OGC
- "Observations and Measurements" (O&M) logical data model, also
+ "Observations and Measurements" (O&amp;M) logical data model, also
published as ISO 19156. Are there any statements about compatibility
- and interoperability between O&M and Data Cube that can be made to
+ and interoperability between O&amp;M and Data Cube that can be made to
give guidance to such organizations?</p>
- <p>Background information:</p>
- <ul>
- <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>
- </ul>
<p>
Partly solved by description for <a
@@ -1451,6 +1476,11 @@
National Weather Service</a>.
</p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>
+ </ul>
+
</section> <section>
<h3
id="Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">Publishers
@@ -1491,8 +1521,8 @@
id="Publishersmayneedguidanceinconversionsfromcommonstatisticalrepresentations">Publishers
may need guidance in conversions from common statistical
representations such as CSV, Excel, ARFF etc.</h3>
-
- <p>Background information:</p>
+
+ <p>Background information:</p>
<ul>
<li>None.</li>
</ul>
@@ -1502,8 +1532,8 @@
may need guidance in conversions into formats that can easily be
displayed and further investigated in tools such as Google Data
Explorer, R, Weka etc.</h3>
-
- <p>Background information:</p>
+
+ <p>Background information:</p>
<ul>
<li>None.</li>
</ul>
@@ -1552,7 +1582,7 @@
<dt id="ref-COGS">[COGS]</dt>
<dd>
- Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., & Curry, E.
+ Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., &amp; Curry, E.
(2012). Representing Interoperable Provenance Descriptions for ETL
Workflows. ESWC 2012 Workshop Highlights (pp. 1–15). Springer Verlag,
2012 (in press). (Extended Paper published in Conf. Proceedings.). <a