--- a/data-cube-ucr/index.html Thu Jul 18 16:24:29 2013 +0100
+++ b/data-cube-ucr/index.html Fri Jul 19 18:13:44 2013 +0100
@@ -21,7 +21,7 @@
<p>Many national, regional and local governments, as well as other
organizations in- and outside of the public sector, collect numeric
data and aggregate this data into statistics. There is a need to
- publish these statistics in a standardised, machine-readable way on
+ publish these statistics in a standardized, machine-readable way on
the web, so that they can be freely integrated and reused in consuming
applications.</p>
<p>
@@ -29,7 +29,7 @@
Government Linked Data Working Group</a> presents use cases and lessons
supporting a recommendation of the RDF Data Cube Vocabulary [<cite><a
href="#ref-QB-2013">QB-2013</a></cite>]. We describe case studies of
- existing deployments of an earlier version of the data cube vocabulary
+ existing deployments of an earlier version of the Data Cube Vocabulary
[<cite><a href="#ref-QB-2010">QB-2010</a></cite>] as well as other
possible use cases that would benefit from using the vocabulary. In
particular, we identify benefits and challenges in using a vocabulary
@@ -52,12 +52,12 @@
<h2 id="introduction">Introduction</h2>
The aim of this document is to present concrete use cases and lessons
for a vocabulary to publish statistics as Linked Data. An earlier
- version of the data cube vocabulary [<cite><a
+ version of the Data Cube Vocabulary [<cite><a
href="#ref-QB-2010">QB-2010</a></cite>] has existed for some time and has
proven applicable in <a href="http://wiki.planet-data.eu/web/Datasets">several
deployments</a>. The <a href="http://www.w3.org/2011/gld/">W3C
Government Linked Data Working Group</a> intends to transform the data
- cube vocabulary into a W3C recommendation of the RDF Data Cube
+ cube vocabulary into a W3C Recommendation of the RDF Data Cube
Vocabulary [<cite><a href="#ref-QB-2013">QB-2013</a></cite>]. In this
document, we describe use cases that would benefit from using the
vocabulary. In particular, we identify possible benefits and challenges
@@ -66,21 +66,21 @@
associated tools or services complementing the vocabulary.
<p>The rest of this document is structured as follows. We will
- first give a short introduction to modelling statistics. Then, we will
+ first give a short introduction to modeling statistics. Then, we will
describe use cases that have been derived from existing deployments or
- from feedback to the earlier version of the data cube vocabulary. In
+ from feedback to the earlier version of the Data Cube Vocabulary. In
particular, we describe possible benefits and challenges of use cases.
Afterwards, we will describe lessons derived from the use cases.</p>
- <p>We use the term "data cube vocabulary" throughout the document
+ <p>We use the term "Data Cube Vocabulary" throughout the document
when referring to the vocabulary.</p>
<p>In the following, we describe the challenge of authoring an RDF
vocabulary for publishing statistics as Linked Data. Describing
- statistics - collected and aggregated numeric data - is challenging
+ statistics — collected and aggregated numeric data — is challenging
for the following reasons:</p>
<ul>
- <li>Representing statistics requires more complex modelling as
+ <li>Representing statistics requires more complex modeling as
discussed by Martin Fowler [<cite><a href="#ref-FOWLER97">FOWLER97</a></cite>]:
Recording a statistic simply as an attribute to an object (e.g., the
fact that a person weighs 185 pounds) fails to represent important
@@ -88,12 +88,12 @@
statistic is modeled as a distinguishable object, an observation.
</li>
<li>The object describes an observation of a value, e.g., a
- numeric value (e.g., 185) in case of a measurement and a categorical
- value (e.g., "blood group A") in case of a categorical observation.</li>
+ numeric value (e.g., 185) in the case of a measurement and a categorical
+ value (e.g., "blood group A") in the case of a categorical observation.</li>
<li>To allow correct interpretation of the value, the observation
needs to be further described by "dimensions" such as the specific
phenomenon, e.g., "weight", the time the observation is valid, e.g.,
- "January 2013" or a location the observation was done, e.g., "New
+ "January 2013" or a location where the observation was made, e.g., "New
York".</li>
<li>To further improve interpretation of the value, attributes
such as presentational information, e.g., a series title "COINS 2010
@@ -101,15 +101,15 @@
unit of measure "miles" can be given to observations.</li>
<li>Given background information, e.g., arithmetical and
comparative operations, humans and machines can appropriately
- visualize such observations or have conversions between different
+ visualize such observations or perform conversions between different
quantities.</li>
</ul>
<p>
The Statistical Data and Metadata eXchange [<cite><a
- href="#ref-SDMX">SDMX</a></cite>] - the ISO standard for exchanging and
- sharing statistical data and metadata among organizations - uses a
- "multidimensional model" to meet the above challenges in modelling
+ href="#ref-SDMX">SDMX</a></cite>] — the ISO standard for exchanging and
+ sharing statistical data and metadata among organizations — uses a
+ "multidimensional model" to meet the above challenges in modeling
statistics. It can describe statistics as observations. Observations
exhibit values (Measures) that depend on dimensions (Members of
Dimensions). Since the SDMX standard has proven applicable in many
@@ -127,9 +127,7 @@
Statistics comprise statistical data.
</p>
- <p>
-
- The basic structure of
+ <p>The basic structure of
<dfn>statistical data</dfn>
is a multidimensional table (also called a data cube) [<cite><a
href="#ref-SDMX">SDMX</a></cite>], i.e., a set of observed values organized
@@ -148,7 +146,7 @@
<p>
<dfn>Source data</dfn>
- is data from datastores such as relational databases or spreadsheets
+ is data from data stores such as relational databases or spreadsheets
that acts as a source for the Linked Data publishing process.
</p>
@@ -204,24 +202,24 @@
</p>
<p>Since we have adopted the multidimensional model that underlies
SDMX, we also adopt the "Web Dissemination Use Case" which is the
- prime use case for SDMX since it is an increasing popular use of SDMX
+ prime use case for SDMX since it is an increasingly popular use of SDMX
and enables organizations to build a self-updating dissemination
system.</p>
<p>The Web Dissemination Use Case contains three actors, a
- structural metadata web service (registry) that collects metadata
- about statistical data in a registration fashion, a data web service
+ structural metadata Web service (registry) that collects metadata
+ about statistical data in a registration fashion, a data Web service
(publisher) that publishes statistical data and its metadata as
- registered in the structural metadata web service, and a data
+ registered in the structural metadata Web service, and a data
consumption application (consumer) that first discovers data from the
registry, then queries data from the corresponding publisher of
- selected data, and then visualises the data.</p>
+ selected data, and then visualizes the data.</p>
<h4>Benefits</h4>
<ul>
<li>A structural metadata source (registry) can collect metadata
about statistical data.</li>
- <li>A data web service (publisher) can register statistical data
+ <li>A data Web service (publisher) can register statistical data
in a registry, and can provide statistical data from a database and
metadata from a metadata repository for consumers. For that, the
publisher creates database tables, and loads statistical data in a
@@ -235,19 +233,19 @@
database as well as metadata repository and return the statistical
data and metadata.</li>
- <li>The consumer can visualise the returned statistical data and
+ <li>The consumer can visualize the returned statistical data and
metadata.</li>
</ul>
<h4>Challenges</h4>
<ul>
<li>This use case is too abstract. The SDMX Web Dissemination Use
- Case can be concretised by several sub-use cases, detailed in the
+ Case can be concretized by several sub-use cases, detailed in the
following sections.</li>
<li>In particular, this use case requires a recommended way to
advertise published statistical datasets, which supports the
following lesson: <a
- href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">Publishers
+ href="#pubGuidance">Publishers
may need guidance in communicating the availability of published
statistical data to external parties and to allow automatic
discovery of statistical data</a>.
@@ -261,19 +259,19 @@
Information System (COINS)</h3>
<p>
<span style="font-size: 10pt">(This use case has been
- summarised from Ian Dickinson et al. [<cite><a
+ summarized from Ian Dickinson et al. [<cite><a
href="#ref-COINS">COINS</a></cite>])
</span>
</p>
<p>More and more organizations want to publish statistics on the
- web, for reasons such as increasing transparency and trust. Although
+ Web, for reasons such as increasing transparency and trust. Although,
in the ideal case, published data can be understood by both humans and
-  machines, data often is simply published as CSV, PDF, XSL etc.,
+  machines, data often is simply published as CSV, PDF, XLS etc.,
lacking elaborate metadata, which makes free usage and analysis
difficult.</p>
<p>
Therefore, the goal in this scenario is to use a machine-readable and
- application-independent description of common statistics with use of
+ application-independent description of common statistics, expressed using
open standards, to foster usage and innovation on the published data.
In the "COINS as Linked Data" project [<cite><a
href="#ref-COINS">COINS</a></cite>], the Combined Online Information System
@@ -282,7 +280,7 @@
href="http://www.hm-treasury.gov.uk/psr_coins_data.htm">HM
Treasury</a>, the principal custodian of financial data for the UK
government, releases previously restricted financial information about
- government spendings.
+ government spending.
</p>
<p>The COINS data has a hypercube structure. It describes financial
@@ -296,12 +294,12 @@
<p>The published COINS datasets cover expenditure related to five
different years (2005–06 to 2009–10). The actual COINS database at HM
Treasury is updated daily. In principle at least, multiple snapshots
- of the COINS data could be released through the year.</p>
+ of the COINS data could be released throughout the year.</p>
<p>The actual data and its hypercube structure are to be
represented separately so that an application first can examine the
structure before deciding to download the actual data, i.e., the
- transactions. The hypercube structure also defines for each dimension
- and attribute a range of permitted values that are to be represented.</p>
+ transactions. The hypercube structure also defines, for each dimension
+ and attribute, a range of permitted values that are to be represented.</p>
<p>An access or query interface to the COINS data, e.g., via a
SPARQL endpoint or the linked data API, is planned. Queries that are
expected to be interesting are: "spending for one department", "total
@@ -313,50 +311,50 @@
publishing COINS as Linked Data are threefold:</p>
<ul>
- <li>Using open standard representation makes it easier to work
- with the data with available technologies and promises innovative
- third-party tools and usages</li>
- <li>Individual transactions and groups of transactions are given
- an identity, and so can be referenced by web address (URL), to allow
+ <li>using an open standard representation makes it easier to work
+ with the data using available technologies and promises innovative
+  third-party tools and uses;</li>
+ <li>individual transactions and groups of transactions are given
+ an identity, and so can be referenced by Web address (URL), to allow
them to be discussed, annotated, or listed as source data for
- articles or visualizations</li>
- <li>Cross-links between linked-data datasets allow for much
- richer exploration of related datasets</li>
+ articles or visualizations;</li>
+ <li>cross-links between linked-data datasets allow for much
+ richer exploration of related datasets.</li>
</ul>
<h4>Challenges</h4>
<p>The COINS use case leads to the following challenges:</p>
<ul>
- <li>Although not originally intended, the data cube vocabulary
+ <li>Although not originally intended, the Data Cube Vocabulary
could be successfully used for publishing financial data, not just
statistics. This has also been shown by the <a
- href="http://data.gov.uk/resources/payments">Payment Ontology</a>.
+ href="http://data.gov.uk/resources/payments">Payments Ontology</a>.
</li>
- <li>Also, the publisher favours a representation that is both as
+ <li>Also, the publisher favors a representation that is both as
self-descriptive as possible, i.e., others can link to and download
fully-described individual transactions, and as compact as possible,
i.e., information is not unnecessarily repeated. This challenge
supports lesson: <a
- href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">Publishers
+ href="#criteriaForWell">Publishers
and consumers may need guidance in checking and making use of
well-formedness of published data using data cube</a>.
</li>
<li>Moreover, the publisher is thinking about the possible
benefit of publishing slices of the data, e.g., datasets that fix all
dimensions but the time dimension. For instance, such slices could be
- particularly interesting for visualisations or comments. However,
- depending on the number of Dimensions, the number of possible slices
+ particularly interesting for visualizations or comments. However,
+ depending on the number of dimensions, the number of possible slices
can become large which makes it difficult to semi-automatically
select all interesting slices. This challenge supports lesson: <a
- href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Publishers
+ href="#clarify">Publishers
may need more guidance in creating and managing slices or arbitrary
groups of observations</a>.
</li>
<li>An important benefit of linked data is that we are able to
annotate data, at a fine-grained level of detail, to record
- information about the data itself. This includes where it came from –
- the provenance of the data – but could include annotations from
+ information about the data itself. This includes where it came from —
+ the provenance of the data — but could include annotations from
reviewers, links to other useful resources, etc. Being able to trust
that data to be correct and reliable is a central value for
government-published data, so recording provenance is a key
@@ -369,16 +367,16 @@
additional supplementary information which they derive from the data,
for example by cross-linking to other datasets. This challenge
supports lesson: <a
- href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">Publishers
+ href="#declaringRel">Publishers
may need guidance in making transparent the pre-processing of
aggregate statistics</a>.
</li>
<li>A challenge also is the size of the data, especially since it
is updated regularly. Five data files already contain between 3.3 and
4.9 million rows of data. This challenge supports lesson: <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">Publishers
+ href="#mechRec">Publishers
and consumers may need more guidance in efficiently processing data
- using the data cube vocabulary</a>.
+ using the Data Cube Vocabulary</a>.
</li>
</ul>
@@ -408,12 +406,11 @@
Linked Data.
</p>
- <p>Those excel sheets contain single spreadsheets with several
+ <p>Those Excel sheets contain single spreadsheets with several
multidimensional data tables, having a name and notes, as well as
column values, row values, and cell values.</p>
- <p>
- Another concrete example is the <a
+ <p>Another concrete example is the <a
href="http://ontowiki.net/Projects/Stats2RDF?show_comments=1">Stats2RDF</a>
project that intends to publish biomedical statistical data that is
represented as Excel sheets. Here, Excel files are first translated
@@ -430,20 +427,20 @@
and cell values.</li>
<li>All context and so all meaning of the measurement point is
expressed by means of dimensions. The pure number is the star of an
- ego-network of attributes or dimensions. In a RDF representation it
+ ego-network of attributes or dimensions. In an RDF representation it
is then easily possible to define hierarchical relationships between
the dimensions (that can be exemplified further) as well as mapping
different attributes across different value points. This way a
harmonization among variables is performed around the measurement
points themselves.</li>
- <li>Novel visualisation of census data</li>
+ <li>Novel visualization of census data</li>
<li>Possible integration with provenance vocabularies, e.g.,
PROV-O, for tracking of harmonization steps</li>
<li>In historical research, until now, harmonization across
datasets is performed by hand, and in subsequent iterations of a
database: it is very hard to trace back the provenance of decisions
made during the harmonization procedure. Publishing the census data
- as Linked Data may allow (semi-)automatical harmonization.</li>
+ as Linked Data may allow (semi-)automatic harmonization.</li>
</ul>
<h4>Challenges</h4>
@@ -451,16 +448,16 @@
<li>Semi-structured information, e.g., notes about lineage of
-  data cells, may not be possible to be formalized. This supports
+  data cells, may not be possible to formalize. This supports
lesson <a
- href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">Publishers
+ href="#declaringRel">Publishers
may need guidance in making transparent the pre-processing of
aggregate statistics</a>
</li>
<li>Combining Data Cube with SKOS [<cite><a
href="#ref-skos">SKOS</a></cite>] to allow for cross-location and
cross-time historical analysis, supporting lesson <a
- href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Publishers
+ href="#heirarchic">Publishers
may need more guidance to decide which representation of hierarchies
- is most suitable for their use case</a>
+ is most suitable for their use case</a>.
</li>
<li>These challenges may seem to be particular to the field of
historical research, but in fact apply to government information at
@@ -469,21 +466,21 @@
bodies, scattered across multiple levels, jurisdictions and areas.
Publishing government information in a consistent, integrated manner
requires exactly the type of harmonization required in this use case.</li>
- <li>Define a mapping between Excel and the data cube vocabulary.
+ <li>Define a mapping between Excel and the Data Cube Vocabulary.
Excel spreadsheets are representative for other common representation
formats for statistics such as CSV, XBRL, ARFF, which supports lesson
<a
- href="#publishers-may-need-guidance-in-conversions-from-common-statistical-representations-such-as-csv-excel-arff-etc.">Publishers
+ href="#excelCSV">Publishers
may need guidance in conversions from common statistical
representations such as CSV, Excel, ARFF etc.</a>
</li>
- <li>Excel sheets provide much flexibility in arranging
+ <li>Excel sheets provide a great deal of flexibility in arranging
information. It may be necessary to limit this flexibility to allow
automatic transformation.</li>
<li>There may be many spreadsheets which supports lesson <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">Publishers
+ href="#mechRec">Publishers
and consumers may need more guidance in efficiently processing data
- using the data cube vocabulary</a></li>
+ using the Data Cube Vocabulary</a></li>
</ul>
@@ -524,24 +521,19 @@
ex:obs5
sdmx:refArea <northernireland>;
sdmx:refPeriod "2011";
- ex:population "2" .
-
-
- </pre>
+ ex:population "2" . </pre>
<p>
We are looking for the best way (in the context of the RDF/Data
- Cube/SDMX approach) to express that the values for the
- England/Scotland/Wales/ Northern Ireland ought to add up to the value
+ Cube/SDMX approach) to express that the values for
+  England, Scotland, Wales and Northern Ireland ought to add up to the value
for the UK and constitute a more detailed breakdown of the overall UK
- figure? Since we might also have population figures for France,
- Germany, EU27, it is not as simple as just taking a
- <code>qb:Slice</code>
- where you fix the time period and the measure.
+ figure. Since we might also have population figures for France,
+ Germany, EU28 etc., it is not as simple as just taking a
+ <code>qb:Slice</code> where you fix the time period and the measure.
</p>
- <p>
- Similarly, Etcheverry and Vaisman [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>]
+ <p>Similarly, Etcheverry and Vaisman [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>]
present the use case to publish household data from <a
href="http://statswales.wales.gov.uk/index.htm">StatsWales</a> and <a
href="http://opendatacommunities.org/doc/dataset/housing/household-projections">Open
@@ -574,7 +566,7 @@
engines to automatically derive statistics on higher aggregation
levels.</li>
<li>Vice versa, representing further aggregated datasets would
- allow to answer queries with a simple lookup instead of computations
+  allow queries to be answered with a simple lookup instead of computations
which may be more time consuming or require specific features of the
query engine (e.g., SPARQL 1.1).</li>
</ul>
@@ -587,15 +579,15 @@
functions. Again, this use case does not simply need a selection (or
"dice" in OLAP context) where one fixes the time period dimension.
This supports lesson <a
- href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">Publishers
+ href="#aggregations">Publishers
may need guidance in how to represent common analytical operations
such as Slice, Dice, Rollup on data cubes</a>
</li>
- <li>Literals that are used in observations, cannot be used as
- subjects in triples. So, no hierarchies can be defined that would for
- example link integer years via skos:narrower to months. This supports
+ <li>Literals that are used in observations cannot be used as
+ subjects in triples. So no hierarchies can be defined that would, for
+ example, link integer years via skos:narrower to months. This supports
lesson <a
- href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Publishers
+ href="#heirarchic">Publishers
may need more guidance to decide which representation of hierarchies
is most suitable for their use case</a>.
</li>
@@ -623,16 +615,16 @@
designated as bathing waters where people routinely enter the water.
The Environment Agency monitors and reports on the quality of the
water at these bathing waters.</p>
- <p>The Environement Agency's data can be thought of as structured
+ <p>The Environment Agency's data can be thought of as structured
in 3 groups:</p>
<ul>
- <li>There is basic reference data describing the bathing waters
- and sampling points</li>
- <li>There is a data set "Annual Compliance Assessment Dataset"
+ <li>basic reference data describing the bathing waters
+ and sampling points;</li>
+  <li>the "Annual Compliance Assessment Dataset"
giving the rating for each bathing water for each year it has been
- monitored</li>
- <li>There is a data set "In-Season Sample Assessment Dataset"
- giving the detailed weekly sampling results for each bathing water</li>
+ monitored;</li>
+  <li>the "In-Season Sample Assessment Dataset"
+ giving the detailed weekly sampling results for each bathing water.</li>
</ul>
<p>The most important dimensions of the data are bathing water,
sampling point, and compliance classification.</p>
@@ -640,14 +632,14 @@
<h4>Benefits</h4>
<ul>
<li>The bathing-water dataset (documentation) is structured
- around the use of the data cube vocabulary and fronted by a linked
+ around the use of the Data Cube Vocabulary and fronted by a linked
data API configuration which makes the data available for re-use in
additional formats such as JSON and CSV.</li>
<li>Publishing bathing-water quality information in this way will
1) enable the Environment Agency to meet the needs of its many data
  consumers in a uniform way rather than through diverse pairwise
arrangements 2) preempt requests for specific data and 3) enable a
- larger community of web and mobile application developers and
+ larger community of Web and mobile application developers and
value-added information aggregators to use and re-use bathing-water
-  quality information sourced by the environment agency.</li>
+  quality information sourced by the Environment Agency.</li>
</ul>
@@ -658,7 +650,7 @@
whether there was an abnormal weather exception.</li>
<li>Relevant slices of both datasets are to be created, which
supports lesson <a
- href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Publishers
+ href="#clarify">Publishers
may need more guidance in creating and managing slices or arbitrary
groups of observations</a>:
<ul>
@@ -675,13 +667,13 @@
</ul>
</li>
<li>In this use case, observation and measurement data is to be
- published which per se is not aggregated statistics. The <a
+  published, which <i>per se</i> is not aggregated statistics. The <a
href="http://purl.oclc.org/NET/ssnx/ssn">Semantic Sensor Network
ontology</a> (SSN) already provides a way to publish sensor information.
SSN data provides statistical Linked Data and grounds its data to the
domain, e.g., sensors that collect observations (e.g., sensors
measuring average of temperature over location and time). Still, this
- case study has shown that the data cube vocabulary may be a useful
+ case study has shown that the Data Cube Vocabulary may be a useful
alternative and can be successfully used for observation and
measurement data, as well as statistical data.
</li>
@@ -727,9 +719,9 @@
ISO19156 <em>"Geographic information — Observations and
measurements"</em> (O&M) is regarded as important. Thus, this supports
lesson <a
- href="#VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Modelers
+ href="#relToSO19156">Modelers
using ISO19156 - Observations & Measurements may need
- clarification regarding the relationship to the data cube vocabulary</a>.
+ clarification regarding the relationship to the Data Cube Vocabulary</a>.
</p>
<b>Solution in this case study:</b>
<p>O&M provides a data model for an Observation with associated
@@ -773,9 +765,9 @@
are thus a key consideration and the apparent verbosity of RDF in
general, and Data Cube specifically, was a concern. This supports
lesson <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">
+ href="#mechRec">
Publishers and consumers may need more guidance in efficiently
- processing data using the data cube vocabulary</a>.
+ processing data using the Data Cube Vocabulary</a>.
</p>
<b>Solution in this case study:</b>
-  <p>Regarding bandwidth costs then the key is not raw data volume
+  <p>Regarding bandwidth costs, the key is not raw data volume
@@ -794,7 +786,7 @@
Linked Data Wrapper</a> and <a
href="http://eurostat.linked-statistics.org/">Linked Statistics
Eurostat Data</a>, both deployments for publishing Eurostat SDMX as
- Linked Data using the draft version of the data cube vocabulary)
+ Linked Data using the draft version of the Data Cube Vocabulary)
</span>
</p>
@@ -813,13 +805,13 @@
As one of the main adopters of SDMX, <a
href="http://epp.eurostat.ec.europa.eu/">Eurostat</a> publishes large
amounts of European statistics coming from a data warehouse as SDMX
- and other formats on the web. Eurostat also provides an interface to
+ and other formats on the Web. Eurostat also provides an interface to
browse and explore the datasets. However, linking such
multidimensional data to related data sets and concepts would require
- downloading of interesting datasets and manual integration.The goal
+ downloading of interesting datasets and manual integration. The goal
here is to improve integration with other datasets; Eurostat data
- should be published on the web in a machine-readable format, possible
- to be linked with other datasets, and possible to be freely consumed
+ should be published on the Web in a machine-readable format, possibly
+ to be linked with other datasets, and possibly to be freely consumed
by applications. Both <a href="http://estatwrap.ontologycentral.com/">Eurostat
Linked Data Wrapper</a> and <a
href="http://eurostat.linked-statistics.org/">Linked Statistics
@@ -831,7 +823,7 @@
href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">TOC
of published datasets</a> as well as a feed of modified and new datasets.
- Eurostat provides a list of used codelists, i.e., <a
+ Eurostat provides a list of used code lists, i.e., <a
href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&dir=dic">range
of permitted dimension values</a>. Any Eurostat dataset contains a
varying set of dimensions (e.g., date, geo, obs_status, sex, unit) as
@@ -852,10 +844,10 @@
<li>Allows useful queries to the data, e.g., comparison of
statistical indicators across EU countries.</li>
- <li>Allows to attach contextual information to statistics during
+ <li>Allows one to attach contextual information to statistics during
the interpretation process.</li>
- <li>Allows to reuse single observations from the data.</li>
+ <li>Allows one to reuse single observations from the data.</li>
<li>Linking to information from other data sources, e.g., for
geo-spatial dimension.</li>
@@ -874,28 +866,28 @@
when converted into RDF require more than 350GB of disk space
yielding a dataspace with some 8 billion triples. This supports
lesson <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">
+ href="#mechRec">
Publishers and consumers may need more guidance in efficiently
- processing data using the data cube vocabulary.</a>
+ processing data using the Data Cube Vocabulary.</a>
</li>
<li>In the Eurostat Linked Data Wrapper, there is a timeout for
transforming SDMX to Linked Data, since Google App Engine is used.
Mechanisms to reduce the amount of data that needs to be translated
would be needed, again supporting lesson <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">
+ href="#mechRec">
Publishers and consumers may need more guidance in efficiently
- processing data using the data cube vocabulary.</a>
+ processing data using the Data Cube Vocabulary.</a>
</li>
- <li>Provide a useful interface for browsing and visualising the
+ <li>Provide a useful interface for browsing and visualizing the
data. One problem is that the data sets have too high dimensionality
- to be displayed directly. Instead, one could visualise slices of time
+ to be displayed directly. Instead, one could visualize slices of time
series data. However, for that, one would need to either fix most
other dimensions (e.g., sex) or aggregate over them (e.g., via
average). The selection of useful slices from the large number of
possible slices is a challenge. This supports lesson <a
- href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">
+ href="#clarify">
Publishers may need more guidance in creating and managing slices or
arbitrary groups of observations</a>.
</li>
@@ -905,9 +897,9 @@
<li>The Eurostat SDMX as Linked Data use case provides data on a
gender level and on a level aggregating over the gender level. This
- suggests to have time lines on data aggregating over the gender
- dimension, supporting lesson <a
- href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">
+  suggests a need to have timelines on data aggregating over the gender
+ dimension, supporting the lesson: <a
+ href="#aggregations">
Publishers may need guidance in how to represent common analytical
operations such as Slice, Dice, Rollup on data cubes</a>.
</li>
@@ -915,12 +907,12 @@
<li>Updates to the data
<ul>
- <li>Eurostat - Linked Data pulls in changes from the original
+ <li>Eurostat Linked Data pulls in changes from the original
Eurostat dataset on a weekly basis and the conversion process runs
every Saturday at noon taking into account new datasets along with
updates to existing datasets.</li>
- <li>Eurostat Linked Data Wrapper on-the-fly translates Eurostat
- datasets into RDF so that always the most current data is used. The
- problem is only to point users towards the URIs of Eurostat
+ <li>Eurostat Linked Data Wrapper translates Eurostat
+ datasets into RDF on the fly so that the most current data is always used. The
+ only remaining problem is pointing users to the URIs of Eurostat
datasets: Estatwrap provides a feed of modified and new <a
href="http://estatwrap.ontologycentral.com/feed.rdf">datasets</a>.
@@ -938,23 +930,23 @@
<li>Query interface</li>
<ul>
- <li>Eurostat - Linked Data provides SPARQL endpoint for the
+ <li>Eurostat Linked Data provides a SPARQL endpoint for the
metadata (not the observations).</li>
<li>Eurostat Linked Data Wrapper provides resolvable URIs to
datasets (ds) that return all observations of the dataset. Also,
every dataset serves the URI of its data structure definition (dsd).
The dsd URI returns all RDF describing the dataset. Separating
information resources for dataset and data structure definition
- allows for example to first gather the dsd and only for actual query
- execution resolve the ds.</li>
+ allows one, for example, to first gather the dsd and resolve the ds
+ only when a query is actually executed.</li>
</ul>
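The two-step access pattern described above can be sketched as follows. The URIs and payloads are illustrative stand-ins (this is not the actual Estatwrap API), and HTTP resolution is simulated with an in-memory lookup.

```python
# Sketch of the access pattern described above: resolve the lightweight
# data structure definition (dsd) first, and fetch the potentially large
# dataset of observations (ds) only at query execution time.
# All URIs and data are hypothetical stand-ins.

# In-memory stand-in for two separately resolvable information resources.
RESOURCES = {
    "http://example.org/id/unemployment/dsd": {
        "dimensions": ["refPeriod", "geo", "sex"],
        "measure": "unemploymentRate",
    },
    "http://example.org/id/unemployment/ds": [
        {"refPeriod": "2012", "geo": "DE", "sex": "T", "unemploymentRate": 5.4},
        {"refPeriod": "2012", "geo": "ES", "sex": "T", "unemploymentRate": 24.8},
    ],
}

def resolve(uri):
    """Stand-in for an HTTP GET of a resolvable Linked Data URI."""
    return RESOURCES[uri]

def query(dataset_uri, **fixed):
    """Gather the dsd first; resolve the observations only at query time."""
    dsd = resolve(dataset_uri + "/dsd")
    unknown = set(fixed) - set(dsd["dimensions"])
    if unknown:
        # Detected from the dsd alone, before the expensive ds fetch.
        raise ValueError(f"unknown dimensions: {unknown}")
    observations = resolve(dataset_uri + "/ds")  # deferred, larger fetch
    return [o[dsd["measure"]] for o in observations
            if all(o[d] == v for d, v in fixed.items())]

print(query("http://example.org/id/unemployment", geo="DE"))  # [5.4]
```

The design point is the same as in the text: because dataset and data structure definition are separate information resources, a consumer can validate or plan a query against the dsd without ever downloading the observations.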
- <li>Browsing and visualising interface:
+ <li>Browsing and visualizing interface:
<ul>
<li>Eurostat Linked Data Wrapper provides for each dataset an
- HTML page showing a JavaScript-based visualisation of the data.
+ HTML page showing a JavaScript-based visualization of the data.
This also supports lesson <a
- href="#Consumersmayneedguidanceinconversionsintoformats">
+ href="#consumers">
Consumers may need guidance in conversions into formats that can
easily be displayed and further investigated in tools such as
Google Data Explorer, R, Weka etc.</a>
@@ -964,17 +956,19 @@
</li>
<li>One possible application would run validation checks over
- Eurostat data. However, the data cube vocabulary is to publish
+ Eurostat data. However, the Data Cube Vocabulary is designed to publish
statistical data as-is and is not intended to represent information
for validation (similar to business rules).</li>
<li>An application could try to automatically match elements of
the geo-spatial dimension to elements of other data sources, e.g.,
NUTS, GADM. In Eurostat Linked Data wrapper this is done by simple
URI guessing from external data sources. Automatic linking datasets
- or linking datasets with metadata is not part of data cube
- vocabulary.</li>
- <li>The draft version of the data cube vocabulary builds upon SDMX Standards Version 2.0. A newer version of SDMX, SDMX Standards, Version 2.1, is available which might be used by Eurostat in the future which supports lesson <a
- href="#there-is-a-putative-requirement-to-update-to-sdmx-2.1-if-there-are-specific-use-cases-that-demand-it">
+ or linking datasets with metadata is not part of the Data Cube
+ Vocabulary.</li>
+ <li>The draft version of the Data Cube Vocabulary builds upon SDMX Standards Version 2.0.
+A newer version, SDMX Standards Version 2.1, is available and might be used by
+Eurostat in the future, which supports lesson <a
+ href="#putative">
There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it</a></li>
</ul>
@@ -993,16 +987,16 @@
<p>The goal of this use case is to describe provenance,
transformations, and versioning around statistical data, so that the
- history of statistics published on the web becomes clear. This may
+ history of statistics published on the Web becomes clear. This may
also relate to the issue of having relationships between datasets
published.</p>
<p>
A concrete example is given by Freitas et al. [<cite><a
href="#ref-COGS">COGS</a></cite>], where transformations on financial
- datasets, e.g., addition of derived measures, conversion of units,
+ datasets, e.g., the addition of derived measures, conversion of units,
aggregations, OLAP operations, and enrichment of statistical data are
- executed on statistical data before showing them in a web-based
+ executed on statistical data before showing them in a Web-based
report.
</p>
@@ -1014,20 +1008,20 @@
<h4>Benefits</h4>
<p>Making transparent the transformation a dataset has been exposed
- to and thereby increasing trust in the data.</p>
+ to increases trust in the data.</p>
<h4>Challenges</h4>
<ul>
<li>Operations on statistical data result in new statistical
- data, depending on the operation. For instance, in terms of Data
- Cube, operations such as slice, dice, roll-up, drill-down will result
- in new Data Cubes. This may require representing general
+ data, depending on the operation. For instance, in terms of the Data
+ Cube Vocabulary, operations such as slice, dice, roll-up, and drill-down will result
+ in new data cubes. This may require representing general
relationships between cubes (as discussed in the <a
href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data
mailing list</a>).
</li>
- <li>Should Data Cube support explicit declaration of such
+ <li>Should the Data Cube Vocabulary support explicit declaration of such
relationships either between separated qb:DataSets or between
measures with a single <code>qb:DataSet</code> (e.g., <code>ex:populationCount</code>
and <code>ex:populationPercent</code>)?
@@ -1036,17 +1030,17 @@
like DENOM or allow expression of arbitrary mathematical relations?</li>
<li>This use case opens up questions regarding versioning of
- statistical Linked Data. Thus, there is a possible relation to <a
+ statistical Linked Data. Thus, there is a possible relation to the <a
href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary#Versioning">Versioning</a>
part of GLD Best Practices Document, where it is specified how to
publish data which has multiple versions.
</li>
<li>In this use case, the <a
href="http://sites.google.com/site/cogsvocab/">COGS</a> vocabulary [<cite><a
- href="#ref-COGS">COGS</a></cite>] has shown to complement the data cube
- vocabulary w.r.t. representing ETL pipelines processing statistics.
+ href="#ref-COGS">COGS</a></cite>] has shown to complement the Data Cube
+ Vocabulary with respect to representing ETL pipelines processing statistics.
This supports lesson <a
- href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">
+ href="#declaringRel">
Publishers may need guidance in making transparent the
pre-processing of aggregate statistics</a>.
</li>
@@ -1054,7 +1048,7 @@
</section> <section>
<h3 id="Simplechartvisualisationsofpublishedstatisticaldata">Consumer
- Case Study: Simple chart visualisations of (integrated) published
+ Case Study: Simple chart visualizations of (integrated) published
climate sensor data</h3>
<p>
<span style="font-size: 10pt">(Use case taken from <a
@@ -1068,7 +1062,7 @@
visualization on top of these formats using Excel, Tableau,
RapidMiner, Rattle, Weka etc.</p>
<p>This use case shall demonstrate how statistical data published
- on the web can be with few effort visualized inside a webpage, without
+ on the Web can be visualized inside a webpage with little effort and without
using commercial or highly-complex tools.</p>
<p>
An example scenario is environmental research done within the <a
@@ -1077,8 +1071,8 @@
climate in the Lower Jordan Valley) shall be visualized for scientists
- and decision makers. Statistics should also be possible to be
- integrated and displayed together. The data is available as XML files
- on the web which are re-published as Linked Data using the data cube
- vocabulary. On a separate website, specific parts of the data shall be
+ and decision makers. It should also be possible to integrate and
+ display statistics together. The data is available as XML files
+ on the Web that are re-published as Linked Data using the Data Cube
+ Vocabulary. On a separate website, specific parts of the data shall be
queried and visualized in simple charts, e.g., line diagrams.
</p>
@@ -1102,27 +1096,27 @@
src="./figures/pivot_analysis_measurements.PNG"></img>
</p>
<h4>Benefits</h4>
- <p>Easy, flexible and powerful visualisations of published
+ <p>Easy, flexible and powerful visualizations of published
statistical data.</p>
<h4>Challenges</h4>
<ul>
- <li>The difficulties lay in structuring the data appropriately so
- that the specific information can be queried. This supports lesson <a
- href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">
+ <li>The difficulty lies in structuring the data appropriately so
+ that the specific information can be queried. This supports lesson <a
+ href="#criteriaForWell">
Publishers and consumers may need guidance in checking and making
use of well-formedness of published data using data cube</a>.
</li>
- <li>Also, data shall be published with having potential
+ <li>Also, data shall be published with potential
integration in mind. Therefore, e.g., units of measurements need to
be represented.</li>
<li>Integration becomes much more difficult if publishers use
- different measures, dimensions.</li>
+ different measures or dimensions.</li>
</ul>
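The integration challenge in the list above can be illustrated with a small sketch; the units, conversion factors, and sensor readings are invented. Observations that carry an explicit unit can be converted to a common unit before two publishers' series are displayed together.

```python
# Sketch of the integration pitfall noted above: two publishers expose
# the same measure in different units, so observations must carry an
# explicit unit and be normalized before being charted together.
# Units, factors, and values are illustrative.
TO_MM = {"mm": 1.0, "cm": 10.0}  # conversion factors to millimetres

def normalize(observations, target="mm"):
    """Convert (value, unit) observations to a single target unit."""
    return [v * TO_MM[unit] / TO_MM[target] for v, unit in observations]

sensor_a = [(12.0, "mm"), (8.5, "mm")]  # rainfall, publisher A
sensor_b = [(1.2, "cm"), (0.5, "cm")]   # rainfall, publisher B

combined = normalize(sensor_a) + normalize(sensor_b)
print(combined)  # [12.0, 8.5, 12.0, 5.0]
```

Without the unit attached to each observation, the two series would be silently combined on incompatible scales, which is why the text insists that units of measurement be represented.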
</section> <section>
- <h3 id="VisualisingpublishedstatisticaldatainGooglePublicDataExplorer">Consumer
- Use Case: Visualising published statistical data in Google Public Data
+ <h3 id="consumer-use-case-visualising-published-statistical-data-in-google-public-data-explorer">Consumer
+ Use Case: Visualizing published statistical data in Google Public Data
Explorer</h3>
<p>
<span style="font-size: 10pt">(Use case taken from <a
@@ -1143,14 +1137,14 @@
that shall be visualized and explored.
</p>
<p>In this use case, the goal is to take statistical data published
- as Linked Data re-using the data cube vocabulary and to transform it
- into DSPL for visualization and exploration using GPDE with as few
- effort as possible.</p>
+ as Linked Data re-using the Data Cube Vocabulary and to transform it
+ into DSPL for visualization and exploration using GPDE with as little
+ effort as possible.</p>
<p>For instance, Eurostat data about Unemployment rate downloaded
- from the web as shown in the following figure:</p>
+ from the Web as shown in the following figure:</p>
<p class="caption">Figure 3: An interactive chart in GPDE for
- visualising Eurostat data described with DSPL</p>
+ visualizing Eurostat data described with DSPL</p>
<p align="center">
<img
alt="An interactive chart in GPDE for visualising Eurostat data in the DSPL"
@@ -1165,29 +1159,28 @@
<h4>Benefits</h4>
<ul>
- <li>Easy to visualise statistics published using the data cube
- vocabulary.</li>
+ <li>Easy to visualize statistics published using the Data Cube Vocabulary.</li>
<li>There could be a process of first transforming data into RDF
for further preprocessing and integration and then of loading it into
- GPDE for visualisation.</li>
+ GPDE for visualization.</li>
<li>Linked Data could provide the way to automatically load data
- from a data source whereas GPDE is only for visualisation.</li>
+ from a data source whereas GPDE is only for visualization.</li>
</ul>
<h4>Challenges</h4>
<ul>
- <li>The technical challenges for the consumer here lay in knowing
+ <li>The technical challenges for the consumer here lie in knowing
where to download what data and how to get it transformed into DSPL
without knowing the data. This supports lesson <a
- href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">
+ href="#criteriaForWell">
Publishers and consumers may need guidance in checking and making
use of well-formedness of published data using data cube</a>.
</li>
- <li>Define a mapping between data cube and DSPL. DSPL is
- representative for using statistical data published on the web in
+ <li>Define a mapping between Data Cube and DSPL. DSPL is
+ representative of formats for using statistical data published on the Web in
available tools for analysis. Similar tools that may additionally be
covered are: Weka (arff data format), Tableau, SPSS, STATA, PC-Axis
etc. This supports lesson <a
- href="#Consumersmayneedguidanceinconversionsintoformats">
+ href="#consumers">
Consumers may need guidance in conversions into formats that can
easily be displayed and further investigated in tools such as Google
Data Explorer, R, Weka etc.</a>.
@@ -1196,7 +1189,7 @@
</section> <section>
<h3 id="AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer
- Case Study: Analysing published financial (XBRL) data from the SEC
+ Case Study: Analyzing published financial (XBRL) data from the SEC
with common OLAP systems</h3>
<p>
<span style="font-size: 10pt">(Use case taken from <a
@@ -1216,7 +1209,7 @@
<p>OLAP systems that first use ETL pipelines to
Extract-Load-Transform relevant data for efficient storage and queries
- in a data warehouse and then allows interfaces to issue OLAP queries
- on the data are commonly used in industry to analyse statistical data
+ in a data warehouse and then allow interfaces to issue OLAP queries
+ on the data are commonly used in industry to analyze statistical data
on a regular basis.</p>
<p>
@@ -1226,7 +1219,7 @@
</p>
<p>For that a multidimensional model of the data needs to be
- generated. A multidimensional model consists of facts summarised in
+ generated. A multidimensional model consists of facts summarized in
data cubes. Facts exhibit measures depending on members of dimensions.
Members of dimensions can be further structured along hierarchies of
levels.</p>
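The multidimensional model in the paragraph above can be sketched in plain Python; the geographic hierarchy and figures are invented for illustration. Facts carry a measure identified by dimension members, and a roll-up aggregates facts from one level of a dimension hierarchy to the next.

```python
# Sketch of the multidimensional model described above: facts exhibit a
# measure depending on dimension members, and members sit on levels of
# a hierarchy. The city/country hierarchy and numbers are invented.
from collections import defaultdict

# Level hierarchy for the "geo" dimension: city -> country.
CITY_TO_COUNTRY = {"Berlin": "DE", "Munich": "DE", "Madrid": "ES"}

# Facts: (year, city) -> measure value.
facts = {
    (2012, "Berlin"): 10.0,
    (2012, "Munich"): 7.0,
    (2012, "Madrid"): 5.0,
}

def rollup_geo(facts):
    """Roll up facts from the city level to the country level (sum)."""
    rolled = defaultdict(float)
    for (year, city), value in facts.items():
        rolled[(year, CITY_TO_COUNTRY[city])] += value
    return dict(rolled)

print(rollup_geo(facts))  # {(2012, 'DE'): 17.0, (2012, 'ES'): 5.0}
```

This is the kind of operation an ETL pipeline would perform when populating a star-schema warehouse from published observations, and it presupposes exactly the level hierarchy the paragraph describes.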
@@ -1234,15 +1227,15 @@
<p>
An example scenario of this use case is the Financial Information
Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>],
- where XBRL data provided by the SEC on the web is re-published as
- Linked Data and made possible to explore and analyse by stakeholders
+ where XBRL data provided by the SEC on the Web is re-published as
+ Linked Data and made available for exploration and analysis by stakeholders
in a web-based OLAP client Saiku.
</p>
<p>The following figure shows an example of using FIOS. Here, for
- three different companies, cost of goods sold as disclosed in XBRL
- documents are analysed. As cell values either the number of
- disclosures or - if only one available - the actual number in USD is
+ three different companies, the cost of goods sold as disclosed in XBRL
+ documents is analyzed. As cell values, either the number of
+ disclosures or — if only one is available — the actual number in USD is
given:</p>
@@ -1256,7 +1249,7 @@
<h4>Benefits</h4>
<ul>
- <li>Data Cube model well-known to many people in industry.</li>
+ <li>The data cube model is well known to many people in industry.</li>
<li>OLAP operations cover typical business requirements, e.g.,
slice, dice, drill-down and can be issued via intuitive, interactive,
explorative, fast OLAP frontends.</li>
@@ -1265,17 +1258,17 @@
<h4>Challenges</h4>
<ul>
- <li>Define a mapping between XBRL and the data cube vocabulary.
- XBRL is representative for other common representation formats for
+ <li>Define a mapping between XBRL and the Data Cube Vocabulary.
+ XBRL is representative of other common representation formats for
statistics such as CSV, Excel, ARFF, which supports lesson <a
- href="#publishers-may-need-guidance-in-conversions-from-common-statistical-representations-such-as-csv-excel-arff-etc.">Publishers
+ href="#excelCSV">Publishers
may need guidance in conversions from common statistical
representations such as CSV, Excel, ARFF etc.</a>
</li>
<li>ETL pipeline needs to automatically populate a data
warehouse. Common OLAP systems use relational databases with a star
schema. This supports lesson <a
- href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">
+ href="#criteriaForWell">
Publishers and consumers may need guidance in checking and making
use of well-formedness of published data using data cube</a>.
</li>
@@ -1283,21 +1276,21 @@
the structure of data (metadata queries), and queries for actual
aggregated values (OLAP operations).</li>
<li>Define a mapping between OLAP operations and operations on
- data using the data cube vocabulary. This supports lesson <a
- href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">
+ data using the Data Cube Vocabulary. This supports lesson <a
+ href="#aggregations">
Publishers may need guidance in how to represent common analytical
operations such as Slice, Dice, Rollup on data cubes</a>.
</li>
- <li>Another problem lies in defining Data Cubes without greater
- insight in the data beforehand. Thus, OLAP systems have to cater for
+ <li>Another problem lies in defining data cubes without greater
+ insight into the data beforehand. Thus, OLAP systems have to cater for
possibly missing information (e.g., the aggregation function or a
human readable label).</li>
<li>Depending on the expressivity of the OLAP queries (e.g.,
aggregation functions, hierarchies, ordering), performance plays an
important role. This supports lesson <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">
+ href="#mechRec">
Publishers and consumers may need more guidance in efficiently
- processing data using the data cube vocabulary</a>.
+ processing data using the Data Cube Vocabulary</a>.
</li>
</ul>
@@ -1333,13 +1326,13 @@
A concrete use case is the structured collection of <a
href="http://wiki.planet-data.eu/web/Datasets">RDF Data Cube
Vocabulary datasets</a> in the PlanetData Wiki. This list is supposed to
- describe statistical datasets on a higher level - for easy discovery
- and selection - and to provide a useful overview of RDF Data Cube
+ describe statistical datasets on a higher level — for easy discovery
+ and selection — and to provide a useful overview of RDF Data Cube
deployments in the Linked Data cloud.
</p>
<h4>Benefits</h4>
<ul>
- <li>Datasets may automatically be discovered by web or data
+ <li>Datasets may automatically be discovered by Web or data
crawlers.</li>
<li>Potential consumers will be pointed to published statistics
in search engines if searching for related information.</li>
@@ -1351,22 +1344,22 @@
<h4>Challenges</h4>
<ul>
- <li>Define mapping between DCAT and data cube vocabulary. The <a
+ <li>Define a mapping between DCAT and the Data Cube Vocabulary. The <a
href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a>
(DCAT) is strongly related to this use case since it may complement
the standard vocabulary for representing statistics in the case of
registering data in a data catalog. This supports lesson <a
- href="#Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">Publishers
+ href="#mechRec">Publishers
may need guidance in communicating the availability of published
statistical data to external parties and to allow automatic
discovery of statistical data</a>
</li>
- <li>Define mapping between data cube vocabulary and data catalog
+ <li>Define a mapping between the Data Cube Vocabulary and data catalog
descriptions. If data catalogs contain statistics, they do not expose
those using Linked Data but for instance using CSV, HTML (e.g.,
Pangea) or XML (e.g., DDI - Data Documentation Initiative).
Therefore, it could also be a use case to publish such data using the
- data cube vocabulary.</li>
+ Data Cube Vocabulary.</li>
</ul>
</section> </section>
@@ -1379,7 +1372,7 @@
well as associated tools or services complementing the vocabulary.</p>
<section>
- <h3 id="VocabularyshouldbuildupontheSDMXinformationmodel">There is
+ <h3 id="putative">There is
a putative requirement to update to SDMX 2.1 if there are specific use
cases that demand it</h3>
<p>
@@ -1401,7 +1394,7 @@
Case Study: Eurostat SDMX as Linked Data</a></li>
</ul>
</section> <section>
- <h3 id="Vocabularyshouldclarifytheuseofsubsetsofobservations">Publishers
+ <h3 id="clarify">Publishers
may need more guidance in creating and managing slices or arbitrary
groups of observations</h3>
<p>There should be a consensus on the issue of flattening or
@@ -1437,7 +1430,7 @@
</ul>
</section> <section>
<h3
- id="Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Publishers
+ id="heirarchic">Publishers
may need more guidance to decide which representation of hierarchies
is most suitable for their use case</h3>
<p>
@@ -1506,11 +1499,11 @@
</ul>
</section> <section>
<h3
- id="VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Modelers
+ id="relToSO19156">Modelers
using ISO19156 - Observations & Measurements may need clarification
- regarding the relationship to the data cube vocabulary</h3>
+ regarding the relationship to the Data Cube Vocabulary</h3>
- <p>An number of organizations, particularly in the Climate and
- Meteorological area already have some commitment to the OGC
+ <p>A number of organizations, particularly in the Climate and
+ Meteorological area, already have some commitment to the OGC
"Observations and Measurements" (O&M) logical data model, also
published as ISO 19156. Are there any statements about compatibility
and interoperability between O&M and Data Cube that can be made to
@@ -1536,7 +1529,7 @@
</ul>
</section> <section>
<h3
- id="Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">Publishers
+ id="aggregations">Publishers
may need guidance in how to represent common analytical operations
such as Slice, Dice, Rollup on data cubes</h3>
@@ -1554,11 +1547,11 @@
Case Study: Eurostat SDMX as Linked Data</a></li>
<li><a
href="#consumer-case-study-analysing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
- Case Study: Analysing published financial (XBRL) data from the SEC
+ Case Study: Analyzing published financial (XBRL) data from the SEC
with common OLAP systems</a></li>
</ul>
</section> <section>
- <h3 id="Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">Publishers
+ <h3 id="declaringRel">Publishers
may need guidance in making transparent the pre-processing of
aggregate statistics</h3>
<p>Background information:</p>
@@ -1586,7 +1579,7 @@
</ul>
</section> <section>
<h3
- id="Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">Publishers
+ id="criteriaForWell">Publishers
and consumers may need guidance in checking and making use of
well-formedness of published data using data cube</h3>
@@ -1602,7 +1595,7 @@
Information System (COINS)</a></li>
<li><a
href="#consumer-case-study-simple-chart-visualisations-of-integrated-published-climate-sensor-data">Consumer
- Case Study: Simple chart visualisations of (integrated) published
+ Case Study: Simple chart visualizations of (integrated) published
climate sensor data</a></li>
<li><a
href="#consumer-use-case-visualising-published-statistical-data-in-google-public-data-explorer">Consumer
@@ -1610,12 +1603,12 @@
Data Explorer</a></li>
<li><a
href="#consumer-case-study-analysing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
- Case Study: Analysing published financial (XBRL) data from the SEC
+ Case Study: Analyzing published financial (XBRL) data from the SEC
with common OLAP systems</a></li>
</ul>
</section> <section>
<h3
- id="Publishersmayneedguidanceinconversionsfromcommonstatisticalrepresentations">Publishers
+ id="excelCSV">Publishers
may need guidance in conversions from common statistical
representations such as CSV, Excel, ARFF etc.</h3>
@@ -1631,11 +1624,11 @@
census data as Linked Data</a></li>
<li><a
href="#consumer-case-study-analysing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
- Case Study: Analysing published financial (XBRL) data from the SEC
+ Case Study: Analyzing published financial (XBRL) data from the SEC
with common OLAP systems</a></li>
</ul>
</section> <section>
- <h3 id="Consumersmayneedguidanceinconversionsintoformats">Consumers
+ <h3 id="consumers">Consumers
may need guidance in conversions into formats that can easily be
displayed and further investigated in tools such as Google Data
Explorer, R, Weka etc.</h3>
@@ -1655,9 +1648,9 @@
</ul>
</section> <section>
<h3
- id="Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">Publishers
+ id="mechRec">Publishers
and consumers may need more guidance in efficiently processing data
- using the data cube vocabulary</h3>
+ using the Data Cube Vocabulary</h3>
<p>Background information:</p>
<ul>
<li>Related issue regarding abbreviations <a
@@ -1682,12 +1675,12 @@
Case Study: Eurostat SDMX as Linked Data</a></li>
<li><a
href="#consumer-case-study-analysing-published-financial-xbrl-data-from-the-sec-with-common-olap-systems">Consumer
- Case Study: Analysing published financial (XBRL) data from the SEC
+ Case Study: Analyzing published financial (XBRL) data from the SEC
with common OLAP systems</a></li>
</ul>
</section> <section>
<h3
- id="Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">Publishers
+ id="pubGuidance">Publishers
may need guidance in communicating the availability of published
statistical data to external parties and to allow automatic discovery
of statistical data</h3>