--- a/data-cube-ucr/index.html Sun May 26 12:32:42 2013 +0100
+++ b/data-cube-ucr/index.html Mon May 27 10:18:53 2013 +0200
@@ -19,7 +19,7 @@
<section id="abstract">
<p>Many national, regional and local governments, as well as other
- organisations in- and outside of the public sector, collect numeric
+ organizations in- and outside of the public sector, collect numeric
data and aggregate this data into statistics. There is a need to
publish these statistics in a standardised, machine-readable way on
the web, so that they can be freely integrated and reused in consuming
@@ -51,7 +51,7 @@
The aim of this document is to present concrete use cases and
requirements for a vocabulary to publish statistics as Linked Data. An
earlier version of the data cube vocabulary [<cite><a
- href="#ref-QB-2010">QB-2010</a></cite>] has been existing for some time and
+ href="#ref-QB-2010">QB-2010</a></cite>] has existed for some time and
has proven applicable in <a
href="http://wiki.planet-data.eu/web/Datasets">several deployments</a>.
The <a href="http://www.w3.org/2011/gld/">W3C Government Linked
@@ -62,35 +62,33 @@
to document and illustrate design decisions that have driven the work.
<p>The rest of this document is structured as follows. We will
- first give a short introduction of the specificities of modelling
- statistics. Then, we will describe use cases that have been derived
- from existing deployments or feedback to the earlier data cube
- vocabulary version. In particular, we describe possible benefits and
+ first give a short introduction to modelling statistics. Then, we will describe use cases that have been derived
+ from existing deployments or feedback to the earlier version of the data cube vocabulary. In particular, we describe possible benefits and
challenges of use cases. Afterwards, we will describe concrete
requirements that were derived from those use cases and that have been
taken into account for the specification.</p>
- <p>We use the name data cube vocabulary throughout the document
+ <p>We use the term "data cube vocabulary" throughout the document
when referring to the vocabulary.</p>
<section>
<h3 id="describingstatistics">Describing statistics</h3>
- <p>In the following, we describe the challenge of an RDF vocabulary
+ <p>In the following, we describe the challenge of authoring an RDF vocabulary
for publishing statistics as Linked Data.</p>
<p>Describing statistics - collected and aggregated numeric data -
is challenging for the following reasons:</p>
<ul>
- <li>Representing statistics requires more complex modeling as
+ <li>Representing statistics requires more complex modelling as
discussed by Martin Fowler [<cite><a href="#ref-FOWLER97">FOWLER97</a></cite>]:
Recording a statistic simply as an attribute to an object (e.g., the
- fact that a person weighs 185 pounds) fails with representing
+ fact that a person weighs 185 pounds) fails to represent
important concepts such as quantity, measurement, and unit. Instead,
a statistic is modeled as a distinguishable object, an observation.
</li>
<li>The object describes an observation of a value, e.g., a
numeric value (e.g., 185) in case of a measurement or a categorical
value (e.g., "blood group A") in case of a categorical observation.</li>
- <li>To allow correct interpretation of the value, the object can
+ <li>To allow for correct interpretation of the value, the object can
be further described by "dimensions", e.g., the specific phenomenon
"weight" observed and the unit "pounds". Given background
information, e.g., arithmetical and comparative operations, humans
@@ -101,11 +99,10 @@
"Person" that collected the observation and the "Time" the
observation was collected.</li>
</ul>
- The following figure illustrates this specificitiy of modelling in a
+ The following figure illustrates these details in a
class diagram:
- <p class="caption">Figure: Illustration of specificities in
- modelling of a statistic</p>
+ <p class="caption">Figure: Modelling a statistic</p>
<p align="center">
<img alt="specificity of modelling a
@@ -116,15 +113,14 @@
<p>
The Statistical Data and Metadata eXchange [<cite><a
href="#ref-SDMX">SDMX</a></cite>] - the ISO standard for exchanging and
- sharing of statistical data and metadata among organisations - uses
- "multidimensional model" that caters for the specificity of modelling
- statistics. It allows to describe statistics as observations.
+ sharing statistical data and metadata among organizations -
+ uses a "multidimensional model" to meet the above challenges in modelling statistics. It can describe statistics as observations.
Observations exhibit values (Measures) that depend on dimensions
(Members of Dimensions).
</p>
<p>Since the SDMX standard has proven applicable in many contexts,
the vocabulary adopts the multidimensional model that underlies SDMX
- and will be compatible to SDMX.</p>
+ and will be compatible with SDMX.</p>
</section> </section>
@@ -133,7 +129,7 @@
<p>
<dfn>Statistics</dfn>
is the <a href="http://en.wikipedia.org/wiki/Statistics">study</a> of
- the collection, organisation, analysis, and interpretation of data.
+ the collection, organization, analysis, and interpretation of data.
Statistics comprise statistical data.
</p>
@@ -143,28 +139,27 @@
<dfn>statistical data</dfn>
is a multidimensional table (also called a data cube) [<cite><a
href="#ref-SDMX">SDMX</a></cite>], i.e., a set of observed values organized
- along a group of dimensions, together with associated metadata. If
- aggregated we refer to statistical data as "macro-data" whereas if
- not, we refer to "micro-data".
+ along a group of dimensions, together with associated metadata. We refer to aggregated statistical
+ data as "macro-data" and unaggregated statistical data as "micro-data".
</p>
<p>
Statistical data can be collected in a
<dfn>dataset</dfn>
- , typically published and maintained by an organisation [<cite><a
+ , typically published and maintained by an organization [<cite><a
href="#ref-SDMX">SDMX</a></cite>]. The dataset contains metadata, e.g.,
about the time of collection and publication or about the maintaining
- and publishing organisation.
+ and publishing organization.
</p>
<p>
<dfn>Source data</dfn>
- is data from datastores such as RDBs or spreadsheets that acts as a
+ is data from datastores such as relational databases or spreadsheets that acts as a
source for the Linked Data publishing process.
</p>
<p>
<dfn>Metadata</dfn>
- about statistics defines the data structure and give contextual
+ about statistics defines the data structure and gives contextual
information about the statistics.
</p>
@@ -178,7 +173,7 @@
<p>
A
<dfn>publisher</dfn>
- is a person or organisation that exposes source data as Linked Data on
+ is a person or organization that exposes source data as Linked Data on
the Web.
</p>
@@ -190,7 +185,7 @@
<p>
A
<dfn>registry</dfn>
- collects metadata about statistical data in a registration fashion.
+ allows a publisher to announce that data or metadata exists and to add information about how to obtain that data [<cite><a href="#ref-SDMX-21">SDMX 2.1</a></cite>].
</p>
</section>
@@ -213,7 +208,7 @@
<p>Since we have adopted the multidimensional model that underlies
SDMX, we also adopt the "Web Dissemination Use Case" which is the
prime use case for SDMX since it is an increasing popular use of SDMX
- and enables organisations to build a self-updating dissemination
+ and enables organizations to build a self-updating dissemination
system.</p>
<p>The Web Dissemination Use Case contains three actors, a
structural metadata web service (registry) that collects metadata
@@ -283,7 +278,7 @@
href="#ref-COINS">COINS</a></cite>])
</span>
</p>
- <p>More and more organisations want to publish statistics on the
+ <p>More and more organizations want to publish statistics on the
web, for reasons such as increasing transparency and trust. Although
in the ideal case, published data can be understood by both humans and
machines, data often is simply published as CSV, PDF, XSL etc.,
@@ -627,7 +622,7 @@
observations (e.g., sensors measuring average of temperature over
location and time).
</li>
- <li>A number of organisations, particularly in the Climate and
+ <li>A number of organizations, particularly in the Climate and
Meteorological area already have some commitment to the OGC
"Observations and Measurements" (O&M) logical data model, also
published as ISO 19156.</li>
@@ -659,7 +654,7 @@
<p>
As mentioned already, the ISO standard for exchanging and sharing
- statistical data and metadata among organisations is Statistical Data
+ statistical data and metadata among organizations is Statistical Data
and Metadata eXchange [<cite><a href="#ref-SDMX">SDMX</a></cite>].
Since this standard has proven applicable in many contexts, we adopt
the multidimensional model that underlies SDMX and intend the standard
@@ -1338,12 +1333,12 @@
<h3
id="VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Vocabulary
should define relationship to ISO19156 - Observations & Measurements</h3>
- <p>An number of organisations, particularly in the Climate and
+ <p>A number of organizations, particularly in the Climate and
Meteorological area already have some commitment to the OGC
"Observations and Measurements" (O&M) logical data model, also
published as ISO 19156. Are there any statements about compatibility
and interoperability between O&M and Data Cube that can be made to
- give guidance to such organisations?</p>
+ give guidance to such organizations?</p>
<p>Background information:</p>
<ul>
<li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>