--- a/data-cube-ucr/index.html Mon Feb 25 16:52:18 2013 +0100
+++ b/data-cube-ucr/index.html Wed Feb 27 19:19:36 2013 +0100
@@ -91,8 +91,8 @@
The following figure illustrates this specificitiy of modelling in a
class diagram:
- <p class="caption">Figure demonstrating specificity of modelling a
- statistic</p>
+ <p class="caption">Figure: Illustration of specificities in
+ modelling of a statistic</p>
<p align="center">
<img alt="specificity of modelling a
@@ -196,7 +196,7 @@
<p>
<span style="font-size: 10pt">(Use case taken from SDMX Web
Dissemination Use Case [<cite><a href="#ref-SDMX-21">SDMX
- 2.1</a></cite>]
+ 2.1</a></cite>])
</span>
</p>
<p>Since we have adopted the multidimensional model that underlies
@@ -217,13 +217,13 @@
SDMX and in more detail described as follows:</p>
<p class="caption">
- Process flow diagram by SDMX [<cite><a href="#ref-SDMX-21">SDMX
- 2.1</a></cite>]
+ Figure: Process flow diagram by SDMX [<cite><a
+ href="#ref-SDMX-21">SDMX 2.1</a></cite>]
</p>
<p align="center">
<img alt="SDMX Web Dissemination Use Case"
- src="./figures/SDMX_Web_Dissemination_Use_Case.png"></img>
+ src="./figures/SDMX_Web_Dissemination_Use_Case.png" width="1000px"></img>
</p>
<p>Benefits:</p>
<p>A structural metadata source (registry) collects metadata about
@@ -242,18 +242,894 @@
metadata.</p>
<p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There
+ should be a recommended way to communicate the availability of
+ published statistical data to external parties and to allow
+ automatic discovery of statistical data</a></li>
+ </ul>
+
<p>The SDMX Web Dissemination Use Case can be concretised by
several sub-use cases, detailed in the following sections.</p>
</section> <section>
- <h3 id="COINS">Publisher Use Case: UK government financial data
- from Combined Online Information System (COINS)</h3>
+ <h3 id="UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
+ Use Case: UK government financial data from Combined Online
+ Information System (COINS)</h3>
<p>
<span style="font-size: 10pt">(This use case has been
- summarised from Ian Dickinson et al. (COINS as Linked Data.
- http://data.gov.uk/resources/coins. Last visited on Jan 9 2013). </span>
+ summarised from Ian Dickinson et al. [<cite><a
+ href="#ref-COINS">COINS</a></cite>])
+ </span>
</p>
+ <p>More and more organizations want to publish statistics on the
+ web, for reasons such as increasing transparency and trust. Although,
+ ideally, published data can be understood by both humans and
+ machines, data is often simply published as CSV, PDF, XLS etc.,
+ lacking elaborate metadata, which makes free usage and analysis
+ difficult.</p>
+ <p>Therefore, the goal in this use case is to use a
+ machine-readable and application-independent description of common
+ statistics with use of open standards, to foster usage and innovation
+ on the published data.</p>
+ <p>In the "COINS as Linked Data" project (Ian Dickinson et al.
+ COINS as Linked Data. http://data.gov.uk/resources/coins. Last visited
+ on Jan 9 2013), the Combined Online Information System (COINS)
+ (Treasury's web site.
+ http://www.hm-treasury.gov.uk/psr_coins_data.htm. Last visited on Jan
+ 9 2013) shall be published using a standard Linked Data vocabulary.</p>
+ <p>In the Combined Online Information System (COINS), HM Treasury,
+ the principal custodian of financial data for the UK government,
+ released previously restricted financial information about government
+ spending.</p>
+
+ <p>Benefits:</p>
+
+ According to the COINS as Linked Data project, the reasons for
+ publishing COINS as Linked Data are threefold.
+
+ <ul>
+ <li>using open standard representation makes it easier to work
+ with the data with available technologies and promises innovative
+ third-party tools and usages</li>
+ <li>individual transactions and groups of transactions are given
+ an identity, and so can be referenced by web address (URL), to allow
+ them to be discussed, annotated, or listed as source data for
+ articles or visualizations</li>
+ <li>cross-links between linked-data datasets allow for much
+ richer exploration of related datasets</li>
+ </ul>
+
+ <p>The COINS data has a hypercube structure. It describes financial
+ transactions using seven independent dimensions (time, data-type,
+ department etc.) and one dependent measure (value). Also, it allows
+ thirty-three attributes that may further describe each transaction.
+ For further information, see the "COINS as Linked Data" project
+ website.</p>
+
+ <p>COINS is an example of one of the more complex statistical
+ datasets being published via data.gov.uk.</p>
+
+ <p>Part of the complexity of COINS arises from the nature of the
+ data being released.</p>
+
+ <p>The published COINS datasets cover expenditure related to five
+ different years (2005–06 to 2009–10). The actual COINS database at HM
+ Treasury is updated daily. In principle at least, multiple snapshots
+ of the COINS data could be released through the year.</p>
+
+ <p>The COINS use case leads to the following challenges:</p>
+ <ul>
+ <li>The actual data and its hypercube structure are to be
+ represented separately so that an application first can examine the
+ structure before deciding to download the actual data, i.e., the
+ transactions. The hypercube structure also defines for each dimension
+ and attribute a range of permitted values that are to be represented.</li>
+ <li>An access or query interface to the COINS data, e.g., via a
+ SPARQL endpoint or the linked data API, is planned. Queries that are
+ expected to be interesting are: "spending for one department", "total
+ spending by department", "retrieving all data for a given
+ observation".</li>
+ <li>Also, the publisher favours a representation that is both as
+ self-descriptive as possible, i.e., others can link to and download
+ fully-described individual transactions, and as compact as possible,
+ i.e., information is not unnecessarily repeated.</li>
+ <li>Moreover, the publisher is thinking about the possible
+ benefit of publishing slices of the data, e.g., datasets that fix all
+ dimensions but the time dimension. For instance, such slices could be
+ particularly interesting for visualisations or comments. However,
+ depending on the number of dimensions, the number of possible slices
+ can become large, which makes it difficult to select all interesting
+ slices.</li>
+ <li>An important benefit of linked data is that we are able to
+ annotate data, at a fine-grained level of detail, to record
+ information about the data itself. This includes where it came from –
+ the provenance of the data – but could include annotations from
+ reviewers, links to other useful resources, etc. Being able to trust
+ that data to be correct and reliable is a central value for
+ government-published data, so recording provenance is a key
+ requirement for the COINS data.</li>
+ <li>A challenge also is the size of the data, especially since it
+ is updated regularly. Five data files already contain between 3.3 and
+ 4.9 million rows of data.</li>
+ </ul>
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary
+ should clarify the use of subsets of observations</a></li>
+ </ul>
+
+
+
+
+ </section> <section>
+ <h3 id="PublishingExcelSpreadsheetsasLinkedData">Publisher Use
+ Case: Publishing Excel Spreadsheets as Linked Data</h3>
+ <p>
+ <span style="font-size: 10pt">(Part of this use case has been
+ contributed by Rinke Hoekstra. See <a
+ href="http://ehumanities.nl/ceda_r/">CEDA_R</a> and <a
+ href="http://www.data2semantics.org/">Data2Semantics</a> for more
+ information.)
+ </span>
+ </p>
+
+ <p>Not only in government, there is a need to publish considerable
+ amounts of statistical data to be consumed in various (also
+ unexpected) application scenarios. Typically, Microsoft Excel sheets
+ are made available for download. Those Excel sheets contain single
+ spreadsheets with several multidimensional data tables, having a name
+ and notes, as well as column values, row values, and cell values.</p>
+ <p>Benefits:</p>
+ <p>The goal in this use case is to publish spreadsheet
+ information in a machine-readable format on the web, e.g., so that
+ crawlers can find spreadsheets that use a certain column value. The
+ published data should represent and make available for queries the
+ most important information in the spreadsheets, e.g., rows, columns,
+ and cell values.</p>
+ <p>
+ For instance, in the <a href="http://ehumanities.nl/ceda_r/">CEDA_R</a>
+ and <a href="http://www.data2semantics.org/">Data2Semantics</a>
+ projects, publishing and harmonizing Dutch historical census data (from
+ 1795 onwards) is a goal. These censuses are now only available as
+ Excel spreadsheets (obtained by data entry) that closely mimic the way
+ in which the data was originally published and shall be published as
+ Linked Data.
+ </p>
+ <p>Challenges in this use case:</p>
+ <p>All context and so all meaning of the measurement point is
+ expressed by means of dimensions. The pure number is the star of an
+ ego-network of attributes or dimensions. In an RDF representation it is
+ then easily possible to define hierarchical relationships between the
+ dimensions (that can be exemplified further) as well as mapping
+ different attributes across different value points. This way a
+ harmonization among variables is performed around the measurement
+ points themselves.</p>
+ <p>In historical research, until now, harmonization across datasets
+ is performed by hand, and in subsequent iterations of a database: it
+ is very hard to trace back the provenance of decisions made during the
+ harmonization procedure.</p>
+ <p>Combining Data Cube with SKOS to allow for cross-location and
+ cross-time historical analysis</p>
+ <p>Novel visualisation of census data</p>
+ <p>Integration with provenance vocabularies, e.g., PROV-O, for
+ tracking of harmonization steps</p>
+ <p>These challenges may seem to be particular to the field of
+ historical research, but in fact apply to government information at
+ large. Government is not a single body that publishes information at a
+ single point in time. Government consists of multiple (altering)
+ bodies, scattered across multiple levels, jurisdictions and areas.
+ Publishing government information in a consistent, integrated manner
+ requires exactly the type of harmonization required in this use case.</p>
+ <p>Excel sheets provide much flexibility in arranging information.
+ It may be necessary to limit this flexibility to allow automatic
+ transformation.</p>
+ <p>There are many spreadsheets.</p>
+ <p>Semi-structured information, e.g., notes about lineage of data
+ cells, may be impossible to formalize.</p>
+ <p>Another concrete example is the Stats2RDF [1] project that
+ intends to publish biomedical statistical data that is represented as
+ Excel sheets. Here, Excel files are first translated into CSV and then
+ translated into RDF.</p>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary
+ should recommend a mechanism to support hierarchical code lists</a></li>
+ </ul>
+
+
+ </section> <section>
+ <h3
+ id="PublishinghierarchicallystructureddatafromStatsWalesandOpenDataCommunities">Publisher
+ Use Case: Publishing hierarchically structured data from StatsWales
+ and Open Data Communities</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case has been taken from [<cite><a
+ href="#ref-QB4OLAP">QB4OLAP</a></cite>])
+ </span>
+ </p>
+
+ <p>It often comes up in statistical data that you have some kind of
+ 'overall' figure, which is then broken down into parts (GLD mailing
+ list discussion.
+ http://groups.google.com/group/publishing-statistical-data/msg/7c80f3869ff4ba0f).</p>
+
+ <p>
+ Etcheverry and Vaisman [<cite><a href="#ref-QB4OLAP">QB4OLAP</a></cite>]
+ present the use case to publish household data from <a
+ href="http://statswales.wales.gov.uk/index.htm">StatsWales</a> and <a
+ href="http://opendatacommunities.org/doc/dataset/housing/household-projections">Open
+ Data Communities</a>.
+ </p>
+
+ <p>This multidimensional data contains for each fact a time
+ dimension with one level year and a location dimension with levels
+ Unitary Authority, Government Office Region, Country, and ALL.</p>
+
+ <p>The unit of measure is 1000 households.</p>
+
+ <p>In this use case, one wants to publish not only a dataset on the
+ bottom-most level, i.e., the number of households in each
+ Unitary Authority in each year, but also datasets on more aggregated
+ levels.</p>
+
+ <p>For instance, in order to publish a dataset with the number of
+ households at each Government Office Region per year, one needs to
+ aggregate the measure of each fact having the same Government Office
+ Region using the SUM function.</p>
+
+ <p>Importantly, one would like to maintain the relationship between
+ the resulting datasets, i.e., the levels and aggregation functions.</p>
+
+ <p>Note that this use case does not simply need a selection (a "dice"
+ in OLAP terms, or a <code>qb:Slice</code> in the vocabulary) where one
+ fixes the time period and the measure.</p>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary
+ should recommend a mechanism to support hierarchical code lists</a></li>
+ </ul>
+
+
+ </section> <section>
+ <h3 id="PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
+ Use Case: Publishing slices of data about UK Bathing Water Quality</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case has been provided by
+ Epimorphics Ltd (<a
+ href="http://www.epimorphics.com/web/projects/bathing-water-quality">http://www.epimorphics.com/web/projects/bathing-water-quality</a>))
+ </span>
+ </p>
+ <p>As part of their work with data.gov.uk and the UK Location
+ Programme, Epimorphics Ltd have been working to pilot the publication
+ of both current and historic bathing water quality information from
+ the UK Environment Agency (http://www.environment-agency.gov.uk/) as
+ Linked Data.</p>
+ <p>The UK has a number of areas, typically beaches, that are
+ designated as bathing waters where people routinely enter the water.
+ The Environment Agency monitors and reports on the quality of the
+ water at these bathing waters.</p>
+ <p>The Environment Agency's data can be thought of as structured
+ in 3 groups:</p>
+ <ul>
+ <li>There is basic reference data describing the bathing waters
+ and sampling points</li>
+ <li>There is a data set "Annual Compliance Assessment Dataset"
+ giving the rating for each bathing water for each year it has been
+ monitored</li>
+ <li>There is a data set "In-Season Sample Assessment Dataset"
+ giving the detailed weekly sampling results for each bathing water</li>
+ </ul>
+ <p>The most important dimensions of the data are bathing water,
+ sampling point, and compliance classification.</p>
+ <p>Challenges:</p>
+
+ <p>Observations may exhibit a number of attributes, e.g., whether
+ there was an abnormal weather exception.</p>
+ <p>
+ Relevant slices of both datasets are to be created:
+ <ul>
+ <li>Annual Compliance Assessment Dataset: all the observations
+ for a specific sampling point, all the observations for a specific
+ year.</li>
+ <li>In-Season Sample Assessment Dataset: samples for a given
+ sampling point, samples for a given week, samples for a given year,
+ samples for a given year and sampling point, latest samples for each
+ sampling point.</li>
+ <li>The use case suggests more arbitrary subsets of the
+ observations, e.g., collecting all the "latest" observations in a
+ continuously updated data set.</li>
+ </ul>
+
+
+ </p>
+ <p>Existing Work:</p>
+ <ul>
+ <li>Semantic Sensor Network ontology (SSN) [2] already provides a
+ way to publish sensor information. SSN data provides statistical
+ Linked Data and grounds its data to the domain, e.g., sensors that
+ collect observations (e.g., sensors measuring average of temperature
+ over location and time).</li>
+ <li>A number of organizations, particularly in the Climate and
+ Meteorological area, already have some commitment to the OGC
+ "Observations and Measurements" (O&amp;M) logical data model, also
+ published as ISO 19156.</li>
+ </ul>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Vocabulary
+ should define relationship to ISO19156 - Observations &amp; Measurements</a></li>
+ <li><a
+ href="#Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary
+ should clarify the use of subsets of observations</a></li>
+ </ul>
+
+
+ </section> <section>
+ <h3 id="EurostatSDMXasLinkedData">Publisher Use Case: Eurostat
+ SDMX as Linked Data</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has been taken
+ from <a href="http://estatwrap.ontologycentral.com/">Eurostat
+ Linked Data Wrapper</a> and <a
+ href="http://eurostat.linked-statistics.org/">Linked Statistics
+ Eurostat Data</a>, both deployments for publishing Eurostat SDMX as
+ Linked Data using the draft version of the vocabulary)
+ </span>
+ </p>
+
+ <p>As mentioned already, the ISO standard for exchanging and
+ sharing statistical data and metadata among organizations is
+ Statistical Data and Metadata eXchange (SDMX). Since this standard has
+ proven applicable in many contexts, we adopt the multidimensional
+ model that underlies SDMX and intend the standard vocabulary to be
+ compatible with SDMX.</p>
+
+ <p>
+ Therefore, in this use case we intend to explain the benefit and
+ challenges of publishing SDMX data as Linked Data. As one of the main
+ adopters of SDMX, <a href="http://epp.eurostat.ec.europa.eu/">Eurostat</a>
+ publishes large amounts of European statistics coming from a data
+ warehouse as SDMX and other formats on the web. Eurostat also provides
+ an interface to browse and explore the datasets. However, linking such
+ multidimensional data to related data sets and concepts would require
+ download of interesting datasets and manual integration. The goal here
+ is to improve integration with other datasets; Eurostat data should be
+ published on the web in a machine-readable format that can be
+ linked with other datasets and freely consumed by
+ applications. Both <a href="http://estatwrap.ontologycentral.com/">Eurostat
+ Linked Data Wrapper</a> and <a
+ href="http://eurostat.linked-statistics.org/">Linked Statistics
+ Eurostat Data</a> intend to publish Eurostat SDMX data as Linked Data. In
+ these use cases, <a
+ href="http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/">Eurostat
+ data</a> shall be published as <a href="http://5stardata.info/">5-star
+ Linked Open Data</a>. Eurostat data is partly published as SDMX, partly
+ as tabular data (TSV, similar to CSV). Eurostat provides a <a
+ href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">TOC
+ of published datasets</a> as well as a feed of modified and new datasets.
+
+ Eurostat provides a list of used codelists, i.e., <a
+ href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&dir=dic">range
+ of permitted dimension values</a>. Any Eurostat dataset contains a
+ varying set of dimensions (e.g., date, geo, obs_status, sex, unit) as
+ well as measures (generic value, content is specified by dataset,
+ e.g., GDP per capita in PPS, Total population, Employment rate by
+ sex).
+ </p>
+
+
+ <p>Benefits:</p>
+
+ <ul>
+ <li>Possible implementation of ETL pipelines based on Linked Data
+ technologies (e.g., LDSpider) to load the data into a data warehouse
+ for analysis</li>
+
+ <li>Allows useful queries on the data, e.g., comparison of
+ statistical indicators across EU countries.</li>
+
+ <li>Allows attaching contextual information to statistics during
+ the interpretation process.</li>
+
+ <li>Allows reusing single observations from the data.</li>
+
+ <li>Linking to information from other data sources, e.g., for
+ the geo-spatial dimension.</li>
+ </ul>
+
+ <p>Challenges:</p>
+
+ <ul>
+ <li>New Eurostat datasets are added regularly to Eurostat. The
+ Linked Data representation should automatically provide access to the
+ most up-to-date data.</li>
+
+ <li>How to match elements of the geo-spatial dimension to
+ elements of other data sources, e.g., NUTS, GADM.</li>
+
+ <li>There is a large number of Eurostat datasets, each possibly
+ containing a large number of columns (dimensions) and rows
+ (observations). Eurostat publishes more than 5200 datasets, which,
+ when converted into RDF, require more than 350GB of disk space,
+ yielding a dataspace with some 8 billion triples.</li>
+
+ <li>In the Eurostat Linked Data Wrapper, there is a timeout for
+ transforming SDMX to Linked Data, since Google App Engine is used.
+ Mechanisms to reduce the amount of data that needs to be translated
+ would be needed.</li>
+
+ <li>Provide a useful interface for browsing and visualising the
+ data. One problem is that the data sets have too high a dimensionality
+ to be displayed directly. Instead, one could visualise slices of time
+ series data. However, for that, one would need to either fix most
+ other dimensions (e.g., sex) or aggregate over them (e.g., via
+ average). The selection of useful slices from the large number of
+ possible slices is a challenge.</li>
+
+ <li>Each dimension used by a dataset has a range of permitted
+ values that ought to be represented.</li>
+
+ <li>The Eurostat SDMX as Linked Data use case suggests having
+ time lines on data aggregated over the gender dimension.</li>
+
+ <li>The Eurostat SDMX as Linked Data use case suggests providing
+ data on a gender level and on a level aggregating over the gender
+ dimension.</li>
+
+ <li>Updates to the data
+
+ <ul>
+ <li>Eurostat - Linked Data pulls in changes from the original
+ Eurostat dataset on a weekly basis; the conversion process runs every
+ Saturday at noon, taking into account new datasets along with
+ updates to existing datasets.</li>
+ <li>Eurostat Linked Data Wrapper translates Eurostat datasets
+ into RDF on the fly so that the most current data is always used. The
+ problem is only to point users towards the URIs of Eurostat
+ datasets: Estatwrap provides a feed of modified and new <a
+ href="http://estatwrap.ontologycentral.com/feed.rdf">datasets</a>.
+ Also, it provides a <a
+ href="http://estatwrap.ontologycentral.com/table_of_contents.html">TOC</a>
+ that could be automatically updated from the <a
+ href="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=table_of_contents_en.xml">Eurostat
+ TOC</a>.
+ </li>
+ </ul>
+
+
+ </li>
+
+ <li>Query interface
+
+ <ul>
+ <li>Eurostat - Linked Data provides a SPARQL endpoint for the
+ metadata (not the observations).</li>
+ <li>Eurostat Linked Data Wrapper allows and demonstrates how to
+ use Qcrumb.com to query the data.</li>
+ </ul>
+ </li>
+ <li>Browsing and visualising interface
+
+ <ul>
+ <li>Eurostat Linked Data Wrapper provides for each dataset an
+ HTML page showing a visualisation of the data.</li>
+ </ul>
+ </li>
+ </ul>
+
+
+ <p>Non-requirements:</p>
+ <ul>
+ <li>One possible application would run validation checks over
+ Eurostat data. The intended standard vocabulary is to publish the
+ Eurostat data as-is and is not intended to represent information for
+ validation (similar to business rules).</li>
+ </ul>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a href="#VocabularyshouldbuildupontheSDMXinformationmodel">There
+ should be mechanisms and recommendations regarding publication and
+ consumption of large amounts of statistical data</a></li>
+ <li><a
+ href="#Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">There
+ should be a recommended mechanism to allow for publication of
+ aggregates which cross multiple dimensions</a></li>
+ </ul>
+ </section> <section>
+ <h3 id="Representingrelationshipsbetweenstatisticaldata">Publisher
+ Use Case: Representing relationships between statistical data</h3>
+ <p>
+ <span style="font-size: 10pt">(This use case has mainly been
+ taken from the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>])
+ </span>
+ </p>
+
+ <p>In several applications, relationships between statistical data
+ need to be represented.</p>
+
+ <p>The goal of this use case is to describe provenance,
+ transformations, and versioning around statistical data, so that the
+ history of statistics published on the web becomes clear. This may
+ also relate to the issue of having relationships between datasets
+ published.</p>
+
+ <p>
+ For instance, the COINS project [<cite><a href="#ref-COINS">COINS</a></cite>]
+ has at least four perspectives on what they mean by “COINS” data: the
+ abstract notion of “all of COINS”, the data for a particular year, the
+ version of the data for a particular year released on a given date,
+ and the constituent graphs which hold both the authoritative data
+ translated from HMT’s own sources and additional supplementary
+ information which they derive from the data, for example by
+ cross-linking to other datasets.
+ </p>
+
+ <p>Another specific use case is that the Welsh Assembly government
+ publishes a variety of population datasets broken down in different
+ ways. For many uses, population broken down by some category (e.g.,
+ ethnicity) is then expressed as a percentage. Separate datasets give the
+ actual counts per category and aggregate counts. In such cases it is
+ common to talk about the denominator (often DENOM) which is the
+ aggregate count against which the percentages can be interpreted.</p>
+
+ <p>
+ Further examples of relationships between statistical
+ data are transformations on datasets, e.g., addition of derived
+ measures, conversion of units, aggregations, OLAP operations, and
+ enrichment of statistical data. A concrete example is given by Freitas
+ et al. [<cite><a href="#ref-COGS">COGS</a></cite>] and illustrated in
+ the following figure.
+ </p>
+
+ <p class="caption">Figure: Illustration of ETL of statistics</p>
+
+ <p align="center">
+ <img alt="COGS relationships between statistics example"
+ src="./figures/Relationships_Statistical_Data_Cogs_Example.png"></img>
+ </p>
+
+ <p>Here, numbers from a sustainability report have been created by
+ a number of transformations to statistical data. Different numbers
+ (e.g., 600 for year 2009 and 503 for year 2010) might have been
+ created differently, which affects how reliably the two numbers can
+ be compared.</p>
+ <p>Benefits:</p>
+
+ <p>Making transparent the transformations a dataset has undergone
+ increases trust in the data.</p>
+
+ <p>Challenges:</p>
+
+ <ul>
+ <li>Operations on statistical data result in new statistical
+ data, depending on the operation. For instance, in terms of Data
+ Cube, operations such as slice, dice, roll-up, drill-down will result
+ in new Data Cubes. This may require representing general
+ relationships between cubes (as discussed in the <a
+ href="http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/75762788de10de95">publishing-statistical-data
+ mailing list</a>).
+ </li>
+ <li>Should Data Cube support explicit declaration of such
+ relationships either between separate <code>qb:DataSet</code>s or between
+ measures within a single <code>qb:DataSet</code> (e.g. <code>ex:populationCount</code>
+ and <code>ex:populationPercent</code>)?
+ </li>
+ <li>If so, should that be scoped to simple, common relationships
+ like DENOM or allow expression of arbitrary mathematical relations?</li>
+ </ul>
+
+ <p>
+ Existing Work (optional):
+
+ <p>
+ Possible relation to <a
+ href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary#Versioning">Versioning</a>
+ part of GLD Best Practices Document, where it is specified how to
+ publish data which has multiple versions.
+ </p>
+ <p>
+ The <a href="http://sites.google.com/site/cogsvocab/">COGS</a>
+ vocabulary [<cite><a href="#ref-COGS">COGS</a></cite>] is related to
+ this use case since it may complement the standard vocabulary for
+ representing ETL pipelines processing statistics.
+ </p>
+
+ </p>
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There
+ should be a recommended way of declaring relations between cubes</a></li>
+ </ul>
+
+ </section> <section>
+ <h3 id="Simplechartvisualisationsofpublishedstatisticaldata">Consumer
+ Use Case: Simple chart visualisations of (integrated) published
+ statistical data</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a
+ href="http://www.iwrm-smart.org/">SMART research project</a>)
+ </span>
+ </p>
+
+ <p>Data that is published on the Web is typically visualized by
+ transforming it manually into CSV or Excel and then creating a
+ visualization on top of these formats using Excel, Tableau,
+ RapidMiner, Rattle, Weka etc.</p>
+ <p>This use case shall demonstrate how statistical data published
+ on the web can be visualized inside a webpage with little effort, without
+ using commercial or highly-complex tools.</p>
+ <p>
+ An example scenario is environmental research done within the <a
+ href="http://www.iwrm-smart.org/">SMART research project</a>. Here,
+ statistics about environmental aspects (e.g., measurements about the
+ climate in the Lower Jordan Valley) shall be visualized for scientists
+ and decision makers. It should also be possible to integrate
+ statistics and display them together. The data is available as XML files
+ on the web. On a separate website, specific parts of the data shall be
+ queried and visualized in simple charts, e.g., line diagrams.
+ </p>
+
+ <p class="caption">Figure: HTML embedded line chart of an
+ environmental measure over time for three regions in the lower Jordan
+ valley</p>
+
+ <p align="center">
+ <img
+ alt="display of an environmental measure over time for three regions in the lower Jordan valley"
+ src="./figures/Level_above_msl_3_locations.png" width="1000px"></img>
+ </p>
+
+ <p class="caption">Figure: Showing the same data in a pivot table.
+ Here, the aggregate COUNT of measures per cell is given.</p>
+ <p align="center">
+ <img
+ alt="Figure: Showing the same data in a pivot
+ table. Here, the aggregate COUNT of measures per cell is given."
+ src="./figures/pivot_analysis_measurements.PNG"></img>
+ </p>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>The difficulties lie in structuring the data appropriately so
+ that the specific information can be queried.</li>
+ <li>Also, data shall be published with potential
+ integration in mind. Therefore, e.g., units of measurements need to
+ be represented.</li>
+ <li>Integration becomes much more difficult if publishers use
+ different measures and dimensions.</li>
+ </ul>
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</a></li>
+ </ul>
+
+ </section> <section>
+ <h3 id="VisualisingpublishedstatisticaldatainGooglePublicDataExplorer">Consumer
+ Use Case: Visualising published statistical data in Google Public Data
+ Explorer</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a
+ href="http://code.google.com/apis/publicdata/">Google Public Data
+ Explorer (GPDE)</a>)
+ </span>
+ </p>
+ <p>
+ <a href="http://code.google.com/apis/publicdata/">Google Public
+ Data Explorer</a> (GPDE) provides an easy way to visualize and
+ explore statistical data. Data needs to be in the <a
+ href="https://developers.google.com/public-data/overview">Dataset
+ Publishing Language</a> (DSPL) to be uploaded to the data explorer. A
+ DSPL dataset is a bundle that contains an XML file, the schema, and a
+ set of CSV files, the actual data. Google provides a tutorial to
+ create a DSPL dataset from your data, e.g., in CSV. This requires a
+ good understanding of XML, as well as a good understanding of the data
+ that shall be visualized and explored.
+ </p>
+ <p>In this use case, the goal is to take statistical data published
+ on the web and to transform it into DSPL for visualization and
+ exploration with as little effort as possible.</p>
+ <p>For instance, Eurostat data about the unemployment rate, downloaded
+ from the web, can be visualised as shown in the following figure:</p>
+
+ <p class="caption">Figure: An interactive chart in GPDE for
+ visualising Eurostat data in the DSPL</p>
+ <p align="center">
+ <img
+ alt="An interactive chart in GPDE for visualising Eurostat data in the DSPL"
+ src="./figures/Eurostat_GPDE_Example.png" width="1000px"></img>
+ </p>
+
+ <p>Benefits:</p>
+ <ul>
+ <li>If a standard Linked Data vocabulary is used, visualising and
+ exploring new data that is already represented using this vocabulary
+ can easily be done using GPDE.</li>
+ <li>Datasets can first be integrated using Linked Data technology
+ and then analysed using GPDE.</li>
+ </ul>
+ <p>Challenges of this use case are:</p>
+ <ul>
+ <li>There are different possible approaches, each having
+ advantages and disadvantages: 1) A consumer C downloads this
+ data into a triple store; SPARQL queries on this data can be used to
+ transform the data into DSPL, which is then uploaded and visualized
+ using GPDE. 2) Alternatively, one or more XSLT transformations on the
+ RDF/XML transform the data into DSPL.</li>
+ <li>The technical challenges for the consumer here lie in knowing
+ where to download what data and how to get it transformed into DSPL
+ without knowing the data.</li>
+ </ul>
+
+ <p>Unanticipated Uses (optional): DSPL is representative of the use of
+ statistical data published on the web in available tools for analysis.
+ Similar tools that may be automatically covered are: Weka (ARFF data
+ format), Tableau, SPSS, STATA, PC-Axis, etc.</p>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</a></li>
+ </ul>
+ </section> <section>
+ <h3 id="AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer
+ Use Case: Analysing published statistical data with common OLAP
+ systems</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case taken from <a
+ href="http://xbrl.us/research/appdev/Pages/275.aspx">Financial
+ Information Observation System (FIOS)</a>)
+ </span>
+ </p>
+
+ <p>Online Analytical Processing (OLAP) is an analysis method on
+ multidimensional data. It is an explorative analysis method that
+ allows users to interactively view the data from different angles
+ (rotate, select) or at different granularities (drill-down, roll-up),
+ and to filter it for specific information (slice, dice).</p>
+
+ <p>OLAP systems first use ETL pipelines to Extract-Transform-Load
+ relevant data for efficient storage and queries in a data warehouse
+ and then provide interfaces to issue OLAP queries on the data. Such
+ systems are commonly used in industry to analyse statistical data on
+ a regular basis.</p>
+
+ <p>
+ The goal in this use case is to allow analysis of published
+ statistical data with common OLAP systems [<cite><a
+ href="#ref-OLAP4LD">OLAP4LD</a></cite>].
+ </p>
+
+ <p>For that, a multidimensional model of the data needs to be
+ generated. A multidimensional model consists of facts summarised in
+ data cubes. Facts exhibit measures depending on members of dimensions.
+ Members of dimensions can be further structured along hierarchies of
+ levels.</p>
+
+ <p>
+ An example scenario of this use case is the Financial Information
+ Observation System (FIOS) [<cite><a href="#ref-FIOS">FIOS</a></cite>],
+ where XBRL data provided by the SEC on the web is to be re-published
+ as Linked Data and made analysable for stakeholders in the web-based
+ OLAP client Saiku.
+ </p>
+
+ <p>The following figure shows an example of using FIOS. Here, for
+ three different companies, the cost of goods sold as disclosed in XBRL
+ documents is analysed. As cell values, either the number of
+ disclosures or, if only one is available, the actual number in USD is
+ given:</p>
+
+
+ <p class="caption">Figure: Example of using FIOS for OLAP
+ operations on financial data</p>
+ <p align="center">
+ <img alt="Example of using FIOS for OLAP operations on financial data"
+ src="./figures/FIOS_example.PNG"></img>
+ </p>
+
+ <p>Benefits:</p>
+
+ <ul>
+ <li>OLAP operations cover typical business requirements, e.g.,
+ slice, dice, drill-down.</li>
+ <li>OLAP frontends are intuitive, interactive, explorative and fast.
+ Their interfaces are well known to many people in industry.</li>
+ <li>OLAP functionality is provided by many tools that may be reused.</li>
+ </ul>
+
+ <p>Challenges:</p>
+ <ul>
+ <li>An ETL pipeline needs to automatically populate a data
+ warehouse. Common OLAP systems use relational databases with a star
+ schema.</li>
+ <li>A problem lies in the strict separation between queries for
+ the structure of data (metadata queries), and queries for actual
+ aggregated values (OLAP operations).</li>
+ <li>Another problem lies in defining Data Cubes without greater
+ insight into the data beforehand.</li>
+ <li>Depending on the expressivity of the OLAP queries (e.g.,
+ aggregation functions, hierarchies, ordering), performance plays an
+ important role.</li>
+ </ul>
+
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</a></li>
+ </ul>
+ </section> <section>
+ <h3 id="Registeringpublishedstatisticaldataindatacatalogs">Registry
+ Use Case: Registering published statistical data in data catalogs</h3>
+ <p>
+ <span style="font-size: 10pt">(Use case motivated by <a
+ href="http://www.w3.org/TR/vocab-dcat/">Data Catalog vocabulary</a>)
+ </span>
+ </p>
+
+ <p>
+ After statistics have been published as Linked Data, the question
+ remains how to communicate the publication and let users discover the
+ statistics. There are catalogs to register datasets, e.g., CKAN, <a
+ href="http://www.datacite.org/">datacite.org</a>, <a
+ href="http://www.gesis.org/dara/en/home/?lang=en">da|ra</a>, and <a
+ href="http://pangaea.de/">PANGAEA</a>. Those catalogs require specific
+ configurations to register statistical data.
+ </p>
+
+ <p>The goal of this use case is to demonstrate how to expose and
+ distribute statistics after publication, for instance by allowing
+ automatic registration of statistical data in such catalogs, so that
+ datasets can be found and evaluated. To solve this issue, it should be
+ possible to transform the published statistical data into formats that
+ can be used by data catalogs.</p>
+
+ <p>
+ A concrete use case is the structured collection of <a
+ href="http://wiki.planet-data.eu/web/Datasets">RDF Data Cube
+ Vocabulary datasets</a> in the PlanetData Wiki. It is intended to list
+ published statistical data, to describe the formal RDF descriptions
+ at a higher level, and to provide a useful overview of RDF Data Cube
+ deployments in the Linked Data cloud.
+ </p>
+
+ <p>Unanticipated Uses: If data catalogs contain statistics, they do
+ not expose them as Linked Data but, for instance, as CSV or HTML
+ (e.g., PANGAEA). It could also be a use case to publish such data using
+ the standard vocabulary.</p>
+ <p>
+ Existing Work: The <a href="http://www.w3.org/TR/vocab-dcat/">Data
+ Catalog vocabulary</a> (DCAT) is strongly related to this use case since
+ it may complement the standard vocabulary for representing statistics
+ in the case of registering data in a data catalog.
+ </p>
+
+ <p>Requirements:</p>
+ <ul>
+ <li><a
+ href="#Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There
+ should be a recommended way to communicate the availability of
+ published statistical data to external parties and to allow
+ automatic discovery of statistical data</a></li>
+ </ul>
</section> </section>
<section>
@@ -261,260 +1137,278 @@
<p>The use cases presented in the previous section give rise to the
following requirements for a standard representation of statistics.
- Requirements are cross-linked with the use cases that motivate them.
- Requirements are similarly categorized as deriving from publishing or
- consuming use cases.</p>
+ Requirements are cross-linked with the use cases that motivate them.</p>
- <section>
- <h3>Publishing requirements</h3>
<section>
- <h4>Machine-readable and application-independent representation of
- statistics</h4>
- <p>It should be possible to add abstraction, multiple levels of
- description, summaries of statistics.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4</p>
- </section> <section>
- <h4>Representing statistics from various resource</h4>
- <p>Statistics from various resource data should be possible to be
- translated into QB. QB should be very general and should be usable for
- other data sets such as survey data, spreadsheets and OLAP data cubes.
- What kind of statistics are described: simple CSV tables (UC 1), excel
- (UC 2) and more complex SDMX (UC 3) data about government statistics
- or other public-domain relevant data.</p>
-
- <p>Required by: UC1, UC2, UC3</p>
- </section> <section>
- <h4>Communicating, exposing statistics on the web</h4>
- <p>It should become clear how to make statistical data available on
- the web, including how to expose it, and how to distribute it.</p>
-
- <p>Required by: UC5</p>
- </section> <section>
- <h4>Coverage of typical statistics metadata</h4>
- <p>It should be possible to add metainformation to statistics as
- found in typical statistics or statistics catalogs.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4, UC5</p>
- </section> <section>
- <h4>Expressing hierarchies</h4>
- <p>It should be possible to express hierarchies on Dimensions of
- statistics. Some of this requirement is met by the work on ISO
- Extension to SKOS [17].</p>
-
- <p>Required by: UC3, UC9</p>
- </section> <section>
- <h4>Machine-readable and application-independent representation of
- statistics</h4>
- <p>It should be possible to add abstraction, multiple levels of
- description, summaries of statistics.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4</p>
- </section> <section>
- <h4>Expressing aggregation relationships in Data Cube</h4>
- <p>Based on [18]: It often comes up in statistical data that you
- have some kind of 'overall' figure, which is then broken down into
- parts. To Supposing I have a set of population observations, expressed
- with the Data Cube vocabulary - something like (in pseudo-turtle):</p>
- <pre>
-ex:obs1
- sdmx:refArea <UK>;
- sdmx:refPeriod "2011";
- ex:population "60" .
-
-ex:obs2
- sdmx:refArea <England>;
- sdmx:refPeriod "2011";
- ex:population "50" .
-
-ex:obs3
- sdmx:refArea <Scotland>;
- sdmx:refPeriod "2011";
- ex:population "5" .
-
-ex:obs4
- sdmx:refArea <Wales>;
- sdmx:refPeriod "2011";
- ex:population "3" .
-
-ex:obs5
- sdmx:refArea <NorthernIreland>;
- sdmx:refPeriod "2011";
- ex:population "2" .
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- </pre>
- <p>What is the best way (in the context of the RDF/Data Cube/SDMX
- approach) to express that the values for the England/Scotland/Wales/
- Northern Ireland ought to add up to the value for the UK and
- constitute a more detailed breakdown of the overall UK figure? I might
- also have population figures for France, Germany, EU27, etc...so it's
- not as simple as just taking a qb:Slice where you fix the time period
- and the measure.</p>
- <p>Some of this requirement is met by the work on ISO Extension to
- SKOS [19].</p>
-
-
- <p>Required by: UC1, UC2, UC3, UC9</p>
- </section> <section>
- <h4>Scale - how to publish large amounts of statistical data</h4>
- <p>Publishers that are restricted by the size of the statistics
- they publish, shall have possibilities to reduce the size or remove
- redundant information. Scalability issues can both arise with
- peoples's effort and performance of applications.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4</p>
- </section> <section>
- <h4>Compliance-levels or criteria for well-formedness</h4>
- <p>The formal RDF Data Cube vocabulary expresses few formal
- semantic constraints. Furthermore, in RDF then omission of
- otherwise-expected properties on resources does not lead to any formal
- inconsistencies. However, to build reliable software to process Data
- Cubes then data consumers need to know what assumptions they can make
- about a dataset purporting to be a Data Cube.</p>
- <p>What *well-formedness* criteria should Data Cube publishers
- conform to? Specific areas which may need explicit clarification in
- the well-formedness criteria include (but may not be limited to):</p>
+ <h3 id="VocabularyshouldbuildupontheSDMXinformationmodel">Vocabulary
+ should build upon the SDMX information model</h3>
+ <p>
+ The draft version of the vocabulary builds upon <a
+ href="http://sdmx.org/?page_id=16">SDMX Standards Version 2.0</a>. A
+ newer version of SDMX, <a href="http://sdmx.org/?p=899">SDMX
+ Standards, Version 2.1</a>, is available.
+ </p>
+ <p>The requirement is to build upon at least Version 2.0; if
+ specific use cases derived from Version 2.1 become available, the
+ working group may consider building upon Version 2.1.</p>
+ <p>Background information:</p>
<ul>
- <li>use of abbreviated data layout based on attachment levels</li>
- <li>use of qb:Slice when (completeness, requirements for an
- explicit qb:SliceKey?)</li>
- <li>avoiding mixing two approaches to handling multiple-measures
- </li>
- <li>optional triples (e.g. type triples)</li>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/37">http://www.w3.org/2011/gld/track/issues/37</a></li>
</ul>
- <p>Required by all use cases.</p>
- </section> <section>
- <h4>Declaring relations between Cubes</h4>
- <p>In some situations statistical data sets are used to derive
- further datasets. Should Data Cube be able to explicitly convey these
- relationships?</p>
- <p>Note that there has been some work towards this within the SDMX
- community as indicated here:
- http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d</p>
-
- <p>Required by: UC6</p>
- </section> </section> <section>
- <h3>Consumption requirements</h3>
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#SDMXWebDisseminationUseCase">SDMX Web
+ Dissemination Use Case</a></li>
+ <li><a
+ href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
+ Use Case: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#EurostatSDMXasLinkedData">Publisher Use Case:
+ Eurostat SDMX as Linked Data</a></li>
+ </ul>
- <section>
- <h4>Finding statistical data</h4>
- <p>Finding statistical data should be possible, perhaps through an
- authoritative service</p>
-
- <p>Required by: UC5</p>
- </section> <section>
- <h4>Retrival of fine grained statistics</h4>
- <p>Query formulation and execution mechanisms. It should be
- possible to use SPARQL to query for fine grained statistics.</p>
-
- <p>Required by: UC1, UC2, UC3, UC4, UC5, UC6, UC7</p>
- </section> <section>
- <h4>Understanding - End user consumption of statistical data</h4>
- <p>Must allow presentation, visualization .</p>
-
- <p>Required by: UC7, UC8, UC9, UC10</p>
</section> <section>
- <h4>Comparing and trusting statistics</h4>
- <p>Must allow finding what's in common in the statistics of two or
- more datasets. This requirement also deals with information quality -
- assessing statistical datasets - and trust - making trust judgements
- on statistical data.</p>
+ <h3 id="Vocabularyshouldclarifytheuseofsubsetsofobservations">Vocabulary
+ should clarify the use of subsets of observations</h3>
+ <p>There should be a consensus on the issue of flattening or
+ abbreviating data; one suggestion is to author data without the
+ duplication, but have the data publication tools "flatten" the compact
+ representation into standalone observations during the publication
+ process.</p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/33">http://www.w3.org/2011/gld/track/issues/33</a></li>
- <p>Required by: UC5, UC6, UC9</p>
- </section> <section>
- <h4>Integration of statistics</h4>
- <p>Interoperability - combining statistics produced by multiple
- different systems. It should be possible to combine two statistics
- that contain related data, and possibly were published independently.
- It should be possible to implement value conversions.</p>
+ <li>Since there are no use cases for qb:subslice, the vocabulary
+ should clarify or drop the use of qb:subslice; issue: <a
+ href="http://www.w3.org/2011/gld/track/issues/34">http://www.w3.org/2011/gld/track/issues/34</a>
+ </li>
+ </ul>
- <p>Required by: UC1, UC3, UC4, UC7, UC9, UC10</p>
+ <p>Required by:</p>
+ <ul>
+ <li><a
+ href="#UKgovernmentfinancialdatafromCombinedOnlineInformationSystem">Publisher
+ Use Case: UK government financial data from Combined Online
+ Information System (COINS)</a></li>
+ <li><a href="#PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
+ Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+ </ul>
+
</section> <section>
- <h4>Scale - how to consume large amounts of statistical data</h4>
- <p>Consumers that want to access large amounts of statistical data
- need guidance.</p>
+ <h3
+ id="Vocabularyshouldrecommendamechanismtosupporthierarchicalcodelists">Vocabulary
+ should recommend a mechanism to support hierarchical code lists</h3>
+ <p>First, hierarchical code lists may be supported via SKOS, allowing
+ for cross-location and cross-time analysis of statistical datasets.</p>
+ <p>Second, one can think of non-SKOS hierarchical code lists, e.g.,
+ if simple skos:narrower/skos:broader relationships are not sufficient
+ or if a vocabulary uses specific hierarchical properties, e.g.,
+ geo:containedIn.</p>
+ <p>Also, the use of hierarchy levels needs to be clarified. It has
+ been suggested to allow skos:Collections as the value of qb:codeList.</p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/39">http://www.w3.org/2011/gld/track/issues/39</a>
+ </li>
+ </ul>
- <p>Required by: UC7, UC9</p>
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#PublishingExcelSpreadsheetsasLinkedData">Publisher
+ Use Case: Publishing Excel Spreadsheets as Linked Data</a></li>
+ </ul>
+
</section> <section>
- <h4>Common internal representation of statistics, to be exported
- in other formats</h4>
- <p>QB data should be possible to be transformed into data formats
- such as XBRL which are required by certain institutions.</p>
+ <h3
+ id="VocabularyshoulddefinerelationshiptoISO19156ObservationsMeasurements">Vocabulary
+ should define relationship to ISO19156 - Observations & Measurements</h3>
+ <p>A number of organizations, particularly in the climate and
+ meteorological area, already have some commitment to the OGC
+ "Observations and Measurements" (O&M) logical data model, also
+ published as ISO 19156. Are there any statements about compatibility
+ and interoperability between O&M and Data Cube that can be made to
+ give guidance to such organizations?</p>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/32">http://www.w3.org/2011/gld/track/issues/32</a></li>
+ </ul>
- <p>Required by: UC10</p>
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#PublishingslicesofdataaboutUKBathingWaterQuality">Publisher
+ Use Case: Publishing slices of data about UK Bathing Water Quality</a></li>
+ </ul>
+
</section> <section>
- <h4>Dealing with imperfect statistics</h4>
- <p>Imperfections - reasoning about statistical data that is not
- complete or correct.</p>
+ <h3
+ id="Thereshouldbearecommendedmechanismtoallowforpublicationofaggregateswhichcrossmultipledimensions">There
+ should be a recommended mechanism to allow for publication of
+ aggregates which cross multiple dimensions</h3>
- <p>Required by: UC7, UC8, UC9, UC10</p>
- </section> </section> </section>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/31">http://www.w3.org/2011/gld/track/issues/31</a></li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li>E.g., the Eurostat SDMX as Linked Data use case suggests
+ having time lines on data aggregated over the gender dimension: <a
+ href="#EurostatSDMXasLinkedData">Publisher Use Case: Eurostat
+ SDMX as Linked Data</a>
+ </li>
+ <li>Another possible use case could be provided by the <a
+ href="http://data.gov.uk/resources/payments">Payment Ontology</a>.
+ </li>
+ </ul>
+
+ </section> <section>
+ <h3 id="Thereshouldbearecommendedwayofdeclaringrelationsbetweencubes">There
+ should be a recommended way of declaring relations between cubes</h3>
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/30">http://www.w3.org/2011/gld/track/issues/30</a></li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#Representingrelationshipsbetweenstatisticaldata">Publisher
+ Use Case: Representing relationships between statistical data</a></li>
+ </ul>
+
+ </section> <section>
+ <h3
+ id="Thereshouldbecriteriaforwell-formednessandassumptionsconsumerscanmakeaboutpublisheddata">There
+ should be criteria for well-formedness and assumptions consumers can
+ make about published data</h3>
+
+ <p>Background information:</p>
+ <ul>
+ <li>Issue: <a href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a></li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a
+ href="#Simplechartvisualisationsofpublishedstatisticaldata">Consumer
+ Use Case: Simple chart visualisations of (integrated) published
+ statistical data</a></li>
+ <li><a
+ href="#VisualisingpublishedstatisticaldatainGooglePublicDataExplorer">Consumer
+ Use Case: Visualising published statistical data in Google Public
+ Data Explorer</a></li>
+ <li><a
+ href="#AnalysingpublishedstatisticaldatawithcommonOLAPsystems">Consumer
+ Use Case: Analysing published statistical data with common OLAP
+ systems</a></li>
+ </ul>
+
+ </section> <section>
+ <h3 id="Thereshouldbemechanismsandrecommendationsregardingpublicationandconsumptionoflargeamountsofstatisticaldata">There
+ should be mechanisms and recommendations regarding publication and
+ consumption of large amounts of statistical data</h3>
+ <p>Background information:</p>
+ <ul>
+ <li>Related issue regarding abbreviations: <a
+ href="http://www.w3.org/2011/gld/track/issues/29">http://www.w3.org/2011/gld/track/issues/29</a>
+ </li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#EurostatSDMXasLinkedData">Publisher Use Case:
+ Eurostat SDMX as Linked Data</a></li>
+ </ul>
+
+ </section> <section>
+ <h3
+ id="Thereshouldbearecommendedwaytocommunicatetheavailabilityofpublishedstatisticaldatatoexternalpartiesandtoallowautomaticdiscoveryofstatisticaldata">There
+ should be a recommended way to communicate the availability of
+ published statistical data to external parties and to allow automatic
+ discovery of statistical data</h3>
+ <p>Clarify the relationship between DCAT and QB.</p>
+ <p>Background information:</p>
+ <ul>
+ <li>None.</li>
+ </ul>
+
+ <p>Required by:</p>
+ <ul>
+ <li><a href="#SDMXWebDisseminationUseCase">SDMX Web
+ Dissemination Use Case</a></li>
+ <li><a href="#Registeringpublishedstatisticaldataindatacatalogs">Registry
+ Use Case: Registering published statistical data in data catalogs</a></li>
+ </ul>
+
+ </section> </section>
<section class="appendix">
<h2 id="acknowledgements">Acknowledgements</h2>
- <p>The editors are very thankful for comments and suggestions ...</p>
+ <p>We thank Rinke Hoekstra, Dave Reynolds, Bernadette Hyland,
+ Biplav Srivastava, John Erickson, Villazón-Terrazas for
+ feedback and input.</p>
</section>
<h2 id="references">References</h2>
<dl>
- <dt id="ref-SDMX">[SMDX]</dt>
+
+ <dt id="ref-cog">[COG]</dt>
<dd>
- SMDX - SDMX User Guide Version 2009.1, <a
- href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>,
- last visited Jan 8 2013.
+ SDMX Content Oriented Guidelines, <a
+ href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
</dd>
- <dt id="ref-SDMX-21">[SMDX 2.1]</dt>
+ <dt id="ref-COGS">[COGS]</dt>
<dd>
- SDMX 2.1 User Guide Version. Version 0.1 - 19/09/2012. <a
- href="http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf">http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf</a>.
- Last visited on 8 Jan 2013.
+ Freitas, A., Kämpgen, B., Oliveira, J. G., O’Riain, S., & Curry, E.
+ (2012). Representing Interoperable Provenance Descriptions for ETL
+ Workflows. ESWC 2012 Workshop Highlights (pp. 1–15). Springer Verlag,
+ 2012 (in press). (Extended Paper published in Conf. Proceedings.). <a
+ href="http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf">http://andrefreitas.org/papers/preprint_provenance_ETL_workflow_eswc_highlights.pdf</a>.
</dd>
+ <dt id="ref-COINS">[COINS]</dt>
+ <dd>
+ Ian Dickinson et al., COINS as Linked Data <a
+ href="http://data.gov.uk/resources/coins">http://data.gov.uk/resources/coins</a>,
+ Last visited on Jan 9 2013
+ </dd>
+
+ <dt id="ref-FIOS">[FIOS]</dt>
+ <dd>
+ Andreas Harth, Sean O'Riain, Benedikt Kämpgen. Submission XBRL
+ Challenge 2011. <a
+ href="http://xbrl.us/research/appdev/Pages/275.aspx">http://xbrl.us/research/appdev/Pages/275.aspx</a>.
+ </dd>
+
+
<dt id="ref-Fowler1997">[Fowler1997]</dt>
<dd>Fowler, Martin (1997). Analysis Patterns: Reusable Object
Models. Addison-Wesley. ISBN 0201895420.</dd>
+
+ <dt id="ref-linked-data">[LOD]</dt>
+ <dd>
+ Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+ </dd>
+
+ <dt id="ref-OLAP">[OLAP]</dt>
+ <dd>
+ Online Analytical Processing Data Cubes, <a
+ href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
+ </dd>
+
+ <dt id="ref-OLAP4LD">[OLAP4LD]</dt>
+ <dd>
+ Kämpgen, B. and Harth, A. (2011). Transforming Statistical Linked
+ Data for Use in OLAP Systems. I-Semantics 2011. <a
+ href="http://www.aifb.kit.edu/web/Inproceedings3211">http://www.aifb.kit.edu/web/Inproceedings3211</a>
+ </dd>
+
<dt id="ref-QB">[QB-2010]</dt>
<dd>
RDF Data Cube vocabulary, <a
@@ -527,15 +1421,11 @@
href="http://www.w3.org/TR/vocab-data-cube/">http://www.w3.org/TR/vocab-data-cube/</a>
</dd>
- <dt id="ref-OLAP">[OLAP]</dt>
+ <dt id="ref-QB4OLAP">[QB4OLAP]</dt>
<dd>
- Online Analytical Processing Data Cubes, <a
- href="http://en.wikipedia.org/wiki/OLAP_cube">http://en.wikipedia.org/wiki/OLAP_cube</a>
- </dd>
-
- <dt id="ref-linked-data">[LOD]</dt>
- <dd>
- Linked Data, <a href="http://linkeddata.org/">http://linkeddata.org/</a>
+ Etcheverry, Vaisman. QB4OLAP: A New Vocabulary for OLAP Cubes on
+ the Semantic Web. <a
+ href="http://publishing-multidimensional-data.googlecode.com/git/index.html">http://publishing-multidimensional-data.googlecode.com/git/index.html</a>
</dd>
<dt id="ref-rdf">[RDF]</dt>
@@ -557,10 +1447,18 @@
href="http://www.w3.org/2004/02/skos/">http://www.w3.org/2004/02/skos/</a>
</dd>
- <dt id="ref-cog">[COG]</dt>
+ <dt id="ref-SDMX">[SDMX]</dt>
<dd>
- SDMX Content Oriented Guidelines, <a
- href="http://sdmx.org/?page_id=11">http://sdmx.org/?page_id=11</a>
+ SDMX User Guide Version 2009.1, <a
+ href="http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf">http://sdmx.org/wp-content/uploads/2009/02/sdmx-userguide-version2009-1-71.pdf</a>,
+ Last visited Jan 8 2013.
+ </dd>
+
+ <dt id="ref-SDMX-21">[SDMX 2.1]</dt>
+ <dd>
+ SDMX 2.1 User Guide. Version 0.1 - 19/09/2012. <a
+ href="http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf">http://sdmx.org/wp-content/uploads/2012/11/SDMX_2-1_User_Guide_draft_0-1.pdf</a>.
+ Last visited on 8 Jan 2013.
</dd>
</dl>