abstract, intro, some layout stuff
author Richard Cyganiak <richard@cyganiak.de>
Sat, 11 Feb 2012 17:51:18 +0000
changeset 88 5a181159ccf6
parent 87 26012778fea6
child 89 7916a6ff936c
dcat-ucr/index.html
--- a/dcat-ucr/index.html	Sat Feb 11 14:15:54 2012 +0000
+++ b/dcat-ucr/index.html	Sat Feb 11 17:51:18 2012 +0000
@@ -8,23 +8,54 @@
 	<script type="text/javascript" src="../respec/ReSpec.js/js/respec.js" class="remove"></script>
 	<script src="../respec/gld-bib.js" class="remove"></script>
 	<script src="respec-config.js" class="remove"></script>
+  <style type="text/css">
+.todo { background-color: #fdd; border: 1px solid #800; margin: 1em 0em; padding: 1em; page-break-inside: avoid ; font-style: italic; }
+.todo:before { content: 'TODO: '; }
+  </style>
 </head>
 <body>
 
 <section id="abstract">
 <p>
-@@@ Abstract
+Many national, regional and local governments, as well as other organizations
+inside and outside of the public sector, are operating
+data catalogs – web portals that provide access to
+machine-readable data published by these organizations. The need for a
+standard format for representing the metadata contained
+in these catalogs has been recognized. This document is a collection
+of use cases for such a standard.
 </p>
 </section>
 
 <section id="sotd">
-  <p>This is a working document of the <a href="http://www.w3.org/2011/gld/wiki/Data_Cube_Vocabulary">Data Catalog Vocabulary project</a> within the <a href="http://www.w3.org/2011/gld/">W3C Government Linked Data Working Group</a>. Feedback is welcome and should be sent to the <a href="mailto:[email protected]">[email protected] mailing list</a>.</p>
+  <p>The use cases presented in this document were originally collected by the <a href="http://www.w3.org/egov/wiki/Data_Catalog_Vocabulary">Data Catalog Vocabulary Task Force</a> of the <a href="http://www.w3.org/egov/wiki/Main_Page">W3C eGovernment Interest Group</a>.</p>
 </section>
 
 <section>
 <h2>Introduction</h2>
 
-<p>@@@ a few paragraphs about the context here.</p>
+<div class="todo">
+<ul>
+<li>Work on cross-linking of terms and consistent terminology throughout</li>
+<li>Add a diagram illustrating each use case</li>
+<li>Consider some additional use cases or extend existing ones:
+  <ul>
+    <li>Use cases like Mondeca SPARQL Endpoint Status, LODStats</li>
+    <li>Use cases like ADMS (extending for specific domains)</li>
+    <li>Use case for integration with VoID</li>
+    <li>Get input from RPI, Edsu, CKAN, … re use cases</li>
+  </ul>
+</li>
+</ul>
+</div>
+
+<p>Many national, regional and local governments, as well as other organizations
+inside and outside of the public sector, are operating
+<strong>data catalogs</strong> – web portals that provide access to
+machine-readable data published by these organizations. The need for a
+<strong>standard format</strong> for representing the metadata contained
+in these catalogs has been recognized. This document is a collection
+of use cases for such a standard.</p>
 </section>
 
 
@@ -36,19 +67,23 @@
 
 <p>A <dfn>catalog</dfn> is a collection of <a id="catalog record">catalog records</a>, and thus contains <a>metadata</a> for a collection of <a id="dataset">datasets</a>. It is operated by a <a>catalog operator</a>, which could be a government agency, citizen initiative, …</p>
 
-<p><dfn>Metadata</dfn> are …</p>
+<p>A <dfn>catalog operator</dfn> is an organization that collects
+<a>metadata</a> about <a title="dataset">datasets</a> and publishes
+them as a <a>catalog</a> on the Web.</p>
 
-<p>A format is <dfn>machine-readable</dfn> if …</p>
+<p><dfn>Metadata</dfn> are data that provide information about aspects of other data, such as its time and date of creation, its creator or author, its purpose, its format, and so on.</p>
 
-<p>A <dfn>catalog operator</dfn> is …</p>
+<p>A format is <dfn>machine-readable</dfn> if it is amenable to automated
+processing by a machine, as opposed to presentation to a human user.</p>
 </section>
 
 
 <section>
 <h2>Use cases</h2>
-<p>@@@ One introductory sentence here</p>
+<p>This section presents scenarios that would be enabled by the existence
+of a standard for the representation of data catalogs.</p>
 
-<section>
+<section id="uc1">
 <h3>Creating a combined catalog from multiple data catalogs (UC1)</h3>
 
 <p>An increasing number of government agencies make their data available on-line in the form of data catalogs such as <a href="http://data.gov/">data.gov</a> (see <a href="http://datacatalogs.org/">datacatalogs.org</a> for a list). Catalogs exist at national, regional and local level; some are operated by official government bodies and others by citizen initiatives; some have general coverage, while others have a specific focus (e.g., <a href="http://www.statcentral.ie/">statistical data</a>, <a href="http://www.ndad.nationalarchives.gov.uk/">historical datasets</a>).</p>
@@ -61,44 +96,44 @@
 
 <ol>
 <li>Not all catalogs make their records available in a <a>machine-readable</a> form, forcing the developers of federated catalogs to employ screen scraping.</li>
-<li>Where the catalog is available in machine-processable form, it is usually in a <em>custom one-off format</em>, requiring the development of custom importers for each catalog that is to be federated.</li>
-<li>The developer of the federated catalog has to undertake the task of mapping and <em>harmonising the metadata fields</em> provided by different catalogs.</li>
+<li>Where the catalog is available in machine-processable form, it is usually in a <strong>custom one-off format</strong>, requiring the development of custom importers for each catalog that is to be federated.</li>
+<li>The developer of the federated catalog has to undertake the task of mapping and <strong>harmonising the metadata fields</strong> provided by different catalogs.</li>
 </ol>
 
-<p>A standard format for data catalogs helps with all three problems: First, the existence of a well-documented standard creates an additional <em>incentive towards publishing machine-readable metadata</em> for the catalog operators. Second, a <em>single importer</em> can be used to import all catalogs that support the format. Third, <em>harmonising metadata fields becomes the job of individual catalog operators</em>, who know the contents of their own catalog best.</p>
+<p>A standard format for data catalogs helps with all three problems: First, the existence of a well-documented standard creates an additional <strong>incentive towards publishing machine-readable metadata</strong> for the catalog operators. Second, a <strong>single importer</strong> can be used to import all catalogs that support the format. Third, <strong>harmonising metadata fields becomes the job of individual catalog operators</strong>, who know the contents of their own catalog best.</p>
 </section>
 
 
-<section>
+<section id="uc2">
 <h3>Including metadata published directly on agency web sites into catalogs (UC2)</h3>
 
-<p>The model of most current data catalogs assumes that <em>agencies publish datasets on their own website</em>, and then <em>register the dataset with the central catalog</em> by providing the download location and other metadata to the catalog operator. This model is not always efficient. Individual agencies sometimes have existing dataset publishing workflows and metadata management capabilities (e.g., statistics offices). Also, the amount and nature of metadata that agencies can provide differs widely, and a central catalog with a single, non-extensible metadata schema cannot capture the requirements of a wide range of government institutions.</p>
+<p>The model of most current data catalogs assumes that <strong>agencies publish datasets on their own website</strong>, and then <strong>register the dataset with the central catalog</strong> by providing the download location and other metadata to the catalog operator. This model is not always efficient. Individual agencies sometimes have existing dataset publishing workflows and metadata management capabilities (e.g., statistics offices). Also, the amount and nature of metadata that agencies can provide differs widely, and a central catalog with a single, non-extensible metadata schema cannot capture the requirements of a wide range of government institutions.</p>
 
-<p>In a <em>distributed publishing model</em>, on the other hand, <em>agencies manage their own metadata</em> on their own websites, using their own publishing workflows and information systems. Central catalogs such as data.gov play the role of <em>aggregator</em> that collects dataset descriptions from different agency websites and presents them in a unified user interface. The central catalog must somehow be able to <em>discover newly published datasets</em> on an agency's web site, e.g., by crawling or by receiving an automated notification from the agency. There also has to be a way of <em>notifying about changes to the metadata</em>.</p>
+<p>In a <strong>distributed publishing model</strong>, on the other hand, <strong>agencies manage their own metadata</strong> on their own websites, using their own publishing workflows and information systems. Central catalogs such as data.gov play the role of <strong>aggregator</strong> that collects dataset descriptions from different agency websites and presents them in a unified user interface. The central catalog must somehow be able to <strong>discover newly published datasets</strong> on an agency's web site, e.g., by crawling or by receiving an automated notification from the agency. There also has to be a way of <strong>notifying about changes to the metadata</strong>.</p>
 
-<p>Note that individual agencies in this scenario may not want to run a full-blown “agency-level data catalog”, but may just want to make metadata available in a more structured form alongside the datasets that are already scattered throughout its web site. This distinguishes this use case from the catalog federation scenario (UC1), which assumes that the sites to be federated are dedicated data catalog websites.</p>
+<p>Note that individual agencies in this scenario may not want to run a full-blown “agency-level data catalog”, but may just want to make metadata available in a more structured form alongside the datasets that are already scattered throughout its web site. This distinguishes this use case from the catalog federation scenario (<a href="#uc1">UC1</a>), which assumes that the sites to be federated are dedicated data catalog websites.</p>
 </section>
 
 
-<section>
+<section id="uc3">
 <h3>Advanced queries against catalogs (UC3)</h3>
 
-<p>All catalog websites provide some sort of <em>parametric search facility</em> (e.g., search by publishing agency, by data format, or by theme). Available search parameters differ among catalogs and they <em>are not sufficient for all users' needs</em>. For example, data.gov provides search by department, format and category, but not by keyword, update date, or temporal/geographic coverage.</p>
+<p>All catalog websites provide some sort of <strong>parametric search facility</strong> (e.g., search by publishing agency, by data format, or by theme). Available search parameters differ among catalogs and they <strong>are not sufficient for all users' needs</strong>. For example, data.gov provides search by department, format and category, but not by keyword, update date, or temporal/geographic coverage.</p>
 
-<p>If catalogs are exposed in a <em>standard machine-readable format</em>, then third parties are able to <em>replicate the contents of a catalog into their own database</em>, and run advanced queries over the catalog, or provide interfaces for performing such queries to the general public.</p>
+<p>If catalogs are exposed in a <strong>standard machine-readable format</strong>, then third parties are able to <strong>replicate the contents of a catalog into their own database</strong>, and run advanced queries over the catalog, or provide interfaces for performing such queries to the general public.</p>
 
-<p>Queries may rely on information that is not present in the catalog but in <em>external sources</em>. For example, by using the <a href="http://www.oegov.us/democracy/us/core/owl/us1gov.n3">US Government Structure Ontology</a> one can query for datasets published by an agency that directly reports to the Executive Office of the President.</p>
+<p>Queries may rely on information that is not present in the catalog but in <strong>external sources</strong>. For example, by using the <a href="http://www.oegov.us/democracy/us/core/owl/us1gov.n3">US Government Structure Ontology</a> one can query for datasets published by an agency that directly reports to the Executive Office of the President.</p>
 </section>
 
 
-<section>
+<section id="uc4">
 <h3>Bulk download of datasets (UC4)</h3>
 
 <p>Data catalogs support the creation of innovative mashups of government data by making it easier for developers to find data sources of interest. Developers may browse or search the catalog until they have found a dataset of interest, and then download the linked file.</p>
 
-<p>However some mashups and applications <em>may access not just one but a very large number of datasets</em> from a catalog. For example, an application could make <em>all</em> geographic datasets (in ESRI shapefile, GML, KML formats) available for display on a map.</p>
+<p>However some mashups and applications <strong>may access not just one but a very large number of datasets</strong> from a catalog. For example, an application could make <em>all</em> geographic datasets (in ESRI shapefile, GML, KML formats) available for display on a map.</p>
 
-<p>The creation of such applications would become much easier if it was possible to <em>automate the downloading of all datasets</em> that meet certain criteria. Furthermore, the ability to <em>automatically discover new datasets</em> that meet those criteria, and to <em>discover updated datasets</em>, would be useful.</p>
+<p>The creation of such applications would become much easier if it was possible to <strong>automate the downloading of all datasets</strong> that meet certain criteria. Furthermore, the ability to <strong>automatically discover new datasets</strong> that meet those criteria, and to <strong>discover updated datasets</strong>, would be useful.</p>
 </section>
 
 </section>
@@ -107,14 +142,16 @@
 <section>
 <h2>Requirements</h2>
 
-<p>@@@ Intro sentence</p>
+<p>The use cases presented in the previous section give rise to the following
+requirements for a standard representation of data catalogs. Requirements
+are cross-linked with the use cases that motivate them.</p>
 
 <section>
 <h3>Machine-readable representations of catalog entries</h3>
 
 <p>Must allow retrieval of a machine-readable representation of catalog entries.</p>
 
-<p>Required by: UC1, UC2, UC3, UC4</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc2">UC2</a>, <a href="#uc3">UC3</a>, <a href="#uc4">UC4</a></strong></p>
 </section>
 
 
@@ -123,7 +160,7 @@
 
 <p>Must allow retrieval of all entries in a catalog.</p>
 
-<p>Required by: UC1, UC3, UC4</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc3">UC3</a>, <a href="#uc4">UC4</a></strong></p>
 </section>
 
 
@@ -132,7 +169,7 @@
 
 <p>Must provide stable, persistent identifiers for individual entries.</p>
 
-<p>Required by: UC1, UC2</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc2">UC2</a></strong></p>
 </section>
 
 
@@ -141,7 +178,7 @@
 
 <p>Must allow checking whether an individual dataset has changed or was updated.</p>
 
-<p>Required by: UC2</p>
+<p><strong>Required by: <a href="#uc2">UC2</a></strong></p>
 </section>
 
 
@@ -150,7 +187,7 @@
 
 <p>Must allow the discovery of new entries in a catalog, and the discovery of entries that have been recently updated.</p>
 
-<p>Required by: UC1, UC4</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc4">UC4</a></strong></p>
 </section>
 
 
@@ -159,7 +196,7 @@
 
 <p>Must include pointers/links to original catalog record when an entry is federated into another catalog.</p>
 
-<p>Required by: UC1, UC2</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc2">UC2</a></strong></p>
 </section>
 
 
@@ -168,7 +205,7 @@
 
 <p>Must cover the metadata that is found in typical government data catalogs.</p>
 
-<p>Required by: all use cases</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc2">UC2</a>, <a href="#uc3">UC3</a>, <a href="#uc4">UC4</a></strong></p>
 </section>
 
 
@@ -177,7 +214,7 @@
 
 <p>Must allow population from existing data catalogs without requiring the production of new metadata, or an expensive (that is, manual) modification of existing metadata. In other words, implementing the standard format for an existing data catalog must not require cleaning up or otherwise modifying the metadata that your catalog collects beyond simple mechanical transformations.</p>
 
-<p>Required by: all use cases</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc2">UC2</a>, <a href="#uc3">UC3</a>, <a href="#uc4">UC4</a></strong></p>
 </section>
 
 
@@ -186,7 +223,7 @@
 
 <p>Must be extensible with additional, catalog-specific metadata fields.</p>
 
-<p>Required by: UC2</p>
+<p><strong>Required by: <a href="#uc2">UC2</a></strong></p>
 </section>
 
 
@@ -195,7 +232,7 @@
 
 <p>Must scale to catalogs that contain thousands of datasets without putting unreasonable strain on the bandwidth resources of catalog operator and catalog consumer.</p>
 
-<p>Required by: all use cases</p>
+<p><strong>Required by: <a href="#uc1">UC1</a>, <a href="#uc2">UC2</a>, <a href="#uc3">UC3</a>, <a href="#uc4">UC4</a></strong></p>
 </section>
 
 
@@ -204,7 +241,7 @@
 
 <p>Must allow querying of entries and catalog metadata using a standard mechanism (e.g., SPARQL, XQuery, OpenSearch).</p>
 
-<p>Required by: UC3</p>
+<p><strong>Required by: <a href="#uc3">UC3</a></strong></p>
 </section>
 
 
@@ -213,7 +250,15 @@
 
 <section class="appendix">
 <h2>Acknowledgments</h2>
-<p>The editors are very thankful for comments and suggestions ...</p>
+<p>The editors are very thankful for comments and contributions from
+Vassilios Peristeras, Martin Alvarez, Ed Summers, Christopher Gutteridge,
+and David Read.</p>
+
+<p>This document is the result of a collective effort of the
+W3C's <a href="http://www.w3.org/egov/wiki/">eGovernment Interest Group</a>
+and the <a href="http://www.w3.org/2011/gld/">Government Linked Data
+Working Group</a>.
+Many members of these groups have provided valuable input.</p>
 </section>