merge
author Dave Reynolds <dave@epimorphics.com>
Thu, 22 Mar 2012 15:05:31 +0000
changeset 132 8ccd1b1144c7
parent 131 b61d690925e5 (current diff)
parent 130 77ca46544091 (diff)
child 133 21be7f43d7d6
child 142 e976be5d510c
merge
--- a/bp/index.html	Thu Mar 22 15:04:30 2012 +0000
+++ b/bp/index.html	Thu Mar 22 15:05:31 2012 +0000
@@ -13,80 +13,65 @@
 </head>
 <body>
 
-<p class="todo">Make sure all contributors are listed as Authors</p>
+<p class="todo">Make sure all contributors are acknowledged.</p>
 
 <section id="abstract">
 <p>
-This document provides best practices and guidance to create high quality, re-usable Linked Open Data (LOD). 
+The goal of this document is to aid the development of high quality, re-usable Linked Open Data (LOD). It collects the most relevant engineering practices prevalent today, promoting those that enable publishing and consuming authoritative data and warning against those that are considered harmful.
 </p>
 </section>
 
 <section id="sotd">
-  <p>This document is work in progress. You might also want to check the accompanying <a href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary">Wiki page</a> of the GLD Working Group for ongoing discussions.</p>
+  <p>This document is a work in progress. The GLD Working Group's ongoing discussions are recorded on the <a href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary">Best Practices Wiki page</a>.</p>
 </section>
 
 
+<!--    INTRODUCTION    -->
+
 <section class="introductory">
-<h2>Scope</h2>
+<h2>Purpose of the Document</h2>
 
 <p>
-This document is aimed at assisting government IT managers, procurement officers, Web developers, vendors, and researchers who are interested in publishing open government data using W3C standards.  The benefits of using international standards for data exchange is to significantly increase interoperability of data.
+This document sets out a series of best practices designed to facilitate the development and delivery of Linked Open Data. The recommendations are offered to creators, maintainers and operators of Web sites that publish government data in both human- and machine-readable formats.
 </p>
-<p>
-Readers of this document are expected to be familiar with delivery of content via the Web, and to have a general familiarity with the technologies involved, but are not required to have a background in semantic technologies or previous experience with Linked Data. Data stewards, curators, database administrators and other personnel involved in Open Government initiatives are encouraged to read this Linked Open Data Best Practices document. 
-</section>
 
-<!--    INTRODUCTION    -->
-<section>
-<h2>Introduction</h2>
+<h2>Audience</h2>
+<p>
+Readers of this document are expected to be familiar with the creation of Web applications, and to have a general familiarity with the technologies involved, but are not expected to have a background in Linked Data technologies or previous experience with publishing data as Linked Open Data on the Web.</p>
 
-<section>
-<h3>Overview - Bernadette</h3>
 <p>
-Many governments have mandated publication of open government data to the public via the Web. The intention of these mandates are to facilitate the maintenance of open societies and support governmental accountability and transparency initiatives. However, publication of unstructured data on the World Wide Web is in itself insufficient; in order to realize the goals of efficiency, transparency and accountability, re-use of published data means members of the public must be able to absorb data in ways that can be readily found via search, visualized and absorbed programmatically.
+The document is not targeted solely at developers; others, such as government procurement officers, website administrators, and tool developers, are encouraged to read it.</p>
+
+<h2>Scope</h2>
+
+<p>This document aims to ease the adoption of Linked Open Data by providing an intuitive explanation of what is involved in publishing open government data on the Web.  <a href="http://www.w3.org/DesignIssues/LinkedData.html" title="Linked Data - Design Issues">Linked Data</a> addresses many objectives of open government transparency initiatives through the use of international Web standards for the publication, dissemination and reuse of structured data.
 </p>
 
 <p>
-This compilation of best practices for Linked Open Data produced by the W3C Government Linked Data Working Group is intended to help data curators and publishers better understand how to best use their time and resources to achieve the goals of Open Government. Linked Data principles address many of the data description and data format requirements for realizing the goals of Open Government. Linked Data uses a family of international standards and best practices for the publication, dissemination and reuse of structured data. Linked Data, unlike previous data formatting and publication approaches, provides a simple mechanism for combining data from multiple sources across the Web. 
+Linked Data uses a family of international standards and best practices for the publication, dissemination and reuse of structured data. Linked Data, unlike previous data formatting and publication approaches, provides a simple mechanism for combining data from multiple sources across the Web. 
 </p>
 
-<p>
-<a href="http://www.w3.org/DesignIssues/LinkedData.html" title="Linked Data - Design Issues">Linked Data</a> addresses key requirements of open government by providing a family of international standards for the publication, dissemination and reuse of structured data.
-</p>
-</section>
-
-<section>
-<h3>Scope - Bernadette</h3>
-
+<h2>Government Motivation for Publishing Linked Open Data</h2>
 <p>
-The approach in writing this document has been to collate and present the most relevant engineering practices prevalent in the Linked Data development community today and identify those that:
-<ul>
-<li> Facilitate the exploitation of Linked Data to enable better search, access and re-use of open government information;</li>
-<li> Are considered harmful and can have non-obvious detrimental effects on the overall quality of data publishing on the Web.</li>
-</ul>
-The goal of this best practices document is not endorse specific technologies, rather, this document focuses on key considerations and guidance to be successful. However, there are a number of cases where explicitly omitting a Best Practice that references an emerging technology on the grounds that it was too recent to have received wide adoption would have unnecessarily excluded a valuable recommendation. As such, some Best Practices have been included on the grounds that the Working Group believes that they will soon become fully qualified Best Practices (e.g. in prevalent use within the development community).
-</p>
-
-<p>
-Finally, in publishing Linked Open Data, it is not necessary to implement everything decribed herein. Instead, each Best Practice should be considered as a possible measure that may be implemented towards the goal of providing as rich and dynamic an experience as possible via a Web browser and Linked Data client. 
+Many governments have mandated publication of open government data to the public Web. The intention of these mandates is to facilitate the maintenance of open societies and to support governmental accountability and transparency initiatives. However, publication of unstructured data on the World Wide Web is in itself insufficient; to realize the goals of efficiency, transparency and accountability, members of the public must be able to find published data readily via search, visualize it, and process it programmatically.
 </p>
 
 </section>
 
+
 <section>
 <h3>Motivation</h3>
 The best practices provided here offer a methodical approach to the creation, publication and dissemination of government Linked Data, including:
 <ul>
-	<li>Description of the full life cycle of a Government Linked Data project, starting with identification of suitable data sets, procurement, modeling, vocabulary selection, through publication and ongoing maintenance.</li>
-	<li>Definition of known, proven steps to create and maintain government data sets using Linked Data principles.</li>
-	<li>Guidance in explaining the value proposition for LOD to stakeholders, managers and executives.</li>
-	<li>Assist the Working Group in later stages of the Standards Process, in order to solicit feedback, use cases, etc.</li>
+	<li> Description of the life cycle of a Linked Data project, starting with identification of suitable data sets, modeling, vocabulary selection, through publication and ongoing maintenance.</li>
+	<li> Definition of proven steps to create and maintain government data sets using Linked Data principles.</li>
+	<li> Guidance in the procurement process for publishing Linked Open Data.</li>
 </ul>
+
 <p class='issue'>Does it make sense to base the GLD life cycle on one of the general LD life cycles? See <a href="https://www.w3.org/2011/gld/track/issues/15">ISSUE-15</a></p>
 </section>
 
-<p class='issue'>Michael suggests to include the three available GLD life cycles we have</p>
-
+<section>
 <p>Currently we have identified the following Government Linked Data life cycles:
 </p>
 <p class="todo"> Include a brief description for each one of them.
@@ -100,18 +85,13 @@
 <p>Villazón-Terrazas</p>
 <img src="img/GLF_Villazon-terrazas.PNG" width="600" />
 
-<section>
-<h3>Brief History of Open Government Linked Data - Bernadette</h3>
-
-</section>
-
 </section>
 <!--    PROCUREMENT   -->
 <section>
-<h3>Procurement - Bernadette</h3>
-<p class='responsible'>George Thomas (Health & Human Services, US), Mike Pendleton (Environmental Protection Agency, US), John Sheridan (OPSI, UK)</p>
+<h3>Procurement</h3>
+<p class='responsible'>Mike Pendleton (Environmental Protection Agency, USA)</p>
 <p>
-Specific products and services involved in governments publishing linked data will be defined, suitable for use during government procurement. Just as the <a href="http://www.w3.org/WAI/intro/wcag" title="WCAG Overview">Web Content Accessibility Guidelines</a> allow governments to easily specify what they mean when they contract for an accessible Website, these definitions will simplify contracting for data sites and applications.
+Just as the <a href="http://www.w3.org/WAI/intro/wcag" title="WCAG Overview">Web Content Accessibility Guidelines</a> allow governments to easily specify what they mean when they contract for an accessible Website, definitions of the products and services involved in publishing government linked data will simplify contracting for data sites and applications.
 </p>
 
 <p>
@@ -120,27 +100,18 @@
 
 <h4>Overview</h4>
 <p>
-Recent Open Government initiatives call for more and better access to government data. To meet expanding consumer needs, many governments are now looking to go beyond traditional provisioning formats (e.g. CSV, XML), and are beginning to provision data using Linked Open Data (LOD) approaches.
-</p>
-
-<p>
-In contrast to provisioning data on the Web, LOD provisions data into the Web so it can be interlinked with other linked data, making it easier to discover, and more useful and reusable. LOD leverages World Wide Web standards such as Hypertext Transfer Protocol (HTTP), Resource Description Framework (RDF), and Uniform Resource Identifiers (URIs), which make data self-describing so that it is both human and machine readable. Self-describing data is important because most government data comes from relational data systems that do not fully describe the source data schema needed for application development by third parties.
-</p>
-
-<p>
-While LOD is a relatively new approach to data provisioning, growth has been exponential. LOD has been adopted by other national governments including the UK, Sweden, Germany, France, Spain, New Zealand and Australia.
+LOD provisions data into the Web so it can be interlinked with other linked data, making it easier to discover, and more useful and reusable. LOD leverages World Wide Web standards such as Hypertext Transfer Protocol (HTTP), Resource Description Framework (RDF), and Uniform Resource Identifiers (URIs), which make data self-describing so that it is both human and machine readable. Self-describing data is important because most government data comes from relational data systems that do not fully describe the source data schema needed for application development by third parties.
 </p>
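The "self-describing" property mentioned above can be made concrete with a minimal sketch, using invented example URIs: in a relational row the meaning of each column lives outside the data, whereas in RDF each property is itself a URI that can be looked up, so a third party can discover what a value means from the data alone.

```python
# Hypothetical illustration of self-describing data; all URIs are invented.

# A relational-style record: nothing tells a third party what "pop" means.
row = {"id": 42, "pop": 63182000}

# The same fact as RDF-style triples: (subject, predicate, object).
# Every predicate is a URI that can itself be described in the data.
triples = [
    ("http://example.org/country/uk",
     "http://example.org/def/population",
     63182000),
    ("http://example.org/def/population",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Total resident population"),
]

def describe(predicate, graph):
    """Find a human-readable label for a predicate within the same graph."""
    for s, p, o in graph:
        if s == predicate and p.endswith("rdf-schema#label"):
            return o
    return None

print(describe("http://example.org/def/population", triples))
```

This is why self-describing data matters for third-party application development: the schema travels with the data instead of remaining locked in the source relational system.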
 
 <h5>LOD Production through Consumption Lifecycle</h5>
 
 <p>
-The following categorizes activities associated with LOD development and maintenance, and identifies products and services and associated with these activities:
+The following categorizes general activities associated with LOD development and maintenance:
 </p>
 
 <ol type="1">
 	<li>LOD Preparation</li>
-	<p>Products :</p>
-	<p>Services : Services that support modeling relational or other data sources using URIs, developing scripts used to generate/create linked open data. Overlap exists between LOD preparation and publishing.</p>
+	<p>Services: Services that support modeling relational or other data sources using URIs and developing scripts used to generate linked open data.</p>
 	<li>LOD Publishing</li>
 	<p>Products: An RDF database (a.k.a. triple store) enables hosting of linked data.</p>
 	<p>Services: These are services that support creation, interlinking and deployment of linked data (see also linked data preparation). Hosting data via a triple store is a key aspect of publishing. LD publishing may include implementing a PURL strategy. During preparation for publishing linked data, data and publishing infrastructure may be tested and debugged to ensure it adheres to linked data principles and best practices. (Source: Linked Data: Evolving the Web into a Global Data Space, Heath and Bizer, Morgan and Claypool, 2011, Section 5.4, p. 53)</p>
@@ -210,36 +181,12 @@
 
 <p>As such, opportunities may exist to streamline the development of a security plan, or conversely, to identify potential project security vulnerabilities and risks, through early engagement with hosting providers, software vendors and others who may be responsible for those common, inherited controls. If the inherited controls meet the recommendations, they can then be assembled following the requisite templates, and the system security plan can be completed through addition of any applicable controls specific or unique to the linked data application's configuration, implementation, processes or other elements described in the security control and security plan guidance.</p>
 
-<h4>Glossary</h4>
-<ul>
-<li>
-Linked Open Data: A pattern for hyper-linking machine-readable data sets to each other using Semantic Web techniques, especially via the use of RDF and URIs. Enables distributed SPAQL queries of the data sets and a “browsing” or “discovery” approach to finding information (as compared to a search strategy. (Source: Linking Enterprise Data, David Wood, Springer, 2010, p. 286)
-</li>
-<li>
-Linked Open Data Cloud: Linked Open Data that has been published is depicted in a LOD cloud diagram. The diagram shows connections between linked data sets and color codes them based on data type (e.g., government, media, life sciences, etc.). The diagram can be viewed at: http://richard.cyganiak.de/2007/10/lod/
-</li>
-<li>
-RDF (Resource Description Framework): A language for representing information about resources in the World Wide Web. RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers, or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values. (http://www.w3.org/TR/rdf-primer/)
-</li>
-<li>
-Semantic Technologies: The broad set of technologies that related to the extraction, representation, storage, retrieval and analysis of machine-readable information. The Semantic Web standards are a subset of semantic technologies and techniques. (Source: Linking Enterprise Data, David Wood, Springer, 2010, p. 286) Semantic Web: An evolution or part of the World Wide Web that consists of machine-readable data in RDF and an ability to query that information in standard ways (e.g. via SPARQL)
-</li>
-<li>
-Semantic Web Standards: Standards of the World Wide Web Consortium (W3C) relating to the Semantic Web, including RDF, RDFa, SKOS and OWL. (Source: Linking Enterprise Data, David Wood, Springer, 2010, p. 287)
-</li>
-<li>
-SPARQL: Simple Protocol and RDF Query Language (SPARQL) defines a standard query language and data access protocol for use with the Resource Description Framework (RDF) data model. (http://msdn.microsoft.com/en-us/library/aa303673.aspx) Just as SQL is used to query relational data, SPARQL is used to query graph, or linked, data.
-</li>
-<li>
-Uniform Resource Identifiers (URIs): URI’s play a key role in enabling linked data. To publish data on the Web, the items in a domain of interest must first be identified. These are the things whose properties and relationships will be described in the data, and may include Web documents as well as real-world entities and abstract concepts. As Linked Data builds directly on Web architecture [67], the Web architecture term resource is used to refer to these things of interest, which are, in turn, identified by HTTP URIs. Wide Web Consortium’s Government Linked Data (W3C/GLD) workgroup: http://www.w3.org/2011/gld/charter
-</li>
-</ul>
 </section>
 
 
 <!--    << VOCABULARY SELECTION   -->
 <section>
-<h3>Vocabulary Selection -  	Boris</h3>
+<h3>Vocabulary Selection</h3>
 <p class='responsible'>Michael Hausenblas (DERI), Ghislain Atemezing (INSTITUT TELECOM), Boris Villazon-Terrazas (UPM),  Daniel Vila-Suero (UPM), George Thomas (Health & Human Services, US), John Erickson (RPI), Biplav Srivastava (IBM)</p>
 <p>
 Modeling is an important phase in any Government Linked Data life cycle. Within this phase, governments need to build a vocabulary that models the data sources they want to publish as Linked Data. The most important recommendation in this context is to reuse available vocabularies as much as possible. This reuse-based approach speeds up vocabulary development, and therefore governments will save time, effort and resources. However, it raises two main questions: (1) where and how do I find available vocabularies, and (2) how do I select the vocabulary that best fits my needs? Moreover, there may be cases in which governments need to mint their own vocabulary terms, which leads to a third question: (3) how do I mint my own vocabulary terms? In this section we provide answers to these questions by means of a checklist for each question.
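The "which vocabulary best fits my needs?" question above can be illustrated with a minimal, hypothetical sketch: score each candidate vocabulary by how many of the terms you need it already covers, preferring maximal reuse over minting new terms. The vocabulary names and term coverage below are invented for illustration, not drawn from any real registry.

```python
# Hypothetical sketch of vocabulary selection by term coverage.
# All names and coverage sets below are invented for illustration.

needed_terms = {"title", "description", "publisher", "issued", "spatial"}

# Candidate vocabularies mapped to the needed terms each one provides.
candidates = {
    "dcterms": {"title", "description", "publisher", "issued", "spatial"},
    "foaf":    {"title", "publisher"},
    "schema":  {"title", "description", "publisher"},
}

def rank(needed, vocabularies):
    """Order vocabularies by how many of the needed terms each covers."""
    return sorted(vocabularies,
                  key=lambda v: len(needed & vocabularies[v]),
                  reverse=True)

ranking = rank(needed_terms, candidates)
print(ranking[0])  # the vocabulary covering the most needed terms
```

In practice the checklist involves more than term counting (maturity, maintenance, community adoption), but maximizing coverage of required terms is the core of the reuse-based approach described above.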
@@ -370,9 +317,12 @@
 <p class="highlight"><b>Vocabulary should be published following available best practices</b><br/>
 	<i>What it means:</i> One of the goals is to contribute to the community by sharing the new vocabulary. To this end, it is recommended to follow available recipes for publishing RDF vocabularies, e.g., <a href="http://www.w3.org/TR/swbp-vocab-pub/" target="_blank">Best Practice Recipes for Publishing RDF Vocabularies</a>.	
 </p>
-</section> <!-- Vocabulary management/creation >> -->
+</section> <!-- Vocabulary management/creation -->
 
 <section> <!-- << Multilingualism in vocabs -->
+
+<!-- TODO add references to Felix Sasaki's work on multilingual Web and new W3C WG -->
+
 	<h4>Multilingualism in vocabs</h4>
 <p>
 This section provides some considerations for dealing with multilingualism in vocabularies. We have identified that multilingualism in vocabularies can nowadays be found in the following forms:
@@ -510,7 +460,8 @@
 
 <section> 
 <h4>URI Persistence</h4>
-<p>@@[email protected]@ Expand this section (Bernadette)</p>
+<p class='responsible'>Bernadette Hyland (3 Round Stones), John Erickson (RPI)</p>
+
 <p><i>Advice, info related to persistent URIs</i></p>
 <p>As is the case with many human interactions, confidence in interactions via the Web depends on stability and predictability. For an information resource, persistence depends on the consistency of representations. The representation provider decides when representations are sufficiently consistent (although that determination generally takes user expectations into account).</p>
 <p>
@@ -559,7 +510,7 @@
 
 <!--    VERSIONING   -->
 <section>
-<h3>Versioning - Boris</h3>
+<h3>Versioning</h3>
 <p class='responsible'>John Erickson (RPI), Ghislain Atemezing (INSTITUT TELECOM), Hadley Beeman (LinkedGov)</p>
 <p>
 This section specifies how to publish data which has multiple versions, including variations such as:
@@ -606,7 +557,7 @@
 
 <!--  << STABILITY   -->
 <section>
-<h3>Stability - Boris</h3>
+<h3>Stability</h3>
 <p class='responsible'>Anne Washington (GMU), Ron Reck</p>
 
 <section> <!-- << STABILITY.overview -->
@@ -779,7 +730,7 @@
 
 <!--    COOKBOOK   -->
 <section>
-<h3>Cookbook - Bernadette</h3>
+<h3>Linked Open Data Cookbook</h3>
 <p class='responsible'>Bernadette Hyland (3 Round Stones)</p>
 <p>
 See <a href="http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook">Cookbook for Open Government Linked Data</a>.
@@ -834,52 +785,9 @@
 </ul>
 <p>Possible organization of use cases (Adapted from <a href="http://bit.ly/wlKOEF" target="_blank">Trust and Linked Data</a>):</p>
 
-<ul>
-	<li>Simple "Oh Yeah?" scenario</li>
-	<ul>
-		<li>User retrieves a dataset, then clicks on “oh yeah” button, then site returns a provenance record</li>
-	</ul>
-</ul>
-
-<ul>
-	<li>Licensing scenario</li>
-	<ul>
-		<li>User retrieves dataset, then wants to check permission to use</li>
-	</ul>
-</ul>
-
-<ul>
-	<li>Referral scenario</li>
-	<ul>
-		<li>Site refers queries about provenance in terms of pointers to another site’s provenance facilities</li>
-	</ul>
-</ul>
-
-<ul>
-	<li>Repeated queries scenario</li>
-	<ul>
-		<li>Service repeatedly queries a site, wants provenance for all the answers</li>
-		<li>This is similar to PROV WG example, where user follows provenance record, asking follow-up questions based on previous answers</li>
-	</ul>
-</ul>
-
-<ul>
-	<li>Versioning scenario</li>
-	<ul>
-		<li>User retrieves a dataset, then wants to see its provenance, but the dataset has been updated in the original site (its provenance as well)</li>
-	</ul>
-</ul>
-
-<ul>
-	<li>Dynamic scenario</li>
-	<ul>
-		<li>User retrieves a resource that is dynamically created</li>
-	</ul>
-</ul>
 </section>
 
 
-
 </section> <!-- Pragmatic Provenance >> -->
 
 
--- a/data-cube-ucr/index.html	Thu Mar 22 15:04:30 2012 +0000
+++ b/data-cube-ucr/index.html	Thu Mar 22 15:05:31 2012 +0000
@@ -168,11 +168,6 @@
 		machines can appropriately visualize such quantities or have
 		conversions between different quantities.</p>
 
-	<p>Quantity comprises necessary information to interpret the value,
-		e.g., the unit and arithmetical and comparative operations; humans and
-		machines can appropriately visualize such quantities or have
-		conversions between different quantities.</p>
-
 	<p>A Measurement separates a quantity from the actual event at
 		which it was collected; a measurement assigns a quantity to a specific
 		phenomenon type (e.g., strength). Also, a measurement can record
@@ -340,6 +335,8 @@
 		the OGC "Observations and Measurements" (O&M) logical data model, also
 		published as ISO 19156. The QB spec should maybe also prefer the term
 		"multidimensional model" instead of the less clear "cube model" term.
 	
 	<p class="editorsnote">@@TODO: Are there any statements about
 		compatibility and interoperability between O&M and Data Cube that can
@@ -387,8 +384,14 @@
 		this use case QB should recommend specific approaches to transforming
 		and deriving of datasets which can be tracked and stored with the
 		statistical data.</p>
-	<p class="editorsnote">@@TODO: Add concrete example use case
-		scenario.</p>
+
+	<p>A simple, specific use case is that the Welsh Assembly Government
+		publishes a variety of population datasets broken down in different
+		ways. For many uses, population broken down by some category (e.g.
+		ethnicity) is expressed as a percentage. Separate datasets give the
+		actual counts per category and aggregate counts. In such cases it is
+		common to talk about the denominator (often DENOM), the aggregate
+		count against which the percentages can be interpreted.</p>
 	<p>Challenges of this use case are:</p>
 	<ul>
 		<li>Operations on statistical data result in new statistical
@@ -396,6 +399,12 @@
 			operations such as slice, dice, roll-up, drill-down will result in
 			new Data Cubes. This may require representing general relationships
 			between cubes (as discussed here: [12]).</li>
+		<li>Should Data Cube support explicit declaration of such
+			relationships, either between separate qb:DataSets or between
+			measures within a single qb:DataSet (e.g. ex:populationCount and
+			ex:populationPercent)?</li>
+		<li>If so, should that be scoped to simple, common relationships
+			like DENOM, or allow expression of arbitrary mathematical relations?</li>
 	</ul>
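The DENOM relationship in the use case above can be sketched numerically: a percentage dataset is interpretable only against the aggregate count it was derived from. The category names and counts below are invented for illustration; only the count/DENOM/percentage relationship comes from the use case.

```python
# Hedged illustration of the DENOM relationship: a percentage dataset is
# derived from a count dataset and its aggregate denominator. The category
# names and numbers are invented for illustration.

# Counts per category for one area (e.g. ethnicity breakdown).
population_count = {"groupA": 300, "groupB": 200, "groupC": 500}

# The denominator (DENOM): the aggregate count for the same area.
denom = sum(population_count.values())  # 1000

# The derived percentage dataset whose link back to DENOM Data Cube
# might declare explicitly (ex:populationCount vs ex:populationPercent).
population_percent = {k: 100 * v / denom for k, v in population_count.items()}

print(population_percent["groupC"])  # 50.0
```

Declaring this derivation explicitly would let a consumer verify the percentages against the counts, or recover absolute numbers from a percentage-only dataset.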
 	<p>Unanticipated Uses (optional): -</p>
 	<p>Existing Work (optional): Possible relation to Best Practices
@@ -674,6 +683,8 @@
   ex:population "2" .
   	
 	
 	</pre>
 	<p>What is the best way (in the context of the RDF/Data Cube/SDMX
 		approach) to express that the values for the England/Scotland/Wales/
@@ -721,19 +732,6 @@
 	<p>In some situations statistical data sets are used to derive
 		further datasets. Should Data Cube be able to explicitly convey these
 		relationships?</p>
-	<p>A simple specific use case is that the Welsh Assembly government
-		publishes a variety of population datasets broken down in different
-		ways. For many uses then population broken down by some category (e.g.
-		ethnicity) is expressed as a percentage. Separate datasets give the
-		actual counts per category and aggregate counts. In such cases it is
-		common to talk about the denominator (often DENOM) which is the
-		aggregate count against which the percentages can be interpreted.</p>
-	<p>Should Data Cube support explicit declaration of such
-		relationships either between separated qb:DataSets or between measures
-		with a single qb:DataSet (e.g. ex:populationCount and
-		ex:populationPercent)?</p>
-	<p>If so should that be scoped to simple, common relationships like
-		DENOM or allow expression of arbitrary mathematical relations?</p>
 	<p>Note that there has been some work towards this within the SDMX
 		community as indicated here:
 		http://groups.google.com/group/publishing-statistical-data/msg/b3fd023d8c33561d</p>