Major update to best practices doc to streamline & improve formatting
author bhyland
Sun, 29 Apr 2012 23:08:07 -0400
changeset 188 1b94f22388f2
parent 187 39cf3b3ce427
child 189 017e6f91f784
bp/index.html
bp/local-style.css
bp/respec-config.js
--- a/bp/index.html	Thu Apr 26 21:08:14 2012 +0100
+++ b/bp/index.html	Sun Apr 29 23:08:07 2012 -0400
@@ -17,74 +17,89 @@
 
 <section id="abstract">
 <p>
-The goal of this document is to aid the development of high quality, re-usable Linked Open Data (LOD). It collects the most relevant engineering practices, promoting best practices for publishing and consuming authoritative data and warning against practices that are considered harmful.
+This document provides best practices for creating, publishing and announcing government content as Linked Data. It includes guidance on the life cycle of a Linked Data project, from identification of suitable data sets, through vocabulary selection and URI naming conventions, to publication of the data sets. The goal of this document is to aid the publication of high quality Linked Open Data (LOD) by government authorities. It collects the most relevant data management practices, promoting best practices for publishing Linked Open Data and warning against practices that are considered harmful.
 </p>
 </section>
 
 <section id="sotd">
-  <p>This document is work in progress. The GLD WG ongoing discussions are recorded on the <a href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary">Best Practices Wiki page</a></p>
+  <p>This document is a work in progress.</p>
 </section>
 
-<!-- Structure of Best 
+<!-- Structure of Best Practices -->
+<section class="organization">
+<h2>How the Best Practices are Organized</h2>
+
+<p>The document is organized as follows:</p>
+<ol>
+<li>Introduction. Describes the purpose, audience, scope and background of this document.</li>
+<li>Requirements. An illustration of the type of problems that the Best Practices are intended to ameliorate.</li>
+<li>Delivery Context. Discusses the environment within which Linked Open Data published on the Web is realized.</li>
+<li>Overview of Best Practices. A discussion of the organization of the Best Practices, and sources from which they were derived.</li>
+<li>Best Practices. The statements themselves.</li>
+<li>Appendices</li>
+<li>Sources</li>
+<li>Related Reading</li>
+<li>Acknowledgements</li>
+</ol>
+</section>
+<!--    INTRODUCTION    -->
+
+<section class="introductory">
+<h2>Purpose of the Document</h2>
+<p>
+This document sets out a series of best practices designed to facilitate development and delivery of Linked Open Data in both human and machine readable data formats. The recommendations are offered to creators, maintainers and operators of Web sites publishing government data.
+</p>
+
+<h2>Audience</h2>
+<p>
+Readers of this document are expected to be familiar with the creation of Web applications, and to have a general familiarity with the technologies involved.  The document is targeted at developers, government procurement officers, website administrators, and tool developers.
+</p>
+
+<h2>Scope</h2>
+<p>
+This document aims to facilitate the adoption of a Linked Open Data approach to publishing open government data on the Web.  
+</p>
+<p>
+Linked Data uses a family of international standards and best practices for the publication, dissemination and reuse of structured data. Linked Data, unlike previous data formatting and publication approaches, provides a simple mechanism for combining data from multiple sources across the Web. Through its use of these international Web standards, <a href="http://www.w3.org/DesignIssues/LinkedData.html" title="Linked Data - Design Issues">Linked Data</a> addresses many objectives of open government transparency initiatives.
+</p>
+
+<h2>Background</h2>
+<p>
+In recent years, governments worldwide have mandated publication of open government content to the public Web to facilitate open societies and to support governmental accountability and transparency initiatives. However, publishing unstructured data in HTML alone is insufficient. To realize the goals of open government initiatives, members of the public must be able to find, visualize and programmatically absorb government data in order to re-use it. Publishing government content as Linked Data is a means to achieve these objectives.
+</p>
+
+</section>
 
 <!-- List of Best Practices -->
 
 The following best practices are discussed in this document and listed here for convenience.
 
-Select data sets that other people may wish to re-use.
-
-Re-use vocabularies whenever possible
-
-Identify relevant words that describe the main ideas or concepts
-
-Search for vocabularies using semantic search sites and dataset catalogues
-
-Ensure new vocabularies you create pass the "Vocabulary Creation Criteria Checklist"
-* Vocabulary is self-descriptive
-* Vocabulary is described in more than one language, ideally
-* Vocabulary will be accessible for a long period (has longevity)
-
-Ensure vocabularies you choose pass the "Vocabulary Selection Criteria Checklist"
-* Ensure vocabularies you use are published by a trusted group or organization
-* Ensure vocabularies have permanent URI
-* Confirm the versioning policy 
-
-
-
-
-<!--    INTRODUCTION    -->
-
-<section class="introductory">
-<h2>Purpose of the Document</h2>
-
-<p>
-This document sets out a series of best practices designed to facilitate development and delivery of Linked Open Data. The recommendations are offered to creators, maintainers and operators of Web sites publishing government data in both human and machine readable data formats.
+<p class='stmt'><a href="#IDENTIFY">IDENTIFY</a> The first step is identifying data sets that other people may wish to re-use.
 </p>
 
-<h2>Audience</h2>
-<p>
-Readers of this document are expected to be familiar with the creation of Web applications, and to have a general familiarity with the technologies involved, but are not expected to have a background in Linked Data technologies or previous experience with publishing data as Linked Open Data on the Web.</p>
-
-<p>
-The document is not targeted solely at developers; others, such as government procurement officers, website administrators, and tool developers are encouraged to read it.</p>
-
-<h2>Scope</h2>
-
-<p>This document aims to ease the adoption of Linked Open Data by providing an intuitive explanation of what is involved in publishing open government data on the Web.  <a href="http://www.w3.org/DesignIssues/LinkedData.html" title="Linked Data - Design Issues">Linked Data</a> addresses many objectives of open government transparency initiatives through the use international Web standards for the publication, dissemination and reuse of structured data.
+<p class='stmt'><a href="#MODEL">MODEL</a> Sketch the main objects the data describes, and draw lines between them to show how they are related to each other. Denormalize the data as necessary. Model the data itself, setting aside the immediate needs of any given application.
 </p>
 
-<p>
-Linked Data uses a family of international standards and best practices for the publication, dissemination and reuse of structured data. Linked Data, unlike previous data formatting and publication approaches, provides a simple mechanism for combining data from multiple sources across the Web. 
+<p class='stmt'><a href="#NAME">NAME</a> Use HTTP URIs as names for your objects. Give careful consideration to the URI naming strategy: consider how the data will change over time and name accordingly.
 </p>
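As a minimal sketch of one possible naming strategy, the following JavaScript mints HTTP URIs of the form `{base}/id/{type}/{key}` from a stable local identifier. The base URI `http://data.example.gov`, the `/id/` path pattern, and the helper names `slugify` and `mintUri` are hypothetical choices for illustration, not part of this document's guidance.

```javascript
// Hypothetical base URI for a government data site; substitute your own domain.
const BASE = "http://data.example.gov";

// Reduce an identifier to a lower-case, URI-safe path segment.
function slugify(key) {
  return String(key)
    .normalize("NFKD")                // decompose accented letters into letter + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse other characters into single hyphens
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}

// Mint an HTTP URI of the (assumed) form {base}/id/{type}/{key}.
function mintUri(type, key, base = BASE) {
  const typeSlug = slugify(type);
  const keySlug = slugify(key);
  if (!typeSlug || !keySlug) {
    throw new Error(`cannot mint a URI from type ${JSON.stringify(type)} and key ${JSON.stringify(key)}`);
  }
  return `${base}/id/${typeSlug}/${keySlug}`;
}

console.log(mintUri("Monitoring Site", "AQ-0042"));
```

Deriving the path from a stable identifier (such as a registry code) rather than from a display name is one way to keep URIs unchanged as the data changes over time. Slugifying can map distinct keys to the same segment (for example `AQ 0042` and `aq-0042`), so a real scheme would also need to check for collisions.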
 
-<h2>Government Motivation to Publish Linked Open Data</h2>
-<p>
-Governments worldwide have mandated publication of open government data to the public Web. The intention of these mandates is to facilitate the maintenance of open societies and support governmental accountability and transparency initiatives. However, publication of unstructured data on the World Wide Web is in itself insufficient; in order to realize the goals of efficiency, transparency and accountability, re-use of published data means members of the public must be able to absorb data in ways that can be readily found via search, visualized and absorbed programmatically.
+<p class='stmt'><a href="#STANDARD_VOCABULARIES">STANDARD_VOCABULARIES</a> Describe objects with standard vocabularies whenever possible.
 </p>
 
-The best practices provided provide a methodical approach for the creation, publication and dissemination of government Linked Data. Guidance on the life cycle of a Linked Data project, beginning with identification of suitable data sets, modeling, vocabulary selection, through publication and ongoing maintenance are provided.  Since publishing data to the Web following Linked Data principles is considering an emerging publication mode for governments, guidance for procurement of necessary services is provided to help inform government stakeholders.
-</section>
+<p class='stmt'><a href="#DESCRIPTIONS">DESCRIPTIONS</a> Publish human and machine readable descriptions with your Linked Data.
+</p>
 
+<p class='stmt'><a href="#CONVERT">CONVERT</a> Convert the source data into a Linked Data representation, that is, an RDF serialization such as Turtle, Notation-3 (N3), N-Triples, XHTML with embedded RDFa, or RDF/XML.
+</p>
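The CONVERT step can be illustrated with a small sketch that turns one tabular record into N-Triples, the simplest of the serializations listed above. The `data.example.gov` namespaces, the `County` class, the `population` property and the `recordToNTriples` helper are hypothetical; `rdf:type`, `rdfs:label` and `xsd:integer` are the standard W3C IRIs. A production pipeline would more likely rely on an RDF library or a mapping tool.

```javascript
// Standard W3C vocabulary IRIs.
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

// Hypothetical namespaces for terms (DEF) and things (ID); substitute your own.
const DEF = "http://data.example.gov/def/";
const ID = "http://data.example.gov/id/";

// Escape a string for use inside an N-Triples "..." literal.
// Backslashes are escaped first so the later escapes are not doubled.
function escapeLiteral(s) {
  return String(s)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Convert one record (e.g., a parsed CSV row) into N-Triples statements.
function recordToNTriples(row) {
  const population = Number(row.population);
  if (!Number.isInteger(population)) {
    throw new Error(`population is not an integer: ${row.population}`);
  }
  const s = `<${ID}county/${encodeURIComponent(row.code)}>`;
  return [
    `${s} <${RDF_TYPE}> <${DEF}County> .`,
    `${s} <${RDFS_LABEL}> "${escapeLiteral(row.name)}"@en .`,
    `${s} <${DEF}population> "${population}"^^<${XSD_INTEGER}> .`,
  ];
}

// Illustrative input only, not real data.
const ntriples = recordToNTriples({ code: "001", name: "Example County", population: "12345" });
console.log(ntriples.join("\n"));
```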
+
+<p class='stmt'><a href="#SPECIFY_LICENSE">SPECIFY_LICENSE</a> Specify an appropriate license.
+</p>
+
+<p class='stmt'><a href="#ANNOUNCE">ANNOUNCE</a> Host Linked Open Data on the public Web and announce it.  
+</p>
+
+<p class='stmt'><a href="#SOCIAL_CONTRACT">SOCIAL_CONTRACT</a> Maintenance is critical. If you move or remove data that is published to the Web, you may break third-party applications or mashups.
+</p>
 
 
 <h3> Linked Open Data Lifecycle </h3>
@@ -108,42 +123,23 @@
 <!--    PROCUREMENT   -->
 <section>
 <h3>Procurement</h3>
-<p class='responsible'>Mike Pendleton (Environmental Protection Agency, USA)</p>
-<p>
-Just as the <a href="http://www.w3.org/WAI/intro/wcag" title="WCAG Overview">Web Content Accessibility Guidelines</a> allow governments to easily specify what they mean when they contract for an accessible Website, these definitions will simplify contracting for data sites and applications.
-</p>
 
 <p>
-Linked Open Data (LOD) offers novel approaches for publishing and consuming data on the Web. This procurement overview and companion glossary is intended to help contract officers and their technical representatives understand LOD activities, and their associated products and services. This guidance is intended to aid government officials in the procurement process.
-</p>
+This procurement overview and companion glossary are intended to help contract officers understand the requirements associated with publishing open government content as Linked Data.</p>
 
 <h4>Overview</h4>
-<p>
-LOD provisions data into the Web so it can be interlinked with other linked data, making it easier to discover, and more useful and reusable. Linked Open Data is based on W3C including Hypertext Transfer Protocol (HTTP), Uniform Resource Identifiers (URIs), and Resource Description Framework (RDF) family of standards.  
-</p>
-
-</p>
-Publish structured content as <em>self-describing</em> Linked Data.
-
-Currently, the majority of structured data collected and curated by governments worldwide resides in relational data systems.  A logical schema for this structured content is typically maintained external to the data itself.  
-
-Understanding the schema, among other characteristics of the data, is necessary for data re-use.  Regardless of whether data is being re-used internally or externally to the original data owner, the <em>schema</em> or structure and organization of the data must be understood by those analyzing the data itself.
-
-Data published in an RDF format <em>combines the schema with the data</em> itself, which is what makes it self-describing.
-</p>
-
-<h5>LOD Production through Consumption Lifecycle</h5>
 
 <p>
-The following categorizes general activities associated with LOD development and maintenance:
+The majority of structured data collected and curated by governments worldwide resides in relational data systems. The general activities associated with Linked Open Data development and maintenance include:
 </p>
 
 <ol type="1">
-	<li>LOD Preparation<li>
+	<li>LOD Preparation</li>
-	<p>Services : Services that support modeling relational or other data sources using URIs, developing scripts used to generate/create linked open data.</p>
+	<p>Services: Services that support modeling relational or other data sources using HTTP URIs, and developing scripts used to generate Linked Open Data.</p>
 	<li>LOD Publishing</li>
-	<p>Products: RDF database (a.k.a. triple store) enables hosting of linked data</p>
-	<p>Services: These are services that support creation, interlinking and deployment of linked data (see also linked data preparation). Hosting data via a triple store is a key aspect of publishing. LD publishing may include implementing a PURL strategy. During preparation for publishing linked data, data and publishing infrastructure may be tested and debugged to ensure it adheres to linked data principles and best practices. (Source: Linked Data: Evolving the Web into a Global Data Space, Heath and Bizer, Morgan and Claypool, 2011, Section 5.4, p. 53)</p>
+	<p>Products: General categories include databases, visualization tools and application development platforms.</p>
+	<p>Services: These are services that support creation, interlinking and deployment of linked data (see also LOD Preparation). Hosting data in a graph database is an important component of Linked Open Data publishing. During preparation for publishing linked data, data and publishing infrastructure may be tested and debugged to ensure they adhere to linked data principles and best practices. (Source: Linked Data: Evolving the Web into a Global Data Space, Heath and Bizer, Morgan and Claypool, 2011, Section 5.4, p. 53)</p>
 	<li>LOD Discovery and Consumption</li>
 	<p>Products: Linked Data Browsers allow users to navigate between data sources by following RDF links; Linked Data Search Engines crawl linked data by following RDF links, and provide query capabilities over aggregated data.</p>
 	<p>Services: These are services that support describing, finding and using linked data on the Web. Publication of linked data contributes to a global data space often referred to as the Linked Open Data Cloud or ‘Web of Data.’ These are services that support the development of applications that use (i.e. consume) this ‘Web of Data.’</p>
@@ -157,37 +153,32 @@
 
 <h4>Procurement Checklist</h4>
 
-<p class="todo">Check if latest Wiki <a href="http://www.w3.org/2011/gld/wiki/Best_Practices_Discussion_Summary#Best_Practices_for_Procurement">content</a> is here</p>
-<p>
-Credit: This section of Procurement Best Practices was taken from the <a href="http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook">Linked Data Cookbook</a>.
-
 The following is an outline of questions a department/agency should consider reviewing as part of their decision to choose a service provider:
 <ul>
-<li>Is the infrastructure accessible and usable from developers’ environment?</li>
+<li>Is there a government approved contract vehicle to obtain this service or product?</li>
 
-<li>Is the documentation aimed at developers comprehensive and usable?</li>
+<li>What is the vendor’s past performance with government agencies or authorities?</li>
+
+<li>Does the vendor have reference sites? Are they similar to what you are considering in production?</li>
+
+<li>Does the vendor provide training for the products or services?</li>
+
+<li>Is the documentation comprehensive and usable?</li>
 
 <li>Is the software supported and under active development?</li>
 
 <li>Is there an interface to load data and “follow your nose” through a Web interface?</li>
 
-<li>Can the data be queried programmatically via a SPARQL endpoint?</li>
+<li>Is the government data accessible for developers once it is published?</li>
 
-<li>Does the vendor have reference sites? Are they similar to what you are considering in production?</li>
-
-<li>What is the vendor’s past performance with government agencies or authorities?</li>
-
-<li>Does the vendor provide training for the products or services?</li>
+<li>Can the data be queried programmatically via a SPARQL endpoint?</li>
 
 <li>What is the vendor’s Service Level Agreement?</li>
 
-<li>Is there a government approved contract vehicle to obtain this service or product?</li>
-
 <li>Is the vendor or provider an active contributor to Open Source Software, Standards groups, activities associated with data.gov and Linked Open Data projects at the enterprise and/or government level.</li>
 
-<li>Does the department/agency have a published Open Source Policy?</li>
+<li>Does the vendor adhere to the agency's published Open Source Policy, if one exists?</li>
 
-<li>If so, does the vendor or provider comply with the department/agency’s published Open Source Policy?</li>
 </ul>
-</p>
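One way to check the SPARQL endpoint question in the checklist above is to send a trivial query using the SPARQL 1.1 Protocol's query-via-GET operation, which passes the percent-encoded query in a `query` parameter. The sketch below assumes a runtime with a global `fetch` (for example Node.js 18 or later) and an endpoint that can return SPARQL JSON results; the endpoint URL and the helper names `buildQueryUrl` and `selectBindings` are hypothetical.

```javascript
// Build a SPARQL 1.1 Protocol "query via GET" URL: the query text is percent-encoded
// into a `query` parameter appended to the endpoint URL.
function buildQueryUrl(endpoint, query) {
  const sep = endpoint.includes("?") ? "&" : "?";
  return `${endpoint}${sep}query=${encodeURIComponent(query)}`;
}

// Run a SELECT query and return its result bindings.
// Assumes a global fetch and the SPARQL 1.1 Query Results JSON Format.
async function selectBindings(endpoint, query) {
  const res = await fetch(buildQueryUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!res.ok) throw new Error(`SPARQL endpoint returned HTTP ${res.status}`);
  const body = await res.json();
  return body.results.bindings;
}

// Hypothetical endpoint URL; replace with the endpoint under evaluation.
const ENDPOINT = "http://data.example.gov/sparql";
const PROBE = "SELECT * WHERE { ?s ?p ?o } LIMIT 1";
console.log(buildQueryUrl(ENDPOINT, PROBE));
// To actually probe the endpoint: selectBindings(ENDPOINT, PROBE).then(console.log);
```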
 
@@ -216,24 +207,23 @@
 <!--    << VOCABULARY SELECTION   -->
 <section>
 <h3>Vocabulary Selection</h3>
-<p class='responsible'>Michael Hausenblas (DERI), Ghislain Atemezing (INSTITUT TELECOM), Boris Villazon-Terrazas (UPM),  Daniel Vila-Suero (UPM), George Thomas (Health & Human Services, US), John Erickson (RPI), Biplav Srivastava (IBM)</p>
+<p class='responsible'>Michael Hausenblas (DERI), Ghislain Atemezing (INSTITUT TELECOM), Boris Villazon-Terrazas (UPM),  Daniel Vila-Suero (UPM)</p>
 <p>
-Modeling is an important phase in any Government Linked Data life cycle. Within this phase Governments need to build a vocabulary that models the data sources they want to publish as Linked Data. The most important recommendation in this context is to reuse as much as possible available vocabularies. This reuse-based approach speeds up the vocabulary development, and therefore, governments will save time, effort and resources. However, the reuse-based approach leads to two main questions (1) where/how do I find/discover available vocabularies, and (2) how do I select a vocabulary that best fits my needs?. Moreover, we have to consider that there may be cases in which Governments will need to mint their own vocabulary terms, these cases lead to another question (3) how to mint my own vocabulary terms?. In this section we provide answers to those questions, by means of checklists for each question.
-</p>
+Reuse standard, vetted vocabularies to encourage others to use your data. Guidance on finding such vocabularies is given in the Discovery checklist below.</p>
 
 <section> <!-- Discovery checklist -->
+
 	<h4>Discovery checklist</h4>
-	<p>As we already stated, following the reuse-based approach, governments have to look for available vocabularies to reuse, instead of building new vocabularies from scratch. This checklist provides some considerations when trying to find out existing vocabularies that could best fit the needs of a Government or a specialized agency.
+	<p>This checklist provides some considerations for finding existing vocabularies that best fit the needs of a government authority.
 	</p>
 	
-<p class="highlight"><b>Define the scope of the domain</b><br/>
-<i>What it means:</i> Developing a common understanding as to what is included in, or excluded from, in the domain. By defining the scope of the domain, it restricts and helps to quickly find out related works in Linked Open Data initiatives. Hence, it could help in reusing some existing vocabularies of the same domain. Most of the time, the dataset gives you some hints about the domain. <br/><br/>
-Examples of domain: Geography, Environment, Administrations, State Services, Statistics, People, Organisation, etc.	
-	</p>
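+<p class="highlight"><b>Specify the domain</b><br/>
+<i>What it means:</i> Develop a common understanding of what is included in, or excluded from, the domain. Defining the scope of the domain narrows the search for related Linked Open Data work and for existing vocabularies in the same domain. The dataset itself often hints at the domain.<br/><br/>
+Examples of domain: Geography, Environment, Administrations, State Services, Statistics, People, Organisation, etc.</p>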
 
 <p class="highlight"><b>Identify relevant keywords in the dataset</b><br/>
-	<i>What it means:</i> Identifying words that describe the main ideas or concepts. By identifying the relevant keywords or categories of your dataset, it helps for the searching process using Semantic Web Search Engine. If you have raw data in csv, the columns of the tables can be used for the searching process. <br/><br/>
-	Examples: commune, county, point, feature, address, etc.	
+	<i>What it means:</i> Identify words that describe the main ideas or concepts. Identifying the relevant keywords or categories of your dataset helps when searching with a Semantic Web search engine. If you have raw data in CSV, the column headers can be used as search terms. <br/><br/>
+	Examples: commune, county, feature	
 </p>
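A minimal sketch of using CSV column headers as search keywords: it reads the header row, splits quoted fields and compound names such as `countyCode` or `commune_name`, and returns de-duplicated lower-case terms to try in a Semantic Web search engine. The helper names are hypothetical, and the parser assumes the header row contains no embedded line breaks.

```javascript
// Split one CSV line into fields, honouring double-quoted fields and "" escapes (RFC 4180 style).
function splitCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                        // closing quote
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(field); field = ""; }
    else field += ch;
  }
  fields.push(field);
  return fields;
}

// Turn the header row into a de-duplicated list of lower-case candidate keywords.
// Assumes the header is the first line and has no embedded line breaks.
function keywordsFromCsv(csvText) {
  const header = csvText.split(/\r?\n/, 1)[0];
  const words = splitCsvLine(header)
    .flatMap((col) => col
      .replace(/([a-z])([A-Z])/g, "$1 $2") // split camelCase: countyCode -> county Code
      .split(/[^A-Za-z0-9]+/))               // split on underscores, spaces, punctuation
    .map((w) => w.toLowerCase())
    .filter((w) => w.length > 1);            // drop empty and single-character fragments
  return [...new Set(words)];
}

const sample = 'commune_name,countyCode,"Feature, Type",lat\n1,2,3,4';
console.log(keywordsFromCsv(sample));
```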
 
 <p class="highlight"><b>Searching for a vocabulary in one specific language</b><br/>
@@ -254,10 +244,18 @@
 </section> <!-- Discovery checklist >> -->
 
 <section> <!-- << Vocabulary Selection Criteria checklist -->
+
 <h4>Vocabulary Selection Criteria checklist</h4>
 <p>This checklist aims at giving some advices to better assess and select the vocabulary that best fits your needs, according to the output of the vocabularies discovered in the Discovery section. The final result should be one or two vocabularies that could be reused for your own purpose (mappings, extension, etc.)
 </p>
 
+<p>Ensure vocabularies you choose pass the Vocabulary Selection Criteria checklist:</p>
+<ul>
+<li>Ensure vocabularies you use are published by a trusted group or organization</li>
+<li>Ensure vocabularies have a permanent URI</li>
+<li>Confirm the versioning policy</li>
+</ul>
+
 <p class="highlight"><b>Vocabularies should be self-descriptive</b><br/>
 	<i>What it means:</i> Each property or term in a vocabulary should have a Label, Definition and Comment defined.
 	Self-describing data suggests that information about the encodings used for each representation is provided explicitly within the representation. The ability for Linked Data to describe itself, to place itself in context, contributes to the usefulness of the underlying data.<br/><br/>
@@ -300,17 +298,24 @@
 	Major changes of the vocabularies should be reflected on the documentation, in both machine or human-readable formats. This is strongly related to the best practices described in the Versioning section.	
 </p>
 
-<p class="highlight"><b>Vocabularies should provide documentations</b><br/>
-	<i>What it means:</i> A vocabulary should be well-documented for machine readable (use of labels and comments; tags to language used).
-	Also for human-readable, an extra documentation should be provided by the publisher to better understand the classes and properties, and if possible with some valuable use cases.	
+<p class="highlight"><b>Vocabularies must be documented</b><br/>
+	<i>What it means:</i> A vocabulary must be documented for both machines and people. Machine-readable documentation includes liberal use of labels and comments, each tagged with the language used. The publisher must also provide human-readable pages describing the classes and properties, preferably with example use cases.
 </p>
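To check documentation coverage before publishing, a simple audit can confirm that every term carries a language-tagged label and comment in each required language. This sketch assumes the terms have already been loaded into plain objects keyed by language tag and that `rdfs:label` and `rdfs:comment` are the properties in use; the object shape, the example namespace and `auditVocabulary` are hypothetical.

```javascript
// Hypothetical in-memory shape for vocabulary terms: language-tagged rdfs:label and
// rdfs:comment values keyed by language tag, e.g. { en: "County", fr: "Comté" }.
// Returns one message per missing or blank label/comment in a required language.
function auditVocabulary(terms, requiredLangs = ["en"]) {
  const problems = [];
  for (const term of terms) {
    for (const prop of ["label", "comment"]) {
      const values = term[prop] || {};
      for (const lang of requiredLangs) {
        const text = values[lang];
        if (typeof text !== "string" || text.trim() === "") {
          problems.push(`${term.iri}: missing rdfs:${prop}@${lang}`);
        }
      }
    }
  }
  return problems;
}

// Illustrative terms in a hypothetical namespace.
const terms = [
  {
    iri: "http://data.example.gov/def/County",
    label: { en: "County", fr: "Comté" },
    comment: { en: "An administrative division of a state." },
  },
  { iri: "http://data.example.gov/def/population", label: { en: "population" } },
];

console.log(auditVocabulary(terms, ["en", "fr"]));
```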
 </section> <!--  Vocabulary Selection Criteria checklist >> -->
 
+
 <section> <!-- << Vocabulary management/creation -->
 <h4>Vocabulary management/creation</h4>
 <p>As we already mentioned, we have to take into account that there may be cases in which Governments will need to mint their own vocabulary terms. This section provides a set of considerations aimed at helping to government stakeholders to mint their own vocabulary terms. This section includes some items of the previous section because some recommendations for vocabulary selection also apply to vocabulary creation.
 </p>
 
+<p>Ensure new vocabularies you create meet the following criteria:</p>
+<ul>
+<li>Vocabulary is self-descriptive</li>
+<li>Vocabulary is described in more than one language, ideally</li>
+<li>Vocabulary will be accessible for a long period (has longevity)</li>
+</ul>
+
 <p class="highlight"><b>Define the URI of the vocabulary.</b><br/>
 	<i>What it means:</i> The URI that identifies your vocabulary must be defined. This is strongly related to the Best Practices described in section URI Construction.<br/><br/>
 	For example: If we are minting new vocabulary terms from a particular government, we should define the URI of that particular vocabulary.	
--- a/bp/local-style.css	Thu Apr 26 21:08:14 2012 +0100
+++ b/bp/local-style.css	Sun Apr 29 23:08:07 2012 -0400
@@ -53,7 +53,7 @@
 padding: 10px;
 }
 
-.responsible {
+.stmt {
 border: 3px solid #6a6;
 margin: 0 0 0 20px;
 padding: 10px;
@@ -65,7 +65,6 @@
 padding: 10px;
 }
 
-
 ol.prereq li {
 padding-bottom: 10px;
 }
--- a/bp/respec-config.js	Thu Apr 26 21:08:14 2012 +0100
+++ b/bp/respec-config.js	Sun Apr 29 23:08:07 2012 -0400
@@ -1,7 +1,7 @@
 var respecConfig = {
     // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
     specStatus:           "ED",
-    publishDate:          "2012-03-22",
+    publishDate:          "2012-04-29",
     //copyrightStart:       "2010",
 
     // the specification's short name, as in http://www.w3.org/TR/short-name/
@@ -32,9 +32,9 @@
     // editors, add as many as you like
     // only "name" is required
     editors:  [
-        { name: "Michael Hausenblas", url: "http://sw-app.org/mic.xhtml#i", company: "DERI", companyURL: "http://www.deri.ie" },
-		{ name: "Bernadette Hyland", url: "https://twitter.com/bernhyland",  company: "3 Round Stones", companyURL: "http://3roundstones.com/"},
-		{ name: "Boris Villaz&oacute;n-Terrazas", url: "http://boris.villazon.terrazas.name",  company: "OEG, UPM", companyURL: "http://www.oeg-upm.net"}
+        { name: "Bernadette Hyland", url: "http://3roundstones.com/about-us/leadership-team/bernadette-hyland/", company: "3 Round Stones", companyURL: "http://3roundstones.com/" },
+        { name: "Boris Villaz&oacute;n-Terrazas", url: "http://boris.villazon.terrazas.name", company: "OEG, UPM", companyURL: "http://www.oeg-upm.net" },
+        { name: "Michael Hausenblas", url: "http://sw-app.org/mic.xhtml#i", company: "DERI", companyURL: "http://www.deri.ie" }
     ],
 
     // authors, add as many as you like.