BHyland - Last last last edits to Model section & fixed incorrect use of HTML note and replaced with highlight. All done for good!
author bhyland
Thu, 19 Dec 2013 18:42:48 -0500
changeset 761 d4bb534eb5f9
parent 760 7f6ff5589f47
child 762 789ae71d5e44
bp/index.html
--- a/bp/index.html	Thu Dec 19 14:17:35 2013 -0500
+++ b/bp/index.html	Thu Dec 19 18:42:48 2013 -0500
@@ -163,13 +163,11 @@
                         href: "http://philarcher.org/diary/2013/uripersistence/#recs",
                         authors: ["Phil Archer"] ,
                         },
-         
-                        
+                              
             "CSARVEN": {
                         title: "Towards Linked Statistical Data Analysis",
                         href: "http://csarven.ca/linked-statistical-data-analysis",
                         authors: ["Sarven Capadisli"] 
-
                         },
             "HAUSENBLAS":
                         {
@@ -198,8 +196,15 @@
                         href: "http://id.loc.gov/vocabulary/iso639-2.html",
                         authors: ["U.S. Library of Congress"],
                         publisher: "International Standards Organization (ISO)"
-                        }
-                                
+                        },
+                        
+             "NIST800-122":
+                        {
+                        title: "Guide to Protecting the Confidentiality of Personally Identifiable Information (PII)",
+                        href: "http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf",
+                        authors: ["U.S. Department of Commerce by Erika McCallister, Tim Grance, Karen Scarfone"],
+                        publisher: "National Institute of Standards and Technology, U.S. Department of Commerce, Special Publication 800-122"
+                        }     
             }
 
       };
@@ -232,10 +237,9 @@
 
 <h2>Scope</h2>
 <p>
-<a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> refers to a set of best practices for publishing and interlinking structured data for access by both humans and machines via the use of the <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> (Resource Description Framework) family of standards for data interchange [[RDF-CONCEPTS]] and <a href="http://www.w3.org/TR/ld-glossary/#sparql">SPARQL</a> for query. <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> and <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> are not synonyms. Linked Data however could not exist without the consistent underlying data model that we call RDF [[RDF-CONCEPTS]].  Understanding the basics of RDF will be helpful in leveraging <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a>. 
+<a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> refers to a set of best practices for publishing and interlinking structured data for access by both humans and machines via the use of the <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> (Resource Description Framework) family of standards for data interchange [[RDF-CONCEPTS]] and <a href="http://www.w3.org/TR/ld-glossary/#sparql">SPARQL</a> for query. <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> and <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> are not synonyms. Linked Data however could not exist without the consistent underlying data model that we call RDF [[RDF-CONCEPTS]].  Understanding the basics of RDF is helpful in leveraging <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a>. 
 </p>
 
-
 <h2>Background</h2>
 <p>
 In recent years, governments worldwide have mandated publication of open government content to the public Web for the purpose of facilitating open societies and to support governmental accountability and transparency initiatives. In order to realize the goals of open government initiatives, the W3C Government Linked Data Working Group offers the following guidance to aid in the access and re-use of open 
@@ -347,30 +351,35 @@
 <section id="MODEL">
 <h2>Model the Data</h2>
 
-<p class="note">
-It is not within scope of this document to treat Linked Open Data modeling comprehensively however, we highlight recommended participants in the modeling effort, some of the differences between Linked Data versus traditional relational data modeling and provision of basic metadata.
+<p>
+It is not within scope of this document to treat Linked Open Data modeling comprehensively.  We provide guidance to organizations on conducting Linked Data modeling and we describe aspects that differentiate Linked Data modeling from other approaches.
 </p>
 
-<h3>Participants in the Modeling Process</h3>
+<h3>Participants</h3>
 <p>
-The modeling process for Linked Data requires participants who represent a range of concerns including one or more people familiar with the existing data workflow and data policies.  It is helpful to include a database administrator (DBA) and/or someone responsible for data collection.  Ideally, a Linked Data subject matter expert will help facilitate the Linked Data modeling process.  Their role is to help explain the similarities, differences and benefits of a Linked Data approach, as well as, collect the information to express the data using <a href="http://www.w3.org/TR/ld-glossary/#linked-data-principles">Linked Data Principles</a>.  
+The modeling process may include participants representing a broad range of concerns including: the government program or office, the data steward of the originating data source, data standards and policies, and a Linked Data subject matter expert.  For example, if the source data is from a relational database, include a database administrator (DBA).  If the organization has a data standards group, include a stakeholder in the modeling effort.  A Linked Data subject matter expert should facilitate the modeling process and be capable of explaining <a href="http://www.w3.org/TR/ld-glossary/#linked-data-principles">Linked Data Principles</a> and the Linked Data life cycle (see <a href="#PREPARE">Prepare Stakeholders</a>).  The modeling phase may involve onsite or virtual meetings during which stakeholders specify details about the data and how it is related. The Linked Data subject matter expert is expected to record this information in order to assist in completing the remaining steps in the process.  The eventual outcome of the modeling process is Linked Open Data being available on an authoritative domain for access and reuse.
 </p>
 
-<h3>Data Relationships and Context</h3> 
+<h3>Understanding the Differences</h3> 
+
 <p>
-In general, data modeling requires an understanding of the category of database being used, for example relational or NoSQL.  An <a href="http://www.w3.org/TR/ld-glossary/#rdf-database"> RDF database</a> is a type of NoSQL database and the only type based on an international family of standards.[[DWOOD2013]] Linked Data uses RDF as its data model because RDF is the international standard for representing data on the Web.  RDF databases are built on well-established and widely deployed standards including <a href="http://www.w3.org/TR/ld-glossary/#http-uris">HTTP URIs</a>.  Thus, one important difference between relational databases versus Linked Data is in the use of international standards for data interchange (e.g., <a href="http://www.w3.org/TR/ld-glossary/#rdfa">RDFa</a>, <a href="http://www.w3.org/TR/ld-glossary/#json-ld">, <a href="http://www.w3.org/TR/ld-glossary/#turtle">Turtle</a> and <a href="http://www.w3.org/TR/ld-glossary/#rdf-xml">RDF/XML</a>) and <a href="http://www.w3.org/TR/ld-glossary/#sparql">SPARQL</a> for query.
+Linked Data modeling involves data going from one model to another.  For example, the modeling effort may involve converting a tabular representation of data to a graph-based representation.  Another common approach is to use extracts from a relational database and store data in an <a href="http://www.w3.org/TR/ld-glossary/#rdf-database">RDF database</a>.   When doing this, avoid translating relational artifacts, such as foreign keys and NULL values, into the Linked Data model.  Also, avoid encoding "housekeeping" data into the Linked Data model. 
+</p>
+
+<p class=note>
+It is common to use denormalized tables from a relational database as inputs to the Linked Data modeling process.  
 </p>
 
 <p>
-An additional difference is the requirement to define the context or semantic meaning of the data using <a href="http://www.w3.org/TR/ld-glossary/#linked-data-principles">Linked Data Principles</a>.  During the Linked Data modeling process, stakeholders specify how objects are related to each other. These relationships or context will then be "packaged" with the data itself.  Linked Data modeling contrasts with relational modeling where external documents describe the data schema and visual diagrams describe the logical model.  By packaging data and its context together, data is transformed into immediately useful information and is more readily accessible and available for reuse by others.
+Providing data context, also referred to as semantic meaning, is a differentiator in Linked Data modeling.  In fact, the schema is "packaged" with the data itself using <a href="http://www.w3.org/TR/ld-glossary/#linked-data-principles">Linked Data Principles</a>.  In contrast, traditional relational modeling leverages external documents to describe the schema and diagrams to visualize the logical model.  
 </p>
 
-<h3> Provide Basic Metadata </h3>
+<p>
+During the modeling process, highlight any private data that your organization does not wish to expose.  This may include Personally Identifiable Information (PII). NIST Special Publication 800-122 defines PII as "any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual‘s identity, such as name, social security number, date and place of birth, mother‘s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information."[[NIST800-122]] It is a best practice to seek the relevant guidance for an organization in relation to PII and publishing data on the Web.
+</p>
 
 <p>
-When modeling Linked Data<a href="http://www.w3.org/TR/ld-glossary/#metadata">metadata</a>, it is a best practice to include the MIME type, publishing organization and/or agency, creation date, modification date, version, frequency of updates, and contact email address, if this information is available and appropriate to the data. In subsequent sections we outline guidance for the use of vocabularies, as well as, a vocabulary "checklist" to assist in the modeling process.
-</p>
-
+Other differences between relational and Linked Data include the use of open Web standards. Linked Data is predicated on the use of international standards for data interchange (e.g., <a href="http://www.w3.org/TR/ld-glossary/#rdfa">RDFa</a>, <a href="http://www.w3.org/TR/ld-glossary/#json-ld"> JSON-LD</a>, <a href="http://www.w3.org/TR/ld-glossary/#turtle">Turtle</a> and <a href="http://www.w3.org/TR/ld-glossary/#rdf-xml">RDF/XML</a>) and query <a href="http://www.w3.org/TR/ld-glossary/#sparql">SPARQL</a>.  Linked Data modeling leverages many of the advances in modern information management, including increased levels of data abstraction.  We hope that understanding where there is overlap and where there are differences proves helpful and informs efforts to efficiently model open government data as Linked Open Data.
 </section>
 
 
@@ -378,26 +387,13 @@
 <section id="LICENSE">
 <h2>Specify an Appropriate License</h2>
 
-<p class="note">
-It is beyond the charter of this working group to describe or recommend appropriate licenses for Open Government content published as Linked Data, however there are useful Web sites that offer detailed guidance and licenses.
-</p>
-
 <p>
-It is important to specify who owns data published on the Web and to explicitely connect that license with the data itself. Governmental authorities publishing open data are encouraged to review the relevant guidance for open licenses and copyright.   Publishing Linked Open Data makes associating a license that travels with the data itself easy.  Thus, people are more likely to reuse data when there is a clear, acceptable license associated with it.
+It is important to specify who owns data published on the Web and to explicitely connect that license with the data itself. Governmental authorities publishing open data are encouraged to review the relevant guidance for open licenses and copyright.   Publishing Linked Open Data makes associating a license that travels with the data itself easier.  People are more likely to reuse data when there is a clear, acceptable license associated with it.
 </p>
 
 <p>
-A valuable resource for open data publishers to review may be found on the <a href="http://creativecommons.org/">Creative Commons</a> Web site.  Creative Commons develops, supports, and stewards legal and technical infrastructure for digital content publishing.
+A valuable resource for open data publishers may be found on the <a href="http://creativecommons.org/">Creative Commons</a> Web site.  Creative Commons develops, supports, and stewards legal and technical infrastructure for digital content publishing.
 </p>
-
-<!-- NOTE TO FUTURE EDITORS:  This was commented out as it was deemed too US centric.  Perhaps in a later version, it could be expanded to be more international.  
-
-<p>
-The UK and many former Commonwealth countries maintain the concept of the Crown Copyright. The US Government designates information produced by civil servants as a U.S. Government Work, whereas contractors may produce works under a variety of licenses and copyright assignments. U.S. Government Works are not subject to copyright restrictions in the United States. It is critical for US government officials to know their rights and responsibilities under the Federal Acquisition Regulations (especially FAR Subpart 27.4, the Contract Clauses in 52.227-14, -17 and -20 and any agency-specific FAR Supplements) and copyright assignments if data is produced by a government contractor.  
-</p>
-
--->
-
 </section>
 
 
@@ -480,7 +476,7 @@
 </p>
 
 <p>
-PURLs implement one form of persistent identifier for virtual resources. Other persistent identifier schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. All persistent identification schemes provide unique identifiers for (possibly changing) virtual resources, but not all schemes provide curation opportunities. Curation of virtual resources has been defined as, <b>“the active involvement of information professionals in the management, including the preservation, of digital data for future use.”</b> [[yakel-07]] For a persistent identification scheme to provide a curation opportunity for a virtual resource, it must allow real-time resolution of that resource and also allow real-time administration of the identifier.
+PURLs implement one form of persistent identifier for virtual resources. Other persistent identifier schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. All persistent identification schemes provide unique identifiers for (possibly changing) virtual resources, but not all schemes provide curation opportunities. Curation of virtual resources has been defined as, “the active involvement of information professionals in the management, including the preservation, of digital data for future use.” [[yakel-07]] For a persistent identification scheme to provide a curation opportunity for a virtual resource, it must allow real-time resolution of that resource and also allow real-time administration of the identifier.
 </p>
 
 
@@ -598,15 +594,20 @@
 
 <h3>Vocabulary Checklist</h3>
 
+<p>
+This section provides a set of considerations aimed at helping stakeholders review a vocabulary to evaluate its usefulness.
+</p>
+
 <p class="note"> 
-It is best practice to use or extend an existing vocabulary before creating a new vocabulary.   This 
-section provides a set of considerations aimed at helping stakeholders review a vocabulary to evaluate its usefulness. 
+It is best practice to use or extend an existing vocabulary before creating a new vocabulary.  
 </p> 
 
 <p>A basic vocabulary checklist:
-<b>ensure vocabularies you use are published by a trusted group or organization;</b>	
-<b>ensure vocabularies have permanent URIs; and </b>	
-<b>confirm the versioning policy</b>. 
+<ul>
+<li>ensure vocabularies you use are published by a trusted group or organization;</li>	
+<li>ensure vocabularies have permanent URIs; and </li>	
+<li>confirm the versioning policy.</li> 
+</ul>
 </p>
 
 <p class="highlight"><b>Vocabularies MUST be documented</b><br />
@@ -740,14 +741,11 @@
 
 <h3>Using SKOS to Create a Controlled Vocabulary</h3>
 
-<div class='note'>
-     SKOS, the Simple Knowledge Organization System [[SKOS-REFERENCE]], is a W3C standard, 
-based on other Semantic Web standards (RDF and OWL), that provides a way to represent controlled 
-vocabularies, taxonomies and thesauri. Specifically, SKOS itself is an OWL ontology and it can be written out in any RDF flavor.
-</div>
+<p>
+SKOS, the Simple Knowledge Organization System [[SKOS-REFERENCE]], is a W3C standard, based on other Semantic Web standards (RDF and OWL), that provides a way to represent controlled vocabularies, taxonomies and thesauri. Specifically, SKOS itself is an OWL ontology and it can be written out in any RDF flavor.
+</p>
 
-<p>The W3C SKOS standard defines a portable, flexible controlled vocabulary format that is 
-increasingly popular, with the added benefit of a good entry-level step toward the use of Semantic Web technology. </p>
+<p>The W3C SKOS standard defines a portable, flexible controlled vocabulary format that is increasingly popular, with the added benefit of a good entry-level step toward the use of Semantic Web technology. </p>
 
 <div class="highlight"> SKOS is appropriate in the following situations:
     <ul>
@@ -779,8 +777,8 @@
 
 <p class="highlight"><b>If designing a vocabulary, provide labels and descriptions if possible, in several languages, to make the vocabulary usable by a global audience.</b> </p>	
 
-<p> Multilingual vocabularies may be found in the following formats:
-</p>
+
+<p class="highlight"><b>Multilingual vocabularies may be found in the following formats</b> <br />
 <ul>
 	<li>As a set of <code>rdfs:label</code> in which the language has been restricted (@en, @fr...). Currently, this is the most commonly used approach. </li>
     
@@ -800,15 +798,13 @@
 this approach is that semantics and linguistic information are kept separated. One can link several 
 lemon models in different natural languages to the same ontology.</li>
     
-    <li>A list of codes and their corresponding URIs for the representation of language names is published and maintained by the official registration authority of ISO639-2, the US Library of Congress. [[ISO639-1]], [[ISO639-2]]</li>
-    
+    <li>A list of codes and their corresponding URIs for the representation of language names is published and maintained by the official registration authority of ISO639-2, the US Library of Congress. [[ISO-639-1]], [[ISO-639-2]]</li>
+
+</p>
+
 <!-- 19-Dec-2013 - Removed Lexvo.org reference in favor of reference to an authoritative list of URIs for languages maintained by the official registration authority of ISO639-2, the US Library of Congress. This is the same reference used in the DCAT Vocabulary.
 
- <li> It could be also useful to use the <a href="http://www.lexinfo.net/lmf#">lexInfo</a> ontology
-where they provide stable resources for languages, such as 
-
-<a href="http://lexvo.org/id/iso639-3/eng"><code>http://lexvo.org/id/iso639-3/eng</code></a> 
-for English, or <a href="http://lexvo.org/id/iso639-3/cmn"><code>http://lexvo.org/id/iso639-3/cmn</code></a> for Chinese Mandarin. </li>
+ <li> It could be also useful to use the <a href="http://www.lexinfo.net/lmf#">lexInfo</a> ontology where they provide stable resources for languages, such as <a href="http://lexvo.org/id/iso639-3/eng"><code>http://lexvo.org/id/iso639-3/eng</code></a> for English, or <a href="http://lexvo.org/id/iso639-3/cmn"><code>http://lexvo.org/id/iso639-3/cmn</code></a> for Chinese Mandarin. </li>
 -->
 
 </ul>
@@ -826,18 +822,26 @@
 </p>
 
 <p>
+<div class="highlight">
 <ul>
 <li><a href="http://www.w3.org/TR/ld-glossary/#rdfa">RDFa</a>,</li>
 <li><a href="http://www.w3.org/TR/ld-glossary/#json-ld">JSON-LD</a>,</li>
 <li><a href="http://www.w3.org/TR/ld-glossary/#turtle">Turtle</a> and <a href="http://www.w3.org/TR/ld-glossary/#n-triples">N-Triples</a>, </li>
 <li><a href="http://www.w3.org/TR/ld-glossary/#rdf-xml">RDF/XML</a></li>
 </ul>
+</div>
 </p>
 
-<div class="note"> 
+<p> 
 Linked Data modelers and developers have certain reasons they prefer to use one RDF serialization over another.  No one RDF serialization is better than the other.  Benefits of using one over another include simplicity, ease of reading (for a human) and speed of processing.
 </p>
 
+<h3>Provide Basic Metadata</h3>
+<p>
+When modeling Linked Data<a href="http://www.w3.org/TR/ld-glossary/#metadata">metadata</a>, it is a best practice to include the MIME type, publishing organization and/or agency, creation date, modification date, version, frequency of updates, and contact email address, if this information is available and appropriate to the data. In subsequent sections we outline guidance for the use of vocabularies, as well as, a vocabulary "checklist" to assist in the modeling process.
+</p>
+
+<h3>Link to Other Stuff</h3>
 <p>
 As the name suggests, Linked Open Data means the data links to other stuff.  Data in isolation is rarely valuable, however, interlinked data is suddenly very valuable.  There are many popular datasets, such as DBpedia that provide valuable data, including photos and geographic information. Being able to connect data from a government authority with DBpedia for example, is quick way to show the value of adding content to the <a href="http://www.w3.org/TR/ld-glossary/#linked-open-data-cloud">Linked Data Cloud</a>.  
 </p>
@@ -850,14 +854,16 @@
 <h2>Provide Machine Access to Data</h2>
 
 <p>
+<div class="highlight">
 A major benefit of Linked Data is that it provides access to data for machines. Machines can use a variety of methods to read data including, but not limited to: 
-</p>
 <ul>
 <li>Direct URI resolution ("follow your nose"), </li>
 <li>a <a href="http://www.w3.org/TR/ld-glossary/#rest-api">RESTful API</a>, </li>
 <li>a <a href="http://www.w3.org/TR/ld-glossary/#sparql-endpoint">SPARQL endpoint</a>, and/or </li>
 <li>via file download.
 </ul>
+</div>
+</p>
 
 <p>
 The SPARQL Protocol and RDF Query Language (SPARQL) defines a query language for RDF data, analogous to the Structured Query Language (SQL) for relational databases. SPARQL is to RDF data what SQL is to a relational database.  For more information, see the SPARQL 1.1 Overview [SPARQL-11]. 
@@ -873,12 +879,13 @@
 <section id="ANNOUNCE">
 <h2>Announce to the Public</h2>
 
-<p class="note">It is not within scope of this document to discuss domain name issues and data hosting however, it is a vital part of the publication process.  Hosting Linked Open Data may require involvement with agency system security staff and require planning that often takes considerable time and experise for compliance, so involve stakeholders early and schedule accordingly.
+<p>It is not within scope of this document to discuss domain name issues and data hosting however, it is a vital part of the publication process.  Hosting Linked Open Data may require involvement with agency system security staff and require planning that often takes considerable time and experise for compliance, so involve stakeholders early and schedule accordingly.
 </p>
 
 <p>Now you're ready to point people to authoritative open government data.  Be sure the datasets are available via an authoritative domain.  Using an authoritative domain increases the perception of trusted content.  Authoritative data that is regularly updated on a government domain is critical to re-use of authoritative datasets.
 </p>
 
+<div class="highlight">
 <p>
 The following checklist is intended to help organizations realize the benefits of publishing open government data, as well as, communicate to the public that you are serious about providing this data over time.
 
@@ -892,7 +899,8 @@
 <li>Provide a form for people to report problematic data and give feedback;</li>
 <li>Provide a contact email address (alias) for those responsible for curating and publishing the data;</li>and
 <li>Ensure staff have the necessary training to respond in a timely manner to feedback.
-</ul> </div>
+</ul> 
+</div>
 </p>
 
 </section>
@@ -916,8 +924,8 @@
 
 <h3>Stability Properties</h3>
 
-<p class="note"> It is beyond the scope of this document to comprehensively treat issues related to data stability over time on the Web.  Mention is included such that readers may consider data stability in the context of a given agency and region.  
-There are characteristics that influence the stability or longevity of useful <a href="http://www.w3.org/TR/ld-glossary/#open-government-data">open government data</a>. Many of these properties are not unique to government <a href='http://www.w3.org/TR/ld-glossary/#linked-open-data'>Linked Open Data</a>, yet they influence data cost and therefore data value.  </p>
+<p> It is beyond the scope of this document to comprehensively treat issues related to data stability over time on the Web.  Mention is included such that readers may consider data stability in the context of a given agency and region.  There are characteristics that influence the stability or longevity of useful <a href="http://www.w3.org/TR/ld-glossary/#open-government-data">open government data</a>. Many of these properties are not unique to government <a href='http://www.w3.org/TR/ld-glossary/#linked-open-data'>Linked Open Data</a>, yet they influence data cost and therefore data value.  
+</p>
 
 <p>
 As a final note related to the importance of stability.  The W3C prepares to celebrate its 20th anniversary and the Web turns 25 years old in 2014. Perhaps surprisingly, the first Web page cannot be found.  A team at CERN is looking into restoring it,  however at the time of the writing of this document, it has not yet been found.[[GBRUMFIEL]]  Thus, the Government Linked Data Working Group wished to reference the importance of <i>data stability</i> as the vast majority of government data is quickly available <i>only</i> in digital form.  As stewards and supporters of open government data, it is encumbant upon us all to pursue the methods and tools to support responsible data stability on the Web over time.  Thanks for your interest in this topic and please join us in helping evolve the Web of Data into the 21st Century and beyond!