--- a/bp/index.html Fri Dec 13 11:32:41 2013 +0000
+++ b/bp/index.html Fri Dec 13 14:00:40 2013 -0500
@@ -143,13 +143,13 @@
title: "Linking Enterprise Data",
href: "http://download.e-bookshelf.de/download/0000/0067/81/L-G-0000006781-0002335618.pdf",
authors: ["David Wood"] ,
- publisher: "Springer New York"
+ publisher: "Springer"
},
"BHYLAND2011":{
- title: "The Joy of Data - A Cookbook for Publishing Linked Government Data on the Web",
- href: "http://dx.doi.org/10.1007/978-1-4614-1767-5_1",
+ title: "The Joy of Data - Cookbook for Publishing Linked Government Data on the Web",
+ href: "http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook",
authors: ["Bernadette Hyland", "David Wood"],
- publisher: "Springer"
+ publisher: "W3C"
},
"BVILLAZON": {
@@ -189,7 +189,7 @@
<section id="abstract">
<p>
-This document sets out a series of best practices designed to facilitate development and delivery of open government data as Linked Data. <a href="http://www.w3.org/TR/ld-glossary/#linked-data-principles">Linked Data Principles</a> are used to publish high quality, heterogeneous datasets in a distributed manner. The goal of this document is to aid the publication of high quality <a href="http://www.w3.org/TR/ld-glossary/#linked-open-data">Linked Open Data (LOD)</a> from government authorities, to compile the most relevant data management practices, to promote best practices for publishing Linked Open Data, and to warn against practices that are considered harmful. Linked Data can be published by a person or organization behind a firewall or on the public Web. If Linked Data is published on the public Web, it is generally called Linked Open Data. The following recommendations are offered to creators, maintainers and operators of Web sites interested in Linked Data.
+This document sets out a series of best practices designed to facilitate development and delivery of open government data as <a href="http://www.w3.org/TR/ld-glossary/#linked-open-data">Linked Open Data</a> (LOD). <a href="http://www.w3.org/TR/ld-glossary/#linked-open-data">Linked Open Data</a> makes the World Wide Web into a global database, sometimes referred to as the "<a href="http://www.w3.org/TR/ld-glossary/#web-of-data">Web of Data</a>". Using <a href="http://www.w3.org/TR/ld-glossary/#linked-data-principles">Linked Data Principles</a>, developers can query <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> from multiple sources at once and combine it without needing a single common schema shared by all of the data. Before international standards existed for exchanging data on the Web, building applications that combined data from multiple sources with traditional data management techniques was time-consuming and difficult. Using the Web of Data, developers can more easily integrate <a href="http://www.w3.org/TR/ld-glossary/#dataset-rdf">RDF datasets</a> to create useful Web applications. As more open government data is published on the Web, best practices are evolving too. The goal of this document is to compile the most relevant data management practices for the publication and use of high quality data published by governments around the world as <a href="http://www.w3.org/TR/ld-glossary/#linked-open-data">Linked Open Data</a>. The following recommendations are offered to creators, maintainers and operators of Web sites.
</p>
<h2>Audience</h2>
@@ -203,15 +203,14 @@
<h2>Scope</h2>
<p>
-<a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> refers to a set of best practices for publishing and interlinking structured data for access by both humans and machines via the use of <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> (Resource Description Framework) [[RDF-CONCEPTS]]. RDF can be written in a variety of syntaxes (e.g.
-<a href="http://www.w3.org/TR/ld-glossary/#rdfa">RDFa</a>,
-<a href="http://www.w3.org/TR/ld-glossary/#json-ld">JSON-LD</a>,
-<a href="http://www.w3.org/TR/ld-glossary/#turtle">Turtle</a> and N-Triples),
-<a href="http://www.w3.org/TR/ld-glossary/#rdf-xml">RDF/XML</a>,
-and <a href="http://www.w3.org/TR/ld-glossary/#http-uris">HTTP URIs</a>.
-
-<a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> and
-<a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> are not synonyms. Linked Data however could not exist without a consistent underlying data model which is RDF.[RDF-CONCEPTS]. Understanding the basics of RDF is necessary to leverage <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a>.
+<a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> refers to a set of best practices for publishing and interlinking structured data for access by both humans and machines via the use of the <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> (Resource Description Framework) family of standards for data interchange [[RDF-CONCEPTS]] and <a href="http://www.w3.org/TR/ld-glossary/#sparql">SPARQL</a> for querying. <a href="http://www.w3.org/TR/ld-glossary/#rdf">RDF</a> and <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a> are not synonyms. Linked Data, however, could not exist without the consistent underlying data model that we call RDF [[RDF-CONCEPTS]]. Understanding the basics of RDF helps developers make the most of <a href="http://www.w3.org/TR/ld-glossary/#linked-data">Linked Data</a>. Linked Data resources are named with <a href="http://www.w3.org/TR/ld-glossary/#http-uris">HTTP URIs</a>, and the data can be written in a variety of syntaxes including:
+<ul>
+<li><a href="http://www.w3.org/TR/ld-glossary/#rdfa">RDFa</a>,</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#json-ld">JSON-LD</a>,</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#turtle">Turtle</a>,</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#n-triples">N-Triples</a>, and</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#rdf-xml">RDF/XML</a>.</li>
+</ul>
</p>
<h2>Background</h2>
@@ -240,7 +239,7 @@
<p class='stmt'><a href="#SELECT">SELECT A DATASET:</a><br /> Select a dataset that provides benefit to others for re-use.
</p>
-<p class='stmt'><a href="#PERSONAL">PERSONALLY IDENTIFIABLE DATA:</a><br /> Do not publish personally identifiable information as Linked Open Data as it can potentially be misused.</p>
+<p class='stmt'><a href="#PERSONAL">PERSONALLY IDENTIFIABLE DATA:</a><br /> Do not publish personally identifiable information as Linked Open Data as it can potentially be misused.</p>
<p class='stmt'><a href="#MODEL">MODEL:</a> <br /><a href="http://www.w3.org/TR/ld-glossary/#modeling-process">Model</a> the
data in an application-independent way.</p>
@@ -250,31 +249,24 @@
</p>
<p class='stmt'><a href="#METADATA">BASIC METADATA:</a> <br />Always provide basic
-<a href="http://www.w3.org/TR/ld-glossary/#metadata">metadata</a>, including MIME type, publishing
-organization and/or agency, creation date, modification date, version, frequency of updates, contact email for the data steward(s).
+<a href="http://www.w3.org/TR/ld-glossary/#metadata">metadata</a>.
</p>
<p class='stmt'><a href="#LICENSE">SPECIFY A LICENSE:</a> <br />Specify an appropriate open
license with the published data.</p>
-<p class='stmt'><a href="#HUMAN">HUMAN READABLE:</a> <br />Provide human readable descriptions with your Linked Data.
+<p class='stmt'><a href="#INTERNATIONAL">USE INTERNATIONALIZED RESOURCE IDENTIFIERS:</a> <br />Use Internationalized Resource Identifiers (IRIs) in your Linked Data.
</p>
-<p class='stmt'><a href="#HTTPURIS">HTTP URIs:</a><br /> Create <a href="http://www.w3.org/TR/ld-glossary/#http-uris">HTTP URIs</a>
-as names for your objects. Give careful consideration to the <a href="http://www.w3.org/TR/ld-glossary/#uri">URI</a> naming
-strategy. Consider how the data will change over time and name as necessary.
+<p class='stmt'><a href="#HTTPURIS">HTTP URIs:</a><br /> Create <a href="http://www.w3.org/TR/ld-glossary/#http-uris">HTTP URIs</a> as names for your objects. Give careful consideration to the <a href="http://www.w3.org/TR/ld-glossary/#uri">URI</a> naming strategy. Consider how the data will change over time and name as necessary.
</p>
<p class='stmt'><a href="#MACHINE">MACHINE ACCESSIBLE:</a><br />A major benefit of Linked Data is that it provides
-access to data for machines. Machines can use a variety of methods to read data including, but not limited to:
-a <a href="http://www.w3.org/TR/ld-glossary/#rest-api">RESTful API</a>,
-a <a href="http://www.w3.org/TR/ld-glossary/#sparql-endpoint">SPARQL endpoint</a> or download.
+access to data for machines.
</p>
-<p class='stmt'><a href="#SERIALIZATION">DATA CONVERSION:</a><br /> Convert the sources data to a Linked Data
-representation. This will typically mean mapping the source data to a set of RDF statements
-about entities described by the data. These statements can then be serialized into
-a range of RDF serializations including Turtle, N-Triples, JSON-LD, (X)HTML with embedded RDFa and RDF/XML.
+<p class='stmt'><a href="#CONVERT">DATA CONVERSION:</a><br /> Convert the source data to a Linked Data
+representation.
</p>
<p class='stmt'><a href="#LINK">LINKS ARE KEY:</a> <br />As the name suggests, Linked Open Data
@@ -286,8 +278,6 @@
</p>
<p class='stmt'><a href="#HOST">DOMAIN:</a> <br />Deliver Linked Open Data on an authoritative domain.
-Using an authoritative domain increases the perception of trusted content. Authoritative data
-that is regularly updated on a government domain is critical to uptake and reuse of the dataset(s).
</p>
<p class='stmt'><a href="#ANNOUNCE">ANNOUNCE:</a><br /> Announce the Linked Open Data on
@@ -306,18 +296,13 @@
<!-- Diagrams -->
<section id='WORKFLOW'>
-<h2> Linked Open Data Lifecycle </h2>
-<!-- <p class='issue'>Does it make sense to base the GLD life cycle on one of the general LD life cycles? See <a href="https://www.w3.org/2011/gld/track/issues/15">ISSUE-15</a></p> -->
+<h2> Linked Open Data Workflow </h2>
+
<p>
-The process of publishing Government Linked Open Data (GLD) should be comprised of
-tractable and manageable steps, forming a life cycle in the same way software engineering
-uses life cycles in development projects. The life cycle for Government Linked Data includes
-all steps starting with identifying appropriate datasets, through publication and ongoing maintenance.
-In the following paragraph three different life cycle models are presented, however it
-is evident that they all share common (and sometimes overlapping) characteristics in
-their constituents. For example, they all identify the need to specify, model and publish data
-in standard open Web formats. In essence, they capture the same tasks that are needed in
-the process, but provide different boundaries between these tasks.
+The process of publishing Government Linked Open Data (GLD) can be organized into well-defined steps. The workflow includes all steps, from identifying a useful dataset to publish, through publication and ongoing maintenance.
+
+Three different life cycle models are presented below. They share common (and sometimes overlapping) characteristics. For example, they all identify the need to specify, model and publish data in standard open Web formats. In essence, they capture the same tasks that are needed in
+the process, but provide different boundaries between these tasks. No one workflow is better than another; they are simply different ways to visualize the process.
</p>
<ul>
@@ -361,6 +346,7 @@
<img src="img/GLF_Villazon-terrazas.PNG" width="600" />
</section>
+
<!-- SELECT -->
<section id="SELECT">
<h2>Select a Dataset</h2>
@@ -376,7 +362,7 @@
<!-- SELECT -->
<section id="PERSONAL">
-<h2>Personally Identifiable Information</h2>
+<h2>Personally Identifiable Information</h2>
<p>
Do not publish personally identifiable information as Linked Open Data as it can potentially be misused.
@@ -397,74 +383,181 @@
</section>
+<!-- CONVERT DATA TO LINKED DATA -->
+<section id="CONVERT">
+<h2>Convert Data to Linked Data</h2>
+<p>
+Convert the source data to a Linked Data representation. This involves a data modeling step, followed by consensus that the objects and relationships correctly reflect the dataset(s). The next step is to map the source data to a set of RDF statements, typically via a script. The resulting RDF statements can then be serialized in a range of RDF syntaxes that include:
+</p>
+
+
+<ul>
+<li><a href="http://www.w3.org/TR/ld-glossary/#rdfa">RDFa</a>,</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#json-ld">JSON-LD</a>,</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#turtle">Turtle</a>,</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#n-triples">N-Triples</a>, and</li>
+<li><a href="http://www.w3.org/TR/ld-glossary/#rdf-xml">RDF/XML</a>.</li>
+</ul>
+
+
+<p>
+Linked Data modelers and developers have various reasons for preferring one RDF serialization over another. No one RDF serialization is better than the others; considerations include simplicity, ease of reading (for a human) and speed of processing.
+</p>
+</section>
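The mapping step described in the Convert Data to Linked Data section can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a recommended tool: the base URI http://example.gov/id/school/, the property http://example.gov/def/city, and the CSV columns id, name and city are all hypothetical. Each row becomes a subject URI, and each column becomes one RDF statement serialized as N-Triples.

```python
import csv
import io
from typing import List, Optional
from urllib.parse import quote

# Hypothetical URIs used only for this illustration (assumptions, not real vocabularies).
BASE = "http://example.gov/id/school/"
CITY = "http://example.gov/def/city"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Serialize a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def rows_to_ntriples(csv_text: str) -> List[str]:
    """Map each CSV row (hypothetical columns: id, name, city) to two N-Triples statements."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so the minted URI stays valid.
        subject = f"<{BASE}{quote(row['id'], safe='')}>"
        triples.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'], 'en')} .")
        triples.append(f"{subject} <{CITY}> {literal(row['city'])} .")
    return triples


source = "id,name,city\n42,Lincoln Elementary,Springfield\n"
print("\n".join(rows_to_ntriples(source)))
```

Swapping the output step for a Turtle or JSON-LD writer changes only the serialization; the RDF statements themselves stay the same.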
+
+
<!-- BASIC METADATA -->
-<section id="BASIC-METADATA">
-<h2>Basic Metadata for Linked Datasets</h2>
+<section id="BASIC">
+<h2>Include Basic Metadata</h2>
-<!-- NOTE TO EDITORS: Link summary to right place -->
+<p>
+Always provide basic <a href="http://www.w3.org/TR/ld-glossary/#metadata">metadata</a>, including MIME type, publishing organization and/or agency, creation date, modification date, version, frequency of updates, and contact email for the data steward(s).
+</p>
+
+</section>
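As an illustration, the basic metadata listed above could be expressed with the Data Catalog Vocabulary (DCAT) and Dublin Core terms. The following Python sketch renders a Turtle description from a dictionary. The dataset URI, publisher URI and e-mail address are hypothetical, the frequency URI is assumed to come from the Dublin Core Collection Description Frequency Vocabulary, and the property choices (for instance owl:versionInfo for the version) are one reasonable mapping rather than a prescribed one.

```python
from datetime import date

PREFIXES = (
    "@prefix dcat:  <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:   <http://purl.org/dc/terms/> .\n"
    "@prefix owl:   <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .\n"
    "@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .\n"
)

# Fields this sketch requires, one per item of basic metadata listed above.
REQUIRED = ("uri", "title", "publisher", "issued", "modified",
            "version", "frequency", "media_type", "contact_email")


def dataset_metadata_turtle(meta: dict) -> str:
    """Render basic dataset metadata as a DCAT description in Turtle."""
    missing = [key for key in REQUIRED if key not in meta]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    # Escape backslashes and double quotes so the title is a valid Turtle string.
    title = meta["title"].replace("\\", "\\\\").replace('"', '\\"')
    issued = meta["issued"].isoformat()
    modified = meta["modified"].isoformat()
    version = meta["version"]
    media_type = meta["media_type"]
    lines = [
        f"<{meta['uri']}> a dcat:Dataset ;",
        f'    dct:title "{title}"@en ;',
        f"    dct:publisher <{meta['publisher']}> ;",
        f'    dct:issued "{issued}"^^xsd:date ;',
        f'    dct:modified "{modified}"^^xsd:date ;',
        f'    owl:versionInfo "{version}" ;',
        f"    dct:accrualPeriodicity <{meta['frequency']}> ;",
        f"    dcat:contactPoint [ a vcard:Kind ; vcard:hasEmail <mailto:{meta['contact_email']}> ] ;",
        f'    dcat:distribution [ a dcat:Distribution ; dcat:mediaType "{media_type}" ] .',
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


example = {
    "uri": "http://example.gov/dataset/schools",         # hypothetical
    "title": "Public Schools",
    "publisher": "http://example.gov/id/org/education",  # hypothetical
    "issued": date(2013, 1, 15),
    "modified": date(2013, 12, 1),
    "version": "1.2",
    "frequency": "http://purl.org/cld/freq/monthly",     # assumed frequency term
    "media_type": "text/turtle",
    "contact_email": "data-steward@example.gov",         # hypothetical
}
print(dataset_metadata_turtle(example))
```

Raising an error for missing fields is a design choice that keeps incomplete descriptions from being published silently.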
+
+<!-- Publish 5-Star Data -->
+<section id="5STAR">
+<h2>Publish 5 Star Linked Open Data</h2>
+
+<p>While organizations around the globe are taking very valuable steps toward government transparency by publishing datasets in non-proprietary formats such as CSV and PDF, striving to publish authoritative data as <a href="http://www.w3.org/TR/ld-glossary/#5-star-linked-open-data">5 Star Linked Open Data</a> helps promote consistent re-use. The <a href="http://5stardata.info/">5-Star Scheme</a> rates published data from one star (available on the Web under an open license) to five stars (linked to other data to provide context). A similar five-star scheme applies to the vocabularies used to publish Linked Data, summarized as follows:
+</p>
+
+<p class="highlight">☆ <b>Publish your vocabulary on the Web at a stable URI using an open license.</b>
+</p>
+
+<p class="highlight">☆☆ <b>Provide human-readable documentation and basic metadata such as creator, publisher, date of creation, last modification, version number.</b>
+</p>
+
+<p class="highlight">☆☆☆ <b>Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes.</b>
+</p>
+
+<p class="highlight">☆☆☆☆ <b>Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation.</b>
+</p>
+
+<p class="highlight">☆☆☆☆☆ <b>Link to other vocabularies by re-using elements rather than re-inventing them.</b>
+</p>
</section>
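The four-star item above relies on HTTP content negotiation: the same namespace URI returns human-readable documentation to a browser and a machine-readable file to an RDF client. A minimal Python sketch of the server-side choice follows. The list of offered media types and the fallback to HTML are assumptions, and the matching is simplified (exact types, type/* and */* with q-values); a production server should follow the full Accept-header rules of the HTTP specification.

```python
from typing import Sequence

# Representations offered for one vocabulary namespace URI (an assumption for this sketch).
AVAILABLE = ("text/html", "text/turtle", "application/rdf+xml", "application/ld+json")


def negotiate(accept_header: str, available: Sequence[str] = AVAILABLE) -> str:
    """Return the offered media type with the highest q-value in the Accept header.

    Falls back to the first offer (HTML documentation) when nothing acceptable matches.
    Simplified: handles exact types, type/* and */*, but ignores other media-type parameters.
    """
    best, best_q = available[0], 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        for candidate in available:
            matches = (media_range in (candidate, "*/*")
                       or (media_range.endswith("/*")
                           and candidate.startswith(media_range[:-1])))
            # Strictly greater: on ties the first match found is kept, and q=0 never wins.
            if matches and q > best_q:
                best, best_q = candidate, q
    return best


print(negotiate("text/turtle;q=0.9, text/html;q=0.5"))
```

A browser sending a typical Accept header that lists text/html first receives the documentation, while an RDF client asking for text/turtle receives the formal file from the same URI.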
<!-- MACHINE ACCESSIBLE -->
-<section id="MACHINE-ACCESSIBLE">
-<h2>Machine Accessible Access to Data</h2>
-
-<!-- NOTE TO EDITORS: Link summary to right place -->
-
-</section>
-
-
-<!-- DATA CONVERSION -->
-<section id="DATA CONVERSION">
-<h2>Converting Data to Linked Data</h2>
-
-<!-- NOTE TO EDITORS: (BOH) @@TODO@@ -->
+<section id="MACHINE-ACCESS">
+<h2>Machine Access to Data</h2>
-</section>
-
-
-<!-- LINKS ARE KEY -->
-<section id="LINKS">
-<h2>Links are Key</h2>
+<p>
+A major benefit of Linked Data is that it provides access to data for machines. Machines can use a variety of methods to read data including, but not limited to:
+</p>
-<!-- NOTE TO EDITORS: (BOH) Link summary to right place with all the vocab stuff -->
-
-</section>
-
+<ul>
+<li>a <a href="http://www.w3.org/TR/ld-glossary/#rest-api">RESTful API</a>,</li>
+<li>a <a href="http://www.w3.org/TR/ld-glossary/#sparql-endpoint">SPARQL endpoint</a>, and/or</li>
+<li>file download.</li>
+</ul>
-<!-- ANNOUNCE -->
-<section id="LINKS">
-<h2>Announcing to the Public</h2>
-
-<!-- NOTE TO EDITORS: (BOH) Link summary to right place -->
+<p>
+The SPARQL Protocol and RDF Query Language (SPARQL) is a query language for RDF data, analogous to the Structured Query Language (SQL) for relational databases. It is defined by a family of World Wide Web Consortium standards; see the SPARQL 1.1 Overview [SPARQL-11].
+</p>
+<p>
+A SPARQL endpoint is a service that accepts SPARQL queries and returns answers to them as SPARQL result sets. It is a best practice for dataset providers to publish the URL of their SPARQL endpoint to allow access to their data programmatically or through a Web interface. A <a href="http://labs.mondeca.com/sparqlEndpointsStatus/">list of some public SPARQL endpoints</a> is available online.
+</p>
</section>
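To illustrate programmatic access, the following Python sketch builds a SPARQL Protocol query-via-GET request that asks for results in the SPARQL JSON results format, and flattens such a response into simple dictionaries. The endpoint URL is hypothetical and the query assumes the data is described with DCAT; the network call is shown only as a comment so the sketch runs offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Hypothetical endpoint URL; substitute the URL a data provider publishes.
ENDPOINT = "http://example.gov/sparql"

QUERY = """PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ; dct:title ?title .
} LIMIT 10"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dictionaries."""
    doc = json.loads(results_json)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]


req = build_request(ENDPOINT, QUERY)
print(req.full_url)
# Against a real endpoint: bindings(urllib.request.urlopen(req).read().decode("utf-8"))
```

Flattening drops each value's type (URI or literal) and language tag; an application that needs them would keep the full binding objects instead.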
-<!-- VOCABULARY SELECTION -->
-<section id="STANDARD-VOCABULARIES">
-<h2>Vocabulary Selection</h2>
+<!-- ANNOUNCE -->
+<section id="LINKS">
+<h2>Announce to the Public</h2>
+
+<!-- NOTE TO EDITORS: (BOH) Link summary to right place -->
+</section>
+
+
+<!-- SPECIFY LICENSE -->
+<section id="LICENSE">
+<h2>Specifying an Appropriate License</h2>
<p>
-There are several core W3C vocabularies that allow a developer to describe basic or more
-complex relationships as Linked Data. Standardized vocabularies should be reused as much
-as possible to facilitate inclusion and expansion of the <a href="http://www.w3.org/TR/ld-glossary/#web-of-data">Web of data</a>.
-
-Government publishers are encouraged to use standardized vocabularies rather than reinventing the wheel, wherever possible.
+Specify an appropriate open license with the published data. People will only reuse data when there is a clear, acceptable license associated with it. Governments typically define ownership of works produced by government employees or contractors in legislation.
</p>
<p>
-For example, organizational structures and activities are often described by government authorities.
-The <a href="http://www.w3.org/TR/vocab-org/">Organization Ontology</a> [[vocab-org]] supports the
-publishing of organizational information across a number of domains, as Linked Data. The Organizational
-Ontology is designed to allow domain-specific extensions to add classification of
-organizations and roles, as well as extensions to support neighboring information such as organizational activities.
+It is beyond the charter of this working group to describe and recommend appropriate licenses for
+Open Government content published as Linked Data; however, there are useful Web sites that
+offer detailed guidance and licenses. One valuable resource is the <a href="http://creativecommons.org/">Creative
+Commons</a> Web site. Creative Commons develops, supports, and stewards legal and technical infrastructure for digital content publishing.
+</p>
+
+<p class="note">
+As an informative note, the UK and many former Commonwealth countries maintain the concept of
+the Crown Copyright. It is important to know who owns your data and to say so. The U.S.
+Government designates information produced by civil servants as a U.S. Government Work, whereas
+contractors may produce works under a variety of licenses and copyright assignments. U.S.
+Government Works are not subject to copyright restrictions in the United States. It
+is critical for U.S. government officials to know their rights and responsibilities under the Federal
+Acquisition Regulations (especially FAR Subpart 27.4, the Contract Clauses in 52.227-14, -17 and -20 and
+any agency-specific FAR Supplements) and copyright assignments if data is produced by
+a government contractor. It is recommended that governmental authorities publishing
+Linked Data review the relevant guidance for data published on the Web.
+</p>
+</section>
+
+
+<!-- DOMAIN AND HOSTING -->
+<section id="HOST">
+
+<h2>Domain and Hosting</h2>
+
+<p>
+Deliver Linked Open Data on an authoritative domain. Using an authoritative domain increases the perception of trusted content. Authoritative data that is regularly updated on a government domain is critical to uptake and re-use of the dataset(s).
+</p>
+
+<p>It is not within the scope of this document to expand on hosting Linked Open Data; however, data hosting is a vital part of the publication process. Hosting Linked Open Data may require involvement with agency system security staff and planning that often takes considerable time and expertise for compliance, so involve stakeholders early and schedule accordingly.
+</p>
+
+</section>
+
+<!-- SOCIAL_CONTRACT -->
+<section id="SOCIAL-CONTRAT">
+<h2>The Social Contract of a Linked Data Publisher</h2>
+
+<p>
+A final but important word: publishers of Linked Data enter into an implicit social contract with users of their data. Publishers must recognize their responsibility in maintaining data once it is published. Key to the widespread use of the Web of Data is ensuring that the dataset(s) your organization publishes remains available where you say it will be. The following is intended to help your organization fulfill its social contract by publishing Linked Data for others to re-use on the Web:
+</p>
+<div class="note">
+<ul class="highlight">
+<li>Publish a description for each published dataset using [[vocab-dcat]] or [[void]] vocabulary;</li>
+<li>Associate metadata on the frequency of data updates;</li>
+<li>Associate a license appropriate for government data with all content your agency publishes if you wish to encourage re-use;</li>
+<li>Plan and implement a persistence strategy;</li>
+<li>Ensure data is accurate to the greatest degree possible;</li>
+<li>Publish an email address to report problematic data;</li>
+<li>Ensure the contact person or team responds to enquiries via email or telephone, if necessary.</li>
+</ul> </div>
+
+<p>
+Giving due consideration to your organization's URI strategy should be one of the first activities
+your team undertakes as they prepare a Linked Open Data strategy. Authoritative data requires the
+permanence and resolution of HTTP URIs. If publishers move or remove data that was published
+to the Web, third party applications or mashups may break. This is considered rude for obvious
+reasons and is the basis for the Linked Data "social contract." A good way to avoid causing HTTP
+404 (Not Found) errors is for your organization to implement a persistence strategy.
+</p>
+
+</section>
+
+
+<!-- STANDARD VOCABS -->
+<section id="STANDARD-VOCABS">
+<h2>Standard Vocabularies</h2>
+
+<p>
+Standardized vocabularies should be reused as much as possible to facilitate inclusion and expansion of the <a href="http://www.w3.org/TR/ld-glossary/#web-of-data">Web of Data</a>. The W3C has published several useful vocabularies for Linked Data. For example, W3C standard vocabularies help developers describe basic or more complex relationships in <a href="http://www.w3.org/TR/vocab-dcat/">data catalogs</a>, <a href="http://www.w3.org/TR/vocab-org/">organizations</a>, and <a href="http://www.w3.org/TR/vocab-data-cube/">multidimensional data</a> such as statistics on the Web. Government publishers are encouraged to use standardized vocabularies rather than reinventing the wheel, wherever possible.
</p>
<p>
-The <a href="http://www.w3.org/TR/vocab-dcat/">Data Catalog Vocabulary (DCAT)</a> [[vocab-dcat]] is
+Specifically, the <a href="http://www.w3.org/TR/vocab-dcat/">Data Catalog Vocabulary (DCAT)</a> [[vocab-dcat]] is
an RDF vocabulary designed to facilitate interoperability between data catalogs published on the
Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and
enable applications easily to consume metadata from multiple catalogs. It further
@@ -473,106 +566,59 @@
</p>
<p>
+Organizational structures and activities are often described by government authorities. The <a href="http://www.w3.org/TR/vocab-org/">Organization Ontology</a> [[vocab-org]] supports the publishing of organizational information across a number of domains, as Linked Data. The Organization Ontology is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighboring information such as organizational activities.
+</p>
+
+<p>
Many government agencies publish statistical information on the public Web. The <a href="http://www.w3.org/TR/vocab-data-cube/"> Data Cube Vocabulary</a> [[vocab-cube]] provides a means to do this using the <a href="http://www.w3.org/TR/ld-glossary/#rdf">Resource Description Framework (RDF)</a>. CSARVENón-Capadisli propose in [[CSARVEN]] the RDF Data Cube Vocabulary makes it possible to discover and identify statistical data artifacts in a uniform way and presents a design and implementation approach using the Data Cube Vocabulary. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations. The Data Cube vocabulary is a core foundation which supports extension vocabularies to enable publication of other aspects of statistical data flows or other multi-dimensional datasets.
</p>
+
+<h3>How to Find Existing Vocabularies</h3>
+<p>
+There are search tools that collect, analyze and index vocabularies and semantic data available online for efficient access. Search tools that use structured data represented as Linked Data include <a href="http://ws.nju.edu.cn/falcons/">Falcons</a>,
+<a href="http://watson.kmi.open.ac.uk/WatsonWUI/">Watson</a>,
+<a href="http://sindice.com/">Sindice</a>, <a href="http://swse.deri.org/">Semantic Web Search Engine</a>,
+<a href="http://swoogle.umbc.edu/">Swoogle</a>, and <a href="http://schemapedia.com/">Schemapedia</a>. Others include the <a href="http://lov.okfn.org/">LOV</a> directory,
+<a href="http://prefix.cc">Prefix.cc</a>,
+<a href="http://bioportal.bioontology.org/">Bioportal (biological domain)</a> and the European Commission's
+<a href="https://joinup.ec.europa.eu/catalogue/repository">Joinup platform.</a>
+</p>
+
+<p class="highlight"><b>Where to find existing vocabularies in data catalogs</b><br />
+
+Another approach is to search dataset catalogs using key terms that describe your dataset. Some of these catalogs provide samples of how the underlying data was modeled and used. One popular catalog is the <a href="http://datahub.io/">Data Hub</a>.
+</p>
+
</section>
-<!-- Conformance for Vocabularies -->
-<section id='conformanceForVocabs'>
-
-<h2>Conformance for Vocabularies</h2>
-A data interchange, however that interchange occurs, is <b>conformant</b> with a vocabulary if:
-
-<ul>
-<li>it is within the scope and objectives of the vocabulary;</li>
-
-<li>the classes and properties defined in the vocabulary are used in a way consistent with the semantics declared in its specification;</li>
-
-<li> it does not use terms from other vocabularies instead of ones defined in the vocabulary that could reasonably be used.</li>
-
-</ul>
-A conforming data interchange:
-<ul>
- <li>MAY include terms from other vocabularies;</li>
- <li>MAY use a non-empty subset of terms from the vocabulary.</li>
-</ul>
-A vocabulary profile is a specification that adds additional constraints
-to it. Such additional constraints in a profile may include:
-<ul>
-<li>a minimum set of terms that must be used;</li>
-<li>classes and properties not covered in the vocabulary; </li>
-<li>controlled vocabularies or URI sets as acceptable values for properties.</li>
-</ul>
-</section>
-
-
-<!-- Discovery Checklist -->
-<section id='LINK'>
-<h2>Vocabulary Discovery Checklist</h2>
-<p>The following checklist is a guide to helping developers determine whether an
-existing vocabulary would be a reasonable candidate for use by a government authority.
-</p>
-
-<p class="highlight"><b>Specify the domain</b><br />
-<i>What it means:</i> Identify the domain scope of the vocabulary <br />
-<i>Examples of domain: Geography, Environment, Administrations, State Services, Statistics, People, Organization.</i> </p>
-
-<p class="highlight"><b>Identify relevant keywords in the dataset</b><br />
- <i>What it means:</i> Identify words that describe the main ideas or concepts. By identifying the relevant keywords or categories of your dataset, it helps search engines that employ algorithms that utilize structured data to improve query results. <br /><br />
-</p>
-
-
-<p class="highlight"><b>Searching for a vocabulary in one specific language</b><br />
- <i>What it means:</i> Many of the available vocabularies are in English. You may be aware of having a vocabulary in your own language.
- Consider this issue as it may restrict your search. Sometimes it might be useful to translate some of the keywords to English.
-</p>
-
+<!-- Vocabulary Checklist -->
+<section id="vocab-checklist">
+<h2>Vocabulary Checklist</h2>
-<p class="highlight"><b>How to find vocabularies</b><br />
-There are search tools that collect, analyze and index vocabularies and semantic data available online for efficient access. Search tools that use structured data represented as Linked Data include: (<a href="http://ws.nju.edu.cn/falcons/">Falcons</a>,
-<a href="http://watson.kmi.open.ac.uk/WatsonWUI/">Watson</a>,
-<a href="http://sindice.com/">Sindice</a>, <a href="http://swse.deri.org/">Semantic Web Search Engine</a>,
-<a href="http://swoogle.umbc.edu/">Swoogle</a>, and <a href="http://schemapedia.com/">Schemapedia</a>).<br /><br />.
-Others include the <a href="http://lov.okfn.org/">LOV</a> directory,
-<a href="http://prefix.cc">Prefix.cc</a>,
-<a href="http://bioportal.bioontology.org/">Bioportal (biological domain)</a> and the European Commission's
-<a href="https://joinup.ec.europa.eu/catalogue/repository">Joinup platform.</a>
-</p>
+<p class="note">
+It is best practice to use or extend an existing vocabulary before creating a new vocabulary. This
+section provides a set of considerations aimed at helping stakeholders review a vocabulary to evaluate its usefulness.
+</p>
-<p class="highlight"><b>Where to find existing vocabularies in datasets catalogues</b><br />
- <i>What it means:</i>Another way around is to perform search using the previously
-identified key terms in datasets catalogs. Some of these catalogs provide samples
-of how the underlying data was modeled and how it was used for.<br /><br />
- One popular catalogue is the: <a href="http://datahub.io/">Data Hub</a>.
-</p>
-</section>
-
-
-<!-- Vocabulary Selection Criteria -->
-<section id="vocab-checklist">
-<h2>Vocabulary Selection Criteria</h2>
-
-<p class="note"> This checklist aims to help in vocabulary selection, in summary: (1)-
-<b>ensure vocabularies you use are published by a trusted group or organization</b>; (2)-
-<b>ensure vocabularies have permanent URIs</b> and (3)
+<p>Some basics:
+<b>ensure vocabularies you use are published by a trusted group or organization;</b>
+<b>ensure vocabularies have permanent URIs; and </b>
<b>confirm the versioning policy</b>.
</p>
<p class="highlight"><b>Vocabularies MUST be documented</b><br />
- <i>What it means:</i> A vocabulary MUST be documented. This includes
-the liberal use of labels and comments, as well as appropriate language tags. The
-publisher must provide human-readable pages that describe the vocabulary, along with
+
+A vocabulary MUST be documented. This includes the liberal use of labels and comments, as well as appropriate language tags. The publisher must provide human-readable pages that describe the vocabulary, along with
its constituent classes and properties. Preferably, easily comprehensible use-cases should be defined and documented.
</p>
<p class="highlight"><b>Vocabularies SHOULD be self-descriptive</b><br />
- <i>What it means:</i> Each property or term in a vocabulary should have a Label, Definition and Comment defined.
- Self-describing data suggests that information about the encodings used for each representation is provided
-explicitly within the representation. The ability for Linked Data to describe itself, to place
-itself in context, contributes to the usefulness of the underlying data.<br /><br />
+
+Each property or term in a vocabulary should have a Label, Definition and Comment defined. Self-describing data means that information about the encodings used for each representation is provided explicitly within the representation. The ability of Linked Data to describe itself, and to place itself in context, contributes to the usefulness of the underlying data.<br /><br />
+
For example, the widely-used Dublin Core vocabulary (formally <code>DCMI Metadata Terms</code>)
has a Term Name <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor">Contributor</a> which has a:<br />
<code>Label: Contributor </code><br />
@@ -581,70 +627,60 @@
</p>
<p class="highlight"><b>Vocabularies SHOULD be described in more than one language</b><br />
- <i>What it means:</i> Multilingualism should be supported by the vocabulary, i.e.
-all the elements of the vocabulary should have labels, definitions and comments
-available in the government's official language(s), e.g. Spanish and at least in English.
- That is also very important as the documentation should be clear enough with
-appropriate tags for the language used for the comments or labels.<br /><br />
+
+Multilingualism should be supported by the vocabulary, i.e. all the elements of the vocabulary should have labels, definitions and comments available in the government's official language(s), e.g. Spanish, and at least in English.
+The documentation should also supply appropriate language tags for the comments and labels.<br /><br />
+
For example, for the same term <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor">Contributor</a><br />
+
<code>rdfs:label "Contributor"@en, "Colaborador"@es<br />
- rdfs:comment "Examples of a Contributor include a person, an organization, or a
-service"@en , "Ejemplos de collaborator incluyen persona, organización o servicio"@es<br /></code>
+ rdfs:comment "Examples of a Contributor include a person, an organization, or a service"@en , "Ejemplos de colaborador incluyen persona, organización o servicio"@es<br /></code>
</p>
<p class="highlight"><b>Vocabularies SHOULD be used by other datasets</b><br />
- <i>What it means:</i> If the vocabulary is used by other authoritative Linked Open datasets that
-is helpful. It is in re-use of vocabularies that we achieve the benefits of Linked Open Data.
-Selected vocabularies from third parties should be already in use by other datasets, as this shows
-that they are already established in the LOD community, and thus better candidates for wider adoption and reuse. <br /><br />
- For example: An analysis on the <a href="http://stats.lod2.eu/vocabularies">use of vocabularies</a> on
+It is helpful if the vocabulary is already used by other authoritative Linked Open Data datasets; it is through the re-use of vocabularies that the benefits of Linked Open Data are achieved. Vocabularies selected from third parties should already be in use by other datasets, as this shows that they are established in the LOD community and are therefore better candidates for wider adoption and reuse.<br /><br />
+
+For example, an analysis of the <a href="http://stats.lod2.eu/vocabularies">use of vocabularies</a> on
the Linked Data cloud reveals that <a href="http://xmlns.com/foaf/0.1">FOAF</a> is reused by more than 55 other vocabularies.
</p>
<p class="highlight"><b>Vocabularies SHOULD be accessible for a long period</b><br />
- <i>What it means:</i> The vocabulary selected should provide some guarantee of maintenance over a specified
+
+The vocabulary selected should provide some guarantee of maintenance over a specified
period, ideally indefinitely.
</p>
<p class="highlight"><b>Vocabularies SHOULD be published by a trusted group or organization</b><br />
- <i>What it means:</i> Although anyone can create a vocabulary, it is always better to check
+Although anyone can create a vocabulary, it is always better to check
if it is one person, group or authoritative organization that is responsible for publishing and maintaining the vocabulary.
</p>
<p class="highlight"><b>Vocabularies SHOULD have persistent URLs</b><br />
- <i>What it means:</i> Persistent access to the server hosting the vocabulary, facilitating reusability is necessary.<br /><br />
- Example: The <a href="http://lov.okfn.org/dataset/lov/details/vocabulary_geo.html">Geo W3C vocabulary</a> [[vocab-geo]]
-is one of the most used vocabularies for a basic representation of geometry points (latitute/longitude) and has been around
-since 2009, always available at the same namespace.
+Persistent access to the server hosting the vocabulary is necessary to facilitate reuse.<br /><br />
+
+Example: The <a href="http://lov.okfn.org/dataset/lov/details/vocabulary_geo.html">Geo W3C vocabulary</a> [[vocab-geo]] is one of the most used vocabularies for a basic representation of geographic points (latitude/longitude) and has been around since 2009, always available at the same namespace.
</p>
-<p class="highlight"><b>Vocabularies should provide a versioning policy</b><br />
- <i>What it means:</i> The publisher ideally will address compatibility of versions over time. Major
-changes to the vocabularies should be reflected in the documentation.
+<p class="highlight"><b>Vocabularies SHOULD provide a versioning policy</b><br />
+
+Ideally, the publisher will address compatibility between versions over time. Major changes to the vocabularies should be reflected in the documentation.
</p>
+
</section>
-<!-- << Vocabulary creation -->
-<section id="MODEL">
+<!-- Vocabulary creation -->
+
+<section id="VOCAB-CREATION">
<h2>Vocabulary Creation</h2>
-<p><i>There will be cases in which authorities will need to mint their own vocabulary terms. This
-section provides a set of considerations aimed at helping to government stakeholders mint
-their own vocabulary terms. This section includes some items of the previous section
-because some recommendations for vocabulary selection also apply to vocabulary creation.</i> </p>
-<p class="note"> Ensure new vocabularies you create are:
-<b>self-descriptive </b>,
-<b>described in more than one language</b> (ideally),
-<b>accessible for a long period</b> ,
-<b>link to other vocabularies by re-using elements rather than re-inventing</b> ,
-<b>on the Web at a stable URI using an open license </b>.
-</p>
+<p class="note">
+This section provides a set of informative considerations for stakeholders who need to create their own vocabularies. It repeats some items from the previous section because some recommendations for vocabulary selection also apply to vocabulary creation.
+</p>
<p class="highlight"><b>Define the URI of the vocabulary.</b><br />
- <i>What it means:</i> The URI that identifies your vocabulary must be defined. This is strongly
-related to the Best Practices described in section URI Construction.
- <br /><br />
+
+The URI that identifies your vocabulary must be defined. This is strongly related to the best practices described in the <a href="#HTTPURIS">URI Construction</a> section.<br /><br />
For example: If we are minting new vocabulary terms from a particular government, we
should define the URI of that particular vocabulary.
@@ -668,20 +704,24 @@
</p>
<p class="highlight"><b>Vocabularies should be described in more than one language</b><br />
- <i>What it means:</i> Multilingualism should be supported by the vocabulary, i.e., all the elements of the vocabulary should have labels, definitions and comments available in the government's official language, e.g., Spanish, and at least in English. That is also very important as the documentation should be clear enough with appropriate tag for the language used for the comments or labels.<br /><br />
+
+Multilingualism should be supported by the vocabulary, i.e., all the elements of the vocabulary should have labels, definitions and comments available in the government's official language, e.g., Spanish, and at least in English. This is also very important because the documentation should use appropriate language tags for the comments and labels.<br /><br />
+
For example, for the same term <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor"><code>Contributor</code></a><br />
<code>rdfs:label "Contributor"@en, "Colaborador"@es<br />
rdfs:comment "Examples of a Contributor include a person, an organization, or a service"@en , "Ejemplos de collaborator incluyen persona, organización o servicio"@es<br /></code>
</p>
<p class="highlight"><b>Vocabularies should provide a versioning policy</b><br />
- <i>What it means:</i> It refers to the mechanism put in place by the publisher to always take
+
+A versioning policy is the mechanism put in place by the publisher to always take
care of backward compatibilities of the versions, the ways those changes affected the previous
versions. Major changes of the vocabularies should be reflected on the documentation, in both machine or human-readable formats.
</p>
<p class="highlight"><b>Vocabularies should provide documentation</b><br />
- <i>What it means:</i> A vocabulary should be well-documented for machine readable (use of labels and
+
+A vocabulary should be well documented for machine readability (use of labels and
comments; tags to language used). Also for human-readable, an extra documentation should be provided by
the publisher to better understand the classes and properties, and if possible with some
valuable use cases. <b>Provide human-readable documentation and basic metadata such as
@@ -689,7 +729,8 @@
</p>
<p class="highlight"><b>Vocabulary should be published following available best practices</b><br />
- <i>What it means:</i> <b>Publish your vocabulary on the Web at a stable URI using an open license.</b>. One
+
+<b>Publish your vocabulary on the Web at a stable URI using an open license.</b> One
of the goals is to contribute to the community by sharing the new vocabulary. To this end,
it is recommended to follow available recipes for publishing RDF vocabularies e.g.
<a href="http://www.w3.org/TR/swbp-vocab-pub/">Best Practice Recipes for Publishing RDF Vocabularies</a> [[bp-pub]].
@@ -697,70 +738,7 @@
</section>
-<!-- Vocabulary Creation -->
-<!--TODO: put this in other sections without mentioning 5-stars: recommendation of Sandro -->
-
-<!--<br />
-<b>Is your Linked Data Vocabulary 5-star?</b>
-<br />
-Inspired by the 5-star linked data scale <a href="http://5stardata.info/">5-Star Scheme</a>, suggestions on creating a 5-star vocabulary <a href="http://bvatant.blogspot.fr/2012/02/is-your-linked-data-vocabulary-5-star_9588.html">[[BV5STAR]]</a>. -->
-
-<!--<p class="highlight">☆ <b>Publish your vocabulary on the Web at a stable URI using an open license.</b>
-</p> -->
-
-<!--<p class="highlight">☆☆ <b>Provide human-readable documentation and basic metadata such as creator, publisher, date of creation, last modification, version number.</b>
-</p> -->
-
-<!--<p class="highlight">☆☆☆ <b>Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes.</b>
-</p> -->
-
-<!--<p class="highlight">☆☆☆☆ <b>Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation.</b>
-</p> -->
-
-<!--<p class="highlight">☆☆☆☆☆ <b>Link to other vocabularies by re-using elements rather than re-inventing.</b>
-</p> -->
-
-
-<section id="multilingual">
-<h2>Multilingual Vocabularies</h2>
-<p class="highlight"><b>While designing a vocabulary, provide labels and descriptions
-if possible, in several languages, to make your vocabulary usable in multiple linguistic scopes.</b> </p>
-<p>
-This section provides some considerations when we are dealing with multilingualism
-in vocabularies. For more details on the multilingualism on the Web, see the
-<a href="http://www.w3.org/International/multilingualweb/lt/"> MultilingualWeb-LT Working Group</a> </p>
-<p> We have identified that multilingualism in vocabularies can be found nowadays in the following formats:
-</p>
-<ul>
- <li>As a set of <code>rdfs:label</code> in which the language has been restricted
-(@en, @fr...). Currently, this is the most commonly used approach. </li>
- <!--remove this, suggested by Phil? It is also a best practice to always include an <code>rdfs:label</code> for which the language tag in not indicated. This term corresponds to the <b>"default"</b>language of the vocabulary</li> -->
-
- <li>As <code>skos:prefLabel</code> (or <code>skosxl:Label</code>), in which the language has also been restricted.</li>
- <li>As a set of monolingual ontologies (ontologies in which labels are expressed in one natural language) in
-the same domain mapped or aligned to each other (see the example of EuroWordNet, in which wordnets in
-different natural languages are mapped to each other through the so-called <code>ILI - inter-lingual-index-</code>,
-which consists of a set of concepts common to all categorizations).</li>
- <li>As a set of ontology + lexicon. This is an approach to the representation
-of linguistic (multilingual) information associated to ontologies. The idea is that the ontology
-is associated to an external ontology of linguistic descriptions. One of the best exponents
-in this case is the <a href="http://lexinfo.net/">lemon model</a>, an ontology of linguistic
-descriptions that is to be related with the concepts and properties in an ontology to provide
-lexical, terminological, morphosyntactic, etc., information. One of the main advantages of
-this approach is that semantics and linguistic information are kept separated. One can link several
-lemon models in different natural languages to the same ontology.</li>
- <li> It could be also useful to use the <a href="http://www.lexinfo.net/lmf#">lexInfo</a> ontology
-where they provide stable resources for languages, such as
-<a href="http://lexvo.org/id/iso639-3/eng"><code>http://lexvo.org/id/iso639-3/eng</code></a>
-for English, or <a href="http://lexvo.org/id/iso639-3/cmn"><code>http://lexvo.org/id/iso639-3/cmn</code></a> for Chinese Mandarin. </li>
-</ul>
-<p class="note">The current trend is to follow the first approach, i.e. to
-use at least a <code>rdfs:label</code> and <code>rdfs:comment</code> for each term in the vocabulary.</p>
-
-</section>
-
-
-<!-- Using SKOS to create a controlled vocavulary -->
+<!-- Using SKOS to create a controlled vocabulary -->
<section id='skos'>
<h2>Using SKOS to Create a Controlled Vocabulary</h2>
@@ -791,8 +769,87 @@
</div>
</section>
-<!-- NOTE TO EDITORS: These are notes, not cogent content for a BP doc. Major editing required!
+<section id="multilingual">
+<h2>Multilingual Vocabularies</h2>
+
+<p>
+This section is not comprehensive; however, it mentions some of the issues identified by the Working Group and some of the work performed by others in relation to publishing Linked Data in multiple languages. For more details on multilingualism on the Web, see the <a href="http://www.w3.org/International/multilingualweb/lt/">MultilingualWeb-LT Working Group</a>.
+</p>
+
+<p class="highlight"><b>Multilingual Vocabularies broaden Search</b><br />
+
+As of the writing of this Note, many existing Linked Data vocabularies are available only in English. This may make your content harder to find for multilingual search engines and non-English speakers.</p>
+
+<p class="highlight"><b>When designing a vocabulary, provide labels and descriptions, if possible in several languages, to make the vocabulary usable by a global audience.</b> </p>
+
+<p> Multilingual vocabularies may be found in the following formats:
+</p>
+<ul>
+ <li>As a set of <code>rdfs:label</code> in which the language has been restricted (@en, @fr...). Currently, this is the most commonly used approach. </li>
+
+ <li>As <code>skos:prefLabel</code> (or <code>skosxl:Label</code>), in which the language has also been restricted.</li>
+
+ <li>As a set of monolingual ontologies (ontologies in which labels are expressed in one natural language) in
+the same domain mapped or aligned to each other (see the example of EuroWordNet, in which wordnets in
+different natural languages are mapped to each other through the so-called Inter-Lingual-Index (<code>ILI</code>),
+which consists of a set of concepts common to all categorizations).</li>
+
+ <li>As an ontology plus a lexicon. This is an approach to the representation
+of linguistic (multilingual) information associated with ontologies. The idea is that the ontology
+is associated with an external ontology of linguistic descriptions. One of the best examples
+in this case is the <a href="http://lexinfo.net/">lemon model</a>, an ontology of linguistic
+descriptions that is linked to the concepts and properties in an ontology to provide
+lexical, terminological, morphosyntactic and other information. One of the main advantages of
+this approach is that semantic and linguistic information are kept separate. Several
+lemon models in different natural languages can be linked to the same ontology.</li>
+
+ <li>It could also be useful to use the <a href="http://www.lexinfo.net/lmf#">lexInfo</a> ontology
+together with stable URIs for languages, such as those published by Lexvo.org:
+<a href="http://lexvo.org/id/iso639-3/eng"><code>http://lexvo.org/id/iso639-3/eng</code></a>
+for English, or <a href="http://lexvo.org/id/iso639-3/cmn"><code>http://lexvo.org/id/iso639-3/cmn</code></a> for Mandarin Chinese. </li>
+</ul>
+
+<p class="note">The current trend is to follow the first approach, i.e. to use at least an <code>rdfs:label</code> and an <code>rdfs:comment</code> for each term in the vocabulary.</p>
+
+</section>
+
+
+<!--
+
+NOTE TO EDITORS: This was deemed too technical for a general Linked Data best practices guide but is left for your future consideration.
+
+<section id='conformanceForVocabs'>
+
+<h2>Conformance for Vocabularies</h2>
+A data interchange, however that interchange occurs, is <b>conformant</b> with a vocabulary if:
+
+<ul>
+<li>it is within the scope and objectives of the vocabulary;</li>
+
+<li>the classes and properties defined in the vocabulary are used in a way consistent with the semantics declared in its specification;</li>
+
+<li> it does not use terms from other vocabularies instead of ones defined in the vocabulary that could reasonably be used.</li>
+
+</ul>
+A conforming data interchange:
+<ul>
+ <li>MAY include terms from other vocabularies;</li>
+ <li>MAY use a non-empty subset of terms from the vocabulary.</li>
+</ul>
+A vocabulary profile is a specification that adds constraints to a vocabulary.
+Such additional constraints in a profile may include:
+<ul>
+<li>a minimum set of terms that must be used;</li>
+<li>classes and properties not covered in the vocabulary; </li>
+<li>controlled vocabularies or URI sets as acceptable values for properties.</li>
+</ul>
+</section>
+
+-->
+
+
+<!-- NOTE TO EDITORS: These are useful notes that we didn't have time to properly edit in the December 2013 BP doc. Major editing is required.
<section id="howto">
<h2>Best Practice for choosing entity URIs</h2>
@@ -868,53 +925,50 @@
</section>
-->
-<!-- << URI CONSTRUCTION -->
+<!-- URI CONSTRUCTION -->
<section id="HTTPURIS">
<h2>URI Construction</h2>
-<!--<p class="issue"> The editors will rephrase better this content and may extend it </p> -->
-The following guidance is provided with the intention to address URI minting, i.e., URI
-creation for vocabularies, concepts and datasets. This section specifies how to create good
-URIs for use in government linked data. Input documents include:
+<p>
+The following guidance has been developed by organizations involved in URI strategy and implementation for government agencies:
<ul>
<li>Cool URIs for the Semantic Web [[COOLURIS]]</li>
+
<li><a href="http://data.gov.uk/resources/uris" title="Creating URIs | data.gov.uk">Designing URI</a> Sets for the UK Public Sector [[uk-govuri]]</li>
+
+ <li><a href="http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector">Designing URI Sets for the UK Public Sector</a>, a document from the UK Cabinet Office that defines the
+design considerations for how URIs can be used to publish public sector reference data;</li>
+
<!--<li><a href="http://data.gov.uk/resources/uris" title="Creating URIs | data.gov.uk">Creating URIs</a> (data.gov.uk).</li> -->
<li> <a href="http://philarcher.org/diary/2013/uripersistence/">10 rules for persistent URI</a> </li>
+
<li> <a href="http://www.w3.org/2013/04/odw/odw13_submission_14.pdf">Draft URI Strategy for the NL Public Sector</a> (PDF) </li>
+</ul>
+</p>
+
+<p>
+General-purpose guidelines exist for the URI designer to consider, including:
+<ul>
+ <li> <a href="http://www.w3.org/TR/cooluris/">Cool URIs for the Semantic Web</a>, which provides guidance on
+how to use URIs to describe things that are not Web documents; </li>
+
<li> <a href="http://dcevents.dublincore.org/index.php/IntConf/dc-2011/paper/download/47/15">Style Guidelines for Naming and Labeling Ontologies in the Multilingual Web</a> (PDF)</li>
</ul>
-
-<!--Removed suggestion of Boris
-<p>The purpose of URIs is to uniquely and reliably name resources on the Web. According to Cool URIs for the Semantic Web [[COOL-SWURIS]] (W3C IG Note), URIs should be designed with simplicity, stability and manageability in mind, thinking about them as identifiers rather than as names for Web resources.
-</p> -->
-
+</p>
+</section>
-<p>
-Many general-purpose guidelines exist for the URI designer to consider, including
-<a href="http://www.w3.org/TR/cooluris/">Cool URIs for the Semantic Web</a>, which provides guidance on
-how to use URIs to describe things that are not Web documents;
-<a href="http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector">Designing URI
-Sets for the UK Public Sector</a>, a document from the UK Cabinet offices that defines the
-design considerations on how to URIs can be used to publish public sector reference data; and (3)
-<a href="http://dcevents.dublincore.org/index.php/IntConf/dc-2011/paper/download/47/15">Style
-Guidelines for Naming and Labeling Ontologies in the Multilingual Web</a> (PDF), which proposes
-guidelines for designing URIs in a multilingual scenario.
-</p>
-
-<h3>URI Design Principles</h3>
+<!-- URI PRINCIPLES -->
+<section id="URIPRINCIPLES">
+<h2>URI Design Principles</h2>
<p>The Web makes use of the <a href="http://www.w3.org/TR/ld-glossary/#uri">URI</a>
as a single global identification system. The global scope of URIs promotes large-scale
"network effects". Therefore, in order to benefit from the value of LD, government and governmental
agencies need to identify their <a href="http://www.w3.org/TR/ld-glossary/#resource">resources</a> using
URIs. This section provides a set of general principles aimed at helping government stakeholders
-to define and manage URIs for their resources.</p>
-
-<!--
-
-NOTE TO EDITORS: (BOH) suggests this does not belong in a best practices document.
+to define and manage URIs for their resources.
+</p>
<p class="highlight"><b>Use HTTP URIs</b><br />
-<i>What it means:</i> To benefit from and increase the value of the World Wide Web, governments and
+To benefit from and increase the value of the World Wide Web, governments and
agencies SHOULD provide HTTP URIs as identifiers for their resources. There are many
benefits to participating in the existing network of URIs, including linking, caching, and indexing
by search engines. As stated in [[howto-lodp]], HTTP URIs enable people to "look-up" or
@@ -923,7 +977,7 @@
</p>
<p class="highlight"><b>Provide at least one machine-readable representation of the resource identified by the URI</b><br />
-<i>What it means:</i> In order to enable HTTP URIs to be "dereferenced", data publishers have
+In order to enable HTTP URIs to be "dereferenced", data publishers have
to set up the necessary infrastructure elements (e.g. TCP-based HTTP servers) to
serve representations of the resources they want to make available (e.g. a human-readable
HTML representation or a machine-readable Turtle). A publisher may supply zero or
@@ -933,13 +987,9 @@
</p>
<p class="highlight"><b>A URI structure will not contain anything that could change</b><br />
-<i>What it means:</i> It is good practice that URIs do not contain anything that
-could easily change or that is expected to change like session tokens or other
-state information. URIs should be stable and reliable in order to maximize the possibilities
-of reuse that Linked Data brings to users. There must be a balance between making URIs
+It is good practice for URIs not to contain anything that could easily change or that is expected to change, such as session tokens or other state information. URIs should be stable and reliable in order to maximize the opportunities for reuse that Linked Data brings to users. There must be a balance between making URIs
readable and keeping them more stable by removing descriptive information that will likely
-change. For more information on this see [MDinURI] and
-<a href="http://www.w3.org/TR/webarch/#uri-opacity">Architecture of the World Wide Web: URI Opacity</a>.
+change. For more information on this see [MDinURI] and <a href="http://www.w3.org/TR/webarch/#uri-opacity">Architecture of the World Wide Web: URI Opacity</a>.
</p>
<p class="highlight"><b>URI Opacity</b><br />
@@ -953,45 +1003,11 @@
However, Web clients accessing such URIs SHOULD NOT parse or otherwise read into the meaning of URIs.
</p>
--->
-
-<!--
-
-NOTE TO EDITORS: (BOH) I do not think discussion of the TAG belongs in a BP document.
-
-<p class="note"><b>W3C Technical Architecture Group (TAG)</b><br />
-The World Wide Web Consortium's (W3C's) <a href="http://www.w3.org/2001/tag/">Technical Architecture
-Group (TAG)</a> is a special working group within the W3C, in charge of resolving issues involving
-general Web architecture. The group maintains a list of <a href="http://www.w3.org/2001/tag/#publications">publications</a> and
-findings, such as the architecture of the World Wide Web. [[webarch]]</p>
-
-<p ><b>TAG advices on http issues</b><br />
-The TAG provides advice to the community that they may mint
-<a href="http://www.w3.org/TR/ld-glossary/#uniform-resource-identifier">HTTP URIs</a> for
-any resource provided that they follow this simple rule for the sake of removing ambiguity as below:
-
-<div class="highlight">
- <ul>
- <li> If an <code>"http"</code> resource responds to a <code>GET</code> request with a <code>2xx</code> response, then the resource identified by that URI is an information resource;</li>
- <li> If an <code>"http"</code> resource responds to a <code>GET</code> request with a <code>303</code> (See Other) response, then the resource identified by that URI could be any resource;</li>
- <li> If an <code>"http"</code> resource responds to a <code>GET</code> request with a <code>4xx</code> (error) response, then the nature of the resource is unknown.
- </li>
-</ul>
-</div>
-
-<p>
- Linked Data and Semantic Web implementers have the requirement to return an HTTP 303 (See Other)
-response when resolving HTTP URI identifiers for conceptual or physical resources (that is,
-for resources whose canonical content is non-informational in nature. Current implementations of
-the Persistent URL (PURL) server provide support for 303 URIs [[Wood2010]]. Some issues
-remain unsettled and the TAG is most of the time involved to coordinate and make recommendations to implementers.
-</p>
-
+
</section>
--->
-<!-- NOTE TO EDITORS: (Bernadette) If we provide a list of questions, we must provide answers too! This is incomplete 'as is'
+<!-- NOTE TO FUTURE EDITORS: We did not have enough time to polish this and therefore chose to omit it at this time. It is left for your future consideration.
<section>
<h5>A Checklist for Constructing URIs</h5>
@@ -1054,17 +1070,9 @@
-->
<section id="URI-POLICY">
-<h3>URI Policy for Persistence</h3>
-<!-- <p class="todo">To Review: Bernadette Hyland, John Erickson</p> -->
-
-<!-- Acknowledge D. Wood, "Reliable and Persistent Identification of Linked Data Elements" LED chapter, 2010 -->
+<h2>URI Policy for Persistence</h2>
-<p>Persistent identifiers are used by organizations interested in retaining addresses to information
-resources over the long term. Today, persistent identifiers are used to uniquely identify
-objects in the real world and concepts, in addition to information resources. For example, persistent
-identifiers have been created by the United Nations Food and Agriculture Organization (FAO) to
-provide URIs for major food crops. The National Center for Biomedical Ontology provides persistent identifiers
-to unify and address the terminology used in many existing biomedical databases.
+<p>Persistent identifiers are used to retain addresses of information resources over the long term. They are also used to uniquely identify real-world objects and concepts, in addition to information resources. For example, persistent identifiers have been created by the United Nations Food and Agriculture Organization (FAO) to provide URIs for major food crops. The National Center for Biomedical Ontology provides persistent identifiers to unify and address the terminology used in many existing biomedical databases. The US Government Printing Office uses persistent identifiers to point to documents, such as the U.S. Budget, that are deemed essential to a democratic, transparent government.
</p>
<p>
@@ -1082,21 +1090,13 @@
<p>
PURLs implement one form of persistent identifier for virtual resources. Other persistent identifier
-schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. All persistent identification
-schemes provide unique identifiers for (possibly changing) virtual resources, but not all schemes provide curation
-opportunities. Curation of virtual resources has been defined as, <b>“the active involvement
+schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. All persistent identification schemes provide unique identifiers for (possibly changing) virtual resources, but not all schemes provide curation opportunities. Curation of virtual resources has been defined as <b>“the active involvement
of information professionals in the management, including the preservation, of digital data
-for future use.”</b> [[yakel-07]] For a persistent identification scheme to provide a
-curation opportunity for a virtual resource, it must allow real-time resolution of that
-resource and also allow real-time administration of the identifier.
+for future use.”</b> [[yakel-07]] For a persistent identification scheme to provide a curation opportunity for a virtual resource, it must allow real-time resolution of that resource and also allow real-time administration of the identifier.
</p>
<p>URI persistence is a matter of policy and commitment on the part of the URI owner. The
-choice of a particular URI scheme provides no guarantee that those URIs
-will be persistent or that they will not be persistent. HTTP [[RFC2616]] has been designed to help
-manage URI persistence. For example, HTTP redirection (using the 3xx response codes) permits servers to
-tell an agent that further action needs to be taken by the agent in order to fulfill the request
-(for example, a new URI is associated with the resource).
+choice of a particular URI scheme provides no guarantee that those URIs will be persistent or that they will not be persistent. HTTP [[RFC2616]] has been designed to help manage URI persistence. For example, HTTP redirection (using the 3xx response codes) permits servers to tell an agent that further action needs to be taken by the agent in order to fulfill the request (for example, a new URI is associated with the resource).
</p>
<p>In addition, content negotiation also promotes consistency, as a site manager is not required to
@@ -1107,37 +1107,20 @@
</section>
-<section id="HUMAN">
-<h3>Internationalized Resource Identifiers</h3>
-
-<p><i>This section on Internationalized Resource Identifiers focuses on using non-ASCII characters in URIs
-and provides guidelines for those interested in minting URIs in their own languages (German, Dutch,
-Spanish, French, Chinese, etc.)</i></p>
+<section id="INTERNATIONAL">
+<h2>Internationalized Resource Identifiers</h2>
-<p>The URI syntax defined in [[RFC3986]]</a> STD 66 (Uniform Resource Identifier (URI): Generic Syntax) restricts
-URIs to a small number of characters: basically, just upper and lower case letters of the English
-alphabet, European numerals and a small number of symbols. There is now a growing need to
-enable use of characters from any language in URIs.
-</p>
-
-<p>The purpose of this section is to provide guidance to government stakeholders who are planning to create URIs
-using characters that go beyond the subset defined in [[RFC3986]]</a>.
-</p>
-
-<p>
-<a href='http://www.w3.org/TR/ld-glossary/index.html#iri'>IRI</a> (<a href="http://tools.ietf.org/html/rfc3987">RFC 3987</a>) is
-a new protocol element, that represents a complement to the Uniform Resource Identifier (URI). An IRI is
-a sequence of characters from the Universal Character Set (Unicode/ISO 10646) that can be therefore
-used to mint identifiers that use a wider set of characters than the one defined in [[RFC3986]]</a>.
+<p>Stakeholders who are planning to create URIs using characters beyond the subset defined in [[RFC3986]] are encouraged to consult the <a href='http://www.w3.org/TR/ld-glossary/index.html#iri'>IRI</a> specification (<a href="http://tools.ietf.org/html/rfc3987">RFC 3987</a>). An IRI is a protocol element that complements the Uniform Resource Identifier (URI): it is a sequence of characters from the Universal Character Set (Unicode/ISO 10646) and can therefore be used to mint identifiers that use a wider set of characters than the one defined in [[RFC3986]].
</p>
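<p>RFC 3987 also defines how to map an IRI to a URI: characters outside the URI repertoire are converted to UTF-8 and each resulting byte is percent-encoded. The Python sketch below approximates that mapping with <code>urllib.parse.quote</code>. The function name <code>iri_to_uri</code> is hypothetical, the sketch assumes the host name is already ASCII, and normalizing to NFC first is a design choice of the sketch rather than a requirement stated here.</p>

```python
import unicodedata
from urllib.parse import quote

# Characters left untouched: RFC 3986 reserved characters plus "%", so that
# existing percent-escapes are not encoded twice. quote() always leaves
# ASCII letters, digits and "_.-~" unencoded.
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding the UTF-8 bytes of non-URI characters."""
    # NFC-normalize first so that visually identical input maps to the same URI.
    normalized = unicodedata.normalize("NFC", iri)
    return quote(normalized, safe=_KEEP, encoding="utf-8")


print(iri_to_uri("http://example.org/ciudad/Málaga"))
```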
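-<p>The Internationalized Domain Name or IDN is a standard approach to dealing with multilingual domain
-names was agreed by the <a href="http://www.w3.org/TR/ld-glossary/#internet-engineering-task-force-ietf">IETF</a> in March 2003.
+<p>The Internationalized Domain Name (IDN) is a standard approach to dealing with multilingual domain
+names that was agreed by the <a href="http://www.w3.org/TR/ld-glossary/#internet-engineering-task-force-ietf">IETF</a> in March 2003.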
</p>
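<p>Internationalized domain names are carried in DNS as ASCII-compatible labels prefixed with <code>xn--</code> (Punycode). Python's built-in <code>idna</code> codec implements the March 2003 specification (IDNA 2003, RFC 3490), so a sketch of the conversion in both directions needs only the standard library. The helper names are hypothetical; applications that need the later IDNA 2008 rules would typically use the third-party <code>idna</code> package instead.</p>

```python
def to_ascii_host(host: str) -> str:
    """Convert an internationalized domain name to its ASCII-compatible form."""
    # The built-in "idna" codec turns each non-ASCII label into an "xn--" Punycode label.
    return host.encode("idna").decode("ascii")


def to_unicode_host(host: str) -> str:
    """Convert an ASCII-compatible domain name back to its Unicode form."""
    return host.encode("ascii").decode("idna")


print(to_ascii_host("münchen.de"))            # xn--mnchen-3ya.de
print(to_unicode_host("xn--mnchen-3ya.de"))   # münchen.de
```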
+<p><i>Internationalized Resource Identifiers allow non-ASCII characters in Web identifiers, which is relevant to organizations interested in minting identifiers in languages such as German, Dutch, Spanish, French and Chinese.</i></p>
+
<p>Although there exist some standards focused on enabling the use of international characters in Web
-identifiers, government stakeholders need to take into account several issues before constructing such internationalized identifiers.
-This section is not meant to be exhaustive and we point the interested audience to
+identifiers, government stakeholders need to take into account several issues before constructing such internationalized identifiers. This section is not exhaustive; the editors point interested readers to
-<a href="http://www.w3.org/International/articles/idn-and-iri/">An Introduction to Multilingual Web Addresses</a>,
-however some of the most relevant issues are following:
+<a href="http://www.w3.org/International/articles/idn-and-iri/">An Introduction to Multilingual Web Addresses</a>
+for a fuller treatment. Some of the most relevant issues are the following:
</p>
@@ -1151,113 +1134,13 @@
</li>
</ul>
-</section>
-
-
-<!-- SPECIFY LICENSE -->
-<section id="LICENSE">
-<h2>Specifying an Appropriate License</h2>
-
-<p>
-Specify an appropriate open license with the published data. People will only reuse data when there is a clear, acceptable license associated with it. Governments typically define ownership of works produced by government employees or contractors in legislation.
-</p>
-
-<p>
-It is beyond the charter of this working group to describe and recommend appropriate licenses for
-Open Government content published as Linked Data, however there are useful Web sites that
-offer detailed guidance and licenses. One valuable resource is the <a href="http://creativecommons.org/">Creative
-Commons</a> Web site. Creative Commons develops, supports, and stewards legal
-and technical infrastructure for digital content publishing.
+<p>The URI syntax defined in [[RFC3986]] (STD 66, Uniform Resource Identifier (URI): Generic Syntax) restricts
+URIs to a small set of characters: essentially the upper and lower case letters of the English
+alphabet, the digits 0 to 9 and a small number of symbols. There is now a growing need to enable the use of characters from any language in Web identifiers.
</p>
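<p>The restriction described above can be checked with a short Python sketch that tests whether every character of a string belongs to the RFC 3986 unreserved or reserved sets (or is <code>%</code>, which introduces a percent-encoded octet). The function name <code>uses_uri_charset</code> is hypothetical, and the check covers only the character repertoire, not the full URI grammar.</p>

```python
import string

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# RFC 3986: reserved = gen-delims ":/?#[]@" plus sub-delims "!$&'()*+,;="
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")
# "%" is accepted because it introduces a percent-encoded octet.
ALLOWED = UNRESERVED | RESERVED | {"%"}


def uses_uri_charset(text: str) -> bool:
    """Return True if every character may appear literally in a URI."""
    return all(ch in ALLOWED for ch in text)


print(uses_uri_charset("http://example.org/id/school/123"))
print(uses_uri_charset("http://example.org/ciudad/Málaga"))
```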
-<p class="note">
-As an informative note, the UK and many former Commonwealth countries maintain the concept of
-the Crown Copyright. It is important to know who owns your data and to say so. The US
-Government designates information produced by civil servants as a U.S. Government Work, whereas
-contractors may produce works under a variety of licenses and copyright assignments. U.S.
-Government Works are not subject to copyright restrictions in the United States. It
-is critical for US government officials to know their rights and responsibilities under the Federal
-Acquisition Regulations (especially FAR Subpart 27.4, the Contract Clauses in 52.227-14, -17 and -20 and
-any agency-specific FAR Supplements) and copyright assignments if data is produced by
-a government contractor. It is recommended that governmental authorities publishing
-Linked Data review the relevant guidance for data published on the Web.
-</p>
-</section>
-
-<!-- DOMAIN AND HOSTING -->
-<section id="HOST">
-
-<h2>Domain and Hosting</h2>
-
-<!-- <p class='todo'>To drop must part of this section, Dave comments</p> -->
-
-<p>Within government agencies, hosting linked data may require submission and review
-of a security plan to the authority's security team. While security plan specifics will
-vary widely based on a range of factors like hosting environment and software
-configuration, the process for developing and getting a security plan approved can be
-streamlined if the appropriate advisors are involved early on in the process</p>
-
-<p>Security plans are typically comprised of a set of security controls, describing physical, procedural,
-technical and other processes and controls in a system which are in place to protect
-information access, availability and integrity, and for avoiding, counteracting and minimizing security
-risks. These are typically comprised of several layers, that can range from physical
-facility security, network and communications, to considerations of operating system, software, integration
-and many other elements. As such, there will typically be some common security controls which are inherited, and which may not be specific or unique to the linked data implementation, such as controls inherited from the hosting environment, whether cloud hosting provider, agency data center, et cetera. Additionally, some security controls will be inherited from the software vendors.</p>
-
-<p> Detailed considerations of security issues are beyond the scope of this document. </p>
-
-</section>
-
-
-<!-- SOCIAL_CONTRACT -->
-<section id="SOCIAL-CONTRAT">
-<h2>Publishers' "Social Contract"</h2>
+</section>
-<!-- <p class='todo'>To Review: Bernadette Hyland </p> -->
-
-<p>
-Publishers of Linked Data enter into an implicit social contract with users of their data.
-Publishers should recognize the responsibility to maintain data once it is published
-by a government authority. Ensure that the Linked Open dataset(s) your organization publishes
-remains available where you say it will be. Here is a summary of best practices
-that relate to the implicit "social contract". Additional informational details are included for reference.
-</p>
-<div class="note">
-<ul class="highlight">
-<li>Publish a description for each published dataset using [[vocab-dcat]] or [[void]] vocabulary;</li>
-<li>Associate metadata on the frequency of data updates;</li>
-<li>Associate a government appropriate license with all content your agency publishes if you wish to encourage re-use;</li>
-<li>Plan and implement a persistence strategy;</li>
-<li>Ensure data is accurate to the greatest degree possible;</li>
-<li>Publish an email address to report problematic data;</li>
-<li>Ensure the contact person or team responds to enquires via email or telephone, if necessary.</li>
-</ul> </div>
-
-<p>
-Giving due consideration to your organization's URI strategy should be one of the first activities
-your team undertakes as they prepare a Linked Open Data strategy. Authoritative data requires the
-permanence and resolution of HTTP URIs. If publishers move or remove data that was published
-to the Web, third party applications or mashups may break. This is considered rude for obvious
-reasons and is the basis for the Linked Data "social contract." A good way to prevent causing HTTP
-404s is for your organization to implement a persistence strategy.
-</p>
-
-</section>
-
-
-
-<section id="PROV">
-<h2>Provenance</h2>
-<!-- <p class='todo'>John Erickson (RPI)</p> -->
-
-<p>
-Provenance is information about entities, activities, and people involved in producing a piece of
-data or thing, which can be used to form assessments about its quality, reliability or
-trustworthiness. The <code>PROV</code> Family of Documents [[prov-o]] defines a model,
-corresponding serializations and other supporting definitions to enable the inter-operable
-interchange of provenance information in heterogeneous environments such as the Web.
-</p>
-</section>
<!-- << STABILITY.overview -->
<section id="stability-prop">
@@ -1278,38 +1161,71 @@
</section>
-
+<!--
-<!-- REFERENCE: LINKED DATA COOKBOOK -->
-<!-- <section>
- <h2>References</h2>
- <h3>Linked Open Data Cookbook</h3>
- <p>
-See <a href="http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook">Cookbook for Open Government Linked Data</a>.
+NOTE TO EDITORS: (Per Bernadette) I do not think discussion of the TAG belongs in a BP document.
+
+<p class="note"><b>W3C Technical Architecture Group (TAG)</b><br />
+
+The World Wide Web Consortium's (W3C's) <a href="http://www.w3.org/2001/tag/">Technical Architecture
+Group (TAG)</a> is a special working group within the W3C, in charge of resolving issues involving
+general Web architecture. The group maintains a list of <a href="http://www.w3.org/2001/tag/#publications">publications</a> and
+findings, such as the architecture of the World Wide Web. [[webarch]]</p>
+
+<p><b>TAG advice on HTTP issues</b><br />
+The TAG advises the community that they may mint
+<a href="http://www.w3.org/TR/ld-glossary/#uniform-resource-identifier">HTTP URIs</a> for
+any resource, provided that they follow these simple rules, which remove ambiguity:</p>
+
+<div class="highlight">
+ <ul>
+ <li> If an <code>"http"</code> resource responds to a <code>GET</code> request with a <code>2xx</code> response, then the resource identified by that URI is an information resource;</li>
+ <li> If an <code>"http"</code> resource responds to a <code>GET</code> request with a <code>303</code> (See Other) response, then the resource identified by that URI could be any resource;</li>
+ <li> If an <code>"http"</code> resource responds to a <code>GET</code> request with a <code>4xx</code> (error) response, then the nature of the resource is unknown.
+ </li>
+</ul>
+</div>
+
+<p>
+Linked Data and Semantic Web implementers are required to return an HTTP 303 (See Other)
+response when resolving HTTP URI identifiers for conceptual or physical resources (that is,
+for resources whose canonical content is non-informational in nature). Current implementations of
+the Persistent URL (PURL) server provide support for 303 URIs [[Wood2010]]. Some issues
+remain unsettled, and the TAG is often involved in coordinating and making recommendations to implementers.
</p>
- </section>
+
+</section>
+
-->
+
<!-- ACK -->
<section class="appendix">
<h2>Acknowledgments</h2>
- This document has been produced by the Government Linked Data Working Group, and its contents reflect
-extensive discussion within the Working Group as a whole.
+
<p>
-The editors gratefully acknowledge the many contributors to this Best Practices document
-including: <a href="http://www.about.me/david_wood/">David Wood</a> (3 Round Stones),
-<a href="http://www.epimorphics.com">Dave Reynolds</a>, (Epimorphics),
-<a href="http://www.w3.org/People/#phila">Phil Archer</a>, (W3C / ERCIM),
-<a href="http://logd.tw.rpi.edu/person/john_erickson">John Erickson</a> (Rensselaer Polytechnic Institute),
-<a href="http://csarven.ca/">Sarven Capadisli</a>,
-<a href="http://data.semanticweb.org/person/bernard-vatant/">Bernard Vatant </a> (Semantic Web - Mondeca),
-Michael Pendleton (U.S. Environmental Protection Agency),
+The editors gratefully acknowledge the considerable contributions to the Linked Data Best Practices document made by the following people: <a href="http://www.about.me/david_wood/">David Wood</a> (3 Round Stones, USA),
+<a href="http://www.epimorphics.com">Dave Reynolds</a> (Epimorphics, UK),
+<a href="http://www.w3.org/People/#phila">Phil Archer</a> (W3C / ERCIM, UK),
+<a href="http://logd.tw.rpi.edu/person/john_erickson">John Erickson</a> (Rensselaer Polytechnic Institute, USA),
+<a href="http://nemo.inf.ufes.br/jpalmeida">João Paulo Almeida</a> (Federal University of Espírito Santo, Brazil),
+<a href="http://csarven.ca/">Sarven Capadisli</a> (UK),
+<a href="http://data.semanticweb.org/person/bernard-vatant/">Bernard Vatant</a> (Mondeca, France),
+Michael Pendleton (U.S. Environmental Protection Agency, USA),
<a href="http://researcher.watson.ibm.com/researcher/view_person_subpage.php?id=3088">Biplav Srivastava</a> (IBM India),
-<a href="http://www.oeg-upm.net">Daniel Vila </a> (Ontology Engineering Group),
-Martín Álvarez Espinar (CTIC-Centro Tecnológico),
-<a href="http://mhausenblas.info/#i">Michael Hausenblas</a> (MapR), and
-<a href="http://linkedgov.org">Hadley Beeman </a> (UK LinkedGov).
+<a href="http://www.oeg-upm.net">Daniel Vila</a> (Ontology Engineering Group, Universidad Politécnica de Madrid, UPM, Spain),
+Martín Álvarez Espinar (CTIC-Centro Tecnológico, Spain),
+<a href="http://mhausenblas.info/#i">Michael Hausenblas</a> (MapR, USA), and
+<a href="http://linkedgov.org">Hadley Beeman</a> (UK LinkedGov, UK). Please accept our apologies if we have inadvertently omitted your name from this list; many people were instrumental in the production of this international publication.
+</p>
+<p>
+Thank you, grazie, gracias, obrigado, merci, धन्यवाद.
+</p>
+<p>
+This document has been produced by the Government Linked Data Working Group, and its contents reflect
+extensive discussion within the Working Group as a whole.
</p>
+
</section>