BP - URI Construction section included
author Boris Villazon-Terrazas <bvillazon@fi.upm.es>
Thu, 23 Feb 2012 19:50:52 +0100
changeset 108 11e7dc631232
parent 107 ec86f8011e36
child 109 8b8abb920961
child 110 c17fe482db32
bp/index.html
--- a/bp/index.html	Thu Feb 23 15:51:38 2012 +0100
+++ b/bp/index.html	Thu Feb 23 19:50:52 2012 +0100
@@ -330,23 +330,135 @@
 <section> 
 <h4>Design principles</h4>
 <p>The Web uses URIs (Uniform Resource Identifiers) as a single global identification system. The global scope of URIs promotes large-scale "network effects"; to benefit from the value of Linked Data, governments and governmental agencies need to identify their resources using URIs. This section provides a set of general principles aimed at helping government stakeholders define and manage URIs for their resources.</p>
-<p class="highlight"><b>Identify with URIs</b><br>
+<p class="highlight"><b>Use HTTP URIs</b><br>
+What it means: To benefit from and increase the value of the World Wide Web, governments and agencies SHOULD provide HTTP URIs as identifiers for their resources. There are many benefits to participating in the existing network of URIs, including linking, caching, and indexing by search engines. As stated in [LDPrinciples], HTTP URIs enable people to "look-up" or "dereference" a URI in order to access a representation of the resource identified by that URI.
 To benefit from and increase the value of the World Wide Web, data publishers SHOULD provide URIs as identifiers for their resources.
 </p>
+<p class="highlight"><b>Provide at least one machine-readable representation of the resource identified by the URI</b><br>
+What it means: To enable HTTP URIs to be "dereferenced", data publishers have to set up the necessary infrastructure elements (e.g. TCP-based HTTP servers) to serve representations of the resources they want to make available (e.g. a human-readable HTML representation or a machine-readable RDF/XML representation). A publisher may supply zero or more representations of the resource identified by that URI. However, there is a clear benefit to data users in providing at least one machine-readable representation. More information about serving different representations of a resource can be found in <a href="http://www.w3.org/TR/cooluris/" target="_blank">Cool URIs for the Semantic Web</a>.
+</p>
+<p class="highlight"><b>A URI structure will not contain anything that could change</b><br>
+What it means: It is good practice for URIs not to contain anything that could easily change or that is expected to change, such as session tokens or other state information. URIs should be stable and reliable in order to maximize the possibilities of reuse that Linked Data brings to users. There must be a balance between making URIs readable and keeping them stable by removing descriptive information that is likely to change. For more information on this see [MDinURI] and <a href="http://www.w3.org/TR/webarch/#uri-opacity" target="_blank">Architecture of the World Wide Web: URI Opacity</a>.
+</p>
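+<p>As a rough illustration of the principle above, the following Python sketch flags URIs that carry session-like state. The parameter names in <code>VOLATILE_PARAMS</code>, the function name and the <code>data.example.gov</code> URIs are assumptions made for this example, not part of any cited guideline.</p>

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of parameter names that usually carry session or state data.
VOLATILE_PARAMS = {"sessionid", "jsessionid", "phpsessid", "sid", "token"}


def volatile_parts(uri):
    """Return the names of query or path parameters that look like session state."""
    parts = urlsplit(uri)
    found = []
    # Query parameters, e.g. ?sessionid=abc
    for name, _ in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in VOLATILE_PARAMS:
            found.append(name)
    # Path parameters, e.g. /doc;jsessionid=abc
    for segment in parts.path.split("/"):
        for param in segment.split(";")[1:]:
            name = param.split("=", 1)[0]
            if name.lower() in VOLATILE_PARAMS:
                found.append(name)
    return found
```

+<p>A publisher could run a check of this kind over candidate URIs before minting them; an empty result only means that none of the listed names were found, not that the URI is guaranteed to be stable.</p>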
 </section>
 
 <section> 
-<h4>Best Practices Checklist<h4>
+<h4>Best Practices Checklist</h4>
+
+<section>	
+<h5>High-level Considerations for Constructing URIs</h5>
+<p>The purpose of URIs is to uniquely and reliably name resources on the Web. According to <a href="http://www.w3.org/TR/cooluris/" target="_blank">Cool URIs for the Semantic Web</a> (W3C IG Note), URIs should be designed with simplicity, stability and manageability in mind, thinking about them as identifiers rather than as names for Web resources.
+</p>
+<p>
+Many general-purpose guidelines exist for the URI designer to consider, including <a href="http://www.w3.org/TR/cooluris/" target="_blank">Cool URIs for the Semantic Web</a>, which provides guidance on how to use URIs to describe things that are not Web documents; <a href="http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector" target="_blank">Designing URI Sets for the UK Public Sector</a>, a document from the UK Cabinet Office that sets out design considerations for using URIs to publish public sector reference data; and <a href="http://bit.ly/xJwA9g" target="_blank">Style Guidelines for Naming and Labelling Ontologies in the Multilingual Web</a> (PDF), which proposes guidelines for designing URIs in a multilingual scenario.
+</p>
+<p>The purpose of this subsection is to provide specific, practical guidance to government stakeholders who are planning to create systems for publishing government Linked Data and therefore must create sensible, sustainable URI designs that fit their specific requirements.
+</p>
+</section>
+	
+<section>
+<h5>A Checklist for Constructing Government URIs</h5>
+<p>The following checklist is based in part on <a href="http://data.gov.uk/resources/uris" target="_blank">Creating URIs</a> (short; on the Web) and <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/designing-URI-sets-uk-public-sector.pdf" target="_blank">Designing URI Sets for the UK Public Sector</a> (long; in PDF).
+</p>
+<ol type="1">
+	<li>
+		What will your proposed URIs name? Will they:
+		<ul>
+			<li>Point to something downloadable? (e.g. PDF, CSV, RDF, TTL or ZIP files)</li>
+			<li>Identify some real world thing? (e.g. school, department, agency)</li>
+			<li>Point to information about a real world thing?</li>
+			<li>Identify some abstract thing? (e.g. a position, a service, a relationship)</li>
+			<li>Define a concept? (e.g. a vocabulary term or metadata element)</li>
+		</ul>
+	</li>
+	<li>
+		Do you already have (non-URI) names for those things? (e.g. using other information systems)
+	</li>
+	<li>
+		Do URIs already exist for naming these things?
+		<ul>
+			<li>Are you sure that the existing URIs refer to the same thing as you intend?</li>
+		</ul>
+	</li>
+	<li>
+		Do you have any strong syntax preferences or requirements?
+		<ul>
+			<li>Will your stakeholders need to easily write the chosen URI on a piece of paper, or remember it easily?</li>
+			<li>Will you need to spell out URIs over the phone?</li>
+			<li>Will the URIs need to give hints about the content of the resource?</li>
+			<li>Is it necessary for the URI structure to make guessing of related URIs easier?</li>
+		</ul>
+	</li>
+	<li>
+		What are the long-term persistence requirements of your URIs?
+		<ul>
+			<li>Should the URIs you create still make sense if the named resource evolves?</li>
+			<li>How far into the future must your resolvable URIs lead to results (e.g. data, documents, definitions)?</li>
+		</ul>
+	</li>
+	<li>
+		Will you need to move the URI-named resources in the future?
+		<ul>
+			<li>Will such moves result from organizational changes that may need to be reflected in the URIs?</li>
+			<li>Will these moves be purely technical, with no need to be reflected in the URIs?</li>
+		</ul>
+	</li>
+	<li>
+		Should the government sector (e.g. "Health," "Energy," "Defense") be included in the domain of the URI?
+		<ul>
+			<li>Have these sectors been defined formally (e.g. by statute)?</li>
+			<li>Will informal or equivalent sector names also be used?</li>
+		</ul>
+	</li>
+	<li>Is sensible resolution of partial/incomplete URIs necessary or anticipated?</li>
+</ol>
+</section>	
 </section>
 
 <section> 
-<h4>URI Persistence<h4>
+<h4>URI Persistence</h4>
+<p>@@TODO@@ Expand this section (Bernadette)</p>
+<p><i>Advice, info related to persistent URIs</i></p>
+<p>As is the case with many human interactions, confidence in interactions via the Web depends on stability and predictability. For an information resource, persistence depends on the consistency of representations. The representation provider decides when representations are sufficiently consistent (although that determination generally takes user expectations into account).</p>
+<p>
+Although persistence in this case is observable as a result of representation retrieval, the term URI persistence is used to describe the desirable property that, once associated with a resource, a URI should continue indefinitely to refer to that resource.	
+</p>
+<p class="highlight"><b>Consistent representation</b><br>
+A URI owner SHOULD provide representations of the identified resource consistently and predictably.
+</p>
+<p>URI persistence is a matter of policy and commitment on the part of the URI owner. The choice of a particular URI scheme provides no guarantee that those URIs will be persistent or that they will not be persistent.
+</p>
+<p>HTTP [RFC2616] has been designed to help manage URI persistence. For example, HTTP redirection (using the 3xx response codes) permits servers to tell an agent that further action needs to be taken by the agent in order to fulfill the request (for example, a new URI is associated with the resource).
+</p>
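+<p>The redirection mechanism described above can be sketched as a small lookup. This is a minimal illustration, not a server implementation: the <code>MOVED</code> and <code>CURRENT</code> tables, the paths in them and the <code>respond</code> function are hypothetical.</p>

```python
# Hypothetical table of retired paths and the paths that replaced them.
MOVED = {"/energy-dept/plant/42": "/id/power-plant/42"}
# Hypothetical set of paths the server can currently serve.
CURRENT = {"/id/power-plant/42"}


def respond(path):
    """Return an (HTTP status, headers) pair for a requested path."""
    if path in MOVED:
        # 301 Moved Permanently: the old URI keeps working and names its successor.
        return 301, {"Location": MOVED[path]}
    if path in CURRENT:
        return 200, {}
    return 404, {}
```

+<p>Keeping the old path in the table is what preserves persistence here: agents holding the retired URI are told where to go instead of receiving an error.</p>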
+<p>In addition, content negotiation also promotes consistency, as a site manager is not required to define new URIs when adding support for a new format specification. Protocols that do not support content negotiation (such as FTP) require a new identifier when a new data format is introduced. Improper use of content negotiation can lead to inconsistent representations.
+</p>
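+<p>To make the content negotiation point concrete, the following sketch chooses a representation from a request's <code>Accept</code> header. It is a simplified reading of HTTP's rules (it ignores media type parameters other than <code>q</code>), and the <code>AVAILABLE</code> list and function names are assumptions made for this example.</p>

```python
# Hypothetical representations available for one resource, in server preference order.
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client gives no weight
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def quality(candidate, prefs):
    """Return the q value of the most specific media range matching candidate."""
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == candidate:
            specificity = 2
        elif media_range == candidate.split("/")[0] + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(header, available=AVAILABLE):
    """Pick the available media type the client weights highest, or None."""
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for candidate in available:
        q = quality(candidate, prefs)
        if q > best_q:  # ties keep the earlier, server-preferred type
            best, best_q = candidate, q
    return best
```

+<p>Adding a new format under this approach means appending one entry to the list of available representations; the URI of the resource itself does not change.</p>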
 </section>
 
 <section> 
 <h4>Internationalized Resource Identifiers: Using non-ASCII characters in URIs</h4>
 </section>
-
+<p><i>Guidelines for those interested in minting URIs in their own languages (German, Dutch, Spanish, Chinese, etc.)</i></p>
+<p>The URI syntax defined in <a href="http://tools.ietf.org/html/rfc3986" target="_blank">RFC 3986</a> (STD 66, Uniform Resource Identifier (URI): Generic Syntax) restricts URIs to a small number of characters: essentially upper and lower case letters of the English alphabet, European numerals and a small number of symbols. There is now a growing need to enable the use of characters from any language in URIs.
+</p>
+<p>The purpose of this section is to provide guidance to government stakeholders who are planning to create URIs using characters that go beyond the subset defined in <a href="http://tools.ietf.org/html/rfc3986" target="_blank">RFC 3986</a>.
+</p>
+<p>First we provide two important definitions:</p>
+<p>
+IRI (<a href="http://tools.ietf.org/html/rfc3987" target="_blank">RFC 3987</a>) is a protocol element that complements the Uniform Resource Identifier (URI). An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646) and can therefore be used to mint identifiers that use a wider set of characters than the one defined in <a href="http://tools.ietf.org/html/rfc3986" target="_blank">RFC 3986</a>.
+</p>
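+<p>Software that only accepts URIs needs an IRI to be mapped to a URI first, which is done by percent-encoding the UTF-8 bytes of the characters that RFC 3986 does not allow. The sketch below shows that idea with Python's standard library. The function name, the <code>SAFE</code> constant and the <code>example.org</code> IRIs are assumptions for this example, and the host part is assumed to be plain ASCII.</p>

```python
from urllib.parse import quote

# Characters left untouched: the URI reserved characters, plus "%" so that
# escapes already present in the input are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Percent-encode (as UTF-8) every character that is not allowed in a URI."""
    return quote(iri, safe=SAFE)
```

+<p>For instance, <code>iri_to_uri("http://example.org/recurso/España")</code> yields <code>http://example.org/recurso/Espa%C3%B1a</code>.</p>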
+<p>Internationalized Domain Name (IDN) is a standard approach to dealing with multilingual domain names that was agreed by the IETF in March 2003.
+</p>
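+<p>An internationalized domain name is carried in the DNS in an ASCII-compatible form. The sketch below uses Python's built-in <code>idna</code> codec, which implements the 2003 version of the standard, to show the conversion in both directions. The helper names and the <code>bücher.example</code> domain are assumptions for this example.</p>

```python
def to_ascii(domain):
    """Convert a Unicode domain name to its ASCII-compatible (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Convert an ASCII-compatible domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")
```

+<p>Here <code>to_ascii("bücher.example")</code> returns <code>xn--bcher-kva.example</code>, and <code>to_unicode</code> reverses the conversion. Later revisions of the IDN standard treat some characters differently, so results for those characters may not match this codec.</p>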
+<p>Although some standards exist that enable the use of international characters in Web identifiers, government stakeholders need to take several issues into account before constructing such internationalized identifiers. This section is not meant to be exhaustive, and we point the interested audience to <a href="http://www.w3.org/International/articles/idn-and-iri/" target="_blank">An Introduction to Multilingual Web Addresses</a>. Some of the most relevant issues are the following:
+</p>
+<ul>
+	<li><b>Domain Name lookup:</b> Numerous domain name authorities already offer registration of internationalized domain names. These include providers for country-code top-level domains such as .cn, .jp and .kr, and global top-level domains such as .info, .org and .museum.
+	</li>
+	<li><b>Domain names and phishing:</b> One of the problems associated with IDN support in browsers is that it can facilitate phishing through what are called 'homograph attacks'. Consequently, most browsers that support IDN also put in place some safeguards to protect users from such fraud.
+	</li>
+	<li><b>Encoding problems:</b> IRI provides a standard way of creating and handling international identifiers. However, support for IRIs across the various Semantic Web technology stacks and libraries is not uniform, which may cause difficulties for applications working with this kind of identifier. A good reference on this subject is "I18n of Semantic Web Applications" by Auer et al.
+	</li>
+	
+</ul>
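+<p>The homograph issue in the list above can be illustrated with a simple heuristic that reports which scripts the letters of a domain label come from. This is only a sketch: it infers the script from the first word of each character's Unicode name, which is an approximation, and it is not how any particular browser implements its safeguards. The function names are hypothetical.</p>

```python
import unicodedata


def scripts_in_label(label):
    """Return the set of script names used by the letters of a domain label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # For most letters the first word of the Unicode character name is
            # the script, e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER A".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label):
    """Report whether a label mixes letters from more than one script."""
    return len(scripts_in_label(label)) > 1
```

+<p>A label written entirely in Latin letters gives a single script, while the same-looking label with a Cyrillic "а" (U+0430) substituted for the Latin "a" gives two, which is the pattern a homograph attack relies on.</p>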
 @@ TODO: include references
 <!-- <p class="todo">Integrate Wiki <a href="http://www.w3.org/2011/gld/wiki/223_Best_Practices_URI_Construction">content</a>.</p> -->
 </section> <!-- URI CONSTRUCTION >> -->