--- a/bp/index.html Fri Nov 22 08:33:29 2013 +0000
+++ b/bp/index.html Thu Nov 28 09:29:36 2013 +0100
@@ -46,7 +46,7 @@
// only "name" is required
editors: [
{ name: "Bernadette Hyland", company: "3 Round Stones.", companyURL: "http://3roundstones.com/" },
- {name: "Ghislain A. Atemezing", company: "Eurecom", companyURL: "http://www.eurecom.fr"},
+ {name: "Ghislain Auguste Atemezing", company: "Eurecom", companyURL: "http://www.eurecom.fr"},
{name: "Boris Villazon-terrazas", company: "iSOCO", companyURL: "http://www.isoco.com"}
],
@@ -613,21 +613,74 @@
</section>
-<!-- Multilingualism in vocabs >> -->
-
-<section id="howto">
-<h2>How should publishers figure out good URIs for properties with non-literal ranges?</h2>
-<p class="informative"> <i>This check-list is to help the publishers decide which URIs in different namespace to choose with non-literal ranges. It extends more specifically Section 3.</i></p>
-
- <ul>
+<!--<ul>
<li>Find out the frequence of usage of the term in Linked Open Data using tools like <a href="http://stats.lod2.eu/properties" target="_blank">lodstats</a> or <a href="http://lov.okfn.org/dataset/lov/stats/" target="_blank">lov stats</a></li>
<li>Always use first a property in the namespace of a "standard", or recommended vocabulary by the W3C. </li>
<li>Always use a property in a vocabulary more recently published, because it likely extendes or reuses a similar previous vocabulary in the same scope.</li>
<li> Consider with higher priority the criteria of sustainability or long term presence of the namespace.</li>
<li>Authoritive criteria of the underlying vocabulary has to be taken into account.</li>
<li>Don't be ashame, learn from others vocabularies published in the Wild .</li>
- </ul>
+ </ul> -->
+
+<section id="howto">
+<h2>Best Practice for choosing entity URIs</h2>
+<p class="informative"> <i>This is intended to be a best-practice guide for data publishers who map existing data to RDF. Assuming they have identified their entity types, and which attributes and relationships the data contains for each entity type, the question is which URIs to choose for the entities. The step-by-step guide should be followed for each entity type individually.</i></p>
+
+ <p class="highlight"><b>Scope note:</b> <br>
+ This guide applies only to choosing identifiers for existing data that is to be translated from a different format or storage technology to RDF. It does not apply to authoring fresh data.</p>
+
+<p class="highlight"><b>Assumption:</b> <br>
+ The input data may change. Therefore, if there are no reliable identifiers/keys in the input, one may not be able to track identity over updates. For the same reason, creating synthetic keys in the conversion process does not help: if they're arbitrary, they won't survive inserts/deletes, and if they're based on the input (e.g., hashing some fields), they won't survive updates to those fields.</p>
+
+<ul class="note">
+ <li> If the Unique Name Assumption doesn't hold for an ID, then don't rely on it too much. Example: email addresses. A person can have several.</li>
+ <li>Discuss "authority files", "master data", and how those should be RDFized first. Q: Does your organization or some other reputable body maintain master data or an authority file for the entity type? If so, is it RDFized? If not, can you get that RDFized first? Use their IDs if you can. Otherwise, <b>make your own and do a best-effort mapping.</b></li>
+</ul>
+
+<div class="highlight"><b>Outcome options:</b> <br>
+ <ol>
+ <li> Mint your own URIs. Possibly link to existing URIs.</li>
+ <li> Re-use existing URIs.</li>
+ <li> Use blank nodes. Possibly link to existing URIs.</li>
+ <li> Use literals.</li>
+ </ol>
+</div>
+
+<p class="highlight"><b>Q: Is your goal to enrich an existing LOD dataset?</b> <br>
+ That is, do you want to provide users of an existing dataset with additional information about the entities described in that dataset, and is use of your dataset on its own, without that other dataset, not an important goal?<br>
+YQ: Can you map your entities to their URIs with very high reliability (by syntactic matching, manual verification, etc.)? <br>
+ Y: Use their URIs directly.<br>
+ N: Do a best-effort mapping from your entities to their URIs. Continue below regarding your URIs.
+</p>
+
+
+<p class="highlight"><b>Q: Do the entities have existing unique, non-URI, IDs?</b> <br>
+YQ: Are they globally unique?<br>
+ YQ: Is there an existing URI mapping for these IDs from a reliable party?<br>
+ YQ: Do you have additional information about the entities beyond what they have?<br>
+ Y: Mint your own URIs based on the existing ID, and map them to the existing URIs using a mapping property.<br>
+ N: Use their URIs directly.<br>
+ N: Mint your own URI based on the existing ID by appending it to a unique base URI.<br>
+N: So you have only strings that are not guaranteed to be in a stable 1:1 correspondence with the entities. Use a blank node; make sure that there's a good skos:prefLabel, rdfs:label, dc:title, and other standard metadata properties.
+</p>
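+
+<p>As a minimal sketch of two of the outcomes above, the Turtle below shows a minted URI mapped to an existing one, and a blank node carrying only labels. The base <code>http://data.example.org/id/</code>, the registry at <code>registry.example.com</code>, the ID <code>12345</code> and the labels are all hypothetical. <code>owl:sameAs</code> is only one possible mapping property; this guide does not prescribe a particular one.</p>
+<pre>
+@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
+@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
+@prefix owl:  &lt;http://www.w3.org/2002/07/owl#&gt; .
+
+# Globally unique ID with an existing URI mapping, plus additional data of our own:
+# mint a URI from the ID on our own (hypothetical) base and map it to the existing URI.
+&lt;http://data.example.org/id/company/12345&gt;
+    rdfs:label "Example Ltd."@en ;
+    owl:sameAs &lt;http://registry.example.com/company/12345&gt; .
+
+# No stable ID, only strings: use a blank node with good labels.
+[] skos:prefLabel "Example Ltd."@en ;
+   rdfs:label "Example Ltd."@en .
+</pre>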
+
+
+<p class="highlight"><b>Q: Can the entity be represented as one of the standard RDF datatypes (that is, it's a date, number, etc.)?</b> <br>
+YQ: Is the entity annotated with additional information beyond what the datatype represents?<br>
+ N: Use a typed literal.
+</p>
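+
+<p>For illustration, a date that carries no further annotations could be published as a typed literal rather than as a resource with its own URI. The subject URI and the choice of <code>dcterms:issued</code> are assumptions made for this sketch.</p>
+<pre>
+@prefix dcterms: &lt;http://purl.org/dc/terms/&gt; .
+@prefix xsd:     &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+
+# The date is the value itself, so it is a typed literal, not a minted URI.
+&lt;http://data.example.org/id/report/2013-11&gt;
+    dcterms:issued "2013-11-28"^^xsd:date .
+</pre>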
+
+
+<p class="highlight"><b>Q: Can you map to reliable remote URIs?</b> <br>
+Q: Do you have data about the entities beyond what's already available from the remote URIs?<br>
+A: No? Then use the remote URIs.
+</p>
+
+<p class="highlight"><b>Q: Is the data on its own web page with a permalink?</b> <br>
+Q: Can you deploy RDFa in the web page, or can you deploy Turtle via content negotiation on the same URI?<br>
+A: Use the <code>permalink#{fragment}</code> pattern, where <code>{fragment}</code> might be "this", "id", "product", "user", etc.
+</p>
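+
+<p>A sketch of the fragment pattern, assuming a hypothetical permalink <code>http://shop.example.com/products/42</code> that serves Turtle via content negotiation. The label is invented, and <code>foaf:isPrimaryTopicOf</code> is just one way to tie the entity back to its page.</p>
+<pre>
+@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
+@prefix foaf: &lt;http://xmlns.com/foaf/0.1/&gt; .
+
+# The permalink identifies the web page; permalink#product identifies the entity it describes.
+&lt;http://shop.example.com/products/42#product&gt;
+    rdfs:label "Example product"@en ;
+    foaf:isPrimaryTopicOf &lt;http://shop.example.com/products/42&gt; .
+</pre>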
</section>
@@ -659,16 +712,16 @@
<p>The Web makes use of the URI (Uniform Resource Identifiers) as a single global identification system. The global scope of URIs promotes large-scale "network effects". Therefore, in order to benefit from the value of LD, government and governmental agencies need to identify their resources using URIs. This section provides a set of general principles aimed at helping government stakeholders to define and manage URIs for their resources.</p>
<p class="highlight"><b>Use HTTP URIs</b><br>
-What it means: To benefit from and increase the value of the World Wide Web, governments and agencies SHOULD provide HTTP URIs as identifiers for their resources. There are many benefits to participating in the existing network of URIs, including linking, caching, and indexing by search engines. As stated in [[howto-lodp]], HTTP URIs enable people to "look-up" or "dereference" a URI in order to access a representation of the resource identified by that URI.
+<i>What it means:</i> To benefit from and increase the value of the World Wide Web, governments and agencies SHOULD provide HTTP URIs as identifiers for their resources. There are many benefits to participating in the existing network of URIs, including linking, caching, and indexing by search engines. As stated in [[howto-lodp]], HTTP URIs enable people to "look-up" or "dereference" a URI in order to access a representation of the resource identified by that URI.
To benefit from and increase the value of the World Wide Web, data publishers SHOULD provide URIs as identifiers for their resources.
</p>
<p class="highlight"><b>Provide at least one machine-readable representation of the resource identified by the URI</b><br>
-What it means: In order to enable HTTP URIs to be "dereferenced", data publishers have to set up the necessary infrastructure elements (e.g. TCP-based HTTP servers) to serve representations of the resources they want to make available (e.g. a human-readable HTML representation or a machine-readable Turtle). A publisher may supply zero or more representations of the resource identified by that URI. However, there is a clear benefit to data users in providing at least one machine-readable representation. More information about serving different representations of a resource can be found in [[!COOLURIS]]</a>.
+<i>What it means:</i> In order to enable HTTP URIs to be "dereferenced", data publishers have to set up the necessary infrastructure elements (e.g. TCP-based HTTP servers) to serve representations of the resources they want to make available (e.g. a human-readable HTML representation or a machine-readable Turtle). A publisher may supply zero or more representations of the resource identified by that URI. However, there is a clear benefit to data users in providing at least one machine-readable representation. More information about serving different representations of a resource can be found in [[!COOLURIS]].
</p>
<p class="highlight"><b>A URI structure will not contain anything that could change</b><br>
-What it means: It is good practice that URIs do not contain anything that could easily change or that is expected to change like session tokens or other state information. URIs should be stable and reliable in order to maximize the possibilities of reuse that Linked Data brings to users. There must be a balance between making URIs readable and keeping them more stable by removing descriptive information that will likely change. For more information on this see [MDinURI] and <a href="http://www.w3.org/TR/cooluris/" target="_blank">Architecture of the World Wide Web: URI Opacity</a>.
+<i>What it means:</i> It is good practice that URIs do not contain anything that could easily change or that is expected to change like session tokens or other state information. URIs should be stable and reliable in order to maximize the possibilities of reuse that Linked Data brings to users. There must be a balance between making URIs readable and keeping them more stable by removing descriptive information that will likely change. For more information on this see [MDinURI] and <a href="http://www.w3.org/TR/cooluris/" target="_blank">Architecture of the World Wide Web: URI Opacity</a>.
</p>
<p class="highlight"><b>URI Opacity</b><br>