Added description of URI/IRI issues
author JeniT
Sun, 18 Dec 2011 20:06:13 +0000
changeset 52 034b3b769907
parent 51 3dcf0c817248
child 53 74f4e1f54ba6
html-data-guide/index.html
--- a/html-data-guide/index.html	Sat Dec 17 20:16:32 2011 +0000
+++ b/html-data-guide/index.html	Sun Dec 18 20:06:13 2011 +0000
@@ -763,6 +763,73 @@
               The <code>@datatype</code> attribute might be required for some RDFa <a title="vocabulary">vocabularies</a>/consumers; others will coerce <a title="value">values</a> into the appropriate datatype based on the <a>property</a> itself. However, if a property takes a structured value, the property element must have <code>datatype="rdf:XMLLiteral"</code> for that structure to be preserved.
             </p>
           </section>
+          
+          <section>
+            <h4>IRIs</h4>
+            <p>
+              HTML defines some attributes, such as <code>@href</code> and <code>@src</code>, as holding URLs. The <a href="http://dev.w3.org/html5/spec/urls.html#urls">currently specified processing</a> of these URLs results in non-URI characters within IRIs being percent-encoded. This also happens with microdata attributes such as <code>@itemid</code> and <code>@itemtype</code>.
+            </p>
+            <p>
+              This normalisation does not happen in attributes defined in RDFa, such as <code>@resource</code> and <code>@property</code>: IRIs provided in these attributes will be passed into the extracted RDF as IRIs. 
+            </p>
+            <p>
+              This discrepancy means that when using RDFa, you have to either use URIs only (by percent-encoding IRIs) or avoid relying on the HTML-defined attributes such as <code>@href</code> or <code>@src</code>. For example:
+            </p>
+            <pre>&lt;p resource="#menu"&gt;
+  &lt;a property="eg:wine" <strong>href="#ros&eacute;"</strong>&gt;Ros&eacute;&lt;/a&gt;
+  ...
+&lt;/p&gt;
+...
+&lt;p <strong>resource="#ros&eacute;"</strong>&gt;
+  &lt;span property="eg:description"&gt;This Californian wine...&lt;/span&gt;
+&lt;/p&gt;</pre>
+            <p>
+              will result in the RDF:
+            </p>
+            <pre>&lt;#menu&gt; eg:wine &lt;#ros%C3%A9&gt; .
+&lt;#ros&eacute;&gt; eg:description "This Californian wine..." .</pre>
+            <p>
+              The URL in the <code>@href</code> attribute is percent-encoded, whereas the one from the <code>@resource</code> attribute is not. Although the URLs appear identical in the HTML, in the RDF they refer to distinct <a title="entity">entities</a>.
+            </p>
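+            <p>
+              As a rough illustration of the percent-encoding that the HTML-defined attributes undergo, the following Python sketch converts an IRI reference to its URI form. The helper name <code>iri_to_uri</code> and the set of characters left unencoded are assumptions made for this example. It also assumes UTF-8 octets (as in RFC 3987), under which <code>&eacute;</code> becomes <code>%C3%A9</code>. It is not the algorithm of any particular HTML or RDFa processor.
+            </p>

```python
from urllib.parse import quote

# Characters left untouched: URI delimiters plus "%" so that existing
# escapes are not encoded a second time. This set is an assumption
# chosen for the example, not taken from a specification.
URI_SAFE = "#%/:?=&@+,;!$'()*[]~"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters in an IRI reference as UTF-8 octets."""
    return quote(iri, safe=URI_SAFE)


href_value = iri_to_uri("#rosé")   # what an HTML-defined attribute would yield
resource_value = "#rosé"           # what an RDFa-defined attribute passes through

print(href_value)                    # "#ros%C3%A9"
print(href_value == resource_value)  # False: two distinct identifiers
```

+            <p>
+              The two strings differ, which is why an RDF consumer comparing identifiers character by character treats them as separate entities.
+            </p>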
+            <p>
+              This can be avoided by percent-encoding the non-URI characters within the original HTML:
+            </p>
+            <pre>&lt;p resource="#menu"&gt;
+  &lt;a property="eg:wine" <strong>href="#ros%C3%A9"</strong>&gt;Ros&eacute;&lt;/a&gt;
+  ...
+&lt;/p&gt;
+...
+&lt;p <strong>resource="#ros%C3%A9"</strong>&gt;
+  &lt;span property="eg:description"&gt;This Californian wine...&lt;/span&gt;
+&lt;/p&gt;</pre>
+            <p>
+              which will result in:
+            </p>
+            <pre>&lt;#menu&gt; eg:wine &lt;#ros%C3%A9&gt; .
+&lt;#ros%C3%A9&gt; eg:description "This Californian wine..." .</pre>
+            <p>
+              or by using the <code>@resource</code> attribute to provide the IRI value for a property:
+            </p>
+            <pre>&lt;p resource="#menu"&gt;
+  &lt;a property="eg:wine" <strong>resource="#ros&eacute;"</strong> href="#ros&eacute;"&gt;Ros&eacute;&lt;/a&gt;
+  ...
+&lt;/p&gt;
+...
+&lt;p <strong>resource="#ros&eacute;"</strong>&gt;
+  &lt;span property="eg:description"&gt;This Californian wine...&lt;/span&gt;
+&lt;/p&gt;</pre>
+            <p>
+              which will result in:
+            </p>
+            <pre>&lt;#menu&gt; eg:wine &lt;#ros&eacute;&gt; .
+&lt;#ros&eacute;&gt; eg:description "This Californian wine..." .</pre>
+            <p>
+              Similar considerations apply when mixing microdata or microformats with RDFa, since the identifiers used within the microdata or microformats will be URIs rather than IRIs.
+            </p>
+            <p>
+              It is good practice for vocabulary authors to state whether any further normalisation occurs when URL values are interpreted. They should also either avoid using IRIs for property names or explicitly state that IRIs are equivalent to the percent-encoded URI versions of <a>property</a> and <a>type</a> identifiers that will be generated from microdata markup.
+            </p>
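+            <p>
+              One way a consumer might apply such an equivalence statement is sketched below in Python: both identifiers are reduced to a percent-encoded URI form before being compared. The function names <code>to_uri_form</code> and <code>equivalent</code>, the <code>example.org</code> identifiers, and the choice of UTF-8 with upper-case escapes are assumptions made for illustration only; a vocabulary may define a different rule.
+            </p>

```python
import re
from urllib.parse import quote

# URI delimiters and "%" are left as they are (assumed set for this sketch).
_URI_SAFE = "#%/:?=&@+,;!$'()*[]~"
_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def to_uri_form(identifier: str) -> str:
    """Percent-encode non-ASCII characters and upper-case existing escapes."""
    encoded = quote(identifier, safe=_URI_SAFE)
    return _ESCAPE.sub(lambda m: m.group(0).upper(), encoded)


def equivalent(a: str, b: str) -> bool:
    """Treat an IRI and its percent-encoded URI version as the same identifier."""
    return to_uri_form(a) == to_uri_form(b)


# Hypothetical property identifiers, one from RDFa and one from microdata.
print(equivalent("http://example.org/vocab#rosé",
                 "http://example.org/vocab#ros%c3%a9"))  # True
print(equivalent("http://example.org/vocab#rosé",
                 "http://example.org/vocab#rose"))       # False
```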
+          </section>
         </section>
       </section>
         
@@ -1109,6 +1176,9 @@
               <li>the IRIs used for <a title="type">types</a> and <a title="property">properties</a> should resolve into documentation and/or (through content negotiation) an <a href="http://www.w3.org/TR/rdf-schema/">RDFS schema</a> or <a href="http://www.w3.org/TR/owl-overview/">OWL ontology</a> that describes the types and properties</li>
             </ul>
             <p>
+              In addition, the authors of <a title="vocabulary">vocabularies</a> designed to be used with RDFa should specify whether IRIs and their percent-encoded URI forms are to be treated as equivalent when used for <a>property</a> and <a>type</a> identifiers or <a title="value">values</a>.
+            </p>
+            <p>
               More guidelines and patterns for modelling using RDF are available within <a href="http://patterns.dataincubator.org/book/modelling-patterns.html" rel="nofollow">Linked Data Patterns</a>.
             </p>
           </section>