Update notes and change default property URI generation to _base_.
author Gregg Kellogg <gregg@kellogg-assoc.com>
Sat, 05 Nov 2011 13:41:59 -0700
changeset 15 29b7fa945ee8
parent 14 2f11ba803758
child 16 53e66f3a31f2
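As a rough illustration of the `base` scheme that this changeset makes the default, the Python sketch below follows the rule stated in the processing algorithm: remove everything after the last "/" or "#" of the item type, then append the escaped property name. The function name `base_property_uri` is hypothetical, `urllib.parse.quote` only approximates the fragment-escaping the specification calls for, and the error raised for a type without a separator is an assumption rather than something the specification defines.

```python
from urllib.parse import quote


def base_property_uri(current_type: str, name: str) -> str:
    """Sketch of the `base` property URI scheme (hypothetical helper)."""
    # Find the last SOLIDUS ("/") or NUMBER SIGN ("#") in the item type.
    cut = max(current_type.rfind("/"), current_type.rfind("#"))
    if cut == -1:
        # Assumption: the specification text does not cover this case.
        raise ValueError("item type contains neither '/' nor '#'")
    # Keep everything up to and including that separator.
    base = current_type[: cut + 1]
    # Approximation of fragment-escaping: percent-encode the property name.
    return base + quote(name, safe="")


if __name__ == "__main__":
    print(base_property_uri("http://schema.org/Person", "name"))
    print(base_property_uri("http://example.org/vocab#Thing", "title"))
```

Under this scheme the generated URI depends only on the item type and the property name, so the same property always maps to the same URI regardless of where it is nested.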
microdata-rdf/index.html
--- a/microdata-rdf/index.html	Fri Nov 04 22:39:13 2011 -0700
+++ b/microdata-rdf/index.html	Sat Nov 05 13:41:59 2011 -0700
@@ -17,6 +17,7 @@
           apply:  function(c) {
                     // extend the bibliography entries
                     berjon.biblio["MICRODATA"] = "<cite><a href=\"http://www.w3.org/TR/2011/WD-microdata-20110525/\">HTML Microdata</a></cite> Ian Hickson Editor. World Wide Web Consortium (work in progress). 25 May 2010. This edition of the HTML Microdata specification is http://www.w3.org/TR/2011/WD-microdata-20110525/. The <a href=\"http://www.w3.org/TR/microdata/\">latest edition of HTML Microdata</a> is available at http://www.w3.org/TR/microdata/";
+                    berjon.biblio["JSON-LD"] = "Manu Sporny, Gregg Kellogg, et al. <a href=\"http://json-ld.org/spec/latest/json-ld-syntax/\"><cite>The JSON-LD Syntax</cite></a> Latest. W3C Editor's Draft. URL: <a href=\"http://json-ld.org/spec/latest/json-ld-syntax/\">http://json-ld.org/spec/latest/json-ld-syntax/</a>";
 
                     // process the document before anything else is done
                     var refs = document.querySelectorAll('adef') ;
@@ -249,18 +250,33 @@
 </section>
 
 <section id='sotd'>
-<p>This document is an experimental work in progress.</p>
+<p>This document is an experimental work in progress. The concepts described herein are intended to help
+  provide guidance for a future working group. Implementations of this specification, either producers
+  or consumers, should note that it is likely to change significantly prior to any publication as a Working
+  Draft.</p>
 </section>
 
-<section>
+<section class="informative">
   <h1>Introduction</h1>
-  <p>
-    This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [[!MICRODATA]]
+  <p>This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [[!MICRODATA]]
     is an extension to HTML used to embed machine-readable data to HTML documents. This specification describes
-    transformation directly to RDF [[RDF-CONCEPTS]].
-  </p>
-
-<section>
+    transformation directly to RDF [[RDF-CONCEPTS]].</p>
+  <div class="note">
+    <p>There are a variety of ways in which a mapping from microdata to
+      RDF might be configured to give a result that is closer to the required result for a particular vocabulary. This
+      specification defines terms that can be used as hooks for vocabulary-specific behavior, which could be defined
+      within a <tref>registry</tref> or on an implementation-defined basis. However, the HTML Data TF recommends the
+      adoption of a single method of mapping in which every vocabulary is treated as if:</p>
+    <ul>
+      <li><tref><code>propertyURI</code></tref> is set to <code>base</code> </li>
+      <li><tref><code>multipleValues</code></tref> is set to <code>unordered</code></li>
+    </ul>
+    <p>For background on the trade-offs between these options, see 
+      <a
+        href="http://www.w3.org/wiki/Mapping_Microdata_to_RDF"
+      >http://www.w3.org/wiki/Mapping_Microdata_to_RDF</a>.</p>
+  </div>
+<section class="informative">
   <h2>Background</h2>
   <p>Microdata is a way of expressing metadata in HTML documents using attributes. A previous version
     of microdata [[!MICRODATA]] included rules for generating RDF, but current Editor's Drafts have removed
@@ -280,11 +296,12 @@
     This is facilitated by a <tref>registry</tref> that associates URIs with specific rules based on matching
     <aref>itemtype</aref> values against registered URI prefixes to determine a vocabulary and
     vocabulary-specific processing rules.</p>
-  <p class="note">The Microdata JSON serialization does not retain datatype or language information that might
-    be derived from the HTML DOM. The RDF Transformation does retain language information when it is available.</p>
+  <p class="note">The Microdata JSON serialization does not retain <em>datatype</em> or
+    <em>language</em> information that might be derived from the HTML DOM. The RDF Transformation does retain language
+    information when it is available.</p>
 </section>
 
-<section>
+<section class="informative">
   <h2>Use Cases</h2>
   <p>During the period of the task force, a number of use cases were put forth for the use of microdata
     in generating RDF:</p>
@@ -318,7 +335,7 @@
   </ul>
 </section>
 
-<section>
+<section class="informative">
   <h2>Issues</h2>
   <p>Decisions or open issues in the specification are tracked on the
     <a href="http://www.w3.org/2011/htmldata/track/issues">Task Force Issue Tracker</a>. These include the
@@ -330,15 +347,18 @@
       as this would violate microdata's data model.
     </dd>
     <dt><a href="http://www.w3.org/2011/htmldata/track/issues/3">ISSUE 3</a></dt><dd>
-      Should the registry allow property datatype specification.
+      Should the <tref>registry</tref> allow property datatype specification.
+    </dd>
+    <dt><a href="http://www.w3.org/2011/htmldata/track/issues/4">ISSUE 4</a></dt><dd>
+      Should the <tref>registry</tref> allow a property name or URI to be used as an alias for <aref>itemid</aref>.
     </dd>
   </dl>
 </section>
 
-<section>
+<section class="informative">
   <h2>Goals</h2>
   <p>The purpose of this specification is to provide input to a future working group that can make decisions
-    about the need for a registry and the details of processing. Among the options investigated by
+    about the need for a <tref>registry</tref> and the details of processing. Among the options investigated by
     the Task Force are the following:</p>
   <ul>
     <li>Property URI generation using the original microdata specification with a base URI and fragment
@@ -357,6 +377,7 @@
       whether or not multiple, into some form of collection.</li>
   </ul>
 </section>
+</section>
 
 <section>
   <h1>Attributes and Syntax</h1>
@@ -375,9 +396,6 @@
     </dd>
     <dt><adef>datetime</adef></dt><dd>
       An attribute appropriate for use with the <code>date</code> element for creating typed literals.
-      <div class="issue">
-        The <code>date</code> element will likely be replaced with something more general purpose.
-      </div>
     </dd>
     <dt><adef>href</adef></dt><dd>
       An attribute appropriate for use with <code>a</code>, <code>area</code> or <code>link</code> elements for
@@ -436,7 +454,7 @@
   </dl>
 </section>
 
-<section>
+<section class="informative">
   <h1>Vocabulary Registry</h1>
   <p>In a perfect world, all processors would be able to generate the same output for a given input
     without regard to the requirements of a particular <tref>vocabulary</tref>. However, microdata doesn't
@@ -467,19 +485,20 @@
     rules is defined in the following sections. If an item has no <tref>current type</tref> or the
     <tref>registry</tref> contains no <tref>URI prefix</tref> matching <tref>current type</tref>, a conforming
     processor MUST use the default values defined for these rules.</p>
+  <p class="note">The concept of a <tref>registry</tref>, including a hypothetical format, location and updating rules
+    is presented as an abstract concept useful for describing the function of a microdata processor.
+    There are issues surrounding update
+    frequency, URL naming, and how updates are authorized. This spec
+    just considers the semantic content of such a <tref>registry</tref> and how it can be used to affect processing without
+    defining its representation or update policies.</p>
   <p class="issue">Richard Cyganiak has
     <cite><a href="http://richard.cyganiak.de/2011/10/microdata.html#whitelists">pointed out</a></cite> that
-    &quot;Registry&quot; may be the wrong term, as the proposed registry doesn't assign identifiers or manage
+    &quot;Registry&quot; may be the wrong term, as the proposed <tref>registry</tref> doesn't assign identifiers or manage
     namespace, it simply provides a mapping between URI prefixes and processor behavior and suggests the term
     &quot;Whitelist&quot;. As more than two values are required, and it describes more than binary behavior, this term
     isn't appropriate either.</p>
-  <p class="issue">Anytime we discuss maintaining such a database, there are issues surrounding update
-    frequency, URL naming, and how updates are authorized. This remains an open issue. This spec
-    just considers the semantic content of such a list and how it can be used to affect processing without
-    defining its representation or update policies.</p>
-  <p class="issue">The URL of the <tref>registry</tref> must be defined.</p>
 
-<section>
+<section class="informative">
   <h2>Property URI Generation</h2>
   <p>For <tref>property names</tref> which are not <tref>absolute URI</tref>s,
     the <tdef><code>propertyURI</code></tdef> rule defines the algorithm for generating an <tref>absolute URI</tref>
@@ -500,7 +519,7 @@
       unique based on the value of <tref>current property</tref>. This is
       required as the microdata data model requires that property names are associated with specific
       items and do not have a global scope.
-      <div class="note">
+      <div>
         <p>URI creation uses a base URI with query parameters to indicate the in-scope
           type and property name list. Consider the following example:</p>
         <pre class="example" data-transform="updateExample">
@@ -518,7 +537,7 @@
           <code>http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n</code>.
           However, the property name <em>given-name</em> appears in an untyped item.
           The inherited property URI is used to create a new property URI:
-          <code>http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n.first-name</code>.</p>
+          <code>http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n.given-name</code>.</p>
         <p>This scheme is compatible with the needs of other RDF serialization formats such as
           RDF/XML [[RDF-SYNTAX-GRAMMAR]],
           which rely on <em>QNames</em> for expressing properties. For example, the generated property URIs
@@ -548,10 +567,10 @@
       <tref>absolute URI</tref>s to the <tref>URI prefix</tref>.
     </dd>
   </dl>
-  <p>The default value of <tref><code>propertyURI</code></tref> is <code>contextual</code>.</p>
+  <p>The default value of <tref><code>propertyURI</code></tref> is <code>base</code>.</p>
 </section>
 
-<section>
+<section class="informative">
   <h2>Value Ordering</h2>
   <p>For items having multiple values for a property,
     the <tdef><code>multipleValues</code></tdef> rule defines the algorithm for serializing these values.
@@ -571,16 +590,73 @@
     <p>The default value of <tref><code>multipleValues</code></tref> is <code>list</code>.</p>
 </section>
 
-<section>
+<section class="informative">
   <h2>Value Typing</h2>
-  <p>One possible use of a registry would allow vocabularies to be marked with datatype information,
+  <p>One possible use of a <tref>registry</tref> would allow vocabularies to be marked with datatype information,
     so that a <code>dc:time</code> value, for example, would be understood to represent a literal with datatype
     <code>xsd:date</code>. This could be done by adding information for each property in the vocabulary requiring
     special treatment.</p>
   <p>Additionally, literal values which should be interpreted as URI references could be given special treatment.</p>
+  <p>This might be represented using a syntax such as the following:</p>
+  <pre class="example" data-transform="updateExample">
+<!--
+{
+ "http://schema.org/": {
+   "propertyURI": "vocabulary",
+   "multipleValues": "unordered"****,
+   "@context": {
+     "url": {"@datatype": "@iri"},
+     "dateCreated": {"@datatype": "http://www.w3.org/2001/XMLSchema#date"},
+     "price": {"@datatype": [
+       "http://www.w3.org/2001/XMLSchema#decimal",
+       "http://www.w3.org/2001/XMLSchema#string"
+     ]}
+   }****
+ }
+}
+-->
+  </pre>
+  <p>This notation borrows some concepts from the JSON-LD [[JSON-LD]] context.
+    The @datatype identifies one or more XSD types against which to perform lexical matching, causing
+    the literal object to have the associated datatype.</p>
+  <p>The <code>@iri</code> datatype identifies the property as having a <tref>URI reference</tref> range,
+    rather than a literal. This allows the property to be used where there is a literal content model, such as
+    <aref>content</aref>, and would cause the value to be interpreted as a <tref>URI reference</tref>.</p>
   <p>These concepts are not explored further at this time, but could be developed further in
     a future revision of this document.</p>
 </section>
+
+<section class="informative">
+  <h2>Property as subject</h2>
+  <p>One possible use of a <tref>registry</tref> would allow property values to be used as the item subject, if
+    the item has no <aref>itemid</aref> attribute.</p>
+  <p>This might be represented using a syntax such as the following:</p>
+  <pre class="example" data-transform="updateExample">
+<!--
+{
+ "http://schema.org/": {
+   "propertyURI": "vocabulary",
+   "multipleValues": "unordered",
+   "@context": {
+     "url": {"@datatype": ****["@subject", "@iri"]****},
+     "dateCreated": {"@datatype": "http://www.w3.org/2001/XMLSchema#date"},
+     "price": {"@datatype": [
+       "http://www.w3.org/2001/XMLSchema#decimal",
+       "http://www.w3.org/2001/XMLSchema#string"
+     ]}
+   }
+ }
+}
+-->
+  </pre>
+  <p>The <code>url</code> refers to <code>http://schema.org/url</code>, and is defined both as
+    having a <tref>URI reference</tref> data range, and to be used as an alias for the item subject.
+    Note that there is a special case where the item already has an <aref>itemid</aref> attribute, or there is more
+    than one <code>url</code> property value. This could be resolved by using the first property value only if the
+    item has no <aref>itemid</aref>.</p>
+  <p>This concept is not explored further at this time, but could be developed further in
+    a future revision of this document.</p>
+</section>
 </section>
 
 <section>
@@ -759,6 +835,15 @@
         URI references are suitable to be used in <em>subject</em>, <em>predicate</em> or <em>object</em> positions
         within an RDF triple, as opposed to a <tref>literal</tref> value that may contain a string representation of a
         URI. (See [[RDF-CONCEPTS]]).
+        <div class="issue">
+          <p>The HTML5/microdata content model for <aref>href</aref>, <aref>src</aref>,
+            <aref>data</aref> and <aref>itemid</aref> is that of a URL, not a URI reference. The attributes
+            <aref>itemtype</aref> and <aref>itemprop</aref> may take any value, including that of a <tref>URI
+            reference</tref>. Within this context, <tref>URI Reference</tref> could be replaced with <em>IRI</em> as
+            well, to provide better support for international identifiers and/or locators.</p>
+          <p>A proposed mechanism for specifying the range of property values to be URI reference or IRI could
+            allow these to be specified as subject or object using a <aref>content</aref> attribute.</p>
+        </div>
       </dd>
       <dt><tdef>vocabulary</tdef></dt><dd>
         A vocabulary is a collection of URIs, suitable for use as an <aref>itemtype</aref> or <aref>itemprop</aref>
@@ -939,7 +1024,7 @@
       <li>Otherwise, if <tref>current vocabulary</tref> from <em>context</em> is not null
         and <tref>registry</tref> has an entry for <tref>current vocabulary</tref> having a
         <tref>propertyURI</tref> entry that is not null, set that as <em>scheme</em>. Otherwise,
-        set <em>scheme</em> to <code>contextual</code>.</li>
+        set <em>scheme</em> to <code>base</code>.</li>
       <li>If <em>scheme</em> is <code>base</code> return the <tref>URI reference</tref> constructed
         by removing everything following the last SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#")
         from <tref>current type</tref> and append the fragment-escaped value of <em>name</em>.</li>
@@ -1074,7 +1159,7 @@
   </section>
 </section>
 
-<section class="appendix">
+<section class="appendix informative">
 <h2>Markup Examples</h2>
 
 <p>The microdata example below expresses book information as an FRBR Work item.</p>