--- a/microdata-rdf/index.html Fri Nov 04 22:39:13 2011 -0700
+++ b/microdata-rdf/index.html Sat Nov 05 13:41:59 2011 -0700
@@ -17,6 +17,7 @@
apply: function(c) {
// extend the bibliography entries
berjon.biblio["MICRODATA"] = "<cite><a href=\"http://www.w3.org/TR/2011/WD-microdata-20110525/\">HTML Microdata</a></cite> Ian Hickson Editor. World Wide Web Consortium (work in progress). 25 May 2010. This edition of the HTML Microdata specification is http://www.w3.org/TR/2011/WD-microdata-20110525/. The <a href=\"http://www.w3.org/TR/microdata/\">latest edition of HTML Microdata</a> is available at http://www.w3.org/TR/microdata/";
+ berjon.biblio["JSON-LD"] = "Manu Sporny, Gregg Kellogg, et al. <a href=\"http://json-ld.org/spec/latest/json-ld-syntax/\"><cite>The JSON-LD Syntax</cite></a> Latest. W3C Editor's Draft. URL: <a href=\"http://json-ld.org/spec/latest/json-ld-syntax/\">http://json-ld.org/spec/latest/json-ld-syntax/</a>";
// process the document before anything else is done
var refs = document.querySelectorAll('adef') ;
@@ -249,18 +250,33 @@
</section>
<section id='sotd'>
-<p>This document is an experimental work in progress.</p>
+<p>This document is an experimental work in progress. The concepts described herein are intended to
+ provide guidance for a future working group. Implementers of this specification, whether of producers
+ or consumers, should note that it is likely to change significantly before any publication as a Working
+ Draft.</p>
</section>
-<section>
+<section class="informative">
<h1>Introduction</h1>
- <p>
- This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [[!MICRODATA]]
+ <p>This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [[!MICRODATA]]
is an extension to HTML used to embed machine-readable data to HTML documents. This specification describes
- transformation directly to RDF [[RDF-CONCEPTS]].
- </p>
-
-<section>
+ transformation directly to RDF [[RDF-CONCEPTS]].</p>
+ <div class="note">
+ <p>A mapping from microdata to RDF can be configured in a variety of ways to produce a result that is
+ closer to what a particular vocabulary requires. This
+ specification defines terms that can be used as hooks for vocabulary-specific behavior, which could be defined
+ within a <tref>registry</tref> or on an implementation-defined basis. However, the HTML Data TF recommends
+ adopting a single mapping method in which every vocabulary is treated as if:</p>
+ <ul>
+ <li><tref><code>propertyURI</code></tref> is set to <code>base</code> </li>
+ <li><tref><code>multipleValues</code></tref> is set to <code>unordered</code></li>
+ </ul>
+ <p>For background on the trade-offs between these options, see
+ <a
+ href="http://www.w3.org/wiki/Mapping_Microdata_to_RDF"
+ >http://www.w3.org/wiki/Mapping_Microdata_to_RDF</a>.</p>
+ </div>
+<section class="informative">
<h2>Background</h2>
<p>Microdata is a way of expressing metadata in HTML documents using attributes. A previous version
of microdata [[!MICRODATA]] included rules for generating RDF, but current Editor's Drafts have removed
@@ -280,11 +296,12 @@
This is facilitated by a <tref>registry</tref> that associates URIs with specific rules based on matching
<aref>itemtype</aref> values against registered URI prefixes do determine a vocabulary and
vocabulary-specific processing rules.</p>
- <p class="note">The Microdata JSON serialization does not retain datatype or language information that might
- be derived from the HTML DOM. The RDF Transformation does retain language information when it is available.</p>
+ <p class="note">The Microdata JSON serialization does not retain <em>datatype</em> or
+ <em>language</em> information that might be derived from the HTML DOM. The RDF Transformation does retain language
+ information when it is available.</p>
</section>
-<section>
+<section class="informative">
<h2>Use Cases</h2>
<p>During the period of the task force, a number of use cases were put forth for the use of microdata
in generating RDF:</p>
@@ -318,7 +335,7 @@
</ul>
</section>
-<section>
+<section class="informative">
<h2>Issues</h2>
<p>Decisions or open issues in the specification are tracked on the
<a href="http://www.w3.org/2011/htmldata/track/issues">Task Force Issue Tracker</a>. These include the
@@ -330,15 +347,18 @@
as this would violate microdata's data model.
</dd>
<dt><a href="http://www.w3.org/2011/htmldata/track/issues/3">ISSUE 3</a></dt><dd>
- Should the registry allow property datatype specification.
+ Should the <tref>registry</tref> allow property datatypes to be specified?
+ </dd>
+ <dt><a href="http://www.w3.org/2011/htmldata/track/issues/4">ISSUE 4</a></dt><dd>
+ Should the <tref>registry</tref> allow a property name or URI to be used as an alias for <aref>itemid</aref>?
</dd>
</dl>
</section>
-<section>
+<section class="informative">
<h2>Goals</h2>
<p>The purpose of this specification is to provide input to a future working group that can make decisions
- about the need for a registry and the details of processing. Among the options investigated by
+ about the need for a <tref>registry</tref> and the details of processing. Among the options investigated by
the Task Force are the following:</p>
<ul>
<li>Property URI generation using the original microdata specification with a base URI and fragment
@@ -357,6 +377,7 @@
whether or not multiple, into some form of collection.</li>
</ul>
</section>
+</section>
<section>
<h1>Attributes and Syntax</h1>
@@ -375,9 +396,6 @@
</dd>
<dt><adef>datetime</adef></dt><dd>
An attribute appropriate for use with the <code>date</code> element for creating typed literals.
- <div class="issue">
- The <code>date</code> element will likely be replaced with something more general purpose.
- </div>
</dd>
<dt><adef>href</adef></dt><dd>
An attribute appropriate for use with <code>a</code>, <code>area</code> or <code>link</code> elements for
@@ -436,7 +454,7 @@
</dl>
</section>
-<section>
+<section class="informative">
<h1>Vocabulary Registry</h1>
<p>In a perfect world, all processors would be able to generate the same output for a given input
without regards to the requirements of a particular <tref>vocabulary</tref>. However, microdata doesn't
@@ -467,19 +485,20 @@
rules is defined in the following sections. If an item has no <tref>current type</tref> or the
<tref>registry</tref> contains no <tref>URI prefix</tref> matching <tref>current type</tref>, a conforming
processor MUST use the default values defined for these rules.</p>
+ <p class="note">The <tref>registry</tref>, including any hypothetical format, location and update rules,
+ is presented as an abstraction useful for describing the behavior of a microdata processor.
+ Maintaining such a <tref>registry</tref> raises issues of update
+ frequency, URL naming, and how updates are authorized. This specification
+ considers only the semantic content of a <tref>registry</tref> and how it can affect processing; it does not
+ define its representation or update policies.</p>
<p class="issue">Richard Ciganiak has
<cite><a href="http://richard.cyganiak.de/2011/10/microdata.html#whitelists">pointed out</a></cite> that
- "Registry" may be the wrong term, as the proposed registry doesn't assign identifiers or manage
+ "Registry" may be the wrong term, as the proposed <tref>registry</tref> doesn't assign identifiers or manage
namespace, it simply provides a mapping between URI prefixes and processor behavior and suggests the term
"Whitelist". As more than two values are required, and it describes more than binary behavior, this term
isn't appropriate either.</p>
- <p class="issue">Anytime we discuss maintaining such a database, there are issues surrounding update
- frequency, URL naming, and how updates are authorized. This remains an open issue. This spec
- just considers the semantic content of such a list and how it can be used to affect processing without
- defining its representation or update policies.</p>
- <p class="issue">The URL of the <tref>registry</tref> must be defined.</p>
-<section>
+<section class="informative">
<h2>Property URI Generation</h2>
<p>For <tref>property names</tref> which are not <tref>absolute URI</tref>s,
the <tdef><code>propertyURI</code></tdef> rule defines the algorithm for generating an <tref>absolute URI</tref>
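The registry lookup described above can be sketched in JavaScript. In that lookup, an item's current type is matched against registered URI prefixes, and any rule left unset falls back to its default. This is a minimal illustration, not part of the specification. The `REGISTRY` object, the `rulesFor` function and the longest-prefix tie-break are assumptions. The default values (`base` for `propertyURI`, `list` for `multipleValues`) are the ones this revision states.

```javascript
// Hypothetical in-memory registry keyed by URI prefix; the entry mirrors the
// JSON sketch used in this document and is illustrative only.
const REGISTRY = {
  "http://schema.org/": { propertyURI: "vocabulary", multipleValues: "unordered" },
};

// Default rule values as stated in this revision of the specification.
const DEFAULTS = { propertyURI: "base", multipleValues: "list" };

// Return the processing rules that apply to an item's current type.
// Assumption: when several URI prefixes match, the longest one wins.
function rulesFor(currentType, registry = REGISTRY) {
  let best = null;
  if (currentType) {
    for (const prefix of Object.keys(registry)) {
      if (currentType.startsWith(prefix) && (best === null || prefix.length > best.length)) {
        best = prefix;
      }
    }
  }
  // No current type, or no matching prefix: every rule takes its default.
  const entry = best === null ? {} : registry[best];
  return {
    vocabulary: best,
    propertyURI: entry.propertyURI ?? DEFAULTS.propertyURI,
    multipleValues: entry.multipleValues ?? DEFAULTS.multipleValues,
  };
}
```

For example, an item typed `http://schema.org/Person` picks up the schema.org entry. An item typed in an unregistered vocabulary, or with no type at all, gets `base` and `list`.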
@@ -500,7 +519,7 @@
unique based on the value of <tref>current property</tref>. This is
required as the microdata data model requires that property names are associated with specific
items and do not have a global scope.
- <div class="note">
+ <div>
<p>URI creation uses a base URI with query parameters to indicate the in-scope
type and property name list. Consider the following example:</p>
<pre class="example" data-transform="updateExample">
@@ -518,7 +537,7 @@
<code>http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n</code>.
However, the included property name <em>given-name</em> is included in untyped item.
The inherited property URI is used to create a new property URI:
- <code>http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n.first-name</code>.</p>
+ <code>http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n.given-name</code>.</p>
<p>This scheme is compatible with the needs of other RDF serialization formats such as
RDF/XML [[RDF-SYNTAX-GRAMMAR]],
which rely on <em>QNames</em> for expressing properties. For example, the generated property URIs
@@ -548,10 +567,10 @@
<tref>absolute URI</tref>s to the <tref>URI prefix</tref>.
</dd>
</dl>
- <p>The default value of <tref><code>propertyURI</code></tref> is <code>contextual</code>.</p>
+ <p>The default value of <tref><code>propertyURI</code></tref> is <code>base</code>.</p>
</section>
-<section>
+<section class="informative">
<h2>Value Ordering</h2>
<p>For items having multiple values for a property,
the <tdef><code>multipleValues</code></tdef> rule defines the algorithm for serializing these values.
@@ -571,16 +590,73 @@
<p>The default value of <tref><code>multipleValues</code></tref> is <code>list</code>.</p>
</section>
-<section>
+<section class="informative">
<h2>Value Typing</h2>
- <p>One possible use of a registry would allow vocabularies to be marked with datatype information,
+ <p>One possible use of a <tref>registry</tref> would allow vocabularies to be marked with datatype information,
so that a <code>dc:time</code> value, for example, would be understood to represent a literal with datatype
<code>xsd:date</code>. This could be done by adding information for each property in the vocabulary requiring
special treatment.</p>
<p>Additionally, literal values which should be interpreted as URI references could be given special treatment.</p>
+ <p>This might be represented using a syntax such as the following:</p>
+ <pre class="example" data-transform="updateExample">
+<!--
+{
+ "http://schema.org/": {
+ "propertyURI": "vocabulary",
+ "multipleValues": "unordered"****,
+ "@context": {
+ "url": {"@datatype": "@iri"},
+ "dateCreated": {"@datatype": "http://www.w3.org/2001/XMLSchema#date"},
+ "price": {"@datatype": [
+ "http://www.w3.org/2001/XMLSchema#decimal",
+ "http://www.w3.org/2001/XMLSchema#string"
+ ]}
+ }****
+ }
+}
+-->
+ </pre>
+ <p>This notation borrows some concepts from the JSON-LD [[JSON-LD]] context.
+ The <code>@datatype</code> key identifies one or more XSD datatypes against which a literal value is
+ lexically matched; a matching literal is given the associated datatype.</p>
+ <p>The <code>@iri</code> datatype identifies the property as having a <tref>URI reference</tref> range
+ rather than a literal range. This allows the property to be used where there is a literal content model, such as
+ <aref>content</aref>, while causing the value to be interpreted as a <tref>URI reference</tref>.</p>
<p>These concepts are not explored further at this time, but could be developed further in
a future revision of this document.</p>
</section>
+
+<section class="informative">
+ <h2>Property as Subject</h2>
+ <p>One possible use of a <tref>registry</tref> would allow property values to be used as the item subject, if
+ the item has no <aref>itemid</aref> attribute.</p>
+ <p>This might be represented using a syntax such as the following:</p>
+ <pre class="example" data-transform="updateExample">
+<!--
+{
+ "http://schema.org/": {
+ "propertyURI": "vocabulary",
+ "multipleValues": "unordered",
+ "@context": {
+ "url": {"@datatype": ****["@subject", "@iri"]****},
+ "dateCreated": {"@datatype": "http://www.w3.org/2001/XMLSchema#date"},
+ "price": {"@datatype": [
+ "http://www.w3.org/2001/XMLSchema#decimal",
+ "http://www.w3.org/2001/XMLSchema#string"
+ ]}
+ }
+ }
+}
+-->
+ </pre>
+ <p>Here, <code>url</code> refers to <code>http://schema.org/url</code> and is defined both as
+ having a <tref>URI reference</tref> range and as an alias for the item subject.
+ Special cases arise when the item already has an <aref>itemid</aref> attribute, or when there is more
+ than one <code>url</code> property value. These could be resolved by using only the first property value, and only
+ when the item has no <aref>itemid</aref>.</p>
+ <p>This concept is not explored further at this time, but could be developed further in
+ a future revision of this document.</p>
+</section>
</section>
<section>
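The hypothetical `@datatype`, `@iri` and `@subject` entries in the registry examples above could be applied roughly as follows. This is a sketch under stated assumptions, not a defined algorithm:

- The regular expressions only approximate the XSD lexical spaces.
- Candidate datatypes are assumed to be tried in the order listed, with the first match winning.
- The `typeValue` and `chooseSubject` helpers are hypothetical names.

Following the property-as-subject discussion, `chooseSubject` prefers an existing `itemid` and otherwise uses only the first value of a property marked `@subject`.

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Simplified lexical patterns (assumption: these approximate, and do not
// fully implement, the XSD lexical spaces).
const LEXICAL = {
  [XSD + "date"]: /^-?\d{4,}-\d{2}-\d{2}(Z|[+-]\d{2}:\d{2})?$/,
  [XSD + "decimal"]: /^[+-]?(\d+(\.\d*)?|\.\d+)$/,
  [XSD + "string"]: /^[\s\S]*$/,
};

// Hypothetical context mirroring the "@context" examples in the registry sketch.
const CONTEXT = {
  url: { "@datatype": ["@subject", "@iri"] },
  dateCreated: { "@datatype": XSD + "date" },
  price: { "@datatype": [XSD + "decimal", XSD + "string"] },
};

// Turn a property's string value into a description of an RDF term.
function typeValue(name, value, base, context = CONTEXT) {
  const spec = context[name]?.["@datatype"];
  const candidates = spec === undefined ? [] : [].concat(spec);
  if (candidates.includes("@iri")) {
    // URI reference range: resolve the value against the document base.
    return { type: "uri", value: new URL(value, base).href };
  }
  // Assumption: candidates are tried in order and the first lexical match wins.
  for (const dt of candidates) {
    if (LEXICAL[dt]?.test(value)) return { type: "literal", value, datatype: dt };
  }
  return { type: "literal", value }; // no match: plain literal
}

// Choose the item subject: itemid wins; otherwise the first value of a
// property marked "@subject"; otherwise null (a blank node would be used).
function chooseSubject(itemid, properties, base, context = CONTEXT) {
  if (itemid) return new URL(itemid, base).href;
  for (const [name, values] of Object.entries(properties)) {
    const spec = context[name]?.["@datatype"];
    if (spec !== undefined && [].concat(spec).includes("@subject") && values.length > 0) {
      return new URL(values[0], base).href;
    }
  }
  return null;
}
```

With this sketch, a `price` of `12.50` becomes an `xsd:decimal` literal, while `free` falls through to `xsd:string`. A relative `url` such as `/about` is resolved against the document base.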
@@ -759,6 +835,15 @@
URI references are suitable to be used in <em>subject</em>, <em>predicate</em> or <em>object</em> positions
within an RDF triple, as opposed to a <tref>literal</tref> value that may contain a string representation of a
URI. (See [[RDF-CONCEPTS]]).
+ <div class="issue">
+ <p>The HTML5/microdata content model for <aref>href</aref>, <aref>src</aref>,
+ <aref>data</aref> and <aref>itemid</aref> is that of a URL, not a URI reference. The attributes
+ <aref>itemtype</aref> and <aref>itemprop</aref> may take any value, including a <tref>URI
+ reference</tref>. In this context, <tref>URI reference</tref> could also be replaced with <em>IRI</em>
+ to better support internationalized identifiers and locators.</p>
+ <p>A proposed mechanism for declaring that a property's range is a URI reference or IRI could
+ allow such values to be specified as subject or object using a <aref>content</aref> attribute.</p>
+ </div>
</dd>
<dt><tdef>vocabulary</tdef></dt><dd>
A vocabulary is a collection of URIs, suitable for use as an <aref>itemtype</aref> or <aref>itemprop</aref>
@@ -939,7 +1024,7 @@
<li>Otherwise, if <tref>current vocabulary</tref> from <em>context</em> is not null
and <tref>registry</tref> has an entry for <tref>current vocabulary</tref> having a
<tref>propertyURI</tref> entry that is not null, set that as <em>scheme</em>. Otherwise,
- set <em>scheme</em> to <code>contextual</code>.</li>
+ set <em>scheme</em> to <code>base</code>.</li>
<li>If <em>scheme</em> is <code>base</code> return the <tref>URI reference</tref> constructed
by removing everything following the last SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#")
from <tref>current type</tref> and append the fragment-escaped value of <em>name</em>.</li>
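The `base` scheme in the step above can be sketched as a small JavaScript function. This excerpt does not spell out what "fragment-escaped" means. `fragmentEscape` therefore assumes it means percent-encoding every UTF-8 byte that RFC 3986 does not permit unencoded in a fragment. Both function names are hypothetical.

```javascript
// Assumption: "fragment-escaped" means percent-encoding every UTF-8 byte that
// RFC 3986 does not allow unencoded in a fragment component.
function fragmentEscape(name) {
  return Array.from(new TextEncoder().encode(name), (byte) => {
    const ch = String.fromCharCode(byte);
    return /[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]/.test(ch)
      ? ch
      : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }).join("");
}

// Sketch of the "base" propertyURI scheme: keep current type up to and
// including its last "/" or "#", then append the escaped property name.
function basePropertyURI(currentType, name) {
  const cut = Math.max(currentType.lastIndexOf("/"), currentType.lastIndexOf("#"));
  return currentType.slice(0, cut + 1) + fragmentEscape(name);
}
```

For example, `basePropertyURI("http://schema.org/Person", "name")` yields `http://schema.org/name`. If current type contains neither "/" nor "#", this sketch returns only the escaped name. A real processor would need to decide how to handle that case.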
@@ -1074,7 +1159,7 @@
</section>
</section>
-<section class="appendix">
+<section class="appendix informative">
<h2>Markup Examples</h2>
<p>The microdata example below expresses book information as an FRBR Work item.</p>