Copyright © 2011-2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is an experimental work in progress. The concepts described herein are intended to help provide guidance for a future working group. Implementations of this specification, either producers or consumers, should note that it is likely to change significantly prior to any publication as a Working Draft.
This document was published by the HTML Data Task Force as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-html-data-tf@w3.org (subscribe, archives). All feedback is welcome.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is non-normative.
This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [MICRODATA] is an extension to HTML used to embed machine-readable data in HTML documents. This specification describes transformation directly to RDF [RDF-CONCEPTS].
There are a variety of ways in which a mapping from microdata to RDF might be configured to give a result that is closer to the required result for a particular vocabulary. This specification defines terms that can be used as hooks for vocabulary-specific behavior, which could be defined within a registry or on an implementation-defined basis. However, the HTML Data TF recommends the adoption of a single method of mapping in which every vocabulary is treated as if:

- propertyURI is set to vocabulary, and
- multipleValues is set to unordered.
For background on the trade-offs between these options, see http://www.w3.org/wiki/Mapping_Microdata_to_RDF.
This section is non-normative.
Microdata [MICRODATA] is a way of embedding data in HTML documents using attributes. The HTML DOM is extended to provide an API for accessing microdata information, and the microdata specification defines how to generate a JSON representation from microdata markup.
Mapping microdata to RDF enables consumers to merge data expressed in other RDF-based formats with microdata. It facilitates the use of RDF vocabularies within microdata, and enables microdata to be used with the full RDF toolchain. Some use cases for this mapping are described in Section 1.2 below.
Microdata's data model does not align neatly with RDF.

For example, an item of type http://example.org/Cat can have both the property color and the property http://example.org/color, and these properties are semantically distinct under microdata. In RDF, all properties have IRIs.

… @lang attributes could be used to provide datatype and language information for RDF data, this would be contrary to the microdata specification.

Thus, in some places the needs of RDF consumers violate requirements of the microdata specification. This specification highlights where such violations occur and the reasons for them.
This specification allows for vocabulary-specific rules that affect the generation of property URIs and value serializations. This is facilitated by a registry that associates URIs with specific rules: @itemtype values are matched against registered URI prefixes to determine a vocabulary and, potentially, vocabulary-specific processing rules.
This specification also assumes that consumers of RDF generated from microdata may have to process the results in order to, for example, assign appropriate datatypes to property values.
This section is non-normative.
During the period of the task force, a number of use cases were put forth for the use of microdata in generating RDF:
- … the rdfs:range of a GoodRelations property indicates the datatype of the expected value, and GoodRelations processors will expect values to be cast to that type. Language information from the HTML needs to be captured, as it is common for multiple values to be used to specify the same information in different languages.
- … http://schema.org/musicGroupMember, and an author might express more detail through an ad-hoc sub-property musicGroupMember/leadVocalist, having the URI http://schema.org/musicGroupMember/leadVocalist.

This section is non-normative.
Decisions or open issues in the specification are tracked on the Task Force Issue Tracker. These include the following:
The purpose of this specification is to provide input to a future working group that can make decisions about the need for a registry and the details of processing. Among the options investigated by the Task Force are the following:
- … http://www.w3.org/ns/md#item mapping at all.
- … rdf:Seq, or place all values, whether or not multiple, into some form of collection.

The microdata specification [MICRODATA] defines a number of attributes and the way in which those attributes are to be interpreted. The microdata DOM API provides methods and attributes for retrieving microdata from the HTML DOM.
For reference, the attributes used for specifying and retrieving HTML microdata are listed here:
… element.itemType on the element. The item type is also used to resolve non-URL names to absolute URLs. Available through the Microdata DOM API as element.itemType. (See Items in [MICRODATA].)
In RDF, it is common for people to shorten vocabulary terms via abbreviated URIs that use a 'prefix' and a 'reference'. The examples throughout this document assume that the following vocabulary prefixes have been defined:
dc: | http://purl.org/dc/terms/ |
md: | http://www.w3.org/ns/md# |
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs: | http://www.w3.org/2000/01/rdf-schema# |
xsd: | http://www.w3.org/2001/XMLSchema# |
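As a quick illustration of how these abbreviations expand, here is a minimal Python sketch. The function name expand_curie is hypothetical. It replaces the prefix with its URI from the table above and appends the reference.

```python
# Prefix table from this section.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "md": "http://www.w3.org/ns/md#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie: str) -> str:
    """Expand 'prefix:reference' to a full URI (hypothetical helper)."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or malformed abbreviation: {curie!r}")
    return PREFIXES[prefix] + reference


print(expand_curie("xsd:date"))  # http://www.w3.org/2001/XMLSchema#date
print(expand_curie("md:item"))   # http://www.w3.org/ns/md#item
```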
This section is non-normative.
In a perfect world, all processors would be able to generate the same output for a given input without regard to the requirements of a particular vocabulary. However, microdata doesn't provide sufficient syntactic help in making these decisions, and different vocabularies have different needs.
The registry is located at the namespace defined for microdata, http://www.w3.org/ns/md, and is available in a variety of formats.
The registry associates a URI prefix with one or more key-value pairs denoting processor behavior. A hypothetical JSON representation of such a registry might be the following:
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "list",
    "properties": {
      "url": {"multipleValues": "unordered"}
    }
  }
}
This structure defines mappings for two URI prefixes, http://schema.org/ and http://microformats.org/profile/hcard. Items having an item type with a URI prefix from this registry use the rules described for that prefix within the scope of that item type. This mapping currently defines two rules, propertyURI and multipleValues, whose values indicate specific behavior. It also allows overrides on a per-property basis: the properties key associates an individual name with overrides for default behavior.

The interpretation of these rules is defined in the following sections. If an item has no current type, or the registry contains no URI prefix matching current type, a conforming processor must use the default values defined for these rules.
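The lookup described above can be sketched in Python. This is an illustration under assumptions, not a defined API. The registry is modeled as a plain dict mirroring the hypothetical JSON. When several prefixes match, the longest one is chosen, because the text does not say how overlapping prefixes are resolved. Per-property overrides are left out here. The defaults come from the propertyURI and multipleValues sections of this document.

```python
from typing import Optional

# Hypothetical registry mirroring the JSON sketch above; a dict stands in
# for whatever representation a real registry would use.
REGISTRY = {
    "http://schema.org/": {"propertyURI": "vocabulary", "multipleValues": "unordered"},
    "http://microformats.org/profile/hcard": {"propertyURI": "vocabulary", "multipleValues": "list"},
}

# Default values, used when the item has no type or no prefix matches.
DEFAULT_RULES = {"propertyURI": "vocabulary", "multipleValues": "unordered"}


def matching_prefix(current_type: Optional[str]) -> Optional[str]:
    """Return a registered prefix of current_type; the longest wins (assumption)."""
    if not current_type:
        return None
    candidates = [p for p in REGISTRY if current_type.startswith(p)]
    return max(candidates, key=len) if candidates else None


def rules_for(current_type: Optional[str]) -> dict:
    """Overlay the matching registry entry, if any, on the default rules."""
    prefix = matching_prefix(current_type)
    entry = REGISTRY[prefix] if prefix is not None else {}
    return {rule: entry.get(rule, default) for rule, default in DEFAULT_RULES.items()}


print(rules_for("http://microformats.org/profile/hcard"))
# {'propertyURI': 'vocabulary', 'multipleValues': 'list'}
print(rules_for("http://example.org/Cat"))
# {'propertyURI': 'vocabulary', 'multipleValues': 'unordered'}
```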
The concept of a registry, including a hypothetical format, location and updating rules is presented as an abstract concept useful for describing the function of a microdata processor. There are issues surrounding update frequency, URL naming, and how updates are authorized. This spec just considers the semantic content of such a registry and how it can be used to affect processing without defining its representation or update policies.
Richard Cyganiak has pointed out that "Registry" may be the wrong term, as the proposed registry doesn't assign identifiers or manage a namespace. It simply provides a mapping between URI prefixes and processor behavior. He suggests the term "Whitelist" instead. However, more than two values are required and the registry describes more than binary behavior, so that term isn't appropriate either.
This section is non-normative.
For names which are not absolute URLs, the propertyURI rule defines the algorithm for generating an absolute URL given an evaluation context including a current type, current name and current vocabulary.
The procedure for generating property URIs is defined in Generate Predicate URI.
Possible values for propertyURI are the following:

contextual

The contextual URI generation scheme guarantees that generated property URIs are unique based on the value of current name. This is required because the microdata data model requires that names are associated with specific items and do not have a global scope. (See Step 5 in Generate Predicate URI.)
URI creation uses a base URI with query parameters to indicate the in-scope type and name list. Consider the following example:
<span itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="n" itemscope>
    <span itemprop="given-name"> Princeton </span>
  </span>
</span>

The first name, n, generates the URI http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=n.

However, the name given-name is used within an untyped item, so the inherited property URI is used to create a new property URI: http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=n.given-name.

This scheme is compatible with the needs of other RDF serialization formats such as RDF/XML [RDF-SYNTAX-GRAMMAR], which rely on QNames for expressing properties. For example, the generated property URIs can be split as follows. Inside an XML attribute, the '&' must be written as &amp;:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:hcard="http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&amp;prop=">
  <rdf:Description rdf:type="http://microformats.org/profile/hcard">
    <hcard:n>
      <rdf:Description>
        <hcard:n.given-name> Princeton </hcard:n.given-name>
      </rdf:Description>
    </hcard:n>
  </rdf:Description>
</rdf:RDF>
Looking at another example:
<div itemscope itemtype="http://schema.org/Person">
  <h2 itemprop="name">Jeni</h2>
</div>

This would generate http://www.w3.org/ns/md?type=http://schema.org/Person&prop=name.
vocabulary

The vocabulary URI generation scheme appends names that are not absolute URLs to the URI prefix. When generating property URIs, if the URI prefix does not end with a '/' or '#', a '#' is appended to the URI prefix. (See Step 4 in Generate Predicate URI.)

Under this scheme, URI creation simply concatenates the URI prefix (plus any '#' separator) and the name, so the enclosing names and the exact item type do not affect the result. Consider the following example:
<span itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="n" itemscope>
    <span itemprop="given-name"> Princeton </span>
  </span>
</span>

Given the URI prefix http://microformats.org/profile/hcard, this would generate http://microformats.org/profile/hcard#n and http://microformats.org/profile/hcard#given-name. Note that the '#' is automatically added as a separator.
Looking at another example:
<div itemscope itemtype="http://schema.org/Person">
  <h2 itemprop="name">Jeni</h2>
</div>

Given the URI prefix http://schema.org/, this would generate http://schema.org/name. Note that if the @itemtype were http://schema.org/Person/Teacher, this would generate the same property URI.
If the registry contains no match for current type, implementations act as if there were a URI prefix made from the first @itemtype value by stripping the fragment content or, if the value has no fragment, the last path segment (see [RFC3986]).
Deconstructing the @itemtype URL to create or identify a vocabulary URI is a violation of the microdata specification. It is necessary, however, to support the use of existing vocabularies designed for use with RDF, and shared or inherited properties within all vocabularies.
The default value of propertyURI is vocabulary.
<div itemscope itemtype="http://schema.org/Book">
  <h2 itemprop="title">Just a Geek</h2>
</div>

In this example, assuming no matching entry in the registry, the URI prefix is constructed by removing the last path segment, leaving the URI http://schema.org/. As there is no explicit propertyURI, the default vocabulary is used, and the resulting property URI would be http://schema.org/title.
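The fallback described above strips the fragment content or, failing that, the last path segment. It might look like the following Python sketch. derive_uri_prefix is a hypothetical name. Dropping any query string is an additional assumption that the text does not state.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_uri_prefix(item_type: str) -> str:
    """Derive a URI prefix from the first @itemtype value (hypothetical helper)."""
    if "#" in item_type:
        # Strip the fragment content but keep the '#' itself.
        return item_type[: item_type.index("#") + 1]
    parts = urlsplit(item_type)
    # No fragment: keep the path up to and including its last '/',
    # dropping the last segment (and, by assumption, any query string).
    path = parts.path[: parts.path.rfind("/") + 1]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


prefix = derive_uri_prefix("http://schema.org/Book")
print(prefix)            # http://schema.org/
print(prefix + "title")  # http://schema.org/title
```

Because http://schema.org/ already ends with '/', the vocabulary scheme appends title directly, giving http://schema.org/title as in the example.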
This section is non-normative.
For items having multiple values for a given property, the multipleValues rule defines the algorithm for serializing these values.
Microdata uses document order when generating property values, as defined in the Microdata DOM API as element.itemValue. However, many RDF vocabularies expect multiple values to be generated as triples sharing a common subject and predicate. In some cases, it may be useful to retain value ordering.
The procedure for generating property values is defined in Generate Property Values.
Possible values for multipleValues are the following:
unordered
list
An example of how this might be specified in a registry is the following:
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "list"
  }
}
Additionally, some vocabularies may wish to specify this on a per-property basis. For example, within http://schema.org/MusicPlaylist the tracks property might depend on the order of values to reproduce the associated MusicRecording values.
{
"http://schema.org/": {
"propertyURI": "vocabulary",
"multipleValues": "unordered",
"properties": {
"tracks": {"multipleValues": "list"}
}
}
}
The properties key takes a JSON Object as a value, which in turn has keys for each property that is to be given alternate semantics. Each name is implicitly expanded to its URI representation as defined in Generate Predicate URI, so that the behavior is the same whether or not the name is listed as an absolute URL.
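The following is a rough Python sketch of resolving the multipleValues method for one property. It includes the implicit expansion of names in the properties key. The helper names are hypothetical, and the registry entry mirrors the JSON above. The sketch also treats any name containing ':' as an absolute URL, which is a simplifying assumption.

```python
from typing import Optional

# Registry entry for http://schema.org/, mirroring the JSON above.
SCHEMA_ORG = {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {"tracks": {"multipleValues": "list"}},
}


def expand_name(name: str, vocabulary: str) -> str:
    """Vocabulary-scheme expansion; names containing ':' are kept as absolute URLs."""
    if ":" in name:
        return name
    separator = "" if vocabulary.endswith(("/", "#")) else "#"
    return vocabulary + separator + name


def multiple_values_method(entry: Optional[dict], vocabulary: str, name: str) -> str:
    """Per-property override first, then the entry's default, then 'unordered'."""
    entry = entry or {}
    target = expand_name(name, vocabulary)
    for key, override in entry.get("properties", {}).items():
        # Keys are expanded too, so 'tracks' and 'http://schema.org/tracks' match alike.
        if expand_name(key, vocabulary) == target and "multipleValues" in override:
            return override["multipleValues"]
    return entry.get("multipleValues", "unordered")


print(multiple_values_method(SCHEMA_ORG, "http://schema.org/", "http://schema.org/tracks"))  # list
print(multiple_values_method(SCHEMA_ORG, "http://schema.org/", "name"))                      # unordered
```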
The default value of multipleValues is unordered.
This section is non-normative.
In microdata, all values are strings. In RDF, values may be resources or may be typed with an appropriate datatype.
In some cases, the type of a microdata value can be determined from the element on which it is specified. In particular:

- … the time element provides dates and times.

Using information about the content of the document where the microdata is marked up might be a violation of the spirit of the microdata specification, though the specification does not explicitly say in normative text that consumers cannot use other information from the HTML DOM to interpret microdata.
Additionally, one possible use of a registry would allow vocabularies to be marked with datatype information, so that a dc:time value, for example, would be understood to represent a literal with datatype xsd:date. This could be done by adding information for each property in the vocabulary requiring special treatment.
This might be represented using a syntax such as the following:
{
"http://schema.org/": {
"propertyURI": "vocabulary",
"multipleValues": "unordered",
"properties": {
"dateCreated": {"datatype": "http://www.w3.org/2001/XMLSchema#date"}
}
}
}
The datatype identifies a URI to be used in constructing a typed literal.
In most cases, the relevant datatype for a value can be derived from knowledge of what property the value is for and the syntax of the value itself. Thus, values can be given datatypes in a post-processing step after the mapping of microdata to RDF described by this specification. However, where there is information in the HTML markup, such as knowledge of what element was used to mark up the value, which can help with determining its datatype, that information is used by this specification.
This concept is not explored further at this time, but could be developed further in a future revision of this document.
Transformation of Microdata to RDF makes use of general processing rules described in [MICRODATA] for the treatment of items.
… contextual property URI generation scheme. Without this scheme, this evaluation context component would not be required.
… document.getItems method.

… element.properties attribute.
… (a, area, audio, embed, iframe, img, link, object, source, track or video) … element.itemValue. (See relevant attribute descriptions in [HTML5].)
… time element.

- … element.itemValue.
- … http://www.w3.org/2001/XMLSchema#date.
- … http://www.w3.org/2001/XMLSchema#time.
- … http://www.w3.org/2001/XMLSchema#dateTime.

See The time element in [HTML5].
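To make the time-element rule concrete, the following Python sketch picks xsd:date, xsd:time or xsd:dateTime from the syntax of element.itemValue. The regular expressions are simplified assumptions. They approximate the HTML5 date and time microsyntaxes but do not reproduce them. The function name time_datatype is hypothetical.

```python
import re
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified patterns (assumption): not the full HTML5 microsyntaxes.
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?"
ZONE = r"(?:Z|[+-]\d{2}:\d{2})?"


def time_datatype(value: str) -> Optional[str]:
    """Return the XSD datatype URI for a time element's value, or None."""
    if re.fullmatch(DATE, value):
        return XSD + "date"
    if re.fullmatch(TIME, value):
        return XSD + "time"
    if re.fullmatch(DATE + "T" + TIME + ZONE, value):
        return XSD + "dateTime"
    return None  # no match: fall back to a plain literal


print(time_datatype("2011-10-05"))            # http://www.w3.org/2001/XMLSchema#date
print(time_datatype("2011-10-05T14:30:00Z"))  # http://www.w3.org/2001/XMLSchema#dateTime
print(time_datatype("next Tuesday"))          # None
```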
… time element is subject to change, and may include more content types, such as xsd:duration, xsd:gYear, xsd:gYearMonth and xsd:gMonthDay, in the future.
… element.itemValue with language information set from the lang IDL attribute of the property element.

… document.getItems. (See Associating names with items in [MICRODATA].)
The HTML5/microdata content model for @href, @src, @data, @itemtype, @itemprop and @itemid is that of a URL, not a URI or IRI.
A proposed mechanism for specifying that the range of property values is a URI reference or IRI could allow these to be specified as subject or object using a @content attribute.
An HTML document containing microdata may be converted to any other RDF-compatible document format using the algorithm specified in this section.
A conforming microdata processor implementing RDF conversion must implement a processing algorithm that results in the equivalent triples to those that the following algorithm generates:
Set item list to an empty list.
http://www.w3.org/ns/md#item
When the user agent is to Generate triples for an item item, given an Evaluation Context, it must run the following steps:
This algorithm has undergone substantial change from the original microdata specification [MICRODATA].
… element.itemType of the element defining the item.

… http://www.w3.org/1999/02/22-rdf-syntax-ns#type … element.itemType of the element defining the item.
Predicate URI generation makes use of current type, current name, and current vocabulary from an evaluation context context along with name.
… http://example.org/doc and an @itemprop of 'title', a URI will be constructed to be http://example.org/doc#title.
… vocabulary.

… vocabulary, return the URI reference constructed by appending the fragment-escaped value of name to current vocabulary, separated by a U+0023 NUMBER SIGN character (#) unless the current vocabulary ends with either a U+0023 NUMBER SIGN character (#) or a U+002F SOLIDUS character (/).

… contextual, return the URI reference constructed as follows:

- … http://www.w3.org/ns/md?type= is a prefix of s, return the concatenation of s, a U+002E FULL STOP character (.) and the fragment-escaped value of name.
- … http://www.w3.org/ns/md?type=, the fragment-escaped value of s, the string &prop=, and the fragment-escaped value of name.
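The two schemes can be combined into one Python sketch of Generate Predicate URI. Several parts are assumptions, since some steps of the algorithm are not shown above:

- names containing ':' are returned unchanged as absolute URLs;
- s is taken to be current name when it is set and current type otherwise, which is inferred from the contextual examples earlier in this document;
- fragment escaping is approximated by percent-encoding '#'.

All helper names are hypothetical.

```python
from typing import Optional

MD_TYPE = "http://www.w3.org/ns/md?type="


def fragment_escape(value: str) -> str:
    """Approximation (assumption): only '#' is percent-encoded."""
    return value.replace("#", "%23")


def predicate_uri(name: str, scheme: str, current_vocabulary: str,
                  current_type: Optional[str] = None,
                  current_name: Optional[str] = None) -> str:
    """Sketch of Generate Predicate URI for the vocabulary and contextual schemes."""
    if ":" in name:
        # Treated as an absolute URL and used as-is (assumption).
        return name
    if scheme == "vocabulary":
        separator = "" if current_vocabulary.endswith(("#", "/")) else "#"
        return current_vocabulary + separator + fragment_escape(name)
    if scheme == "contextual":
        s = current_name or current_type or ""
        if s.startswith(MD_TYPE):
            # Nested in an untyped item: extend the inherited property URI.
            return s + "." + fragment_escape(name)
        return MD_TYPE + fragment_escape(s) + "&prop=" + fragment_escape(name)
    raise ValueError(f"unknown propertyURI value: {scheme!r}")


hcard = "http://microformats.org/profile/hcard"
n = predicate_uri("n", "contextual", hcard, current_type=hcard)
print(n)  # http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=n
print(predicate_uri("given-name", "contextual", hcard, current_name=n))
# http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=n.given-name
print(predicate_uri("title", "vocabulary", "http://example.org/doc"))  # http://example.org/doc#title
```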
Property value serialization makes use of subject, predicate and values.
… properties which has a JSON Object value, let properties be that value. Otherwise, set properties to null.

… multipleValues, set that as method.

… multipleValues, set that as method.

… unordered.

… unordered, for each value in values, generate the following triple:

… list:
An RDF Collection is a mechanism for defining ordered sequences of objects in RDF (see RDF Collections in [RDF-SCHEMA]). As the RDF data model is that of an unordered graph, a linking method using the properties rdf:first and rdf:rest is required to specify a particular order.
In the microdata to RDF mapping, RDF Collections are used when an item has more than one value associated with a given property to ensure that the original document order is maintained. The following procedure should be used to generate triples when an item property has more than one value (contained in list):
http://www.w3.org/1999/02/22-rdf-syntax-ns#first
http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
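The two serialization methods, including the RDF Collection construction, can be sketched in Python. Triples are plain (subject, predicate, object) tuples of strings. Blank-node labels of the form _:b0, _:b1, and so on come from a hypothetical generator. This is an illustration, not a normative implementation. The Turtle ( … ) list syntax used in the examples that follow abbreviates the same rdf:first/rdf:rest chain.

```python
from itertools import count
from typing import Iterator, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def fresh_bnodes() -> Iterator[str]:
    """Yield blank-node labels _:b0, _:b1, ... (hypothetical labeling scheme)."""
    for i in count():
        yield f"_:b{i}"


def collection(values: List[str], bnodes: Iterator[str]) -> Tuple[str, List[Triple]]:
    """Link values with rdf:first/rdf:rest; return the head node and its triples."""
    if not values:
        return RDF + "nil", []
    head = node = next(bnodes)
    triples: List[Triple] = []
    for i, value in enumerate(values):
        triples.append((node, RDF + "first", value))
        rest = next(bnodes) if i + 1 < len(values) else RDF + "nil"
        triples.append((node, RDF + "rest", rest))
        node = rest
    return head, triples


def serialize(subject: str, predicate: str, values: List[str],
              method: str, bnodes: Iterator[str]) -> List[Triple]:
    """Serialize property values using the 'unordered' or 'list' method."""
    if method == "unordered":
        # One triple per value, sharing subject and predicate.
        return [(subject, predicate, value) for value in values]
    if method == "list":
        head, triples = collection(values, bnodes)
        return [(subject, predicate, head)] + triples
    raise ValueError(f"unknown multipleValues method: {method!r}")


for triple in serialize("_:playlist", "http://schema.org/byArtist",
                        ['"Lynard Skynard"', '"AC/DC"'], "list", fresh_bnodes()):
    print(triple)
```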
This section is non-normative.
The microdata example below expresses book information as an FRBR Work item.
<dl itemscope
    itemtype="http://purl.org/vocab/frbr/core#Work"
    itemid="http://books.example.com/works/45U8QJGZSQKDH8N"
    lang="en">
  <dt>Title</dt>
  <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
  <dt>By</dt>
  <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
  <dt>Format</dt>
  <dd itemprop="http://purl.org/vocab/frbr/core#realization"
      itemscope
      itemtype="http://purl.org/vocab/frbr/core#Expression"
      itemid="http://books.example.com/products/9780596007683.BOOK">
    <link itemprop="http://purl.org/dc/terms/type" href="http://books.example.com/product-types/BOOK">
    Print
  </dd>
  <dd itemprop="http://purl.org/vocab/frbr/core#realization"
      itemscope
      itemtype="http://purl.org/vocab/frbr/core#Expression"
      itemid="http://books.example.com/products/9780596802189.EBOOK">
    <link itemprop="http://purl.org/dc/terms/type" href="http://books.example.com/product-types/EBOOK">
    Ebook
  </dd>
</dl>
Assuming that the registry contains an entry for http://purl.org/vocab/frbr/core# with propertyURI set to vocabulary, this is equivalent to the following Turtle:
@prefix dc: <http://purl.org/dc/terms/> .
@prefix md: <http://www.w3.org/ns/md#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .

<> md:item (<http://books.example.com/works/45U8QJGZSQKDH8N>) .

<http://books.example.com/works/45U8QJGZSQKDH8N> a frbr:Work ;
  dc:creator "Wil Wheaton"@en ;
  dc:title "Just a Geek"@en ;
  frbr:realization <http://books.example.com/products/9780596007683.BOOK>,
    <http://books.example.com/products/9780596802189.EBOOK> .

<http://books.example.com/products/9780596007683.BOOK> a frbr:Expression ;
  dc:type <http://books.example.com/product-types/BOOK> .

<http://books.example.com/products/9780596802189.EBOOK> a frbr:Expression ;
  dc:type <http://books.example.com/product-types/EBOOK> .
The following snippet of HTML has microdata for two people with the same address. This illustrates two items referencing a third item, and how only a single RDF resource definition is created for that third item.
<p>
  Both
  <span itemscope="" itemtype="http://microformats.org/profile/hcard" itemref="home">
    <span itemprop="fn"><span itemprop="n" itemscope=""><span itemprop="given-name">Princeton</span></span></span>
  </span>
  and
  <span itemscope="" itemtype="http://microformats.org/profile/hcard" itemref="home">
    <span itemprop="fn"><span itemprop="n" itemscope=""><span itemprop="given-name">Trekkie</span></span></span>
  </span>
  live at
  <span id="home" itemprop="adr" itemscope="">
    <span itemprop="street-address">Avenue Q</span>.
  </span>
</p>
Assuming that the registry contains an entry for http://microformats.org/profile/hcard with propertyURI set to vocabulary, it generates these triples expressed in Turtle:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix hcard: <http://microformats.org/profile/hcard#> .

<> md:item (
  [ a <http://microformats.org/profile/hcard> ;
    hcard:fn "Princeton" ;
    hcard:n [ hcard:given-name "Princeton" ] ;
    hcard:adr _:a ]
  [ a <http://microformats.org/profile/hcard> ;
    hcard:fn "Trekkie" ;
    hcard:n [ hcard:given-name "Trekkie" ] ;
    hcard:adr _:a ]
) .

_:a hcard:street-address "Avenue Q" .
The following snippet of HTML has microdata for a playlist, and illustrates overriding a property to place elements in an RDF Collection:
<div itemscope="" itemtype="http://schema.org/MusicPlaylist">
  <span itemprop="name">Classic Rock Playlist</span>
  <meta itemprop="numTracks" content="2"/>
  <p>Including works by <span itemprop="byArtist">Lynard Skynard</span> and <span itemprop="byArtist">AC/DC</span></p>.
  <div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
    1.<span itemprop="name">Sweet Home Alabama</span> -
    <span itemprop="byArtist">Lynard Skynard</span>
    <link href="sweet-home-alabama" itemprop="url" />
  </div>
  <div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
    2.<span itemprop="name">Shook you all Night Long</span> -
    <span itemprop="byArtist">AC/DC</span>
    <link href="shook-you-all-night-long" itemprop="url" />
  </div>
</div>
Assuming that the registry contains an entry for http://schema.org/ with propertyURI set to vocabulary and multipleValues set to unordered, with the properties tracks and byArtist having multipleValues set to list, it generates these triples expressed in Turtle:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .

<> md:item ([
  a schema:MusicPlaylist ;
  schema:name "Classic Rock Playlist" ;
  schema:byArtist ("Lynard Skynard" "AC/DC") ;
  schema:numTracks "2" ;
  schema:tracks (
    [ a schema:MusicRecording ;
      schema:byArtist ("Lynard Skynard") ;
      schema:name "Sweet Home Alabama" ;
      schema:url <sweet-home-alabama> ]
    [ a schema:MusicRecording ;
      schema:byArtist ("AC/DC") ;
      schema:name "Shook you all Night Long" ;
      schema:url <shook-you-all-night-long> ]
  )
]) .
The following is an example registry in JSON format.
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "blogPosts": {"multipleValues": "list"},
      "breadcrumb": {"multipleValues": "list"},
      "byArtist": {"multipleValues": "list"},
      "creator": {"multipleValues": "list"},
      "episodes": {"multipleValues": "list"},
      "events": {"multipleValues": "list"},
      "founders": {"multipleValues": "list"},
      "itemListElement": {"multipleValues": "list"},
      "musicGroupMember": {"multipleValues": "list"},
      "performerIn": {"multipleValues": "list"},
      "performers": {"multipleValues": "list"},
      "producer": {"multipleValues": "list"},
      "recipeInstructions": {"multipleValues": "list"},
      "seasons": {"multipleValues": "list"},
      "subEvents": {"multipleValues": "list"},
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcalendar#": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "categories": {"multipleValues": "list"}
    }
  },
  "http://n.whatwg.org/work": {
    "propertyURI": "contextual",
    "multipleValues": "list"
  }
}