W3C

Microdata to RDF

Transformation from HTML+Microdata to RDF

W3C Editor's Draft 19 November 2011

This version:
https://dvcs.w3.org/hg/htmldata/microdata-rdf/
Latest published version:
http://www.w3.org/TR/microdata-rdf/
Latest editor's draft:
https://dvcs.w3.org/hg/htmldata/microdata-rdf/
Previous version:
https://dvcs.w3.org/hg/htmldata/raw-file/be6c462f97a0/ED/microdata-rdf/20111028/index.html
Editor:
Gregg Kellogg , Kellogg Associates
Authors:
Ian Hickson , Google, Inc.
Gregg Kellogg , Kellogg Associates
Jeni Tennison , Independent

This document is also available in this non-normative format: diff to previous version .


Abstract

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is an experimental work in progress. The concepts described herein are intended to help provide guidance for a future working group. Implementations of this specification, either producers or consumers, should note that it is likely to change significantly prior to any publication as a Working Draft.

This document was published by the HTML Data Task Force as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-html-data-tf@w3.org ( subscribe , archives ). All feedback is welcome.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .

Table of Contents

1. Introduction

This section is non-normative.

This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [ MICRODATA ] is an extension to HTML used to embed machine-readable data in HTML documents. This specification describes transformation directly to RDF [ RDF-CONCEPTS ].

There are a variety of ways in which a mapping from microdata to RDF might be configured to give a result that is closer to the required result for a particular vocabulary. This specification defines terms that can be used as hooks for vocabulary-specific behavior, which could be defined within a registry or on an implementation-defined basis. However, the HTML Data TF recommends the adoption of a single method of mapping in which every vocabulary is treated as if:

For background on the trade-offs between these options, see http://www.w3.org/wiki/Mapping_Microdata_to_RDF .

1.1 Background

This section is non-normative.

Microdata [ MICRODATA ] is a way of embedding data in HTML documents using attributes. A previous version of Microdata included rules for generating RDF, but current Editor's Drafts have removed the explicit transformation procedure. The HTML DOM is extended to provide an API for accessing microdata information, and the microdata specification defines how to generate a JSON representation from microdata markup.

Mapping microdata to RDF enables consumers to merge data expressed in other RDF-based formats with microdata. It facilitates the use of RDF vocabularies within microdata, and enables microdata to be used with the full RDF toolchain. Some use cases for this mapping are described in Section 1.2 below.

Microdata's data model does not align neatly with RDF.

Thus, in some places the needs of RDF consumers violate requirements of the microdata specification. This specification highlights where such violations occur and the reasons for them.

This specification allows for vocabulary-specific rules that affect the generation of property URIs and value serializations. This is facilitated by a registry that associates URIs with specific rules based on matching @itemtype values against registered URI prefixes to determine a vocabulary and potentially vocabulary-specific processing rules.

This specification also assumes that consumers of RDF generated from microdata may have to process the results in order to, for example, assign appropriate datatypes to property values.

1.2 Use Cases

This section is non-normative.

During the period of the task force, a number of use cases were put forth for the use of microdata in generating RDF:

1.3 Issues

This section is non-normative.

Decisions or open issues in the specification are tracked on the Task Force Issue Tracker . These include the following:

ISSUE 1
Vocabulary specific parsing for Microdata
ISSUE 2
Should Microdata-RDF generate XMLLiteral values. This issue has been closed with no change as this would violate microdata's data model.
ISSUE 3
Should the registry allow property datatype specification.
ISSUE 4
Should the registry allow a name or URL to be used as an alias for @itemid .
Goals

The purpose of this specification is to provide input to a future working group that can make decisions about the need for a registry and the details of processing. Among the options investigated by the Task Force are the following:

2. Attributes and Syntax

The microdata specification [ MICRODATA ] defines a number of attributes and the way in which those attributes are to be interpreted. The microdata DOM API provides methods and attributes for retrieving microdata from the HTML DOM.

For reference, attributes used for specifying and retrieving HTML microdata are referenced here:

itemid
An attribute containing a URL used to identify the subject of triples associated with this item. Available through the Microdata DOM API as element.itemId. (See Section 3.2 Items in [ MICRODATA ]).
itemprop
An attribute used to identify one or more names of an item. An @itemprop contains a space-separated list of names which may either be absolute URLs or terms associated with the type of the item as defined by the referencing item's item type. Available through the Microdata DOM API as element.itemProp. (See Section 3.3 Names: the itemprop attribute in [ MICRODATA ]).
itemref
An additional attribute on an item element that references additional elements containing property definitions to be applied to the referencing item. The attribute value is an unordered list of ID references to elements within the same document. Available through the Microdata DOM API as element.itemRef. (See Section 3.2 Items in [ MICRODATA ]).
itemscope
A boolean attribute identifying an element as an item. (See Section 3.2 Items in [ MICRODATA ]).
itemtype
An additional attribute on an item element used to specify one or more types of an item. The item type of an item is the first value returned from element.itemType on the element. The item type is also used to resolve non-URL names to absolute URLs. Available through the Microdata DOM API as element.itemType. (See Section 3.2 Items in [ MICRODATA ]).

In RDF, it is common for people to shorten vocabulary terms via abbreviated URIs that use a 'prefix' and a 'reference'. Examples throughout this document assume that the following vocabulary prefixes have been defined:

dc: http://purl.org/dc/terms/
md: http://www.w3.org/ns/md#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#

3. Vocabulary Registry

This section is non-normative.

In a perfect world, all processors would be able to generate the same output for a given input without regard to the requirements of a particular vocabulary. However, microdata doesn't provide sufficient syntactic help in making these decisions. Different vocabularies have different needs.

The registry is located at the namespace defined for microdata: http://www.w3.org/ns/md in a variety of formats.

The registry associates a URI prefix with one or more key-value pairs denoting processor behavior. A hypothetical JSON representation of such a registry might be the following:

{
  "http://schema.org/": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI":    "vocabulary",
    "multipleValues": "list",
    "properties": {
      "url": {"multipleValues": "unordered"}
    }
  }
}

This structure associates mappings for two URIs, http://schema.org/ and http://microformats.org/profile/hcard . Items having an item type with a URI prefix from this registry use the rules described for that prefix within the scope of that item type. This mapping currently defines two rules: propertyURI and multipleValues with values to indicate specific behavior. It also allows overrides on a per-property basis; the properties key associates an individual name with overrides for default behavior. The interpretation of these rules is defined in the following sections. If an item has no current type or the registry contains no URI prefix matching current type, a conforming processor must use the default values defined for these rules.
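
The lookup described above can be sketched in Python. This is only an illustration, not a normative procedure: the names REGISTRY, DEFAULTS, find_prefix and rule_for are hypothetical, the registry content is copied from the hypothetical JSON representation, and the defaults are assumed to be vocabulary and unordered as stated in Sections 3.1 and 3.2.

```python
# Hypothetical in-memory copy of the example registry shown above.
REGISTRY = {
    "http://schema.org/": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
        "properties": {"tracks": {"multipleValues": "list"}},
    },
    "http://microformats.org/profile/hcard": {
        "propertyURI": "vocabulary",
        "multipleValues": "list",
        "properties": {"url": {"multipleValues": "unordered"}},
    },
}

# Assumed defaults, used when no URI prefix matches.
DEFAULTS = {"propertyURI": "vocabulary", "multipleValues": "unordered"}


def find_prefix(item_type, registry=REGISTRY):
    """Return the registered URI prefix that item_type starts with, or None."""
    if not item_type:
        return None
    for prefix in registry:
        if item_type.startswith(prefix):
            return prefix
    return None


def rule_for(item_type, rule, name=None, registry=REGISTRY):
    """Resolve a rule: per-property override, then prefix entry, then default."""
    prefix = find_prefix(item_type, registry)
    if prefix is None:
        return DEFAULTS[rule]
    entry = registry[prefix]
    override = entry.get("properties", {}).get(name, {})
    return override.get(rule, entry.get(rule, DEFAULTS[rule]))
```

For example, rule_for("http://schema.org/MusicPlaylist", "multipleValues", "tracks") resolves to the per-property override, while an item type with no matching prefix falls back to the defaults.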

The concept of a registry , including a hypothetical format, location and updating rules is presented as an abstract concept useful for describing the function of a microdata processor. There are issues surrounding update frequency, URL naming, and how updates are authorized. This spec just considers the semantic content of such a registry and how it can be used to affect processing without defining its representation or update policies.

Richard Cyganiak has pointed out that "Registry" may be the wrong term, as the proposed registry doesn't assign identifiers or manage namespaces; it simply provides a mapping between URI prefixes and processor behavior. He suggests the term "Whitelist". As more than two values are required, and it describes more than binary behavior, this term isn't appropriate either.

Anytime we discuss maintaining such a database, there are issues surrounding update frequency, URL naming, and how updates are authorized. This remains an open issue. This spec just considers the semantic content of such a list and how it can be used to affect processing without defining its representation or update policies. The URL of the registry must be defined.

3.1 Property URI Generation

This section is non-normative.

For names which are not absolute URLs, the propertyURI rule defines the algorithm for generating an absolute URL given an evaluation context including a current type, current name and current vocabulary.

The procedure for generating property URIs is defined in Generate Predicate URI .

Possible values for propertyURI are the following:

contextual
The contextual URI generation scheme guarantees that generated property URIs are unique based on the value of current name. This is required as the microdata data model requires that names are associated with specific items and do not have a global scope. (See Step 5 in Generate Predicate URI ).

URI creation uses a base URI with query parameters to indicate the in-scope type and name list. Consider the following example:

<span itemscope itemtype="http://microformats.org/profile/hcard">

  <span itemprop="n" itemscope>
    <span itemprop="given-name">
      Princeton
    </span>
  </span>
</span>

The first name n generates the URI http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n . However, the included name given-name is contained in an untyped item. The inherited property URI is used to create a new property URI: http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop=n.given-name .

This scheme is compatible with the needs of other RDF serialization formats such as RDF/XML [ RDF-SYNTAX-GRAMMAR ], which rely on QNames for expressing properties. For example, the generated property URIs can be split as follows:

<rdf:Description xmlns:hcard="http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard?prop="

                 rdf:type="http://microformats.org/profile/hcard">
  <hcard:n>
    <rdf:Description>
      <hcard:n.given-name>
        Princeton
      </hcard:n.given-name>
    </rdf:Description>
  </hcard:n>
</rdf:Description>

Looking at another example:

<div itemscope itemtype="http://schema.org/Person">

  <h2 itemprop="name">Jeni</h2>
</div>

This would generate http://www.w3.org/ns/md?type=http://schema.org/Person&prop=name .

vocabulary
The vocabulary URI generation scheme appends names that are not absolute URLs to the URI prefix. When generating property URIs, if the URI prefix does not end with a '/' or '#', a '#' is appended to the URI prefix. (See Step 4 in Generate Predicate URI .)

URI creation appends each name to the URI prefix. Consider the following example:

<span itemscope itemtype="http://microformats.org/profile/hcard">

  <span itemprop="n" itemscope>
    <span itemprop="given-name">
      Princeton
    </span>
  </span>
</span>

Given the URI prefix http://microformats.org/profile/hcard , this would generate http://microformats.org/profile/hcard#n and http://microformats.org/profile/hcard#given-name . Note that the '#' is automatically added as a separator.

Looking at another example:

<div itemscope itemtype="http://schema.org/Person">

  <h2 itemprop="name">Jeni</h2>
</div>

Given the URI prefix http://schema.org/ , this would generate http://schema.org/name . Note that if the @itemtype were http://schema.org/Person/Teacher , this would generate the same property URI.

If the registry contains no match for current type, implementations act as if there is a URI prefix made from the first @itemtype value by stripping either the fragment content or, if the value has no fragment, the last path segment (See [ RFC3986 ]).

Deconstructing the @itemtype URL to create or identify a vocabulary URI is a violation of the microdata specification which is necessary to support the use of existing vocabularies designed for use with RDF, and shared or inherited properties within all vocabularies.

The default value of propertyURI is vocabulary .

<div itemscope itemtype="http://schema.org/Book">
  <h2 itemprop="title">Just a Geek</h2>
</div>

In this example, assuming no matching entry in the registry , the URI prefix is constructed by removing the last path segment , leaving the URI http://schema.org/ . As there is no explicit propertyURI , the default vocabulary is used, and the resulting property URI would be http://schema.org/title .
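
A minimal Python sketch of the vocabulary scheme and the default URI prefix follows. The function names are hypothetical, and fragment_escape here covers only the twelve characters listed under fragment-escape in Section 4.1, which is an assumption about the full [ HTML5 ] definition.

```python
# Characters this sketch percent-escapes (the list given under fragment-escape).
FRAGMENT_ESCAPE_CHARS = '"#%<>[\\]^{|}'


def fragment_escape(name):
    """Percent-escape the listed characters, leaving all others untouched."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in FRAGMENT_ESCAPE_CHARS else ch
        for ch in name
    )


def default_prefix(item_type):
    """Drop everything after the last '/' or '#' of the item type."""
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[: cut + 1] if cut >= 0 else item_type


def vocabulary_property_uri(prefix, name):
    """Append name to prefix, adding '#' unless prefix ends in '/' or '#'."""
    separator = "" if prefix.endswith(("/", "#")) else "#"
    return prefix + separator + fragment_escape(name)


# With no registry match, the prefix is derived from the item type itself.
prefix = default_prefix("http://schema.org/Book")
title_uri = vocabulary_property_uri(prefix, "title")
```

Names that are already absolute URLs are not handled here; Generate Predicate URI returns them unchanged before this scheme is consulted.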

3.2 Value Ordering

This section is non-normative.

For items having multiple values for a given property, the multipleValues rule defines the algorithm for serializing these values. Microdata uses document order when generating property values, as defined in Microdata DOM API as element.itemValue . However, many RDF vocabularies expect multiple values to be generated as triples sharing a common subject and predicate. In some cases, it may be useful to retain value ordering.

The procedure for generating property value s is defined in Generate Property Values .

Possible values for multipleValues are the following:

unordered
Values are serialized without ordering using a common subject and predicate. (See Step 7 in Generate Property Values ).
list
Multi-valued @itemprops are serialized using an RDF Collection. (See Step 8 in Generate Property Values ).

An example of how this might be specified in a registry is the following:

{
  "http://schema.org/": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI":    "vocabulary",
    "multipleValues": "list"
  }
}

Additionally, some vocabularies may wish to specify this on a per-property basis. For example, within http://schema.org/MusicPlaylist the tracks property might depend on the order of values to reproduce associated MusicRecording values.

{

 "http://schema.org/": {
   "propertyURI": "vocabulary",
   "multipleValues": "unordered",
   "properties": {
     "tracks": {"multipleValues": "list"}
   }
 }
}

The properties key takes a JSON Object as a value, which in turn has keys for each property that is to be given alternate semantics. Each name is implicitly expanded to its URI representation as defined in Generate Predicate URI , so that the behavior is the same whether or not the name is listed as an absolute URL .

The default value of multipleValues is unordered .
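
The difference between the two settings can be sketched as follows. Triples are modelled as plain tuples and blank nodes as generated labels; both are assumptions of this sketch, as is building a collection whenever list is selected. Generate Property Values remains the authoritative procedure.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_ids = count()


def new_bnode():
    """Return a fresh blank node label (hypothetical labelling scheme)."""
    return "_:b%d" % next(_ids)


def rdf_collection(values):
    """Return (head, triples) encoding values as an RDF Collection."""
    if not values:
        return RDF + "nil", []
    nodes = [new_bnode() for _ in values]
    triples = []
    for i, value in enumerate(values):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", value))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def serialize_values(subject, predicate, values, multiple_values="unordered"):
    """Serialize values for one predicate according to multipleValues."""
    if multiple_values == "list":
        head, triples = rdf_collection(values)
        return [(subject, predicate, head)] + triples
    # unordered: one triple per value, sharing subject and predicate
    return [(subject, predicate, value) for value in values]
```

With unordered, two values yield two triples with the same subject and predicate; with list, the single predicate triple points at the head of a collection that preserves document order.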

3.3 Value Typing

This section is non-normative.

In microdata, all values are strings. In RDF, values may be resources or may be typed with an appropriate datatype.

In some cases, the type of a microdata value can be determined from the element on which it is specified. In particular:

Using information about the content of the document where the microdata is marked up might be a violation of the spirit of the microdata specification, though it does not explicitly say in normative text that consumers cannot use other information from the HTML DOM to interpret microdata.

Additionally, one possible use of a registry would allow vocabularies to be marked with datatype information, so that a dc:time value, for example, would be understood to represent a literal with datatype xsd:date . This could be done by adding information for each property in the vocabulary requiring special treatment.

This might be represented using a syntax such as the following:

{

 "http://schema.org/": {
   "propertyURI": "vocabulary",
   "multipleValues": "unordered",
   "properties": {
     "dateCreated": {"datatype": "http://www.w3.org/2001/XMLSchema#date"}
   }
 }
}

The datatype identifies a URI to be used in constructing a typed literal .

In most cases, the relevant datatype for a value can be derived from knowledge of what property the value is for and the syntax of the value itself. Thus, values can be given datatypes in a post-processing step after the mapping of microdata to RDF described by this specification. However, where there is information in the HTML markup, such as knowledge of what element was used to mark up the value, which can help with determining its datatype, that information is used by this specification.

This concept is not explored further at this time, but could be developed further in a future revision of this document.
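
As an illustration of using knowledge of the element to choose a datatype, the following sketch classifies the value of a time element by lexical form, in the spirit of the property value definition in Section 4.1. The regular expressions are simplified assumptions that check only the overall shape of xsd:date, xsd:time and xsd:dateTime values, not field ranges, and the tuple layout (lexical form, datatype, language) is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical shapes; real XML Schema validation is stricter.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
_DATE = r"-?\d{4,}-\d{2}-\d{2}"
_TIME = r"\d{2}:\d{2}:\d{2}(\.\d+)?"

PATTERNS = [
    (re.compile(_DATE + "T" + _TIME + _TZ), XSD + "dateTime"),
    (re.compile(_DATE + _TZ), XSD + "date"),
    (re.compile(_TIME + _TZ), XSD + "time"),
]


def time_literal(value, lang=None):
    """Return (lexical form, datatype URI or None, language or None)."""
    for pattern, datatype in PATTERNS:
        if pattern.fullmatch(value):
            return (value, datatype, None)
    # Otherwise: a plain literal carrying the element's language, if any.
    return (value, None, lang)
```

A value that matches none of the shapes stays a plain literal, which leaves room for the post-processing step mentioned above.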

4. Algorithm

Transformation of microdata to RDF makes use of general processing rules described in [ MICRODATA ] for the treatment of items.

4.1 Algorithm Terms

absolute URL
The term absolute URL is defined in [ HTML5 ].
blank node
A blank node is a node in a graph that is neither a URI reference nor a literal. Items without a global identifier have a blank node allocated to them. (See [ RDF-CONCEPTS ]).
document base
The base address of the document being processed, as defined in Section 2.6.3 Resolving URLs in [ HTML5 ].
evaluation context
A data structure including the following elements:
memory
a mapping of items to subjects, initially empty;
current name
an absolute URL for the in-scope name, used for generating URIs for properties of items without an item type;
current name is required for the contextual property URI generation scheme. Without this scheme, this evaluation context component would not be required.
current type
an absolute URL for the current type, used when an item does not contain an item type;
current vocabulary
an absolute URL for the current vocabulary, from the registry.
item
An item is described by an element containing an @itemscope attribute. The list of top-level microdata items may be retrieved using the Microdata DOM API document.getItems method.
item properties
The list of item properties may be retrieved using the Microdata DOM API element.properties attribute.
fragment-escape
The term fragment-escape is defined in [ HTML5 ]. This involves transforming elements added to URLs to ensure that the result remains a valid URL. The following characters are subject to percent escaping:
  • U+0022 QUOTATION MARK character (")
  • U+0023 NUMBER SIGN character (#)
  • U+0025 PERCENT SIGN character (%)
  • U+003C LESS-THAN SIGN character (<)
  • U+003E GREATER-THAN SIGN character (>)
  • U+005B LEFT SQUARE BRACKET character ([)
  • U+005C REVERSE SOLIDUS character (\)
  • U+005D RIGHT SQUARE BRACKET character (])
  • U+005E CIRCUMFLEX ACCENT character (^)
  • U+007B LEFT CURLY BRACKET character ({)
  • U+007C VERTICAL LINE character (|)
  • U+007D RIGHT CURLY BRACKET character (})
global identifier
The value of an item's @itemid attribute, if it has one. (See Section 3.2 Items in [ MICRODATA ]).
literal
Literals are values such as strings and dates, including typed literals and plain literals. (See [ RDF-CONCEPTS ]).
property
Each name identifies a property of an item . An item may have multiple elements sharing the same name , creating a multi-valued property .
property names
The tokens of an element's @itemprop attribute. Each token is a name. (See Section 3.3 Names: the itemprop attribute in [ MICRODATA ]).
property value
The property value of a name-value pair added by an element with an @itemprop attribute depends on the element. Available through the Microdata DOM API as element.itemValue . (Updated from Section 3.4 Values of [ MICRODATA ]).
If the element has no @itemprop attribute
The value is null and no triple should be generated.
If the element creates an item (by having an @itemscope attribute)
The value is the URI reference or blank node returned from generate the triples for that item.
If the element is a URL property element ( a , area , audio , embed , iframe , img , link , object , source , track , or video )
The value is a URI reference created from element.itemValue . (See relevant attribute descriptions in [ HTML5 ]).
If the element is a time element
The value is a literal made from element.itemValue .
If the value has the lexical form of xsd:date [ RDF-SCHEMA ].
The value is a typed literal composed of the value and http://www.w3.org/2001/XMLSchema#date .
If the value has the lexical form of xsd:time [ RDF-SCHEMA ].
The value is a typed literal composed of the value and http://www.w3.org/2001/XMLSchema#time .
If the value has the lexical form of xsd:dateTime [ RDF-SCHEMA ].
The value is a typed literal composed of the value and http://www.w3.org/2001/XMLSchema#dateTime .
Otherwise
The value is a plain literal created from the value with language information set from the lang IDL attribute of the property element.

See The time element in [ HTML5 ].

The content model of the time element is subject to change, and may include more content types, such as xsd:duration , xsd:gYear , xsd:gYearMonth and xsd:monthDay in the future.
Otherwise
The value is a plain literal created from element.itemValue with language information set from the lang IDL attribute of the property element.
top-level item
An item which does not contain an @itemprop attribute. Available through the Microdata DOM API as document.getItems . (See Section 3.5 Associating names with items in [ MICRODATA ]).
URI reference
URI references are suitable to be used in subject, predicate or object positions within an RDF triple, as opposed to a literal value that may contain a string representation of a URI. (See [ RDF-CONCEPTS ]).

The HTML5/microdata content model for @href , @src , @data , @itemtype , @itemprop and @itemid is that of a URL, not a URI or IRI.

A proposed mechanism for specifying the range of property values to be URI reference or IRI could allow these to be specified as subject or object using a @content attribute.

vocabulary
A vocabulary is a collection of URIs, suitable for use as an @itemtype or @itemprop value, that share a common URI prefix. That prefix is the vocabulary URI. A vocabulary URI is not allowed to be a prefix of another vocabulary URI.
This definition differs from the language in the HTML spec and is just for the purpose of this document. In HTML, a vocabulary is a specification, and doesn't have a URI. In our view, if one specification defines ten @itemtypes, then these could be treated as one vocabulary or as ten distinct vocabularies; it is entirely up to the vocabulary creator.

4.2 RDF Conversion Algorithm

An HTML document containing microdata may be converted to any other RDF-compatible document format using the algorithm specified in this section.

The algorithm below is designed for DOM-based implementations with CSS selector access to elements. A conforming microdata processor implementing RDF conversion must implement a processing algorithm that results in the equivalent triples to those that the following algorithm generates:

Set item list to an empty list.

  1. For each element that is also a top-level item run the following algorithm:
    1. Generate the triples for the item, using the evaluation context. Let result be the (URI reference or blank node) subject returned.
    2. Append result to item list .
  2. Generate an RDF Collection list from the ordered list of values in item list . Set value to the value returned from generate an RDF Collection .
  3. Generate the following triple:
    subject
    Document base
    predicate
    http://www.w3.org/ns/md#item
    object
    value
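
The document-level step can be sketched as below, assuming the reading in which item list is always serialized as an RDF Collection, and assuming an empty list maps to rdf:nil. The function name, tuple triples and blank node labels are all hypothetical.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MD_ITEM = "http://www.w3.org/ns/md#item"


def document_triples(document_base, item_subjects):
    """Link document_base to a collection of top-level item subjects."""
    ids = count()
    triples = []
    head = RDF + "nil"
    # Build the collection back to front so each cell can point at its rest.
    for subject in reversed(item_subjects):
        cell = "_:l%d" % next(ids)
        triples.append((cell, RDF + "first", subject))
        triples.append((cell, RDF + "rest", head))
        head = cell
    triples.append((document_base, MD_ITEM, head))
    return triples
```

Here item_subjects stands for the results collected in item list, in document order.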

4.3 Generate the triples

When the user agent is to generate the triples for an item item , given an evaluation context , it must run the following steps:

This algorithm has undergone substantial change from the original microdata specification [ MICRODATA ].

  1. If there is an entry for item in memory , then let subject be the subject of that entry. Otherwise, if item has a global identifier and that global identifier is an absolute URL , let subject be that global identifier . Otherwise, let subject be a new blank node .
  2. Add a mapping from item to subject in memory .
  3. For each type returned from element.itemType of the element defining the item :
    1. If type is an absolute URL , generate the following triple:
      subject
      subject
      predicate
      http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      object
      type (as a URI reference )
  4. Set type to the first value returned from element.itemType of the element defining the item .
  5. If type is not an absolute URL , set it to current type from the evaluation context if not empty.
  6. If the registry contains a URI prefix that is a character for character match of type up to the length of the URI prefix , set vocab as that URI prefix .
  7. Otherwise, if type is not empty, construct vocab by removing everything following the last SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#") from type .
  8. Update evaluation context setting current vocabulary to vocab .
  9. Set property list to an empty mapping between properties and one or more ordered values as established below.
  10. For each element element that has one or more property names and is one of the properties of the item item , in the order those elements are given by the algorithm that returns the properties of the item , run the following substep:
    1. For each name in the element's property names , run the following substeps:
      1. Let context be a copy of evaluation context with current type set to type and current vocabulary set to vocab .
      2. Let predicate be the result of generate predicate URI using context and name . Update context by setting current name to predicate .
      3. Let value be the property value of element .
      4. If value is an item , , then generate the triples for value using context . Replace value by the subject returned from those steps.
      5. Add value to property list for predicate .
  11. For each predicate in property list :
    1. Generate property values using a copy of evaluation context with current property set to predicate and current vocabulary set to vocab along with subject , predicate and the list of values associated with predicate from property list as values .
  12. Return subject
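
The following non-normative Python sketch illustrates how vocab might be derived from type, as described in steps 6 and 7. The function name derive_vocab and its parameters are hypothetical and not part of this specification. Taking the first matching registry prefix, and returning type unchanged when it contains neither "/" nor "#", are assumptions made for illustration only.

```python
from typing import Iterable, Optional


def derive_vocab(item_type: str, registry_prefixes: Iterable[str]) -> Optional[str]:
    """Derive a vocabulary URI from an item type (hypothetical helper)."""
    if not item_type:
        # No type, so no vocabulary can be derived.
        return None
    # Step 6: a registry prefix that matches the start of the type wins.
    # Assumption: the first matching prefix in iteration order is used.
    for prefix in registry_prefixes:
        if item_type.startswith(prefix):
            return prefix
    # Step 7: otherwise drop everything after the last "/" or "#".
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    # Assumption: a type with neither character is returned unchanged.
    return item_type[: cut + 1] if cut >= 0 else item_type
```

For example, a type of http://purl.org/vocab/frbr/core#Work with an empty registry would yield http://purl.org/vocab/frbr/core# under this sketch.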

4.4 Generate Predicate URI

Predicate URI generation makes use of current type, current property name, and current vocabulary from an evaluation context context along with name.

  1. If name is an absolute URL, return name as a URI reference.
  2. If current type from context is null, there can be no current vocabulary. Return the URI reference that is the document base with its fragment set to the fragment-escaped value of name.

    This rule is intended to allow for the case where no type is set, and therefore there is no vocabulary from which to extract rules. For example, if there is a document base of http://example.org/doc and an @itemprop of 'title', a URI will be constructed to be http://example.org/doc#title.
  3. Otherwise, if current vocabulary from context is not null and registry has an entry for current vocabulary having a propertyURI entry that is not null, set that as scheme. Otherwise, set scheme to vocabulary.
  4. If scheme is vocabulary, return the URI reference constructed by appending the fragment-escaped value of name to current vocabulary, separated by a U+0023 NUMBER SIGN character (#) unless the current vocabulary ends with either a U+0023 NUMBER SIGN character (#) or SOLIDUS U+002F (/).
  5. If scheme is contextual, return the URI reference constructed as follows:
    1. Let s be current type from context.
    2. If http://www.w3.org/ns/md?type= is a prefix of s, return the concatenation of s, a U+002E FULL STOP character (.) and the fragment-escaped value of name.
    3. Otherwise, return the concatenation of http://www.w3.org/ns/md?type=, the fragment-escaped value of s, the string &prop=, and the fragment-escaped value of name.
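
The steps above can be illustrated with the following non-normative Python sketch. All function and parameter names are hypothetical. Two simplifications are assumed: fragment_escape only approximates the fragment-escape operation by percent-encoding characters outside a chosen safe set, and is_absolute_url treats any string with a URL scheme as absolute. The registry is assumed to be a dictionary shaped like the example registry in this document.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote, urlsplit

MD_TYPE_PREFIX = "http://www.w3.org/ns/md?type="
# Assumption: characters this approximation leaves unescaped.
_FRAGMENT_SAFE = "!$&'()*+,/:;=?@"


def fragment_escape(text: str) -> str:
    """Approximate fragment-escape by percent-encoding unsafe characters."""
    return quote(text, safe=_FRAGMENT_SAFE)


def is_absolute_url(text: str) -> bool:
    """Simplification: anything that parses with a scheme counts as absolute."""
    return bool(urlsplit(text).scheme)


def generate_predicate_uri(
    name: str,
    current_type: Optional[str],
    current_vocabulary: Optional[str],
    registry: Dict[str, Dict[str, Any]],
    document_base: str,
) -> str:
    # Step 1: absolute names are used as they are.
    if is_absolute_url(name):
        return name
    # Step 2: without a type, fall back to a fragment on the document base.
    if current_type is None:
        return document_base.split("#", 1)[0] + "#" + fragment_escape(name)
    # Step 3: take the scheme from the registry, defaulting to "vocabulary".
    entry = registry.get(current_vocabulary) if current_vocabulary is not None else None
    scheme = (entry or {}).get("propertyURI") or "vocabulary"
    # Step 4: append the name to the vocabulary, adding "#" when needed.
    if scheme == "vocabulary":
        if current_vocabulary is None:
            raise ValueError("the vocabulary scheme needs a current vocabulary")
        separator = "" if current_vocabulary.endswith(("#", "/")) else "#"
        return current_vocabulary + separator + fragment_escape(name)
    # Step 5: build a URI that carries both the type and the property name.
    if scheme == "contextual":
        s = current_type
        if s.startswith(MD_TYPE_PREFIX):
            return s + "." + fragment_escape(name)
        return MD_TYPE_PREFIX + fragment_escape(s) + "&prop=" + fragment_escape(name)
    raise ValueError("unknown propertyURI scheme: " + str(scheme))
```

With a document base of http://example.org/doc, no type, and a name of title, this sketch returns http://example.org/doc#title, matching the example given in step 2.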

4.5 Generate Property Values

Property value serialization makes use of current vocabulary from an evaluation context context along with subject, predicate and values.

  1. If the registry contains a URI prefix that is a character for character match of predicate up to the length of the URI prefix, set vocab as that URI prefix. Otherwise set vocab to null.
  2. If vocab is not null and registry has an entry for vocab that is a JSON Object, let registry object be that value. Otherwise set registry object to null.
  3. If registry object is not null and registry object contains key properties which has a JSON Object value, let properties be that value. Otherwise, set properties to null.
  4. If properties is not null, and properties contains a key which, after Generate Predicate URI expansion, matches predicate and has a value which is a JSON Object, let property override be that value. Otherwise, set property override to null.
  5. If property override contains the key multipleValues , set that as method .
  6. Otherwise, if registry object contains the key multipleValues, set that as method.
  7. Otherwise, set method to unordered.
  8. If method is unordered, for each value in values, generate the following triple:
    subject
    subject
    predicate
    predicate
    object
    value
  9. Otherwise, if method is list :
    1. If values contains multiple values, generate an RDF Collection list from the ordered list of values. Set value to the value returned from generate an RDF Collection.
    2. Otherwise, if values contains a single value, set value to that value.
    3. Generate the following triple:
      subject
      subject
      predicate
      predicate
      object
      value
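
The selection of method in steps 1 to 7 can be illustrated with the following non-normative Python sketch. The function resolve_method and the expand_key callback are hypothetical: expand_key stands in for Generate Predicate URI expansion of a registry property key, and the first matching registry prefix is assumed to win. The interpretation that a property override applies when its expanded key equals predicate is also an assumption.

```python
from typing import Any, Callable, Dict


def resolve_method(
    predicate: str,
    registry: Dict[str, Any],
    expand_key: Callable[[str, str], str],
) -> str:
    """Pick "unordered" or "list" for a predicate (hypothetical helper)."""
    # Step 1: find a registry prefix that matches the start of the predicate.
    vocab = next((p for p in registry if predicate.startswith(p)), None)
    # Step 2: the registry entry must be a JSON Object (a dict here).
    registry_object = registry.get(vocab) if vocab is not None else None
    if not isinstance(registry_object, dict):
        registry_object = None
    # Step 3: look for per-property settings.
    properties = registry_object.get("properties") if registry_object else None
    if not isinstance(properties, dict):
        properties = None
    # Step 4: find an override whose expanded key equals the predicate.
    override = None
    if properties is not None:
        for key, value in properties.items():
            if isinstance(value, dict) and expand_key(vocab, key) == predicate:
                override = value
                break
    # Steps 5 to 7: override first, then vocabulary default, then "unordered".
    if override is not None and "multipleValues" in override:
        return override["multipleValues"]
    if registry_object is not None and "multipleValues" in registry_object:
        return registry_object["multipleValues"]
    return "unordered"
```

With a registry entry for http://schema.org/ that sets multipleValues to unordered and overrides tracks with list, this sketch would select list for http://schema.org/tracks and unordered for http://schema.org/name.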

4.6 Generate RDF Collection

An RDF Collection is a mechanism for defining ordered sequences of objects in RDF (see Section 5.2 RDF Collections in [RDF-SCHEMA]). As the RDF data model is that of an unordered graph, a linking method using the properties rdf:first and rdf:rest is required to be able to specify a particular order.

In the microdata to RDF mapping, RDF Collections are used when an item has more than one value associated with a given property to ensure that the original document order is maintained. The following procedure should be used to generate triples when an item property has more than one value (contained in list):

  1. Create a new array array containing a blank node for every value in list.
  2. For each pair of bnode from array and value from list, the following triple is generated:
    subject
    bnode
    predicate
    http://www.w3.org/1999/02/22-rdf-syntax-ns#first
    object
    value
  3. For each bnode in array the following triple is generated:
    subject
    bnode
    predicate
    http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
    object
    next element bnode in array or, if that does not exist, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
  4. Return the first blank node from array.
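
The procedure above can be illustrated with the following non-normative Python sketch. Triples are modelled as plain tuples and blank nodes as generated labels. The names new_bnode and generate_rdf_collection are hypothetical, and returning rdf:nil for an empty list is an assumption, since the procedure is only defined for lists that contain values.

```python
from itertools import count
from typing import Any, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, Any]

_bnode_ids = count()


def new_bnode() -> str:
    """Return a fresh blank node label (hypothetical helper)."""
    return "_:b" + str(next(_bnode_ids))


def generate_rdf_collection(values: List[Any]) -> Tuple[str, List[Triple]]:
    """Return the head of the collection and the triples that define it."""
    if not values:
        # Assumption: an empty list is represented by rdf:nil.
        return RDF + "nil", []
    # Step 1: one blank node per value.
    bnodes = [new_bnode() for _ in values]
    triples: List[Triple] = []
    # Step 2: link each blank node to its value with rdf:first.
    for bnode, value in zip(bnodes, values):
        triples.append((bnode, RDF + "first", value))
    # Step 3: chain the blank nodes with rdf:rest, ending in rdf:nil.
    for index, bnode in enumerate(bnodes):
        is_last = index + 1 == len(bnodes)
        rest = RDF + "nil" if is_last else bnodes[index + 1]
        triples.append((bnode, RDF + "rest", rest))
    # Step 4: the first blank node identifies the whole collection.
    return bnodes[0], triples
```

The returned head would then be used as the object of the triple generated for the property, while the accompanying triples are added to the output graph.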

A. Markup Examples

This section is non-normative.

The microdata example below expresses book information as an FRBR Work item.

<dl itemscope
    itemtype="http://purl.org/vocab/frbr/core#Work"
    itemid="http://books.example.com/works/45U8QJGZSQKDH8N"
    lang="en">
 <dt>Title</dt>
 <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
 <dt>By</dt>
 <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
 <dt>Format</dt>
 <dd itemprop="http://purl.org/vocab/frbr/core#realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://books.example.com/products/9780596007683.BOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://books.example.com/product-types/BOOK">
  Print
 </dd>
 <dd itemprop="http://purl.org/vocab/frbr/core#realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://books.example.com/products/9780596802189.EBOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://books.example.com/product-types/EBOOK">
  Ebook
 </dd>
</dl>

Assuming that registry contains an entry for http://purl.org/vocab/frbr/core# with propertyURI set to vocabulary, this is equivalent to the following Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix md: <http://www.w3.org/ns/md#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .

<> md:item (<http://books.example.com/works/45U8QJGZSQKDH8N>) .

<http://books.example.com/works/45U8QJGZSQKDH8N> a frbr:Work ;
  dc:creator "Wil Wheaton"@en ;
  dc:title "Just a Geek"@en ;
  frbr:realization <http://books.example.com/products/9780596007683.BOOK>,
    <http://books.example.com/products/9780596802189.EBOOK> .

<http://books.example.com/products/9780596007683.BOOK> a frbr:Expression ;
  dc:type <http://books.example.com/product-types/BOOK> .

<http://books.example.com/products/9780596802189.EBOOK> a frbr:Expression ;
  dc:type <http://books.example.com/product-types/EBOOK> .

The following snippet of HTML has microdata for two people with the same address. This illustrates two items referencing a third item, and how only a single RDF resource definition is created for that third item.

<p>
 Both
 <span itemscope="" itemtype="http://microformats.org/profile/hcard" itemref="home">
   <span itemprop="fn"
       ><span itemprop="n" itemscope=""
       ><span itemprop="given-name">Princeton</span></span></span>
  </span>
 and
 <span itemscope="" itemtype="http://microformats.org/profile/hcard" itemref="home">
   <span itemprop="fn"
     ><span itemprop="n" itemscope=""
       ><span itemprop="given-name">Trekkie</span></span></span>
  </span>
 live at
 <span id="home" itemprop="adr" itemscope="">
   <span itemprop="street-address">Avenue Q</span>.
 </span>
</p>

Assuming that registry contains an entry for http://microformats.org/profile/hcard with propertyURI set to vocabulary, it generates these triples expressed in Turtle:

@prefix md: <http://www.w3.org/ns/md#> .
@prefix hcard: <http://microformats.org/profile/hcard#> .

<> md:item (
  [ a <http://microformats.org/profile/hcard>;
    hcard:fn "Princeton";
    hcard:n [ hcard:given-name "Princeton" ];
    hcard:adr _:a
  ]
  [ a <http://microformats.org/profile/hcard>;
    hcard:fn "Trekkie";
    hcard:n [ hcard:given-name "Trekkie" ];
    hcard:adr _:a
  ]) .

_:a hcard:street-address "Avenue Q" .

The following snippet of HTML has microdata for a playlist, and illustrates overriding a property to place elements in an RDF Collection:

<div itemscope="" itemtype="http://schema.org/MusicPlaylist">

  <span itemprop="name">Classic Rock Playlist</span>
  <meta itemprop="numTracks" content="2"/>
  <p>Including works by
    <span itemprop="byArtist">Lynard Skynard</span> and
    <span itemprop="byArtist">AC/DC</span></p>.

  <div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
    1.<span itemprop="name">Sweet Home Alabama</span> -
    <span itemprop="byArtist">Lynard Skynard</span>
    <link href="sweet-home-alabama" itemprop="url" />
   </div>

  <div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
    2.<span itemprop="name">Shook you all Night Long</span> -
    <span itemprop="byArtist">AC/DC</span>
    <link href="shook-you-all-night-long" itemprop="url" />
  </div>
</div>

Assuming that registry contains an entry for http://schema.org/ with propertyURI set to vocabulary, multipleValues set to unordered, and the properties tracks and byArtist having multipleValues set to list, it generates these triples expressed in Turtle:

@prefix md: <http://www.w3.org/ns/md#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .

<> md:item ([ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:byArtist ("Lynard Skynard" "AC/DC");
  schema:numTracks "2";
  schema:tracks (
    [ a schema:MusicRecording;
      schema:byArtist ("Lynard Skynard");
      schema:name "Sweet Home Alabama";
      schema:url <sweet-home-alabama>]
    [ a schema:MusicRecording;
      schema:byArtist ("AC/DC");
      schema:name "Shook you all Night Long";
      schema:url <shook-you-all-night-long>]
  )]) .

B. Example registry

The following is an example registry in JSON format.

{

  "http://schema.org/": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "blogPosts": {"multipleValues": "list"},
      "breadcrumb": {"multipleValues": "list"},
      "byArtist": {"multipleValues": "list"},
      "creator": {"multipleValues": "list"},
      "episodes": {"multipleValues": "list"},
      "events": {"multipleValues": "list"},
      "founders": {"multipleValues": "list"},
      "itemListElement": {"multipleValues": "list"},
      "musicGroupMember": {"multipleValues": "list"},
      "performerIn": {"multipleValues": "list"},
      "performers": {"multipleValues": "list"},
      "producer": {"multipleValues": "list"},
      "recipeInstructions": {"multipleValues": "list"},
      "seasons": {"multipleValues": "list"},
      "subEvents": {"multipleValues": "list"},
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcalendar#": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "categories": {"multipleValues": "list"}
    }
  },
  "http://n.whatwg.org/work": {
    "propertyURI":    "contextual",
    "multipleValues": "list"
  }
}

C. References

C.1 Normative references

[HTML5]
Ian Hickson; David Hyatt. HTML5. 25 May 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/html5
[MICRODATA]
HTML Microdata Ian Hickson Editor. World Wide Web Consortium (work in progress). 25 May 2010. This edition of the HTML Microdata specification is http://www.w3.org/TR/2011/WD-microdata-20110525/. The latest edition of HTML Microdata is available at http://www.w3.org/TR/microdata/
[RDF-SCHEMA]
Dan Brickley; Ramanathan V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt

C.2 Informative references

[RDF-CONCEPTS]
Graham Klyne; Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RDF-SYNTAX-GRAMMAR]
Dave Beckett. RDF/XML Syntax Specification (Revised). 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210