Algorithm
Transformation of Microdata to RDF makes use of general processing rules described in [[!MICRODATA]]
for the treatment of items.
Algorithm Terms
- item
-
An item is defined as an element containing an itemscope attribute. (See Section 3.2 Items of
[[!MICRODATA]]).
- top-level item
-
An item which does not contain an itemprop attribute.
Available through the Microdata DOM API as document.getItems.
(See Section 3.5 Associating names with items of [[!MICRODATA]]).
- absolute URI
-
As defined in [[!RFC3986]], an absolute URI contains both a scheme and a scheme-specific part.
- document base
-
The base address of the document being processed, as defined in Section 2.6.3 Resolving URLs of
[[!HTML5]].
- global identifier
-
The value of an item's itemid attribute, if it has one. (See Section 3.2 Items of
[[!MICRODATA]]).
- URI reference
-
URI references can be used in the subject, predicate, or object position of an RDF triple,
as opposed to a Literal value, which may contain only a string representation of a URI. (See
[[RDF-CONCEPTS]]).
- Blank Node
-
A blank node is a node in a graph that is neither a URI reference nor a Literal.
Items without a global identifier have a blank node allocated to them.
(See [[RDF-CONCEPTS]]).
- Literal
-
Literals are values such as strings and dates, including typed literals and
plain literals.
(See [[RDF-CONCEPTS]]).
- evaluation context
-
A data structure including the following elements:
- memory
-
a mapping of items to subjects, initially empty
- current type
-
an absolute URI for the current type, used when an item does not contain
an explicit itemtype
- item properties
-
The mechanism for finding the properties of an item is described in
Section 3.5 Associating names with items of [[!MICRODATA]].
Available through the Microdata DOM API as element.properties.
- property names
-
The tokens of an element's itemprop attribute.
(See Section 3.3 Names: the itemprop attribute of [[!MICRODATA]]).
- property value
-
The property value of a name-value pair added by an element with an itemprop
attribute depends on the element.
Available through the Microdata DOM API as element.itemValue.
(Updated from Section 3.4 Values of [[!MICRODATA]]).
If we reference element.itemValue, we should file issues against the Microdata spec
to ensure that the values it returns are consistent with this spec.
- If the element also has an itemscope attribute
-
The value is the item created by the element, as a URI reference or
blank node.
- If the element is a meta element
-
The value is the plain literal created from the value of the element's content
attribute, if any, or the empty string if there is no such attribute.
-
If the element is an audio, embed, iframe, img, source, track, or video element with a src attribute
-
The value is a URI reference that results from resolving the value of the element's
src attribute relative to the element at the time the attribute is set.
-
If the element is an a, area, or link element with an href attribute
-
The value is a URI reference that results from resolving the value of the element's
href attribute relative to the element at the time the attribute is set.
- If the element is an object element with a data attribute
-
The value is a URI reference that results from resolving the value of the element's
data attribute relative to the element at the time the attribute is set.
- If the element is a time element with a datetime attribute
-
-
If the value has the lexical form of xsd:date [[!RDF-SCHEMA]]
-
The value is a typed literal composed of the value and
http://www.w3.org/2001/XMLSchema#date
-
If the value has the lexical form of xsd:time [[!RDF-SCHEMA]]
-
The value is a typed literal composed of the value and
http://www.w3.org/2001/XMLSchema#time
-
If the value has the lexical form of xsd:dateTime [[!RDF-SCHEMA]]
-
The value is a typed literal composed of the value and
http://www.w3.org/2001/XMLSchema#dateTime
- Otherwise
- The value is a plain literal created from the value.
-
If the element is a blockquote or q element with a cite attribute
-
The value is a URI reference that results from resolving the value of the element's
cite attribute relative to the element at the time the attribute is set.
This was formerly handled at the document level; it is now part of item value processing.
- Otherwise
-
The value is a plain literal created from the element's textContent, with the language
information set from the language of the element, if that language is known.
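The property value rules above can be sketched in Python as follows. This is a minimal, non-normative sketch under stated assumptions: an element is reduced to a hypothetical (tag, attrs, text_content) triple, URLs are resolved against a single base URL with urllib.parse.urljoin rather than against the element itself, the itemscope case is left to the caller, and the regular expressions only approximate the xsd:date, xsd:time and xsd:dateTime lexical forms. The names URIRef, Literal, time_value and property_value are illustrative, not part of any API.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class URIRef:  # hypothetical RDF term types for this sketch
    value: str

@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # None means a plain literal
    lang: Optional[str] = None

# Approximate lexical forms (assumption: simplified XML Schema grammar).
TZ = r"(Z|[+-]\d{2}:\d{2})?"
DATE_RE = re.compile(r"-?\d{4,}-\d{2}-\d{2}" + TZ)
TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}(\.\d+)?" + TZ)
DATETIME_RE = re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + TZ)

# Elements whose value is a URI reference, and the attribute that supplies it.
URL_ATTRIBUTE = {
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data", "blockquote": "cite", "q": "cite",
}

def time_value(value: str) -> Literal:
    """Type a time element's datetime value, falling back to a plain literal."""
    for pattern, datatype in ((DATE_RE, "date"), (TIME_RE, "time"),
                              (DATETIME_RE, "dateTime")):
        if pattern.fullmatch(value):
            return Literal(value, XSD + datatype)
    return Literal(value)

def property_value(tag, attrs, text_content, base, lang=None):
    """Property value of a non-itemscope element (itemscope handled by caller)."""
    if tag == "meta":
        return Literal(attrs.get("content", ""))
    attr = URL_ATTRIBUTE.get(tag)
    if attr is not None and attr in attrs:
        return URIRef(urljoin(base, attrs[attr]))
    if tag == "time" and "datetime" in attrs:
        return time_value(attrs["datetime"])
    return Literal(text_content, lang=lang)
```

Because the URL table is consulted only when the attribute is present, an a element without an href falls through to the textContent case, matching the "Otherwise" rule.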
RDF Conversion Algorithm
An HTML document containing Microdata MAY be converted to any other RDF-compatible document
format using the algorithm specified in this section.
The algorithm below is designed for DOM-based implementations with CSS selector access to elements.
A conforming Microdata processor implementing RDF conversion MUST implement a
processing algorithm that results in the equivalent triples that the following
algorithm generates:
Set item list to an empty list.
For each element that is also a top-level item run the following algorithm:
-
Generate the triples for an item item, using the
evaluation context.
Let result be the (URI reference or blank node) subject returned.
-
Append result to item list.
-
If item list contains multiple values, generate an RDF Collection list from the ordered list of values.
Set value to the value returned from generate an RDF
Collection.
-
Otherwise, if item list contains a single value, set value to that value.
-
Generate the following triple:
- subject
- Document base
- predicate
- http://www.w3.org/1999/xhtml/microdata#item
- object
- value
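The document-level steps above can be sketched as follows, assuming the subjects of the top-level items have already been produced by generate the triples, in document order. Triples are plain (subject, predicate, object) tuples, blank nodes are strings such as "_:b0" produced by a caller-supplied new_bnode callable, and document_triples and bnode_factory are hypothetical names. The algorithm does not say what happens when there are no top-level items; this sketch assumes no microdata#item triple is generated in that case.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MD_ITEM = "http://www.w3.org/1999/xhtml/microdata#item"

def bnode_factory():
    """Return a callable that yields fresh blank node labels _:b0, _:b1, ..."""
    counter = count()
    return lambda: f"_:b{next(counter)}"

def document_triples(document_base, item_subjects, new_bnode):
    """Triples linking the document base to its top-level items.

    item_subjects: the subjects returned by "generate the triples" for each
    top-level item, in document order.
    """
    triples = []
    if not item_subjects:
        return triples  # assumption: no top-level items, no md#item triple
    if len(item_subjects) == 1:
        value = item_subjects[0]
    else:
        # Several items: chain them in an RDF Collection to keep document order.
        nodes = [new_bnode() for _ in item_subjects]
        for i, (node, subject) in enumerate(zip(nodes, item_subjects)):
            triples.append((node, RDF + "first", subject))
            rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
            triples.append((node, RDF + "rest", rest))
        value = nodes[0]
    triples.append((document_base, MD_ITEM, value))
    return triples
```

A single top-level item is linked directly, so consumers only see a collection when the document actually contains more than one item.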
Generate the triples
When the user agent is to generate triples for an item item, given an
evaluation context, it must run the following steps:
This algorithm has undergone substantial change from the original Microdata specification [[!MICRODATA]].
-
If there is an entry for item in memory, then let subject be the subject of
that entry. Otherwise, if item has a global identifier and that
global identifier is an absolute URI, let subject be that
global identifier. Otherwise, let subject be a new blank node.
- Add a mapping from item to subject in memory
-
If the item has an itemtype attribute, extract the value as type.
- If type is an absolute URI, generate the following triple:
- subject
- subject
- predicate
- http://www.w3.org/1999/02/22-rdf-syntax-ns#type
- object
- type (as a URI reference)
-
If type is not an absolute URI, set it to current type from the
Evaluation Context if not empty.
-
Set property list to an empty mapping between properties and one or more ordered
values as established below.
-
For each element element that has one or more property names and is one of the
properties of the item item, in the order those elements
are given by the algorithm that returns the properties of the item,
run the following substep:
-
For each name in the element's property names, run the following substeps:
-
If name is an absolute URI, set predicate to name
as a URI reference.
-
Otherwise, if type is not defined, set
predicate to a URI reference that results from resolving name
relative to the element at the time the attribute is set.
-
Otherwise, construct predicate by removing everything following the last
SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#") from type and appending
name.
-
Let value be the property value of element.
-
If value is an item, then generate the
triples for value using a copy of evaluation context with
current type set to type. Replace value by the subject returned from those steps.
-
Add value to property list for predicate.
-
For each predicate in property list:
-
If the entry for predicate in property list contains multiple values, generate an RDF Collection list from the ordered list of values.
Set value to the value returned from generate an RDF
Collection.
-
Otherwise, if the entry for predicate in property list contains a single value, set
value to that value.
-
Generate the following triple:
- subject
- subject
- predicate
- predicate
- object
- value
- Return subject
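The following Python sketch walks through the steps above. It is illustrative only and rests on several assumptions: items are modeled by a hypothetical Item class whose properties field already holds the (property names, value) pairs in the order given by the properties-of-an-item algorithm; non-item values are already-computed RDF terms; names are resolved against the document base rather than the element; the copied evaluation context shares memory with its parent; and the input contains no cyclic item references, since following the steps literally would not terminate for them.

```python
import re
from dataclasses import dataclass, field
from itertools import count
from typing import Optional
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABSOLUTE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:.+")

@dataclass(eq=False)  # identity hashing, so an Item can key the memory dict
class Item:
    itemid: Optional[str] = None
    itemtype: Optional[str] = None
    # (property names, value) pairs in "properties of an item" order; a value
    # is a nested Item or an already-computed RDF term.
    properties: list = field(default_factory=list)

@dataclass
class Context:
    base: str                                   # document base
    memory: dict = field(default_factory=dict)  # item -> subject
    current_type: Optional[str] = None
    bnodes: count = field(default_factory=count)

def is_absolute(uri):
    return uri is not None and ABSOLUTE_URI.fullmatch(uri) is not None

def make_predicate(name, type_, base):
    if is_absolute(name):
        return name
    if type_ is None:
        return urljoin(base, name)
    cut = max(type_.rfind("/"), type_.rfind("#"))
    return type_[:cut + 1] + name  # keep up to and including the last / or #

def generate_triples(item, ctx, triples):
    def new_bnode():
        return f"_:b{next(ctx.bnodes)}"

    if item in ctx.memory:
        subject = ctx.memory[item]
    elif is_absolute(item.itemid):
        subject = item.itemid
    else:
        subject = new_bnode()
    ctx.memory[item] = subject

    type_ = item.itemtype
    if is_absolute(type_):
        triples.append((subject, RDF + "type", type_))
    else:
        # Assumption: a non-absolute itemtype with no current type is undefined.
        type_ = ctx.current_type

    property_list = {}  # predicate -> values, in insertion order
    for names, value in item.properties:
        for name in names:
            predicate = make_predicate(name, type_, ctx.base)
            if isinstance(value, Item):
                # Copy of the context with current type set; memory is shared.
                child = Context(ctx.base, ctx.memory, type_, ctx.bnodes)
                term = generate_triples(value, child, triples)
            else:
                term = value
            property_list.setdefault(predicate, []).append(term)

    for predicate, values in property_list.items():
        if len(values) == 1:
            obj = values[0]
        else:  # several values: build an RDF Collection
            nodes = [new_bnode() for _ in values]
            for i, (node, v) in enumerate(zip(nodes, values)):
                triples.append((node, RDF + "first", v))
                rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
                triples.append((node, RDF + "rest", rest))
            obj = nodes[0]
        triples.append((subject, predicate, obj))
    return subject
```

Passing type_ as the child's current type is what lets a nested item without its own itemtype still produce vocabulary-relative predicates.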
Generate an RDF Collection
An RDF Collection is a mechanism for defining ordered sequences of objects in RDF (see Section 5.2 RDF Collections in
[[!RDF-SCHEMA]]). As the RDF data model is that of an unordered graph, a linking method using the properties
rdf:first and rdf:rest is required to specify a particular order.
In the Microdata to RDF mapping, RDF Collections are used when an item has more than one value
associated with a given property to ensure that the original document order is maintained. The following
procedure should be used to generate triples when an item property has more than one value
(contained in list):
-
Create a new array array containing a blank node for every value in list.
-
For each pair of bnode and value from array and list, the following
triple is generated:
- subject
- bnode
- predicate
- http://www.w3.org/1999/02/22-rdf-syntax-ns#first
- object
- value
-
For each bnode in array the following triple is generated:
- subject
- bnode
- predicate
- http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
- object
-
next element in array or, if that does not exist,
http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
-
Return the first blank node from array.
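A direct transcription of the four collection steps into Python might look like this. The names generate_collection and bnode_factory are assumptions of this sketch, and triples are plain (subject, predicate, object) tuples. Like the procedure above, it is meant to be called only when list holds more than one value; an empty list would have no first blank node to return.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def bnode_factory():
    """Return a callable that yields fresh blank node labels _:b0, _:b1, ..."""
    counter = count()
    return lambda: f"_:b{next(counter)}"

def generate_collection(values, new_bnode, triples):
    """Append the rdf:first/rdf:rest triples for values; return the list head."""
    array = [new_bnode() for _ in values]        # one blank node per value
    for bnode, value in zip(array, values):      # bnode rdf:first value
        triples.append((bnode, RDF + "first", value))
    for i, bnode in enumerate(array):            # bnode rdf:rest next (or nil)
        rest = array[i + 1] if i + 1 < len(array) else RDF + "nil"
        triples.append((bnode, RDF + "rest", rest))
    return array[0]                              # head of the collection
```

The returned head is then used as the object of the triple that links the subject (or the document base) to the ordered values.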
Markup Examples
The Microdata example below expresses book information as an FRBR Work item.
This is equivalent to the following Turtle:
The following snippet of HTML has microdata for two people with the same address:
It generates these triples expressed in Turtle: