Copyright © 2011-2011 W3C ® ( MIT , ERCIM , Keio ), All Rights Reserved. W3C liability , trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is an experimental work in progress.
This document was published by the HTML Data Task Force as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-html-data-tf@w3.org ( subscribe , archives ). All feedback is welcome.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
This document describes a means of transforming HTML containing Microdata into RDF. HTML Microdata [MICRODATA] is an extension to HTML used to embed machine-readable data in HTML documents. This specification describes transformation directly to RDF [RDF-CONCEPTS].
Microdata is a way of expressing metadata in HTML documents using attributes. A previous version of Microdata [ MICRODATA ] included rules for generating RDF, but current Editor's Drafts have removed the explicit transformation procedure. Microdata is now used as an API to access data from within an HTML DOM and as a JSON serialization.
The original RDF transformation process created URIs for properties that are expressed as non-absolute URIs. The algorithm was designed to create URIs which were distinct based on the relationship between @itemtype and @itemprop contexts. This is required, as the Microdata data model requires that properties maintain distinct semantic meanings in different contexts. However, this form of URI generation is typically different than that used within RDF vocabularies, where properties typically have a common meaning within a given vocabulary .
Microdata also specifies that values are ordered, which is not typically the case for RDF vocabularies. In fact, unless a property has an rdfs:range of rdf:List, or is unspecified, it may not be appropriate to generate an RDF Collection.
The Microdata JSON serialization does not retain datatype or language information that might be derived from the HTML DOM. The RDF Transformation does retain both datatype and language information when it is available.
This specification updates the original RDF transformation process and adds vocabulary-specific rules that affect the generation of property URIs and value serializations. This is facilitated by a registry that associates URIs with specific rules: @itemtype values are matched against registered URI prefixes to determine a vocabulary and its vocabulary-specific processing rules.
During the period of the task force, a number of use cases were put forth for the use of Microdata in generating RDF:
rdfs:range declarations at parse time so properly typed literals could be constructed. It also requires that plain literals retain language information in scope on the HTML element, as it is common that multiple values will be used to specify the same information in different languages.
Collection.
http://schema.org/musicGroupMember, and an author might express more detail through an ad-hoc sub-property musicGroupMember/leadVocalist, having the URI http://schema.org/musicGroupMember/leadVocalist.
Decisions or open issues in the specification are tracked on the Task Force Issue Tracker. These include the following:
The purpose of this specification is to provide input to a future working group that can make decisions about the need for a registry and the details of processing. Among the options investigated by the Task Force are the following:
http://www.w3.org/1999/xhtml/microdata#item mapping at all.
rdf:Seq, or place all values, whether or not multiple, into some form of collection.
The Microdata specification [MICRODATA] defines a number of attributes and the way in which those attributes are to be interpreted. This section describes those attributes, with reference to their original definition.
meta element for creating invisible properties.
object element for creating URI references.
time element for creating typed literals.
time element will likely be replaced with something more general purpose.
a, area or link elements for creating URI references.
element.itemId. (See Section 3.2 Items in [MICRODATA]).
element.itemProp. (See Section 3.3 Names: the itemprop attribute of [MICRODATA]).
element.itemRef. (See Section 3.2 Items of [MICRODATA]).
element.itemType. (See Section 3.2 Items of [MICRODATA]).
audio, embed, iframe, img, source, track, or video elements for creating invisible properties.
In a perfect world, all processors would be able to generate the same output for a given input without regard to the requirements of a particular vocabulary. However, Microdata doesn't provide sufficient syntactic help in making these decisions. Different vocabularies have different needs.
The registry associates a URI prefix with one or more key-value pairs denoting processor behavior. A hypothetical JSON representation of such a registry might be the following:
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "type",
    "multipleValues": "list"
  }
}
This structure associates mappings for two URIs, http://schema.org/ and http://microformats.org/profile/hcard. Items having an @itemtype with a URI prefix from this registry use the rules for that prefix within the scope of that @itemtype. This mapping currently defines two rules, propertyURI and multipleValues, with values to indicate specific behavior. The interpretation of these rules is defined in the following sections.
If an item has no current type or the registry contains no URI prefix matching current type, a conforming processor must use the default values defined for these rules.
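The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry contents are copied from the hypothetical JSON example, the function name lookup_rules is invented here, and the choice of the longest matching prefix when several prefixes match is an assumption that this document does not make.

```python
# Default rule values as stated in this document:
# propertyURI defaults to "context", multipleValues defaults to "list".
DEFAULT_RULES = {"propertyURI": "context", "multipleValues": "list"}

# Hypothetical registry, copied from the JSON example in this section.
REGISTRY = {
    "http://schema.org/": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
    },
    "http://microformats.org/profile/hcard": {
        "propertyURI": "type",
        "multipleValues": "list",
    },
}


def lookup_rules(current_type, registry=REGISTRY):
    """Return (matched prefix or None, rules) for an item's current type."""
    if current_type is not None:
        # Collect every registered URI prefix that current_type starts with.
        matches = [prefix for prefix in registry if current_type.startswith(prefix)]
        if matches:
            # Assumption: the longest matching prefix wins.
            prefix = max(matches, key=len)
            rules = dict(DEFAULT_RULES)
            rules.update(registry[prefix])
            return prefix, rules
    # No current type, or no matching prefix: fall back to the defaults.
    return None, dict(DEFAULT_RULES)
```

An item typed http://schema.org/Person would pick up the http://schema.org/ entry, while an untyped item or one with an unregistered type would receive the default rules.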
Richard Cyganiak has pointed out that "Registry" may be the wrong term, as the proposed registry doesn't assign identifiers or manage namespaces; it simply provides a mapping between URI prefixes and processor behavior. He suggests the term "Whitelist". As more than two values are required, and it describes more than binary behavior, this term isn't appropriate either. Anytime we discuss maintaining such a database, there are issues surrounding update frequency, URL naming, and how updates are authorized. This remains an open issue. This spec just considers the semantic content of such a list and how it can be used to affect processing without defining its representation or update policies.
The URL of the registry must be defined.
For property names which are not absolute URIs, the propertyURI rule defines the algorithm for generating an absolute URI given an evaluation context including a current type and current property.
The procedure for generating property URIs is defined in Generate Predicate URI .
Possible values for propertyURI are the following:
context
The context URI generation scheme guarantees that generated property URIs are unique for each current type and current property combination. This is required as the Microdata model requires that property names are associated with specific items.
type
type URI
vocabulary
The vocabulary URI generation scheme appends property names that are not absolute URIs to the URI prefix.
The default value of propertyURI is context.
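As an illustration of the vocabulary scheme only, the following Python sketch appends a non-absolute property name to the registered URI prefix. The function names are hypothetical, treating any name with a URI scheme as an absolute URI is an assumption, and percent-encoding via urllib.parse.quote merely approximates the fragment-escaping that the algorithm refers to.

```python
from urllib.parse import quote, urlparse


def is_absolute_uri(name):
    # Assumption: a name that carries a URI scheme counts as an absolute URI.
    return bool(urlparse(name).scheme)


def vocabulary_property_uri(name, vocabulary_prefix):
    """Generate a property URI under the 'vocabulary' scheme."""
    if is_absolute_uri(name):
        # Absolute URIs are used as the property URI unchanged.
        return name
    # quote() leaves "/" unescaped by default, so ad-hoc sub-properties such
    # as musicGroupMember/leadVocalist keep their path-like form.
    return vocabulary_prefix + quote(name)
```

With the prefix http://schema.org/, the name musicGroupMember/leadVocalist yields http://schema.org/musicGroupMember/leadVocalist, matching the sub-property use case mentioned earlier in this document.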
For items having multiple values for a property, the multipleValues rule defines the algorithm for serializing these values. This is required as the Microdata data model requires that values be strictly ordered as defined in the Microdata DOM API as element.itemValue. However, many RDF vocabularies expect multiple values to be generated as triples sharing a common subject and predicate.
Possible values for multipleValues are the following:
unordered
list
The default value of multipleValues is list.
Transformation of Microdata to RDF makes use of general processing rules described in [MICRODATA] for the treatment of items.
element.properties.
element.itemValue. (Updated from Section 3.4 Values of [MICRODATA]).
meta element
audio, embed, iframe, img, source, track, or video element with a
a, area, or link element with an
object element with a
time element with a
time element will likely be replaced with something more general purpose.
http://www.w3.org/2001/XMLSchema#date.
http://www.w3.org/2001/XMLSchema#time.
http://www.w3.org/2001/XMLSchema#dateTime.
document.getItems. (See Section 3.5 Associating names with items of [MICRODATA]).
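The three XML Schema datatypes listed above for the time element suggest that a processor chooses a datatype from the lexical form of the value. The Python sketch below shows one way this might be done. The function name and the regular expressions are assumptions: the patterns are simplified approximations of the XML Schema lexical forms, not the normative grammar, and the fallback to no datatype is also an assumption.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical patterns (assumption, not the full XML Schema grammar).
_DATE = r"\d{4}-\d{2}-\d{2}"
_TIME = r"\d{2}:\d{2}:\d{2}(\.\d+)?"
_ZONE = r"(Z|[+-]\d{2}:\d{2})?"


def time_datatype(value):
    """Return the datatype URI for a time element value, or None."""
    if re.fullmatch(_DATE + _ZONE, value):
        return XSD + "date"
    if re.fullmatch(_TIME + _ZONE, value):
        return XSD + "time"
    if re.fullmatch(_DATE + "T" + _TIME + _ZONE, value):
        return XSD + "dateTime"
    # Assumption: values in no recognized form are left without a datatype.
    return None
```

A value such as 2011-10-12 would be typed as a date, while free text would receive no datatype under this sketch.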
An HTML document containing Microdata may be converted to any other RDF-compatible document format using the algorithm specified in this section.
The algorithm below is designed for DOM-based implementations with CSS selector access to elements.
A conforming Microdata processor implementing RDF conversion must implement a processing algorithm that results in the equivalent triples that the following algorithm generates:
Set item list to an empty list.
http://www.w3.org/1999/xhtml/microdata#item
When the user agent is to Generate triples for an item item, given an Evaluation Context, it must run the following steps:
This algorithm has undergone substantial change from the original Microdata specification [MICRODATA].
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Predicate URI generation makes use of current type , current property and current vocabulary from an evaluation context context along with name .
contextual.
vocabulary, return the URI reference constructed by appending the fragment-escaped value of name to current vocabulary.
type, return the URI reference constructed as follows:
http://www.w3.org/1999/xhtml/microdata# and the fragment-escaped value of s as a URI reference.
Property value serialization makes use of current vocabulary from an evaluation context context along with subject and values.
multipleValues entry that is not null, set that as method. Otherwise, set method to list.
unordered, foreach value in list:
An RDF Collection is a mechanism for defining ordered sequences of objects in RDF (See Section 5.2 RDF Collections in [RDF-SCHEMA]). As the RDF data model is that of an unordered graph, a linking method using properties rdf:first and rdf:rest is required to be able to specify a particular order.
In the Microdata to RDF mapping, RDF Collection s are used when an item has more than one value associated with a given property to ensure that the original document order is maintained. The following procedure should be used to generate triples when an item property has more than one value (contained in list ):
http://www.w3.org/1999/02/22-rdf-syntax-ns#first
http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
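To make the two serialization methods concrete, the following Python sketch produces triples as plain tuples, either as repeated triples (unordered) or as an RDF Collection linked with rdf:first, rdf:rest and rdf:nil. It is a sketch under assumptions rather than the normative procedure: the function name and the blank node labels (_:list0, _:list1, ...) are hypothetical, and a real processor would need to mint blank nodes that are unique across the whole document.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def serialize_values(subject, predicate, values, method="list"):
    """Return (subject, predicate, object) tuples for a property's values."""
    # A collection is only used when there is more than one value and the
    # method is "list"; otherwise each value becomes its own triple.
    if method == "unordered" or len(values) < 2:
        return [(subject, predicate, value) for value in values]

    # Hypothetical blank node labels, one per list cell.
    nodes = ["_:list%d" % i for i in range(len(values))]
    triples = [(subject, predicate, nodes[0])]
    for i, value in enumerate(values):
        triples.append((nodes[i], RDF + "first", value))
        # The last cell points at rdf:nil to terminate the collection.
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "rest", rest))
    return triples
```

Two values therefore produce five triples under the list method (one linking triple plus a first/rest pair per value) and two triples under the unordered method.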
The Microdata example below expresses book information as an FRBR Work item.
<dl itemscope
    itemtype="http://purl.org/vocab/frbr/core#Work"
    itemid="http://purl.oreilly.com/works/45U8QJGZSQKDH8N"
    lang="en">
 <dt>Title</dt>
 <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
 <dt>By</dt>
 <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
 <dt>Format</dt>
 <dd itemprop="realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://purl.oreilly.com/products/9780596007683.BOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/BOOK">
  Print
 </dd>
 <dd itemprop="realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://purl.oreilly.com/products/9780596802189.EBOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/EBOOK">
  Ebook
 </dd>
</dl>
Assuming that the registry contains an entry for http://purl.org/vocab/frbr/core# with propertyURI set to vocabulary, this is equivalent to the following Turtle:
@base <http://books.example.com/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix md: <http://www.w3.org/1999/xhtml/microdata#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .

<> md:item <works/45U8QJGZSQKDH8N> .

<works/45U8QJGZSQKDH8N> a frbr:Work ;
  dc:creator "Wil Wheaton"@en ;
  dc:title "Just a Geek"@en ;
  frbr:realization ( <products/9780596007683.BOOK> <products/9780596802189.EBOOK> ) .

<products/9780596007683.BOOK> a frbr:Expression ;
  dc:type <product-types/BOOK> .

<products/9780596802189.EBOOK> a frbr:Expression ;
  dc:type <product-types/EBOOK> .
The following snippet of HTML has microdata for two people with the same address:
<p>
 Both
 <span itemscope itemtype="http://microformats.org/profile/hcard#" itemref="home">
  <span itemprop="fn"
   ><span itemprop="n" itemscope
    ><span itemprop="given-name">Princeton</span></span></span>
 </span>
 and
 <span itemscope itemtype="http://microformats.org/profile/hcard#" itemref="home">
  <span itemprop="fn"
   ><span itemprop="n" itemscope
    ><span itemprop="given-name">Trekkie</span></span></span>
 </span>
 live at
 <span id="home" itemprop="adr" itemscope>
  <span itemprop="street-address">Avenue Q</span>.
 </span>
</p>
Assuming that the registry contains an entry for http://microformats.org/profile/hcard with propertyURI set to type, it generates these triples expressed in Turtle:
@prefix md: <http://www.w3.org/1999/xhtml/microdata#> .
@prefix hcard: <http://microformats.org/profile/hcard#> .

<> md:item
  [ a hcard:;
    hcard:fn "Princeton";
    hcard:n [ hcard:given-name "Princeton" ];
    hcard:adr _:a
  ],
  [ a hcard:;
    hcard:fn "Trekkie";
    hcard:n [ hcard:given-name "Trekkie" ];
    hcard:adr _:a
  ] .

_:a hcard:street-address "Avenue Q" .
This section is non-normative.
Thanks to Richard Cyganiak for property URI and vocabulary terminology and the general excellent consideration of practical problems in generating RDF from Microdata.