This document is also available in this non-normative format: diff to previous version.
Copyright © 2011-2012 W3C ® ( MIT , ERCIM , Keio ), All Rights Reserved. W3C liability , trademark and document use rules apply.
HTML microdata [ MICRODATA ] is an extension to HTML used to embed machine-readable data into HTML documents. Whereas the microdata specification describes a means of markup, the output format is JSON. This specification describes processing rules that may be used to extract RDF [ RDF-CONCEPTS ] from an HTML document containing microdata.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is an experimental work in progress. The concepts described herein are intended to provide guidance for a possible future Working Group chartered to provide a Recommendation for this transformation. As a consequence, implementers of this specification, either producers or consumers, should note that it may change prior to any possible publication as a Recommendation.
This Working Draft is an update of the W3C Interest Group Note, published in March 2012. This update adds the Vocabulary Expansion feature to the conversion algorithm, in response to the evolution of vocabularies discussed on the Web Schemas Task Force of the Semantic Web Interest Group at W3C. The intention is to publish this draft as a new version of the Interest Group Note after gathering and incorporating community input.
This document was published by the HTML Data Task Force, Semantic Web Interest Group as a Working Draft. If you wish to make comments regarding this document, please send them to semantic-web@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is non-normative.
This document describes a means of transforming HTML containing microdata into RDF. HTML Microdata [ MICRODATA ] is an extension to HTML used to embed machine-readable data into HTML documents. This specification describes transformation directly to RDF [ RDF-CONCEPTS ].
There are a variety of ways in which a mapping from microdata to RDF might be configured to give a result that is closer to the required result for a particular vocabulary. This specification defines terms that can be used as hooks for vocabulary-specific behavior, which could be defined within a registry or on an implementation-defined basis. However, the HTML Data TF recommends the adoption of a single method of mapping in which every vocabulary is treated as if:
propertyURI is set to vocabulary
multipleValues is set to unordered
For background on the trade-offs between these options, see http://www.w3.org/wiki/Mapping_Microdata_to_RDF .
This section is non-normative.
Microdata [ MICRODATA ] is a way of embedding data in HTML documents using attributes. The HTML DOM is extended to provide an API for accessing microdata information, and the microdata specification defines how to generate a JSON representation from microdata markup.
Mapping microdata to RDF enables consumers to merge data expressed in other RDF-based formats with microdata. It facilitates the use of RDF vocabularies within microdata, and enables microdata to be used with the full RDF toolchain. Some use cases for this mapping are described in Section 1.2 below.
Microdata's data model does not align neatly with RDF.
An item of type http://example.org/Cat can have both the property color and the property http://example.org/color, and these properties are semantically distinct under microdata. In RDF, all properties have IRIs.
Although @lang attributes could be used to provide datatype and language information for RDF data, this would be contrary to the microdata specification.
Thus, in some places the needs of RDF consumers violate requirements of the microdata specification. This specification highlights where such violations occur and the reasons for them.
This specification allows for vocabulary-specific rules that affect the generation of property URIs and value serializations. This is facilitated by a registry that associates URIs with specific rules based on matching itemtype values against registered URI prefixes to determine a vocabulary and potentially vocabulary-specific processing rules.
This specification also assumes that consumers of RDF generated from microdata may have to process the results in order to, for example, assign appropriate datatypes to property values.
This section is non-normative.
During the period of the task force, a number of use cases were put forth for the use of microdata in generating RDF:
The rdfs:range of a GoodRelations property indicates the datatype of the expected value, and GoodRelations processors will expect values to be cast to that type.
Language information from the HTML needs to be captured as it is common that multiple values will be used to specify the same information in different languages.
http://schema.org/musicGroupMember, and an author might express more detail through an ad-hoc sub-property musicGroupMember/leadVocalist, having the URI http://schema.org/musicGroupMember/leadVocalist.
This section is non-normative.
Decisions or open issues in the specification are tracked on the Task Force Issue Tracker . These include the following:
Vocabulary specific parsing for Microdata. This specification attempts to create generic rules for processing microdata with typical RDF vocabularies. A registry allows for exceptions to the default processing rules for certain well-known vocabularies.
Should Microdata-RDF generate XMLLiteral values. This issue has been closed with no change as this would violate microdata's data model.
Should the registry allow property datatype specification. The consensus is that datatypes are only derived from HTML semantics, so that only <time> values have a datatype other than plain.
The purpose of this specification is to provide input to a future working group that can make decisions about the need for a registry and the details of processing. Among the options investigated by the Task Force are the following:
http://www.w3.org/ns/md#item mapping at all.
rdf:Seq, or place all values, whether or not multiple, into some form of collection.
More examples and explanatory information are available in [ MICRODATA-RDF-SUPPLEMENT ], which may be updated from time to time.
The microdata specification [ MICRODATA ] defines a number of attributes and the way in which those attributes are to be interpreted. The microdata DOM API provides methods and attributes for retrieving microdata from the HTML DOM.
For reference, attributes used for specifying and retrieving HTML microdata are referenced here:
element.itemType on the element. The item type is also used to resolve non-URL names to absolute URLs. Available through the Microdata DOM API as element.itemType. (See Items in [ MICRODATA ]).
In RDF, it is common for people to shorten vocabulary terms via abbreviated URIs that use a 'prefix' and a 'reference'. Examples throughout this document assume that the following vocabulary prefixes have been defined:
dc: | http://purl.org/dc/terms/ |
md: | http://www.w3.org/ns/md# |
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfa: | http://www.w3.org/ns/rdfa# |
xsd: | http://www.w3.org/2001/XMLSchema# |
This section is non-normative.
In a perfect world, all processors would be able to generate the same output for a given input without regard to the requirements of a particular vocabulary. However, microdata doesn't provide sufficient syntactic help in making these decisions. Different vocabularies have different needs.
The registry is located at the namespace defined for microdata: http://www.w3.org/ns/md in a variety of formats.
The registry associates a URI prefix with one or more key-value pairs denoting processor behavior. A hypothetical JSON representation of such a registry might be the following:
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "list",
    "properties": {
      "url": {"multipleValues": "unordered"}
    }
  }
}
This structure associates mappings for two URIs, http://schema.org/ and http://microformats.org/profile/hcard. Items having an item type with a URI prefix from this registry use the rules described for that prefix within the scope of that item type.
This mapping currently defines two rules, propertyURI and multipleValues, with values to indicate specific behavior. It also allows overrides on a per-property basis; the properties key associates an individual name with overrides for default behavior. The interpretation of these rules is defined in the following sections.
If an item has no current type or the registry contains no URI prefix matching current type, a conforming processor must use the default values defined for these rules.
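The lookup described above can be sketched as follows. This is a minimal illustration, not part of the specification: the in-memory `REGISTRY` dictionary, the function name `find_vocabulary`, and the choice of the longest matching prefix when several prefixes match are all assumptions made for the example.

```python
# Hypothetical in-memory copy of the registry structure shown above.
REGISTRY = {
    "http://schema.org/": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
        "properties": {"tracks": {"multipleValues": "list"}},
    },
    "http://microformats.org/profile/hcard": {
        "propertyURI": "vocabulary",
        "multipleValues": "list",
        "properties": {"url": {"multipleValues": "unordered"}},
    },
}

# Default rule values named in this document.
DEFAULTS = {"propertyURI": "vocabulary", "multipleValues": "unordered"}


def find_vocabulary(item_type, registry=REGISTRY):
    """Return (uri_prefix, rules) for an item type.

    Falls back to (None, defaults) when the item has no type or no
    registered prefix matches. Picking the longest matching prefix is an
    assumption of this sketch.
    """
    if item_type:
        matches = [prefix for prefix in registry if item_type.startswith(prefix)]
        if matches:
            prefix = max(matches, key=len)
            return prefix, registry[prefix]
    return None, dict(DEFAULTS)
```

For instance, `find_vocabulary("http://schema.org/Person")` selects the `http://schema.org/` entry, while an unregistered type falls back to the defaults.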
The concept of a registry, including a hypothetical format, location and updating rules, is presented as an abstract concept useful for describing the function of a microdata processor. There are issues surrounding update frequency, URL naming, and how updates are authorized. This spec just considers the semantic content of such a registry and how it can be used to affect processing without defining its representation or update policies.
This section is non-normative.
For names which are not absolute URLs, the propertyURI rule defines the algorithm for generating an absolute URL given an evaluation context including a current type, current name and current vocabulary.
The procedure for generating property URIs is defined in Generate Predicate URI.
Possible values for propertyURI are the following:
contextual
The contextual URI generation scheme guarantees that generated property URIs are unique based on the value of current name. This is required as the microdata data model requires that names are associated with specific items and do not have a global scope. (See Step 5 in Generate Predicate URI.)
URI creation uses a base URI with query parameters to indicate the in-scope type and name list. Consider the following example:
<span itemscope itemtype="http://microformats.org/profile/hcard"> <span itemprop="n" itemscope> <span itemprop="given-name"> Princeton </span> </span> </span>
The first name n generates the URI http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=n. However, the nested name given-name belongs to an untyped item. The inherited property URI is used to create a new property URI: http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&prop=n.given-name.
This scheme is compatible with the needs of other RDF serialization formats such as RDF/XML [ RDF-SYNTAX-GRAMMAR ], which rely on QNames for expressing properties. For example, the generated property URIs can be split as follows:
<rdf:Description xmlns:hcard="http://www.w3.org/ns/md?type=http://microformats.org/profile/hcard&amp;prop="
                 rdf:type="http://microformats.org/profile/hcard">
  <hcard:n>
    <rdf:Description>
      <hcard:n.given-name>
        Princeton
      </hcard:n.given-name>
    </rdf:Description>
  </hcard:n>
</rdf:Description>
Looking at another example:
<div itemscope itemtype="http://schema.org/Person"> <h2 itemprop="name">Jeni</h2> </div>
This would generate http://www.w3.org/ns/md?type=http://schema.org/Person&prop=name.
vocabulary
The vocabulary URI generation scheme appends names that are not absolute URLs to the URI prefix. When generating property URIs, if the URI prefix does not end with a '/' or '#', a '#' is appended to the URI prefix. (See Step 4 in Generate Predicate URI.)
Under this scheme, URI creation appends the name to the URI prefix; no query parameters are involved. Consider the following example:
<span itemscope itemtype="http://microformats.org/profile/hcard"> <span itemprop="n" itemscope> <span itemprop="given-name"> Princeton </span> </span> </span>
Given the URI prefix http://microformats.org/profile/hcard, this would generate http://microformats.org/profile/hcard#n and http://microformats.org/profile/hcard#given-name. Note that the '#' is automatically added as a separator.
Looking at another example:
<div itemscope itemtype="http://schema.org/Person"> <h2 itemprop="name">Jeni</h2> </div>
Given the URI prefix http://schema.org/, this would generate http://schema.org/name. Note that if the itemtype were http://schema.org/Person/Teacher, this would generate the same property URI.
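The two schemes can be compared side by side in a short sketch. It is illustrative only: the function names are hypothetical, `fragment_escape` is a simplified stand-in for the fragment-escaping step (percent-encoding whatever RFC 3986 does not allow in a fragment), and the contextual form follows the `&prop=` separator used in the schema.org example above.

```python
from urllib.parse import quote

MD_BASE = "http://www.w3.org/ns/md?type="


def fragment_escape(value):
    # Simplified assumption: keep the characters RFC 3986 permits in a
    # fragment and percent-encode everything else.
    return quote(value, safe="!$&'()*+,;=:@/?")


def contextual_uri(current_type, name, current_name=None):
    """Contextual scheme: the URI encodes the in-scope type and name path."""
    if current_name and current_name.startswith(MD_BASE):
        # Nested, untyped item: extend the inherited property URI.
        return current_name + "." + fragment_escape(name)
    return MD_BASE + fragment_escape(current_type) + "&prop=" + fragment_escape(name)


def vocabulary_uri(uri_prefix, name):
    """Vocabulary scheme: append the name to the URI prefix."""
    if not uri_prefix.endswith(("/", "#")):
        uri_prefix += "#"  # '#' is added as a separator when needed
    return uri_prefix + name
```

With the hCard example, the contextual scheme yields a URI ending in `&prop=n.given-name` for the nested name, whereas the vocabulary scheme yields `http://microformats.org/profile/hcard#given-name`.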
If the registry contains no match for current type, implementations act as if there is a URI prefix made from the first itemtype value by stripping either the fragment content or, if the value has no fragment, the last path segment (see [ RFC3986 ]).
Deconstructing the itemtype URL to create or identify a vocabulary URI is a violation of the microdata specification which is necessary to support the use of existing vocabularies designed for use with RDF, and shared or inherited properties within all vocabularies.
The default value of propertyURI is vocabulary.
<div itemscope itemtype="http://schema.org/Book"> <h2 itemprop="title">Just a Geek</h2> </div>
In this example, assuming no matching entry in the registry, the URI prefix is constructed by removing the last path segment, leaving the URI http://schema.org/. As there is no explicit propertyURI, the default vocabulary is used, and the resulting property URI would be http://schema.org/title.
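The fallback derivation of a URI prefix can be sketched in a few lines. This is a simplified illustration under stated assumptions: `default_uri_prefix` is a hypothetical name, the input is assumed to be an absolute URL with a non-empty path, and plain string operations stand in for full RFC 3986 parsing.

```python
def default_uri_prefix(item_type):
    """Derive a URI prefix from an itemtype with no registry match.

    Strips the fragment content when there is a fragment, otherwise the
    last path segment. Assumes an absolute URL with a path.
    """
    if "#" in item_type:
        # Keep everything up to and including '#'.
        return item_type[: item_type.index("#") + 1]
    # Keep everything up to and including the last '/'.
    return item_type[: item_type.rfind("/") + 1]


# Under the default vocabulary scheme the name is appended to the prefix.
title_uri = default_uri_prefix("http://schema.org/Book") + "title"
```

Here `title_uri` matches the property URI given for the Book example above.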
This section is non-normative.
For items having multiple values for a given property, the multipleValues rule defines the algorithm for serializing these values. Microdata uses document order when generating property values, as defined in Microdata DOM API as element.itemValue. However, many RDF vocabularies expect multiple values to be generated as triples sharing a common subject and predicate. In some cases, it may be useful to retain value ordering.
The procedure for generating property values is defined in Generate Property Values.
Possible values for multipleValues are the following:
unordered
list
An example of how this might be specified in a registry is the following:
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "list"
  }
}
Additionally, some vocabularies may wish to specify this on a per-property basis. For example, within http://schema.org/MusicPlaylist the tracks property might depend on the order of values to reproduce associated MusicRecording values.
{
"http://schema.org/": {
"propertyURI": "vocabulary",
"multipleValues": "unordered",
"properties": {
"tracks": {"multipleValues": "list"}
}
}
}
The properties key takes a JSON Object as a value, which in turn has keys for each property that is to be given alternate semantics. Each name is implicitly expanded to its URI representation as defined in Generate Predicate URI, so that the behavior is the same whether or not the name is listed as an absolute URL.
The default value of multipleValues is unordered.
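How a per-property override takes precedence over the vocabulary-level rule, with unordered as the final default, can be sketched as below. The function name `multiple_values_method` is hypothetical, and for brevity the sketch matches properties by their short name only rather than expanding each name to its URI first.

```python
def multiple_values_method(vocab_rules, name):
    """Pick the multipleValues method for a property.

    Order of precedence: per-property override, then the vocabulary-level
    rule, then the default "unordered".
    """
    properties = vocab_rules.get("properties")
    if isinstance(properties, dict):
        override = properties.get(name, {})
        if "multipleValues" in override:
            return override["multipleValues"]
    return vocab_rules.get("multipleValues", "unordered")


# Rules copied from the schema.org registry example above.
schema_rules = {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {"tracks": {"multipleValues": "list"}},
}
```

With these rules, `tracks` resolves to `list` while every other schema.org property resolves to `unordered`.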
An alternative mechanism would output both unordered and ordered values, to allow an application to choose the most useful representation. For example, consider the following:
<div itemscope itemtype="http://schema.org/MusicPlaylist">
  <span itemprop="name">Classic Rock Playlist</span>
  <meta itemprop="numTracks" content="2"/>
  <p>Including works by
    <span itemprop="byArtist">Lynard Skynard</span> and
    <span itemprop="byArtist">AC/DC</span></p>.
  <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
    1.<span itemprop="name">Sweet Home Alabama</span> -
    <span itemprop="byArtist">Lynard Skynard</span>
    <link href="sweet-home-alabama" itemprop="url" />
  </div>
  <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
    2.<span itemprop="name">Shook you all Night Long</span> -
    <span itemprop="byArtist">AC/DC</span>
    <link href="shook-you-all-night-long" itemprop="url" />
  </div>
</div>
This might generate the following Turtle:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .

<> md:item [ a schema:MusicPlaylist;
    schema:name "Classic Rock Playlist";
    schema:byArtist ("Lynard Skynard" "AC/DC");
    schema:numTracks "2";
    schema:tracks _:track1, _:track2, (_:track1 _:track2)
  ];
  rdfa:usesVocabulary schema: .

_:track1 a schema:MusicRecording;
  schema:byArtist ("Lynard Skynard");
  schema:name "Sweet Home Alabama";
  schema:url <sweet-home-alabama> .

_:track2 a schema:MusicRecording;
  schema:byArtist ("AC/DC");
  schema:name "Shook you all Night Long";
  schema:url <shook-you-all-night-long> .
By providing both _:track1 and _:track2 as object values of the playlist along with an RDF Collection containing the ordered values, the data may be queried via a simple query using the playlist subject, or as an ordered collection.
This section is non-normative.
In microdata, all values are strings. In RDF, values may be resources or may be typed with an appropriate datatype.
In some cases, the type of a microdata value can be determined from the element on which it is specified. In particular:
The time element provides dates and times.
Using information about the content of the document where the microdata is marked up might be a violation of the spirit of the microdata specification, though it does not explicitly say in normative text that consumers cannot use other information from the HTML DOM to interpret microdata.
Additionally, one possible use of a registry would allow vocabularies to be marked with datatype information, so that a dc:time value, for example, would be understood to represent a literal with datatype xsd:date. This could be done by adding information for each property in the vocabulary requiring special treatment.
This might be represented using a syntax such as the following:
{
"http://schema.org/": {
"propertyURI": "vocabulary",
"multipleValues": "unordered",
"properties": {
"dateCreated": {"datatype": "http://www.w3.org/2001/XMLSchema#date"}
}
}
}
The datatype identifies a URI to be used in constructing a typed literal.
In most cases, the relevant datatype for a value can be derived from knowledge of what property the value is for and the syntax of the value itself. Thus, values can be given datatypes in a post-processing step after the mapping of microdata to RDF described by this specification. However, where there is information in the HTML markup, such as knowledge of what element was used to mark up the value, which can help with determining its datatype, that information is used by this specification.
This concept is not explored further at this time, but could be developed further in a future revision of this document.
If property URI generation were fixed to vocabulary, multiple values always generated both unordered and ordered representations, and there were datatype support, the registry could be reduced to a simple list of URLs without any further structure necessary.
Microdata requires that all values of itemtype come from the same vocabulary. This is required as itemprop values are resolved relative to that vocabulary. However, it is often useful to define an item to have types from multiple different vocabularies.
Vocabulary expansion uses simple rules to generate additional triples based on rules and property relationships described in the registry.
Within the registry, a property definition may have either equivalentProperty or subPropertyOf keys having an IRI value (or array of IRI values) of the associated property. Such an entry causes the processor to generate triples associating the source property IRI with the target property IRI using either http://www.w3.org/2000/01/rdf-schema#subPropertyOf or http://www.w3.org/2002/07/owl#equivalentProperty predicates.
For example, the registry definition for the additionalType property within schema.org defines additionalType to have an rdfs:subPropertyOf relationship with http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
{
  "http://schema.org/": {
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
    }
  }
}
<div itemscope itemtype="http://schema.org/Product">
  <link itemprop="additionalType" href="http://www.productontology.org/id/Laser_printer" />
  <p itemprop="name">Laser Printer</p>
</div>
The previous example indicates a registry rule which causes the processor to emit an extra triple when first seeing the additionalType itemprop:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .

<> md:item (
    [ a schema:Product;
      schema:additionalType <http://www.productontology.org/id/Laser_printer>;
      schema:name "Laser Printer" ]
  );
  rdfa:usesVocabulary schema: .

schema:additionalType rdfs:subPropertyOf rdf:type .
After performing vocabulary expansion, an additional rdf:type triple is generated:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .

<> md:item (
    [ a schema:Product, <http://www.productontology.org/id/Laser_printer>;
      schema:additionalType <http://www.productontology.org/id/Laser_printer>;
      schema:name "Laser Printer" ]
  );
  rdfa:usesVocabulary schema: .

schema:additionalType rdfs:subPropertyOf rdf:type .
Formally, and for the purpose of vocabulary processing, microdata uses a very restricted subset of the OWL2 vocabulary and is based on the RDF-Based Semantics of OWL2 [ OWL2-RDF-BASED-SEMANTICS ]. Vocabulary Entailment uses the following terms:
rdfs:subPropertyOf
owl:equivalentProperty
Vocabulary Entailment considers only the entailment on individuals (i.e., not on the relationships that can be deduced on the properties or the classes themselves.)
While the formal definition of the Entailment refers to the general OWL 2 Semantics, practical implementations may rely on a subset of the OWL 2 RL Profile’s entailment expressed in rules (section 4.3 of [ OWL2-PROFILES ]). The relevant rules are, using the rule identifications in section 4.3 of [ OWL2-PROFILES ]: prp-spo1, prp-eqp1, and prp-eqp2.
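The three rules named above can be applied with a small fixed-point loop. The sketch below is an illustration only: it represents triples as plain Python tuples of strings, the function name `expand` is hypothetical, and no RDF library is assumed.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"


def expand(triples):
    """Apply prp-spo1, prp-eqp1 and prp-eqp2 until no new triple appears."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        relations = [t for t in graph
                     if t[1] in (SUB_PROPERTY_OF, EQUIVALENT_PROPERTY)]
        for p1, relation, p2 in relations:
            for s, p, o in list(graph):
                if p == p1:
                    inferred = (s, p2, o)  # prp-spo1 and prp-eqp1
                elif relation == EQUIVALENT_PROPERTY and p == p2:
                    inferred = (s, p1, o)  # prp-eqp2
                else:
                    continue
                if inferred not in graph:
                    graph.add(inferred)
                    changed = True
    return graph
```

Given a triple using `http://schema.org/additionalType` and the `rdfs:subPropertyOf` statement from the registry example, the result additionally contains the corresponding `rdf:type` triple.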
[ RDFA-CORE ] implements a more complete form of vocabulary entailment, including retrieving the vocabulary URI to find additional class and property expansion definitions, as described in RDFa Vocabulary Entailment. Microdata implementations may use RDFa Vocabulary Entailment as an alternative to implementing a separate entailment algorithm.
To allow [ RDFA-CORE ] processors to be used for microdata vocabulary expansion, microdata acts as if there is an implicit @vocab RDFa attribute set to a detected vocabulary by emitting a triple using the rdfa:usesVocabulary predicate.
The entailment described in this section is the minimum useful level for microdata. Processors may, of course, choose to follow more powerful entailment regimes, e.g., include full RDFS [ RDF-MT ] or OWL2 [ OWL2-OVERVIEW ] entailments. Using those entailments, applications may perform datatype validation by checking rdfs:range of a property, or use the advanced facilities offered by, e.g., OWL2’s property chains to interlink vocabularies further.
Conforming processors must perform the basic vocabulary expansion.
If vocabulary expansion is performed by the microdata processor using [ RDFA-CORE ] vocabulary expansion, and the vocab_expansion option is passed to the microdata processor, the full [ RDFA-CORE ] expansion must also be performed.
Transformation of Microdata to RDF makes use of general processing rules described in [ MICRODATA ] for the treatment of items.
contextual property URI generation scheme. Without this scheme, this evaluation context component would not be required.
document.getItems method.
element.properties attribute.
a, area, audio, embed, iframe, img, link, object, source, track or video)
element.itemValue. (See relevant attribute descriptions in [ HTML5 ]).
time element.
element.itemValue.
http://www.w3.org/2001/XMLSchema#date.
http://www.w3.org/2001/XMLSchema#time.
http://www.w3.org/2001/XMLSchema#dateTime.
http://www.w3.org/2001/XMLSchema#gYearMonth.
http://www.w3.org/2001/XMLSchema#gYear.
http://www.w3.org/2001/XMLSchema#duration.
The referenced version of [ HTML5 ] does not include a duration data type, but it is in the Editor's Draft and is expected to be included in a forthcoming update to the Working Draft.
The HTML valid yearless date string is similar to xsd:gMonthDay , but the lexical forms differ, so it is not included in this conversion.
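Choosing a datatype for a time element value from its lexical form might look like the sketch below. The regular expressions are simplified assumptions written for this illustration; they check shape only (not value ranges such as month 13) and are not the normative lexical definitions of the XML Schema datatypes. The function name `time_datatype` is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical shapes (assumptions of this sketch).
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
_DATE = r"-?\d{4,}-\d{2}-\d{2}"
_TIME = r"\d{2}:\d{2}:\d{2}(\.\d+)?"

PATTERNS = [
    ("date", _DATE + _TZ),
    ("time", _TIME + _TZ),
    ("dateTime", _DATE + "T" + _TIME + _TZ),
    ("gYearMonth", r"-?\d{4,}-\d{2}" + _TZ),
    ("gYear", r"-?\d{4,}" + _TZ),
    ("duration",
     r"-?P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+D)?"
     r"(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?"),
]


def time_datatype(value):
    """Return the datatype URI matching value's shape, or None.

    None means the value would stay a plain literal, as is the case for a
    yearless date such as "03-18".
    """
    for local_name, pattern in PATTERNS:
        if re.fullmatch(pattern, value):
            return XSD + local_name
    return None
```

A value that matches none of the shapes, including the yearless date form mentioned above, is left without a datatype.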
See The time element in [ HTML5 ].
See The lang and xml:lang attributes in [ HTML5 ] for determining the language of a node.
document.getItems. (See Associating names with items in [ MICRODATA ]).
The HTML5/microdata content model for @href, @src, @data, itemtype, itemprop and itemid is that of a URL, not a URI or IRI.
A proposed mechanism for specifying the range of property values to be URI reference or IRI could allow these to be specified as subject or object using a @content attribute.
An HTML document containing microdata may be converted to any other RDF-compatible document format using the algorithm specified in this section.
A conforming microdata processor implementing RDF conversion must implement a processing algorithm that results in the equivalent triples to those that the following algorithm generates:
Set item list to an empty list.
http://www.w3.org/ns/md#item
When the user agent is to Generate triples for an item item, given evaluation context, it must run the following steps:
This algorithm has undergone substantial change from the original microdata specification [ MICRODATA ].
element.itemType of the element defining the item.
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
element.itemType of the element defining the item.
http://www.w3.org/ns/rdfa#usesVocabulary
Predicate URI generation makes use of current type, current name, and current vocabulary from an evaluation context context along with name.
http://example.org/doc and an http://example.org/doc#title.
vocabulary.
vocabulary
contextual,
http://www.w3.org/ns/md?type= is a prefix of s, return the concatenation of s, a U+002E FULL STOP character (.) and the fragment-escaped value of name.
http://www.w3.org/ns/md?type=, the fragment-escaped value of current type, the string &prop=, and the fragment-escaped value of name.
equivalentProperty key, generate the following triple using the value of that key:
http://www.w3.org/2002/07/owl#equivalentProperty
If the value is an array, generate a triple for each value of that array.
subPropertyOf key, generate the following triple using the value of that key:
http://www.w3.org/2000/01/rdf-schema#subPropertyOf
If the value is an array, generate a triple for each value of that array.
Property value serialization makes use of subject, predicate and values.
properties which has a JSON Object value, let properties be that value. Otherwise, set properties to null.
multipleValues, set that as method.
multipleValues, set that as method.
unordered.
unordered, for each value in values, generate the following triple:
list:
An RDF Collection is a mechanism for defining ordered sequences of objects in RDF (See RDF Collections in [ RDF-SCHEMA ]). As the RDF data-model is that of an unordered graph, a linking method using properties rdf:first and rdf:rest is required to be able to specify a particular order.
In the microdata to RDF mapping, RDF Collections are used when an item has more than one value associated with a given property to ensure that the original document order is maintained. The following procedure should be used to generate triples when an item property has more than one value (contained in list):
http://www.w3.org/1999/02/22-rdf-syntax-ns#first
http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
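The rdf:first / rdf:rest / rdf:nil linking used for an RDF Collection can be sketched as follows. This is an illustration under stated assumptions: triples are plain tuples of strings, the blank node labels `_:l0`, `_:l1`, ... and the function name `collection_triples` are hypothetical, and the empty-list branch is an addition of this sketch rather than something the text above covers.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_triples(subject, predicate, values):
    """Build the triples linking subject to an ordered list of values."""
    if not values:
        # Not covered by the procedure above; an empty list is rdf:nil.
        return [(subject, predicate, RDF + "nil")]
    nodes = [f"_:l{i}" for i in range(len(values))]
    triples = [(subject, predicate, nodes[0])]
    for i, value in enumerate(values):
        # The last node points at rdf:nil to terminate the list.
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", value))
        triples.append((nodes[i], RDF + "rest", rest))
    return triples
```

Each list node carries one value through rdf:first and points to the next node through rdf:rest, so document order survives in an otherwise unordered graph.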
This section is non-normative.
A test suite [ MICRODATA-RDF-TESTS ] is under development to help processor developers verify conformance to this specification.
This section is non-normative.
The microdata example below expresses book information as an FRBR Work item.
<dl itemscope
    itemtype="http://purl.org/vocab/frbr/core#Work"
    itemid="http://books.example.com/works/45U8QJGZSQKDH8N"
    lang="en">
  <dt>Title</dt>
  <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
  <dt>By</dt>
  <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
  <dt>Format</dt>
  <dd itemprop="http://purl.org/vocab/frbr/core#realization"
      itemscope
      itemtype="http://purl.org/vocab/frbr/core#Expression"
      itemid="http://books.example.com/products/9780596007683.BOOK">
    <link itemprop="http://purl.org/dc/terms/type" href="http://books.example.com/product-types/BOOK">
    Print
  </dd>
  <dd itemprop="http://purl.org/vocab/frbr/core#realization"
      itemscope
      itemtype="http://purl.org/vocab/frbr/core#Expression"
      itemid="http://books.example.com/products/9780596802189.EBOOK">
    <link itemprop="http://purl.org/dc/terms/type" href="http://books.example.com/product-types/EBOOK">
    Ebook
  </dd>
</dl>
Assuming that the registry contains an entry for http://purl.org/vocab/frbr/core# with propertyURI set to vocabulary, this is equivalent to the following Turtle:
@prefix dc: <http://purl.org/dc/terms/> .
@prefix md: <http://www.w3.org/ns/md#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .

<> md:item (<http://books.example.com/works/45U8QJGZSQKDH8N>) ;
  rdfa:usesVocabulary frbr: .

<http://books.example.com/works/45U8QJGZSQKDH8N> a frbr:Work ;
  dc:creator "Wil Wheaton"@en ;
  dc:title "Just a Geek"@en ;
  frbr:realization <http://books.example.com/products/9780596007683.BOOK>,
    <http://books.example.com/products/9780596802189.EBOOK> .

<http://books.example.com/products/9780596007683.BOOK> a frbr:Expression ;
  dc:type <http://books.example.com/product-types/BOOK> .

<http://books.example.com/products/9780596802189.EBOOK> a frbr:Expression ;
  dc:type <http://books.example.com/product-types/EBOOK> .
The following snippet of HTML has microdata for two people with the same address. This illustrates two items referencing a third item, and how only a single RDF resource definition is created for that third item.
<p>
 Both
 <span itemscope itemtype="http://microformats.org/profile/hcard" itemref="home">
  <span itemprop="fn"><span itemprop="n" itemscope><span itemprop="given-name">Princeton</span></span></span>
 </span>
 and
 <span itemscope itemtype="http://microformats.org/profile/hcard" itemref="home">
  <span itemprop="fn"><span itemprop="n" itemscope><span itemprop="given-name">Trekkie</span></span></span>
 </span>
 live at
 <span id="home" itemprop="adr" itemscope>
  <span itemprop="street-address">Avenue Q</span>.
 </span>
</p>
Assuming that the registry contains an entry for http://microformats.org/profile/hcard with propertyURI set to vocabulary, it generates these triples expressed in Turtle:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix hcard: <http://microformats.org/profile/hcard#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .

<> md:item (
  [ a <http://microformats.org/profile/hcard>;
    hcard:fn "Princeton";
    hcard:n [ hcard:given-name "Princeton" ];
    hcard:adr _:a
  ]
  [ a <http://microformats.org/profile/hcard>;
    hcard:fn "Trekkie";
    hcard:n [ hcard:given-name "Trekkie" ];
    hcard:adr _:a
  ]) ;
  rdfa:usesVocabulary <http://microformats.org/profile/hcard> .

_:a hcard:street-address "Avenue Q" .
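The way property names become URIs in the examples above can be sketched in Python. This is a minimal illustration, not the normative algorithm: the function name vocabulary_property_uri is hypothetical, the separator rule (insert # unless the vocabulary already ends in # or /) is an assumption inferred from the hcard: prefix shown above, and urllib.parse.quote is only an approximation of the escaping a conforming processor would apply.

```python
from urllib.parse import quote, urlparse


def vocabulary_property_uri(name: str, vocabulary: str) -> str:
    """Expand a microdata property name, assuming propertyURI is 'vocabulary'.

    Hypothetical helper: the joining rule is inferred from the examples,
    not copied from the normative text.
    """
    # A name that is already an absolute URL is used unchanged.
    if urlparse(name).scheme:
        return name
    # Assumption: join with '#' unless the vocabulary ends in '#' or '/'.
    separator = "" if vocabulary.endswith(("#", "/")) else "#"
    # Percent-encode the name; an approximation of fragment escaping.
    return vocabulary + separator + quote(name, safe="")


if __name__ == "__main__":
    print(vocabulary_property_uri("given-name", "http://microformats.org/profile/hcard"))
    print(vocabulary_property_uri("name", "http://schema.org/"))
```

Under these assumptions, given-name in the hCard vocabulary expands to a URI in the http://microformats.org/profile/hcard# namespace, while property names that are already absolute URLs, such as the Dublin Core terms in the FRBR example, pass through untouched.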
The following snippet of HTML has microdata for a playlist, and illustrates overriding a property to place elements in an RDF Collection. This also illustrates the use of the schema:additionalType property to relate recordings to the Music Ontology:
<div itemscope itemtype="http://schema.org/MusicPlaylist">
  <span itemprop="name">Classic Rock Playlist</span>
  <meta itemprop="numTracks" content="2"/>
  <p>Including works by
    <span itemprop="byArtist">Lynard Skynard</span> and
    <span itemprop="byArtist">AC/DC</span></p>.

  <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
    <link itemprop="additionalType" href="http://purl.org/ontology/mo/MusicalManifestation"/>
    1.<span itemprop="name">Sweet Home Alabama</span> -
    <span itemprop="byArtist">Lynard Skynard</span>
    <link href="sweet-home-alabama" itemprop="url" />
  </div>

  <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
    <link itemprop="additionalType" href="http://purl.org/ontology/mo/MusicalManifestation"/>
    2.<span itemprop="name">Shook you all Night Long</span> -
    <span itemprop="byArtist">AC/DC</span>
    <link href="shook-you-all-night-long" itemprop="url" />
  </div>
</div>
Assuming that the registry contains an entry for http://schema.org/ with propertyURI set to vocabulary and multipleValues set to unordered, and with the properties tracks and byArtist having multipleValues set to list, it generates these triples expressed in Turtle:
@prefix md: <http://www.w3.org/ns/md#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .

<> md:item ([ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:byArtist ("Lynard Skynard" "AC/DC");
  schema:numTracks "2";
  schema:tracks (
    [ a schema:MusicRecording, mo:MusicalManifestation;
      schema:additionalType mo:MusicalManifestation;
      schema:byArtist ("Lynard Skynard");
      schema:name "Sweet Home Alabama";
      schema:url <sweet-home-alabama>]
    [ a schema:MusicRecording, mo:MusicalManifestation;
      schema:additionalType mo:MusicalManifestation;
      schema:byArtist ("AC/DC");
      schema:name "Shook you all Night Long";
      schema:url <shook-you-all-night-long>]
  )]);
  rdfa:usesVocabulary schema: .

schema:additionalType rdfs:subPropertyOf rdf:type .
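The parenthesized lists in the Turtle above are shorthand for chains of rdf:first and rdf:rest triples terminated by rdf:nil. The following Python sketch shows one way a processor might expand an ordered list of values into such a chain. The function name collection_triples and the blank node labels (_:l0, _:l1, and so on) are hypothetical, and triples are modelled as plain tuples of strings rather than with any particular RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_triples(values):
    """Return (head, triples) for an RDF Collection holding `values` in order.

    Hypothetical helper: blank node labels are illustrative only.
    """
    # An empty collection is represented by rdf:nil alone.
    if not values:
        return RDF + "nil", []
    # One blank node per list cell.
    nodes = [f"_:l{i}" for i in range(len(values))]
    triples = []
    for i, value in enumerate(values):
        # Each cell points at its value with rdf:first ...
        triples.append((nodes[i], RDF + "first", value))
        # ... and at the next cell, or rdf:nil for the last one, with rdf:rest.
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


if __name__ == "__main__":
    head, triples = collection_triples(["Lynard Skynard", "AC/DC"])
    print(head)
    for triple in triples:
        print(triple)
```

The returned head is the node that would become the object of the schema:byArtist or schema:tracks triple; with multipleValues set to unordered, a processor would instead emit one triple per value with no list structure.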
This section is non-normative.
The following is an example registry in JSON format.
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
      "blogPosts": {"multipleValues": "list"},
      "breadcrumb": {"multipleValues": "list"},
      "byArtist": {"multipleValues": "list"},
      "creator": {"multipleValues": "list"},
      "episode": {"multipleValues": "list"},
      "episodes": {"multipleValues": "list"},
      "event": {"multipleValues": "list"},
      "events": {"multipleValues": "list"},
      "founder": {"multipleValues": "list"},
      "founders": {"multipleValues": "list"},
      "itemListElement": {"multipleValues": "list"},
      "musicGroupMember": {"multipleValues": "list"},
      "performerIn": {"multipleValues": "list"},
      "actor": {"multipleValues": "list"},
      "actors": {"multipleValues": "list"},
      "performer": {"multipleValues": "list"},
      "performers": {"multipleValues": "list"},
      "producer": {"multipleValues": "list"},
      "recipeInstructions": {"multipleValues": "list"},
      "season": {"multipleValues": "list"},
      "seasons": {"multipleValues": "list"},
      "subEvent": {"multipleValues": "list"},
      "subEvents": {"multipleValues": "list"},
      "track": {"multipleValues": "list"},
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcalendar#": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "categories": {"multipleValues": "list"}
    }
  }
}
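As an illustration of how a processor might consult a registry of this shape, the Python sketch below loads a trimmed copy of the example and resolves the effective multipleValues setting for a property. The function names find_vocabulary and multiple_values are hypothetical, and two behaviours are assumptions rather than statements of the normative rules: that an item type is matched to a registry entry by longest URI prefix, and that a property without its own setting inherits the vocabulary-level one.

```python
import json

# Trimmed copy of the example registry; only a few properties are kept.
REGISTRY_JSON = """
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "byArtist": {"multipleValues": "list"},
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  }
}
"""


def find_vocabulary(registry, itemtype):
    """Return the registry key matching `itemtype`, or None.

    Assumption: keys are URI prefixes and the longest match wins.
    """
    matches = [prefix for prefix in registry if itemtype.startswith(prefix)]
    return max(matches, key=len) if matches else None


def multiple_values(registry, vocabulary, name):
    """Return the multipleValues setting for a property, or None if unset.

    Assumption: a property-level setting overrides the vocabulary-level one.
    """
    entry = registry.get(vocabulary, {})
    prop = entry.get("properties", {}).get(name, {})
    return prop.get("multipleValues", entry.get("multipleValues"))


if __name__ == "__main__":
    registry = json.loads(REGISTRY_JSON)
    vocab = find_vocabulary(registry, "http://schema.org/MusicPlaylist")
    print(vocab, multiple_values(registry, vocab, "tracks"))
```

With this lookup, tracks and byArtist in the playlist example resolve to list, while name falls back to the vocabulary-level unordered setting.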
This section is non-normative.
Thanks to Richard Cyganiak for the property URI and vocabulary terminology and for his excellent consideration of the practical problems in generating RDF from microdata.