The Resource Description Framework (RDF) is a framework for representing information in the Web.
RDF Concepts and Abstract Syntax defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of key concepts, datatyping, character normalization and handling of IRIs.
This document defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications, including:
The core structure of the abstract syntax is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph. This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link; hence the term “graph”.
There may be three kinds of nodes in an RDF graph: IRIs, literals, and blank nodes.
Any IRI or literal denotes something in the universe of discourse. These things are called resources. Anything can be a resource, including physical things, documents, and abstract concepts; the term is synonymous with “entity”. The resource denoted by an IRI is called its referent, and the resource denoted by a literal is called its value. Literals have datatypes that define the range of possible values, such as strings, numbers, and dates. A special kind of literal, the language-tagged string, denotes a plain-text string in a natural language.
The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object. The statement corresponding to an RDF triple is known as an RDF statement. The predicate itself is an IRI and denotes a binary relation, also known as a property. (Relations that involve more than two entities can only be expressed indirectly in RDF [[SWBP-N-ARYRELATIONS]].)
The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains.
Statements involving blank nodes say that something with the given relationships exists, without explicitly naming it.
The resource denoted by an IRI is also called its referent. What exactly is denoted by any given IRI is not defined by this specification. The question is treated in other documents like Architecture of the World Wide Web, Volume One [[WEBARCH]] and Cool URIs for the Semantic Web [[COOLURIS]]. A very brief, informal and partial account follows:
An RDF vocabulary is a collection of IRIs with clearly established referents intended for use in RDF graphs. For example, the IRIs documented in [[RDF-SCHEMA]] are the RDF Schema vocabulary. RDF Schema can itself be used to define and document additional RDF vocabularies. Some such vocabularies are mentioned in the Primer [[RDF-PRIMER]].
It has been suggested that this specification should also define terms such as “namespace”, “namespace IRI”, and “namespace prefix”.
The idea of meaning in RDF is underpinned by the formal concept of entailment. In brief, an RDF graph A is said to entail another RDF graph B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated then the truth of B can be inferred. An account of meaning and entailment in RDF, using the formalism of model theory, is given in [[RDF-MT]].
This section should explain terminology around working with multiple graphs, and explain the fact that graphs merge easily. This will be added once the Working Group has finalised a design.
An RDF document is a document that encodes an RDF graph in a concrete RDF syntax, such as Turtle [[TURTLE-TR]], RDFa [[RDFA-PRIMER]], RDF/XML [[RDF-SYNTAX-GRAMMAR]], or N-Triples [[N-TRIPLES]].
Implementations are free to represent RDF graphs in any other equivalent form.
This section needs to explain what kind of artefact can conform to this specification, and what is required in order to conform.
An RDF graph is a set of RDF triples.
Graph isomorphism: Two RDF graphs G and G' are isomorphic if there is a bijection M between the sets of nodes of the two graphs, such that:
With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'. Graph isomorphism is needed to support the RDF Test Cases [[RDF-TESTCASES]] specification.
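A brute-force isomorphism check can be sketched in Python, assuming the usual conditions on M: it maps blank nodes to blank nodes, maps every IRI and literal to itself, and (s, p, o) is a triple of G if and only if (M(s), p, M(o)) is a triple of G'. The term classes IRI, BNode and Literal are hypothetical stand-ins, not an existing RDF library, and trying every bijection is only practical for graphs with a handful of blank nodes.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str  # hypothetical local label, used only to tell blank nodes apart


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None


def blank_nodes(graph):
    """Blank nodes occurring as the subject or object of some triple."""
    return {term for s, _, o in graph for term in (s, o) if isinstance(term, BNode)}


def is_isomorphic(g1, g2):
    """True if some bijection M between blank nodes maps g1 exactly onto g2.

    IRIs and literals map to themselves, so only blank nodes are permuted.
    Runs in factorial time in the number of blank nodes.
    """
    b1, b2 = list(blank_nodes(g1)), list(blank_nodes(g2))
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        m = dict(zip(b1, image))
        mapped = {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1}
        if mapped == g2:
            return True
    return False
```

Practical implementations replace the permutation search with canonical labelling of blank nodes; the sketch only mirrors the definition.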
An RDF triple contains three components:
An RDF triple is conventionally written in the order subject, predicate, object.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph. Predicate IRIs MAY also appear as nodes in the graph.
IRIs, blank nodes and literals are collectively known as RDF terms.
An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string [[!UNICODE]] that conforms to the syntax defined in RFC 3987 [[!IRI]]. IRIs are a generalization of URIs [[URI]]. Every absolute URI and URL is an IRI.
IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier.
IRI equality: Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [[!IRI]]. Further normalization MUST NOT be performed when comparing IRIs for equality.
When IRIs are used in operations that are only defined for URIs, they must first be converted according to the mapping defined in section 3.1 of [[!IRI]]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
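A minimal sketch of this mapping using only the Python standard library. It assumes the host is a registered name rather than an IPv6 literal, relies on Python's built-in "idna" codec (which implements IDNA 2003) for the Punycode step, and %-encodes only non-ASCII characters, leaving ASCII characters and existing %-encodings untouched. The function names are illustrative.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def _encode_non_ascii(text: str) -> str:
    """UTF-8 encode each non-ASCII character and %-encode the resulting octets."""
    return "".join(ch if ch.isascii() else quote(ch, safe="") for ch in text)


def iri_to_uri(iri: str) -> str:
    scheme, netloc, path, query, fragment = urlsplit(iri)
    userinfo, at, hostport = netloc.rpartition("@")
    # Assumption: no IPv6 literal, so the first ':' separates host and port.
    host, colon, port = hostport.partition(":")
    if not host.isascii():
        host = host.encode("idna").decode("ascii")  # Punycode-encode the domain name
    netloc = _encode_non_ascii(userinfo) + at + host + colon + port
    return urlunsplit((scheme, netloc,
                       _encode_non_ascii(path),
                       _encode_non_ascii(query),
                       _encode_non_ascii(fragment)))
```

For example, http://bücher.example/résumé maps to http://xn--bcher-kva.example/r%C3%A9sum%C3%A9.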
Some concrete syntaxes permit relative IRIs as a shorthand for absolute IRIs, and define how to resolve the relative IRIs against a base IRI.
Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and allowed some characters that IRIs do not permit, such as the double quote (").
Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [[!IRI]]. Non-normalized forms that should be avoided include:
“/./” or “/../” in the path component of an IRI
Lowercase hexadecimal letters within percent-encoding triplets (“%3F” is preferable over “%3f”)
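A small Python check can flag dot-segments in the path and lowercase percent-encoding triplets. It is a sketch only: it does not cover every normalization in Section 5 of [[!IRI]], and the function name normalization_warnings is hypothetical. Because IRI equality is simple string comparison, http://example.org/a%3Fb and http://example.org/a%3fb are different IRIs, even though URI normalization would treat them as equivalent.

```python
import re
from typing import List
from urllib.parse import urlsplit

_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalization_warnings(iri: str) -> List[str]:
    """List non-normalized forms found in an IRI (only two kinds are checked)."""
    warnings = []
    segments = urlsplit(iri).path.split("/")
    # "." or ".." as a whole path segment means the path was not dot-normalized.
    if "." in segments or ".." in segments:
        warnings.append("dot-segment in path")
    for triplet in _PCT_TRIPLET.findall(iri):
        if triplet != triplet.upper():
            warnings.append(f"lowercase percent-encoding {triplet}")
    return warnings
```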
Literals are used to denote values such as strings, numbers and dates by means of a lexical representation.
A literal in an RDF graph consists of:
A language-tagged string is any literal whose datatype IRI is equal to http://www.w3.org/1999/02/22-rdf-syntax-ns#langString (rdf:langString).
In addition to lexical form and datatype IRI,
a language-tagged string also has:
Concrete syntaxes MAY support simple literals, consisting of only a lexical form without any datatype IRI or language tag. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string (xsd:string).
Literal equality: Two literals are equal if and only if the two lexical forms, the two datatype IRIs, and the two language tags (if any) compare equal, character by character.
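Literal equality is component-wise string comparison, which a frozen Python dataclass provides directly. In this sketch the Literal class and the simple_literal helper are hypothetical; simple literals are desugared to the datatype IRI http://www.w3.org/2001/XMLSchema#string.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    # The generated __eq__ compares these three fields as plain strings (or None).
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None  # only set for rdf:langString literals


def simple_literal(lexical_form: str) -> Literal:
    """Desugar a concrete-syntax simple literal into an xsd:string literal."""
    return Literal(lexical_form, XSD_STRING)
```

With this model, the xsd:integer literals with lexical forms “1” and “01” are unequal even though both denote the number one: literal equality is purely syntactic.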
In earlier versions of RDF, literals with a language tag did not have a datatype IRI, and simple literals could appear directly in the abstract syntax. Simple literals and literals with a language tag were collectively known as plain literals.
Literals in which the lexical form begins with a composing character (as defined by [[CHARMOD]]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [[XML11]].
Earlier versions of RDF permitted tags that adhered to the generic tag/subtag syntax of language tags, but were not well-formed according to [[!BCP47]]. Such language tags do not conform to RDF 1.1.
The xsd:string datatype does not permit the #x0 character, and implementations may not permit control codes in the #x1-#x1F range. Earlier versions of RDF allowed these characters in simple literals, although they could never be serialized in a W3C-recommended concrete syntax.
When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications.
The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input.
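One way to make comparisons insensitive to the case of the original input, without rewriting the stored tag, is to compare through a normalized key. In this Python sketch, lower case as the normalized form is an assumed choice (language tags are ASCII, so any consistent case folding gives the same equality), and the function name is hypothetical.

```python
from typing import Optional, Tuple

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def comparison_key(lexical_form: str, datatype_iri: str,
                   language_tag: Optional[str]) -> Tuple[str, str, Optional[str]]:
    """Key for comparing literals that ignores the case of the language tag."""
    tag = language_tag.lower() if language_tag is not None else None
    return (lexical_form, datatype_iri, tag)
```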
RDF literals are distinct and distinguishable from IRIs; for example, a string literal whose lexical form is the text of an IRI is not equal to that IRI.
The blank nodes in an RDF graph are drawn from an infinite set. This set is disjoint from the set of all IRIs and the set of all literals. Otherwise, this set of blank nodes is arbitrary.
RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.
Some concrete syntaxes for RDF use blank node identifiers to allow several statements to use the same blank node. A blank node identifier is a local identifier that can be distinguished from IRIs and literals. Such blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the particular concrete syntax used.
Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.
In situations where stronger identification is needed, systems MAY systematically transform some or all of the blank nodes in an RDF graph into IRIs [[!IRI]]. Systems wishing to do this SHOULD mint a new, globally unique IRI (a Skolem IRI) for each blank node so transformed.
This transformation does not change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else.
Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace a blank node, and map back to the source blank node where possible.
Systems that want Skolem IRIs to be recognizable outside of the system boundaries SHOULD use a well-known IRI [[WELL-KNOWN]] with the registered name genid. This is an IRI that uses the HTTP or HTTPS scheme, or another scheme that has been specified to use well-known IRIs, and whose path component starts with /.well-known/genid/.
For example, the authority responsible for the domain
example.com could mint the following recognizable Skolem IRI:
IETF registration of the
genid name is
currently in progress. This is
RFC 5785 [[WELL-KNOWN]] only specifies well-known URIs, not IRIs. For the purpose of this document, a well-known IRI is any IRI that results in a well-known URI after IRI-to-URI mapping [[!IRI]].
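A minimal Python sketch of skolemization, assuming the hypothetical authority http://example.com, random UUID-based identifiers, and the path prefix /.well-known/genid/ (the registered genid name under the /.well-known/ prefix defined by RFC 5785). The BNode class and the function names are illustrative. Returning the blank-node-to-IRI mapping lets a system map Skolem IRIs back to the blank nodes they replaced.

```python
import uuid
from dataclasses import dataclass

GENID_PREFIX = "/.well-known/genid/"


@dataclass(frozen=True)
class BNode:
    label: str  # hypothetical local label; only used to tell blank nodes apart


def skolemize(triples, authority):
    """Replace every blank node with a fresh Skolem IRI minted under `authority`."""
    mapping = {}

    def replace(term):
        if isinstance(term, BNode):
            if term not in mapping:  # same blank node -> same Skolem IRI
                mapping[term] = authority + GENID_PREFIX + uuid.uuid4().hex
            return mapping[term]
        return term

    skolemized = {(replace(s), p, replace(o)) for s, p, o in triples}
    return skolemized, mapping


def is_skolem_iri(iri, authority):
    """Recognize IRIs minted by skolemize() for this authority."""
    return iri.startswith(authority + GENID_PREFIX)
```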
Datatypes are used with RDF literals to represent values such as strings, numbers and dates.
The datatype abstraction used in RDF is compatible with XML Schema [[!XMLSCHEMA11-2]]. Any datatype definition that conforms to this abstraction MAY be used in RDF, even if not defined in terms of XML Schema. RDF re-uses the XML Schema built-in datatypes, and provides one additional built-in datatype, rdf:XMLLiteral.
The Working Group is planning to add an HTML datatype to better address the use case of including text with markup. This is ISSUE-63.
A datatype consists of a lexical space, a value space and a lexical-to-value mapping, and is denoted by one or more IRIs.
The lexical space of a datatype is a set of Unicode [[!UNICODE]] strings.
The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype and whose second element belongs to the value space of the datatype:
When the datatype is defined using XML Schema:
Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags in the lexical space. The value space associated with the datatype IRI is the set of all pairs of strings and language tags.
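For example, the XML Schema datatype xsd:boolean, where each member of the value space has two lexical representations, is defined as follows:
Lexical space: {“true”, “false”, “1”, “0”}
Value space: {true, false}
Lexical-to-value mapping: {<“true”, true>, <“false”, false>, <“1”, true>, <“0”, false>}
The literals that can be defined using this datatype are the four literals with datatype IRI xsd:boolean and the lexical forms “true”, “false”, “1” and “0”.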
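The xsd:boolean definition can be written down almost verbatim in Python, with a dictionary standing in for the lexical-to-value mapping (it is a function because each lexical form has exactly one value). The names are illustrative only.

```python
# xsd:boolean written out as the three components of a datatype.
BOOLEAN_L2V = {"true": True, "false": False, "1": True, "0": False}
LEXICAL_SPACE = set(BOOLEAN_L2V)         # the four lexical forms
VALUE_SPACE = set(BOOLEAN_L2V.values())  # {True, False}


def boolean_value(lexical_form: str) -> bool:
    """Apply the lexical-to-value mapping; reject strings outside the lexical space."""
    if lexical_form not in LEXICAL_SPACE:
        raise ValueError(f"not in the xsd:boolean lexical space: {lexical_form!r}")
    return BOOLEAN_L2V[lexical_form]
```

The literals with lexical forms “true” and “1” are therefore distinct literals that share the value true, while “TRUE” is outside the lexical space.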
IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes [[!XMLSCHEMA11-2]]. The XML Schema built-in types listed in the following table are the RDF-compatible XSD types. Their use is RECOMMENDED.
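| Group | Datatype | Value space (informative) |
|---|---|---|
| Core types | xsd:string | Character strings |
| | xsd:decimal | Arbitrary-precision decimal numbers |
| | xsd:integer | Arbitrary-size integer numbers |
| | xsd:double | 64-bit floating point numbers incl. ±Inf, ±0, NaN |
| | xsd:float | 32-bit floating point numbers incl. ±Inf, ±0, NaN |
| Time and date | xsd:date | Dates (yyyy-mm-dd) with or without timezone |
| | xsd:time | Times (hh:mm:ss.sss…) with or without timezone |
| | xsd:dateTime | Date and time with or without timezone |
| | xsd:dateTimeStamp | Date and time with required timezone |
| | xsd:gYear | Gregorian calendar year |
| | xsd:gMonth | Gregorian calendar month |
| | xsd:gDay | Gregorian calendar day of the month |
| | xsd:gYearMonth | Gregorian calendar year and month |
| | xsd:gMonthDay | Gregorian calendar month and day |
| | xsd:yearMonthDuration | Duration of time (months and years) |
| | xsd:dayTimeDuration | Duration of time (days, hours, minutes, seconds) |
| | xsd:byte | -128…+127 (8 bit) |
| | xsd:short | -32768…+32767 (16 bit) |
| | xsd:int | -2147483648…+2147483647 (32 bit) |
| | xsd:long | -9223372036854775808…+9223372036854775807 (64 bit) |
| | xsd:unsignedByte | 0…255 (8 bit) |
| | xsd:unsignedShort | 0…65535 (16 bit) |
| | xsd:unsignedInt | 0…4294967295 (32 bit) |
| | xsd:unsignedLong | 0…18446744073709551615 (64 bit) |
| | xsd:positiveInteger | Integer numbers >0 |
| | xsd:nonNegativeInteger | Integer numbers ≥0 |
| | xsd:negativeInteger | Integer numbers <0 |
| | xsd:nonPositiveInteger | Integer numbers ≤0 |
| Encoded binary data | xsd:hexBinary | Hex-encoded binary data |
| | xsd:base64Binary | Base64-encoded binary data |
| | xsd:anyURI | Absolute or relative URIs and IRIs |
| | xsd:language | Language tags per [[BCP47]] |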
The other built-in XML Schema datatypes are unsuitable for various reasons, and SHOULD NOT be used.
xsd:duration does not have a well-defined value space.
xsd:ENTITY require an enclosing XML document context.
xsd:IDREF are for cross references within an XML document.
xsd:NOTATION is not intended for direct use.
xsd:NMTOKENS are sequence-valued datatypes which do not fit the RDF datatype model.
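The integer ranges in the table of RDF-compatible XSD types translate directly into a validity check for the limited-range integer types. This Python sketch assumes the xsd:integer lexical space is an optional sign followed by ASCII digits; leading or trailing whitespace is rejected because it is not part of that lexical space. The dictionary and function names are hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Inclusive bounds from the table of RDF-compatible XSD types; None means unbounded.
INTEGER_RANGES = {
    "integer": (None, None),
    "byte": (-128, 127),
    "short": (-32768, 32767),
    "int": (-2147483648, 2147483647),
    "long": (-9223372036854775808, 9223372036854775807),
    "unsignedByte": (0, 255),
    "unsignedShort": (0, 65535),
    "unsignedInt": (0, 4294967295),
    "unsignedLong": (0, 18446744073709551615),
    "positiveInteger": (1, None),
    "nonNegativeInteger": (0, None),
    "negativeInteger": (None, -1),
    "nonPositiveInteger": (None, 0),
}

_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical_form, datatype_iri):
    """Map a lexical form to an int, enforcing the datatype's range."""
    local = datatype_iri[len(XSD):] if datatype_iri.startswith(XSD) else None
    if local not in INTEGER_RANGES:
        raise LookupError(f"not an integer datatype in this sketch: {datatype_iri}")
    if not _INTEGER_LEXICAL.fullmatch(lexical_form):
        raise ValueError(f"not in the integer lexical space: {lexical_form!r}")
    value = int(lexical_form)
    low, high = INTEGER_RANGES[local]
    if (low is not None and value < low) or (high is not None and value > high):
        raise ValueError(f"{value} is outside the value space of {datatype_iri}")
    return value
```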
is noted above as being unsuitable for use in RDF. The design of the type
has changed in XSD 1.1, so this should be reviewed. This is
RDF provides for XML content as a possible literal value.
Such content is indicated in an RDF graph using a literal whose datatype is a special built-in datatype, rdf:XMLLiteral, which is defined as follows.
DocumentFragments A and B are considered equal if and only if the DOM method
Let xmldoc be the literal's lexical form, wrapped between an arbitrary XML start-tag and matching end-tag.
Let domdoc be a DOM Document object [[!DOM-LEVEL-3-CORE]] corresponding to xmldoc.
Let domfrag be a DOM DocumentFragment object whose childNodes attribute is equal to the childNodes attribute of the documentElement of domdoc.
The rdf:XMLLiteral canonical mapping is the exclusive XML canonicalization method (with comments, with empty InclusiveNamespaces PrefixList) [[!XML-EXC-C14N]].
Any XML namespace declarations (xmlns) and language annotation (xml:lang) desired in the XML content must be included explicitly in the XML literal. Note that some concrete RDF syntaxes may define mechanisms for inheriting them from the context (for example, in RDF/XML [[RDF-SYNTAX-GRAMMAR]]).
Not all values of this datatype are compliant with XML 1.1 [[XML11]]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
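The wrap, parse and take-the-children steps of the rdf:XMLLiteral lexical-to-value mapping can be approximated with Python's xml.dom.minidom. This is a sketch under stated assumptions: minidom is not a full DOM Level 3 implementation (it has no isEqualNode, so value equality is not shown), the wrapper element name is arbitrary, and the helper names are hypothetical. Because the parser processes namespaces, a prefix that is not declared inside the literal itself makes parsing fail, matching the rule that namespace declarations must be included explicitly.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def xml_literal_fragment(lexical_form):
    """Parse an rdf:XMLLiteral lexical form into a DOM DocumentFragment."""
    # xmldoc: the lexical form wrapped in an arbitrary start-tag and end-tag.
    xmldoc = "<wrapper>" + lexical_form + "</wrapper>"
    domdoc = minidom.parseString(xmldoc)
    domfrag = domdoc.createDocumentFragment()
    # Copy the list first: appendChild moves each node out of the wrapper element.
    for child in list(domdoc.documentElement.childNodes):
        domfrag.appendChild(child)
    return domfrag


def is_well_formed_xml_literal(lexical_form):
    """True if the lexical form parses as namespace-well-formed XML content."""
    try:
        xml_literal_fragment(lexical_form)
    except ExpatError:
        return False
    return True
```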
A datatype map is an implementation-defined set of <IRI, datatype> pairs such that no IRI appears twice in the set and the IRI denotes the datatype. It can be seen as a function from IRIs to datatypes.
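If a datatype map contains the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral, then it MUST be paired with the datatype rdf:XMLLiteral. If a datatype map contains an IRI of the form http://www.w3.org/2001/XMLSchema#xxx, then it MUST be paired with the RDF-compatible XSD type named xxx.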
The literal value associated with a literal is:
In application contexts, comparing the values of literals is usually more helpful than comparing their syntactic forms (literal equality). Similarly, for comparing RDF graphs, semantic notions of entailment are usually more helpful than syntactic graph isomorphism.
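To illustrate the difference between literal equality and comparing values, this Python sketch models a tiny datatype map as a dictionary from datatype IRIs to lexical-to-value functions. The map, the helper names, and the restriction to xsd:integer and xsd:boolean are assumptions for illustration; any other datatype IRI is treated as unrecognized, and values of different datatypes are conservatively treated as unequal.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
_INTEGER = re.compile(r"[+-]?[0-9]+")  # xsd:integer lexical space


def integer_l2v(lexical_form):
    if not _INTEGER.fullmatch(lexical_form):
        raise ValueError(f"ill-typed xsd:integer: {lexical_form!r}")
    return int(lexical_form)


def boolean_l2v(lexical_form):
    mapping = {"true": True, "false": False, "1": True, "0": False}
    if lexical_form not in mapping:
        raise ValueError(f"ill-typed xsd:boolean: {lexical_form!r}")
    return mapping[lexical_form]


# Hypothetical datatype map: each IRI appears once and is paired with one datatype,
# represented here only by its lexical-to-value mapping.
DATATYPE_MAP = {XSD + "integer": integer_l2v, XSD + "boolean": boolean_l2v}


def literal_value(lexical_form, datatype_iri):
    """Value of a literal whose datatype IRI is recognized by DATATYPE_MAP."""
    if datatype_iri not in DATATYPE_MAP:
        raise LookupError(f"unrecognized datatype IRI: {datatype_iri}")
    return DATATYPE_MAP[datatype_iri](lexical_form)


def same_value(lit1, lit2):
    """Compare two (lexical form, datatype IRI) literals by value."""
    # Assumption: values of different datatypes in this toy map never coincide
    # (this also sidesteps Python's True == 1).
    if lit1[1] != lit2[1]:
        return False
    return literal_value(*lit1) == literal_value(*lit2)
```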
The RDF data model expresses information as RDF graphs consisting of triples with subject, predicate and object. Often, one wants to hold multiple RDF graphs and record information about each graph, allowing an application to work with datasets that involve information from more than one graph.
An RDF Dataset is a collection of RDF graphs and comprises:
The Working Group will standardize a model and semantics for multiple graphs and graph stores. The charter notes:
The RDF Community has used the term “named graphs” for a number of years in various settings, but this term is ambiguous, and often refers to what could rather be referred as quoted graphs, graph literals, IRIs for graphs, knowledge bases, graph stores, etc. The term “Support for Multiple Graphs and Graph Stores” is used as a neutral term in this charter; this term is not and should not be considered as definitive. The Working Group will have to define the right term(s).
Progress on the design for this feature is tracked under multiple issues:
The design presented here should be considered a straw man proposal at this point. It is based on RDF Datasets as defined in SPARQL 1.1.
When RDF graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers.
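A sketch of merging in Python, assuming graphs are sets of (subject, predicate, object) tuples and blank nodes are instances of a hypothetical BNode class whose labels are only local. Each input graph's blank nodes are re-allocated to fresh labels drawn from one counter, so blank nodes that happen to share a label in different graphs stay distinct; plain strings stand in for IRIs and literals.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class BNode:
    label: str  # local label only


def merge(*graphs):
    """Union of the graphs after giving each graph's blank nodes fresh labels."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # per-graph: the same blank node keeps one new label

        def rename(term):
            if isinstance(term, BNode):
                if term not in renaming:
                    renaming[term] = BNode(f"b{next(fresh)}")
                return renaming[term]
            return term

        merged |= {(rename(s), p, rename(o)) for s, p, o in graph}
    return merged
```

A plain set union of the two example graphs in the test would wrongly treat both blank nodes labelled x as the same thing.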
Should “Graph merge” be defined in this spec? If not, then the previous note could just as well go. This will be decided once a multigraph design has been decided upon.
RDF uses IRIs, which may include fragment identifiers, as resource identifiers. The semantics of fragment identifiers are defined in RFC 3986 [[URI]]: They identify a secondary resource that is usually a part of, view of, defined in, or described in the primary resource, and the precise semantics depend on the set of representations that might result from a retrieval action on the primary resource.
This section discusses the handling of fragment identifiers in representations that encode RDF graphs.
In RDF-bearing representations of a resource, the secondary resource identified by a fragment is the entity denoted in the RDF graph by the full IRI, that is, the IRI of the primary resource with the fragment appended.
Since IRIs in RDF graphs can denote anything, this can be
something external to the representation, or even external
to the Web.
In this way, the RDF representation acts as an intermediary between some web-retrievable document, and some set of possibly non-web or abstract entities that the RDF may describe.
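As a small Python illustration, urllib.parse.urldefrag splits an IRI into the primary resource and the fragment, while the secondary resource is looked up in the graph under the full IRI, not under the bare fragment. The graph representation (a set of string triples), the example IRIs and the helper name are assumptions made for this sketch.

```python
from urllib.parse import urldefrag


def triples_about(graph, iri):
    """Split `iri` into primary resource and fragment, and collect the triples
    whose subject is the full IRI (the secondary resource)."""
    primary, fragment = urldefrag(iri)
    about = {t for t in graph if t[0] == iri}
    return primary, fragment, about
```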
Primary resources may have multiple representations (for example, made available through content negotiation). Fragments in RDF-bearing representations should be used consistently with the semantics imposed by any non-RDF representations. For example, if the fragment #chapter1 identifies a document section in an HTML representation of a primary resource, then the IRI formed by appending #chapter1 to the IRI of the primary resource should be taken to denote that same section in all RDF-bearing representations of the same primary resource.
Likewise, RDF graphs embedded in non-RDF representations with mechanisms such as RDFa [[RDFA-PRIMER]] should use fragment identifiers consistently with the semantics imposed by the host language.
This section does not yet acknowledge contributions to the RDF 1.1 version.
The RDF 2004 editors acknowledge valuable contributions from Frank Manola, Pat Hayes, Dan Brickley, Jos de Roo, Dave Beckett, Patrick Stickler, Peter F. Patel-Schneider, Jerome Euzenat, Massimo Marchiori, Tim Berners-Lee, Dave Reynolds and Dan Connolly.
This specification contains a significant contribution from the designers of the RDF typed literal mechanism, Pat Hayes, Sergey Melnik and Patrick Stickler. The document draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha.
This specification is a product of extended deliberations by the members of the RDFcore Working Group and the Schema Working Group.
rdf:XMLLiteral. Added some new issue boxes.
rdf:XMLLiteral no longer requires lexical forms to be canonicalized, and the value space is now defined in terms of [[DOM-LEVEL-3-CORE]] (ISSUE-13)
rdf:langString. Formally introduced the term “language-tagged string”.