The Resource Description Framework (RDF) is a framework for representing information in the Web.

RDF Concepts and Abstract Syntax defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of key concepts, datatyping, character normalization and handling of IRIs.

Introduction

The Resource Description Framework (RDF) is a framework for representing information in the Web.

This document defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications, including:

Graph-based Data Model

The core structure of the abstract syntax is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph. This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link; hence the term “graph”.

Figure: An RDF triple comprising (subject, predicate, object)

There are three kinds of nodes that can occur in an RDF graph: IRIs, literals, and blank nodes.

Resources and Statements

Any IRI or literal denotes something in the universe of discourse. These things are called resources. Anything can be a resource, including physical things, documents, and abstract concepts; the term is synonymous with “entity”. The resource denoted by an IRI is called its referent, and the resource denoted by a literal is called its value. Literals have datatypes that define the range of possible values, such as strings, numbers, and dates. A special kind of literal, the language-tagged string, denotes a plain-text string in a natural language.

The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object. The statement corresponding to an RDF triple is known as an RDF statement. The predicate itself is an IRI and denotes a binary relation, also known as a property. (Relations that involve more than two entities can only be indirectly expressed in RDF [[SWBP-N-ARYRELATIONS]].)

The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains.

Statements involving blank nodes say that something with the given relationships exists, without explicitly naming it.

The Referent of an IRI

The resource denoted by an IRI is also called its referent. What exactly is denoted by any given IRI is not defined by this specification. The question is treated in other documents like Architecture of the World Wide Web, Volume One [[WEBARCH]] and Cool URIs for the Semantic Web [[COOLURIS]]. A very brief, informal and partial account follows:

An RDF vocabulary is a collection of IRIs with clearly established referents intended for use in RDF graphs. For example, the IRIs documented in [[RDF-SCHEMA]] are the RDF Schema vocabulary. RDF Schema can itself be used to define and document additional RDF vocabularies. Some such vocabularies are mentioned in the Primer [[RDF-PRIMER]].

It has been suggested that this specification should also define terms such as “namespace”, “namespace IRI”, and “namespace prefix”.

Formal Meaning and Entailment

The idea of meaning in RDF is underpinned by the formal concept of entailment. In brief, an RDF graph A is said to entail another RDF graph B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated then the truth of B can be inferred. An account of meaning and entailment in RDF, using the formalism of model theory, is given in [[RDF-MT]].

Merging and Managing RDF Graphs

This section should explain terminology around working with multiple graphs, and explain the fact that graphs merge easily. This will be added once the Working Group has finalised a design.

An RDF document is a document that encodes an RDF graph in a concrete RDF syntax, such as Turtle [[TURTLE-TR]], RDFa [[RDFA-PRIMER]], RDF/XML [[RDF-SYNTAX-GRAMMAR]], or N-Triples [[N-TRIPLES]].

Implementations are free to represent RDF graphs in any other equivalent form.

This section needs to explain what kind of artefact can conform to this specification, and what is required in order to conform.

RDF Graphs

An RDF graph is a set of RDF triples.

Graph isomorphism: Two RDF graphs G and G' are isomorphic if there is a bijection M between the sets of nodes of the two graphs, such that:

  1. M maps blank nodes to blank nodes.
  2. M(lit)=lit for all RDF literals lit which are nodes of G.
  3. M(uri)=uri for all IRIs uri which are nodes of G.
  4. The triple ( s, p, o ) is in G if and only if the triple ( M(s), p, M(o) ) is in G'.

With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'. Graph isomorphism is needed to support the RDF Test Cases [[RDF-TESTCASES]] specification.
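The four conditions can be checked directly on small graphs. The following is a minimal brute-force sketch, assuming graphs are Python sets of (subject, predicate, object) tuples, IRIs are plain strings, and the `BNode` and `Literal` classes are hypothetical stand-ins for blank nodes and literals. It tries every bijection between the blank nodes of the two graphs, so its cost grows factorially and it is only suitable for illustration or tiny test graphs.

```python
from dataclasses import dataclass
from itertools import permutations

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class BNode:
    label: str  # distinguishes blank nodes within one graph (hypothetical representation)


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: str


def blank_nodes(graph):
    """Blank nodes occurring as subject or object (predicates are always IRIs)."""
    return {term for (s, _, o) in graph for term in (s, o) if isinstance(term, BNode)}


def isomorphic(g1, g2):
    """Try every bijection M between the blank nodes of g1 and g2.
    IRIs and literals map to themselves (conditions 2 and 3)."""
    b1, b2 = list(blank_nodes(g1)), list(blank_nodes(g2))
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        m = dict(zip(b1, image))  # condition 1: blank nodes map to blank nodes
        # Condition 4: since M is a bijection, M(G) == G' gives the "if and only if".
        if {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in g1} == g2:
            return True
    return False
```

For example, a graph saying "something knows something named Bob" is isomorphic to the same graph with differently labelled blank nodes, but not to one where the `knows` arc points the other way.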

Triples

An RDF triple contains three components:

An RDF triple is conventionally written in the order subject, predicate, object.

The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph. Predicate IRIs MAY also appear as nodes in the graph.

IRIs, blank nodes and literals are collectively known as RDF terms.
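As a rough illustration of how the three kinds of RDF terms and the triple structure fit together, here is a small Python sketch. The class names `IRI`, `BlankNode`, `Literal` and `Triple` are hypothetical. It encodes the usual RDF position constraints: the subject is an IRI or blank node, the predicate is an IRI, and the object is an IRI, blank node or literal. Blank nodes compare by object identity, reflecting that RDF only lets one tell whether two blank nodes are the same.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True, eq=False)
class BlankNode:
    # eq=False keeps identity-based equality: two BlankNode objects are the
    # same blank node only if they are the same Python object.
    hint: str = ""  # optional debugging label; not part of the node's identity


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: IRI
    language_tag: Optional[str] = None


Term = Union[IRI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BlankNode]
    predicate: IRI
    obj: Term

    def __post_init__(self):
        # Enforce which kinds of term may appear in each position.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.obj, (IRI, BlankNode, Literal)):
            raise TypeError("object must be an IRI, a blank node or a literal")


def nodes(graph):
    """The nodes of a graph: the subjects and objects of its triples."""
    return {t.subject for t in graph} | {t.obj for t in graph}
```

A graph is then simply a `set` of `Triple` values; note that `nodes()` does not include predicate IRIs unless they also occur as a subject or object.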

IRIs

An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string [[!UNICODE]] that conforms to the syntax defined in RFC 3987 [[!IRI]]. IRIs are a generalization of URIs [[URI]]. Every absolute URI and URL is an IRI.

IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier.

IRI equality: Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [[!IRI]]. Further normalization MUST NOT be performed when comparing IRIs for equality.
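Because no normalization is performed, IRI equality reduces to comparing the two strings code point by code point. A minimal sketch (the function name `iri_equal` is hypothetical) shows pairs that some URI libraries would treat as equivalent but that are different IRIs in RDF:

```python
import unicodedata


def iri_equal(a: str, b: str) -> bool:
    """RDF IRI equality: code point by code point, with no case folding,
    percent-encoding changes or Unicode normalization."""
    return a == b


# Each pair would be equivalent under some normalization, but not in RDF.
pairs = [
    ("http://example.com/~user", "http://example.com/%7Euser"),
    ("HTTP://example.com/", "http://example.com/"),
    ("http://example.com:80/", "http://example.com/"),
    (unicodedata.normalize("NFC", "http://example.com/caf\u00e9"),
     unicodedata.normalize("NFD", "http://example.com/caf\u00e9")),
]

for a, b in pairs:
    print(iri_equal(a, b))  # False for every pair
```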

When IRIs are used in operations that are only defined for URIs, they must first be converted according to the mapping defined in section 3.1 of [[!IRI]]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
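The mapping can be sketched in a few lines of Python. This is a simplified, non-authoritative illustration: `iri_to_uri` is a hypothetical helper, the host is converted with Python's built-in `idna` codec (IDNA 2003), IPv6 literals and ASCII characters that are disallowed in URIs are not handled, and `urlunsplit` drops an empty trailing `?` or `#`.

```python
from urllib.parse import urlsplit, urlunsplit


def _percent_encode_non_ascii(text):
    """UTF-8 encode each non-ASCII character and percent-encode its octets."""
    return "".join(
        ch if ord(ch) < 128 else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        for ch in text
    )


def iri_to_uri(iri):
    parts = urlsplit(iri)
    scheme = iri[:len(parts.scheme)]  # urlsplit lowercases the scheme; keep it as written
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not hostport.isascii():
        # Non-ASCII domain name: Punycode-encode it via the IDNA codec.
        host, colon, port = hostport.partition(":")
        hostport = host.encode("idna").decode("ascii") + colon + port
    netloc = _percent_encode_non_ascii(userinfo) + at + hostport
    return urlunsplit((
        scheme,
        netloc,
        _percent_encode_non_ascii(parts.path),
        _percent_encode_non_ascii(parts.query),
        _percent_encode_non_ascii(parts.fragment),
    ))


print(iri_to_uri("http://bücher.example/straße?q=ä#§"))
# http://xn--bcher-kva.example/stra%C3%9Fe?q=%C3%A4#%C2%A7
```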

Some concrete syntaxes permit relative IRIs as a shorthand for absolute IRIs, and define how to resolve the relative IRIs against a base IRI.

Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, “"” (double quote), and “ ” (space). In IRIs, these characters must be percent-encoded as described in section 2.1 of [[URI]].

Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [[!IRI]]. Non-normalized forms that should be avoided include:

  • Uppercase characters in scheme names and domain names
  • Percent-encoding of characters where it is not required by IRI syntax
  • Explicitly stated HTTP default port (http://example.com:80/); http://example.com/ is preferable
  • Completely empty path in HTTP IRIs (http://example.com); http://example.com/ is preferable
  • “/./” or “/../” in the path component of an IRI
  • Lowercase hexadecimal letters within percent-encoding triplets (“%3F” is preferable over “%3f”)
  • Punycode-encoding of Internationalized Domain Names in IRIs
  • IRIs that are not in Unicode Normalization Form C [[!NFC]]
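A minter could check its IRIs against the list above before publishing them. The following Python function is a hypothetical sketch, not a complete normalizer: it only flags percent-encoding of unreserved ASCII characters (not other unnecessary encodings), skips IP-literal hosts, treats only the `http` scheme for the port and empty-path checks, and needs Python 3.8 or later for `unicodedata.is_normalized`.

```python
import re
import unicodedata
from urllib.parse import urlsplit

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")
_UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def normalization_warnings(iri):
    """Return a list of the non-normalized forms listed above that occur in `iri`."""
    warnings = []
    parts = urlsplit(iri)
    raw_scheme = iri.split(":", 1)[0]  # urlsplit() lowercases the scheme, so read it raw
    scheme = raw_scheme.lower()
    hostport = parts.netloc.rpartition("@")[2]  # drop any userinfo
    if hostport.startswith("["):  # IP literal: not checked in this sketch
        host, port = "", ""
    else:
        host, _, port = hostport.partition(":")

    if raw_scheme != scheme:
        warnings.append("uppercase scheme name")
    if host != host.lower():
        warnings.append("uppercase domain name")
    for hex_digits in _PCT_TRIPLET.findall(iri):
        if chr(int(hex_digits, 16)) in _UNRESERVED:
            warnings.append("unnecessary percent-encoding %" + hex_digits)
        if hex_digits != hex_digits.upper():
            warnings.append("lowercase percent-encoding %" + hex_digits)
    if scheme == "http" and port == "80":
        warnings.append("explicit default HTTP port")
    if scheme == "http" and parts.netloc and parts.path == "":
        warnings.append("empty HTTP path")
    if "/./" in parts.path or "/../" in parts.path:
        warnings.append("dot segment in path")
    if any(label.lower().startswith("xn--") for label in host.split(".")):
        warnings.append("Punycode-encoded domain name")
    if not unicodedata.is_normalized("NFC", iri):
        warnings.append("not in NFC")
    return warnings
```

For instance, `normalization_warnings("HTTP://Example.com:80")` reports the uppercase scheme and domain name, the explicit port and the empty path, while `http://example.com/` produces no warnings.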

Literals

Literals are used to denote values such as strings, numbers and dates by means of a lexical representation.

A literal in an RDF graph consists of:

A language-tagged string is any literal whose datatype IRI is equal to http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. In addition to lexical form and datatype IRI, a language-tagged string also has:

Concrete syntaxes MAY support simple literals, consisting of only a lexical form without any datatype IRI or language tag. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.

Literal equality: Two literals are equal if and only if the two lexical forms, the two datatype IRIs, and the two language tags (if any) compare equal, character by character.
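Literal equality is purely syntactic, so it can be modelled as equality of a (lexical form, datatype IRI, language tag) tuple. In the sketch below, `Literal`, `simple_literal` and `lang_string` are hypothetical names. Two further points are assumptions of this sketch: language tags are stored lowercased, reflecting the case normalization of language tags discussed in this section, and a language tag is required exactly when the datatype IRI is rdf:langString.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None

    def __post_init__(self):
        # A language tag is present exactly when the datatype is rdf:langString.
        if (self.language_tag is not None) != (self.datatype_iri == RDF_LANGSTRING):
            raise ValueError("language tag requires datatype rdf:langString and vice versa")
        if self.language_tag is not None:
            # Assumed normalization: store language tags in lowercase.
            object.__setattr__(self, "language_tag", self.language_tag.lower())
    # The generated __eq__ compares all three fields as strings, character by character.


def simple_literal(lexical_form):
    """A simple literal from a concrete syntax: sugar for an xsd:string literal."""
    return Literal(lexical_form, XSD_STRING)


def lang_string(lexical_form, tag):
    return Literal(lexical_form, RDF_LANGSTRING, tag)
```

Under this model `simple_literal("chat")` equals `Literal("chat", XSD_STRING)`, but differs from `lang_string("chat", "fr")`; and `"1"^^xsd:integer` is not equal to `"01"^^xsd:integer` even though both denote the same number.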

In earlier versions of RDF, literals with a language tag did not have a datatype IRI, and simple literals could appear directly in the abstract syntax. Simple literals and literals with a language tag were collectively known as plain literals.

Literals in which the lexical form begins with a composing character (as defined by [[CHARMOD]]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [[XML11]].

Earlier versions of RDF permitted tags that adhered to the generic tag/subtag syntax of language tags, but were not well-formed according to [[!BCP47]]. Such language tags do not conform to RDF 1.1.

The xsd:string datatype does not permit the #x0 character, and implementations might not permit control codes in the #x1-#x1F range. Earlier versions of RDF allowed these characters in simple literals, although they could never be serialized in a W3C-recommended concrete syntax.

When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications.

The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input.

RDF literals are distinct and distinguishable from IRIs; for example, http://example.org/ as a string literal is not equal to http://example.org/ as an IRI.

Blank Nodes

The blank nodes in an RDF graph are drawn from an infinite set. This set is disjoint from the set of all IRIs and the set of all literals. Otherwise, this set of blank nodes is arbitrary.

RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.

Some concrete syntaxes for RDF use blank node identifiers to allow several statements to use the same blank node. A blank node identifier is a local identifier that can be distinguished from IRIs and literals. Such blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the particular concrete syntax used.

Replacing Blank Nodes with IRIs

Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.

In situations where stronger identification is needed, systems MAY systematically transform some or all of the blank nodes in an RDF graph into IRIs [[!IRI]]. Systems wishing to do this SHOULD mint a new, globally unique IRI (a Skolem IRI) for each blank node so transformed.

This transformation does not change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else.

Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace a blank node, and map back to the source blank node where possible.

Systems that want Skolem IRIs to be recognizable outside of the system boundaries SHOULD use a well-known IRI [[WELL-KNOWN]] with the registered name genid. This is an IRI that uses the HTTP or HTTPS scheme, or another scheme that has been specified to use well-known IRIs; and whose path component starts with /.well-known/genid/.

For example, the authority responsible for the domain example.com could mint the following recognizable Skolem IRI:

http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
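Skolemization can be sketched as a single pass over the triples. In this Python illustration, graphs are sets of (subject, predicate, object) tuples, IRIs are plain strings, and `BlankNode`, `skolemize` and `is_skolem_iri` are hypothetical names; the default authority `http://example.com` simply follows the example above. The returned mapping lets a system map Skolem IRIs back to their source blank nodes.

```python
import uuid
from urllib.parse import urlsplit

GENID_PATH = "/.well-known/genid/"


class BlankNode:
    """A blank node; instances are compared by identity."""


def skolemize(graph, authority="http://example.com"):
    """Replace every blank node with a fresh Skolem IRI under `authority`.
    Returns the new graph and the blank-node-to-IRI mapping."""
    mapping = {}

    def skolem(term):
        if isinstance(term, BlankNode):
            if term not in mapping:
                # uuid4().hex gives 32 random hex digits, like the example above
                mapping[term] = authority + GENID_PATH + uuid.uuid4().hex
            return mapping[term]
        return term

    return {(skolem(s), p, skolem(o)) for (s, p, o) in graph}, mapping


def is_skolem_iri(iri):
    """Recognize well-known genid IRIs (only HTTP and HTTPS in this sketch)."""
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PATH)
```

The same blank node always receives the same Skolem IRI within one call, so triples that shared a blank node still share the replacing IRI.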

IETF registration of the genid name is currently in progress. This is ACTION-82.

RFC 5785 [[WELL-KNOWN]] only specifies well-known URIs, not IRIs. For the purpose of this document, a well-known IRI is any IRI that results in a well-known URI after IRI-to-URI mapping [[!IRI]].

Datatypes

Datatypes are used with RDF literals to represent values such as strings, numbers and dates. The datatype abstraction used in RDF is compatible with XML Schema [[!XMLSCHEMA11-2]]. Any datatype definition that conforms to this abstraction MAY be used in RDF, even if not defined in terms of XML Schema. RDF re-uses the XML Schema built-in datatypes, and provides one additional built-in datatype, rdf:XMLLiteral.

The Working Group is planning to add an HTML datatype to better address the use case of including text with markup. This is ISSUE-63.

A datatype consists of a lexical space, a value space and a lexical-to-value mapping, and is denoted by one or more IRIs.

The lexical space of a datatype is a set of Unicode [[!UNICODE]] strings.

The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:

When the datatype is defined using XML Schema:

Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags in the lexical space. The value space associated with the datatype IRI is the set of all pairs of strings and language tags.

For example, the XML Schema datatype xsd:boolean, where each member of the value space has two lexical representations, is defined as follows:

Lexical space:
{“true”, “false”, “1”, “0”}
Value space:
{true, false}
Lexical-to-value mapping:
{<“true”, true>, <“false”, false>, <“1”, true>, <“0”, false>}

The literals that can be defined using this datatype are:

Literal                    Value
<“true”, xsd:boolean>      true
<“false”, xsd:boolean>     false
<“1”, xsd:boolean>         true
<“0”, xsd:boolean>         false
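Because the lexical-to-value mapping is just a set of pairs, the xsd:boolean definition translates directly into a lookup table. In this small Python sketch (the names `BOOLEAN_L2V` and `boolean_value` are hypothetical), the lexical and value spaces are derived from the mapping, and a lexical form outside the lexical space yields no value:

```python
from typing import Optional

# The lexical-to-value mapping of xsd:boolean, written as the set of pairs above.
BOOLEAN_L2V = {"true": True, "false": False, "1": True, "0": False}
LEXICAL_SPACE = set(BOOLEAN_L2V)          # {"true", "false", "1", "0"}
VALUE_SPACE = set(BOOLEAN_L2V.values())   # {True, False}


def boolean_value(lexical_form: str) -> Optional[bool]:
    """Value of <lexical_form, xsd:boolean>, or None if the lexical form is
    not in the lexical space."""
    return BOOLEAN_L2V.get(lexical_form)
```

Here `boolean_value("1") == boolean_value("true")`, showing two lexical forms for one value, while `boolean_value("TRUE")` returns `None` because the lexical space is case-sensitive.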

The XML Schema Built-in Datatypes

IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes [[!XMLSCHEMA11-2]]. The XML Schema built-in types listed in the following table are the RDF-compatible XSD types. Their use is RECOMMENDED.

Datatype                  Value space (informative)

Core types
  xsd:string              Character strings
  xsd:boolean             true, false
  xsd:decimal             Arbitrary-precision decimal numbers
  xsd:integer             Arbitrary-size integer numbers
IEEE floating-point numbers
  xsd:double              64-bit floating point numbers incl. ±Inf, ±0, NaN
  xsd:float               32-bit floating point numbers incl. ±Inf, ±0, NaN
Time and date
  xsd:date                Dates (yyyy-mm-dd) with or without timezone
  xsd:time                Times (hh:mm:ss.sss…) with or without timezone
  xsd:dateTime            Date and time with or without timezone
  xsd:dateTimeStamp       Date and time with required timezone
Recurring and partial dates
  xsd:gYear               Gregorian calendar year
  xsd:gMonth              Gregorian calendar month
  xsd:gDay                Gregorian calendar day of the month
  xsd:gYearMonth          Gregorian calendar year and month
  xsd:gMonthDay           Gregorian calendar month and day
  xsd:yearMonthDuration   Duration of time (months and years)
  xsd:dayTimeDuration     Duration of time (days, hours, minutes, seconds)
Limited-range integer numbers
  xsd:byte                -128…+127 (8 bit)
  xsd:short               -32768…+32767 (16 bit)
  xsd:int                 -2147483648…+2147483647 (32 bit)
  xsd:long                -9223372036854775808…+9223372036854775807 (64 bit)
  xsd:unsignedByte        0…255 (8 bit)
  xsd:unsignedShort       0…65535 (16 bit)
  xsd:unsignedInt         0…4294967295 (32 bit)
  xsd:unsignedLong        0…18446744073709551615 (64 bit)
  xsd:positiveInteger     Integer numbers >0
  xsd:nonNegativeInteger  Integer numbers ≥0
  xsd:negativeInteger     Integer numbers <0
  xsd:nonPositiveInteger  Integer numbers ≤0
Encoded binary data
  xsd:hexBinary           Hex-encoded binary data
  xsd:base64Binary        Base64-encoded binary data
Miscellaneous XSD types
  xsd:anyURI              Absolute or relative URIs and IRIs
  xsd:language            Language tags per [[BCP47]]
  xsd:normalizedString    Whitespace-normalized strings
  xsd:token               Tokenized strings
  xsd:NMTOKEN             XML NMTOKENs
  xsd:Name                XML Names
  xsd:NCName              XML NCNames
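The ranges in the table translate into a simple validity check for the integer-derived types. This Python sketch uses the hypothetical names `INTEGER_RANGES` and `integer_value`, and a simplified lexical check (an optional sign followed by ASCII digits) that approximates, but does not reproduce every detail of, the XML Schema lexical spaces. A lexical form that fails either check would make the literal ill-typed.

```python
import re

# Inclusive (min, max) value ranges from the table; None means unbounded.
INTEGER_RANGES = {
    "integer": (None, None),
    "byte": (-128, 127),
    "short": (-32768, 32767),
    "int": (-2147483648, 2147483647),
    "long": (-9223372036854775808, 9223372036854775807),
    "unsignedByte": (0, 255),
    "unsignedShort": (0, 65535),
    "unsignedInt": (0, 4294967295),
    "unsignedLong": (0, 18446744073709551615),
    "positiveInteger": (1, None),
    "nonNegativeInteger": (0, None),
    "negativeInteger": (None, -1),
    "nonPositiveInteger": (None, 0),
}

_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")  # optional sign, then ASCII digits


def integer_value(type_name, lexical_form):
    """Value of <lexical_form, xsd:type_name>, or None if the literal is ill-typed."""
    if not _INTEGER_LEXICAL.fullmatch(lexical_form):
        return None
    value = int(lexical_form)
    low, high = INTEGER_RANGES[type_name]
    if (low is not None and value < low) or (high is not None and value > high):
        return None
    return value
```

For example, `integer_value("byte", "127")` returns 127, while `integer_value("byte", "128")` returns `None` because the value lies outside the 8-bit range.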

The other built-in XML Schema datatypes are unsuitable for various reasons, and SHOULD NOT be used.

xsd:duration is noted above as being unsuitable for use in RDF. The design of the type has changed in XSD 1.1, so this should be reviewed. This is ISSUE-88.

The rdf:XMLLiteral Datatype

RDF provides for XML content as a possible literal value. Such content is indicated in an RDF graph using a literal whose datatype is a special built-in datatype rdf:XMLLiteral.

rdf:XMLLiteral is defined as follows.

An IRI denoting this datatype
is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral.
The lexical space
is the set of all strings which are well-balanced, self-contained XML content [[!XML10]]; and for which embedding between an arbitrary XML start tag and an end tag yields a document conforming to XML Namespaces [[!XML-NAMES]].
The value space
is a set of DOM DocumentFragments [[!DOM-LEVEL-3-CORE]]. Two DocumentFragments A and B are considered equal if and only if the DOM method A.isEqualNode(B) returns true.
The lexical-to-value mapping
is defined as follows:
  • Let xmldoc be the literal's lexical form, wrapped between an arbitrary XML start-tag and matching end-tag
  • Let domdoc be a DOM Document object [[!DOM-LEVEL-3-CORE]] corresponding to xmldoc
  • Let domfrag be a DOM DocumentFragment whose childNodes attribute is equal to the childNodes attribute of domdoc's documentElement attribute
  • Return domfrag.normalize()
The canonical mapping
defines a canonical lexical form [[!XMLSCHEMA11-2]] for each member of the value space. The rdf:XMLLiteral canonical mapping is the exclusive XML canonicalization method (with comments, with empty InclusiveNamespaces PrefixList) [[!XML-EXC-C14N]].
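The four steps of the lexical-to-value mapping can be approximated with Python's `xml.dom.minidom`. This is a sketch under stated assumptions: the wrapper element name `wrapper` is an arbitrary choice, `xml_literal_value` and `in_lexical_space` are hypothetical helpers, minidom is not a full DOM Level 3 implementation, and the exclusive canonicalization used by the canonical mapping is not attempted. A parse failure (including an undeclared namespace prefix, since minidom's default parser is namespace-aware) indicates that the lexical form is not in the lexical space.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def xml_literal_value(lexical_form):
    """Follow the lexical-to-value steps above and return a DocumentFragment.
    Raises ExpatError if the lexical form is not in the lexical space."""
    # Wrap the lexical form between an arbitrary start-tag and matching end-tag.
    xmldoc = "<wrapper>" + lexical_form + "</wrapper>"
    # Parse into a DOM Document.
    domdoc = minidom.parseString(xmldoc)
    # Move the children of the document element into a new DocumentFragment.
    domfrag = domdoc.createDocumentFragment()
    for child in list(domdoc.documentElement.childNodes):
        domfrag.appendChild(child)
    # Merge adjacent text nodes and drop empty ones.
    domfrag.normalize()
    return domfrag


def in_lexical_space(lexical_form):
    try:
        xml_literal_value(lexical_form)
        return True
    except ExpatError:
        return False
```

With this sketch, `<b>bold</b> text` yields a fragment with an element node and a text node, whereas `<ex:a/>` is rejected until it carries its own `xmlns:ex` declaration, matching the requirement that namespace declarations be included explicitly.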

Any XML namespace declarations (xmlns) and language annotation (xml:lang) desired in the XML content must be included explicitly in the XML literal. Note that some concrete RDF syntaxes may define mechanisms for inheriting them from the context (e.g., rdf:parseType="Literal" in RDF/XML [[RDF-SYNTAX-GRAMMAR]]).

Not all values of this datatype are compliant with XML 1.1 [[XML11]]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.

Datatype Maps

A datatype map is an implementation-defined set of <IRI, datatype> pairs such that no IRI appears twice in the set and the IRI denotes the datatype. It can be seen as a function from IRIs to datatypes.

If a datatype map contains the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral, then it MUST be paired with the datatype rdf:XMLLiteral.

If a datatype map contains an IRI of the form http://www.w3.org/2001/XMLSchema#xxx, then it MUST be paired with the RDF-compatible XSD type named xsd:xxx.

The Value Corresponding to a Literal

The literal value associated with a literal is:

  1. If the literal is a language-tagged string, then the literal value is a pair consisting of its lexical form and its language tag, in that order.
  2. If the literal's datatype IRI is not in the implementation-defined datatype map, then the literal value is not defined by this specification.
  3. Let d be the datatype associated with the datatype IRI in the implementation-defined datatype map.
  4. If the literal's lexical form is in the lexical space of d, then the literal value is the result of applying the lexical-to-value mapping of d to the lexical form.
  5. Otherwise, the literal is ill-typed, and no literal value can be associated with the literal. Such a case, while in error, is not syntactically ill-formed.
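The five steps above map onto a short function. In this Python sketch the datatype map is deliberately tiny and hypothetical (xsd:boolean, xsd:integer and a simplified xsd:string that only rejects #x0), and the sentinels `NOT_DEFINED` and `ILL_TYPED` are illustrative names for the outcomes of steps 2 and 5.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

NOT_DEFINED = object()  # step 2: datatype IRI not in the datatype map
ILL_TYPED = object()    # step 5: lexical form outside the lexical space


def _boolean(lex):
    return {"true": True, "false": False, "1": True, "0": False}.get(lex)


def _integer(lex):
    return int(lex) if re.fullmatch(r"[+-]?[0-9]+", lex) else None


def _string(lex):
    # Simplified: only rejects #x0, which xsd:string never permits.
    return None if "\x00" in lex else lex


# Hypothetical implementation-defined datatype map:
# datatype IRI -> lexical-to-value function (None means "not in the lexical space").
DATATYPE_MAP = {XSD + "boolean": _boolean, XSD + "integer": _integer, XSD + "string": _string}


def literal_value(lexical_form, datatype_iri, language_tag=None):
    # Step 1: language-tagged strings denote (lexical form, language tag) pairs.
    if datatype_iri == RDF_LANGSTRING:
        return (lexical_form, language_tag)
    # Steps 2 and 3: look the datatype up in the datatype map.
    to_value = DATATYPE_MAP.get(datatype_iri)
    if to_value is None:
        return NOT_DEFINED
    # Steps 4 and 5: apply the lexical-to-value mapping, or report an ill-typed literal.
    value = to_value(lexical_form)
    return ILL_TYPED if value is None else value
```

Note that `literal_value("01", XSD + "integer")` and `literal_value("1", XSD + "integer")` both return 1, even though the two literals are not equal under literal equality.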

In application contexts, comparing the values of literals is usually more helpful than comparing their syntactic forms (literal equality). Similarly, for comparing RDF graphs, semantic notions of entailment are usually more helpful than syntactic graph isomorphism.

Abstract Syntax for Working with Multiple Graphs

The RDF data model expresses information as RDF graphs consisting of triples with subject, predicate and object. Often, one wants to hold multiple RDF graphs and record information about each graph, allowing an application to work with datasets that involve information from more than one graph.

An RDF Dataset is a collection of RDF graphs and comprises:

The Working Group will standardize a model and semantics for multiple graphs and graph stores. The charter notes:

The RDF Community has used the term “named graphs” for a number of years in various settings, but this term is ambiguous, and often refers to what could rather be referred as quoted graphs, graph literals, IRIs for graphs, knowledge bases, graph stores, etc. The term “Support for Multiple Graphs and Graph Stores” is used as a neutral term in this charter; this term is not and should not be considered as definitive. The Working Group will have to define the right term(s).

Progress on the design for this feature is tracked under multiple issues:

The design presented here should be considered a straw man proposal at this point. It is based on RDF Datasets as defined in SPARQL 1.1.

When RDF graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers.
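One way to keep blank nodes distinct during a merge is to re-allocate blank node identifiers per input graph. This Python sketch assumes graphs are sets of triples whose terms are N-Triples-style strings (`<...>` for IRIs, `_:label` for blank nodes, quoted strings for literals); the helper names and the `_:bN` labelling scheme are hypothetical.

```python
import itertools


def _relabel(term, mapping, counter):
    """Give each blank node label of one input graph a fresh, unused label."""
    if term.startswith("_:"):
        if term not in mapping:
            mapping[term] = f"_:b{next(counter)}"
        return mapping[term]
    return term


def merge(*graphs):
    counter = itertools.count()
    merged = set()
    for graph in graphs:
        mapping = {}  # blank node labels are scoped to one input graph
        merged |= {
            (_relabel(s, mapping, counter), p, _relabel(o, mapping, counter))
            for (s, p, o) in graph
        }
    return merged
```

Merging `{("_:x", "<http://example.org/name>", '"Alice"')}` with `{("_:x", "<http://example.org/name>", '"Bob"')}` then yields two different subjects, whereas a plain set union would wrongly claim that one thing is named both Alice and Bob.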

Should “Graph merge” be defined in this spec? If not, then the previous note could just as well go. This will be decided once a multigraph design has been decided upon.

Fragment Identifiers

RDF uses IRIs, which may include fragment identifiers, as resource identifiers. The semantics of fragment identifiers are defined in RFC 3986 [[URI]]: They identify a secondary resource that is usually a part of, view of, defined in, or described in the primary resource, and the precise semantics depend on the set of representations that might result from a retrieval action on the primary resource.

This section discusses the handling of fragment identifiers in representations that encode RDF graphs.

In RDF-bearing representations of a resource <foo>, the secondary resource identified by a fragment #bar is the entity denoted by the full IRI <foo#bar> in the RDF graph. Since IRIs in RDF graphs can denote anything, this can be something external to the representation, or even external to the Web.
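The relationship between a primary resource, a fragment and the full IRI used in the graph can be illustrated with Python's standard URL helpers. In this sketch `split_fragment` and `fragment_iri` are hypothetical helper names, and `http://example.org/foo` stands in for the primary resource <foo>.

```python
from urllib.parse import urldefrag, urljoin


def split_fragment(iri):
    """Split an IRI into (primary resource IRI, fragment identifier)."""
    result = urldefrag(iri)
    return result.url, result.fragment


def fragment_iri(primary_iri, fragment):
    """The full IRI that a fragment like #bar refers to in an RDF-bearing
    representation of `primary_iri`; any existing fragment is replaced."""
    return urljoin(primary_iri, "#" + fragment)
```

For example, `fragment_iri("http://example.org/foo", "bar")` gives `http://example.org/foo#bar`, the IRI whose denotation is the secondary resource, and `split_fragment` reverses the step.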

In this way, the RDF representation acts as an intermediary between some web-retrievable document, and some set of possibly non-web or abstract entities that the RDF may describe.

Primary resources may have multiple representations (a.k.a. content negotiation). Fragments in RDF-bearing representations should be used consistently with the semantics imposed by any non-RDF representations. For example, if the fragment #chapter1 identifies a document section in an HTML representation of a primary resource, then #chapter1 should be taken to denote that same section in all RDF-bearing representations of the same primary resource.

Likewise, RDF graphs embedded in non-RDF representations with mechanisms such as RDFa [[RDFA-PRIMER]] should use fragment identifiers consistently with the semantics imposed by the host language.

Acknowledgments

This section does not yet acknowledge contributions to the RDF 1.1 version.

The RDF 2004 editors acknowledge valuable contributions from Frank Manola, Pat Hayes, Dan Brickley, Jos de Roo, Dave Beckett, Patrick Stickler, Peter F. Patel-Schneider, Jerome Euzenat, Massimo Marchiori, Tim Berners-Lee, Dave Reynolds and Dan Connolly.

This specification contains a significant contribution from the designers of the RDF typed literal mechanism, Pat Hayes, Sergey Melnik and Patrick Stickler. The document draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha.

This specification is a product of extended deliberations by the members of the RDFcore Working Group and the Schema Working Group.

Changes from RDF 2004