The Resource Description Framework (RDF) is a framework for representing information in the Web.

RDF Concepts and Abstract Syntax defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of key concepts, datatyping, character normalization and handling of IRIs.

Introduction

This document reflects current progress of the RDF Working Group towards updating the 2004 version of RDF Concepts and Abstract Syntax. The editors expect to work on a number of issues, some of which are listed in boxes like this throughout the document.

The Resource Description Framework (RDF) is a framework for representing information in the Web.

This document defines an abstract syntax (a data model) on which RDF is based, and which serves to link concrete syntaxes to its formal semantics. It also includes discussion of key concepts, datatyping, character normalization and handling of IRIs.

Normative documentation of RDF falls into the following areas:

The framework is designed so that vocabularies can be layered. The terms defined in [[RDF-SCHEMA]] are the first such vocabulary. Several other vocabularies for RDF are mentioned in the Primer [[RDF-PRIMER]].

RDF Concepts

This section is quite redundant with later normative sections and the RDF Primer. Its removal has been proposed. This is ISSUE-68.

RDF uses the following key concepts:

Graph Data Model

The underlying structure of any expression in RDF is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph (defined more formally in section 6). This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term “graph”).

[Figure: an RDF triple comprising (subject, predicate, object)]

Each triple represents a statement of a relationship between the things denoted by the nodes that it links. Each triple has three parts:

  1. a subject,
  2. an object, and
  3. a predicate (also called a property) that denotes a relationship.

The direction of the arc is significant: it always points toward the object.

The nodes of an RDF graph are its subjects and objects.

The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the things denoted by subject and object of the triple. The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains. A formal account of the meaning of RDF graphs is given in [[!RDF-MT]].

IRI-based Vocabulary and Node Identification

A node may be an IRI, a literal, or blank (having no separate form of identification). Properties are IRIs.

An IRI or literal used as a node identifies what that node represents. An IRI used as a predicate identifies a relationship between the things represented by the nodes it connects. A predicate IRI may also be a node in the graph.

A blank node is a node that is not an IRI or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements.

A convention used by some linear representations of an RDF graph to allow several statements to use the same blank node is to use a blank node identifier, which is a local identifier that can be distinguished from all IRIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers. Note that such blank node identifiers are not part of the RDF abstract syntax, and the representation of triples containing blank nodes is entirely dependent on the particular concrete syntax used.
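The re-allocation of blank node identifiers during a merge can be sketched as follows. This is only an illustration under stated assumptions: terms are plain strings, blank node identifiers are the strings that start with `_:`, and the helper names `is_bnode_id` and `merge_graphs` are hypothetical and not part of RDF.

```python
from itertools import count


def is_bnode_id(term):
    # Assumption for this sketch: blank node identifiers start with "_:".
    return isinstance(term, str) and term.startswith("_:")


def merge_graphs(*graphs):
    """Union the triples of several graphs, keeping their blank nodes distinct."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renamed = {}  # per-graph map: old identifier -> fresh identifier

        def rename(term):
            if not is_bnode_id(term):
                return term
            if term not in renamed:
                renamed[term] = f"_:b{next(fresh)}"
            return renamed[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged


# Both graphs happen to use the identifier "_:x" for different blank nodes.
g1 = {("_:x", "http://example.org/name", '"Alice"')}
g2 = {("_:x", "http://example.org/name", '"Bob"')}
merged = merge_graphs(g1, g2)
```

Without the renaming step, the two `_:x` subjects would collapse into one node, and the merged graph would wrongly state that a single thing has both names.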

Datatypes

Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.

A datatype consists of a lexical space, a value space and a lexical-to-value mapping; see section 5.

For example, the lexical-to-value mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:

Value Space {T, F}
Lexical Space {"0", "1", "true", "false"}
Lexical-to-Value Mapping {<"true", T>, <"1", T>, <"0", F>, <"false", F>}
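As an informal illustration, the same three components can be written down as a small Python sketch. Representing the values 'T' and 'F' by Python's `True` and `False`, and the name `XSD_BOOLEAN_L2V`, are assumptions made for this example only.

```python
# Lexical-to-value mapping of xsd:boolean, written as a dict from
# lexical forms (strings) to values.
XSD_BOOLEAN_L2V = {"true": True, "1": True, "false": False, "0": False}

# The lexical space is the set of strings the mapping is defined on.
lexical_space = set(XSD_BOOLEAN_L2V)

# The value space is the set of values the mapping can produce.
value_space = set(XSD_BOOLEAN_L2V.values())
```

Each value has two lexical forms here, so the mapping is onto the value space but not one-to-one.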

RDF predefines just one datatype rdf:XMLLiteral, used for embedding XML in RDF (see section 5.1). RDF also defines rdf:langString, used for plain text in a natural language, but this is not formally considered a datatype.

There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with IRIs. The predefined XML Schema datatypes [[!XMLSCHEMA-2]] are expected to be widely used for this purpose.

RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [[!XMLSCHEMA-2]] provides an extensibility framework suitable for defining new datatypes for use in RDF.

Literals

Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by an IRI, but it is often more convenient or intuitive to use literals. All literals have a datatype IRI. A literal denotes a member of the datatype's value space, as indicated by its lexical-to-value mapping.

A literal may be the object of an RDF statement, but not the subject or the predicate.

Literals may be typed or language-tagged:

  • A typed literal is a string combined with a datatype IRI. It denotes the member of the identified datatype's value space obtained by applying the lexical-to-value mapping to the literal string.
  • A language-tagged literal is a string combined with a language tag. This may be used for plain text in a natural language. Language-tagged literals are self-denoting.

Continuing the example from section 3.3, the typed literals that can be defined using the XML Schema datatype xsd:boolean are:

Typed Literal              Lexical-to-Value Mapping   Value
<xsd:boolean, "true">      <"true", T>                T
<xsd:boolean, "1">         <"1", T>                   T
<xsd:boolean, "false">     <"false", F>               F
<xsd:boolean, "0">         <"0", F>                   F

For text that may contain markup, use typed literals with type rdf:XMLLiteral. If language annotation is required, it must be explicitly included as markup, usually by means of an xml:lang attribute. XHTML [[XHTML10]] may be included within RDF in this way. Sometimes, in this latter case, an additional span or div element is needed to carry an xml:lang or lang attribute.

Update the XHTML 1.0 reference to something more recent?

The string in both plain and typed literals is recommended to be in Unicode Normal Form C [[!NFC]]. This is motivated by [[CHARMOD]], particularly section 4, Early Uniform Normalization.

Entailment

The ideas on meaning and inference in RDF are underpinned by the formal concept of entailment, as discussed in the RDF semantics document [[!RDF-MT]]. In brief, an RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated, then the truth of B can be inferred.

RDF Vocabulary IRI and Namespace

RDF uses IRIs to identify resources and properties. Certain IRIs with the following leading substring are defined by the RDF specifications to denote specific concepts:

http://www.w3.org/1999/02/22-rdf-syntax-ns#

Vocabulary terms in the rdf: namespace are listed and described in detail in the RDF Schema specification [[!RDF-SCHEMA]].

The RDF namespace is also used as an XML namespace [[XML-NAMES]] to define a number of additional element and attribute names for purely syntactic purposes within the RDF/XML syntax ([[RDF-SYNTAX-GRAMMAR]], section 5.1). These terms (e.g., rdf:about and rdf:ID) do not denote concepts.

Datatypes

This section perhaps should discuss the XSD datatype map and rdf:PlainLiteral. This is ISSUE-70.

The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [[!XMLSCHEMA-2]].

A datatype consists of a lexical space, a value space and a lexical-to-value mapping.

The lexical space of a datatype is a set of Unicode [[!UNICODE]] strings.

The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:

A datatype is identified by one or more IRIs.

RDF may be used with any datatype definition that conforms to this abstraction, even if not defined in terms of XML Schema.

Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF. [[!RDF-MT]] contains a more detailed discussion of specific XML Schema built-in datatypes.

When the datatype is defined using XML Schema:

Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags. The value space associated with the datatype IRI is the set of all pairs of strings and language tags.

XML Content within an RDF Graph

The canonicalization rules required for XML literals are quite complicated. Increasingly, RDF is produced and consumed in environments where no XML parser and canonicalization engine is available. A possible change to relax the requirements for the lexical space, while retaining the value space, is under discussion. This is ISSUE-13.

RDF provides for XML content as a possible literal value. Such content is indicated in an RDF graph using a typed literal whose datatype is the built-in datatype rdf:XMLLiteral, defined as follows.

An IRI for identifying this datatype is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral.
The lexical space is the set of all strings:
The value space is a set of entities, called XML values, which is:
  • disjoint from the lexical space;
  • disjoint from the value space of any other datatype that is not explicitly defined as a sub- or supertype of this datatype;
  • disjoint from the set of Unicode character strings [[!UNICODE]];
  • and in 1:1 correspondence with the lexical space.
The lexical-to-value mapping is a one-one mapping from the lexical space onto the value space, i.e. it is both injective and surjective.

Not all values of this datatype are compliant with XML 1.1 [[XML11]]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.

XML values can be thought of as the [[XML-INFOSET]] or the [[XPATH]] nodeset corresponding to the lexical form, with an appropriate equality function.

RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:XMLLiteral corresponding to a single text node of the same string.

Abstract Syntax

This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.

This section also defines equivalence between RDF graphs. A definition of equivalence is needed to support the RDF Test Cases [[RDF-TESTCASES]] specification.

This abstract syntax is the syntax over which the formal semantics are defined. Implementations are free to represent RDF graphs in any other equivalent form. As an example: in an RDF graph, literals with datatype rdf:XMLLiteral can be represented in a non-canonical format, with canonicalization performed during the comparison between two such literals. In this example the comparisons may be performed either between syntactic structures or between their denotations in the domain of discourse. Implementations that do not require any such comparisons can hence be optimized.

RDF Triples

An RDF triple contains three components:

An RDF triple is conventionally written in the order subject, predicate, object.

The predicate is also known as the property of the triple.

IRIs, blank nodes and literals are collectively known as RDF terms.

RDF Graph

An RDF graph is a set of RDF triples.

The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
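A minimal sketch of these two definitions, assuming for simplicity that every RDF term is represented by a plain string (a real implementation would distinguish IRIs, literals and blank nodes). The names `Triple` and `nodes` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def nodes(graph):
    """Return the set of subjects and objects of the triples in the graph."""
    return {t.subject for t in graph} | {t.object for t in graph}


EX = "http://example.org/"

# A graph is a set of triples, so the repeated triple is stored only once.
graph = {
    Triple(EX + "alice", EX + "knows", EX + "bob"),
    Triple(EX + "alice", EX + "knows", EX + "bob"),
    Triple(EX + "bob", EX + "knows", EX + "carol"),
}
```

Here `nodes(graph)` contains the three people but not `EX + "knows"`, because a predicate is a node only if it also occurs as a subject or object.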

Graph Equivalence

Two RDF graphs G and G' are equivalent if there is a bijection M between the sets of nodes of the two graphs, such that:

  1. M maps blank nodes to blank nodes.
  2. M(lit)=lit for all RDF literals lit which are nodes of G.
  3. M(uri)=uri for all IRIs uri which are nodes of G.
  4. The triple ( s, p, o ) is in G if and only if the triple ( M(s), p, M(o) ) is in G'

With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'.
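The definition can be checked directly, if inefficiently, by trying every bijection between the blank nodes of the two graphs. The following is a brute-force sketch rather than a practical algorithm; the `BlankNode` class, the `equivalent` function, and the use of plain strings for IRIs and literals are assumptions made for illustration.

```python
from itertools import permutations


class BlankNode:
    """A blank node; instances compare by identity only."""


def nodes(graph):
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def equivalent(g1, g2):
    """Return True if some bijection M on blank nodes maps g1 exactly onto g2."""
    b1 = [n for n in nodes(g1) if isinstance(n, BlankNode)]
    b2 = [n for n in nodes(g2) if isinstance(n, BlankNode)]
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        m = dict(zip(b1, image))  # candidate M, restricted to blank nodes
        # IRIs and literals are not in m, so m.get(x, x) maps them to themselves.
        mapped = {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1}
        if mapped == g2:
            return True
    return False


KNOWS = "http://example.org/knows"
a, b, x, y = BlankNode(), BlankNode(), BlankNode(), BlankNode()
g1 = {(a, KNOWS, b)}
g2 = {(x, KNOWS, y)}   # same shape, different blank nodes
g3 = {(x, KNOWS, x)}   # a single blank node related to itself
```

With these graphs, `equivalent(g1, g2)` holds while `equivalent(g1, g3)` does not. The search is factorial in the number of blank nodes, so it is only suitable for small examples.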

IRIs

An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string [[!UNICODE]] that conforms to the syntax defined in RFC 3987 [[!IRI]]. IRIs are a generalization of URIs [[URI]]. Every absolute URI and URL is an IRI.

IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier.

Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [[!IRI]]. Further normalization MUST NOT be performed when comparing IRIs for equality.
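In a language whose string equality compares code points, this rule amounts to plain string equality. The sketch below assumes IRIs are held as Python `str` values; the function name `iri_equal` is hypothetical.

```python
def iri_equal(a: str, b: str) -> bool:
    # Simple String Comparison: code point by code point, with no
    # case folding, percent-decoding or other normalization.
    return a == b


same = iri_equal("http://example.com/", "http://example.com/")
default_port = iri_equal("http://example.com/", "http://example.com:80/")
host_case = iri_equal("http://example.com/", "http://EXAMPLE.com/")
escape_case = iri_equal("http://example.com/%3F", "http://example.com/%3f")
```

Only `same` is true. The other pairs would typically dereference to the same resource, yet they are different IRIs for the purposes of RDF.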

When IRIs are used in operations that are only defined for URIs, they must first be converted according to the mapping defined in section 3.1 of [[!IRI]]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
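The three steps named above can be approximated with the Python standard library. This is a simplified sketch, not a complete implementation of the mapping in [[!IRI]]: it ignores userinfo and IPv6 hosts, relies on Python's built-in `idna` codec for the domain name, and the function name `iri_to_uri` is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched because they are already legal in URIs;
# "%" is kept so that existing percent-escapes are not encoded twice.
SAFE = "/:@!$&'()*+,;=-._~%"


def iri_to_uri(iri: str) -> str:
    parts = urlsplit(iri)
    # Punycode-encode the domain name (urlsplit also lowercases it).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # quote() UTF-8 encodes each remaining character and %-encodes the octets.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


uri = iri_to_uri("http://bücher.example/straße?q=ä")
```

An IRI that contains only ASCII characters allowed in URIs passes through this function unchanged.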

Some concrete syntaxes permit relative IRIs as a shorthand for absolute IRIs, and define how to resolve the relative IRIs against a base IRI.

Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, “"” (double quote), and “ ” (space). In IRIs, these characters must be percent-encoded as described in section 2.1 of [[URI]].

Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [[!IRI]]. Non-normalized forms that should be avoided include:

  • Uppercase characters in scheme names and domain names
  • Percent-encoding of characters where it is not required by IRI syntax
  • Explicitly stated HTTP default port (http://example.com:80/); http://example.com/ is preferable
  • Completely empty path in HTTP IRIs (http://example.com); http://example.com/ is preferable
  • “/./” or “/../” in the path component of an IRI
  • Lowercase hexadecimal letters within percent-encoding triplets (“%3F” is preferable over “%3f”)
  • Punycode-encoding of Internationalized Domain Names in IRIs
  • IRIs that are not in Unicode Normalization Form C [[!NFC]]

RDF Literals

This section is a major departure from RDF 2004, as simple literals are now treated as syntactic sugar for xsd:string typed literals.

A literal in an RDF graph consists of:

A language-tagged string is any literal whose datatype IRI is equal to http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. In addition to a lexical form and datatype IRI, a language-tagged string also has:

All literals have a lexical form, which is a Unicode [[!UNICODE]] string and SHOULD be in Normal Form C [[!NFC]]. Language-tagged literals have a lexical form and a language tag. Typed literals have a lexical form and a datatype IRI, which is an IRI.

Concrete syntaxes MAY support simple literals, consisting of only a lexical form without any datatype IRI or language tag. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.

In earlier versions of RDF, literals with a language tag did not have a datatype IRI, and simple literals could appear directly in the abstract syntax. Simple literals and literals with a language tag were collectively known as plain literals.

Literals in which the lexical form begins with a composing character (as defined by [[CHARMOD]]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [[XML11]].

Earlier versions of RDF permitted tags that adhered to the generic tag/subtag syntax of language tags, but were not well-formed according to [[!BCP47]]. Such language tags do not conform to RDF 1.1.

When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications.

The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input.

Literal Equality

Two literals are equal if and only if all of the following hold:

  • The strings of the two lexical forms compare equal, character by character.
  • Either both or neither have language tags.
  • The language tags, if any, compare equal.
  • Either both or neither have datatype IRIs.
  • The two datatype IRIs, if any, compare equal, character by character.

RDF Literals are distinct and distinguishable from IRIs; e.g. http://example.org/ as a string literal is not equal to http://example.org/ as an IRI.
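The conditions for literal equality can be sketched as a small comparison function. The `Literal` tuple and the function `literal_equal` are hypothetical names. Comparing language tags case-insensitively is an assumption that follows the earlier remark that comparing two language tags should not be sensitive to case.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical_form: str
    datatype: Optional[str] = None   # datatype IRI, if any
    language: Optional[str] = None   # language tag, if any


def literal_equal(a: Literal, b: Literal) -> bool:
    # Note: use this function, not ==, which would compare tags case-sensitively.
    if a.lexical_form != b.lexical_form:
        return False
    if (a.language is None) != (b.language is None):
        return False
    if a.language is not None and a.language.lower() != b.language.lower():
        return False
    if (a.datatype is None) != (b.datatype is None):
        return False
    return a.datatype == b.datatype


XSD = "http://www.w3.org/2001/XMLSchema#"
LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

one = Literal("1", XSD + "integer")
padded_one = Literal("01", XSD + "integer")
chat_fr = Literal("chat", LANG_STRING, "fr")
chat_FR = Literal("chat", LANG_STRING, "FR")
```

Under this sketch `chat_fr` and `chat_FR` are equal, while `one` and `padded_one` are not: literal equality is purely syntactic, even though both lexical forms map to the same integer value.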

The Value Corresponding to a Typed Literal

The datatype IRI refers to a datatype. For XML Schema built-in datatypes, IRIs such as http://www.w3.org/2001/XMLSchema#int are used. The IRI of the datatype rdf:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which IRIs refer to datatypes.

The literal value associated with a typed literal is:

If the lexical form is not in the lexical space of the datatype associated with the datatype IRI, then no literal value can be associated with the typed literal. Such a case, while in error, is not syntactically ill-formed.
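One way an implementation might associate values with typed literals is a table from datatype IRIs to lexical-to-value functions. The sketch below is illustrative only: the `DATATYPES` registry, the function names, and the choice to signal an ill-typed literal by returning `None` are all assumptions, and only two datatypes are covered.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_boolean(lexical_form):
    # Raises KeyError when the string is outside the lexical space.
    return {"true": True, "1": True, "false": False, "0": False}[lexical_form]


# Hypothetical registry: datatype IRI -> lexical-to-value mapping.
DATATYPES = {
    XSD + "boolean": parse_boolean,
    XSD + "string": str,
}


def literal_value(lexical_form, datatype_iri):
    """Return the value of a typed literal, or None if it is ill-typed."""
    mapping = DATATYPES.get(datatype_iri)
    if mapping is None:
        raise LookupError(f"unrecognized datatype IRI: {datatype_iri}")
    try:
        return mapping(lexical_form)
    except (KeyError, ValueError):
        # Ill-typed: no value exists, yet the literal is still well-formed.
        return None


ok = literal_value("1", XSD + "boolean")
ill_typed = literal_value("yes", XSD + "boolean")
```

Here `ok` is `True`, and `ill_typed` is `None` because "yes" is not in the lexical space of xsd:boolean. The literal can still be stored and exchanged like any other.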

In application contexts, comparing the values of typed literals (see section 6.5.2) is usually more helpful than comparing their syntactic forms (see section 6.5.1). Similarly, for comparing RDF Graphs, semantic notions of entailment (see [[!RDF-MT]]) are usually more helpful than syntactic equality (see section 6.3).

Blank Nodes

The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all IRIs and the set of all literals are pairwise disjoint.

Otherwise, this set of blank nodes is arbitrary.

RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.

Replacing Blank Nodes with IRIs

Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.

In situations where stronger identification is needed, systems MAY systematically transform some or all of the blank nodes in an RDF graph into IRIs [[!IRI]]. Systems wishing to do this SHOULD mint a new, globally unique IRI (a Skolem IRI) for each blank node so transformed.

This transformation does not change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else.

Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace a blank node, and map back to the source blank node where possible.

Systems that want Skolem IRIs to be recognizable outside of the system boundaries SHOULD use a well-known IRI [[WELL-KNOWN]] with the registered name genid. This is an IRI that uses the HTTP or HTTPS scheme, or another scheme that has been specified to use well-known IRIs; and whose path component starts with /.well-known/genid/.

For example, the authority responsible for the domain example.com could mint the following recognizable Skolem IRI:

http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
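The transformation might be sketched as follows. The `BlankNode` class, the `skolemize` function, the use of random UUIDs as the unique part of the IRI, and the use of plain strings for IRIs and literals are assumptions of this example. The authority `http://example.com` is taken from the example above.

```python
import uuid


class BlankNode:
    """A blank node; instances compare by identity only."""


def skolemize(graph, authority="http://example.com"):
    """Replace every blank node in the graph with a freshly minted Skolem IRI."""
    minted = {}  # blank node -> Skolem IRI, kept so the mapping can be reversed

    def replace(term):
        if not isinstance(term, BlankNode):
            return term
        if term not in minted:
            minted[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
        return minted[term]

    skolemized = {(replace(s), p, replace(o)) for s, p, o in graph}
    return skolemized, minted


EX = "http://example.org/"
b = BlankNode()
graph = {(EX + "alice", EX + "knows", b), (b, EX + "name", '"Bob"')}
skolemized, minted = skolemize(graph)
```

Returning the `minted` table alongside the new graph lets a system map Skolem IRIs back to the blank nodes they replaced. The `/.well-known/genid/` prefix makes them recognizable to other systems as well.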

IETF registration of the genid name is currently in progress.

RFC 5785 [[WELL-KNOWN]] only specifies well-known URIs, not IRIs. For the purpose of this document, a well-known IRI is any IRI that results in a well-known URI after IRI-to-URI mapping [[!IRI]].

Abstract Syntax for Working with Multiple Graphs

The Working Group will standardize a model and semantics for multiple graphs and graph stores. The charter notes:

The RDF Community has used the term “named graphs” for a number of years in various settings, but this term is ambiguous, and often refers to what could rather be referred as quoted graphs, graph literals, IRIs for graphs, knowledge bases, graph stores, etc. The term “Support for Multiple Graphs and Graph Stores” is used as a neutral term in this charter; this term is not and should not be considered as definitive. The Working Group will have to define the right term(s).

Progress on the design for this feature is tracked under multiple issues:

The design presented here should be considered a straw man proposal at this point. It is based on RDF Datasets as defined in SPARQL 1.1.

The RDF data model expresses information as RDF graphs consisting of triples with subject, predicate and object. Often, one wants to hold multiple RDF graphs and record information about each graph, allowing an application to work with datasets that involve information from more than one graph.

An RDF Dataset is a collection of RDF graphs and comprises:

Fragment Identifiers

This section does not address the case where RDF is embedded in other document formats, such as in RDFa or when an RDF/XML fragment is embedded in SVG. It has been suggested that this may be a general issue for the TAG about the treatment of fragment identifiers when one language is embedded in another. This is ISSUE-37.

This section treats the RDF/XML media type as canonical for establishing the referent of IRIs that include a fragment identifier. Today many different media types can carry RDF graphs, and HTTP content negotiation is more common. Also, the problem addressed in this section (context-dependence of fragment identifiers) to some extent went away when RFC 2396 was replaced by RFC 3986. The latter states that the same fragment should be used for the same thing in resources that have multiple representations (Section 3.5 [[URI]]). This is ISSUE-69.

RDF uses IRIs, which may include fragment identifiers, as context free identifiers for resources. RFC 2396 states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.

These apparently conflicting views are reconciled by considering that an IRI in an RDF graph is treated with respect to the MIME type application/rdf+xml [[RDF-MIME-TYPE]]. Given an IRI that includes a fragment identifier, the fragment identifier identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the IRI excluding the fragment identifier. Thus:

This provides a handling of IRIs and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior. Note that nothing here requires that an RDF application be able to retrieve any representation of resources identified by the IRIs in an RDF graph.

Acknowledgments

This section does not yet list those who made contributions to the RDF 1.1 version, nor does it list the current RDF WG members.

The RDF 2004 editors acknowledge valuable contributions from Frank Manola, Pat Hayes, Dan Brickley, Jos de Roo, Dave Beckett, Patrick Stickler, Peter F. Patel-Schneider, Jerome Euzenat, Massimo Marchiori, Tim Berners-Lee, Dave Reynolds and Dan Connolly.

This specification contains a significant contribution from the designers of the RDF typed literal mechanism, Pat Hayes, Sergey Melnik and Patrick Stickler. The document draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha.

This specification is a product of extended deliberations by the members of the RDFcore Working Group and the RDF and RDF Schema Working Group.

Changes from RDF 2004