RDF 1.1 TriG

This document defines a textual syntax for RDF called TriG that allows an RDF dataset to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. TriG is an extension of the Turtle [[!TURTLE]] format.

TriG Grammar

A TriG document is a Unicode [[!UNICODE]] character string encoded in UTF-8. Only Unicode characters in the range U+0000 to U+10FFFF inclusive are allowed.

White Space

White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a TriG parser.

White space is significant in the production String.

Comments

Comments in TriG take the form of '#', outside an IRI or a string, and continue to the end of line (marked by characters U+000D or U+000A) or end of file if there is no end of line after the comment marker. Comments are treated as white space.

IRI References

Relative IRIs are resolved with base IRIs as per Uniform Resource Identifier (URI): Generic Syntax [[!RFC3986]] using only the basic algorithm in section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) is performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of Internationalized Resource Identifiers (IRIs) [[!RFC3987]].

The @base directive defines the Base IRI used to resolve relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded in Content". Section 5.1.2, "Base URI from the Encapsulating Entity", defines how the In-Scope Base IRI may come from an encapsulating document, such as a SOAP envelope with an xml:base directive or a MIME multipart document with a Content-Location header. The "Retrieval URI" identified in section 5.1.3, "Base URI from the Retrieval URI", is the URL from which a particular TriG document was retrieved. If none of the above specifies the Base URI, the default Base URI (section 5.1.4, "Default Base URI") is used. Each @base directive sets a new In-Scope Base URI, relative to the previous one.
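The way successive @base directives build on one another can be illustrated with a short, non-normative Python sketch. It uses the standard library's urllib.parse.urljoin as a stand-in for the RFC3986 section 5.2 algorithm; urljoin operates on URIs and is only an approximation of what a conforming TriG parser must do. The function name apply_base_directives and all example IRIs are assumptions made for illustration.

```python
from urllib.parse import urljoin

def apply_base_directives(initial_base, base_directives):
    """Return the in-scope base IRI after each @base directive, in order."""
    in_scope = initial_base
    history = []
    for iri_ref in base_directives:
        # Each @base IRI is itself resolved against the previous in-scope base.
        in_scope = urljoin(in_scope, iri_ref)
        history.append(in_scope)
    return history

# Hypothetical document: retrieved from doc.trig, then three @base directives.
bases = apply_base_directives(
    "http://example.org/doc.trig",
    ["http://a.example/one/", "two/", "../three/"],
)

# A relative IRI such as <g1> is resolved against the latest in-scope base.
resolved = urljoin(bases[-1], "g1")
```

Here the second and third directives are relative, so each is resolved against the base set by the directive before it.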

Escape Sequences

There are three forms of escapes used in TriG documents:

numeric escape sequences represent Unicode code points:

Escape sequence	Unicode code point
'\u' hex hex hex hex	A Unicode character in the range U+0000 to U+FFFF inclusive corresponding to the value encoded by the four hexadecimal digits interpreted from most significant to least significant digit.
'\U' hex hex hex hex hex hex hex hex	A Unicode character in the range U+0000 to U+10FFFF inclusive corresponding to the value encoded by the eight hexadecimal digits interpreted from most significant to least significant digit.

where each hex is a hexadecimal character matching the production HEX:

HEX ::= [0-9] | [A-F] | [a-f]

string escape sequences represent the characters traditionally escaped in string literals:

Escape sequence	Unicode code point
'\t'	U+0009
'\b'	U+0008
'\n'	U+000A
'\r'	U+000D
'\f'	U+000C
'\"'	U+0022
'\''	U+0027
'\\'	U+005C

reserved character escape sequences consist of a '\' followed by one of ~.-!$&'()*+,;=/?#@%_ and represent the character to the right of the '\'.

Context where each kind of escape sequence can be used
	numeric escapes	string escapes	reserved character escapes
IRIs, used as RDF terms or in @prefix or @base declarations	yes	no	no
local names	no	no	yes
Strings	yes	yes	no
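The table above can be read as three separate decoding routines, one per context. The following non-normative Python sketch shows one possible arrangement; the function names are hypothetical, and the sketch assumes its input has already been matched by the relevant grammar production, so it does not reject escapes that are disallowed in a given context.

```python
import re

# String escapes and the characters they represent.
STRING_ESCAPES = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# \uXXXX (four hex digits) or \UXXXXXXXX (eight hex digits).
_NUMERIC = r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})"

def unescape_iri(text):
    """IRIs: only numeric escapes are decoded."""
    return re.sub(
        _NUMERIC,
        lambda m: chr(int(m.group(1) or m.group(2), 16)),
        text,
    )

def unescape_string(text):
    """Strings: numeric and string escapes, decoded in one left-to-right pass."""
    pattern = _NUMERIC + r"""|\\([tbnrf"'\\])"""

    def repl(m):
        if m.group(3) is not None:
            return STRING_ESCAPES[m.group(3)]
        return chr(int(m.group(1) or m.group(2), 16))

    return re.sub(pattern, repl, text)

def unescape_local_name(text):
    """Local names: only reserved character escapes are decoded."""
    return re.sub(r"\\([~.\-!$&'()*+,;=/?#@%_])", r"\1", text)
```

Decoding strings in a single pass matters: an escaped backslash followed by the characters u0041 yields a backslash and the text u0041, not the letter A.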

%-encoded sequences are in the character range for IRIs and are explicitly allowed in local names. These appear as a '%' followed by two hex characters and represent that same sequence of three characters. These sequences are not decoded during processing. A term written as <http://a.example/%66oo-bar> in TriG designates the IRI http://a.example/%66oo-bar and not IRI http://a.example/foo-bar. A term written as ex:%66oo-bar with a prefix @prefix ex: <http://a.example/> also designates the IRI http://a.example/%66oo-bar.
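As a non-normative illustration, the Python sketch below expands a prefixed name by plain concatenation and leaves %-encoded sequences untouched. The helper expand_prefixed_name is hypothetical and, for brevity, ignores reserved character escapes. The call to urllib.parse.unquote is shown only to contrast with what a TriG processor must not do.

```python
from urllib.parse import unquote

def expand_prefixed_name(pname, namespaces):
    """Concatenate the namespace IRI and the local name without %-decoding."""
    prefix, _, local = pname.partition(":")
    return namespaces[prefix] + local

namespaces = {"ex": "http://a.example/"}
iri = expand_prefixed_name("ex:%66oo-bar", namespaces)

# Decoding would designate a different IRI, so the parser never does this.
decoded = unquote(iri)
```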

Grammar

The EBNF used here is defined in XML 1.0 [[!EBNF-NOTATION]]. Production labels consisting of a number and a final 'g' are unique to TriG. All production labels consisting of only a number reference the production with that number in the Turtle grammar [[!TURTLE]]. Production labels consisting of a number and a final 's', e.g. [60s], reference the production with that number in the document SPARQL 1.1 Query Language grammar [[SPARQL11-QUERY]].

Notes:

A blank node label represents the same blank node throughout the TriG document.
Keywords in single quotes ( '@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ( "BASE", "PREFIX", "GRAPH" ) are case-insensitive.
Escape sequence markers \u, \U and those in ECHAR are case-sensitive.
When tokenizing the input and choosing grammar rules, the longest match is chosen.
The TriG grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
The entry point into the grammar is trigDoc.
In signed numbers, no white space is allowed between the sign and the number.
The [162s] ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" is a registered language subtag. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "Z"@base) is in the TriG language.

Parsing

The RDF Concepts and Abstract Syntax [[!RDF11-CONCEPTS]] specification defines three types of RDF Term: IRIs, literals and blank nodes. Literals are composed of a lexical form and an optional language tag [[!BCP47]] or datatype IRI. An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs. This section maps a string conforming to the grammar to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.

Parser State

Parsing TriG requires a state of six items:

IRI baseURI — When the base production is reached, the second rule argument, IRIREF, is the base URI used for relative IRI resolution.
Map[prefix -> IRI] namespaces — The second and third rule arguments (PNAME_NS and IRIREF) in the prefixID production assign a namespace name (IRIREF) for the prefix (PNAME_NS). Outside of a prefixID production, any PNAME_NS is substituted with the namespace. Note that the prefix may be an empty string, per the PNAME_NS production: (PN_PREFIX)? ":".
Map[string -> blank node] bnodeLabels — A mapping from string to blank node.
RDF_Term curSubject — The curSubject is bound to the subject production.
RDF_Term curPredicate — The curPredicate is bound to the verb production. If the token matched was "a", curPredicate is bound to the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
RDF_Term curGraph — The curGraph is bound to the label of the graph that is the destination of triples produced in parsing. When undefined, triples are destined for the default graph.
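One way to hold these six items is a small record type. The Python sketch below is non-normative: the class name ParserState, the snake_case field names, the use of plain strings for RDF terms, and the blank node naming scheme in bnode_for are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ParserState:
    base_uri: Optional[str] = None                               # baseURI
    namespaces: Dict[str, str] = field(default_factory=dict)     # prefix -> IRI
    bnode_labels: Dict[str, str] = field(default_factory=dict)   # label -> blank node
    cur_subject: Optional[str] = None                            # curSubject
    cur_predicate: Optional[str] = None                          # curPredicate
    cur_graph: Optional[str] = None                              # None: default graph

    def bnode_for(self, label):
        """Return the blank node for a label, allocating one on first use."""
        if label not in self.bnode_labels:
            self.bnode_labels[label] = "_:b%d" % len(self.bnode_labels)
        return self.bnode_labels[label]

state = ParserState()
state.namespaces[""] = "http://a.example/"   # the empty prefix is a valid key
first = state.bnode_for("x")
again = state.bnode_for("x")
other = state.bnode_for("y")
```

Because bnode_labels lives in the state for the whole document, the same label always yields the same blank node, regardless of which graph it appears in.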

RDF Term Constructors

This table maps productions and lexical tokens to RDF terms or components of RDF terms listed in the Parsing section:

production	type	procedure
IRIREF	IRI	The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per the IRI References section.
PNAME_NS	prefix	When used in a prefixID or sparqlPrefix production, the `prefix` is the potentially empty unicode string matching the first argument of the rule; it is a key into the namespaces map.
PNAME_NS	IRI	When used in a PrefixedName production, the `iri` is the value in the namespaces map corresponding to the first argument of the rule.
PNAME_LN	IRI	A potentially empty prefix is identified by the first sequence, `PNAME_NS`. The namespaces map MUST have a corresponding `namespace`. The unicode string of the IRI is formed by unescaping the reserved characters in the second argument, `PN_LOCAL`, and concatenating this onto the `namespace`.
STRING_LITERAL_SINGLE_QUOTE	lexical form	The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_QUOTE	lexical form	The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_LONG_SINGLE_QUOTE	lexical form	The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_LONG_QUOTE	lexical form	The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
LANGTAG	language tag	The characters following the `@` form the unicode string of the language tag.
RDFLiteral	literal	The literal has a lexical form of the first rule argument, `String`, and either a language tag of `LANGTAG` or a datatype IRI of `iri`, depending on which rule matched the input. If the `LANGTAG` rule matched, the datatype is `rdf:langString` and the language tag is `LANGTAG`. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of `xsd:string`.
INTEGER	literal	The literal has a lexical form of the input string, and a datatype of `xsd:integer`.
DECIMAL	literal	The literal has a lexical form of the input string, and a datatype of `xsd:decimal`.
DOUBLE	literal	The literal has a lexical form of the input string, and a datatype of `xsd:double`.
BooleanLiteral	literal	The literal has a lexical form of `true` or `false`, depending on which matched the input, and a datatype of `xsd:boolean`.
BLANK_NODE_LABEL	blank node	The string matching the second argument, `PN_LOCAL`, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated.
ANON	blank node	A blank node is generated.
blankNodePropertyList	blank node	A blank node is generated. Note the rules for `blankNodePropertyList` in the next section.
collection	blank node	For non-empty lists, a blank node is generated. Note the rules for `collection` in the next section.
collection	IRI	For empty lists, the resulting IRI is `rdf:nil`. Note the rules for `collection` in the next section.
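The literal rows of the table can be sketched in Python as follows. This is a non-normative illustration: the Literal tuple and the function names are hypothetical, and the lexical form is assumed to have been unescaped already.

```python
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

class Literal(NamedTuple):
    lexical_form: str
    datatype: str
    language: Optional[str] = None

def rdf_literal(lexical_form, langtag=None, datatype=None):
    """Build a literal following the RDFLiteral row of the table."""
    if langtag is not None:
        # A language tag implies the datatype rdf:langString.
        return Literal(lexical_form, RDF + "langString", langtag)
    if datatype is not None:
        return Literal(lexical_form, datatype)
    # Neither a language tag nor a datatype IRI: xsd:string.
    return Literal(lexical_form, XSD + "string")

def numeric_literal(token, kind):
    """INTEGER, DECIMAL and DOUBLE keep the input string as the lexical form."""
    return Literal(token, XSD + kind)

plain = rdf_literal("chat")
tagged = rdf_literal("chat", langtag="fr")
typed = rdf_literal("2014-02-25", datatype=XSD + "date")
number = numeric_literal("+01", "integer")
```

Note that the lexical form of a numeric token is the input string itself, so "+01" is kept as written and not normalized to "1".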

RDF Triples Construction

A TriG document defines an RDF Dataset composed of one default graph and zero or more named graphs. Each graph is composed of a set of RDF triples.

Output Graph

The state curGraph is initially unset. It records the label of the graph for triples produced during parsing. If undefined, the default graph is used.

The rule labelOrSubject sets both curGraph and curSubject (only one of these will be used).

The following grammar production clauses set curGraph to be undefined, indicating the default graph:

The grammar production clause wrappedGraph in rule block.
The grammar production in rule triples2.

The grammar production labelOrSubject predicateObjectList '.' unsets curGraph before handling predicateObjectLists in rule triplesOrGraph.

Triple Output

Each RDF triple produced is added to curGraph, or the default graph if curGraph is not set at that point in the parsing process.

The subject production sets the curSubject. The verb production sets the curPredicate.

Triples are produced at the following points in the parsing process and each RDF triple produced is added to the graph identified by curGraph.

Triple Production

Each object N in the document produces an RDF triple: curSubject curPredicate N.
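The interplay of curGraph, curSubject and curPredicate can be traced by hand for a tiny document. The Python sketch below is non-normative; it represents the dataset as a mapping from graph label to a set of triples, uses None for an unset curGraph, and writes terms as plain strings. These representation choices and the function name emit are assumptions.

```python
from collections import defaultdict

DEFAULT_GRAPH = None  # stands in for an unset curGraph

def emit(dataset, cur_graph, cur_subject, cur_predicate, obj):
    """Add the triple (curSubject, curPredicate, obj) to the graph named by curGraph."""
    dataset[cur_graph].add((cur_subject, cur_predicate, obj))

dataset = defaultdict(set)

# <s> <p> <o1> .               (curGraph unset: default graph)
emit(dataset, DEFAULT_GRAPH, "<s>", "<p>", "<o1>")

# <g> { <s> <p> <o2> , <o3> }  (curGraph is <g>; one triple per object)
for obj in ("<o2>", "<o3>"):
    emit(dataset, "<g>", "<s>", "<p>", obj)
```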

Property Lists

Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B. Finishing the blankNodePropertyList production restores curSubject and curPredicate. The node produced by matching blankNodePropertyList is the blank node B.
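The save and restore behaviour can be sketched as follows. This Python fragment is non-normative: the class Parser, its method names, and the representation of the property list as already-parsed (predicate, object) pairs are assumptions made to keep the example short.

```python
import itertools

class Parser:
    def __init__(self):
        self.cur_subject = None
        self.cur_predicate = None
        self.triples = []
        self._ids = itertools.count()

    def new_bnode(self):
        return "_:b%d" % next(self._ids)

    def emit(self, obj):
        self.triples.append((self.cur_subject, self.cur_predicate, obj))

    def blank_node_property_list(self, pairs):
        # Record the enclosing subject and predicate.
        saved = (self.cur_subject, self.cur_predicate)
        b = self.new_bnode()
        self.cur_subject = b
        for predicate, obj in pairs:
            self.cur_predicate = predicate
            self.emit(obj)
        # Restore them so that the enclosing triple can be completed.
        self.cur_subject, self.cur_predicate = saved
        return b

# <s> <knows> [ <name> "Bob" ] .
parser = Parser()
parser.cur_subject, parser.cur_predicate = "<s>", "<knows>"
node = parser.blank_node_property_list([("<name>", '"Bob"')])
parser.emit(node)
```

The inner triple is emitted first, and the restored state then lets the blank node B serve as the object of the outer triple.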

Collections

Beginning the collection production records the curSubject and curPredicate. Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first. Each object object_n after the first produces a triple: object_n-1 rdf:rest object_n . Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil . and restores curSubject and curPredicate. The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
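A non-normative Python sketch of the triples generated for a collection follows. It assumes the usual RDF list shape, in which the rdf:rest triples link the blank nodes allocated for successive objects; the function collection_triples and the caller-supplied new_bnode allocator are hypothetical, and the objects are taken to be already-constructed terms.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"

def collection_triples(objects, new_bnode):
    """Return (node, triples) for a collection of already-parsed objects."""
    if not objects:
        # Empty lists produce rdf:nil and no triples.
        return RDF_NIL, []
    cells = [new_bnode() for _ in objects]
    triples = []
    for i, (cell, obj) in enumerate(zip(cells, objects)):
        triples.append((cell, RDF_FIRST, obj))
        # Link to the next cell, or close the list with rdf:nil.
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_REST, rest))
    return cells[0], triples

counter = itertools.count()
new_bnode = lambda: "_:b%d" % next(counter)

# ( "1" "2" )
head, triples = collection_triples(['"1"', '"2"'], new_bnode)
empty_head, empty_triples = collection_triples([], new_bnode)
```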

Media Type Registration

Contact:: Eric Prud'hommeaux
See also:: How to Register a Media Type for a W3C Specification; Internet Media Type registration, consistency of use, TAG Finding 3 June 2002 (Revised 4 September 2002)

The Internet Media Type / MIME Type for TriG is "application/trig".

It is recommended that TriG files have the extension ".trig" (all lowercase) on all platforms.

It is recommended that TriG files stored on Macintosh HFS file systems be given a file type of "TEXT".
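As a non-normative illustration, an application written in Python could associate the extension with the media type through the standard mimetypes module. The file name dataset.trig is hypothetical.

```python
import mimetypes

# Register the recommended extension for the TriG media type.
mimetypes.add_type("application/trig", ".trig")

media_type, content_encoding = mimetypes.guess_type("dataset.trig")
```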

The information that follows will be submitted to the IESG for review, approval, and registration with IANA.

Type name:: application
Subtype name:: trig
Required parameters:: None
Optional parameters:: None
Encoding considerations:: The syntax of TriG is expressed over code points in Unicode [[!UNICODE]]. The encoding is always UTF-8 [[!UTF-8]].; Unicode code points may also be expressed using an \uXXXX (U+0000 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-Fa-f]
Security considerations:: TriG is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference IRIs, invoking the security considerations of the scheme for that IRI. Note in particular, the privacy issues in [[!RFC3023]] section 10 for HTTP IRIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended IRIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.; TriG is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (e.g. PGP encryption, MD5 sum validation, password-protected compression) may also be used on TriG documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.; TriG can express data which is presented to the user, for example, RDF Schema labels. Application rendering strings retrieved from untrusted TriG documents must ensure that malignant strings may not be used to mislead the reader. The security considerations in the media type registration for XML ([[!RFC3023]] section 10) provide additional guidance around the expression of arbitrary data and markup.; TriG uses IRIs as term identifiers. Applications interpreting data expressed in TriG should address the security issues of Internationalized Resource Identifiers (IRIs) [[!RFC3987]] Section 8, as well as Uniform Resource Identifier (URI): Generic Syntax [[!RFC3986]] Section 7.; Multiple IRIs may have the same appearance. Characters in different scripts may look similar (a Cyrillic "о" may appear similar to a Latin "o"). 
A character followed by combining characters may have the same visual representation as another character (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT has the same visual representation as LATIN SMALL LETTER E WITH ACUTE). Any person or application that is writing or interpreting data in TriG must take care to use the IRI that matches the intended semantics, and avoid IRIs that may look similar. Further information about matching of similar characters can be found in Unicode Security Considerations [[UNICODE-SECURITY]] and Internationalized Resource Identifiers (IRIs) [[RFC3987]], Section 8.
Interoperability considerations:: There are no known interoperability issues.
Published specification:: This specification.
Applications which use this media type:: No widely deployed applications are known to use this media type. It may be used by some web services and clients consuming their data.
Additional information:
Magic number(s):: TriG documents may have the strings 'prefix' or 'base' (case independent) near the beginning of the document.
File extension(s):: ".trig"
Base URI:: The TriG base directive can change the current base URI for relative IRIrefs in the language that are used sequentially later in the document.
Macintosh file type code(s):: "TEXT"
Person & email address to contact for further information:: Eric Prud'hommeaux <eric@w3.org>
Intended usage:: COMMON
Restrictions on usage:: None
Author/Change controller:: The TriG specification is the product of the RDF WG. The W3C reserves change control over this specification.
