Copyright © 2012-2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
N-Quads is a line-based, plain text format for encoding an RDF dataset.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
The N-Quads format has a similar flavour to N-Triples [n-triples]. The main distinction is that N-Quads allows encoding multiple graphs. In a change from the previous publication, this document is intended to become a W3C Recommendation.
This document was published by the RDF Working Group as a Last Call Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-rdf-comments@w3.org (subscribe, archives). The Last Call period ends 14 October 2013. All comments are welcome.
Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document defines an easy-to-parse, line-based language named N-Quads.
N-Quads statements are a sequence of RDF terms representing the subject, predicate, object and graph label of an RDF triple and the graph it is part of in a dataset. These may be separated by white space (spaces #x20 or tabs #x9). This sequence is terminated by a '.' and a new line (optional at the end of a document).
The simplest statement is a sequence of (subject, predicate, object) terms forming an RDF triple and an optional IRI labeling what graph in a dataset the triple belongs to. All terms are separated by whitespace, and each statement is terminated by '.'.
The graph label IRI can be omitted, in which case the triples are considered part of the default graph of the RDF dataset.
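As an illustration of the statement layout described above, the following minimal sketch joins already-serialized terms into one statement line. The helper name `format_quad` and all example IRIs except `<http://example.org/#green-goblin>` are assumptions made for this example; they are not defined by this specification.

```python
def format_quad(subject, predicate, obj, graph=None):
    """Join already-serialized terms into one N-Quads statement line.

    Hypothetical helper: it does not validate the terms themselves.
    """
    terms = [subject, predicate, obj]
    if graph is not None:
        # The graph label is optional; without it the triple
        # belongs to the default graph of the dataset.
        terms.append(graph)
    return " ".join(terms) + " ."


default_stmt = format_quad(
    "<http://example.org/#spiderman>",
    "<http://example.org/#enemyOf>",
    "<http://example.org/#green-goblin>",
)
named_stmt = format_quad(
    "<http://example.org/#spiderman>",
    "<http://example.org/#enemyOf>",
    "<http://example.org/#green-goblin>",
    graph="<http://example.org/graphs/spiderman>",
)
print(default_stmt)
print(named_stmt)
```

The first line printed has three terms and so describes a triple in the default graph; the second carries a fourth term naming the graph.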
IRIs may be written only as absolute IRIs.
IRIs are enclosed in '<' and '>' and may contain numeric escape sequences (described below). For example, <http://example.org/#green-goblin>.
Literals are used to identify values such as strings, numbers, dates.
Literals (Grammar production Literal) have a lexical form followed by a language tag, a datatype IRI, or neither.
The representation of the lexical form consists of an initial delimiter " (U+0022), a sequence of permitted characters, numeric escape sequences or string escape sequences, and a final delimiter. Literals may not contain the characters ", LF, or CR. In addition, '\' (U+005C) may not appear in any quoted literal except as part of an escape sequence.
The corresponding RDF lexical form is the characters between the delimiters, after processing any escape sequences.
If present, the language tag is preceded by a '@' (U+0040).
If there is no language tag, there may be a datatype IRI, preceded by '^^' (U+005E U+005E). If there is no datatype IRI and no language tag, the datatype is xsd:string.
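The literal rules above can be sketched as a small splitter that separates the lexical form, the language tag and the datatype IRI. This is a simplified illustration, not a conforming parser: the names `split_literal`, `LITERAL_RE` and `XSD_STRING` are assumptions, any backslash pair is accepted inside the quotes, and escape sequences in the lexical form are left unprocessed.

```python
import re

# Full IRI that the prefixed name xsd:string conventionally abbreviates.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Quoted lexical form, then an optional '@' language tag or '^^' datatype IRI.
LITERAL_RE = re.compile(
    r'"(?P<lex>(?:[^"\\\n\r]|\\.)*)"'
    r'(?:@(?P<lang>[a-zA-Z]+(?:-[a-zA-Z0-9]+)*)|\^\^<(?P<dt>[^<>]*)>)?'
)


def split_literal(text):
    """Return (raw lexical form, language tag or None, datatype IRI or None)."""
    match = LITERAL_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a literal: %r" % text)
    lang = match.group("lang")
    datatype = match.group("dt")
    if lang is None and datatype is None:
        # Neither a language tag nor a datatype IRI: default to xsd:string.
        datatype = XSD_STRING
    return match.group("lex"), lang, datatype


print(split_literal('"hello"'))
print(split_literal('"chat"@fr'))
print(split_literal('"1"^^<http://www.w3.org/2001/XMLSchema#integer>'))
```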
RDF blank nodes in N-Quads are expressed as _: followed by a blank node label which is a series of name characters.
The characters in the label are built upon PN_CHARS_BASE, liberalized as follows:
The characters _ and digits may appear anywhere in a blank node label.
The character . may appear anywhere except the first or last character.
The characters -, U+00B7, U+0300 to U+036F and U+203F to U+2040 are permitted anywhere except the first character.
A fresh RDF blank node is allocated for each unique blank node label in a document. Repeated use of the same blank node label identifies the same RDF blank node.
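The label rules above can be checked with a regular expression assembled from the character ranges in the grammar's PN_CHARS_BASE, PN_CHARS_U and PN_CHARS productions. The following is a sketch under that reading; the names `BLANK_NODE_LABEL_RE` and `is_blank_node_label` are assumptions for illustration.

```python
import re

# Character-class bodies mirroring the grammar's terminal productions.
PN_CHARS_BASE = (
    "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
PN_CHARS_U = PN_CHARS_BASE + "_:"
# The hyphen is escaped and placed last so it is not read as a range.
PN_CHARS = PN_CHARS_U + "0-9\u00B7\u0300-\u036F\u203F-\u2040\\-"

# '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
BLANK_NODE_LABEL_RE = re.compile(
    "_:[" + PN_CHARS_U + "0-9]"
    + "(?:[" + PN_CHARS + ".]*[" + PN_CHARS + "])?"
)


def is_blank_node_label(token):
    """Return True if the whole token matches BLANK_NODE_LABEL."""
    return BLANK_NODE_LABEL_RE.fullmatch(token) is not None


for token in ["_:b0", "_:a.b", "_:a.", "_:.a", "_:-a"]:
    print(token, is_blank_node_label(token))
```

A trailing or leading `.` and a leading `-` are rejected, while both characters are accepted in the middle of a label.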
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].
This specification defines conformance criteria for N-Quads documents and N-Quads parsers.
A conforming N-Quads document is a Unicode string that conforms to the grammar and additional constraints defined in section 4. Grammar, starting with the nquadsDoc production. An N-Quads document serializes an RDF dataset.
A conforming N-Quads parser is a system capable of reading N-Quads documents on behalf of an application. It makes the serialized RDF dataset, as defined in section 5. Parsing, available to the application, usually through some form of API.
The IRI that identifies the N-Quad language is: http://www.w3.org/ns/formats/N-Quad
The media type of N-Quads is application/n-quads.
The content encoding of N-Quads is always UTF-8.
See N-Quads Media Type for the media type registration form.
An N-Quads document is a Unicode [UNICODE] character string encoded in UTF-8. Only Unicode code points in the range U+0 to U+10FFFF inclusive are allowed.
White space (tab U+0009 or space U+0020) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. White space is significant in the production STRING_LITERAL_QUOTE.
Comments in N-Quads take the form of '#', outside an IRIREF or STRING_LITERAL_QUOTE, and continue to the end of line (EOL) or end of file if there is no end of line after the comment marker. Comments are treated as white space.
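The comment rule above depends on knowing whether a '#' falls inside an IRIREF or a STRING_LITERAL_QUOTE. One possible way to handle this for a single line is a small scanner that tracks which kind of token it is in; the function name `strip_comment` and the example IRIs are assumptions for this sketch.

```python
def strip_comment(line):
    """Return the line without a trailing comment.

    A '#' inside an IRI (<...>) or a quoted string is kept.
    """
    in_iri = False
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the character that follows the backslash
            elif ch == '"':
                in_string = False
        elif in_iri:
            if ch == ">":
                in_iri = False
        elif ch == '"':
            in_string = True
        elif ch == "<":
            in_iri = True
        elif ch == "#":
            # Comment marker outside IRIs and strings: drop the rest.
            return line[:i].rstrip()
        i += 1
    return line


print(strip_comment('<http://example.org/#s> <http://example.org/p> "a # b" . # note'))
```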
The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].
Escape sequence rules are the same as Turtle [turtle]. However, as only the STRING_LITERAL_QUOTE production is allowed, new lines in literals MUST be escaped.
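To illustrate how the numeric (UCHAR) and string (ECHAR) escape sequences might be processed, here is a minimal sketch. The names `unescape`, `ECHAR_MAP` and `ESCAPE_RE` are assumptions, and the mapping includes the backslash escape on the assumption that it follows Turtle's ECHAR rule, which the text above defers to.

```python
import re

# String escapes: the character after the backslash and what it stands for.
ECHAR_MAP = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# A backslash followed by uXXXX, UXXXXXXXX, or any single character (or nothing).
ESCAPE_RE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.?))", re.DOTALL
)


def unescape(text):
    """Replace numeric and string escape sequences with their characters."""
    def replace(match):
        hex4, hex8, char = match.groups()
        if hex4 or hex8:
            return chr(int(hex4 or hex8, 16))
        if char in ECHAR_MAP:
            return ECHAR_MAP[char]
        raise ValueError("invalid escape sequence: %r" % match.group(0))
    return ESCAPE_RE.sub(replace, text)


print(unescape(r"caf\u00E9"))
print(repr(unescape(r"line1\nline2")))
```

An escaped new line such as `\n` in the serialized form becomes a real line feed in the lexical form, which is how a literal containing line breaks can still fit on one statement line.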
[1] nquadsDoc ::= statement? (EOL statement)* EOL?
[2] statement ::= subject predicate object graphLabel? '.'
[3] subject ::= IRIREF | BLANK_NODE_LABEL
[4] predicate ::= IRIREF
[5] object ::= IRIREF | BLANK_NODE_LABEL | literal
[6] graphLabel ::= IRIREF
[7] literal ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG)?
Productions for terminals
[144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[8] EOL ::= [#xD#xA]+
[10] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
[11] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[12] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[153s] ECHAR ::= '\' [tbnrf"'\]
[157s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[158s] PN_CHARS_U ::= PN_CHARS_BASE | '_' | ':'
[160s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[162s] HEX ::= [0-9] | [A-F] | [a-f]
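The statement production can be approximated with a single regular expression, which may help when reading the grammar. This is a rough sketch rather than a conforming parser: the names `parse_statement` and `STATEMENT_RE` are assumptions, the blank node label pattern is deliberately loose, any backslash pair is accepted inside strings, and terms are returned in their serialized form.

```python
import re

IRIREF = r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
BNODE = r'_:[^\s<>"]+'  # simplified; looser than BLANK_NODE_LABEL
LANGTAG = r'@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*'
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRIREF + '|' + LANGTAG + ')?'
WS = r'[ \t]*'

# subject predicate object graphLabel? '.'
STATEMENT_RE = re.compile(
    WS
    + '(?P<subject>' + IRIREF + '|' + BNODE + ')' + WS
    + '(?P<predicate>' + IRIREF + ')' + WS
    + '(?P<object>' + IRIREF + '|' + BNODE + '|' + LITERAL + ')' + WS
    + '(?:(?P<graph>' + IRIREF + ')' + WS + ')?'
    + r'\.' + WS
)


def parse_statement(line):
    """Split one statement into (subject, predicate, object, graph or None)."""
    match = STATEMENT_RE.fullmatch(line)
    if match is None:
        raise ValueError("not an N-Quads statement: %r" % line)
    return (
        match.group("subject"),
        match.group("predicate"),
        match.group("object"),
        match.group("graph"),
    )


print(parse_statement(
    '<http://example.org/s> <http://example.org/p> "o"@en <http://example.org/g> .'
))
print(parse_statement('_:a <http://example.org/p> _:b .'))
```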
Parsing N-Quads requires a state of one item:
bnodeLabels: a mapping from string to blank node.
This table maps productions and lexical tokens to RDF terms or components of RDF terms listed in section 5. Parsing:
production | type | procedure |
---|---|---|
IRIREF | IRI | The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the Unicode string of the IRI. |
STRING_LITERAL_QUOTE | lexical form | The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form. |
LANGTAG | language tag | The characters following the @ form the Unicode string of the language tag. |
literal | literal | The literal has a lexical form of the first rule argument, STRING_LITERAL_QUOTE, and either a language tag of LANGTAG or a datatype IRI of IRIREF, depending on which rule matched the input. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of xsd:string. |
BLANK_NODE_LABEL | blank node | The string following "_:" is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated. |
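The bnodeLabels state described above can be sketched as a dictionary that allocates a fresh blank node the first time a label is seen. The class names `BlankNode` and `BlankNodeTable` and the method `lookup` are assumptions for illustration; an implementation is free to represent blank nodes differently.

```python
import itertools


class BlankNode:
    """Opaque blank node; the counter only makes instances easy to tell apart."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(BlankNode._ids)

    def __repr__(self):
        return "BlankNode(%d)" % self.id


class BlankNodeTable:
    """Parser state for one document: a mapping from label to blank node."""

    def __init__(self):
        self.bnode_labels = {}

    def lookup(self, token):
        if not token.startswith("_:"):
            raise ValueError("not a blank node label: %r" % token)
        label = token[2:]  # the string after '_:' is the key
        if label not in self.bnode_labels:
            # No corresponding blank node yet: allocate one.
            self.bnode_labels[label] = BlankNode()
        return self.bnode_labels[label]


table = BlankNodeTable()
first = table.lookup("_:alice")
again = table.lookup("_:alice")
other = table.lookup("_:bob")
print(first is again, first is other)
```

Using one table per document keeps the same label in two different documents from being treated as the same blank node.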
An N-Quads document defines an RDF dataset composed of RDF graphs, each composed of a set of RDF triples. The statement production produces a triple defined by the terms constructed for subject, predicate and object. This RDF triple is added to the graph labeled by the production graphLabel; if no graphLabel is present, the triple is added to the RDF dataset's default graph.
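The dataset construction described above can be illustrated with plain Python sets. In this sketch, the function name `build_dataset` is an assumption, terms are represented as plain strings, and a graph label of `None` stands for a statement with no graphLabel.

```python
from collections import defaultdict


def build_dataset(quads):
    """Group (subject, predicate, object, graph) tuples into a dataset.

    Returns (default_graph, named_graphs), where each graph is a set
    of triples and named_graphs maps a graph label to its graph.
    """
    default_graph = set()
    named_graphs = defaultdict(set)
    for subject, predicate, obj, graph in quads:
        triple = (subject, predicate, obj)
        if graph is None:
            default_graph.add(triple)  # no graphLabel: default graph
        else:
            named_graphs[graph].add(triple)
    return default_graph, dict(named_graphs)


quads = [
    ("_:a", "<http://example.org/p>", '"1"', None),
    ("_:a", "<http://example.org/p>", '"2"', "<http://example.org/g1>"),
    ("_:a", "<http://example.org/p>", '"2"', "<http://example.org/g1>"),
]
default_graph, named_graphs = build_dataset(quads)
print(len(default_graph), {g: len(t) for g, t in named_graphs.items()})
```

Because each graph is a set, a statement repeated in the document contributes only one triple to its graph.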
The Internet Media Type / MIME Type for N-Quads is "application/n-quads".
It is recommended that N-Quads files have the extension ".nq" (all lowercase) on all platforms.
It is recommended that N-Quads files stored on Macintosh HFS file systems be given a file type of "TEXT".
The information that follows will be submitted to the IESG for review, approval, and registration with IANA.