Merge in N-Triples
author Gavin Carothers <gavin@carothers.name>
Sat, 10 Dec 2011 20:43:42 -0800 (2011-12-11)
changeset 193 1621235cc65b
parent 192 199e7b239562 (current diff)
parent 138 7c5b7a87c25a (diff)
child 194 01805fdf2d0e
rdf-turtle/index.html
rdf-turtle/n-prime.bnf
--- a/rdf-turtle/index.html	Sat Dec 10 20:32:09 2011 -0800
+++ b/rdf-turtle/index.html	Sat Dec 10 20:43:42 2011 -0800
@@ -934,6 +934,47 @@
           no effect on the parsing of the data blocks.</p>
         </section>
       </section>
+        <!-- BEGIN N-TRIPLES SPEC -->
+
+      <section id="sec-triples">
+        <h2>N′ (N-Triples)</h2>
+        <p class="issue">The RDF WG has not settled on a name for N′; N′ is used here to distinguish it from the RDF Test Cases N-Triples.</p>
+        <p class="issue"><a href="http://www.w3.org/2011/rdf-wg/track/issues/4">ISSUE-4</a>: The RDF WG has specified N-Triples Prime to allow UTF-8 characters in IRIs, literals and blank node identifiers. Readers with an opinion about whether or not N-Triples should be ASCII-only may wish to comment.</p>
+        <p>This section defines an easy-to-parse, line-based subset of
+Turtle named N-Triples Prime (<abbr title="N-Triples Prime">N′</abbr>).</p>
+        <p>The syntax is an improved version of N-Triples. N-Triples is a format
+originally defined in the RDF Test Cases [[!RDF-TESTCASES]] document. It was intended for
+writing test cases, but it has proven popular as a dump format for RDF
+data.</p>
+        <section id="n-triple-changes" class="informative">
+          <h3>Changes from N-Triples</h3>
+          <ul>
+            <li>Default encoding is UTF-8 rather than US-ASCII only
+            <li>Uses IRIs rather than RDF URI References
+            <li>Defines a unique media type <code>text/ntriples+turtle</code>
+            <li>Subset of Turtle rather than Notation 3
+            <li>Comments may occur after a triple production
+          </ul>
+        </section>
+        <section id="n-triples-compatibility"  class="informative">
+          <h3>Compatibility with previous N-Triples</h3>
+          <p>
+          An N-Triples document written using only absolute IRIs is a valid N′ document generating the same triples. N-Triples uses <code>\u</code> escape sequences for characters outside US-ASCII, and processing turns these back into the original characters. An N′ document that is serialized into ASCII and uses <code>\u</code> escape sequences for every character outside US-ASCII should be equivalent to an N-Triples document. Comments may also be removed to avoid differences in where comments are allowed.
+          </p>
+        </section>
+        <section id="n-triple-grammar">
+          <h3>Grammar</h3>
+          <p>An N′ document is a Unicode [[!UNICODE]] character string encoded in UTF-8.
+          Only Unicode code points in the range U+0 to U+10FFFF inclusive are allowed.</p>
+          <p><a href="#sec-strings">Escape sequence rules</a> are the same as in Turtle. However, as only the <code>STRING_LITERAL2</code> production is allowed, new lines in literals MUST be escaped.</p>
+          <p class="issue">Current grammar doesn't deal with comments correctly.</p>
+          <p class="issue">Current grammar doesn't deal with triple lines that end in EOF rather than EOL.</p>
+          <pre data-include="n-prime.bnf" data-oninclude="esc">
+          </pre>
+        </section>
+      </section>
+
+<!-- END N-TRIPLES SPEC -->
 
       <section id="sec-compared">
         <h2>Turtle compared</h2>
@@ -941,7 +982,6 @@
       <section id="sec-diff-ntriples" class="informative">
         <h3>Turtle compared to N-Triples (Informative)</h3>
        
-        <p class="issue"><a href="http://www.w3.org/2011/rdf-wg/track/issues/4">ISSUE-4</a>: A future version of this document is expected to define N-Triples. By default, the RDF WG will specify N-Triples to allow UTF-8 characters in IRIs, literals and blank node identifiers. Readers with an opinion about whether or not N-Triples should be ASCII-only may wish to comment.</p>
         <p>All N-Triples files are valid Turtle documents. Turtle adds the following syntax to N-Triples:</p>
 
         <ol>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/rdf-turtle/n-prime.bnf	Sat Dec 10 20:43:42 2011 -0800
@@ -0,0 +1,45 @@
+primeTriplesDoc         ::= triple? (EOL triple)* EOL?
+triple                  ::= subj pred obj '.'
+subj                    ::= IRI_REF | BLANK_NODE_LABEL
+pred                    ::= IRI_REF 
+obj                     ::= IRI_REF | BLANK_NODE_LABEL | lit
+lit                     ::= STRING_LITERAL2 ('^^' IRI_REF | ('@' LANG) )?
+
+@terminals
+LANG                    ::= [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*
+
+EOL                     ::= [\r\n]+
+
+/* From Turtle */
+[70s] IRI_REF           ::= '<' ([^<>"{}|^`\]-[#x00-#x20] | UCHAR )* '>'
+[73s] BLANK_NODE_LABEL  ::= "_:" PN_LOCAL 
+[88s] STRING_LITERAL2   ::= '"' ( ( [^\"\\\n\r] ) | ECHAR | UCHAR )* '"' 
+[19]  UCHAR             ::= ( "\\u" HEX HEX HEX HEX ) 
+                          | ( "\\U" HEX HEX HEX HEX HEX HEX HEX HEX ) 
+[91s] ECHAR             ::= "\\" [tbnrf\\\"'] 
+[95s] PN_CHARS_BASE     ::= [A-Z] 
+                          | [a-z] 
+                          | [#00C0-#00D6] 
+                          | [#00D8-#00F6] 
+                          | [#00F8-#02FF] 
+                          | [#0370-#037D] 
+                          | [#037F-#1FFF] 
+                          | [#200C-#200D] 
+                          | [#2070-#218F] 
+                          | [#2C00-#2FEF] 
+                          | [#3001-#D7FF] 
+                          | [#F900-#FDCF] 
+                          | [#FDF0-#FFFD] 
+                          | [#10000-#EFFFF] 
+[96s] PN_CHARS_U        ::= PN_CHARS_BASE 
+                          | "_" 
+[98s] PN_CHARS          ::= PN_CHARS_U 
+                          | "-" 
+                          | [0-9] 
+                          | #00B7 
+                          | [#0300-#036F] 
+                          | [#203F-#2040] 
+[100s] PN_LOCAL         ::= ( PN_CHARS_U | [0-9] ) ( ( PN_CHARS | "." )* PN_CHARS )?
+[162s] HEX              ::= [0-9] | [A-F] | [a-f]
+
+@pass                   ::= [ \t]+ | "#" [^\r\n]* [\r\n]
\ No newline at end of file
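
As a rough illustration of the `triple` production in `n-prime.bnf`, one line of N′ could be matched with regular expressions along the lines of the Python sketch below. This is not part of the changeset. The names `parse_line` and `TRIPLE_RE` are hypothetical, blank node labels are restricted to ASCII as a simplifying assumption (the real `PN_LOCAL` allows many more characters), and escape sequences are recognized but not decoded.

```python
import re

# Simplified terminals that approximate the grammar; not a full implementation.
UCHAR = r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}'
IRI_REF = rf'<(?:[^\x00-\x20<>"{{}}|^`\\]|{UCHAR})*>'
# Assumption: ASCII-only labels; PN_LOCAL in the grammar is much broader.
BLANK_NODE_LABEL = r'_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'
STRING_LITERAL2 = rf'"(?:[^"\\\n\r]|\\[tbnrf\\"\']|{UCHAR})*"'
LANG = r'[a-zA-Z]+(?:-[a-zA-Z0-9]+)*'
LIT = rf'{STRING_LITERAL2}(?:\^\^{IRI_REF}|@{LANG})?'

# triple ::= subj pred obj '.', with optional whitespace and a trailing comment.
TRIPLE_RE = re.compile(
    rf'[ \t]*(?P<subj>{IRI_REF}|{BLANK_NODE_LABEL})'
    rf'[ \t]*(?P<pred>{IRI_REF})'
    rf'[ \t]*(?P<obj>{IRI_REF}|{BLANK_NODE_LABEL}|{LIT})'
    rf'[ \t]*\.[ \t]*(?:#.*)?'
)


def parse_line(line):
    """Return (subj, pred, obj) for one line, or None for blank/comment lines."""
    stripped = line.strip(' \t\r\n')
    if not stripped or stripped.startswith('#'):
        return None
    match = TRIPLE_RE.fullmatch(stripped)
    if match is None:
        raise ValueError(f'not a triple line: {line!r}')
    return match.group('subj'), match.group('pred'), match.group('obj')
```

Because the format is line-based, a caller can feed a file to `parse_line` one line at a time and skip the `None` results; a line that ends at EOF without an EOL is handled the same way as any other line in this sketch.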