Add parsing to TriG
author Gavin Carothers <gavin@carothers.name>
Tue, 02 Jul 2013 14:46:00 -0700
changeset 884 575c6d893230
parent 883 b8994acd28ac
child 885 4dc2ae051935
child 890 bf271955ba47
trig/index.html
--- a/trig/index.html	Tue Jul 02 14:29:53 2013 -0700
+++ b/trig/index.html	Tue Jul 02 14:46:00 2013 -0700
@@ -443,12 +443,102 @@
 
 			</section>
 		</section>
-		<section id="sec-parsing">
-			<h2>Parsing</h2>
-			<p class="issue">
-				Define a method of parsing that treats each graph statement as a Turtle document. Union any graph statements that have the same label, or if they don't have a label merge to form the default graph.
-			</p>
-		</section>
+        <section id="sec-parsing"> 
+          <h2>Parsing</h2>
+
+          <p>The <a href="../rdf-concepts/index.html">RDF Concepts and Abstract Syntax</a> ([[!RDF-CONCEPTS]]) specification defines three types of <em>RDF Term</em>:
+
+          <a href="../rdf-concepts/index.html#dfn-iri">IRIs</a>,
+          <a href="../rdf-concepts/index.html#dfn-literal">literals</a> and
+          <a href="../rdf-concepts/index.html#dfn-blank-node">blank nodes</a>.
+          Literals are composed of a <a href="../rdf-concepts/index.html#dfn-lexical-form">lexical form</a> and an optional <a href="../rdf-concepts/index.html#dfn-language-tag">language tag</a> [[!BCP47]] or datatype IRI.
+          An extra type, <code id="prefix" class="dfn">prefix</code>, is used during parsing to map string identifiers to namespace IRIs.
+
+          This section maps a string conforming to the grammar in <a href="#sec-grammar-grammar" class="sectionRef"></a> to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.</p>
+          <section id="sec-parsing-state">
+          <h3>Parser State</h3>
+
+              <p>Parsing TriG requires a state of six items:</p>
+
+              <ul>
+                <li id="baseURI">IRI <code class="dfn">baseURI</code> — When the <a href="#grammar-production-base">base production</a> is reached, the second rule argument, <code>IRIREF</code>, is the base URI used for relative IRI resolution <span class="testrefs">(test: <a href="tests/#base1">base1</a> <a href="tests/#base2">base2</a>)</span>.</li>
+
+                <li id="namespaces">Map[<a class="type prefix" href="#prefix">prefix</a> -&gt; IRI] <code class="dfn">namespaces</code> — The second and third rule arguments (<code>PNAME_NS</code> and <code>IRIREF</code>) in the <a href="#grammar-production-prefixID">prefixID production</a> assign a namespace name (<code>IRIREF</code>) for the prefix (<code>PNAME_NS</code>). Outside of a <code>prefixID</code> production, any <code>PNAME_NS</code> is substituted with the namespace <span class="testrefs">(test: <a href="tests/#prefix1">prefix1</a> <!-- a href="tests/#escapedPrefix1">escapedPrefix1</a --> <a href="tests/#escapedNamespace1">escapedNamespace1</a>)</span>. Note that the prefix may be an empty string, per the <code>PNAME_NS</code> production: <code>(PN_PREFIX)? ":"</code> <span class="testrefs">(test: <a href="tests/#default1">default1</a>)</span>.</li>
+
+                <li id="bnodeLabels">Map[<a class="type string">string</a> -&gt; <a href="../rdf-concepts/index.html#dfn-blank-node">blank node</a>] <code class="dfn">bnodeLabels</code> — A mapping from string to blank node.</li>
+                <li id="curSubject">RDF_Term <code class="dfn">curSubject</code> — The <code class="curSubject">curSubject</code> is bound to the <code><a href="#grammar-production-subject">subject</a></code> production.</li>
+
+                <li id="curPredicate">RDF_Term <code class="dfn">curPredicate</code> — The <code class="curPredicate">curPredicate</code> is bound to the <code><a href="#grammar-production-verb">verb</a></code> production. If the matched token is "<code>a</code>", <code class="curPredicate">curPredicate</code> is bound to the IRI <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</code> <span class="testrefs">(test: <a href="tests/#type">type</a>)</span>.</li>
+
+                <li id="curGraph">RDF_Term <code class="dfn">curGraph</code> — The <code class="curGraph">curGraph</code> is bound to the <code><a href="#grammar-production-graphName">graphName</a></code> production.</li>
+
+              </ul>
+          </section>
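The six state items listed above can be pictured as a small record. The following Python sketch is illustrative only: the class name `ParserState`, the snake_case field names, the use of plain strings for RDF terms, and the use of `None` for the default graph are all assumptions, not part of the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class ParserState:
    """Hypothetical container for the six parser state items."""
    base_uri: Optional[str] = None                                # baseURI
    namespaces: Dict[str, str] = field(default_factory=dict)      # prefix -> IRI
    bnode_labels: Dict[str, str] = field(default_factory=dict)    # label -> blank node
    cur_subject: Optional[str] = None                             # curSubject
    cur_predicate: Optional[str] = None                           # curPredicate
    cur_graph: Optional[str] = None                               # curGraph; None = default graph (assumption)

    def set_verb(self, token: str) -> None:
        # The keyword "a" binds curPredicate to rdf:type; any other
        # token is assumed to be an already-resolved IRI string.
        self.cur_predicate = RDF_TYPE if token == "a" else token


state = ParserState()
state.namespaces[""] = "http://example.org/"   # the prefix may be the empty string
state.set_verb("a")
```

Using a dictionary keyed by the (possibly empty) prefix string keeps the empty-prefix case from needing special handling.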
+          <section  id="sec-parsing-terms">
+          <h3>RDF Term Constructors</h3>
+
+              <p>This table maps productions and lexical tokens to <code>RDF terms</code> or components of <code>RDF terms</code> listed in <a href="#sec-parsing" class="sectionRef"></a>:</p>
+
+              <table class="separated">
+                <thead>
+              <tr>                                  <th>                                                                       production               </th><th>                                                                                       type            </th><th>procedure</th></tr>
+                </thead>
+                <tbody>
+              <tr id="handle-IRIREF"    ><td style="text-align:left;"><a class="type IRI"         href="#grammar-production-IRIREF"               >IRIREF               </a></td><td><a href="../rdf-concepts/index.html#dfn-iri">      IRI         </a></td><td>The characters between "&lt;" and "&gt;" are taken, with the <a href="#numeric">numeric escape sequences</a> unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per <a href="#sec-iri-references" class="sectionRef">Section 6.3</a>.</td></tr>
+              <tr id="handle-PNAME_NS"    ><td style="text-align:left;" rowspan="2"><a class="type string" href="#grammar-production-PNAME_NS"      >PNAME_NS             </a></td><td><a href="#prefix">                                 prefix      </a></td><td>When used in a <a href="#grammar-production-prefixID">prefixID</a> or <a href="#grammar-production-sparqlPrefix">sparqlPrefix</a> production, the <code>prefix</code> is the potentially empty unicode string matching the first argument of the rule; it is a key into the <a href="#namespaces">namespaces map</a>.</td></tr>
+              <tr id="handle-PNAME_NS"    >                                                                                                                                        <td><a href="../rdf-concepts/index.html#dfn-iri">      IRI         </a></td><td>When used in a <a href="#grammar-production-PrefixedName">PrefixedName</a> production, the <code>iri</code> is the value in the <a href="#namespaces">namespaces map</a> corresponding to the first argument of the rule.</td></tr>
+              <tr id="handle-PNAME_LN"    ><td style="text-align:left;"><a class="type IRI"         href="#grammar-production-PNAME_LN"             >PNAME_LN             </a></td><td><a href="../rdf-concepts/index.html#dfn-iri">      IRI         </a></td><td>A potentially empty <a href="#prefix">prefix</a> is identified by the first sequence, <code>PNAME_NS</code>. The <a href="#namespaces">namespaces map</a> <em class="rfc2119">MUST</em> have a corresponding <code>namespace</code>. The unicode string of the IRI is formed by unescaping the <a href="#reserved">reserved characters</a> in the second argument, <code>PN_LOCAL</code>, and concatenating this onto the <code>namespace</code>.</td></tr>
+              <!-- tr id="handle-PrefixedName"><td style="text-align:left;"><a class="type IRI"         href="#grammar-production-PrefixedName"         >PrefixedName         </a></td><td><a href="../rdf-concepts/index.html#dfn-iri">      IRI         </a></td><td>.</td></tr -->
+              <tr id="handle-STRING_LITERAL_SINGLE_QUOTE"         ><td style="text-align:left;"><a class="type lexicalForm" href="#grammar-production-STRING_LITERAL_SINGLE_QUOTE"      >STRING_LITERAL_SINGLE_QUOTE      </a></td><td><a href="../rdf-concepts/index.html#dfn-lexical-form">                         lexical form</a></td><td>The characters between the outermost "'"s   are taken, with <a href="#numeric">numeric</a> and <a href="#string">string</a> escape sequences unescaped, to form the unicode string of a lexical form.</td></tr>
+              <tr id="handle-STRING_LITERAL_QUOTE"         ><td style="text-align:left;"><a class="type lexicalForm" href="#grammar-production-STRING_LITERAL_QUOTE"      >STRING_LITERAL_QUOTE      </a></td><td><a href="../rdf-concepts/index.html#dfn-lexical-form">                         lexical form</a></td><td>The characters between the outermost '"'s   are taken, with <a href="#numeric">numeric</a> and <a href="#string">string</a> escape sequences unescaped, to form the unicode string of a lexical form.</td></tr>
+              <tr id="handle-STRING_LITERAL_LONG_SINGLE_QUOTE"    ><td style="text-align:left;"><a class="type lexicalForm" href="#grammar-production-STRING_LITERAL_LONG_SINGLE_QUOTE" >STRING_LITERAL_LONG_SINGLE_QUOTE </a></td><td><a href="../rdf-concepts/index.html#dfn-lexical-form">                         lexical form</a></td><td>The characters between the outermost "'''"s are taken, with <a href="#numeric">numeric</a> and <a href="#string">string</a> escape sequences unescaped, to form the unicode string of a lexical form.</td></tr>
+              <tr id="handle-STRING_LITERAL_LONG_QUOTE"    ><td style="text-align:left;"><a class="type lexicalForm" href="#grammar-production-STRING_LITERAL_LONG_QUOTE" >STRING_LITERAL_LONG_QUOTE </a></td><td><a href="../rdf-concepts/index.html#dfn-lexical-form">                         lexical form</a></td><td>The characters between the outermost '"""'s are taken, with <a href="#numeric">numeric</a> and <a href="#string">string</a> escape sequences unescaped, to form the unicode string of a lexical form.</td></tr>
+              <tr id="handle-LANGTAG"                 ><td style="text-align:left;"><a class="type langTag"     href="#grammar-production-LANGTAG"              >LANGTAG              </a></td><td><a href="../rdf-concepts/index.html#dfn-language-tag">language tag</a></td><td>The characters following the <code>@</code> form the unicode string of the language tag.</td></tr>
+              <tr id="handle-RDFLiteral"              ><td style="text-align:left;"><a class="type literal"     href="#grammar-production-RDFLiteral"           >RDFLiteral           </a></td><td><a href="../rdf-concepts/index.html#dfn-literal">            literal     </a></td><td>The literal has a lexical form of the first rule argument, <code>String</code>, and either a language tag of <code>LANGTAG</code> or a datatype IRI of <code>iri</code>, depending on which rule matched the input. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of <code>xsd:string</code>.</td></tr>
+              <tr id="handle-INTEGER"                 ><td style="text-align:left;"><a class="type integer"     href="#grammar-production-INTEGER"              >INTEGER              </a></td><td><a href="../rdf-concepts/index.html#dfn-literal">            literal     </a></td><td>The literal has a lexical form of the input string, and a datatype of <code>xsd:integer</code>.</td></tr>
+              <tr id="handle-DECIMAL"                 ><td style="text-align:left;"><a class="type decimal"     href="#grammar-production-DECIMAL"              >DECIMAL              </a></td><td><a href="../rdf-concepts/index.html#dfn-literal">            literal     </a></td><td>The literal has a lexical form of the input string, and a datatype of <code>xsd:decimal</code>.</td></tr>
+              <tr id="handle-DOUBLE"                  ><td style="text-align:left;"><a class="type double"      href="#grammar-production-DOUBLE"               >DOUBLE               </a></td><td><a href="../rdf-concepts/index.html#dfn-literal">            literal     </a></td><td>The literal has a lexical form of the input string, and a datatype of <code>xsd:double</code>.</td></tr>
+              <tr id="handle-BooleanLiteral"          ><td style="text-align:left;"><a class="type boolean"     href="#grammar-production-BooleanLiteral"       >BooleanLiteral       </a></td><td><a href="../rdf-concepts/index.html#dfn-literal">            literal     </a></td><td>The literal has a lexical form of <code>true</code> or <code>false</code>, depending on which matched the input, and a datatype of <code>xsd:boolean</code>.</td></tr>
+              <tr id="handle-BLANK_NODE_LABEL"        ><td style="text-align:left;"><a class="type bNode"       href="#grammar-production-BLANK_NODE_LABEL"     >BLANK_NODE_LABEL     </a></td><td><a href="../rdf-concepts/index.html#dfn-blank-node">         blank node  </a></td><td>The string matching the second argument, <code>PN_LOCAL</code>, is a key in <a href="#bnodeLabels">bnodeLabels</a>. If there is no corresponding blank node in the map, one is allocated.</td></tr>
+              <tr id="handle-ANON"                    ><td style="text-align:left;"><a class="type bNode"       href="#grammar-production-ANON"                 >ANON                 </a></td><td><a href="../rdf-concepts/index.html#dfn-blank-node">         blank node  </a></td><td>A blank node is generated.</td></tr>
+              <tr id="handle-blankNodePropertyList"   ><td style="text-align:left;"><a class="type bNode"       href="#grammar-production-blankNodePropertyList">blankNodePropertyList</a></td><td><a href="../rdf-concepts/index.html#dfn-blank-node">         blank node  </a></td><td>A blank node is generated. Note the rules for <code>blankNodePropertyList</code> in the next section.</td></tr>
+              <tr id="handle-collection"              ><td style="text-align:left;" rowspan="2"><a class="type bNode"       href="#grammar-production-collection"           >collection           </a></td><td><a href="../rdf-concepts/index.html#dfn-blank-node">         blank node  </a></td><td>For non-empty lists, a blank node is generated. Note the rules for <code>collection</code> in the next section.</td></tr>
+              <tr id="handle-collection"              >                                                                                                                                                    <td><a href="../rdf-concepts/index.html#dfn-iri"       >         IRI         </a></td><td>For empty lists, the resulting IRI is <code>rdf:nil</code>. Note the rules for <code>collection</code> in the next section.</td></tr>
+                </tbody>
+              </table>
+
+          </section>
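Two rows of the table above, `PNAME_LN` and `BLANK_NODE_LABEL`, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the string representation of IRIs and blank nodes, and the `_:bN` naming scheme are hypothetical; the set of reserved characters follows the Turtle `PN_LOCAL_ESC` production.

```python
import itertools
import re
from typing import Dict

_bnode_ids = itertools.count()  # hypothetical source of fresh blank node ids

# Backslash followed by one reserved character (Turtle PN_LOCAL_ESC).
_RESERVED_ESCAPE = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def expand_pname_ln(pname: str, namespaces: Dict[str, str]) -> str:
    """Build the IRI string for a PNAME_LN token such as 'ex:name'."""
    prefix, local = pname.split(":", 1)   # the prefix may be empty
    if prefix not in namespaces:
        # The namespaces map MUST contain the prefix.
        raise KeyError(f"undeclared prefix: {prefix!r}")
    local = _RESERVED_ESCAPE.sub(r"\1", local)   # unescape reserved characters
    return namespaces[prefix] + local


def bnode_for_label(label: str, bnode_labels: Dict[str, str]) -> str:
    """Look up a blank node by label, allocating one on first use."""
    if label not in bnode_labels:
        bnode_labels[label] = f"_:b{next(_bnode_ids)}"
    return bnode_labels[label]
```

Raising an error for an undeclared prefix is one possible reading of the MUST in the `PNAME_LN` row; a real parser would report this through its own error mechanism.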
+          <section id="sec-parsing-triples">
+          <h3>RDF Triples Constructors</h3>
+              <p>
+		A TriG document defines an RDF Dataset composed of a set of <a href="../rdf-concepts/index.html#dfn-rdf-graph">RDF graph</a>s, each composed of a set of <a href="../rdf-concepts/index.html#dfn-rdf-triple">RDF triple</a>s.
+
+		The <code><a href="#grammar-production-subject">graph</a></code> production sets the <code class="curGraph">curGraph</code> to the term produced by <code><a href="#grammar-production-graphName">graphName</a></code>; if there is no <code>graphName</code>, the <code class="curGraph">curGraph</code> is set to the default graph.
+
+
+		The <code><a href="#grammar-production-subject">subject</a></code> production sets the <code class="curSubject">curSubject</code>.
+		The <code><a href="#grammar-production-verb">verb</a></code> production sets the <code class="curPredicate">curPredicate</code>.
+		Each <a tabindex="30" class="grammarRef" href="#grammar-production-object">object</a> <code>N</code> in the document produces an RDF triple: <span class="ntriple"><code class="curSubject">curSubject</code> <code class="curPredicate">curPredicate</code> <code>N</code> .</span>
+          </p>
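The way the graph, subject, verb and object productions cooperate can be sketched as a set of event handlers. The Python below is a hedged illustration: the `Emitter` class, its method names, and the use of `None` for the default graph are assumptions, and recording `curGraph` alongside each triple is one plausible way to build the dataset rather than something this section prescribes.

```python
from typing import List, Optional, Tuple

# (subject, predicate, object, graph); graph None = default graph (assumption)
Quad = Tuple[str, str, str, Optional[str]]


class Emitter:
    """Hypothetical handler that turns grammar events into quads."""

    def __init__(self) -> None:
        self.cur_graph: Optional[str] = None
        self.cur_subject: Optional[str] = None
        self.cur_predicate: Optional[str] = None
        self.quads: List[Quad] = []

    def on_graph(self, graph_name: Optional[str] = None) -> None:
        # No graphName means the default graph.
        self.cur_graph = graph_name

    def on_subject(self, term: str) -> None:
        self.cur_subject = term

    def on_verb(self, term: str) -> None:
        self.cur_predicate = term

    def on_object(self, term: str) -> None:
        # Each object N produces: curSubject curPredicate N .
        self.quads.append(
            (self.cur_subject, self.cur_predicate, term, self.cur_graph)
        )


# Events for:  <g> { <s> <p> <o1>, <o2> }
emitter = Emitter()
emitter.on_graph("<g>")
emitter.on_subject("<s>")
emitter.on_verb("<p>")
emitter.on_object("<o1>")
emitter.on_object("<o2>")
```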
+
+	  <h4 id="propertyList" style="padding-bottom:0; margin-bottom:0;"><span>Property Lists:</span></h4>
+          <p style="padding-top:0; margin-top:0;">
+          Beginning the <code><a href="#grammar-production-blankNodePropertyList">blankNodePropertyList</a></code> production records the <code class="curSubject">curSubject</code> and <code class="curPredicate">curPredicate</code>, and sets <code class="curSubject">curSubject</code> to a novel <code>blank node</code> <code>B</code>.
+          Finishing the <code><a href="#grammar-production-blankNodePropertyList">blankNodePropertyList</a></code> production restores <code class="curSubject">curSubject</code> and <code class="curPredicate">curPredicate</code>.
+          The node produced by matching <code><a href="#grammar-production-blankNodePropertyList">blankNodePropertyList</a></code> is the blank node <code>B</code>.
+
+          </p>
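The record and restore behaviour described for `blankNodePropertyList` maps naturally onto a stack. The following Python sketch assumes hypothetical names (`State`, `begin_bnode_property_list`, `end_bnode_property_list`) and string-valued terms; it is not taken from any particular parser.

```python
import itertools
from typing import List, Optional, Tuple

Triple = Tuple[Optional[str], Optional[str], str]


class State:
    """Hypothetical parser state with a save stack for nested productions."""

    def __init__(self) -> None:
        self.cur_subject: Optional[str] = None
        self.cur_predicate: Optional[str] = None
        self.triples: List[Triple] = []
        self._saved: List[Tuple[Optional[str], Optional[str]]] = []
        self._ids = itertools.count()

    def begin_bnode_property_list(self) -> str:
        # Record curSubject/curPredicate, then switch to a novel blank node B.
        self._saved.append((self.cur_subject, self.cur_predicate))
        b = f"_:b{next(self._ids)}"
        self.cur_subject = b
        return b

    def end_bnode_property_list(self) -> None:
        # Restore the values recorded when the production began.
        self.cur_subject, self.cur_predicate = self._saved.pop()

    def emit(self, obj: str) -> None:
        self.triples.append((self.cur_subject, self.cur_predicate, obj))


# Events for:  <s> <p> [ <q> "v" ] .
state = State()
state.cur_subject, state.cur_predicate = "<s>", "<p>"
b = state.begin_bnode_property_list()
state.cur_predicate = "<q>"
state.emit('"v"')
state.end_bnode_property_list()
state.emit(b)   # the node produced by the production is B
```

A stack, rather than a single saved pair, lets property lists nest to any depth.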
+
+	  <h4 id="collection" style="padding-bottom:0; margin-bottom:0;"><span>Collections:</span></h4>
+          <p style="padding-top:0; margin-top:0;">
+          Beginning the <code><a href="#grammar-production-collection">collection</a></code> production records the <code class="curSubject">curSubject</code> and <code class="curPredicate">curPredicate</code>.
+	  Each <code>object</code> in the <code><a href="#grammar-production-collection">collection</a></code> production has a <code class="curSubject">curSubject</code> set to a novel <code>blank node</code> <code>B</code> and a <code class="curPredicate">curPredicate</code> set to <code>rdf:first</code>.
+          Each object <code>object<sub>n</sub></code> after the first produces a triple: <span class="ntriple"><code>object<sub>n-1</sub></code> <code>rdf:rest</code> <code>object<sub>n</sub></code> .</span>
+          Finishing the <code><a href="#grammar-production-collection">collection</a></code> production creates an additional triple <span class="ntriple"><code>curSubject rdf:rest rdf:nil</code> .</span> and restores <code class="curSubject">curSubject</code> and <code class="curPredicate">curPredicate</code>.
+          The node produced by matching <code><a href="#grammar-production-collection">collection</a></code> is the first blank node <code>B</code> for non-empty lists and <code>rdf:nil</code> for empty lists.
+          </p>
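The collection rules can be illustrated with a small Python function that returns the node for the collection together with the triples it generates. This sketch assumes the conventional RDF list encoding, in which `rdf:rest` links the blank nodes allocated for successive objects; the function name, the `new_bnode` callback, and the string representation of terms are hypothetical.

```python
import itertools
from typing import Callable, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"

Triple = Tuple[str, str, str]


def collection_triples(
    objects: List[str], new_bnode: Callable[[], str]
) -> Tuple[str, List[Triple]]:
    """Return (node for the collection, triples it generates)."""
    if not objects:
        return RDF_NIL, []                      # empty list is rdf:nil
    nodes = [new_bnode() for _ in objects]      # one novel blank node per object
    triples: List[Triple] = []
    for i, (node, obj) in enumerate(zip(nodes, objects)):
        triples.append((node, RDF_FIRST, obj))
        # Link to the next cell, or close the list with rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((node, RDF_REST, rest))
    return nodes[0], triples


counter = itertools.count()
head, triples = collection_triples(["1", "2"], lambda: f"_:b{next(counter)}")
```

The caller would then use `head` as the object (or subject) of the enclosing statement, after restoring `curSubject` and `curPredicate`.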
+          </section>
+      </section>
       <section id="sec-differences" class="appendix informative">
       	<h2>Differences from previous TriG</h2>
       	<ul>