Re-worked sections 5.11.3 and 5.11.4 (Normalization and Node Relabeling).
authorManu Sporny <msporny@digitalbazaar.com>
Sun, 14 Aug 2011 19:27:20 -0400
changeset 158 df64cda434cf
parent 157 1f7865f8e089
child 159 a69dc8a1f0a3
Re-worked sections 5.11.3 and 5.11.4 (Normalization and Node Relabeling).
spec/latest/index.html
--- a/spec/latest/index.html	Sun Aug 14 12:44:49 2011 -0400
+++ b/spec/latest/index.html	Sun Aug 14 19:27:20 2011 -0400
@@ -2226,6 +2226,14 @@
 <section>
 <h3>Normalization Algorithm Terms</h3>
  <dl>
+   <dt><tdef>label</tdef></dt>
+   <dd>
+     The subject IRI associated with a graph node. The subject IRI is expressed
+     using a key-value pair in a <tref>JSON object</tref> where the key is
+     <code>@subject</code> and the value is a string that is an IRI or 
+     a <tref>JSON object</tref> containing the key <code>@iri</code> and
+     a value that is a string that is an IRI.
+   </dd>
    <dt><tdef>list of expanded nodes</tdef></dt>
    <dd>
      A list of all nodes in the <tref>JSON-LD input</tref> graph containing no
@@ -2238,25 +2246,6 @@
      second nodes or values being examined in an algorithm. The names are
      merely used to refer to each input value to a comparison algorithm.
    </dd>
-   <dt><tdef>labeling prefix</tdef></dt>
-   <dd>
-     A random base string that starts with the characters <code>_:</code>, is 
-     not used by any other node's subject IRI in the <tref>JSON-LD input</tref>, 
-     and does not start with the characters
-     <code>_:c14n</code>. The prefix is used to temporarily name nodes during
-     the normalization algorithm in a way that doesn't collide with the
-     names that already exist as well as the names that will be generated by the
-     normalization algorithm. To prevent further collisions, an incrementing
-     <tdef>labeling counter</tdef> is appended to the base string. For
-     example, <code>_:j8r3k</code> is a proper <tref>labeling prefix</tref>.
-   </dd>
-   <dt><tdef>deterministic labeling prefix</tdef></dt>
-   <dd>
-     A base string with the value <code>_:c14n</code>. The prefix is used to 
-     deterministically name nodes during the normalization algorithm. The
-     <tdef>deterministic labeling counter</tdef> is appended to the base string
-     to ensure that there are no collisions.
-   </dd>
    <dt><tdef>renaming counter</tdef></dt>
    <dd>
      A counter that is used during the 
@@ -2278,77 +2267,139 @@
 </section>
 
 <section>
+<h3>Normalization State</h3>
+
+<p>When performing the steps required by the normalization algorithm, 
+it is helpful to track the many pieces of information in a
+data structure called the <tdef>normalization state</tdef>. The information
+contained in the <tref>normalization state</tref> is described below.</p>
+
+<dl>
+   <dt><tdef>labeling prefix</tdef></dt>
+   <dd>
+     A random base string that starts with the characters <code>_:</code>, is 
+     not used by any other node's <tref>label</tref> in the <tref>JSON-LD input</tref>, 
+     and does not start with the characters
+     <code>_:c14n</code>. The prefix is used to temporarily name nodes during
+     the normalization algorithm in a way that doesn't collide with the
+     names that already exist as well as the names that will be generated by the
+     normalization algorithm. To prevent further collisions, an incrementing
+     <tref>labeling counter</tref> is appended to the base string. For
+     example, <code>_:j8r3k</code> is a proper <tref>labeling prefix</tref>.
+   </dd>
+   <dt><tdef>deterministic labeling prefix</tdef></dt>
+   <dd>
+     A base string with the value <code>_:c14n</code>. The prefix is used to 
+     deterministically name nodes during the normalization algorithm. The
+     <tref>deterministic labeling counter</tref> is appended to the base string
+     to ensure that there are no collisions.
+   </dd>
+   <dt><tdef>labeling counter</tdef></dt>
+   <dd>
+     A counter that is used to re-label nodes that are not labeled, or that
+     conflict with the canonicalization naming scheme.
+   </dd>
+   <dt><tdef>deterministic labeling counter</tdef></dt>
+   <dd>
+     A counter this is used to re-label nodes in a deterministic manner for
+     the normalization output.
+   </dd>
+   <dt><tdef>map of flattened nodes</tdef></dt>
+   <dd>
+     A map containing a representation of all nodes in the graph where the
+     key is a node <tref>label</tref> and the value is a single
+     <tref>JSON object</tref> that has no nested sub-objects 
+     and has had all properties for the same node merged into a single
+     <tref>JSON object</tref>.
+   </dd>
+   <dt><tdef>incoming map</tdef></dt>
+   <dd>
+     Relates graph node IRIs to every other node that refers to them in the 
+     graph. For example, if a node <tref>alpha</tref> refers to a
+     node <tref>beta</tref> via a property, the key in the 
+     <tref>incoming map</tref> is the <tref>label</tref> for <tref>beta</tref> and 
+     the value is an array containing at least node <tref>alpha</tref>.
+   </dd>
+   <dt><tdef>outgoing map</tdef></dt>
+   <dd>
+     Relates graph node IRIs to the target nodes that they reference. For 
+     example, if a node <tref>alpha</tref> refers to a node <tref>beta</tref> 
+     via a property, the key in the <tref>outgoing map</tref> is the subject 
+     IRI of <tref>alpha</tref> and the value is an array containing at least
+     <tref>beta</tref>.
+   </dd>
+</dl>
+
+</section>
+
+<section>
 <h3>Normalization Algorithm</h3>
 
 <p>The normalization algorithm expands the <tref>JSON-LD input</tref>,
 flattens the data structure, and creates an initial set of names for all
 nodes in the graph. The flattened data structure is then processed by a
 node labeling algorithm in order to get a fully expanded and named list of
-nodes which is then sorted.
+nodes which is then sorted. The result is a deterministically named and
+ordered list of graph nodes.
 </p>
 
 <ol class="algorithm">
+<li>Create a <tref>normalization state</tref> and initialize all array
+  values to an empty array, all map values to an empty map, all
+  string values to an empty string and all number values to <code>0</code>.</li>
 <li>Expand the <tref>JSON-LD input</tref> according to the steps in
 the <a href="#expansion-algorithm">Expansion Algorithm</a> and store the
 result as the <strong>expanded input</strong>.</li>
 <li>Initialize the <tref>labeling prefix</tref> to a value that doesn't
-collide with any other subject IRI prefix in the graph. Initialize the
+collide with any other <tref>label</tref> prefix in the graph. Initialize the
 <tref>labeling counter</tref> to 1.
-<li>Create an empty <tdef>map of expanded nodes</tdef> and recursively 
-process every <tdef>expanded node</tdef> in the <strong>expanded input</strong>
-in depth-first order:
+<li>Initialize the <tref>map of flattened nodes</tref> by recursively 
+processing every <tdef>expanded node</tdef> in the 
+<strong>expanded input</strong> in depth-first order:
   <ol class="algorithm">
     <li>If the <tref>expanded node</tref> is an unlabeled node, add a 
-      new key-value pair 
+      new key-value pair to the <tref>expanded node</tref>
       where the key is <code>@subject</code> and the value is the 
       concatenation of the <tref>labeling prefix</tref> 
-      and the string value of the <tref>labeling counter</tref>.</li>
+      and the string value of the <tref>labeling counter</tref>.
+      Increment the <tref>labeling counter</tref>.</li>
     <li>Add the <tref>expanded node</tref> to the 
-      <tref>map of expanded nodes</tref>:
+      <tref>map of flattened nodes</tref>:
       <ol class="algorithm">
-        <li>If the <tref>expanded node</tref>'s subject IRI is already in the 
-          <tref>map of expanded nodes</tref> merge all properties from the 
-          entry in the <tref>map of expanded nodes</tref> into the 
+        <li>If the <tref>expanded node</tref>'s <tref>label</tref> is already 
+          in the 
+          <tref>map of flattened nodes</tref> merge all properties from the 
+          entry in the <tref>map of flattened nodes</tref> into the 
           <tref>expanded node</tref>.</li>
         <li>Go through every property associated with an array in the 
           <tref>expanded node</tref> and remove any duplicate IRI entries from
-          the array. If resulting array only has one IRI entry, change it 
+          the array. If the resulting array only has one IRI entry, change it 
           from an array to an object.</li>
-        <li>Set the entry for the <tref>expanded node</tref>'s subject IRI
-          in the <tref>map of expanded nodes</tref> to the
+        <li>Set the entry for the <tref>expanded node</tref>'s <tref>label</tref>
+          in the <tref>map of flattened nodes</tref> to the
           <tref>expanded node</tref>.
         </li></ol></li>
-    <li>Replace the reference to the node in the 
-      <strong>expanded input</strong> with an object containing a single
+    <li>After exiting the recursive step, replace the reference to the
+      <tref>expanded node</tref> with an object containing a single
        key-value pair where the key is <code>@iri</code> and the value is
        the value of the <code>@subject</code> key in the node.</li>
   </ol></li>
-<li>Create a <tdef>outgoing map</tdef> that relates graph
-  node IRIs to the targets nodes that they reference. For example, if a 
-  node <tref>alpha</tref> refers to a node <tref>beta</tref> via a property, 
-  the key in the <tref>outgoing map</tref> is the subject IRI of 
-  <tref>alpha</tref> and the value is an array containing at least
-  <tref>beta</tref>.
-  <ol class="algorithm">
-    <li>Add all outgoing references for every node in the graph.</li>
-  </ol></li>
-<li>Create a <tdef>incoming map</tdef> that relates graph node IRIs
-  to every other node that refers to them in the graph.
-  For example, if a node <tref>alpha</tref> refers to a
-  node <tref>beta</tref> via a property, the key in the 
-  <tref>incoming map</tref> is the subject IRI for <tref>beta</tref> and 
-  the value is an array containing at least <tref>alpha</tref>.
-  <ol class="algorithm">
-    <li>Add all incoming references for every node in the graph.</li>
-  </ol></li>
-
-<li>For every entry in the <tref>map of expanded nodes</tref> that has a 
-subject IRI that begins with <code>_:c14n</code>, relabel the subject IRI
+<li>Initialize the <tref>outgoing map</tref> by iterating over every object
+  associated with a property in the node, and if it is another node,
+  adding the association to the list of nodes associated with the current
+  node.</li>
+<li>Initialize the <tref>incoming map</tref> by iterating over every 
+  node in the graph and mapping its <tref>label</tref> to the list of all
+  other nodes that refer to it.</li>
+<li>For every entry in the <tref>map of flattened nodes</tref> that has a 
+<tref>label</tref> that begins with <code>_:c14n</code>, relabel the <tref>label</tref>
 using the steps in the 
 <a href="#node-relabeling-algorithm">Node Relabeling Algorithm</a> with the
-following parameters; <tref>naming base string</tref> should be the value
-of <tref>labeling prefix</tref> and <tref>renaming counter</tref> should be the
-value of <tref>labeling counter</tref>.
+following parameters; the <tref>normalization state</tref>,
+the <tref>old label</tref> should be the value of <tref>label</tref>,
+the <tref>naming base string</tref> should be the value
+of <tref>labeling prefix</tref> and the <tref>renaming counter</tref> should be 
+the value of <tref>labeling counter</tref>.
 <li>Label all of the nodes that contain a <code>@subject</code> key associated
 with a value starting with <code>_:</code> according to the steps in the 
 <a href="#deterministic-labeling-algorithm">Deterministic Labeling Algorithm</a>.
@@ -2359,24 +2410,40 @@
 <section>
 <h4>Node Relabeling Algorithm</h4>
 
-<p>The node relabeling algorithm takes a node containing an 
-<tdef>old label</tdef> that is the subject IRI, a 
-<tdef>naming base string</tdef>, a <tref>renaming counter</tref>, and the 
-<tref>incoming map</tref> and <tref>outgoing map</tref>.
-</p>
+<p>This algorithm renames a node by generating a unique 
+<tdef>new label</tdef> and updating all references to that <tref>label</tref>
+in the <tref>incoming map</tref> and the <tref>outgoing map</tref>. The 
+following parameters must be given as input to the algorithm.
+
+<dl class="algorithm">
+  <dt><tref>normalization state</tdef></dt>
+    <dd>A reference to the current normalization state.</dd>
+  <dt><tdef>old label</tdef></dt>
+  <dd>The string value to use when performing look-ups on the
+    <tref>incoming map</tref> and the <tref>outgoing map</tref>.</dd>
+  <dt><tdef>naming base string</tdef></dt>
+  <dd>The base string to utilize when generating the <tref>new label</tref>.
+    </dd>
+  <dt><tdef>renaming counter</tdef></dt>
+  <dd>The counter that ensures uniqueness for all <tref>new label</tref>s that 
+    are generated. This value must be passed in by reference as changes
+    to it must be reflected elsewhere in the normalization algorithm.</dd>
+</dl>
+
+<p>The node relabeling algorithm is as follows:</p>
 
 <ol class="algorithm">
   <li>Generate the <tdef>new label</tdef> by concatenating the
     <tref>naming base string</tref> with the string value of the 
-    <tref>renaming counter</tref>. Add <code>1</code>
-    to the <tref>renaming counter</tref>, ensuring to preserve the latest 
-    value of the counter across multiple calls to this algorithm.
+    <tref>renaming counter</tref>. Increment the <tref>renaming counter</tref>
+    by one.
+    </li>
   <li>For each node that is associated with the <tref>old label</tref> in the
     <tref>incoming map</tref>, update all properties that reference the
     <tref>old label</tref> with the <tref>new label</tref>.</li>
   <li>Change the <tref>old label</tref> keys in the <tref>incoming map</tref> 
     and the <tref>outgoing map</tref> to the <tref>new label</tref>.</li>
-  <li>Set the node's subject IRI to the <tref>new label</tref>.</li>
+  <li>Return the <tref>new label</tref> as the result.</li>
 </ol>
 </section>
 
@@ -2384,7 +2451,7 @@
 <h4>Deterministic Labeling Algorithm</h4>
 
 <p>The deterministic labeling algorithm takes the 
-<tref>map of expanded nodes</tref>
+<tref>map of flattened nodes</tref>
 and produces a sorted list of deterministically named and expanded nodes from
 the graph.
 
@@ -2393,7 +2460,7 @@
     <code>1</code> and
     the <tdef>sorted list of expanded nodes</tdef> to an empty array.</li>
   <li>Place a reference to each of the nodes in the 
-   <tref>map of expanded nodes</tref> into a 
+   <tref>map of flattened nodes</tref> into a 
    <tref>list of expanded nodes</tref>.
   <li>Move every node with an IRI that does not start with <code>_:</code>
     from the <tref>list of expanded nodes</tref> into the 
@@ -2445,7 +2512,7 @@
            <tref>outgoing serialization map</tref>.</li>
          <li>For each <tref>old label</tref> in the
            <tref>list of old labels</tref> get the node from the 
-           <tref>map of expanded nodes</tref> and relabel it according to the
+           <tref>map of flattened nodes</tref> and relabel it according to the
            <a href="#node-relabeling-algorithm">Node Relabeling Algorithm</a>
            with the following parameters; 
            <tref>naming base string</tref> is the value of the
@@ -2466,7 +2533,7 @@
            <tref>incoming serialization map</tref>.</li>
          <li>For each <tref>old label</tref> in the
            <tref>list of old labels</tref> get the node from the 
-           <tref>map of expanded nodes</tref> and relabel it according to the
+           <tref>map of flattened nodes</tref> and relabel it according to the
            <a href="#node-relabeling-algorithm">Node Relabeling Algorithm</a>
            with the following parameters; 
            <tref>naming base string</tref> is the value of the
@@ -2491,7 +2558,7 @@
        entry in the <tref>map of incoming serialization maps</tref>.
      </li>
      <li>
-       Remove each node with a subject IRI that starts with
+       Remove each node with a <tref>label</tref> that starts with
        <code>_:c14n</code> from the <tref>list of expanded nodes</tref> and
        add it to the <tref>sorted list of expanded nodes</tref>.
        </ol></li>
@@ -2625,7 +2692,7 @@
   <li>Compare incoming and outgoing edges for each node, updating the
     serialization maps as each node is processed:
     <ol class="algorithm">
-      <li>If the subject IRI for <tref>alpha</tref> does not exist in the
+      <li>If the <tref>label</tref> for <tref>alpha</tref> does not exist in the
         <tref>outgoing serialization map</tref>, generate the serialization by
         following the steps in the 
         <a href="#node-serialization-algorithm">Node Serialization Algorithm</a>.
@@ -2634,7 +2701,7 @@
         the <tref>outgoing serialization map</tref>,
         the <tref>serialized labels map</tref> and the 
         <tref>list of unserialized labels</tref> to the algorithm as inputs.       
-      <li>If the subject IRI for <tref>beta</tref> does not exist in the
+      <li>If the <tref>label</tref> for <tref>beta</tref> does not exist in the
         <tref>outgoing serialization map</tref>, generate the serialization by
         following the steps in the 
         <a href="#node-serialization-algorithm">Node Serialization Algorithm</a>.
@@ -2648,7 +2715,7 @@
         for <tref>beta</tref> in the <tref>outgoing serialization map</tref>, 
         then <tref>alpha</tref> is first. If the lexicographical value is less, 
         then <tref>beta</tref> is first.</li>
-      <li>If the subject IRI for <tref>alpha</tref> does not exist in the
+      <li>If the <tref>label</tref> for <tref>alpha</tref> does not exist in the
         <tref>incoming serialization map</tref>, generate the serialization by
         following the steps in the 
         <a href="#node-serialization-algorithm">Node Serialization Algorithm</a>.
@@ -2657,7 +2724,7 @@
         the <tref>incoming serialization map</tref>,
         the <tref>serialized labels map</tref> and the 
         <tref>list of unserialized labels</tref> to the algorithm as inputs.       
-      <li>If the subject IRI for <tref>beta</tref> does not exist in the
+      <li>If the <tref>label</tref> for <tref>beta</tref> does not exist in the
         <tref>incoming serialization map</tref>, generate the serialization by
         following the steps in the 
         <a href="#node-serialization-algorithm">Node Serialization Algorithm</a>.
@@ -2683,8 +2750,7 @@
 a <tref>processed labels map</tref>, a
 <tdef>serialization map</tdef>, a <tref>serialized labels map</tref>,
 and a <tref>list of unserialized labels</tref> as inputs and generates
-deterministic serializations for all unlabeled nodes in the graph. The
-node's subject IRI will be called the <tdef>label</tdef> in this algorithm.
+deterministic serializations for all unlabeled nodes in the graph.
 </p>
 
 <ol class="algorithm">
@@ -2699,7 +2765,7 @@
   passing the <tref>label</tref> and the <tref>mapping count</tref> as
   parameters to the algorithm.</li>
 <li>For every property in the node that references a target node with a 
-subject IRI that starts with <code>_:</code> 
+<tref>label</tref> that starts with <code>_:</code> 
 (the <tdef>target node label</tdef>):
   <ol class="algorithm">
     <li>Look up the <tref>target node label</tref> in the