Fleshed out top-level normalization algorithm.
authorManu Sporny <msporny@digitalbazaar.com>
Sun, 07 Aug 2011 16:44:00 -0400
changeset 136 b53e36625acf
parent 135 b8ef51a2ab60
child 137 e69616f137d0
Fleshed out top-level normalization algorithm.
spec/latest/index.html
--- a/spec/latest/index.html	Sat Aug 06 23:19:42 2011 -0400
+++ b/spec/latest/index.html	Sun Aug 07 16:44:00 2011 -0400
@@ -2124,14 +2124,40 @@
 
 <p class="issue">This algorithm is a work in progress, do not implement it.</p>
 
-<p></p>
+<p>Normalization is the process of taking <tref>JSON-LD input</tref> and 
+performing a deterministic transformation on that input that results in all
+aspects of the graph being fully expanded and named in the 
+<tref>JSON-LD output</tref>. The normalized output is generated in such a way 
+that any conforming JSON-LD processor will generate identical output 
+given the same input. The problem is a fairly difficult technical 
+problem to solve because it requires a directed graph to be ordered into a 
+set of nodes and edges in a deterministic way. This is easy to do when all of 
+the nodes have unique names, but very difficult to do when some of the nodes 
+are not labeled.
+</p>
+
+<p>In time, there may be more than one normalization algorithm that will need
+to be identified. For identification purposes, this algorithm is named 
+<abbr title="Universal Graph Normalization Algorithm 2011">UGNA2011</abbr>.
+</p>
 
 <section>
 <h3>Normalization Algorithm Terms</h3>
  <dl>
-   <dt><tdef>term1</tdef></dt>
+   <dt><tdef>list of expanded nodes</tdef></dt>
    <dd>
-     definition1
+     A list of all nodes in the <tref>JSON-LD input</tref> graph containing no
+     embedded objects and having all keys and values expanded according to the
+     steps in the <a href="#expansion-algorithm">Expansion Algorithm</a>.
+   </dd>
+   <dt><tdef>naming base string</tdef></dt>
+   <dd>
+     An unlabeled node naming prefix that is not used by any other node 
+     in the <tref>JSON-LD input</tref> and does not start with the characters
+     <code>c14n</code>. The prefix is used to temporarily name nodes during
+     the normalization algorithm in a way that doesn't collide with the
+     names that already exist as well as the names that will be generated by the
+     normalization algorithm. 
    </dd>
 </dl>
 </section>
@@ -2140,9 +2166,32 @@
 <h3>Normalization Algorithm</h3>
 
 <ol class="algorithm">
-<li>Expand input (resolve keywords and CURIEs, etc).</li>
-<li>Flatten input (give unique names to unnamed blank nodes, remove embeds, duplicate objects for predicates, etc.).</li>
-<li>Label the unlabeled nodes.</li>
+<li>Expand the <tref>JSON-LD input</tref> according to the steps in
+the <a href="#expansion-algorithm">Expansion Algorithm</a> and store the
+result as the <strong>expanded input</strong>.</li>
+<li>Process every object in the <strong>expanded input</strong>
+searching for <code>@subject</code> values that start with the
+text string <code>_:c14n</code>. If a match is found, rename the subject and 
+all references to the subject by concatenating <code>_:</code> with the 
+<tref>naming base string</tref> and a unique identifier, such as an
+incremented counter value.
+</li>
+<li>Create an empty <tref>list of expanded nodes</tref> and recursively 
+process every object in the <strong>expanded input</strong> that is not an 
+expanded IRI, typed literal or language literal, in depth-first order:
+  <ol class="algorithm">
+    <li>If an object does not contain a <code>@subject</code>, 
+      name it by concatenating 
+      <code>_:</code> with the <tref>naming base string</tref> and a 
+      unique identifier, such as an  incremented counter value.</li>
+    <li>Add the object to the <tref>list of expanded nodes<tref>.</li>
+    <li>Replace the reference to the object with the value of the
+       <code>@subject</code> in the object.</li>
+    <li>???duplicate objects for predicates???</li>
+  </ol></li>
+<li>Label all of the nodes that contain a <code>@subject</code> key associated
+with a value starting with <code>_:</code> according to the steps in the 
+<a href="#node-labeling-algorithm">Node Labeling Algorithm</a>.</li>
 </ol>
 </section>
 
@@ -2150,13 +2199,30 @@
 <h4>Node Labeling Algorithm</h4>
 
 <ol class="algorithm">
-<li>Rename any blank nodes that start with 'c14n' using a temporary prefix.</li>
 <li>Initialize property and reference relation serializations for all blank nodes to null.</li>
 <li>Until all blank nodes are canonically named, keep sorting them according to DeepCompare and naming them according to NameNode.</li>
 </ol>
 </section>
 
 <section>
+<h4>Shallow Comparison Algorithm</h4>
+
+<p>
+ShallowCompare(bnodeA, bnodeB):
+</p>
+
+<ol class="algorithm">
+  <li>The bnode with fewer properties is first.</li>
+  <li>The bnode with the alphabetically first property is first.</li>
+  <li>Do CompareObjects for each equivalent property.</li>
+  <li>The bnode with fewer references is first.</li>
+  <li>The bnode with the reference IRI (vs. bnode) is first.</li>
+  <li>The bnode with the alphabetically-first reference IRI is first.</li>
+  <li>The bnode with the alphabetically-first reference property is first.</li>
+</ol>
+</section>
+
+<section>
 <h4>Deep Comparison Algorithm</h4>
 
 <p>