added section on Untrusting Merge
author Sandro Hawke <sandro@hawke.org>
Tue, 15 May 2012 07:47:32 -0400
changeset 373 0ce5c7f9152c
parent 372 3c2c909d3312
child 374 36d5efae5e0a
rdf-spaces/index.html
--- a/rdf-spaces/index.html	Tue May 15 00:05:01 2012 -0400
+++ b/rdf-spaces/index.html	Tue May 15 07:47:32 2012 -0400
@@ -667,7 +667,7 @@
 
 
   <section>
-    <h2>Union and Merge</h2>
+    <h2>Merge and Union</h2>
 
     <p>RDF graphs are usually combined in one of two ways:</p>
 
@@ -700,6 +700,82 @@
   </section>
 
 
+  <section>
+    <h2>Untrusting Merge</h2>
+
+    <p>The act of <dfn>renaming the graphs</dfn> in a dataset is to
+    create another dataset which differs from the first only in that
+    all the IRIs used as graph names are replaced by fresh "Skolem"
+    IRIs.  This replacement occurs in the name slot of the
+    (name,graph) pairs, and in the triples in the default graph, but
+    <em>not</em> in the triples in the named graphs.</p>
+
+    <p>Logically, this operation is equivalent to partially
+    un-labeling an RDF Graph (turning some IRIs into blank nodes),
+    then Skolemizing those blank nodes.  As an operation, it discards
+    some of the information and adds more true information; it is a
+    sound but not complete reasoning step.  It can be made complete by
+    <dfn>recording</dfn> the relationship between the old graph names
+    and the new ones, using some vocabulary such as owl:sameAs.</p>
+
+    <p>For example, a recording rename_graphs operation might take as input:</p>
+    <pre>@prefix : &lt;http://example.com/>
+:g1 { :a :b :c }
+:d :e :f</pre>
+    <p>and produce:</p>
+    <pre>@prefix : &lt;http://example.com/>
+:fe2b9765-ba1d-4644-a335-80a8c3786c8d { :a :b :c }
+:d :e :f
+:fe2b9765-ba1d-4644-a335-80a8c3786c8d owl:sameAs :g1
+</pre>
+
+    <p>Given the semantics of datasets, informally described above and
+    formally stated in <a href="#semantics" class="sectionRef"></a>,
+    and the semantics of OWL, where { ?a owl:sameAs ?b } means that
+    the terms ?a and ?b both denote the same thing, the second dataset
+    above entails the first, and includes only additional information
+    that is known to be true.  (Slight caveat: the new information is
+    only true if the assumptions of the name-generation function
+    hold, namely that the name is previously unused and that this
+    naming agent has the right to claim it.)</p>
+
+    <p>A related operation, <dfn>sequestering</dfn> the default
+    graph, is to create a new dataset which differs from the first
+    only in that the triples in the default graph of the input
+    appear instead in a new, freshly-named, <a>named graph</a> of the
+    output.  Sequestering returns both the new dataset and the name
+    generated for the new graph: <code>sequester(D1) -> (D2,
+    generatedIRI)</code>.</p>
+
+    <p>Used together, the operations of <a>renaming the graphs</a>,
+    <a>sequestering</a> the default graphs, and then <a>merging
+    datasets</a> constitute an <dfn>untrusting merge</dfn> of
+    datasets.  This operation provides the functionality required for
+    addressing the use case described in <a href="#uc-untrusted"
+    class="sectionRef"></a> and is illustrated in <a
+    href="#example-untrusted" class="sectionRef"></a>.  It uses quads
+    to address some, perhaps all, of the need for quints
+    or nested graphs.</p>
+
+    <p>More precisely:</p>
+    
+    <div style="margin-left: 2em;">
+      <pre>function untrusted_merge(D1, ... Dn):
+   for i in 1..n:
+      RDi = rename_graphs(Di)
+      (SRDi, DGNi) = sequester(RDi)
+   return (merge(SRD1, ... SRDn), (DGN1, ... DGNn))</pre>
+    </div>
+
+   <p>Here, <tt>untrusted_merge</tt> returns a single dataset and a list of
+   the names of the graphs (in that dataset) which contain the triples
+   that were in the default graphs, possibly augmented with
+   <a>recording</a> triples.  Whether recording is done or not is
+   hidden inside the rename_graphs function, and is
+   application-dependent.</p>
+
+  </section>
+
   
 </section>
 
@@ -1131,11 +1207,7 @@
     <h2>Showing Untrusted Quads(v5)</h2>
 
     <p>@@@ Show how to address <a href="#uc-untrusted" class="sectionRef"></a></p>
-    <p>@@@ what if one of the divisions gives you bad quads?  It
-    better not mess up provenance.  Maybe suggest GSP-style name
-    mangling...?  Put "renaming datasets" in Concepts somewhere as a
-    standard thing?</p>
-
+    <p>@@@ uses <a>renaming the graphs</a>.</p>
 
 
   </section>
@@ -1444,6 +1516,7 @@
 <section class="appendix informative" id="changes">
   <h2>Changes</h2>
   <ul>
+    <li>2012-05-15: Added section on "Untrusting Merge".</li>
     <li>2012-05-14: Fill in the use cases, removing some of the text that was there and which can go into the example.  Redid the trig grammar, adding spaceName, changing formatting.  Added valid-time example.  Added some of transaction-time example.</li>
     <li>2012-05-13: Fill in the example's skeleton, add a few issues/ideas on trig</li>
     <li>2012-05-11: Rewriting and reorganizing Concepts; some more work on Usecases and Example; removed the Detailed Example since it needs to be so re-written; renamed 'reflection' to 'folding'; reworked the Semantics</li>
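
The untrusted_merge pseudocode added in the "Untrusting Merge" section can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the specification's definition. The Dataset class, the fresh_iri helper (which mints urn:uuid: names), and the record flag are hypothetical names introduced here. merge is assumed to be a plain union keyed by graph name, and blank nodes are ignored entirely.

```python
import uuid
from dataclasses import dataclass, field

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


@dataclass
class Dataset:
    """Hypothetical in-memory dataset: triples are (s, p, o) tuples of IRI strings."""
    default: set = field(default_factory=set)   # triples in the default graph
    named: dict = field(default_factory=dict)   # graph name IRI -> set of triples


def fresh_iri():
    # Assumption: a random UUID is "previously unused" and ours to claim.
    return f"urn:uuid:{uuid.uuid4()}"


def rename_graphs(d, record=False):
    """Replace every graph name with a fresh IRI.

    The replacement applies to the name slot and to triples in the default
    graph, but not to triples inside the named graphs.
    """
    mapping = {old: fresh_iri() for old in d.named}
    out = Dataset()
    for old, triples in d.named.items():
        out.named[mapping[old]] = set(triples)          # contents untouched
    for triple in d.default:
        out.default.add(tuple(mapping.get(term, term) for term in triple))
    if record:
        # Optional "recording" step: relate each new name to the old one.
        for old, new in mapping.items():
            out.default.add((new, OWL_SAME_AS, old))
    return out


def sequester(d):
    """Move the default graph's triples into a new, freshly named graph."""
    name = fresh_iri()
    named = {g: set(t) for g, t in d.named.items()}
    named[name] = set(d.default)
    return Dataset(default=set(), named=named), name


def merge(datasets):
    # Assumption: merging is a union of default graphs and of same-named graphs.
    out = Dataset()
    for d in datasets:
        out.default |= d.default
        for g, triples in d.named.items():
            out.named.setdefault(g, set()).update(triples)
    return out


def untrusted_merge(*datasets, record=False):
    sequestered, default_names = [], []
    for d in datasets:
        srd, dgn = sequester(rename_graphs(d, record=record))
        sequestered.append(srd)
        default_names.append(dgn)
    return merge(sequestered), default_names


# Usage: two sources that both use the graph name :g1 no longer collide.
EX = "http://example.com/"
d1 = Dataset(default={(EX + "d", EX + "e", EX + "f")},
             named={EX + "g1": {(EX + "a", EX + "b", EX + "c")}})
d2 = Dataset(default={(EX + "g1", EX + "p", EX + "o")},
             named={EX + "g1": {(EX + "x", EX + "y", EX + "z")}})
merged, default_names = untrusted_merge(d1, d2, record=True)
```

In this sketch the merged dataset has an empty default graph, and each entry of default_names identifies the graph holding one input's former default-graph triples (plus its recording triples when record is true), so the caller can decide separately how far to trust each source.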