edited collections section on separate file
authorPaolo Missier <pmissier@acm.org>
Wed, 14 Mar 2012 12:22:31 +0000
changeset 1904 e39072d33cee
parent 1903 313202bb4688
child 1905 5e6cc45426fc
edited collections section on separate file
model/working-copy/wd5-prov-dm-collections.html
--- a/model/working-copy/wd5-prov-dm-collections.html	Wed Mar 14 08:21:36 2012 +0000
+++ b/model/working-copy/wd5-prov-dm-collections.html	Wed Mar 14 12:22:31 2012 +0000
@@ -203,50 +203,40 @@
 
 
 
-
-<section id="term-original-source">
-
 <section id="term-Collection">
 <h3>Collections</h3>
 
-<p><strong>Collection relations</strong> address the need to describe the evolution of entities that have a collection structure, that is, which may contain other entities. Specifically, this section exploits the built-in type for entities, called <a title="concept-collection">collection</a>, and two relations to describe the effect of adding elements to, and removing elements from, a collection entity.
-The intent of these relations and entity types is to capture the <em>history of changes that occurred to a collection</em>.
-Thus, a collection entity is an immutable representation of the state of a collection data structure following a sequence of insertion and deletion operations.
-</p>
+<p><strong>Collection relations and entity types</strong> address the need to describe the evolution of entities that have a collection structure, that is, which may contain other entities. The intent of these relations and entity types is to capture the <em>history of changes that occurred to a collection</em>. Indirectly, such history provides a way to reconstruct, with some limitations discussed <a href="#term-collection-weak-derivation">below</a>, the contents of a collection entity. Thus, for the purpose of provenance a collection entity is viewed an immutable representation of the state of a collection data structure, following a sequence of insertion and deletion operations.
 
-<p>A collection is an entity that has a logical internal structure consisting of key-value pairs, often referred to as a map.
-More precisely, the following entity types are introduced:
+<br/>A collection entity is an entity that has a logical internal structure consisting of key-value pairs, often referred to as a <strong>map</strong>. This collection type provides a generic indexing structure that can be used to model commonly used data structures, including associative lists (also known as "dictionaries" in some programming languages), relational tables, ordered lists, and more (the specification of such specialized structures in terms of key-value pairs is out of the scope of this document).
+
+<section id="term-collection-types">
+<h3>Collection types</h3>
+
+The following new entity types are introduced:
 
 <ul>
-  <li> <span class="name">Collection</span>  denotes an entity of type collection, i.e. an entity that  can participate in insertion and removal relations;
-
-  <li><span class="name">EmptyCollection</span> denotes an empty collection.
-</ul>
+  <li> <span class="name">prov:Collection</span>  denotes an entity of type collection, i.e. an entity that  can participate in  relations amongst collections;
 
-The following relations relate a collection <span class="name">c1</span> with a collection <span class="name">c2</span> obtained after adding or removing a new pair to (resp. from) <span class="name">c1</span>:
-
-<ul>
-  <li>Derivation-by-Insertion relation <span class="name">derivedByInsertionFrom(c2, c1, k, v)</span> states that  <span class="name">c2</span> is the state of the collection
-following the insertion of pair <span class="name">(k,v)</span> into collection  <span class="name">c1</span>;</li>
-
-<li>  Derivation-by-Removal relation <span class="name">derivedByRemovalFrom(c2,c1, k)</span> states that  <span class="name">c2</span> is  the  state of the collection following the removal of the pair corresponding to key  <span class="name">k</span> from  <span class="name">c1</span>.</li>
-
+  <li><span class="name">prov:EmptyCollection</span> denotes an empty collection.
 </ul>
 
 <div class="anexample">
 <pre class="codeexample">
    entity(c, [prov:type="EmptyCollection"])    // e is an empty collection
-   entity(v1)
-   entity(v2)
-   entity(c1, [prov:type="Collection"])
-   entity(c2, [prov:type="Collection"])
-  
-  derivedByInsertionFrom(c1, c, "k1", v1)       // c1 = { ("k1",v1) }
-  derivedByInsertionFrom(c2, c1, "k2", v2)      // c2 = { ("k1",v1), ("k2", v2) }
-  derivedByRemovalFrom(c3, c2, k1)              // c3 = { ("k2",v2) }
+   entity(c1, [prov:type="Collection"])   // c1 is a collection, with unknown content
 </pre>
 </div>
 
+</section>  <!-- end of collection-types -->
+
+
+<section id="term-collection-insertion">
+<h3>Insertion</h3>
+
+
+<strong>Derivation-by-Insertion</strong> relation <span class="name">derivedByInsertionFrom(c2, c1, k, v)</span> states that  <span class="name">c2</span> is the state of the collection
+following the insertion of pair <span class="name">(k,v)</span> into collection  <span class="name">c1</span>.
 
 <p> A Derivation-by-Insertion relation<span class="withPn">, written <span class="pnExpression"> derivedByInsertionFrom(id, collAfter, collBefore, key, value, attrs)</span>,</span> contains:</p>
 <ul>
@@ -258,6 +248,34 @@
 <li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs to further describe the properties of the relation.</li>
 </ul>
 
+
+<div class="anexample">
+<pre class="codeexample">
+   entity(c, [prov:type="EmptyCollection"])    // c is an empty collection
+   entity(v1)
+   entity(v2)
+   entity(c1, [prov:type="Collection"])
+   entity(c2, [prov:type="Collection"])
+  
+  derivedByInsertionFrom(c1, c, "k1", v1)       
+  derivedByInsertionFrom(c2, c1, "k2", v2)    
+</pre>
+  From this set of assertions, we conclude:
+  <pre class="codeexample">
+   c =  {  }
+   c1 = { ("k1",v1) }
+   c2 = { ("k1",v1), ("k2", v2) }
+  </pre>
+</div>
+
+</section>
+
+
+<section id="term-collection-removal">
+<h3>Removal</h3>
+
+<strong>Derivation-by-Removal</strong> relation <span class="name">derivedByRemovalFrom(c2,c1, k)</span> states that  <span class="name">c2</span> is  the  state of the collection following the removal of the pair corresponding to key  <span class="name">k</span> from  <span class="name">c1</span>.
+
 <p> A Derivation-by-Removal relation, written <span class="pnExpression"> derivedByRemovalFrom(id, collAfter, collBefore, key, attrs)</span>, contains:</p>
 <ul>
 <li><span class='attribute'>id</span>:  an OPTIONAL identifier identifying the relation;</li>
@@ -267,6 +285,97 @@
 <li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs to further describe the properties of the relation.</li>
 </ul>
 
+
+<div class="anexample">
+<pre class="codeexample">
+   entity(c, [prov:type="EmptyCollection"])    // e is an empty collection
+   entity(v1)
+   entity(v2)
+   entity(c1, [prov:type="Collection"])
+   entity(c2, [prov:type="Collection"])
+  
+  derivedByInsertionFrom(c1, c, "k1", v1)       
+  derivedByInsertionFrom(c2, c1, "k2", v2)      
+  derivedByRemovalFrom(c3, c2, k1)              
+</pre>
+  From this set of assertions, we conclude:
+  <pre class="codeexample">
+   c =  {  }
+   c1 = { ("k1",v1) }
+   c2 = { ("k1",v1), ("k2", v2) }
+   c3 = { ("k2",v2) }
+  </pre>
+
+  
+</div>
+
+</section>
+
+<section id="term-collection-containment">
+<h3>Containment</h3>
+
+<strong>Containment</strong> relation <span class="name">contained(c, k, v)</span> states that collection <span class="name">c</span> contained pair <span class="name">(k,v)</span>. 
+
+
+<p> A Containment relation, written <span class="pnExpression"> contained(id, coll, key, values, attrs)</span>, contains:</p>
+<ul>
+<li><span class='attribute'>id</span>:  an OPTIONAL identifier identifying the relation;</li>
+<li><span class='attribute'>after</span>: an identifier  for the collection whose members are asserted; </li>
+<li><span class='attribute'>key</span>: the key of the pair contained by the collection;</li>
+<li><span class='attribute'>value</span>: an identifier for the value corresponding to the key;</li>
+<li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs to further describe the properties of the relation.</li>
+</ul>
+
+The insertion and removal relations make insertions and removals explicit as part of the history of a collection. The containment relation is typically used when the content of a collection is at least partially known, but no insertions have been witnessed. This is the case for  example of a workflow block that produces a new collection <span class="name">c</span>  with known content. In this case,
+<span class="name">contained(c,k, v)</span> asserts that  <span class="name">c</span> is known to contain <span class="name">(k,v)</span>, regardless of explicit insertion operations.
+
+
+<div class="anexample">
+<pre class="codeexample">
+   entity(c, [prov:type="Collection"])    // e is a collection, with unknown content
+   activity(a)
+   wasGeneratedBy(c,a)   // a produced c
+  
+   entity(v1)
+   entity(v2)
+   contained(c, "k1", v1)
+   contained(c, "k1", v2)    // c is contains at least ("k1", v1)  and ("k2", v2)
+  
+   entity(v2)
+   entity(c1, [prov:type="Collection"])
+  
+   derivedByInsertionFrom(c1, c, "k3", v3)     
+</pre>
+  From this set of assertions, we conclude:
+  <pre class="codeexample">
+   c  contains   ("k1", v1), ("k2", v2) 
+   c1 contains   ("k1", v1), ("k2", v2), ("k3", v3) 
+  </pre>
+ Note that the state of <span class="name">c1</span> after these assertions is only partially known, because the initial state of <span class="name">c</span> is unknown. Alternatively, had the first assertion been
+  <span class="name"> entity(c, [prov:type="EmptyCollection"])</span>,
+  one would conclude that, based on these assertions,  <span class="name">c1 = {("k1", v1) ("k2", v2), ("k3",v3)}</span>.
+</div>
+
+</section>
+
+Further considerations: <p/>
+
+<ul>
+<li>In Key-Value pairs, Keys are <a href="#term-value">values</a>, and Values are entities. This allows expressing nested collections, that is, collections whose values include entities of type collection.</li>
+
+<li>As the relation names suggest, insertion and removal relations are a particular case of <a href="#Derivation-Relation">derivation</a>.</li>
+
+
+
+<li>The state of a collection (i.e., the set of key-value pairs it contains) at a given point in a sequence of operations is never stated explicitly. Rather, it can be obtained by querying the chain of derivations involving insertions and removals. Entity type <span class="name">emptyCollection</span> can be used in this context as it marks the start of a sequence of collection operations.</li>
+
+
+<li>The representation of a collection through these relations, makes no assumption regarding the underlying data structure used to store and manage collections. In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates. Entities, however, are immutable and this applies  to those entities that represent collections. This is reflected in the constraints listed in Part II.  </li>
+</ul>
+
+<section id="collection-convenience-relations">
+<h3>Convenience relations</h3>
+
 As a convenience, corresponding relations for <em>bulk operations</em> involving a set of key-value pairs are introduced, as follows.
 
 <p> A  Derivation-by-Bulk-Insertion relation <span class="withPn">, written <span class="pnExpression"> derivedByBulkInsertionFrom(id, collAfter, collBefore, key-value-set, attrs)</span>,</span> contains:</p>
@@ -278,7 +387,7 @@
 <li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs to further describe the properties of the relation.</li>
 </ul>
 
-<p> A Derivation-by-Bulk-Removal relation, written <span class="pnExpression"> derivedByBulkRemovalFrom(id, collAfter, collBefore, key, attrs)</span>, contains:</p>
+<p> A Derivation-by-Bulk-Removal relation, written <span class="pnExpression"> derivedByBulkRemovalFrom(id, collAfter, collBefore, key-set, attrs)</span>, contains:</p>
 <ul>
 <li><span class='attribute'>id</span>:  an OPTIONAL identifier identifying the relation;</li>
 <li><span class='attribute'>after</span>: an identifier  for the collection  <em>after</em> the deletion; </li>
@@ -288,33 +397,89 @@
 </ul>
 
 
-<p>Further considerations:</p>
-
+<p> A Bulk-Containment relation, written <span class="pnExpression"> containedBulk(id, coll, key-value-set, attrs)</span>, contains:</p>
 <ul>
-  <li>The <strong>map</strong> collection type provides a generic indexing structure that can be used to model commonly used data structures, including associative lists (also known as "dictionaries" in some programming languages), relational tables, ordered lists, and more (the specification of such specialized structures in terms of key-value pairs is out of the scope of this document).</li>
-
-<li>Values are entities. This allows expressing nested collections, that is, collections whose values include entities of type collection.</li>
-
-<li>As the relation names suggest, insertion and removal relations are a particular case of <a href="#Derivation-Relation">derivation</a>.</li>
-
-<li>This representation of a collection's evolution makes no assumption regarding the underlying data structure used to store and manage collections. In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates. Entities, however, are immutable and this applies  to those entities that represent collections. This is reflected in the constraints listed in Part II.  </li>
-
-<li>The state of a collection (i.e., the set of key-value pairs it contains) at a given point in a sequence of operations is never stated explicitly. Rather, it can be obtained by querying the chain of derivations involving insertions and removals. Entity type <span class="name">emptyCollection</span> can be used in this context as it marks the start of a sequence of collection operations.</li>
-
-<li>An activity may generate a new collection, complete with its content, without visible insertion operations. Nevertheless, one can abstract the collection construction process as a sequence of insertions (or one single bulk insertion). Thus, if the content of the collection at the end of the generation is known, one can state it using the Derivation-by-Insertion relation, starting from the empty collection (or, more generally, from a collection with unknown prior state).
+<li><span class='attribute'>id</span>:  an OPTIONAL identifier identifying the relation;</li>
+<li><span class='attribute'>after</span>: an identifier  for the collection whose members are asserted; </li>
+<li><span class='attribute'>key-value-set</span>: a set of key-value pairs contained in the collection, of the form {(key_1, value_1), ..., (key_n, value_n)}</li>
+<li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs to further describe the properties of the relation.</li>
+</ul>
 
 <div class="anexample">
 <pre class="codeexample">
-   entity(c0, [prov:type="EmptyCollection"])    // e is an empty collection
-   activity(a)
-   entity(c1, [prov:type="Collection"]) 
-   wasGeneratedBy(c1,a)  
-   derivedByBulkInsertionFrom(c1, c0, {("k1", v1), {("k2", v2) )       // c1 = { ("k1",v1), ("k2",v2) }
+   entity(c, [prov:type="EmptyCollection"])    // e is a collection, with unknown content
+  
+   entity(v1)
+   entity(v2)
+   derivedByInsertionFrom(c1, c, {("k1", v1), ("k2", v2)})  
+   derivedByInsertionFrom(c2, c1, "k3", v3)                 
+   derivedByRemovalFrom(c3, c1, {"k1", "k3"})               
+
+   contained(c3, {("k4", v4), ("k5", v5)})  
 </pre>
+  From this set of assertions, we conclude:
+  <pre class="codeexample">
+   c =  {  }
+   c1  = { ("k1", v1) ("k2", v2)}
+   c2  = { ("k1", v1) ("k2", v2), ("k3", v3)}
+   c3  = { ("k2", v2), ("k4", v4), ("k5", v5)}
+  </pre>
 </div>
 
-</ul>
-<div class='note'>Deleted further items. Some of them are constraints which belong to part 2.</div>
+
+</section> <!-- convenience relations -->
+
+<section id="term-collection-state">
+
+<h3>State of collections and use of weaker <a href="#Derivation-Relation">derivation</a> relation</h3>
+
+<p>The state of a collection is only known to the extent that a chain of derivations starting from an empty collection can be found. Since a set of assertions regarding a collection's evolution may be incomplete, so is the reconstructed state obtained by querying those assertions. In general, all assertions reflect partial knowledge reagrding a sequence of data transformation events. In the particular case of collection evolution, in which some of the state changes may have been missed, the more generic  <a href="#Derivation-Relation">derivation</a> relation should be used to signal that some updates may have occurred, which cannot be precisely asserted as insertions or removals. The following two examples illustrate this.</p>
+
+<div class="anexample">
+<pre class="codeexample">
+  entity(c, [prov:type="collection"])    // c is a collection, possibly not empty
+  entity(v1)
+  entity(v2, [prov:type="collection"])    // v2 is a collection
+
+  entity(c1, [prov:type="collection"])    
+  entity(c2, [prov:type="collection"])    
+  
+  derivedByInsertionFrom(c1, c, k1, v1)       
+  derivedByInsertionFrom(c2, c1, k2, v2)    
+</pre>
+    From this set of assertions, we conclude:
+  <pre class="codeexample">
+   c1 includes (k1,v1) but may contain additional unknown pairs
+   c2 includes (k1,v1), (k2 v2) (and possibly more pairs), where v2 is a collection with unknown state
+  </pre>
+
+</div>
+  In the example, the state of <span class="name">c2</span> is only partially known because the collection is constructed from partially known other collections.
+
+<div class="anexample">
+<pre class="codeexample">
+  entity(c, [prov:type="emptyCollection"])    // c is an empty collection
+  entity(v1)
+  entity(v2)
+  entity(c1, [prov:type="collection"])    
+  entity(c2, [prov:type="collection"])    
+  entity(c3, [prov:type="collection"])    
+
+  derivedByInsertionFrom(c1, c, k1, v1)       
+  wasDerivedFrom(c2, c1)                       
+  derivedByInsertionFrom(c3, c2, k2, v2)       
+</pre>
+    From this set of assertions, we conclude:
+  <pre class="codeexample">
+   c1 = { (k1,v1) }
+   c2 is somehow derived from c1, but the precise sequence of updates is unknown
+   c3  includes  (k2 v2) but the earlier "gap" leaves uncertainty regarding  (k1,v1) <br/>  (it may have been removed) or any other pair that may have been added as part of the derivation activities.
+  </pre>
+</div>
+
+</section>  <!-- end of collections-weaker-derivation -->
+
+
 
 
 </section>   <!-- end collections-->
@@ -325,10 +490,10 @@
 <div class='constraint' id='collection-unique-insertion'>
 <p>One cannot have multiple assertions regarding the state of a collection. Thus:</p>
 <pre class="codeexample">
-derivedByInsertionFrom(c2, c1, k1, v1)
-derivedByInsertionFrom(c2, c1, k2, v2)
+derivedByInsertionFrom(id1, c2, c1, k1, v1)
+derivedByInsertionFrom(id2, c2, c1, k2, v2)
 </pre>
-implies <span class="name">k1 = v1, k2 = v2</span>. <p/>
+implies <span class="name">k1 = v1, k2 = v2, id1 = id2</span>. <p/>
 
   Similarly for removal:
 <pre class="codeexample">
@@ -352,10 +517,17 @@
   entity(c2, [prov:type="Collection"])
   entity(c3, [prov:type="Collection"])
   
-  derivedByInsertionFrom(c1, c, k1, v1)       // c1 = { (k1,v1) }
-  derivedByInsertionFrom(c2, c, k2, v2)       // c2 = { (k2 v2) }
-  derivedByInsertionFrom(c3, c1, k3,v3)       // c3 = { (k1,v1),  (k3,v3) }
+  derivedByInsertionFrom(c1, c, k1, v1)      
+  derivedByInsertionFrom(c2, c, k2, v2)       
+  derivedByInsertionFrom(c3, c1, k3,v3)       
 </pre>
+    From this set of assertions, we conclude:
+  <pre class="codeexample">
+  c1 = { (k1,v1) }
+  c2 = { (k2 v2) }
+  c3 = { (k1,v1),  (k3,v3) }
+  </pre>
+
 </div>
 </div>
 
@@ -363,61 +535,45 @@
 <div class='constraint' id='collection-unique-ancestor'>
 A collection can only be derived from a single prior collection. Thus:
 <pre class="codeexample">
+entity(c1, [prov:type="Collection"])
+entity(c2, [prov:type="Collection"])
+entity(c, [prov:type="Collection"])
+
 derivedByInsertionFrom(c, c1, k1, v1)
 derivedByInsertionFrom(c, c2, k2, v2)
 </pre>
 implies  <span class="name">c1==c2</span>. <p/>
 
-And:
 <pre class="codeexample">
 derivedByRemovalFrom(c, c1, k1)
-derivedByRemovalFrom(c, c1, k2)
+derivedByRemovalFrom(c, c2, k2)
 </pre>
-implies <span class="name">k1 = k2</span>. <p/>
+implies <span class="name">c1 = c2, k1 = k2</span>. <p/>
 
- This also applies to any combination of insertions and removals.
+  This also applies to any combination of insertions and removals, for example:
+
+  <pre class="codeexample">
+derivedByInsertionFrom(c, c1, k1, v1)
+derivedByRemovalFrom(c, c2, k2)
+</pre>
+  implies  <span class="name">c1 = c2, k1 = k2</span>.
+  
 </div>
 
 <div class='constraint' id='collection-unique-value-for-key'>
   Keys are unique within a collection. Thus:
 <pre class="codeexample">
+entity(c, [prov:type="Collection"])
+entity(c1, [prov:type="Collection"])
+
 derivedByInsertionFrom(c1, c, k, v1)
 derivedByInsertionFrom(c1, c, k, v2)
 </pre>
 implies  <span class="name">v1==v2</span>.
 </div>
 
-<h3>Use of weaker <a href="#Derivation-Relation">derivation</a> relation</h3>
-
-<p>The state of a collection is only known to the extent that a chain of derivations starting from an empty collection can be found. Since a set of assertions regarding a collection's evolution may be incomplete, so is the reconstructed state obtained by querying those assertions. In general, all assertions reflect the asserter's partial knowledge of a sequence of data transformation events. In the particular case of collection evolution, in which the asserter  <em>knows</em> that some of the state changes may have been missed, then the more generic  <a href="#Derivation-Relation">derivation</a> relation should be used to signal that some updates may have occurred, which cannot be precisely asserted as insertions or removals. The following two examples illustrate this.</p>
-
-<div class="anexample">
-<pre class="codeexample">
-  entity(c, [prov:type="collection"])    // e is a collection, possibly not empty
-  entity(v1)
-  entity(v2, [prov:type="collection"])    // v2 is a collection
 
-  derivedByInsertionFrom(c1, c, k1, v1)       // c1 <em>includes</em> { (k1,v1) } but may contain additional unknown pairs
-  derivedByInsertionFrom(c2, c1, k2, v2)      // c2 includes { (k1,v1), (k2 v2) } where v2 is a collection with unknown state
-</pre>
-</div>
-  In the example, the state of <span class="name">c2</span> is only partially known because the collection is constructed from partially known other collections.
-
-<div class="anexample">
-<pre class="codeexample">
-  entity(c, [prov:type="emptyCollection"])    // e is an empty collection
-  entity(v1)
-  entity(v2)
-
-  derivedByInsertionFrom(c1, c, k1, v1)       // c1 = { (k1,v1) }
-  wasDerivedFrom(c2, c1)                        // the asserted knows that c2 is somehow derived from c1, but cannot assert the precise sequence of updates
-    derivedByInsertionFrom(c3, c2, k2, v2)       
-</pre>
-
-<p>Here  <span class="name">c3</span> includes <span class="name">{ (k2 v2) }</span> but the earlier "gap" leaves uncertainty regarding  <span class="name">(k1,v1)</span>  (it may have been removed) or any other pair that may have been added as part of the derivation activities.</p>
-</div>
-
-</section>
+</section>  <!-- end of collections -->