extensive edit of collections 6.8
authorPaolo Missier <pmissier@acm.org>
Fri, 20 Jan 2012 12:07:27 +0000
changeset 1424 5c78159bc626
parent 1423 34b110edc35f
child 1425 19b676b49051
extensive edit of collections 6.8
model/ProvenanceModel.html
--- a/model/ProvenanceModel.html	Fri Jan 20 09:16:26 2012 +0000
+++ b/model/ProvenanceModel.html	Fri Jan 20 12:07:27 2012 +0000
@@ -3468,7 +3468,11 @@
 <span class="name">)</span> 
 </div>
 
-</section> 
+</section>
+
+<!--------------------
+COLLECTIONS
+---------------------->
 
 <section id="record-Collection">
 <h3>Collections</h3>
@@ -3483,51 +3487,56 @@
 <a href="http://www.w3.org/2011/prov/track/issues/135">ISSUE-139</a>
 </div>
 
-Activity-entity assertions account for the provenance of entities in terms of how they are produced and consumed. The <em>collection</em> assertions introduced in this section complement
-those by offering a family of entity-entity assertions that can be used to express structural relationships amongst entities.  Specifically, their  purpose is to enable modelling of part-of
-relationships amongst entities. In particular, a form of <strong>collection</strong> entity type is introduced, consisting of a <strong>set of key-value pairs</strong>. Key-value pairs
-provide a generic indexing structure that can be used to model commonly used data structures, including associative lists (also known as "dictionaries" in some programming languages),
-relational tables, ordered lists, and more.<p/>
-
-Collections are entities that contain <strong>sets</strong> of key-value pairs. Both keys and values are also entities.
-
-The following record types are introduced to model (a) insertion of a new key-value pair into a collection and (c) removal of a key-value pair from a collection. Optionally, the time at
-which the insertion/deletion takes places can be specified.
-
-<p>    Record:  <span class="name">CollectionAfterInsertion(c2,c1, k, v, [t])</span>  denotes that collection  <span class="name">c2</span> is an updated version of collection <span
-class="name">c1</span>, following the insertion of pair <span class="name">(k,v)</span> into  <span class="name">c1</span> (at time t).</p>
-
-<p>    Record:  <span class="name">CollectionAfterRemoval(c2,c1, k, [t])</span>  denotes that collection  <span class="name">c2</span> is an updated version of collection <span
-class="name">c1</span>, following the removal of pair <span class="name">(k,v)</span> from  <span class="name">c1</span> (at time t), where  <span class="name">v</span> is the value
-corresponding to key  <span class="name">k</span> in  <span class="name">c1</span>.</p>
-
-The intent of these records is to capture the history of changes to a collection, using <em>copy semantics</em>. This means that each insertion/removal operation generates a new collection,
-rather than updating the collection in place. Because these assertions state the derivation of a collection from another, formally they are specializations of the <a
-href="#record-Derivation"><strong>wasDerivedFrom</strong></a> relation, specifically precise-1 or imprecise-1.
+The intent of the <em>collection</em> record types relations introduced in this section is to account for <strong>part-of</strong> relationships that may exist amongst entities. Specifically, this section:
+
+<ul>
+  <li>introduces a new built-in type <span class="name">collection</span>, and
+  <li>introduces relations that let asserters describe how the content of entities of type collection changes over time, as a consequence of insertion and deletion operations.
+  </ul>
+
+We adopt a very generic form of collection for the purpose, namely an abstract data types consisting of <strong>set of key-value pairs</strong>, often referred to as a <strong>map</strong>. This provides a generic indexing structure that can be used to model commonly used data structures, including associative lists (also known as "dictionaries" or maps in some programming languages), relational tables, ordered lists, and more (the specification of such specialized structures in terms of key-value pairs is out of the scope of this document).<p/>
+
+Keys and values used in collections are either literals, or entities themselves. This allows expressing nested collections, that is, collections whose values include entities of type collection.<p/>
+
+<p>The following relations and corresponding record types are introduced to model (a) insertion of a new key-value pair into a collection and (b) removal of a key-value pair from a collection.
+
+<ul>
+  <li>Record:  <span class="name">CollectionAfterInsertion(c2,c1, k, v)</span> denotes that collection  <span class="name">c2</span> represents the new state of collection <span class="name">c1</span>,
+following the insertion of pair <span class="name">(k,v)</span> into  <span class="name">c1</span>
+
+<li>  Record: <span class="name">CollectionAfterRemoval(c2,c1, k)</span> denotes that collection  <span class="name">c2</span> represents the removal of pair <span class="name">(k,v)</span> from  <span class="name">c1</span>, where  <span class="name">v</span> is the value
+corresponding to key  <span class="name">k</span> in  <span class="name">c1</span>. 
+
+</ul>
+
+<p>
+Because these relations state the derivation of a collection from another, formally they are specializations of the precise-1  <a
+href="#Derivation-Relation"><strong>wasDerivedFrom</strong></a> relation.<p/>
+
+The following entity types are introduced:
+
+<ul>
+  <li> <span class="name">prov:type="Collection"%%xsd:QName</span>  denotes that the entity is a collection, that is, it can participate in  <span class="name">CollectionAfterInsertion</span> and  <span class="name">CollectionAfterDeletion</span> relations;
+
+  <li><span class="name">prov:type="emptyCollection"%%xsd:QName</span> denotes that the entity is a collection, and furthermore that it does not contain any key-value pairs.
+</ul>
 
 <div class="note">Given the specific nature of the derivation, the intervening activity that accounts for imprecise-1 derivation should have an equally specific type, such as"
 collection-insertion" and "collection-removal". This is left for a future version.</div>
 
-Because of the <strong>copy semantics</strong> interpretation of collection manipulation, one cannot have two assertions of the form:<br/>
-<span class="name">CollectionAfterInsertion(c2,c1, k1, v1, t1)</span><br/>
-<span class="name">CollectionAfterInsertion(c2,c1, k2, v2, t2)</span><br/>
-<em>within the same account</em>, regardless of the values of <span class="name">k1, k2, v1, v2, t1, t2</span>. This is because <span class="name">c2</span> is the result of exactly one insertion into <span class="name">c1</span>.  
-<p/>
-
-Collections follow <strong>set semantics</strong> have been recorded using
-collection assertions (note that the problem of state completeness is common to other provenance assertions, and not specific to the case of collections or part-of relations).
-
-</ul>
- 
+The intent of these relations and entity types is to capture the <em>history of changes that occurred to a collection</em>. <p/>
+
 The following examples illustrate how these assertions are expected to be used in practice.
 
 <div class="anexample">
 <pre class="codeexample">
-   <span class="name">entity(c, [prov:type="collection"])</span>    // <span class="name">e</span> is now understood to be an empty collection
+   <span class="name">entity(c, [prov:type="EmptyCollection"])</span>    // <span class="name">e</span> is an empty collection
    <span class="name">entity(k1).
    <span class="name">entity(v1).
    <span class="name">entity(k2).
    <span class="name">entity(v2).
+   <span class="name">entity(c1, [prov:type="Collection"]).</span>
+   <span class="name">entity(c2, [prov:type="Collection"]).</span>
   
   <span class="name">CollectionAfterInsertion(c1, c, k1, v1) </span>      // c1 = { (k1,v1) }
   <span class="name">CollectionAfterInsertion(c2, c1, k2, v2) </span>     // c2 = { (k1,v1), (k2 v2) }
@@ -3535,43 +3544,78 @@
 </pre>
 </div>
 
-Note that <span class="name">c</span> is a collection by the initial entity record assertion, while <span class="name">c1, c2, c3</span> are known to be collections as a consequence of their
-involvement in collection derivation assertions. Note also that the state of <span class="name">c3</span> can be obtained by querying the chain of derivations assertions involving insertions
-and removals, up to the starting point, <span class="name">c</span>.<p/>
-
-The following example illustrates how copy semantics may lead to multiple derivations from a single root collection, possibly by different asserters:
+This representation of a collection's evolution makes no assumption regarding the underlying data structure used to store and manage collections. In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates.   In fact, the state of a collection (i.e., the set of key-value pairs it contains) at a given point in a sequence of operations is never stated explicitly. Rather, it can be obtained by querying the chain of derivation assertions involving insertions and removals. Entity type <span class="name">prov:type="emptyCollection"</span> can be used in this context as it marks the start of a sequence of collection operations.
+
+<p>A number of observations follow from this.
+
+<ul>
+
+  <li> One can have multiple assertions regarding the state of a collection following a <em>set</em> of insertions, for example:<br/>
+<span class="name">CollectionAfterInsertion(c2,c1, k1, v1)</span><br/>
+<span class="name">CollectionAfterInsertion(c2,c1, k2, v2)</span><br/>
+  <span class="name">...</span><br/>
+This is interpreted as <em>" <span class="name">c2</span> is the state that results from inserting  <span class="name">(k1, v1)</span>,  <span class="name">(k2, v2)</span> etc. into  <span class="name">c1</span></em><p/>
+  
+<li> It is possible to have multiple derivations from a single root collection, possibly by different asserters, as shown in the following example.
 
 <div class="anexample">
 <pre class="codeexample">
-  <span class="name">entity(c, [prov:type="collection"])</span>    // <span class="name">e</span> is now understood to be an empty collection
-   <span class="name">entity(k1).
-   <span class="name">entity(v1).
-   <span class="name">entity(k2).
-   <span class="name">entity(v2).
-   <span class="name">entity(k3).
-   <span class="name">entity(v3).
+  <span class="name">entity(c, [prov:type="EmptyCollection"])</span>    // <span class="name">e</span> is an empty collection
+  <span class="name">entity(k1).
+  <span class="name">entity(v1).
+  <span class="name">entity(k2).
+  <span class="name">entity(v2).
+  <span class="name">entity(k3).
+  <span class="name">entity(v3).
+  <span class="name">entity(c1, [prov:type="Collection"]).</span>
+  <span class="name">entity(c2, [prov:type="Collection"]).</span>
+  <span class="name">entity(c3, [prov:type="Collection"]).</span>
   
   <span class="name">CollectionAfterInsertion(c1, c, k1, v1) </span>      // c1 = { (k1,v1) }
   <span class="name">CollectionAfterInsertion(c2, c, k2, v2) </span>      // c2 = { (k2 v2) }
   <span class="name">CollectionAfterInsertion(c3, c1, k3,v3) </span>      // c3 = { (k1,v1),  (k3,v3) }
 </pre>
+</div><p/>
+
+<li>Both relations are functional, in the sense that they represent a state change following a unique insertion/removal operation (or a unique set of them). Thus, from the pair of assertions:
+
+<span class="name">CollectionAfterInsertion(c, c1, k1, v1)</span><br/>
+<span class="name">CollectionAfterInsertion(c, c2, k2, v2)</span><br/>
+
+one can infer: <span class="name">c1==c2, k1==k2, v1==v</span>.<p/>
+
+
+<li> The state of a collection is only known to the extent that a chain of derivations starting from an empty collection can be found. It is understood that a set of assertions regarding a collection's evolution may be incomplete, and therefore so is the reconstructed state obtained by querying those assertions. For example:
+
+<div class="anexample">
+<pre class="codeexample">
+  <span class="name">entity(c, [prov:type="collection"])</span>    // <span class="name">e</span> is a collection, possibly not empty
+  <span class="name">entity(k1).
+  <span class="name">entity(v1).
+  <span class="name">entity(k2).
+  <span class="name">entity(v2, [prov:type="collection"])</span>    // <span class="name">v2</span> is a collection
+
+  <span class="name">CollectionAfterInsertion(c1, c, k1, v1) </span>      // c1 <em>includes</em> { (k1,v1) } but may contain additional unknown pairs
+  <span class="name">CollectionAfterInsertion(c2, c1, k2, v2) </span>      // c2 includes { (k1,v1), (k2 v2) } where v2 is a collection with unknown state
+</pre>
 </div>
-
-<p> An assertion CollectionAfterInsertion, written <span class="name"> CollectionAfterInsertion(collAfter, collBefore, key, value, [time])</span>, contains:</p>
+  In the example, the state of <span class="name">c2</span> is only partially known.
+
+</ul>
+
+<p> An assertion CollectionAfterInsertion, written <span class="name"> CollectionAfterInsertion(collAfter, collBefore, key, value)</span>, contains:</p>
 <ul>
 <li><em>entity</em>: an identifier <span class="name">collAfter</span> for the collection record representing the collection <em>after</em> the insertion; </li>
 <li><em>source</em>: an identifier <span class="name">collBefore</span>  for the collection record representing the collection <em>before</em> the insertion;</li>
 <li><em>source</em>: an identifier <span class="name">key</span>  for the key that has been inserted into <span class="name">collBefore</span></li>
 <li><em>source</em>: an identifier <span class="name">value</span>  for the value corresponding to the key that has been inserted into <span class="name">collBefore</span></li>
-<li><em>attributes</em>: an OPTIONAL <span class="name">time</span> at which the insertion was performed.</li>
 </ul>
 
-<p> An assertion CollectionAfterDeletion, written <span class="name"> CollectionAfterDeletion(collAfter, collBefore, key, [time])</span>, contains:</p>
+<p> An assertion CollectionAfterDeletion, written <span class="name"> CollectionAfterDeletion(collAfter, collBefore, key)</span>, contains:</p>
 <ul>
 <li><em>entity</em>: an identifier <span class="name">collAfter</span> for the collection record representing the collection <em>after</em> the deletion; </li>
 <li><em>source</em>: an identifier <span class="name">collBefore</span>  for the collection record representing the collection <em>before</em> the deletion;</li>
 <li><em>source</em>: an identifier <span class="name">key</span>  for the key corresponding to the (key, value) pair that has been deleted from <span class="name">collBefore</span></li>
-<li><em>attributes</em>: an OPTIONAL <span class="name">time</span> at which the insertion was performed.</li>
 </ul>
 
 
@@ -3592,12 +3636,8 @@
 <span class="nonterminal">kidentifier</span>
  <span class="name">,</span>
 <span class="nonterminal">videntifier</span>
-  [
- <span class="name">,</span>
-<span class="nonterminal">time</span>
-  ]
   <span class="name">)</span>
-<br/>
+  <br/>
 
  <span class="nonterminal">collectionRemovalRecord</span>&nbsp;::=
   <span class="name">CollectionAfterRemoval</span> 
@@ -3607,10 +3647,6 @@
 <span class="nonterminal">cidentifier</span>
  <span class="name">,</span>
 <span class="nonterminal">kidentifier</span>
-  [
- <span class="name">,</span>
-<span class="nonterminal">time</span>
-  ]
   <span class="name">)</span>
 
   </div>