PROV-DM new version of Collections

Collections

PROV-DM Collection Constraints and further considerations [to go in part II]

The state of a given collection cannot be defined by more than one insertion or removal assertion. Thus:

entity(c1, [prov:type="Collection"])
entity(c2, [prov:type="Collection"])
entity(c, [prov:type="Collection"])

derivedByInsertionFrom(c, c1, {(k1, v1), (k2, v2)})
derivedByInsertionFrom(c, c2, {(k3, v3)})

is not allowed (unless the two sets are identical, in which case one of the two statements is redundant).

In particular, one cannot derive the state of a collection from another using multiple statements. Thus:

derivedByInsertionFrom(id1, c, c1, {(k1, v1), (k2, v2)})
derivedByInsertionFrom(id2, c, c1, {(k3, v3), (k4, v4)})

is not allowed.

The same applies to removal and combinations of insertions and removals, for example:

derivedByInsertionFrom(c, c1, {(k1, v1)})
derivedByRemovalFrom(c, c2, {k2})

is not allowed.
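
As an illustration only, the constraint above can be checked mechanically. The following Python sketch assumes a hypothetical encoding of assertions as (relation, derived, source, payload) tuples; the function name and the encoding are not part of PROV-DM. Identical statements are treated as redundant rather than as violations.

```python
from collections import defaultdict

# Hypothetical encoding (an assumption, not PROV-DM syntax):
# each statement is (relation, derived, source, payload), where payload is
# a set of (key, value) pairs for insertions or a set of keys for removals.
def find_multiply_defined(statements):
    """Return ids of collections defined by more than one distinct statement."""
    definitions = defaultdict(set)
    for relation, derived, source, payload in statements:
        # Identical statements collapse into one entry, so they are not flagged.
        definitions[derived].add((relation, source, frozenset(payload)))
    return sorted(c for c, defs in definitions.items() if len(defs) > 1)

statements = [
    ("insertion", "c", "c1", {("k1", "v1")}),
    ("removal", "c", "c2", {"k2"}),
    ("insertion", "c3", "c", {("k3", "v3")}),
]
print(find_multiply_defined(statements))
```

Here c is reported because it is the target of both an insertion and a removal, while c3 is defined only once.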

Keys are unique within a collection. Thus:

entity(c, [prov:type="Collection"])
entity(c1, [prov:type="Collection"])

derivedByInsertionFrom(c1, c, {(k, v1), ...})
derivedByInsertionFrom(c1, c, {(k, v2), ...})

implies v1==v2.
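
A possible way to detect violations of key uniqueness is sketched below in Python. The (derived, source, pairs) encoding and the function name are assumptions made for this illustration. The sketch compares value identifiers only: where the constraint infers v1==v2, the sketch reports the key whenever the two identifiers differ.

```python
# Hypothetical encoding (an assumption): each insertion is
# (derived, source, pairs), with pairs an iterable of (key, value) tuples.
def conflicting_keys(insertions):
    """Map each derived collection to its keys asserted with more than one value."""
    values = {}
    for derived, _source, pairs in insertions:
        for key, value in pairs:
            values.setdefault(derived, {}).setdefault(key, set()).add(value)
    conflicts = {}
    for derived, by_key in values.items():
        bad = {k: vs for k, vs in by_key.items() if len(vs) > 1}
        if bad:
            conflicts[derived] = bad
    return conflicts

insertions = [
    ("c1", "c", [("k", "v1")]),
    ("c1", "c", [("k", "v2")]),
]
print(conflicting_keys(insertions))
```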

Further considerations.

Collection branching.

It is possible to have multiple derivations from a single root collection, as long as the resulting entities are distinct, as shown in the following example.

  entity(c, [prov:type="EmptyCollection"])    // c is an empty collection
  entity(v1)
  entity(v2)
  entity(v3)
  entity(c1, [prov:type="Collection"])
  entity(c2, [prov:type="Collection"])
  entity(c3, [prov:type="Collection"])
  
  derivedByInsertionFrom(c1, c, {(k1, v1)})      
  derivedByInsertionFrom(c2, c, {(k2, v2)})       
  derivedByInsertionFrom(c3, c1, {(k3,v3)})

From this set of assertions, we conclude:

  c1 = { (k1,v1) }
  c2 = { (k2,v2) }
  c3 = { (k1,v1),  (k3,v3) }
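
The conclusion above can be reproduced by replaying the insertions from the empty root. The Python sketch below is one possible way to do so; the function name and the (derived, source, pairs) encoding are assumptions for illustration, and collections are modelled as plain dictionaries.

```python
# Hypothetical encoding (an assumption): root_states maps collection ids with
# known state (e.g. the empty collection) to dicts; each insertion is
# (derived, source, pairs).
def reconstruct(root_states, insertions):
    """Replay insertions whose source state is known until nothing more resolves."""
    states = {cid: dict(state) for cid, state in root_states.items()}
    pending = list(insertions)
    progress = True
    while pending and progress:
        progress = False
        for stmt in list(pending):
            derived, source, pairs = stmt
            if source in states:
                new_state = dict(states[source])  # copy, so branches stay independent
                new_state.update(pairs)
                states[derived] = new_state
                pending.remove(stmt)
                progress = True
    return states

states = reconstruct(
    {"c": {}},
    [
        ("c1", "c", [("k1", "v1")]),
        ("c2", "c", [("k2", "v2")]),
        ("c3", "c1", [("k3", "v3")]),
    ],
)
print(states["c1"], states["c2"], states["c3"])
```

Copying the source state before each insertion is what keeps the c1 and c2 branches from affecting each other.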

State of collections and use of weaker derivation relation

The state of a collection is known only to the extent that a chain of derivations starting from an empty collection can be found. Since a set of assertions regarding a collection's evolution may be incomplete, so is the reconstructed state obtained by querying those assertions. In general, all assertions reflect partial knowledge regarding a sequence of data transformation events. In the particular case of collection evolution, where some state changes may have been missed, the more generic derivation relation (wasDerivedFrom) should be used to signal that updates may have occurred that cannot be precisely asserted as insertions or removals. The following two examples illustrate this.

  entity(c, [prov:type="Collection"])    // c is a collection, possibly not empty
  entity(v1)
  entity(v2, [prov:type="Collection"])    // v2 is a collection

  derivedByInsertionFrom(c1, c, {(k1, v1)})       
  derivedByInsertionFrom(c2, c1, {(k2, v2)})

From this set of assertions, we conclude:

    c1 includes (k1,v1) but may contain additional unknown pairs
    c2 includes (k1,v1), (k2,v2) (and possibly more pairs), where v2 is a collection with unknown state

In this example, the state of c2 is only partially known because it is constructed from another collection, c1, whose state is itself only partially known.

   entity(c, [prov:type="EmptyCollection"])    // c is an empty collection
   entity(v1)
   entity(v2)
   entity(c1, [prov:type="Collection"])    
   entity(c2, [prov:type="Collection"])    
   entity(c3, [prov:type="Collection"])    
 
   derivedByInsertionFrom(c1, c, {(k1, v1)})       
   wasDerivedFrom(c2, c1)                       
   derivedByInsertionFrom(c3, c2, {(k2, v2)})

From this set of assertions, we conclude:

    c1 = { (k1,v1) }
    c2 is somehow derived from c1, but the precise sequence of updates is unknown
    c3 includes (k2,v2), but the earlier "gap" leaves uncertainty regarding (k1,v1) (it may have been removed) or any other pair that may have been added as part of the derivation activities.
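
The two examples above can be modelled by tracking, for each collection, the pairs known to be present together with a flag saying whether that knowledge is complete. The Python sketch below is a minimal illustration under assumed names (PartialState, apply_statement) and an assumed tuple encoding; it handles insertions and generic derivations only, and it expects statements to be applied in derivation order.

```python
from dataclasses import dataclass, field

@dataclass
class PartialState:
    known: dict = field(default_factory=dict)  # pairs known to be present
    complete: bool = False                     # True if no other pairs can exist

# Hypothetical encoding (an assumption): ("insertion", derived, source, pairs)
# or ("derivation", derived, source) for a generic wasDerivedFrom.
def apply_statement(states, statement):
    kind, derived, source = statement[:3]
    base = states.get(source, PartialState())  # undeclared source: nothing known
    if kind == "insertion":
        known = dict(base.known)
        known.update(statement[3])
        states[derived] = PartialState(known, base.complete)
    elif kind == "derivation":
        # Unknown updates may have happened, so earlier pairs are no longer certain.
        states[derived] = PartialState({}, False)
    else:
        raise ValueError(f"unsupported statement kind: {kind}")

states = {"c": PartialState({}, True)}  # c is an empty collection
for stmt in [
    ("insertion", "c1", "c", [("k1", "v1")]),
    ("derivation", "c2", "c1"),
    ("insertion", "c3", "c2", [("k2", "v2")]),
]:
    apply_statement(states, stmt)
print(states["c1"], states["c3"])
```

Starting instead from a root with complete set to False reproduces the first example, where every derived collection is known only partially.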