described complementarity
author Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Mon, 10 Oct 2011 12:23:09 +0100
changeset 630 d36878bc83f5
parent 539 eb9e64ddfdf3
child 631 e0a00fcee786
primer/Primer.html
--- a/primer/Primer.html	Wed Oct 05 21:56:53 2011 -0400
+++ b/primer/Primer.html	Mon Oct 10 12:23:09 2011 +0100
@@ -181,8 +181,61 @@
 
    <section>
     <h3>Complementarity</h3>
-
-    <p>An intuitive overview of how to think about complementarity in Prov-DM.</p>
+    <p>
+    Several asserted entities can characterize the same thing, in
+    particular when entities are asserted by different <em>accounts</em> or over
+    different time periods. If two such entities have <em>overlapping
+    lifespans</em>, and the first entity has some <em>attributes</em> that
+    have not been asserted (and are not necessarily always true) for the second entity,
+    then the first entity is said to be <em>complementing</em> the second
+    entity; that is, the first entity helps form a more detailed
+    description of the second entity, at least for the duration of the
+    overlapping lifespan.
+    </p> 
+    <p> 
+    In addition, if <code>:A prov:wasComplementOf :B</code>, then all the
+    attributes of the entity <code>:A</code> which can be <em>mapped</em> to
+    <em>compatible</em> attributes of <code>:B</code> MUST be <em>matching</em>
+    for the continuous duration of the overlap of <code>:A</code> and
+    <code>:B</code>'s lifespans.
+    It is out of scope for PROV to specify or assert the nature of
+    the <em>compatibility mapping</em> and <em>matching</em>; the exact
+    interpretation of these is left to the asserter of
+    <code>wasComplementOf</code>.
+    </p>
+    <p>
+    If <code>:B</code> also has some attributes which 
+    are not asserted (or not always true) about <code>:A</code>,
+    then this MAY be asserted using the 
+    inverse relation <code>:B prov:wasComplementOf :A</code>. If two entities
+    both complement each other in this manner, both MUST have some
+    attributes the other does not have, although those attributes might
+    not have been asserted in the provenance. Note that the
+    <em>lack</em> of such an inverse assertion does not necessarily
+    mean that <code>:B</code> did not have any additional attributes
+    for <code>:A</code> in the timespan, only that this has not
+    been asserted.
+    </p>
+    <p>
+    In the simplest case, both entities are described using the same
+    attributes, in which case <em>matching</em> means the values SHOULD
+    literally be the same (matching by identity). On the other hand, an
+    attribute like <code>ex1:speed_in_mph</code> can be <em>mapped</em> to
+    a compatible <code>ex2:speed_in_kmh</code> attribute. Not all
+    attributes might be mappable in both directions; for instance,
+    <code>ex1:city</code> can be mapped to <code>ex2:country</code>, but not vice
+    versa.
+    </p>
+    <p>
+    Note that it is out of scope for PROV to assert or explain any
+    mapping of compatible attributes. This is merely a conclusion 
+    that can be drawn from the assertion that the two entities both
+    described the same thing in the overlapping time spans.  Also note
+    that asserting a complementary relationship does not detail how the
+    two entity timespans overlap; this could be anything from a
+    complete one-to-one match (where all attributes are always true for
+    both entities) to merely touching overlaps. 
+    </p>
    </section>
 
    <section>
@@ -201,7 +254,7 @@
    <p>We include examples of how the formal ontology 
     can be used to represent the Prov-DM assertions as RDF triples.
     These are shown using the Turtle notation. In 
-    the latter depictions, the namespace prefix <b>po</b> denotes 
+    the latter depictions, the namespace prefix <b>prov</b> denotes 
     terms from the Prov ontology, while <b>ex1</b>, <b>ex2</b>, etc. 
     denote terms specific to the example.</p>
 
@@ -290,17 +343,182 @@
 
     <p><i>Suggested example:</i> After looking at the detail of the compilation process, there appears
      to be nothing wrong, so Betty concludes the error is in GovData. She contacts
-     the government, and a new version of GovData is created. How does the provenance
-     document that the new version is a revision of the old version?</p>
+     the government, and a new revision of GovData is created. How does the provenance
+     document that the new revision is a revision of the old revision?</p>
    </section>
 
    <section>
     <h3>Complementarity</h3>
 
-    <p><i>Suggested example:</i> Betty lets Derek know that a new version of the data set exists,
+    <p>Betty lets Derek know that a new revision of the data set exists,
      and he looks at the provenance of the new data to understand what he needs to
-     reanalyse. When understanding how the new data differs from the old, how does he
-     interpret the relation of the two versions and GovData independent of version?</p>
+     reanalyse. </p>
+    <p>In addition to specifying that 
+        <code>ex1:dataSet2</code> is a new revision of
+        <code>ex1:dataSet1</code>, the provenance from DataGov also 
+        asserts that both of these entities were a <em>complement of</em>
+        another entity <code>ex1:dataSet</code>.
+    </p>
+     <pre class="turtle example">
+     ex1:dataSet1 prov:wasComplementOf ex1:dataSet .
+     ex1:dataSet2 prov:wasComplementOf ex1:dataSet .
+     </pre>
+     <!--
+     <pre class="asn example">
+     wasComplementOf(ex1:dataSet1, ex1:dataSet)
+     wasComplementOf(ex1:dataSet2, ex1:dataSet)
+     </pre>
+     -->
+     <p>
+        This assertion means that <code>ex1:dataSet1</code> at some point shared
+        its characterising attributes with <code>ex1:dataSet</code>, and the same for
+        <code>ex1:dataSet2</code>. Thus the <em>entity</em>
+        <code>ex1:dataSet1</code> did at some point represent the same
+        thing as characterized by the entity <code>ex1:dataSet</code>. The same is
+        true for <code>ex1:dataSet2</code>, but not necessarily at the
+        same point in time. 
+     </p>
+     <p>
+     The term <em>was complement of</em> here means that
+     <code>ex1:dataSet1</code>
+     provides additional details that add to the details of
+     <code>ex1:dataSet</code> (complementing it), and that both of these
+     entities represented the same thing.
+     Characterizing attributes of <code>ex1:dataSet</code> are thereby
+     asserted to have been <em>compatible</em> with the properties of
+     <code>ex1:dataSet1</code> and <code>ex1:dataSet2</code>.
+     <em>Compatible</em> here means that some kind of mapping can be
+     established between the attributes; they don't necessarily have to
+     match directly.
+     </p>
+     <p>   
+        Derek then looks at the characterization of 
+        <code>ex1:dataSet</code> to find these compatible attributes:
+     </p>
+     <pre class="example turtle">
+     ex1:dataSet a ex1:DataSet ;
+         ex1:regions ( ex1:North ex1:NorthWest ex1:East ) ;
+         dc:creator ex1:DataGov ;
+         dc:title "Regional incidence dataset 2011" .
+     </pre>
+     <!--
+     <pre class="example asn">
+     entity(ex1:dataSet, [
+        type="ex1:DataSet",
+        ex1:regions="North,NorthWest,East",
+        dc:creator="ex1:DataGov",
+        dc:title="Regional incidence dataset 2011"])
+     </pre>
+     -->
+     <p>From this Derek can deduce that both datasets had at some point
+        the same creator and title.  Derek then compares this to the
+        attributes for each of the complementing entities:
+     </p>
+     <pre class="example turtle">
+     ex1:dataSet1 a ex1:DataSet ;
+         ex1:postCodes ( "N1" "N2" "NW1" "E1" "E2" ) ;
+         ex1:totalIncidents 141 ;
+         dc:creator ex1:DataGov ;
+         dc:title "Regional incidence dataset 2011" .
+     </pre>
+     <!--
+     <pre class="example asn">
+     entity(ex1:dataSet1, [
+        type="ex1:DataSet",
+        ex1:postCodes="N1,N2,NW1,E1,E2",
+        ex1:totalIncidents="141",
+        dc:creator="ex1:DataGov",
+        dc:title="Regional incidence dataset 2011"])
+     </pre>
+     -->
+     <p>
+        Derek sees that the creator and title are directly mappable and 
+        equal between these entities. He also knows (from his region
+        aggregation method) that the <code>ex1:postCodes</code> <code>N1</code> and
+        <code>N2</code> are in the
+        region <code>ex1:North</code>, and so on, and can confirm that although
+        this regional characterisation of the data is not expressed
+        using the same attributes in the two entities, they are <em>compatible</em>. 
+      </p>
+      <p>Derek notes that <code>ex1:totalIncidents</code> is not stated
+        for <code>ex1:dataSet</code>, and not mappable to any of the
+        other existing attributes. Thus this could be one of the
+        complementing attributes that makes <code>ex1:dataSet1</code>
+        more specific than <code>ex1:dataSet</code>.
+            
+        From the assertion <code>ex1:dataSet1
+        prov:wasComplementOf ex1:dataSet</code>
+        Derek can see that <code>ex1:dataSet</code>
+        did have 141 incidents when its characterization interval
+        overlapped that of <code>ex1:dataSet1</code>, but not necessarily
+        throughout its lifetime. Note that in this example the provenance
+        assertions are not providing any direct description of the
+        characterization interval of the entities.
+      </p>
+      <p> 
+        Due to the open world assumption (more
+        information might be added later), he cannot conclude
+        from this alone that <code>ex1:dataSet</code> at any point did
+        <strong>not</strong> have 141 incidents. He therefore does not know
+        for sure that <code>ex1:totalIncidents</code> is a complementing
+        attribute which <code>ex1:dataSet</code> does not have in its
+        characterisation.
+      </p>
+      <p>
+      Derek finally compares the newer revision 
+      <code>ex1:dataSet2</code> with
+      <code>ex1:dataSet</code>:
+      </p>
+     <pre class="example turtle">
+     ex1:dataSet2 a ex1:DataSet ;
+         ex1:postCodes ( "N1" "N2" "NW1" "NW2" "E1" "E2" ) ;
+         ex1:totalIncidents 158 ;
+         dc:creator ex1:DataGov ;
+         dc:title "Regional incidence dataset 2011" .
+     </pre>
+     <!--
+     <pre class="example asn">
+     entity(ex1:dataSet2, [
+        type="ex1:DataSet",
+        ex1:postCodes="N1,N2,NW1,NW2,E1,E2",
+        ex1:totalIncidents="158",
+        dc:creator="ex1:DataGov",
+        dc:title="Regional incidence dataset 2011"])
+     </pre>
+     -->
+     <p>
+      In this revision, the new postcode <kbd>NW2</kbd> appears; this is still
+      <em>compatible</em> with the region <code>ex1:NorthWest</code>
+      of <code>ex1:dataSet</code>.
+      On the other hand, the attribute <code>ex1:totalIncidents</code> has gone up to 158. 
+     </p>
+     <p>
+      From the <code>prov:wasComplementOf</code> assertion Derek knows that
+      <code>ex1:dataSet2</code> also provides additional attributes for
+      <code>ex1:dataSet</code>, but because the total incidents can't
+      both be 141 and 158, the attribute <code>ex1:totalIncidents</code>
+      is a complementing attribute, and changes over the
+      characterisation interval (lifespan) of <code>ex1:dataSet</code>,
+      and is thus not one of its characterising attributes.  He also now
+      knows that <code>ex1:dataSet</code> is a common characterisation
+      of the dataset that spans (parts of) both revisions. It has
+      however not been asserted explicitly that the
+      <code>ex1:dataSet</code> is a somewhat more general
+      characterisation, just that it allows mutability on the
+      <code>ex1:totalIncidents</code> attribute and overlapped (parts
+      of) the timespans of the two revisions.
+      </p>
+     <p>
+      From this Derek concludes that he can still use the regions North,
+      North West and East in the diagram layout, but as the
+      <code>ex1:totalIncidents</code> differs, something in the
+      raw data has changed. He can't from this provenance assertion
+      alone tell if that is merely from the addition of the post code
+      NW2, or if data for the other post codes has changed as well.
+      Derek decides to redo the aggregation by region using
+      <code>ex1:dataSet2</code> and regenerate the
+      graphics using the same layout.
+     </p>
    </section>
 
    <section>