--- a/primer/Primer.html Wed Oct 05 21:56:53 2011 -0400
+++ b/primer/Primer.html Mon Oct 10 12:23:09 2011 +0100
@@ -181,8 +181,61 @@
<section>
<h3>Complementarity</h3>
-
- <p>An intuitive overview of how to think about complementarity in Prov-DM.</p>
+ <p>
+ Several asserted entities can characterize the same thing, in
+ particular when entities are asserted by different <em>accounts</em> or over
+ different time periods. If two such entities have <em>overlapping
+ lifespans</em>, and the first entity has some <em>attributes</em> that
+ have not been asserted (and are not necessarily always true) for the second entity,
+ then the first entity is said to be <em>complementing</em> the second
+ entity; that is, the first entity helps form a more detailed
+ description of the second entity, at least for the duration of the
+ overlapping lifespan.
+ </p>
+ <p>
+ In addition, if <code>:A prov:wasComplementOf :B</code>, then all the
+ attributes of the entity <code>:A</code> which can be <em>mapped</em> to
+ <em>compatible</em> attributes of <code>:B</code> MUST be <em>matching</em>
+ for the continuous duration of the overlap of <code>:A</code> and
+ <code>:B</code>'s lifespans.
+ It is out of scope for PROV to specify or assert the nature of
+ the <em>compatibility mapping</em> and <em>matching</em>; the exact
+ interpretation of these is left to the asserter of
+ <code>wasComplementOf</code>.
+ </p>
+ <p>
+ If <code>:B</code> also has some attributes which
+ are not asserted (or not always true) about <code>:A</code>,
+ then this MAY be asserted using the
+ inverse relation <code>:B prov:wasComplementOf :A</code>. If two entities
+ both complement each other in this manner, both MUST have some
+ attributes the other does not have, although those attributes might
+ not have been asserted in the provenance. Note that the
+ <em>lack</em> of such an inverse assertion does not necessarily
+ mean that <code>:B</code> did not have any additional attributes
+ for <code>:A</code> in the timespan, only that this has not
+ been asserted.
+ </p>
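+ <p>
+ As an illustrative sketch (not taken from the data model itself), two
+ entities that complement each other would carry both assertions. As in
+ the inline examples above, <code>:A</code> and <code>:B</code> are
+ placeholder names and prefix declarations are omitted:
+ </p>
+ <pre class="example turtle">
+ # :A has attributes that are not asserted for :B ...
+ :A prov:wasComplementOf :B .
+ # ... and :B has attributes that are not asserted for :A
+ :B prov:wasComplementOf :A .
+ </pre>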
+ <p>
+ In the simplest case, both entities are described using the same
+ attributes, in which case <em>matching</em> means the values SHOULD
+ literally be the same (matching by identity). On the other hand, an
+ attribute like <code>ex1:speed_in_mph</code> can be <em>mapped</em> to
+ a compatible <code>ex2:speed_in_kmh</code> attribute. Not all
+ attributes might be mappable in both directions; for instance,
+ <code>ex1:city</code> can be mapped to <code>ex2:country</code>, but not vice
+ versa.
+ </p>
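+ <p>
+ A minimal sketch of such a mapping, using the hypothetical entities
+ <code>ex1:trainA</code> and <code>ex2:trainB</code> and made-up values
+ (60 mph is roughly 96.56 km/h). How the attributes are actually mapped
+ and matched remains up to the asserter:
+ </p>
+ <pre class="example turtle">
+ # Hypothetical entity described in miles per hour and by city
+ ex1:trainA ex1:speed_in_mph 60 ;
+ ex1:city "Manchester" .
+ # Hypothetical entity described in kilometres per hour and by country
+ ex2:trainB ex2:speed_in_kmh 96.56 ;
+ ex2:country "United Kingdom" .
+ # The city can be mapped to the country, but not the other way round
+ ex1:trainA prov:wasComplementOf ex2:trainB .
+ </pre>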
+ <p>
+ Note that it is out of scope for PROV to assert or explain any
+ mapping of compatible attributes. That such a mapping exists is merely a conclusion
+ that can be drawn from the assertion that the two entities both
+ described the same thing in the overlapping timespans. Also note
+ that asserting a complementary relationship does not detail how the
+ two entity timespans overlap; this could be anything from a
+ complete one-to-one match (where all attributes are always true for
+ both entities) to merely touching overlaps.
+ </p>
</section>
<section>
@@ -201,7 +254,7 @@
<p>We include examples of how the formal ontology
can be used to represent the Prov-DM assertions as RDF triples.
These are shown using the Turtle notation. In
- the latter depictions, the namespace prefix <b>po</b> denotes
+ the latter depictions, the namespace prefix <b>prov</b> denotes
terms from the Prov ontology, while <b>ex1</b>, <b>ex2</b>, etc.
denote terms specific to the example.</p>
@@ -290,17 +343,182 @@
<p><i>Suggested example:</i> After looking at the detail of the compilation process, there appears
to be nothing wrong, so Betty concludes the error is in GovData. She contacts
- the government, and a new version of GovData is created. How does the provenance
- document that the new version is a revision of the old version?</p>
+ the government, and a new revision of GovData is created. How does the provenance
+ document that the new revision is a revision of the old one?</p>
</section>
<section>
<h3>Complementarity</h3>
- <p><i>Suggested example:</i> Betty lets Derek know that a new version of the data set exists,
+ <p>Betty lets Derek know that a new revision of the data set exists,
and he looks at the provenance of the new data to understand what he needs to
- reanalyse. When understanding how the new data differs from the old, how does he
- interpret the relation of the two versions and GovData independent of version?</p>
+ reanalyse. </p>
+ <p>In addition to specifying that
+ <code>ex1:dataSet2</code> is a new revision of
+ <code>ex1:dataSet1</code>, the provenance from DataGov also
+ asserts that both of these entities were a <em>complement of</em>
+ another entity <code>ex1:dataSet</code>.
+ </p>
+ <pre class="turtle example">
+ ex1:dataSet1 prov:wasComplementOf ex1:dataSet .
+ ex1:dataSet2 prov:wasComplementOf ex1:dataSet .
+ </pre>
+ <!--
+ <pre class="asn example">
+ wasComplementOf(ex1:dataSet1, ex1:dataSet)
+ wasComplementOf(ex1:dataSet2, ex1:dataSet)
+ </pre>
+ -->
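+ <p>
+ The revision relation mentioned above could be stated alongside these
+ assertions. The following sketch assumes that the ontology names this
+ relation <code>prov:wasRevisionOf</code>:
+ </p>
+ <pre class="example turtle">
+ # Assumed property name: dataSet2 is the newer revision of dataSet1
+ ex1:dataSet2 prov:wasRevisionOf ex1:dataSet1 .
+ </pre>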
+ <p>
+ These assertions mean that <code>ex1:dataSet1</code> at some point shared
+ its characterising attributes with <code>ex1:dataSet</code>, and the same for
+ <code>ex1:dataSet2</code>. Thus the <em>entity</em>
+ <code>ex1:dataSet1</code> did at some point represent the same
+ thing as characterized by the entity <code>ex1:dataSet</code>. The same is
+ true for <code>ex1:dataSet2</code>, but not necessarily at the
+ same point in time.
+ </p>
+ <p>
+ The term <em>was complement of</em> here means that
+ <code>ex1:dataSet1</code>
+ provides additional details that add to the details of
+ <code>ex1:dataSet</code> (complementing it), and that both of these
+ entities represented the same thing.
+ The characterizing attributes of <code>ex1:dataSet</code> are thereby
+ asserted to have been <em>compatible</em> with the attributes of
+ <code>ex1:dataSet1</code> and <code>ex1:dataSet2</code>.
+ <em>Compatible</em> here means that some kind of mapping can be
+ established between the attributes; they don't necessarily have to
+ match directly.
+ </p>
+ <p>
+ Derek then looks at the characterization of
+ <code>ex1:dataSet</code> to find these compatible attributes:
+ </p>
+ <pre class="example turtle">
+ ex1:dataSet a ex1:DataSet ;
+ ex1:regions ( ex1:North ex1:NorthWest ex1:East ) ;
+ dc:creator ex1:DataGov ;
+ dc:title "Regional incidence dataset 2011" .
+ </pre>
+ <!--
+ <pre class="example asn">
+ entity(ex1:dataSet, [
+ type="ex1:DataSet",
+ ex1:regions="North,NorthWest,East",
+ dc:creator="ex1:DataGov",
+ dc:title="Regional incidence dataset 2011"])
+ </pre>
+ -->
+ <p>From this, Derek can deduce that both datasets at some point had
+ the same creator and title. Derek then compares this to the
+ attributes for each of the complementing entities:
+ </p>
+ <pre class="example turtle">
+ ex1:dataSet1 a ex1:DataSet ;
+ ex1:postCodes ( "N1" "N2" "NW1" "E1" "E2" ) ;
+ ex1:totalIncidents 141 ;
+ dc:creator ex1:DataGov ;
+ dc:title "Regional incidence dataset 2011" .
+ </pre>
+ <!--
+ <pre class="example asn">
+ entity(ex1:dataSet1, [
+ type="ex1:DataSet",
+ ex1:postCodes="N1,N2,NW1,E1,E2",
+ ex1:totalIncidents="141",
+ dc:creator="ex1:DataGov",
+ dc:title="Regional incidence dataset 2011"])
+ </pre>
+ -->
+ <p>
+ Derek sees that the creator and title are directly mappable and
+ equal between these entities. He also knows (from his region
+ aggregation method) that the <code>ex1:postCodes</code> <code>N1</code> and
+ <code>N2</code> are in the
+ region <code>ex1:North</code>, and so on, and can confirm that although
+ this regional characterisation of the data is not expressed
+ using the same attributes in the two entities, they are <em>compatible</em>.
+ </p>
+ <p>Derek notes that <code>ex1:totalIncidents</code> is not stated
+ for <code>ex1:dataSet</code>, and is not mappable to any of the
+ other existing attributes. Thus this could be one of the
+ complementing attributes that make <code>ex1:dataSet1</code>
+ more specific than <code>ex1:dataSet</code>.
+
+ From the assertion <code>ex1:dataSet1
+ prov:wasComplementOf ex1:dataSet</code>
+ Derek can see that <code>ex1:dataSet</code>
+ did have 141 incidents when its characterization interval
+ overlapped that of <code>ex1:dataSet1</code>, but not necessarily
+ throughout its lifetime. Note that in this example the provenance
+ assertions do not provide any direct description of the
+ characterization interval of the entities.
+ </p>
+ <p>
+ Due to the open world assumption (more
+ information might be added later), he cannot conclude
+ from this alone that <code>ex1:dataSet</code> at any point did
+ <strong>not</strong> have 141 incidents. He therefore does not know
+ for sure that <code>ex1:totalIncidents</code> is a complementing
+ attribute which <code>ex1:dataSet</code> does not have in its
+ characterisation.
+ </p>
+ <p>
+ Derek finally compares the newer revision
+ <code>ex1:dataSet2</code> with
+ <code>ex1:dataSet</code>:
+ </p>
+ <pre class="example turtle">
+ ex1:dataSet2 a ex1:DataSet ;
+ ex1:postCodes ( "N1" "N2" "NW1" "NW2" "E1" "E2" ) ;
+ ex1:totalIncidents 158 ;
+ dc:creator ex1:DataGov ;
+ dc:title "Regional incidence dataset 2011" .
+ </pre>
+ <!--
+ <pre class="example asn">
+ entity(ex1:dataSet2, [
+ type="ex1:DataSet",
+ ex1:postCodes="N1,N2,NW1,NW2,E1,E2",
+ ex1:totalIncidents="158",
+ dc:creator="ex1:DataGov",
+ dc:title="Regional incidence dataset 2011"])
+ </pre>
+ -->
+ <p>
+ In this revision, the new postcode <kbd>NW2</kbd> appears; this is still
+ <em>compatible</em> with the region <code>ex1:NorthWest</code>
+ of <code>ex1:dataSet</code>.
+ On the other hand, the attribute <code>ex1:totalIncidents</code> has gone up to 158.
+ </p>
+ <p>
+ From the <code>prov:wasComplementOf</code> assertion Derek knows that
+ <code>ex1:dataSet2</code> also provides additional attributes for
+ <code>ex1:dataSet</code>, but because the total incidents cannot
+ be both 141 and 158, the attribute <code>ex1:totalIncidents</code>
+ is a complementing attribute that changes over the
+ characterisation interval (lifespan) of <code>ex1:dataSet</code>,
+ and is thus not one of its characterising attributes. He also now
+ knows that <code>ex1:dataSet</code> is a common characterisation
+ of the dataset that spans (parts of) both revisions. It has,
+ however, not been asserted explicitly that
+ <code>ex1:dataSet</code> is a somewhat more general
+ characterisation, just that it allows mutability on the
+ <code>ex1:totalIncidents</code> attribute and overlapped (parts
+ of) the timespans of the two revisions.
+ </p>
+ <p>
+ From this Derek concludes that he can still use the regions North,
+ North West and East in the diagram layout, but as the
+ <code>ex1:totalIncidents</code> values differ, something in the
+ raw data has changed. He can't tell from this provenance assertion
+ alone whether that is merely from the addition of the post code
+ NW2, or whether data for the other post codes has changed as well.
+ Derek decides to redo the aggregation by region using
+ <code>ex1:dataSet2</code> and regenerate the
+ graphics using the same layout.
+ </p>
</section>
<section>