incorporated text on derivation in prov-dm wd4
author Luc Moreau <l.moreau@ecs.soton.ac.uk>
Thu, 08 Mar 2012 17:16:22 +0000
changeset 1813 23b182816fa2
parent 1811 65876ec0c4a3
child 1814 724afee592af
model/prov-dm-constraints.html
model/prov-dm.html
--- a/model/prov-dm-constraints.html	Thu Mar 08 15:49:36 2012 +0000
+++ b/model/prov-dm-constraints.html	Thu Mar 08 17:16:22 2012 +0000
@@ -681,7 +681,7 @@
 
 
 <p>
-A generation's id is OPTIONAL. It MUST be used when annotating generations (see Section <a href="#term-annotation">Annotation</a>) or when defining precise-1
+A generation's id is OPTIONAL. It MUST be used when annotating generations (see Section <a href="#term-annotation">Annotation</a>) or when defining precise
 derivations (see <a href="#Derivation-Relation">Derivation</a>).
 </p>
 
@@ -715,7 +715,7 @@
 
 
 <p>
-A usage id is OPTIONAL. It MUST be present when annotating usages (see Section <a href="#term-annotation">Annotation</a>) or when defining precise-1 derivations (see
+A usage id is OPTIONAL. It MUST be present when annotating usages (see Section <a href="#term-annotation">Annotation</a>) or when defining precise derivations (see
 <a href="#Derivation-Relation">Derivation</a>).</p>
 
 <p>
@@ -784,19 +784,13 @@
 <section id="Derivation-Relation">
 <h4>Derivation</h4>
 
-
-
-<p>A precise-1  derivation is richer  than an imprecise-1 derivation, itself, being more informative that an imprecise-n derivation Hence, the following implications
-hold.</p>
-<div class='inference' id='derivation-implications'>
+<p>A derivation is more informative if it contains a reference to an activity, generation, and usage. Hence, the following implication
+holds.</p>
+<div class='inference' id='derivation-implication'>
 Given two entities denoted by <span class="name">e1</span> and <span class="name">e2</span>, <span class='conditional'>if</span> the assertion  <span class="name">wasDerivedFrom(e2,
 e1, a, g2, u1, attrs)</span>
- holds for some generation identified by <span class="name">g2</span>, and usage identified by <span class="name">u1</span>, then <span
-class="name">wasDerivedFrom(e2,e1,[prov:steps="single"] &cup; attrs)</span> also holds.<br>
-
-Given two entities denoted by <span class="name">e1</span> and <span class="name">e2</span>, <span class='conditional'>if</span> the assertion  <span class="name">wasDerivedFrom(e2,
-e1, [prov:steps="single"] &cup; attrs)</span>
- holds, then <span class="name">wasDerivedFrom(e2,e1,attrs)</span> also holds.<br>
+ holds for some generation <span class="name">g2</span>,  usage <span class="name">u1</span>,  and set of attribute-value pairs <span class="name">attrs</span>, then <span
+class="name">wasDerivedFrom(e2,e1, attrs)</span> also holds.<br>
  </div>
 
 <div class="interpretation-forward">
@@ -805,27 +799,6 @@
 </div>
 
 
-<p>The imprecise-1 derivation has the same meaning as the  precise-1
- derivation, except that an activity  
- is known to exist, though it does not need to be 
-asserted.  This is formalized by the following inference rule,
-referred to as <em>activity introduction</em>:</p>
-<div class='inference' id="activity-introduction">
-<span class='conditional'>If</span> <span class="name">wasDerivedFrom(e2,e1)</span> holds, <span class='conditional'>then</span> there exist an activity, with identifier <span
-class="name">a</span>, a usage identified by <span class="name">u</span>, and a generation identified by <span class="name">g</span>
-such that:
-<pre class="codeexample">
-activity(a,aAttrs)
-wasGeneratedBy(g,e2,a,gAttrs)
-used(u,a,e1,uAttrs)
-</pre>
-for sets of attribute-value pairs <span class="name">gAttrs</span>, <span class="name">uAttrs</span>, and <span class="name">aAttrs</span>.
-</div>
-
-
-
-
-
 <p>
 Note that inferring derivation from usage and generation does not hold
 in general. Indeed, when a generation <span class="name">wasGeneratedBy(g, e2, a, attrs2)</span>
@@ -839,14 +812,6 @@
 </p>
 
 
-<p>The effective placeholder for an entity generation time is  <a>generation</a>. The presence of 
-time information in imprecise derivations is merely a convenience notation for a timeless derivation and a generation with this generation time information. </p>
-
-<div class='inference' id="derivation-time-elimination">
-<span class='conditional'>If</span> <span class="name">wasDerivedFrom(e2,e1,t,attrs)</span> holds, <span class='conditional'>then</span> the following expressions also hold:
-<span class="name">wasDerivedFrom(e2,e1,attrs)</span> and <span class="name">wasGeneratedBy(e2,t)</span>.
-</div>
-
 <p></p>
 <div class="structural-forward">
 See <a href="#derivation-use">derivation-use</a> for a structural constraint on derivations.
@@ -856,13 +821,14 @@
 
 
 
-<div class='issue'>Several points were raised about the attribute steps.
+<div class='pending'>Several points were raised about the attribute steps.
 Its name, its default value   <a href="http://www.w3.org/2011/prov/track/issues/180">ISSUE-180</a>.
  <a href="http://www.w3.org/2011/prov/track/issues/179">ISSUE-179</a>.</div>
 
 <div class='issue'> Emphasize the notion of 'affected by'   <a href="http://www.w3.org/2011/prov/track/issues/133">ISSUE-133</a>.</div>
 
-<div class='issue'> Simplify derivation   <a href="http://www.w3.org/2011/prov/track/issues/249">ISSUE-249</a>.</div>
+<div class='pending'> Simplify derivation   <a href="http://www.w3.org/2011/prov/track/issues/249">ISSUE-249</a>.</div>
+
 
 
 </section>
@@ -1498,20 +1464,22 @@
 
 
 
-<p>A further inference is permitted from the imprecise-1 derivations: </p>
+
+
+
+<p>A further inference is permitted from derivations with an explicit activity and no usage: </p>
 <div class='inference' id='derivation-use'>
-<p>Given an activity with identifier <span class="name">a</span>, entities  denoted by <span class="name">e1</span> and <span class="name">e2</span>, and a set of attribute-value
-pairs <span class="name">attrs2</span>,
-<span class='conditional'>if</span> <span class="name">wasDerivedFrom(e2,e1, [prov:steps="single"])</span> and <span class="name">wasGeneratedBy(e2,a,attrs2)</span> hold, <span
-class='conditional'>then</span> <span class="name">used(a,e1,attrs1)</span> also holds
-for some set of attribute-value pairs <span class="name">attrs1</span>.
+<p>Given an activity <span class="name">a</span>, entities  denoted by <span class="name">e1</span> and <span class="name">e2</span>, and  sets of attribute-value
+pairs <span class="name">dAttrs</span>, <span class="name">gAttrs</span>,
+<span class='conditional'>if</span> <span class="name">wasDerivedFrom(e2,e1, a, dAttrs)</span> and <span class="name">wasGeneratedBy(e2,a,gAttrs)</span> hold, <span
+class='conditional'>then</span> <span class="name">used(a,e1,uAttrs)</span> also holds
+for some set of attribute-value pairs <span class="name">uAttrs</span>.
 </div>
 <p>This inference is justified by the fact that the entity denoted by <span class="name">e2</span> is generated by at most one activity in a given account
 (see <a href="#generation-uniqueness">generation-uniqueness</a>). Hence,  this activity is also the one referred to by the usage of <span class="name">e1</span>. 
 </p>
 
 
-
 <p>We note that the converse inference, does not hold.
 From <span class="name">wasDerivedFrom(e2,e1)</span> and <span class="name">used(a,e1)</span>, one cannot
 derive <span class="name">wasGeneratedBy(e2,a,attrs2)</span> because identifier <span class="name">e1</span> may occur in usages performed by many activities, which may have not generated the entity denoted by <span class="name">e2</span>.</p>
--- a/model/prov-dm.html	Thu Mar 08 15:49:36 2012 +0000
+++ b/model/prov-dm.html	Thu Mar 08 17:16:22 2012 +0000
@@ -114,7 +114,7 @@
  
           // if your specification has a subtitle that goes below the main
           // formal title, define it here
-          subtitle   :  "Draft (WD4) for internal discussion",
+          subtitle   :  "About-to-be-frozen WD4 (for internal release)",
 
  
           // if you wish the publication date to be other than today, set this
@@ -1329,154 +1329,63 @@
 </li>
 </ul>
 </section>
-
 <section id="Derivation-Relation">
+
 <h4>Derivation</h4>
 
 <div class="glossary-ref" ref="glossary-derivation" withspan='true'></div>
 
 
-<div class='note'>
-This text was not edited much. It keeps on referring to asserter/assertion.  Before editing this section, we would like to have  <a href="http://www.w3.org/2011/prov/track/issues/249">ISSUE-249</a> resolved.
-</div>
-
-
-<p>According to <a href="#conceptualization">Section Conceptualization</a>, for an entity to be transformed from, created from, or affected by another in some way, there must be some
+
+
+<p>According to <a href="#conceptualization">Section Overview</a>, for an entity to be transformed from, created from, or resulting from an update to another, there must be some
 underpinning activities performing the necessary actions resulting in such a derivation.  
-However, asserters may not assert or have knowledge of these activities and associated details: they may not assert or know their number, they may not assert or know their identity, they may
-not assert or know the attributes characterizing how the relevant entities are used or generated. To accommodate the varying circumstances of the various asserters, PROV-DM allows more or
-less precise derivations to be asserted.  Hence, PROV-DM uses the terms <em>precise</em> and <em>imprecise</em> to characterize the different kinds of derivations. We note
-that the derivation itself is exact (i.e., deterministic, non-probabilistic), but it is its description, expressed in a derivation assertion, that may be imprecise. </p>
-
-<p>The  lack of precision may come from two sources:</p>
+A derivation can be described at various levels of precision. In its simplest form, derivation relates two entities. Optionally, attributes can be added to describe modalities of derivation.  If the derivation is the result of a single known activity, then this activity can also be optionally expressed. And to provide a completely accurate description of derivation, the generation and usage of the generated and used entities, respectively, can be provided. The reason for optional information such as activity, generation, and usage to be linked to derivations is to aid analysis of provenance and to facilitate provenance-based reproducibility. </p>
+
+
+<p><div class="attributes" id="attributes-derivation">A <dfn>derivation</dfn><span class="withAsn">, written <span class="pnExpression" id="pn-wasDerivedFrom">wasDerivedFrom(id, e2, e1, a, g2, u1, attrs)</span> in PROV-ASN,</span> contains:</p>
 <ul>
-<li> the number of activities that underpin a derivation is not asserted or known, or</li>
-<li> any of the other details that are involved in the derivation is not asserted or known; these include activity identities, generation and usage, and their attributes.</li>
-</ul>
-
-
-
-
-<p>Hence, we can consider two axis.  An activity number axis that has values  <em>single</em>, <em>multiple</em>,  and <em>unknown</em>, respectively representing the case where one activity
-is known to have occurred, more than one activities are known to have occurred, or an unknown number of activities have occurred. Likewise, we can consider another axis to cover other
-details (identities, generation, usage, and attributes), with values <em>asserted</em> and <em>not asserted</em>. We can then form a matrix of possible derivations. Out of the six
-possibilities, 
-PROV-DM offers three forms of derivations to cater for five of them, while the remaining one is not meaningful.  The following table summarizes names for the three kinds of
-derivation, which we then explain.</p>
-
-<div style="text-align: center;">
-<table border="1" style="margin-left: auto; margin-right: auto;">
-<caption>PROV-DM Derivation Type Summary</caption>
-<tr><td colspan=2 rowspan=2></td><td colspan=2><em>other details</em> axis</td></tr>
-<tr><td>asserted</td><td>not asserted</td></tr> 
-<tr><td rowspan=3><em>activity number</em><br>axis</td><td>single</td><td><a>precise-1 derivation</a></td><td><a>imprecise-1 derivation</a></td></tr> 
-<tr><td>multiple</td><td><a>imprecise-n derivation</a></td><td rowspan=2><a>imprecise-n derivation</a></td></tr> 
-<tr><td>unknown</td><td>&mdash;</td></tr> 
-</table>
-</div>
-
-<ul>
-<li> The asserter asserts that derivation is due to exactly one activity, and all the details are asserted. We call this a precise-1 derivation.</li>
-<li> The asserter asserts that derivation is due to exactly one activity, but other details,  whether known or unknown, are not asserted. We call this an imprecise-1 derivation.</li>
-<li> The following cases are captured by an imprecise-n derivation.
-<ul>
-<li> The asserter knows that multiple activities are involved or ignores the number of activities involved in the derivation, and  other details are not asserted. </li>
-<li> The asserter knows that multiple activities are involved in the derivation, and all their details are asserted. In this case,  these activities are connected by means of generated and
-used intermediary entities.  Despite all activities and details being known, there is no guarantee that any of these activities plays an active role in the derivation; hence, this case is
-also regarded as imprecise. Instead, precise derivations need to be expressed between these intermediary entities.  </li>
-</ul>
-</ul>
-
-<p> We note that the last theoretical cases cannot occur, since
-  asserting the details of an unknown number of activities is a contradiction.
-</p>
-
-<p>In order to represent the number of activities in a derivation, we introduce a PROV-DM attribute <span class="name">steps</span>, which can take two possible values:   <span
-class="name">single</span> and <span class="name">any</span>.
-When <span class="name">prov:steps="single"</span>, derivation is due to one activity; when <span class="name">prov:steps="any"</span>, the number of activities is multiple or not known.</p>
-
-
-<p>The three kinds of <dfn id="dfn-derivation">derivations</dfn> are successively introduced.  Making use of the attribute <span class="name">steps</span>, we can distinguish the various derivation types.</p>
-
-<p><div class="attributes" id="attributes-derivation">A <dfn>precise-1 derivation</dfn><span class="withAsn">, written <span class="pnExpression">wasDerivedFrom(id, e2, e1, a, g2, u1, attrs)</span> in PROV-ASN,</span> contains:</p>
-<ul>
-<li><span class='attribute'>id</span>:  an OPTIONAL identifier  identifying the derivation;</li> 
-<li><span class='attribute'>generatedEntity</span>: the identifier  of the generation entity;</li>
-<li><span class='attribute'>usedEntity</span>: the identifier of the used entity;</li>
-<li><span class='attribute'>activity</span>: the identifier of the activity using and generating the above entities;</li>
-<li><span class='attribute'>generation</span>: the identifier the generation for the generated entity and activity;</li> 
-<li><span class='attribute'>usage</span>: the identifier of the usage for the used entity and activity;</li> 
-<li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs that describe the modalities of this derivation, optionally including the attribute-value
-pair <span class="name">prov:steps="single"</span>.</li>
+<li><em>id</em>:  an OPTIONAL identifier  for a derivation;</li> 
+<li><em>generatedEntity</em>: the identifier of the entity generated by the derivation;</li>
+<li><em>usedEntity</em>: the identifier of the entity used by the derivation;</li>
+<li><em>activity</em>: an OPTIONAL identifier for the activity using and generating the above entities;</li>
+<li><em>generation</em>: an OPTIONAL identifier for the generation involving the generated entity and activity;</li> 
+<li><em>usage</em>: an OPTIONAL identifier for the usage involving the used entity and activity;</li> 
+<li><em>attributes</em>: an OPTIONAL set of attribute-value pairs that describe the modalities of this derivation.</li>
 </ul>
 </div>
-<p>It is OPTIONAL to include  the attribute <span class="name">prov:steps</span> in a precise-1 derivation since it already refers to the one and only one activity underpinning the
-derivation.</p>
-
-
-<p>An <dfn>imprecise-1 derivation</dfn><span class="withAsn">, written <span class="pnExpression">wasDerivedFrom(id, e2,e1, t, attrs)</span> in PROV-ASN,</span> contains:</p>
-<ul>
-<li><span class='attribute'>id</span>:  an OPTIONAL identifier identifying the derivation;</li> 
-<li><span class='attribute'>generatedEntity</span>: the identifier of  the generated entity;</li>
-<li><span class='attribute'>usedEntity</span>: the identifier of the used entity;</li>
-<li><span class='attribute'>time</span>: an OPTIONAL "generation time", the time at which the entity was created;</li>
-<li><span class='attribute'>attributes</span>: a set of attribute-value pairs that describe the modalities of this derivation; it MUST include the attribute-value pair <span class="name">prov:steps="single"</span>.</li>
-</ul>
-<p>An imprecise-1 derivation MUST include the attribute <span class="name">prov:steps</span>,  since it is the only means to distinguish this derivation from an imprecise-n derivation.</p>
-
-
-<p>An <dfn>imprecise-n derivation</dfn><span class="withAsn">, written <span class="pnExpression">wasDerivedFrom(id, e2, e1, t, attrs)</span> in PROV-ASN,</span> contains:</p>
-<ul>
-<li><span class='attribute'>id</span>:  an OPTIONAL identifier  identifying the derivation;</li> 
-<li><span class='attribute'>generatedEntity</span>: the identifier of the generated entity;</li>
-<li><span class='attribute'>usedEntity</span>: the identifier of the used entity;</li>
-<li><span class='attribute'>time</span>: an OPTIONAL "generation time", the time at which the entity was created;</li>
-<li><span class='attribute'>attributes</span>: an OPTIONAL set of attribute-value pairs that describe the modalities of this derivation; it optionally includes the attribute-value pair <span class="name">prov:steps="any"</span>.</li>
-</ul>
-<p>It is OPTIONAL to include  the attribute <span class="name">prov:steps</span> in an imprecise-n derivation. It defaults to <span class="name">prov:steps="any"</span>.</p> 
-
-
-<p>None of the three kinds of derivation is defined to be transitive. Domain-specific specializations of these derivations may be defined in such a way that the transitivity property
+
+<p> Derivation is not defined to be transitive. Domain-specific specializations of derivation may be defined in such a way that the transitivity property
 holds.</p>
 
 
+
+
+
 <div class="anexample">
 <p>The following descriptions state the existence of derivations.</p>
 <pre class="codeexample">
-wasDerivedFrom(e5,e3,a4,g2,u2)
-wasDerivedFrom(e5,e3,a4,g2,u2,[prov:steps="single"])
-
-wasDerivedFrom(e3,e2,[prov:steps="single"])
-
-wasDerivedFrom(e2,e1,[])
-wasDerivedFrom(e2,e1,[prov:steps="any"])
-
-wasDerivedFrom(e2,e1,2012-01-18T16:00:00, [prov:steps="any"])
+wasDerivedFrom(e2,e1)
+wasDerivedFrom(e2,e1,[prov:type="physical transform"])
+wasDerivedFrom(e2,e1,a,g2,u1)
+  wasGeneratedBy(g2,e2,a)
+  used(u1,a,e1)
 </pre>
 <p>
-The first two are precise-1 derivations expressing that the activity identified by <span class="name">a4</span>, by
-using the entity denoted by <span class="name">e3</span> according to usage <span class="name">u2</span>
- derived the
-entity denoted by <span class="name">e5</span> and generated it according to generation
- <span class="name">g2</span>. </p>
-
+The first and second lines are about derivations between  <span class="name">e2</span> and  <span class="name">e1</span>, but no information is provided as to the identity of the activity (and usage and generation) underpinning the derivation. In the second line, a type attribute is also provided.</p>
 <p>
-The third line describes an imprecise-1 derivation, which is similar for <span class="name">e3</span> and <span class="name">e2</span>, but it leaves the activity and associated attributes implicit. The fourth and fifth lines are about imprecise-n derivations between  <span class="name">e2</span> and  <span class="name">e1</span>, but no information is provided as to the number and identity of activities underpinning the derivation. The sixth derivation extends the fifth with the derivation time of <span class="name">e2</span>.
-</p>
+The third description expresses that activity  <span class="name">a</span>, 
+using the entity <span class="name">e1</span> according to usage <span class="name">u1</span>,
+ derived the
+entity <span class="name">e2</span> and generated it according to generation
+ <span class="name">g2</span>. It is followed by descriptions for generation <span class="name">g2</span> and usage <span class="name">u1</span>. With such a comprehensive description of derivation, a program that analyzes provenance can identify the activity underpinning the derivation, how the original entity <span class="name">e1</span> was used by the activity (for instance, which argument it was passed as, if the activity is the result of a function invocation), and which output the derived entity <span class="name">e2</span> was obtained from (say, for a function returning multiple results).</p>
 </div>
 
 
 
-<div class='issue'>Several points were raised about the attribute steps.
-Its name, its default value   <a href="http://www.w3.org/2011/prov/track/issues/180">ISSUE-180</a>.
- <a href="http://www.w3.org/2011/prov/track/issues/179">ISSUE-179</a>.</div>
-
 <div class='issue'> Emphasize the notion of 'affected by'   <a href="http://www.w3.org/2011/prov/track/issues/133">ISSUE-133</a>.</div>
 
 
-<div class='issue'> Is imprecise-1 derivation necessary? Can we just use precise-1 and imprecise-n?   <a href="http://www.w3.org/2011/prov/track/issues/249">ISSUE-249</a>.</div>
-
-
 </section>
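
Note on the two inferences that remain after this changeset (derivation-implication and derivation-use): the following Python sketch is one possible way to read them, not part of PROV-DM or PROV-ASN. The names Derivation, weaken, and infer_usage are hypothetical, and the generated_by mapping assumes generation-uniqueness (each entity is generated by at most one activity in an account).

```python
from typing import NamedTuple, Optional


class Derivation(NamedTuple):
    """Hypothetical record mirroring wasDerivedFrom(e2, e1, a, g2, u1, attrs)."""
    generated: str                      # e2
    used: str                           # e1
    activity: Optional[str] = None      # a (optional)
    generation: Optional[str] = None    # g2 (optional)
    usage: Optional[str] = None         # u1 (optional)
    attrs: frozenset = frozenset()      # attribute-value pairs


def weaken(d: Derivation) -> Derivation:
    """derivation-implication: wasDerivedFrom(e2,e1,a,g2,u1,attrs)
    implies wasDerivedFrom(e2,e1,attrs)."""
    return Derivation(d.generated, d.used, attrs=d.attrs)


def infer_usage(d: Derivation, generated_by: dict) -> Optional[tuple]:
    """derivation-use: from wasDerivedFrom(e2,e1,a,dAttrs) and
    wasGeneratedBy(e2,a,gAttrs), infer used(a,e1,uAttrs).

    generated_by maps an entity to the single activity that generated it
    (an assumption standing in for generation-uniqueness).
    Returns (activity, used_entity), or None when the rule does not apply.
    """
    if d.activity is not None and generated_by.get(d.generated) == d.activity:
        return (d.activity, d.used)
    return None


full = Derivation("e2", "e1", "a", "g2", "u1")
print(weaken(full))
print(infer_usage(full, {"e2": "a"}))
```

In this sketch a derivation without an explicit activity yields no usage, which matches the spec text's remark that the converse and weaker inferences do not hold in general.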