structural constraints
authorLuc Moreau <l.moreau@ecs.soton.ac.uk>
Wed, 15 Feb 2012 11:11:54 +0000
changeset 1564 bb280fb10877
parent 1563 a6f55a6a9766
child 1565 c875346539ea
structural constraints
model/working-copy/prov-dm-constraints.html
--- a/model/working-copy/prov-dm-constraints.html	Wed Feb 15 10:02:39 2012 +0000
+++ b/model/working-copy/prov-dm-constraints.html	Wed Feb 15 11:11:54 2012 +0000
@@ -494,7 +494,7 @@
   
 </section>
 
-<section id="data-model-concepts"> 
+<section id="definitional-constraints"> 
 
 <h2>PROV-DM Definitional Constraints and Inferences</h2>
 
@@ -1118,13 +1118,13 @@
 
 <p>
 	<div class="structural-forward">
-	  See Section <a href="#structural-constraints">structural-constraints</a> for a structural constraint on account records.
+	  See Section <a href="#structural-constraints">structural-constraints</a> for a structural constraint on accounts
 	</div>
 </p>
 
 <p>In some cases, there may be a requirement for the two descriptions to be included in a same account. To satisfy the constraint <a href="#unique-description-in-account">unique-description-in-account</a>, we can adopt a different identifier for one of them, and relate the two descriptions with the <span class="name">alternateOf</span> relation. </p>
 
-<div class="anexample">
+<div class="anexample" id="merge-with-rename">
 <p>We now reconsider the same two descriptions of a same entity, but we change the identifier for one of them:</p>
 <pre class="codeexample">
 entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
@@ -1366,55 +1366,25 @@
 <section id="structural-constraints"> 
 <h3>PROV-DM Structural Constraints</h3>
 
-<section>
-<h3>Stuff to Edit</h3>
-<li>
-The entity identifier <span class="name">id</span> contained in an entity record is expected to be unique among all the identifiers contained in  the current account's records. 
-This constraint is elaborated upon in <a href="#identifiable-term-in-account">identifiable-term-in-account</a>. It means that the current account does not contain any other record for
-this identifier. Effectively, <span class="name">id</span>  acts as a <em>local</em> identifier for this record.  In this specification, whenever we write "an entity record identified by ...
-",  this identification is to be understood in the context of the account that defines it. </li>
-
-<li>The activity identifier <span class="name">id</span> contained in an activity record is also expected to be unique among all the identifiers contained in  the current account's records. 
-This constraint is elaborated upon in <a href="#identifiable-term-in-account">identifiable-term-in-account</a>. It means that the current account does not contain any other record for
-this identifier, and that effectively <span class="name">id</span>  acts as a <em>local</em> identifier for this record in the current account.</li>
-
-<li>An entity assertion
- is about a thing, whose  situation in the world may be variant.
- An entity record is asserted at a particular point and is invariant, in the sense that 
-its attributes are given a value as part of that assertion.
-</li>
-
+<p><a href="#definitional-constraints">Section 4</a> provides definitional constraints for data model concepts.
+<a href="#account-constraints">Section 5</a> introduce constraints on descriptions occurring in accounts.
+<a href="#account-constraints">Section 6</a> defines an interpretation of this data model, in terms of event ordering
+constraints.  
+This section introduces further constraints on the structure of PROV-DM descriptions.  Descriptions that satisfy these constraints are said to be <dfn>structurally well-formed</dfn>.  A
+benefit of structurally well-formed provenance descriptions is that further inferences can be made, because descriptions are more precise, and therefore, richer. </p>
 
-<li id='attribute-occurrence-in-entity-record'>The attributes
-occurring in an entity record MUST be declared in the namespace
-referred to by their prefix according to
-<a href="#term-attribute">Section term-attribute</a>. Furthermore,
-for each attribute, a namespace also declares the number of
-occurrences it may have in a list of attributes. An entity record is
-valid if the number of occurrences of any of its attributes is
-compatible with this attribute's declaration it its namespace. This
-property applies to all types of records, and is referred to
-as <a>attribute occurrence validity</a>.</li>
-
-
-</section>
-
-<p>Sections 5 and 6 define a data model for provenance, which, for the most part, is unconstrained. Section 7.1 defines an interpretation of this data model, in terms of event ordering
-constraints.  This section introduces further constraints on the structure of PROV-DM records.  Records that satisfy these constraints are said to be <dfn>structurally well-formed</dfn>.  A
-benefit of structurally well-formed provenance records is that further inferences can be made, because records are more precise, and therefore, richer. </p>
-
-<p>According to the definition of a <a>generation record</a>, an entity becomes available after this entity's generation event, and does not exist before this event.  From this definition,
-we conclude that PROV-DM does not allow for an entity to have two generation records occurring at two different instants.
+<p>According to the definition of a <a>generation</a>, an entity becomes available after this entity's generation event, and does not exist before this event.  From this definition,
+we conclude that PROV-DM does not allow for an entity to have two generations occurring at two different instants.
 The rationale for this constraint is as follows.
  Two distinct <a title="entity generation event">generation events</a> (by a same activity or by two distinct activities), occurring one after the other, necessarily create two distinct
 entities; otherwise, the second <a title="entity generation event">generation event</a> would have resulted in an entity that existed before its creation, which contradicts the definition of
-<a>generation record</a>.</p>
+<a>generation</a>.</p>
 
-<p>So, PROV-DM allows for two distinct <a>generation records</a>  <span class="name">g1</span> and <span class="name">g2</span> referencing a same entity record provided they occur
+<p>So, PROV-DM allows for two distinct <a>generations</a>  <span class="name">g1</span> and <span class="name">g2</span> referencing a same entity provided they occur
 <em>simultaneously</em>. 
 <!-- (This means that <span class="name">g1</span> <a>precedes</a> <span class="name">g2</span> and <span class="name">g2</span> <a>precedes</a> <span class="name">g1</span>.) -->
-  In practice, for such a simultaneous generation to occur, the generation event has to be unique and caused by a <em>single world activity</em>, though the provenance records may contain
-different activity records providing alternative descriptions of that same world activity. 
+  In practice, for such a simultaneous generation to occur, the generation event has to be unique and caused by a <em>single world activity</em>, though  provenance may contain
+several  descriptions for the <em>same</em> world activity. 
 </p>
 
 <div class="anexample">
@@ -1430,80 +1400,70 @@
 wasGeneratedBy(e,a0)
 wasGeneratedBy(e,a2)
 </pre>
-This example is permitted in PROV-DM if the two activity records <span class="name">a0</span> and <span class="name">a2</span> provide alternate descriptions of what happens in the world
-with respect to this generation event.
+<p>So, we have two different <a title="generation">generations</a> for entity <span class="name">e</span>.  Such an example is permitted in PROV-DM if the two activities denoted by <span class="name">a0</span> and <span class="name">a2</span> are a single thing happening  in the world
+but described from different perspectives.</p>
 </div>
 
-<p>While this example is permitted in PROV-DM, it does not expose the hierarchical organization of executions and  it mixes records providing two descriptions of a same execution. This issue
-is highlighted by two different <a title="generation record">generation records</a> for entity <span class="name">e</span>, which makes reasoning about this kind of provenance records
-unnecessarily difficult.  Such assertions are said not be structurally well-formed.</p>
+<p>While this example is permitted in PROV-DM, it does not make the inter-relation between activities explicit, and  it mixes descriptions expressed from different perspectives together. 
+While this may acceptable in some specific applications, it becomes challenging for inter-operability. Indeed, PROV-DM does not offer any relation describing the structure of activities.
+  Such descriptions are said not be structurally well-formed.</p>
 
-<p>Structurally well-formed provenance records can be obtained by partitioning the generation records into different accounts. This makes it clear that these records provide alternative
-descriptions of the same real-world generation event, rather than describing two generation events for the same entity. When accounts are used, the example can be encoded as follows.</p>
+<p>Structurally well-formed provenance can be obtained by partitioning the generations into different accounts. This makes it clear that these generations provide alternative
+descriptions of the same real-world generation event, rather than describing two distinct generation events for the same entity. When accounts are used, the example can be encoded as follows.</p>
 
 
 <div class="anexample">
 <p>
-The same example is now revisited, with the following assertions that are structurally well-formed. Two accounts are introduced, and there is a single generation record for entity <span
+The same example is now revisited, with the following assertions that are structurally well-formed. Two accounts are introduced, and there is a single generation for entity <span
 class="name">e</span> per account.</p>
+
+<p>In a first account, entitled "summary", we find:</p>
 <pre class="codeexample">
-account(ex:summary,
-        http://example.org/asserter, 
         activity(a0,t1,t2,[prov:type="workflow execution"])
-        wasGeneratedBy(e,a0))
-
-account(ex:detailed,
-        http://example.org/asserter, 
+        wasGeneratedBy(e,a0)
+</pre>
+<p>In a second account, entitled "detail", we find:</p>
+<pre class="codeexample">
         activity(a1,t1,t3,[prov:type="workflow execution"])
         activity(a2,t3,t2,[prov:type="workflow execution"])
         wasInformedBy(a2,a1)
-        wasGeneratedBy(e,a2))
+        wasGeneratedBy(e,a2)
 </pre>
 </div>
 
 
 
-<!--
-<p>A given entity record can be referred to in a single generation record in the scope of a given <a href="#term-Account">account</a>.
-The rationale for this constraint is as follows.
-If two activities sequentially set different values to some attribute by means of two different <a title="entity generation event">generation events</a>, then they generate distinct
-entities. Alternatively,  for two activities to generate an entity simultaneously, they would require some synchronization by which they agree the entity is released for use; the end of this
-synchronization would constitute the actual generation of the entity, but is performed by a single activity. This uniqueness constraint is formalized as follows.
--->
-
-
-<p>Structurally well-formed records satisfy some constraints, which force the structure of descriptions to be exposed by means of accounts. With these constraints satisfied, further
-inferences can be made about structurally well-formed records.
-The uniqueness of generation records in accounts is formulated as follows.
+<p>Structurally well-formed provenance satisfies some constraints, which force the structure of descriptions to be exposed by means of accounts. With these constraints satisfied, further
+inferences can be made about structurally well-formed descriptons.
+The uniqueness of generations in accounts is formulated as follows.
 </p>
 
-<div class='constraint' id='generation-uniqueness'>Given an entity record denoted by <span class="name">e</span>, two activity records denoted by <span class="name">a1</span> and <span
+<div class='constraint' id='generation-uniqueness'>Given an entity denoted by <span class="name">e</span>, two activities denoted by <span class="name">a1</span> and <span
 class="name">a2</span>, and two sets of attribute-value pairs <span class="name">attrs1</span> and <span class="name">attrs2</span>,
-<span class='conditional'>if</span> the records <span class="name">wasGeneratedBy(e,a1,attrs1)</span> and <span class="name">wasGeneratedBy(e,a2,attrs2)</span> exist in the scope of a given
+<span class='conditional'>if</span> <span class="name">wasGeneratedBy(id1,e,a1,attrs1)</span> and <span class="name">wasGeneratedBy(id2,e,a2,attrs2)</span> exist in the scope of a given
 account,
-<span class='conditional'>then</span> <span class="name">a1</span>=<span class="name">a2</span>  and <span class="name">attrs1</span>=<span class="name">attrs2</span>.
+<span class='conditional'>then</span> <span class="name">id1</span>=<span class="name">id2</span>, <span class="name">a1</span>=<span class="name">a2</span>  and <span class="name">attrs1</span>=<span class="name">attrs2</span>.
 </div> 
 
 
 
-<p>A further inference is permitted from the imprecise-1 derivation record: </p>
+<p>A further inference is permitted from the imprecise-1 derivations: </p>
 <div class='inference' id='derivation-use'>
-<p>Given an activity record identified by <span class="name">a</span>, entity records identified by <span class="name">e1</span> and <span class="name">e2</span>, and set of attribute-value
+<p>Given an activity with identifier <span class="name">a</span>, entities  denoted by <span class="name">e1</span> and <span class="name">e2</span>, and a set of attribute-value
 pairs <span class="name">attrs2</span>,
 <span class='conditional'>if</span> <span class="name">wasDerivedFrom(e2,e1, [prov:steps="single"])</span> and <span class="name">wasGeneratedBy(e2,a,attrs2)</span> hold, <span
 class='conditional'>then</span> <span class="name">used(a,e1,attrs1)</span> also holds
 for some set of attribute-value pairs <span class="name">attrs1</span>.
 </div>
-<p>This inference is justified by the fact that the entity represented by entity record identified by <span class="name">e2</span> is generated by at most one activity in a given account
-(see <a href="#generation-uniqueness">generation-uniqueness</a>). Hence,  this activity record is also the one referred to in the usage record of <span class="name">e1</span>. 
+<p>This inference is justified by the fact that the entity denoted by <span class="name">e2</span> is generated by at most one activity in a given account
+(see <a href="#generation-uniqueness">generation-uniqueness</a>). Hence,  this activity is also the one referred to by the usage of <span class="name">e1</span>. 
 </p>
 
 
 
 <p>We note that the converse inference, does not hold.
 From <span class="name">wasDerivedFrom(e2,e1)</span> and <span class="name">used(a,e1)</span>, one cannot
-derive <span class="name">wasGeneratedBy(e2,a,attrs2)</span> because identifier <span class="name">e1</span> may occur in usage records referring to 
-many activity records, but they may not be referred to in generation records containing identifier <span class="name">e2</span>.</p>
+derive <span class="name">wasGeneratedBy(e2,a,attrs2)</span> because identifier <span class="name">e1</span> may occur in usages performed by many activities, which may have not generated the entity denoted by <span class="name">e2</span>.</p>
 
 
 <p>
@@ -1511,15 +1471,19 @@
 it satisfies the constraint  <a href="#generation-uniqueness">generation-uniqueness</a>. If an account is structurally well-formed, it support the inference <a
 href="#derivation-use">derivation-use</a>.</p>
 
-<p> The union of two accounts is another account, 
-containing the unions of their respective records, where
- records with a same identifier should be understood according to constraint <a href="#identifiable-term-in-account">identifiable-term-in-account</a>. Structurally well-formed
+<p> Taking the union of two accounts is another account, 
+formed by the union of the descriptions they respectively contain.  We note that the resulting union may or may not invalidate some constraints:
+<ul>
+<li> Two entity descriptions with a same identifier but different sets of attributes exist in each original account may invalidate <a href="#unique-description-in-account">unique-description-in-account</a> in the union, unless some form of description merging or renaming (as per <a href="#merge-with-rename">Example</a>) occurs.
+<li> Structurally well-formed
 accounts are not
 closed under union because the
 constraint <a href="#generation-uniqueness">generation-uniqueness</a> may no
-longer be satisfied in the resulting union.  </p>
+longer be satisfied in the resulting union.  </li>
+</ul>
+<p>How to reconcile such accounts is beyond the scope of this specification.</p>
 
-<p>
+<!--
 Indeed, let us reconsider example <a href="#account-example-1">account-example-1</a>, and let us define another account record as follows.</p>
 
 <div class="anexample">
@@ -1539,7 +1503,7 @@
 the resulting set of records violates <a href="#generation-uniqueness">generation-uniqueness</a> if the two activities <span class="name">a0</span> and <span class="name">a1</span> are
 distinct.</p>
 </div>
-
+-->
 
 <div class="note">
 Can the semantics characterize better what can be achieved with structurally well-formed accounts?
@@ -1743,7 +1707,33 @@
 compatible with the temporal layout of the graphical notation.
 -->
 
+<section>
+<h3>From Scruffy to Proper ...</h3>
 
+<div class='note'>
+Now that all constraints have been defined, it's time to revisit provenance nicknamed scuffy, and see what it actuall means, and how it can be made more proper.  
+</div>
+</section>
+
+<section>
+<h3>Stuff to Keep, Maybe?</h3>
+
+
+
+
+<li id='attribute-occurrence-in-entity-record'>The attributes
+occurring in an entity record MUST be declared in the namespace
+referred to by their prefix according to
+<a href="#term-attribute">Section term-attribute</a>. Furthermore,
+for each attribute, a namespace also declares the number of
+occurrences it may have in a list of attributes. An entity record is
+valid if the number of occurrences of any of its attributes is
+compatible with this attribute's declaration it its namespace. This
+property applies to all types of records, and is referred to
+as <a>attribute occurrence validity</a>.</li>
+
+
+</section>
 
 <section class="appendix"> 
       <h2>Acknowledgements</h2>