prov: changeset 3427:d0521144ced4

--- a/model/prov-constraints.html	Thu Jun 21 10:06:40 2012 -0700
+++ b/model/prov-constraints.html	Fri Jun 22 07:27:30 2012 -0700
@@ -271,25 +271,29 @@
 PROV-DM, the PROV data model, is a data model for provenance that describes
 the entities, people and activities involved in
 producing a piece of data or thing. 
-PROV-DM is structured in six components, dealing with: 
-(1) entities and activities, and the time at which they were created, used, or ended;
-(2) agents bearing responsibility for entities that were generated and activities that happened;
-(3) derivations of entities from entities;
-(4) properties to link entities that refer to a same thing;
-(5) collections forming a logical structure for its members;
-(6) a simple annotation mechanism.
+PROV-DM is organized in six components, respectively dealing with:
+(1)
+entities and activities, and the time at which they were created,
+used, or ended;
+(2) derivations of entities from entities;
+(3) agents
+bearing responsibility for entities that were generated and activities
+that happened;
+(4) a notion of bundle, a mechanism to support
+provenance of provenance; 
+(5) properties to link entities that
+refer to the same thing;
+(6) collections forming a logical structure for its members. 
 </p>
 
 
-<p> This document introduces a further set of concepts useful for
-  understanding the PROV data model and defines <i>inferences</i>
+<p> This document introduces <i>inferences</i> and <i>definitions</i>
   that are allowed on provenance statements and <i>validity
-  constraints</i> that PROV instances should
-  follow. These inferences and constraints are useful for readers who
-  develop applications that generate provenance or reason over
-  provenance.
-</p>
-</section>
+  constraints</i> that PROV instances should satisfy. These inferences
+  and constraints are useful for readers who develop applications that
+  generate provenance or reason over provenance.  They can also be
+  used to <i>normalize</i> PROV instances to a form that can easily be
+  compared.</p> </section>
 
 <section id="sotd">
 <h4>PROV Family of Specifications</h4>
@@ -316,6 +320,8 @@
 </ul>
 
 <h4>First Public Working Draft</h4>
+<div class='note'> Revise for 2PWD</div>
+
  <p>This is the first public release of the PROV-CONSTRAINTS
 document. Following feedback, the Working Group has decided to
 reorganize the PROV-DM document substantially, separating the data model,
@@ -356,53 +362,51 @@
 
 <h3>Purpose of this document</h3>
 
-<p> PROV-DM is a conceptual data model for provenance (realizable
-using different serializations such as PROV-N, PROV-O, or PROV-XML).
-However, nothing in the PROV-DM specification [[PROV-DM]] forces sets
-of PROV
-statements (or <a>instances</a>) to be meaningful, that is, to correspond to a consistent
-history of objects and interactions.  Furthermore, nothing in the
-PROV-DM specification enables applications to perform inferences over
-PROV <a>instances</a>.  </p>
+<p> PROV-DM is a conceptual data model for provenance, which is realizable
+using different serializations such as PROV-N, PROV-O, or PROV-XML.
+The PROV-DM specification [[PROV-DM]] does not impose any requirements
+upon sets of PROV
+statements (or <a>instances</a>) to be valid, that is, to correspond to a consistent
+history of objects and interactions to which logical reasoning can be safely applied. </p>
 
 <p> This document specifies <em>inferences</em> over PROV instances that
 applications MAY employ, including definitions of some provenance
 statements in terms of others, and also defines a class of <em>valid</em>
 PROV instances by specifying <em>constraints</em> that valid PROV instances must
 satisfy. Applications SHOULD produce valid provenance and
-MAY reject provenance that is not valid in order to increase
-the usefulness of provenance and reliability of applications that
-process it.  <a href="#compliance"
+MAY reject provenance that is not <em>valid</em>.  <a href="#compliance"
 class="sectionRef"></a>
 summarizes the requirements for compliance with this document, which
 are specified in detail in the rest of the document.
 </p>
 
-<p> This specification lists inferences and definitions together in one
-section (<a href="#inferences" class="sectionRef"></a>), defines the
-induced notion of <a>equivalence</a> (<a href="#equivalence"
-class="sectionRef"></a>), and then
-considers two kinds of validity constraints (<a href="#constraints"
-class="sectionRef"></a>): <em>structural constraints</em> that
-prescribe properties of PROV instances that can be checked directly
-by inspecting the syntax, and <em>event ordering</em> constraints that
-require that the records in a PROV <a>instance</a> are consistent with a
-sensible ordering of events relating the activities, entities and
-agents involved.  In separate sections we consider additional
-constraints specific to collections and accounts (<a
- href="#collection-constraints" class="sectionRef"></a> and <a
- href="#account-constraints" class="sectionRef"></a>).  </p>
-
-
-<p>
-Finally, the specification includes a section (<a
- href="#rationale" class="sectionRef"></a>) describing the rationale
-for the inferences and constraints in greater detail, particularly
-background on events, attributes, the role of inference, and
-accounts. A formal mathematical model that further justifies the
-constraints and inferences is found in [[PROV-SEM]].
+<p> This specification lists inferences and definitions together in
+one section (<a href="#inferences" class="sectionRef"></a>), and introduces two kinds of
+constraints (<a href="#constraints" class="sectionRef"></a>):
+<em>uniqueness constraints</em> that prescribe properties of PROV
+instances that can be checked directly by inspecting the syntax, and
+<em>event ordering</em> constraints that require that the records in a
+PROV <a>instance</a> are consistent with a sensible ordering of events
+relating the activities, entities and agents involved.  Next, 
+notions of <a>equivalence</a> and <a>normalization</a> are defined (<a href="#equivalence"
+class="sectionRef"></a>).
+In <a
+ href="#collection-constraints" class="sectionRef"></a> we consider
+additional constraints specific to collections and in <a
+ href="#account-constraints" class="sectionRef"></a> we likewise
+discuss how constraints interact with bundles.  </p>
+
+
+<p>To summarize: compliant applications use definitions,
+inferences, and uniqueness constraints to normalize PROV instances,
+and then apply event ordering constraints to determine whether the
+instance has a consistent event ordering.  If so, the instance is
+<a>valid</a>, and the normal form is considered <a>equivalent</a> to
+the original instance.  Also, any two PROV instances that yield the
+same normal form are considered <a>equivalent</a>.  Further discussion
+of the semantics of PROV statements, which justifies the inferences
+and constraints, can be found in the formal semantics [[PROV-SEM]].
 </p>
-
 </section>
 <section id="audience">
 <h3> Audience </h3>
@@ -410,11 +414,10 @@
 <p> The audience for this document is the same as for [[PROV-DM]]: developers
 and users who wish to create, process, share or integrate provenance
 records on the (Semantic) Web.  Not all PROV-compliant applications
-need to check validity when processing provenance, but many
-applications could benefit from the inference rules specified here.
-Conversely, applications that create or transform provenance should
-try to produce valid provenance, to make it more useful to other
-applications.
+need to perform inferences or check validity when processing provenance.
+However, applications that create or transform provenance should
+attempt to produce valid provenance, to make it more useful to other
+applications by ruling out nonsensical or inconsistent information.
 </p>
 
 <p>This document assumes familiarity with [[PROV-DM]].
@@ -453,7 +456,9 @@
   <div class="note">
     Should we specify a way for PROV instances to say whether they
     are meant to be validated or not?  Seems outside the scope of this
-    document, may require changes to PROV-N.
+    document, may require changes to PROV-N.  Proposal: Point out that
+    this can be done already by describing bundles; more details in
+    best practices.
   </div>
 
 <p>
@@ -520,11 +525,11 @@
 
 <section id="inferences">
 <h2>Inferences and Definitions</h2>
-
+<div class='note'> Add definitions/rules for expanding optionals</div>
 <p>
 In this section, we describe <a title="inference">inferences</a> and <a title="definition">definitions</a> that MAY be used on
-  provenance data, and a notion of <a
-title="equivalence">equivalence</a> on PROV instances.  
+  provenance data, and preserve <a>equivalence</a> on <a>valid</a>
+PROV instances (as detailed in <a href="#equivalence" class="sectionRef"></a>).
 An  <dfn id="inference">inference</dfn> is a rule that can be applied
   to PROV instances to add new PROV statements.  A <dfn
   id="definition">definition</dfn> is a rule that states that a
@@ -549,14 +554,11 @@
   <span class="name">concl_1</span> ... <span class="name">concl_n</span> to the instance, possibly after generating fresh
   identifiers <span class="name">a_1</span>,...,<span class="name">a_m</span> for unknown objects.  These fresh
   identifiers might later be found to be equal to known identifiers;
-  these fresh identifiers play a similar role in PROV constraints to existential variables in logic.
+  they play a similar role in PROV constraints to existential variables in logic.
 </p>
 <div class='note'>
-  TODO: Is this re-inventing blank nodes in PROV-DM, and do we want to
-  do this?  A lot of the inferences have existentially quantified
-  conclusions (and there is some theory that supports this).
-
-  TODO: Make sure conjunctive reading of conclusion is clear.
+  TODO: Make sure conjunctive reading of conclusion is clear.  Discuss
+  connection to TGDs.
   </div>
 
 <p> Definitions have the following general form:</p>
@@ -570,16 +572,14 @@
   </div>
  
   <p>
-  This means that a provenance expression defined_exp is defined in
+  This means that a provenance expression <span class="name">defined_exp</span> is defined in
   terms of other expressions.  This can be viewed as a two-way
   inference:  If <span class="name">defined_exp</span>
   can be found in a PROV instance, we can add all of the expressions
 <span class="name">defining_exp_1</span> ... <span class="name">defining_exp_n</span> to the instance, possibly after generating fresh
-  identifiers <span class="name">a_1</span>,...,<span class="name">a_m</span> for unknown objects.  Conversely, if there
-  exist identifiers <span class="name">a_1</span>...<span class="name">a_m</span> such that <span class="name">defining_exp_1</span>
-  and ... and <span class="name">defining_exp_n</span> hold in the instance, we can add the defined
-  expression <span class="name">def_exp</span>.  When an expression is defined in terms of
-  others, it is in a sense redundant; it is safe to replace it with
+  identifiers <span class="name">a_1</span>,...,<span
+  class="name">a_m</span> for unknown objects.  It is safe to replace
+  a defined expression with
   its definition.
 </p>
   
@@ -593,6 +593,8 @@
 as the existence of an underlying entity generated by one activity and used by the
 other.</p>
 
+  <div class="note">Explicitly list existentially quantified vars in constraints</div>
+  
 <div class='definition' id='wasInformedBy-definition'>
 <p>
 Given two activities identified by <span class="name">a1</span> and <span class="name">a2</span>, 
@@ -612,7 +614,9 @@
 from 
 <span class="name">wasInformedBy(a2,a1)</span>, we know that there exists <span class="name">e1</span> such that <span class="name">e1</span> was generated by <span class="name">a1</span>
 and used by <span class="name">a2</span>. Likewise, from <span class="name">wasInformedBy(a3,a2)</span>, we know that there exists  <span class="name">e2</span> such that <span
-class="name">e2</span> was generated by <span class="name">a2</span> and used by <span class="name">a3</span>. The following illustration shows a case for which transitivity cannot hold. The
+class="name">e2</span> was generated by <span class="name">a2</span>
+and used by <span class="name">a3</span>. The following illustration
+shows a counterexample to  transitivity. The
 horizontal axis represents the event line. We see that <span class="name">e1</span> was generated after <span class="name">e2</span> was used. Furthermore, the illustration also shows that
 <span class="name">a3</span> completes before <span class="name">a1</span>.  So it is impossible for <span class="name">a3</span> to have used an entity generated by <span
 class="name">a1</span>.</p>
@@ -620,7 +624,7 @@
 <div style="text-align: center;">
 <figure>
 <img src="images/informedByNonTransitive.png" alt="non transitivity of wasInformedBy" />
-<figcaption>Counter-example for transitivity of wasInformedBy</figcaption>
+<figcaption id="counterexample-wasInformedBy">Figure 1: Counter-example for transitivity of wasInformedBy</figcaption>
 </figure>
 </div>
 
@@ -667,7 +671,7 @@
  <p>Derivations with an explicit activity and no usage admit the
   following inference: </p>
 <div class='inference' id='derivation-use'>
-<p>Given an activity <span class="name">a</span>, entities  denoted by <span class="name">e1</span> and <span class="name">e2</span>, 
+<p>Given an activity <span class="name">a</span>, and entities  denoted by <span class="name">e1</span> and <span class="name">e2</span>, 
 <span class='conditional'>IF</span> <span class="name">wasDerivedFrom(-;e2,e1, a, -, -)</span> and <span class="name">wasGeneratedBy(-;e2,a,-)</span> hold, <span
 class='conditional'>THEN</span> <span class="name">used(-;a,e1,-)</span> also holds.
 </div>
@@ -675,34 +679,18 @@
 (see <a class="rule-text" href="#unique-generation"><span>TBD</span></a>). Hence,  this activity is also the one referred to by the usage of <span class="name">e1</span>. 
 </p>
 
-<div class="note">
-  There is some redundancy in the following discussion.
-  </div>
 
 <p>The converse inference does not hold.
 From <span class="name">wasDerivedFrom(e2,e1)</span> and <span class="name">used(a,e1,-)</span>, one cannot
-derive <span class="name">wasGeneratedBy(e2,a,-)</span> because identifier <span class="name">e1</span> may occur in usages performed by many activities, which may have not generated the entity denoted by <span class="name">e2</span>.</p>
-
-  <p>
-Note that derivation cannot in general be inferred from the existence
-of related usage and generation events. Indeed, when a generation <span class="name">wasGeneratedBy(g; e2, a, -, attrs2)</span>
-<a>precedes</a> <span class="name">used(u; a, e1, -, attrs1)</span>, for
-some <span class="name">e1</span>, <span class="name">e2</span>, <span class="name">attrs1</span>, <span class="name">attrs2</span>, and <span class="name">a</span>, one
-cannot infer derivation <span class="name">wasDerivedFrom(e2, e1, a, g, u)</span>
-or <span class="name">wasDerivedFrom(e2,e1)</span> since 
- <span class="name">e2</span> cannot possibly be derived from
- <span class="name">e1</span>, given the creation of <span class="name">e2</span> <a>precedes</a> the use
-of <span class="name">e1</span>.  That is, if <span class="name">e1</span> is generated
-by an activity before <span class="name">e2</span> is used, then
-obviously  <span class="name">e2</span> cannot have been derived from
-<span class="name">e1</span>.  However, even if  <span
-class="name">e2</span> happens used before   <span class="name">e1</span>
+derive <span class="name">wasGeneratedBy(e2,a,-)</span> because
+  identifier <span class="name">e1</span> may be used by many
+  activities, whereas at most one activity could generate the entity
+  <span class="name">e2</span>.  Even if  <span
+class="name">e2</span> is used by some activity that later generates   <span class="name">e1</span>
 is generated, it is not safe to assume that  <span
 class="name">e2</span> was derived from  <span class="name">e1</span>.
-</p>
-
-<p> Derivation is not defined to be transitive. Domain-specific specializations of derivation may be defined in such a way that the transitivity property
-holds.</p>
+Derivation is not defined to be transitive either, following similar
+  reasoning as for <span class="name">wasInformedBy</span>. </p>
 
   
 
@@ -769,9 +757,9 @@
 <span class="name">e</span> and <span class="name">ag</span>,  
 <span class='conditional'>THEN</span> there exists an activity with some identifier <span class="name">a</span> such that the following statements hold:
 <pre>
-activity(a, -, -)
-wasGeneratedBy(-; e, a, -)
-wasAssociatedWith(-; a, ag, -)
+activity(a,-,-)
+wasGeneratedBy(-;e,a,-)
+wasAssociatedWith(-;a,ag,-)
 </pre>
 </p>
 </div>
@@ -836,11 +824,9 @@
 
 
  <section> 
-<h3>Component 4: Alternate Entities</h3>
-<div class="note">TODO: There is currently no consensus what inferences on
-  alternate or specialization should be assumed.  The following
-  section lists possible inferences that may or may not be adopted. Section is under review, pending ISSUE-29.
-</div>
+<h3>Component 4: Alternative and Specialized Entities</h3>
+
+<div class="note"> Merge in motivating discussion from sec. 8 here.</div>
 
 
   <p>The relation <span class='name'>alternateOf</span> is an equivalence relation: <a>reflexive</a>,
@@ -920,7 +906,9 @@
 <hr>
 
 
-
+<div class="note">This constraint belongs with other uniqueness
+  constraints, or in the section on bundles.</div>
+  
 
 <div id='functional-contextualization_text'>
 <p>An entity can be the subject of at most one contextualization relation.</p>
@@ -956,210 +944,12 @@
   </div>
 
 
-  <div class="note">
-    The following sections are retained from an older version, and are
-    not consistent with the above constraints.  This will be revised
-    once the consensus on ISSUE-29 is clearer.
-    </div>
-    
-  <section id="term-Specialization">
-<h3>Specialization</h3>
-
-
-<p>Specialization is <em>neither symmetric nor anti-symmetric</em>.
-</p>
-
-<div class="anexample" id="anexample-not-symmetric">
-"Alice's toyota car on fifth Avenue" is a specialization of "Alice's toyota car", but the converse does not hold.
-</div>
-
-<div class="anexample" id="anexample-specialization-not-anti-symmetric">
-anti-symmetric counter-example???
-</div>
-
-
-<p>Specialization is <em>transitive</em>. Indeed if <span
-class="name">specializationOf(e1,e2)</span> holds, then there is some
-common thing, say <span class="name">T1-2</span> they both refer to,
-and  <span class="name">e1</span> is a more specific aspect of this
-thing than <span class="name">e2</span>. Likewise, if <span
-class="name">specializationOf(e2,e3)</span> holds, then there is some
-common thing, say <span class="name">T2-3</span> they both refer to, and  <span class="name">e2</span> is a more specific aspect of this
-thing than <span class="name">e3</span>.  The things <span
-class="name">T1-2</span> and <span class="name">T2-3</span> are the
-same since <span class="name">e2</span> is an aspect of both of them,
-so <span
-class="name">specializationOf(e1,e3)</span> follows since <span class="name">e1</span> and <span class="name">e3</span>
-are aspects fo the same thing and <span class="name">e1</span> is more specific than <span class="name">e3</span>. </p>
-
-
-<div class="anexample" id="anexample-specialization-is-transitive">
-A specialization of "this  email message" would be, for example, the "printed version on my desk", which is a specialization of "my thoughts on this email thread".  So, the "printed version on my desk" is also a specialization   "my thoughts on this email thread".
-</div>
-
-
-</section> 
-
-<section id="term-Alternate">
-<h3>Alternate</h3>
-</section> 
-
-
-<p>Alternate not is <em>reflexive</em>. Indeed, <span class="name">alternate(e,e)</span> does not hold for any arbitrary entity <span class="name">e</span> since  <span class="name">e</span> may not be a specialization of another entity.</p>
-
-
-<p>Alternate is <em>symmetric</em>. Indeed, if <span class="name">alternate(e1,e2)</span> holds,
-then there exists an unspecified entity <span class="name">e</span>, such that
-both <span class="name">e1</span> and <span class="name">e2</span> are specialization of <span class="name">e</span>.
-Therefore, <span class="name">alternate(e2,e1)</span> also holds.
-</p>
-
-
-
-<p>Alternate is <em>not transitive</em>. Indeed, if <span class="name">alternate(e1,e2)</span> holds,
-then there exists an unspecified entity <span class="name">e1-2</span>, such that
-both <span class="name">e1</span> and <span class="name">e2</span> are specialization of <span class="name">e1-2</span>.
-Likewise, if <span class="name">alternate(e2,e3)</span> holds,
-then there exists an unspecified entity <span class="name">e2-3</span>, such that
-both <span class="name">e2</span> and <span class="name">e3</span> are specialization of <span class="name">e2-3</span>.
-It does not imply that there is a common entity <span class="name">e1-3</span>
-that both  <span class="name">e1</span> and <span class="name">e3</span> specialize.
-</p>
-
-
-<div class="anexample" id="anexample-alternate-not-transitive1">
-<p>At 6pm, the customer in a chair is a woman in a red dress, who happens to be Alice. After she leaves, another customer arrives at 7pm, a man with glasses, who happens to be Bob.  Transitivity does not hold since the <span class="name">womanInRedDress\</span> is not alternate of <span class="name">customerInChairAt7pm</span>.
-<pre>
-alternate(womanInRedDress,customerInChairAt6pm)
-specialization(customerInChairAt6pm,Alice)
-specialization(womanInRedDress,Alice)
-
-alternate(manWithGlasses,customerInChairAt7pm)
-specialization(customerInChairAt7pm,Bob)
-specialization(manWithGlasses,Bob)
-
-alternate(customerInChairAt6pm, customerInChairAt7pm)
-specialization(customerInChairAt6pm, customerInChair)
-specialization(customerInChairAt7pm, customerInChair)
-</pre>
-</div>
-
-<p>The above example shows that  <span class="name">customerInChairAt6pm</span> and <span class="name">customerInChairAt7pm</span> are two alternate entities that have no overlap, while <span class="name">womanInRedDress</span> and <span class="name">customerInChairAt6pm</span> do overlap.
-The following example   illustrates another case of non-overlapping alternate entities.
-</p>
-
-<div class="anexample">Two copies of the same book, where copy A was destroyed before copy B was made.</div>
-
-
 </section>
 
 </section>
 
 
 
-  <section id="equivalence">
-<h2>Equivalence</h2>
-
-
-  For the purpose of checking inferences and constraints, we define a
-notion of <a>equivalence</a> of PROV s.  Equivalence has the following characteristics:
-
-
-<ul>
-  <li>Missing attributes that are interpreted as omitted values are
-  handled by generating a fresh
-  identifier for the omitted value.
-  </li>
-  <li> Redundant expressions are merged according to uniqueness
-  constraints. </li>
-  <li>
-  The order of provenance expressions is irrelevant to the meaning of
-  a PROV instance.  That is, a
-  PROV instance is equivalent to any other instance obtained by
-  permuting its expressions.
-  </li>
-  <li>
-  Inference rules and definitions preserve equivalence.  That is, a <a>PROV
-  instance</a> is equivalent to the instance obtained by applying any
-  inference rule.
-  </li>
-  <li>Equivalence is reflexive, symmetric, and transitive.</li>
-</ul>
-
-  <section id="optional-attributes">
-  <h3>Optional Attributes</h3>
-  
-<div class="note">
-  TODO: Clarify how optional attributes are handled; clarify merging.  The following is
-  not very explicit about the difference between "not present" and
-  "omitted but inferred".
-  </div>
-<div id="optional-attributes1">
-<p>PROV-DM allows for some attributes to
-  be optionally expressed. Unless otherwise specified, when an
-  optional attribute is not present in a statement, some value
-  SHOULD be assumed to exist for this attribute, though it is not
-  known which.</p>
-
-  The only exceptions are:
-  <ul>
-   <li><span id="optional-attributes2">Activities also allow for an
-  optional start time attribute.  If both are specified, they MUST be
-  the same, as expressed by <a class="rule-text"
-href="#unique-startTime"><span>TBD</span></a>.</span></li>
-
-
-
-
-    <li><span id="optional-attributes3">Activities also allow for an optional end time attribute.  If both are specified, they MUST be the same, as expressed by <a class="rule-text"
-href="#unique-endTime"><span>TBD</span></a>.</span></li>
-
-</li>
-<li><div id="optional-attributes4">In an association of the form
-  <span class="name">wasAssociatedWith(a, ag, -, attr)</span>, the
-  absence of a plan means: either no plan exists, or a plan exists but
-  it is not identified.</div></li>
-  <li><div id="optional-attributes5">
-In an association of the form <span class="name">wasAssociatedWith(a, -, pl, attr)</span>, an agent exists but it is not identified.</div>
-</li>
-<li><div id="optional-activity">
-In a delegation of the form <span class="name">actedOnBehalfOf(a,
-  ag2, ag1, -, attr)</span>, an activity exists but is not identified.
-</div></li>
-   </ul>
-</div>
-
-</section>
-
-<section id="normalization">
-<h3>Normalization</h3>
-
-
-<p>
-We define the <dfn>normal form</dfn> of a PROV instance as the set
-of provenance expressions resulting from merging all of the overlapping
-expressions in the instance and applying all possible inference rules
-to this set.  Formally, we say that two PROV  instances are
-<dfn>equivalent</dfn> if they have the same normal form (that is,
-after applying all possible inference rules, the two instances produce
-the same set of PROV-DM expressions.)
-</p>
-
-<div class="note">
-  We should check that normal forms exist, i.e. that applying rules
-  and definitions eventually terminates.  More clarity is needed about
-  enforcing uniqueness via merging vs. constraint checking.
-  </div>
-
-<p> An application that processes PROV-DM data SHOULD handle
-equivalent instances in the same way. (Common exceptions to this rule
-include, for example, pretty printers that seek to preserve the
-original order of statements in a file and avoid expanding
-inferences.)  </p>
-
-</section>
-  
-</section> <!-- inferences -->
 
 <section id="constraints">
 <h2>Validity Constraints</h2>
@@ -1168,13 +958,8 @@
 
 
 <p>
-This section defines a collection of constraints on PROV instances.  A PROV instance is <dfn id="dfn-valid">valid</dfn>
-  if, after applying all possible inference and definition rules from
-  <a href="#inferences">Section 2</a>, the resulting instance
-  satisfies all of the constraints specified in this section.
- </p>
-
-  <p> There are two kinds of constraints:
+This section defines a collection of constraints on PROV instances.
+There are two kinds of constraints:
   <ul><li><em>uniqueness constraints</em> that say that a <a>PROV
   instance</a> can contain at most one expression or that multiple
   expressions about the same objects need to have the same values (for
@@ -1190,6 +975,9 @@
     </li>
     </ul>
 
+    Uniqueness constraints are checked through <a>merging</a> pairs of
+    statements.  
+
 <p>The PROV data model is implicitly based on a notion of <dfn
   id="dfn-event">instantaneous event</dfn>s (or just <a
   title="instantaneous event">event</a>s), that mark
@@ -1198,19 +986,23 @@
 notion of event is not first-class in the data model, but it is useful
 for explaining its other concepts and its semantics [[PROV-SEM]].
 Thus, events help justify  <i>inferences</i> on provenance as well as
-<i>validity</i> constraints indicating when provenance is self-consistent. In <a href="#section-event-time" class="sectionRef"></a> we
-discuss the motivation for <a title="instantaneous event">instantaneous events</a>
-and their relationship to time in greater detail.</p>
+<i>validity</i> constraints indicating when provenance is
+  self-consistent.
+
+<div class="note">Merge in event descriptions from sec. 8.4</div>
+    
 
 <p>  PROV-DM
 identifies five kinds of <a title="instantaneous event">instantaneous events</a>, namely <a>entity generation
 event</a>, <a>entity usage event</a>, <a>entity invalidation event</a>, <a>activity start event</a>
 and <a>activity end event</a>.  PROV-DM adopts Lamport's clock
-assumptions [[CLOCK]] in the form of a reflexive, transitive partial order <a>follows</a>
-(and its inverse <a>precedes</a>) between <a title="instantaneous event">instantaneous events</a>.  Furthermore,
-PROV-DM assumes the existence of a mapping from <a title="instantaneous event">instantaneous events</a> to time clocks,
+assumptions [[CLOCK]] in the form of a reflexive, transitive partial order <a>precedes</a>
+between <a title="instantaneous event">instantaneous events</a>.  Furthermore,
+PROV-DM assumes the existence of a mapping from <a
+title="instantaneous event">instantaneous events</a> to time instants,
 though the actual mapping is not in scope of this specification.</p>
 
+<div class="note"> Preceding sentence a bit vague</div>
     
 <div class="note">
   TODO: More about what it means for constraints to be satisfied;
@@ -1220,10 +1012,8 @@
     <section id="structural-constraints">
 <h3>Uniqueness Constraints</h3>
 
-<div class="note">
-Attribute uniqueness constraints?
-</div>
-
+  <div class="note"> More explanation about constraints.  Non-examples
+    such as attributes not necessarily being unique.  Define merging semantics.</div>
 <p>
 <hr>
 
@@ -1275,46 +1065,28 @@
 <span class='conditional'>THEN</span> <span class="name">id1</span>=<span class="name">id2</span>, <span class="name">a1</span>=<span class="name">a2</span>, <span class="name">t1</span>=<span class="name">t2</span>  and <span class="name">attrs1</span>=<span class="name">attrs2</span>.</p>
 </div> 
 
-<div class="note">
-Wouldn't the above constraint violate uniqueness?
-</div>
-
+<div class='constraint' id='unique-invalidation'>
+<p>
+Given an entity denoted by <span class="name">e</span>, two activities denoted by <span class="name">a1</span> and <span
+class="name">a2</span>, two time instants  <span class="name">t1</span> and <span
+class="name">t2</span>, and two sets of attribute-value pairs <span class="name">attrs1</span> and <span class="name">attrs2</span>,
+<span class='conditional'>IF</span> <span class="name">wasInvalidatedBy(id1; e, a1, t1, attrs1)</span> and <span class="name">wasInvalidatedBy(id2; e, a2, t2, attrs2)</span> exist,
+<span class='conditional'>THEN</span> <span class="name">id1</span>=<span class="name">id2</span>, <span class="name">a1</span>=<span class="name">a2</span>, <span class="name">t1</span>=<span class="name">t2</span>  and <span class="name">attrs1</span>=<span class="name">attrs2</span>.</p>
+</div> 
 <div class="note">
-Invalidation uniqueness?
-</div>
-
-<p>
-
-<hr>
-
-<p id='unique-generation-time_text'>A generation can be used to indicate a generation time without having to specify the involved activity.  A generation time is unique, as specified by the following constraint.<p> 
-<div class="note">
-Seems redundant given unique-generation
+Fill in invalidation uniqueness.
 </div>
-<div class='constraint' id='unique-generation-time'>
-<p>Given an entity denoted by <span class="name">e</span> and 
-two time instants  <span class="name">t1</span> and <span
-class="name">t2</span>,
-<span class='conditional'>IF</span> <span class="name">wasGeneratedBy(-;e, -, t1)</span> and <span class="name">wasGeneratedBy(-;e, -, t2)</span> hold, <span class='conditional'>THEN</span> <span class="name">t1</span>=<span class="name">t2</span>.</p>
-</div> 
-
+
+<p> It follows from the above constraints that the generation and
+invalidation times of
+an entity are unique, if specified.
 <p>
 
-<hr>
-
-<p id='unique-invalidation-time_text'>An invalidation can be used to indicate an invalidation time without having to specify the involved activity.  An invalidation time is unique, as specified by the following constraint.<p> 
-
-<div class='constraint' id='unique-invalidation-time'>
-<p>Given an entity denoted by <span class="name">e</span> and 
-two time instants  <span class="name">t1</span> and <span
-class="name">t2</span>,
-<span class='conditional'>IF</span> <span class="name">wasInvalidatedBy(-;e, -, t1)</span> and <span class="name">wasInvalidatedBy(-;e, -, t2)</span> hold, <span class='conditional'>THEN</span> <span class="name">t1</span>=<span class="name">t2</span>.</p>
-</div> 
 
 
 <p>
 <hr>
-
+<div class="note">Refactor these as key constraints.</div>
 <p id='unique-startTime_text'>An <a>activity start event</a> is the <a title="instantaneous event">instantaneous event</a> that marks the instant an activity starts. It allows for an optional time attribute.  <span id="optional-start-time">Activities also allow for an optional start time attribute.  If both are specified, they MUST be the same, as expressed by the following constraint.</span>
 </p>
 
@@ -1356,7 +1128,7 @@
 event">generation event</a> precedes any of this
 entity's <a title="entity usage event">usage events</a>.  Should this
 ordering constraint be violated, the associated generation and
-usage could not be credible.  The rest of this section defines
+usage would not be credible.  The rest of this section defines
 the <dfn>temporal interpretation</dfn> of provenance instances as a
 set of instantaneous event ordering constraints. </p>
 
@@ -1377,15 +1149,18 @@
 <span class="name">e1</span> follows <span class="name">e2</span>,
 this means that either the two events are equal or <span
 class="name">e1</span> happened after <span
-class="name">e2</span>. Both relations are partial orders, meaning
-that they are reflexive, transitive, and antisymmetric.</p>
+class="name">e2</span>. Both relations are strict partial orders, meaning
+that they are irreflexive, transitive, and antisymmetric.</p>
 
 <div class="note"> Do we want to allow an event to
   "precede" itself?  Perhaps precedes should be strict.
 </div>
 
 <div class="note">
-  The following discussion is unclear: what is being said here, and why?
+  The following discussion is unclear: what is being said here, and
+  why?  Proposal: The times should ideally be consistent with event
+  ordering, but doign so requires knowing how they map onto timelines,
+  which we don't specify.
   </div>
 
 <p>PROV-DM also allows for time observations to be inserted in
@@ -1400,25 +1175,9 @@
 verification is outside the scope of this specification.
 </p>
 
-<p>The following figure summarizes the ordering constraints in a
-graphical manner. For each subfigure, an event time line points to the
-right. Activities are represented by rectangles, whereas entities are
-represented by circles. Usage, generation and derivation are
-represented by the corresponding edges between entities and
-activities.  The five kinds of <a title="instantaneous event">instantaneous events</a> are represented by vertical
-dotted lines (adjacent to the vertical sides of an activity's
-rectangle, or intersecting usage and generation edges).  The ordering
-constraints are represented by triangles: an occurrence of a triangle between two <a title="instantaneous event">instantaneous event</a> vertical dotted lines represents that the event denoted by the left
-line precedes the event denoted by the right line.</p>
 
 
   
-<div style="text-align: center;">
-<figure>
-<figcaption id="ordering-activity-fig">Summary of <a title="instantaneous event">instantaneous event</a> ordering constraints for activities</figcaption>
-<img src="images/ordering-activity.png" alt="constraints between events" />
-</figure>
-</div>
 
 <!-- Constraint template: 
 <span class="conditional">IF</span>
@@ -1437,7 +1196,7 @@
 <p>
 In this section we discuss constraints from the perspective of
 the <a>lifetime</a> of an activity.  An activity starts, then during
-its lifetime uses, generates entities and communicates with  or starts
+its lifetime uses, generates or invalidates entities, and communicates with  or starts
 other
 activities, and finally ends.  The following constraints amount to
 checking that all of the events associated with an activity take place
@@ -1445,6 +1204,41 @@
 start and endpoints of its lifetime.
 </p>
 
+<p><a href="#ordering-activity">Figure 2</a>> summarizes the ordering
+  constraints on activities in a
+graphical manner. For this and subsequent figures, an event time line points to the
+right. Activities are represented by rectangles, whereas entities are
+represented by circles. Usage, generation and derivation are
+represented by the corresponding edges between entities and
+activities.  The five kinds of <a title="instantaneous event">instantaneous events</a> are represented by vertical
+dotted lines (adjacent to the vertical sides of an activity's
+rectangle, or intersecting usage and generation edges).  The ordering
+constraints are represented by triangles: an occurrence of a triangle between two <a title="instantaneous event">instantaneous event</a> vertical dotted lines represents that the event denoted by the left
+line precedes the event denoted by the right line.</p>
+  <div class="note"> Miscellanous suggestions about figures:
+<li>
+  The use of triangles in the ordering constraints leaves me uncertain
+    about which vertical line they are constraining. I worry that it
+    is not the "very next one". I would be more confident if a yellow
+    horizontal "edge" actually touched the two vertical edges that it
+    was constraining (and the yellow triangle could be on top of that
+    edge).
+
+</li><li>
+    I think it would help if the "corresponding edges between entities and activities" where the same visual style as the vertical line marking the time the Usage, generation and derivation occurred. A matching visual style provides a Gestalt that matches the concept. I am looking at subfigures b and c in 5.2.0</li>
+<li>
+Is Invalidation missing in "The following figure summarizes the ordering constraints" paragraph?</li> </ul>
+  </div>
+
+  <div style="text-align: center;">
+
+<figure>
+<img src="images/ordering-activity.png" alt="constraints between events" />
+<figcaption id="ordering-activity-fig">Summary of <a title="instantaneous event">instantaneous event</a> ordering constraints for activities</figcaption>
+</figure>
+</div>
+
+  
 <hr />
 
 <p id='start-precedes-end_text'>
@@ -1919,6 +1713,127 @@
 
 </section> <!-- constraints -->
 
+  <section id="equivalence">
+<h2>Equivalence</h2>
+<div class="note">TODO: revise to discuss role of definitions and
+  constraints in normalization, equivalence and 
+  validity checking</div>
+
+  We define a
+notion of <a>normalization</a>, <a>validity</a> and <a>equivalence</a> of PROV instancess.  Equivalence has the following characteristics:
+
+
+<ul>
+  <li>
+  The order of provenance expressions is irrelevant to the meaning of
+  a PROV instance.  That is, a
+  PROV instance is equivalent to any other instance obtained by
+  permuting its expressions.
+  </li>
+  <li>The order of attribute-value pairs in attribute lists is
+  irrelevant to the meaning of a PROV statement.  That is, a PROV
+  statement carrying attributes is equivalent to any other statement
+  obtained by permuting attribute-value pairs.
+  </li>
+  <li>
+  Applying inference rules, definitions, and uniqueness constraints preserves equivalence.  That is, a <a>PROV
+  instance</a> is equivalent to the instance obtained by applying any
+  inference rule or definition, or by <a>merging</a> two statements to
+  enforce a uniqueness constraint.
+  </li>
+  <li>Equivalence is reflexive, symmetric, and transitive.</li>
+</ul>
+
+  <section id="optional-attributes">
+  <h3>Optional Attributes</h3>
+  
+<div class="note">
+  TODO: Clarify how optional attributes are handled; clarify merging.  The following is
+  not very explicit about the difference between "not present" and
+  "omitted but inferred".
+  </div>
+<div id="optional-attributes1">
+<p>PROV-N allows some statement parameters and attributes to
+  be optional. Unless otherwise specified, when an
+  optional attribute is not present in a statement, some value
+  SHOULD be inferred for this attribute.</p>
+
+  The only exception is:
+  <ul>
+<li><div id="optional-attributes4">In an association of the form
+  <span class="name">wasAssociatedWith(a, ag, -, attr)</span>, the
+  absence of a plan means: either no plan exists, or a plan exists but
+  it is not identified.</div></li>
+</div></li>
+   </ul>
+</div>
+
+</section>
+
+<section id="normalization">
+<h3>Normalization and Equivalence</h3>
+<div class="note"> Is instance = bundle here?</div>
+
+<p>
+We define the <dfn>normal form</dfn> of a PROV instance as the set
+of provenance expressions resulting from merging all of the overlapping
+expressions in the instance and applying all possible inference rules
+to this set.  Formally, we say that two PROV  instances are
+<dfn>equivalent</dfn> if they have the same normal form (that is,
+after applying all possible inference rules, the two instances produce
+the same set of PROV-DM expressions.)
+</p>
+
+<div class="note">
+  We should check that normal forms exist, i.e. that applying rules
+  and definitions eventually terminates.  More clarity is needed about
+  enforcing uniqueness via merging vs. constraint checking.
+  </div>
+
+<p> An application that processes PROV-DM data SHOULD handle
+equivalent instances in the same way. (Common exceptions to this rule
+include, for example, pretty printers that seek to preserve the
+original order of statements in a file and avoid expanding
+inferences.)  </p>
+
+</section>
+  <section id="validity">
+<h3>Validity</h3>
+
+  <div class="note">Move material on validity here.  Signpost it in
+  sec. 1 and 2.  </div>
+  A PROV instance is <dfn id="dfn-valid">valid</dfn>
+if its normal form exists and satisfies all of
+  the validity constraints; this implies that the instance satisfies
+  all of the inferences, definitions and constraints.  
+  The following algorithm can be used to compute normal forms and determine validity:
+
+  <ol>
+    <li>
+    Apply all definitions by replacing each defined statement by its
+    definition (possibly introducing fresh identifiers in the process).
+    </li>
+  <li>
+    Apply all inferences by adding the conclusion of each inference
+    whose hypotheses are satisfied and whose conclusion does not
+    already hold.
+    </li>
+  <li>
+    Apply all uniqueness constraints by merging pairs of statements
+    and unifying identifiers and constants (or failing if this would require merging two
+    distinct constants). If successful, this yields a normal form.
+    </li>
+    <li>Apply event ordering constraints to determine whether there is
+    a consistent ordering on the events in the normal form.
+    </li>
+  </ul>
+
+  <div class="note">Justify termination, confluence.</div>
+</section>
+  
+</section> <!-- equivalence -->
+
+
 <section id="dictionary-constraints">
 <h2>Dictionary Constraints</h2>
 <div class="note">
author	James Cheney <jcheney@inf.ed.ac.uk>
	Fri, 22 Jun 2012 07:27:30 -0700
changeset 3427	d0521144ced4
parent 3426	f903bbb282aa
child 3428	80f43abae55f