prov: changeset 4374:9d2967fd1345

--- a/model/prov-constraints.html	Wed Aug 29 18:30:30 2012 +0100
+++ b/model/prov-constraints.html	Thu Aug 30 13:26:48 2012 +0100
@@ -981,9 +981,6 @@
 <section>
 <h4>Validation Process Overview</h4>
 
-<div class="note">
-  In progress, outline to be filled in
-  </div>
 
     <p>
   This section collects common concepts and operations that are used
@@ -1088,20 +1085,20 @@
   specification can also be viewed as logical formulas.</p>
 <ul>
   <li>
-  A definition of the form "<span class="name">A</span> <span class="conditional">IF AND ONLY IF</span> there
-  exists <span class="name">y<sub>1</sub></span>...<span class="name">y<sub>m</sub></span> such that <span class="name">B<sub>1</sub></span> and ... and <span class="name">B<sub>k</sub></span>"
-  can be thought of as a formula "<span class="math">∀ x<sub>1</sub>,....,x<sub>n</sub>. A ⇔ ∃ y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧ ... ∧ B<sub>k</sub></span>", where <span class="math">x<sub>1</sub></span>...<span class="math">x<sub>n</sub></span> are the
+  A definition of the form <span class="name">A</span> <span class="conditional">IF AND ONLY IF</span> there
+  exists <span class="name">y<sub>1</sub></span>...<span class="name">y<sub>m</sub></span> such that <span class="name">B<sub>1</sub></span> and ... and <span class="name">B<sub>k</sub></span>
+  can be thought of as a formula <span class="math">∀ x<sub>1</sub>,....,x<sub>n</sub>. A ⇔ ∃ y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧ ... ∧ B<sub>k</sub></span>, where <span class="math">x<sub>1</sub></span>...<span class="math">x<sub>n</sub></span> are the
   free variables of the formula.  
 </li>
-<li>An inference of the form "<span class="conditional">IF</span> <span class="name">A<sub>1</sub></span> and ... and <span class="name">A<sub>l</sub></span>  <span class="conditional">THEN</span>  there
-  exists <span class="name">y<sub>1</sub></span>...<span class="name">y<sub>m</sub></span> such that <span class="name">B<sub>1</sub></span> and ... and <span class="name">B<sub>k</sub></span>" can
-  be thought of as a formula "<span class="math">∀ x<sub>1</sub>,....,x<sub>n</sub>.  A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒ ∃ y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧ ... ∧ B<sub>k</sub></span>", where <span class="math">x<sub>1</sub></span>...<span class="math">x<sub>n</sub></span> are the
+<li>An inference of the form <span class="conditional">IF</span> <span class="name">A<sub>1</sub></span> and ... and <span class="name">A<sub>l</sub></span>  <span class="conditional">THEN</span>  there
+  exists <span class="name">y<sub>1</sub></span>...<span class="name">y<sub>m</sub></span> such that <span class="name">B<sub>1</sub></span> and ... and <span class="name">B<sub>k</sub></span> can
+  be thought of as a formula <span class="math">∀ x<sub>1</sub>,....,x<sub>n</sub>.  A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒ ∃ y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧ ... ∧ B<sub>k</sub></span>, where <span class="math">x<sub>1</sub></span>...<span class="math">x<sub>n</sub></span> are the
   free variables of the formula.  
 </li>
-<li>A uniqueness, ordering, or typing constraint of the form "<span class="conditional">IF</span> <span class="name">A</span> <span class="conditional">THEN</span> <span class="name">C</span>" can be viewed as a formula
-  "<span class="math">∀ x<sub>1</sub>...x<sub>n</sub>. A ⇒ C</span>".  </li>
-<li>A constraint of the form "<span class="conditional">IF</span> <span class="name">A</span> <span class="conditional">THEN INVALID</span>" can be viewed as a formula
-  "<span class="math">∀ x<sub>1</sub>...x<sub>n</sub>. A ⇒ False</span>".  </li>
+<li>A uniqueness, ordering, or typing constraint of the form <span class="conditional">IF</span> <span class="name">A</span> <span class="conditional">THEN</span> <span class="name">C</span> can be viewed as a formula
+  <span class="math">∀ x<sub>1</sub>...x<sub>n</sub>. A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒ C</span>.  </li>
+<li>A constraint of the form <span class="conditional">IF</span> <span class="name">A</span> <span class="conditional">THEN INVALID</span> can be viewed as a formula
+  <span class="math">∀ x<sub>1</sub>...x<sub>n</sub>. A ⇒ False</span>.  </li>
   </ul>
 
 
@@ -1110,23 +1107,54 @@
   In logic, a formula's meaning is defined by saying when it is
   <em>satisfied</em> in a situation.  Likewise, we can view
   definitions, inferences, and constraints as being satisfied or not
-  satisfied in a PROV instance.  
+  satisfied in a PROV instance, augmented with information about the constraints.
   </p>
-  <ol><li>Definitions</li>
-  <li>Inferences</li>
-  <li>Uniqueness constraints</li>
-  <li>Key constraints</li>
-  <li>Ordering constraints</li>
+  <ol>
+    <li>A logical equivalence as used in a definition is satisfied
+  when the formula <span class="math">∀
+  x<sub>1</sub>,....,x<sub>n</sub>. A ⇔ ∃
+  y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧ ... ∧
+  B<sub>k</sub></span> holds, that is, for any substitution of the
+  variables <span class="math">x<sub>1</sub>,....,x<sub>n</sub></span>, formula <span class="math">A</span> and
+  formula <span class="math">∃
+  y<sub>1</sub>...y<sub>m</sub>. B<sub>1</sub> ∧ ... ∧
+  B<sub>k</sub></span> are either both true or both false.
+    </li>
+  <li>A logical implication as used in an inference is
+   satisfied with the forumula  <span class="math">∀
+  x<sub>1</sub>,....,x<sub>n</sub>.  A<sub>1</sub> ∧ ... ∧
+  A<sub>l</sub> ⇒ ∃ y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧
+  ... ∧ B<sub>k</sub></span> holds, that is, for any substitution of
+  the variables <span class="math">x<sub>1</sub>,....,x<sub>n</sub></span>, if  <span class="math">A<sub>1</sub> ∧ ... ∧
+  A<sub>l</sub></span> is true, then 
+ for some further substitution of terms for variables <span class="math">
+  y<sub>1</sub>...y<sub>m</sub></span>, formula <span class="math">B<sub>1</sub> ∧ ... ∧
+  B<sub>k</sub></span> is also true.</li>
+  <li>A uniqueness, ordering, or typing constraint is satisfied when
+  its associated formula <span class="math">∀ x<sub>1</sub>...x<sub>n</sub>. A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒ C</span> holds, that is, for any substitution of
+  the variables <span class="math">x<sub>1</sub>,....,x<sub>n</sub></span>, if  <span class="math">A<sub>1</sub> ∧ ... ∧
+  A<sub>l</sub></span> is true, then <span class="math">C</span> is
+  also true.</li>
+  <li>An impossibility constraint is satisfied when the formula
+  <span class="math">∀ x<sub>1</sub>...x<sub>n</sub>. A ⇒
+  False</span> holds.  This is logically equivalent to <span class="math">∄
+  x<sub>1</sub>...x<sub>n</sub>. A</span>, that is, there exists no
+  substitution for <span class="math">x<sub>1</sub>...x<sub>n</sub></span> making <span class="math">A</span> true.
 </ol>  
 
 <h4>Merging</h4>
 
-  <p>Merging is an operation that takes two terms and compares them to
+  <p><em>Merging</em> is an operation that takes two terms and compares them to
   see if they are equal, or can be made equal by substituting an
   existential variable with another term.  This operation is a special
   case of <em>unification</em>, a common operation in logical
   reasoning, including logic programming and automated reasoning.
+Merging two terms <span class="math">t,t'</span> results in either substitution <span class="math">S</span>
+  such that <span class="math">S(t) = S(t')</span>, or failure indicating that there is no
+  substitution that can be applied to both <span class="math">t</span> and <span class="math">t'</span> to make
+  them equal.
 </p>
+
   
 
  <h4>Applying definitions, inferences, and constraints</h4>
@@ -1138,23 +1166,55 @@
 </p>
   <ul>
   <li>
-  A definition of the form "A IF AND ONLY IF there
-  exists y1...ym such that B1 and ... and Bk"
-  can be applied by searching for an occurrence of <span class="math">A</span> and replacing
-  it with <span class="math"> B1 and ... and Bk</span>, generating fresh existential
-  variables y1...ym.  The formula <span class="math">A</span> is redundant, and can be
+  A logical equivalence of the form <span class="math">∀
+  x<sub>1</sub>,....,x<sub>n</sub>. A ⇔ ∃
+  y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧ ... ∧
+  B<sub>k</sub></span>
+  can be applied by searching for any occurrences of <span
+    class="math">A</span> in the instance  and adding <span class="math"> B<sub>1</sub>, ..., B<sub>k</sub></span>, generating fresh existential
+  variables <span class="math">y<sub>1</sub>,...,y<sub>m</sub></span>, and conversely, whenever there is an
+    occurrence of <span class="math"> B<sub>1</sub>, ..., B<sub>k</sub></span>, adding
+    <span class="math">A</span>.
+    In our setting, the defined formulas <span class="math">A</span> are never used in other
+    constraints, so it is sufficient to replace all occurrences of
+    <span class="math">A</span> with their definitions.  The formula <span
+    class="math">A</span> is then redundant, and can be
   removed from the instance.
 </li>
-<li>An inference of the form "IF A THEN  there
-  exists y1...ym such that B1 and ... and Bk"
+<li>An inference of the form <span class="math">∀
+  x<sub>1</sub>,....,x<sub>n</sub>.  A<sub>1</sub> ∧ ... ∧
+  A<sub>l</sub> ⇒ ∃ y<sub>1</sub>...y<sub>m</sub> . B<sub>1</sub> ∧
+  ... ∧ B<sub>k</sub></span> can be applied by searching for  any occurrences of <span
+    class="math"> A<sub>1</sub> ∧ ... ∧
+  A<sub>l</sub></span> in the instance and, for each such match,
+adding <span class="math">B<sub>1</sub> ∧
+  ... ∧ B<sub>k</sub></span> to the instance, generating fresh existential
+  variables <span class="math">y<sub>1</sub>,...,y<sub>m</sub></span>.
 </li>
-<li>A uniqueness  constraint of the form "IF A THEN t = t'" can be
+<li>A uniqueness  constraint of the form <span class="math">∀
+x<sub>1</sub>...x<sub>n</sub>. A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒ t
+= t'</span> can be
   applied by searching for an occurrence of <span class="math">A</span> in the instance, and
-  if one is found, merging the terms <span class="math">t</span> and <span class="math">t'</span>.  </li>
-
+  if one is found, merging the terms <span class="math">t</span> and
+<span class="math">t'</span>. If successful, the resulting
+substitution is applied to the instance; otherwise, the application
+of the uniqueness constraint fails.  </li>
+<li>A key constraint can similarly be applied by searching for
+different occurrences of a statement with the same identifier, merging the
+corresponding parameters of the statements, and concatenating their
+attribute lists.  The substitutions obtained by merging are applied to
+the instance.
+</li>
   </ul>
   
- <p>The process of applying definitions, inferences, and constraints
+ <p>As noted above, uniqueness or key constraint
+  application can <em>fail</em>, if a required merging step fails.  Failure of constraint
+  application means that there is no way to add information to the
+  instance to satisfy the constraint, which in turn implies that the
+  instance is <em>invalid</em>.
+  </p>
+  <p>
+  The process of applying definitions, inferences, and constraints
   to a PROV instance until all of them are satisfied is similar to
   what is sometimes 
   called <em>chasing</em> [[DBCONSTRAINTS]] or <em>saturation</em>
@@ -1164,37 +1224,56 @@
  
  <h4>Termination</h4>
   <p>
-  It is possible to define sets of formulas such that normalization
-  does not terminate, that is, such that there is an infinite sequence
-  of legal application steps.  This is obviously something we want to
+ In general, applying sets of logical formulas of the above
+  definition, inference, and constraint forms is not guaranteed to
+  terminate.  A simple example is the inference <span class="math">R(x,y) ⇒ ∃z. R(x,z)
+  ∧R(z,y)</span>, which can be applied to <span class="math">{R(a,b)}</span> to generate an
+  infinite sequence of larger and larger instances.  This is obviously something we want to
   avoid.  There is a great deal of work on termination of the chase in
-  databases, or of sets of constraint handling rules.  This
-  specification ensures termination of normalization 
+  databases, or of sets of constraint handling rules.  The termination
+  of the notion of normalization defined in this specification is
+  guaranteed because the definitions, inferences and uniqueness/key
+  constraints correspond to a <em>weakly acyclic</em> set of
+  tuple-generating and equality-generating dependencies, in the
+  terminology of [[DBCONSTRAINTS]].  The termination of the remaining
+  ordering, typing, and impossibility constraints is easy to show.  <a href="#termination">Appendix
+  C</a> gives a proof that the definitions, inferences, and uniqueness
+  and key constraints are weakly acyclic and therefore terminating.
   </p>
   <p>
   There is an important subtlety that is essential to guarantee
   termination.  This specification draws a distinction between knowing
-  that an identifier has type 'entity', 'activity', or 'agent', and having
-  an explicit entity(id), activity(id), or agent(id) statement in the instance.
+  that an identifier has type <span class="name">'entity'</span>, <span class="name">'activity'</span>, or <span class="name">'agent'</span>, and having
+  an explicit <span class="name">entity(id)</span>, <span class="name">activity(id)</span>, or <span class="name">agent(id)</span> statement in the instance.
   For example, focusing on entity, it is a stronger statement to say
-  that entity(id) holds in the instance, because we can apply
+  that <span class="name">entity(id)</span> holds in the instance, because we can apply
   entity-generation-invalidation-inference to infer generation and
-  invalidationevents for the entity.  In contrast, if we only know
-  that 'entity' in typeOf(id), this does not imply that entity(id)
-  holds, so we cannot necessarily assume that 
+  invalidation events for the entity.  In contrast, if we only know
+  that <span class="name">'entity' ∈ typeOf(id)</span>, this does not imply that <span class="name">entity(id)</span>
+  holds.
   </p>
   <p>
   This distinction (for both entities and activities) is essential to
-  ensure termination of the inferences.  If we strengthened the type
-  inference constraints to inferences that add new entity statements
-  to the instance, then we could keep generating new entities and
-  activities in an unbounded chain into the past (or future).  The
+  ensure termination of the inferences, because we allow inferring
+  that a declared <span class="name">entity(id)</span> or  <span class="name">activity(id)</span> has a generation
+  or respectively start event.  If we strengthened the type
+  inference constraints to add new entity or activity statements for
+  the entities and activities involved in generating or starting other
+  declared entities or activities, then we could keep generating new entities and
+  activities in an unbounded chain into the past (as in the "chicken
+  and egg" paradox).  The
   design adopted here assumes that instances explicitly declare the
-  entities and activities that are explicitly declared in the instance
-  can be inferred to have start and end events, but not those that are
-  indirectly referenced in other relations.
+  entities and activities that are relevant for validity checking, and
+  these can be inferred to have start and end events.  This inference
+  is not supported for identifiers
+that are indirectly referenced in other relations and therefore have
+  type <span class="name">'entity'</span> or <span class="name">'activity'</span>.  
   </p>
 
+<div class="note">
+Figure may need to be updated.  Figure credit: Tim Lebo.
+  </div>
+
 
 <div style="text-align: center;">
 <figure>
@@ -1214,16 +1293,45 @@
   such constraints follows a saturation strategy similar to that for 
   </p>
   <ol>
-  <li>For ordering constraints, we check by generating all of the
+  <li><p>
+    For ordering constraints, we check by generating all of the
   precedes and strictly-precedes relationships specified by the rules.  These can be thought
   of as a directed graph whose nodes are terms, and whose edges are
-  precedes or strictly-precedes relationships.  The constraints are
-  inconsistent if there is a cycle in the graph that includes a
-  strictly-precedes edge, and consistent otherwise. </li>
-  <li>For typing constraints...
+  precedes or strictly-precedes relationships.  
+    An ordering constraint  of the form <span class="math">∀
+x<sub>1</sub>...x<sub>n</sub>. A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒
+precedes(t,t')</span> can be applied by searching for occurrences of
+<span class="math"> A<sub>1</sub> ∧ ... ∧ A<sub>l</sub></span> and for each such match
+adding the atomic formula <span class="math">precedes(t,t')</span> to
+  the instance.  After all such constraints have been checked, and the
+  resulting edges added to the graph, the ordering constraints are
+  violated if there is a cycle in the graph that includes a
+  strictly-precedes edge, and satisfied otherwise.
+    </p>
+</li>
+<li><p>For typing constraints, we check by constructing a function
+  <span class="math">typeOf(id)</span> mapping identifiers to sets of possible types.  We
+  start with a function mapping each identifier to the empty set,
+  reflecting no consrtaints on the identifiers' types.  A typing
+  constraint of the form <span class="math">∀
+x<sub>1</sub>...x<sub>n</sub>. A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒ 'type' ∈ typeOf(id)
+</span> is checked  by adjusting the
+  function by adding <span class="name">'type'</span> to <span class="math">typeOf(id)</span> for each conclusion
+  <span class="name">'type' ∈ typeOf(id)</span> of the rule.  Typing constraints with
+  multiple conclusions are handled analogously.  Once all constraints
+  have been checked in all possible ways, we check that the
+  disjointness constraints hold of the resulting <span class="math">typeOf</span> function.
+  (These are essentially impossibility constraints).
+</p>
   </li>
-  <li>
-  For impossibility constraints...
+  <li><p>For impossibility constraints, we check by searching for the
+  forbidden pattern that the impossibility constraint describes. Any
+  match of this pattern leads to failure of the constraint checking process.
+An impossibility constraint of the form <span class="math">∀
+x<sub>1</sub>...x<sub>n</sub>. A<sub>1</sub> ∧ ... ∧ A<sub>l</sub> ⇒
+False</span> can be applied by  searching for occurrences of
+<span class="math"> A<sub>1</sub> ∧ ... ∧ A<sub>l</sub></span> in the instance, and if any
+such occurrence is found, signaling failure.</p>
   </li>
   </ol>
   <p>
@@ -1254,8 +1362,8 @@
   instances can also be equvalent if they differ only in choice of
   names of existential variables. Because of this, the appropriate
   notion of equivalence of normal forms is <em>isomorphism</em>.  Two
-  instances <span class="math">I1</span> and <span class="math">I2</span> are isomorphic if there is an invertible
-  function on existential variables <span class="math">h</span> such that <span class="math">h(I1) = I2</span>.
+  instances <span class="math">I<sub>1</sub></span> and <span class="math">I<sub>2<sub></span> are <em>isomorphic</em> if there is an invertible
+  substitution <span class="math">S</span> mapping existential variables to existential variables such that <span class="math">S(I<sub>1</sub>) = I<sub>2<sub></span>.
   This is similar to the notion of equivalence used in [[RDF]], where
   blank nodes play an analogous role to existential variables.  </p>
  
@@ -1272,7 +1380,7 @@
   those obtained by the algorithm in this specification.
   </p>
 
-<h4>From Instances to Documents</h4>
+<h4>From Instances to Bundles and Documents</h4>
 
 <p>PROV documents can contain multiple instances: a <a>toplevel
 instance</a>, the set of statements not appearing within a bundle, and
@@ -1282,8 +1390,8 @@
 and the names of its bundles are distinct; a PROV document is
 equivalent to another if their toplevel instances are equivalent, they
 have the same number of bundles with the same names, and the instances
-of their corresponding bundles are equivalent.  As for blank nodes in
-RDF, the scope of an existential variable in PROV is the instance
+of their corresponding bundles are equivalent.  Analogously to blank nodes in
+[[RDF]], the scope of an existential variable in PROV is the instance
 level, so existential variables with the same name occurring in
 different instances do not necessarily denote the same term.  This
 is a consequence of the fact that the instances of two equivalent
@@ -1294,7 +1402,7 @@
 </section>
 
 <section>
-<h4>Summary of constraints and inferences</h4>
+<h4>Summary of definitions, inferences, and constraints</h4>
 
 <p><a href="">Table 2</a> summarizes the definitions, inferences, and
 constraints of this document.
author	James Cheney <jcheney@inf.ed.ac.uk>
	Thu, 30 Aug 2012 13:26:48 +0100
changeset 4374	9d2967fd1345
parent 4373	5279a2172dc6
child 4375	b0c1e2e3f83f