prov: changeset 4323:8efc0cc1e9e0

--- a/model/prov-constraints.html	Thu Aug 09 14:55:52 2012 +0100
+++ b/model/prov-constraints.html	Thu Aug 09 15:05:56 2012 +0100
@@ -784,9 +784,264 @@
 </div>
 
 <div class="note">
-  James to add prose
+  In progress, outline to be filled in
   </div>
 
+    <p>
+  This section collects common concepts and operations that are used
+  throughout the specification, and relates them to background
+  terminology and ideas from logic [[Logic]], constraint programming
+  [[CHR]], and database constraints [[DBCONSTRAINTS]].  This section
+  does not attempt to provide a complete introduction to these topics,
+  but it is provided in order to aid readers familiar with one or more
+  of these topics in understanding the specification, and to clarify
+  some of the motivations for choices in the specification to all readers.
+  </p>
+
+ 
+ <h2>Constants, Variables and Placeholders</h2>
+  <p>
+  PROV statements involve identifiers (URIs), literals, 
+  placeholders, and attribute lists.  However, in order to specify
+  constraints over PROV instances, we also need <em>variables</em>
+  that represent unknown identifiers, literals, or placeholders.
+  These variables are similar to those in first-order
+  logic [[Logic]].  A variable is a symbol that can be replaced by
+  other symbols, including either other variables or constant
+  identifiers, literals, or placeholders.  In a few special cases, we
+  also use variables for unknown attribute lists.
+  </p>
+  
+  <p>Several definitions and inferences conclude by saying that some
+  objects exist such that some other formulas hold.  Such an inference
+  introduces fresh <em>existential variables</em> into the instance.
+  Such a variable denotes a fixed object that exists, but its exact
+  identity is unknown, so we often refer to such variables as
+  <em>existential variables</em>.  We do not allow existential
+  variables that stand for unknown attribute lists.  </p>
+
+  <p>In particular, many
+  occurrences of the placeholder symbol <span class="name">-</span> stand for unknown
+  objects; these are handled by expanding them to existential
+  variables. Some placeholders, however, indicate the absence of an
+  object, rather than an unknown object.  In other words, the
+  placeholder is overloaded, with different meanings in different
+  places.
+  </p>
+  <p>An expression is called a <em>term</em> if it is either a
+  constant identifier, literal, placeholder, or variable, and write
+  <span class="math">t</span> to denote an arbitrary term.
+</p>
+
+  
+ <h2>Substitution</h2>
+<p>A substitution is a function that maps variables to terms. Concretely, since we only
+  need to consider substitutions of finite sets of variables, we can
+  write substitutions as <span class="math">[x_1 = t_1,...,x_n=t_n]</span>.  A substitution
+  <span class="math">S = [x_1 = t_1,...,x_n=t_n]</span> 
+  can be <em>applied</em> to a term as follows.
+<ol><li>
+  If the term is a variable <span class="math">x_i</span>, one of the variables in the
+  domain of <span class="math">S</span>, then <span class="math">S(x) = t_i</span>.
+  </li>
+  <li>
+  </li>
+  </ol>
+</p>
+
+
+  
+ 
+
+ <h2>Formulas</h2>
+  <p>
+  For the purpose of constraint checking, we view PROV statements
+  (possibly involving existential variables) as
+  <dfn>formulas</dfn>.  An instance can be viewed as "theory" in
+  logic, that is, a set of formulas all thought to describe the same
+  situation.  The set can also be thought of a single, large formula:
+  the conjunction of all of the atomic formulas.
+  </p>
+<p>The atomic constraints considered in this specification can be
+  viewed as atomic formulas:</p>
+<ul>
+  <li>Uniqueness constraints...</li>
+  <li>Ordering constraints...
+  </li>
+  <li>Typing constraints...
+  </li>
+  <li>Impossibility constraints</li>
+  </ul>
+  <p> Similarly, the definitions, inferences, and constraint rules in this
+  specification can also be viewed as logical formulas.</p>
+<ul>
+  <li>
+  A definition of the form "A IF AND ONLY IF there
+  exists y1...ym such that B1 and ... and Bk"
+  can be thought of as a formula "\forall x1,....,xn. A <==> \exists y1...ym such that B1 and ... and Bn", where x1...xn are the
+  free variables of the formula.  
+</li>
+<li>An inference of the form "IF A THEN  there
+  exists y1...ym such that B1 and ... and Bk" can be thought of as a formula "\forall x1,....,xn. A ==> \exists y1...ym such that B1 and ... and Bn", where x1...xn are the
+  free variables of the formula.  
+</li>
+<li>A uniqueness, ordering, or typing constraint of the form "IF A THEN C" can be viewed as a formula
+  "forall x1...xn. A ==> C".  </li>
+<li>A constraint of the form "IF A THEN INVALID" can be viewed as a formula
+  "forall x1...xn. A ==> false".  </li>
+  </ul>
+
+
+ <h2>Satisfying definitions, inferences, and constraints</h2>
+  <p>
+  In logic, a formula's meaning is defined by saying when it is
+  <em>satisfied</em> in a situation.  Likewise, we can view
+  definitions, inferences, and constraints as being satisfied or not
+  satisfied in a PROV instance.  
+  </p>
+
+<h2>Merging</h2>
+
+  <p>Merging is an operation that takes two terms and compares them to
+  see if they are equal, or can be made equal by substituting an
+  existential variable with another term.  This operation is a special
+  case of <em>unification</em>, a common operation in logical
+  reasoning, including logic programming and automated reasoning.
+</p>
+  
+
+ <h2>Applying definitions, inferences, and constraints</h2>
+<p>Formulas can also be interpreted as having computational
+  content.  That is, if an instance does not satisfy a formula, we can
+  often <em>apply</em> the formula to the instance to produce another
+  instance that does satisfy the formula.  Definitions, inferences,
+  and uniqueness constraints can be applied to instances:
+</p>
+  <ul>
+  <li>
+  A definition of the form "A IF AND ONLY IF there
+  exists y1...ym such that B1 and ... and Bk"
+  can be applied by searching for an occurrence of <span class="math">A</span> and replacing
+  it with <span class="math"> B1 and ... and Bk</span>, generating fresh existential
+  variables y1...ym.  The formula <span class="math">A</span> is redundant, and can be
+  removed from the instance.
+</li>
+<li>An inference of the form "IF A THEN  there
+  exists y1...ym such that B1 and ... and Bk"
+</li>
+<li>A uniqueness  constraint of the form "IF A THEN t = t'" can be
+  applied by searching for an occurrence of <span class="math">A</span> in the instance, and
+  if one is found, merging the terms <span class="math">t</span> and <span class="math">t'</span>.  </li>
+
+  </ul>
+  
+ <p>The process of applying definitions, inferences, and constraints
+  to a PROV instance until all of them are satisfied is similar to
+  what is sometimes 
+  called <em>chasing</em> [[DBCONSTRAINTS]] or <em>saturation</em>
+  [[CHR]].  We call this process <em>normalization</em>.
+  </p>
+
+ 
+ <h2>Termination</h2>
+  <p>
+  It is possible to define sets of formulas such that normalization
+  does not terminate, that is, such that there is an infinite sequence
+  of legal application steps.  This is obviously something we want to
+  avoid.  There is a great deal of work on termination of the chase in
+  databases, or of sets of constraint handling rules.  This
+  specification ensures termination of normalization 
+  </p>
+  <p>
+  There is an important subtlety that is essential to guarantee
+  termination.  This specification draws a distinction between knowing
+  that an identifier has type 'entity', 'activity', or 'agent', and having
+  an explicit entity(id), activity(id), or agent(id) statement in the instance.
+  For example, focusing on entity, it is a stronger statement to say
+  that entity(id) holds in the instance, because we can apply
+  entity-generation-invalidation-inference to infer generation and
+  invalidationevents for the entity.  In contrast, if we only know
+  that 'entity' in typeOf(id), this does not imply that entity(id)
+  holds, so we cannot necessarily assume that 
+  </p>
+  <p>
+  This distinction (for both entities and activities) is essential to
+  ensure termination of the inferences.  If we strengthened the type
+  inference constraints to inferences that add new entity statements
+  to the instance, then we could keep generating new entities and
+  activities in an unbounded chain into the past (or future).  The
+  design adopted here assumes that instances explicitly declare the
+  entities and activities that are explicitly declared in the instance
+  can be inferred to have start and end events, but not those that are
+  indirectly referenced in other relations.
+  </p>
+
+ <h2>Checking ordering, typing, and impossibility constraints</h2>
+  <p>
+  The ordering, typing, and impossibility constraints are checked
+  rather than applied.  This means that they do not generate new
+  formulas expressible in PROV, but they do generate basic constraints
+  that might or might not be consistent with each other.  Checking
+  such constraints follows a saturation strategy similar to that for 
+  </p>
+  <ol>
+  <li>For ordering constraints, we check by generating all of the
+  precedes and strictly-precedes relationships specified by the rules.  These can be thought
+  of as a directed graph whose nodes are terms, and whose edges are
+  precedes or strictly-precedes relationships.  The constraints are
+  inconsistent if there is a cycle in the graph that includes a
+  strictly-precedes edge, and consistent otherwise. </li>
+  <li>For typing constraints...
+  </li>
+  <li>
+  For impossibility constraints...
+  </li>
+  </ol>
+  <p>
+  A normalized instance that satisfies all of the checked constraints
+  is called <a>valid</a>.  Validity can be, but is not required to be,
+  checked by normalizing and then checking constraints.  Any other
+  algorithm that provides equivalent behavior (that is, accepts the
+  same valid instances and rejects the same invalid instances) is allowed.
+  In particular, the checked constraints and the
+  applied definitions, inferences and uniqueness constraints do not
+  interfere with one another, so it is also possible to mix checking
+  and application.  This may be desirable in order to detect
+  invalidity more quickly.
+  </p>
+
+
+
+ <h2>Equivalence and Isomorphism</h2>
+  <p> Given two normal forms, a natural question is whether they contain
+  the same information, that is, whether they are equivalent (if so,
+  then the original instances are also equivalent.)  By analogy with
+  logic, if we consider normalized PROV instances with existential
+  variables to represent sets of possible situations, then two normal
+  forms may describe the same situation but differ in inessential
+  details such as the order of statements or of elemennts of
+  attribute-value lists.  To remedy this, we can easily consider
+  instances to be equvalent up to reordering of attributes.  However,
+  instances can also be equvalent if they differ only in choice of
+  names of existential variables. Because of this, the appropriate
+  notion of equivalence of normal forms is <em>isomorphism</em>.  Two
+  instances <span class="math">I1</span> and <span class="math">I2</span> are isomorphic if there is an invertible
+  function on existential variables <span class="math">h</span> such that <span class="math">h(I1) = I2</span>.
+  This is similar to the notion of equivalence used in [[RDF]], where
+  blank nodes play an analogous role to existential variables.  </p>
+ 
+<p>Equivalence can be checked by normalizing instances, checking that
+  both instances are valid, then
+  testing whether the two normal forms are isomorphic.  (It is
+  technically possible for two invalid normal forms to be isomorphic,
+  but to be condered equivalent, the two instances must also be
+  valid.)
+As with validity, the algorithm suggested by this specification is
+  just one of many possible ways to implement equivalence checking; it
+  is not required that implementations compute normal forms
+  explicitly, only that their determinations of equivalence match
+  those obtained by the algorithm in this specification.
+  </p>
 
 </section>
 
@@ -1105,6 +1360,7 @@
 
 </section>
 
+ 
 
 <section id="inferences">
 <h2>Definitions and Inferences</h2>
@@ -1635,7 +1891,10 @@
 <p>
 <span class='conditional'>IF</span> <span
   class="name">activity(a,t1,t2,_attrs)</span> <span
-  class="conditional">THEN</span> there exist <span class="name">_start</span>, <span class="name">_e1</span>, <span class="name">_end</span>,
+  class="conditional">THEN</span> there exist <span
+  class="name">_start</span>, <span class="name">_e1</span>, <span
+  class="name">_a1</span>, <span class="name">_end</span>, <span
+  class="name">_a2</span>, 
   and <span class="name">_e2</span> such that
   <span
   class="name">wasStartedBy(_start; a,_e1,_a1,t1,[])</span> and <span class="name">wasEndedBy(_end; a,_e2,_a2,t2,[])</span>.
author	James Cheney <jcheney@inf.ed.ac.uk>
	Thu, 09 Aug 2012 15:05:56 +0100
changeset 4323	8efc0cc1e9e0
parent 4322	d55c694e9994
child 4324	2e942c753bd4