--- a/model/prov-constraints.html Thu Aug 09 14:55:52 2012 +0100
+++ b/model/prov-constraints.html Thu Aug 09 15:05:56 2012 +0100
@@ -784,9 +784,264 @@
</div>
<div class="note">
- James to add prose
+ In progress, outline to be filled in
</div>
+ <p>
+ This section collects common concepts and operations that are used
+ throughout the specification, and relates them to background
+ terminology and ideas from logic [[Logic]], constraint programming
+ [[CHR]], and database constraints [[DBCONSTRAINTS]]. This section
+ does not attempt to provide a complete introduction to these topics,
+ but it is provided in order to aid readers familiar with one or more
+ of these topics in understanding the specification, and to clarify
+ some of the motivations for choices in the specification to all readers.
+ </p>
+
+
+ <h2>Constants, Variables and Placeholders</h2>
+ <p>
+ PROV statements involve identifiers (URIs), literals,
+ placeholders, and attribute lists. However, in order to specify
+ constraints over PROV instances, we also need <em>variables</em>
+ that represent unknown identifiers, literals, or placeholders.
+ These variables are similar to those in first-order
+ logic [[Logic]]. A variable is a symbol that can be replaced by
+ other symbols, including either other variables or constant
+ identifiers, literals, or placeholders. In a few special cases, we
+ also use variables for unknown attribute lists.
+ </p>
+
+ <p>Several definitions and inferences conclude by saying that some
+ objects exist such that some other formulas hold. Such an inference
+ introduces fresh <em>existential variables</em> into the instance.
+ Such a variable denotes a fixed object that exists, but its exact
+ identity is unknown, so we often refer to such variables as
+ <em>existential variables</em>. We do not allow existential
+ variables that stand for unknown attribute lists. </p>
+
+ <p>In particular, many
+ occurrences of the placeholder symbol <span class="name">-</span> stand for unknown
+ objects; these are handled by expanding them to existential
+ variables. Some placeholders, however, indicate the absence of an
+ object, rather than an unknown object. In other words, the
+ placeholder is overloaded, with different meanings in different
+ places.
+ </p>
+ <p>An expression is called a <em>term</em> if it is either a
+ constant identifier, literal, placeholder, or variable, and write
+ <span class="math">t</span> to denote an arbitrary term.
+</p>
+
+
+ <h2>Substitution</h2>
+<p>A substitution is a function that maps variables to terms. Concretely, since we only
+ need to consider substitutions of finite sets of variables, we can
+ write substitutions as <span class="math">[x_1 = t_1,...,x_n=t_n]</span>. A substitution
+ <span class="math">S = [x_1 = t_1,...,x_n=t_n]</span>
+ can be <em>applied</em> to a term as follows.
+<ol><li>
+ If the term is a variable <span class="math">x_i</span>, one of the variables in the
+ domain of <span class="math">S</span>, then <span class="math">S(x) = t_i</span>.
+ </li>
+ <li>
+ </li>
+ </ol>
+</p>
+
+
+
+
+
+ <h2>Formulas</h2>
+ <p>
+ For the purpose of constraint checking, we view PROV statements
+ (possibly involving existential variables) as
+ <dfn>formulas</dfn>. An instance can be viewed as "theory" in
+ logic, that is, a set of formulas all thought to describe the same
+ situation. The set can also be thought of a single, large formula:
+ the conjunction of all of the atomic formulas.
+ </p>
+<p>The atomic constraints considered in this specification can be
+ viewed as atomic formulas:</p>
+<ul>
+ <li>Uniqueness constraints...</li>
+ <li>Ordering constraints...
+ </li>
+ <li>Typing constraints...
+ </li>
+ <li>Impossibility constraints</li>
+ </ul>
+ <p> Similarly, the definitions, inferences, and constraint rules in this
+ specification can also be viewed as logical formulas.</p>
+<ul>
+ <li>
+ A definition of the form "A IF AND ONLY IF there
+ exists y1...ym such that B1 and ... and Bk"
+ can be thought of as a formula "\forall x1,....,xn. A <==> \exists y1...ym such that B1 and ... and Bn", where x1...xn are the
+ free variables of the formula.
+</li>
+<li>An inference of the form "IF A THEN there
+ exists y1...ym such that B1 and ... and Bk" can be thought of as a formula "\forall x1,....,xn. A ==> \exists y1...ym such that B1 and ... and Bn", where x1...xn are the
+ free variables of the formula.
+</li>
+<li>A uniqueness, ordering, or typing constraint of the form "IF A THEN C" can be viewed as a formula
+ "forall x1...xn. A ==> C". </li>
+<li>A constraint of the form "IF A THEN INVALID" can be viewed as a formula
+ "forall x1...xn. A ==> false". </li>
+ </ul>
+
+
+ <h2>Satisfying definitions, inferences, and constraints</h2>
+ <p>
+ In logic, a formula's meaning is defined by saying when it is
+ <em>satisfied</em> in a situation. Likewise, we can view
+ definitions, inferences, and constraints as being satisfied or not
+ satisfied in a PROV instance.
+ </p>
+
+<h2>Merging</h2>
+
+ <p>Merging is an operation that takes two terms and compares them to
+ see if they are equal, or can be made equal by substituting an
+ existential variable with another term. This operation is a special
+ case of <em>unification</em>, a common operation in logical
+ reasoning, including logic programming and automated reasoning.
+</p>
+
+
+ <h2>Applying definitions, inferences, and constraints</h2>
+<p>Formulas can also be interpreted as having computational
+ content. That is, if an instance does not satisfy a formula, we can
+ often <em>apply</em> the formula to the instance to produce another
+ instance that does satisfy the formula. Definitions, inferences,
+ and uniqueness constraints can be applied to instances:
+</p>
+ <ul>
+ <li>
+ A definition of the form "A IF AND ONLY IF there
+ exists y1...ym such that B1 and ... and Bk"
+ can be applied by searching for an occurrence of <span class="math">A</span> and replacing
+ it with <span class="math"> B1 and ... and Bk</span>, generating fresh existential
+ variables y1...ym. The formula <span class="math">A</span> is redundant, and can be
+ removed from the instance.
+</li>
+<li>An inference of the form "IF A THEN there
+ exists y1...ym such that B1 and ... and Bk"
+</li>
+<li>A uniqueness constraint of the form "IF A THEN t = t'" can be
+ applied by searching for an occurrence of <span class="math">A</span> in the instance, and
+ if one is found, merging the terms <span class="math">t</span> and <span class="math">t'</span>. </li>
+
+ </ul>
+
+ <p>The process of applying definitions, inferences, and constraints
+ to a PROV instance until all of them are satisfied is similar to
+ what is sometimes
+ called <em>chasing</em> [[DBCONSTRAINTS]] or <em>saturation</em>
+ [[CHR]]. We call this process <em>normalization</em>.
+ </p>
+
+
+ <h2>Termination</h2>
+ <p>
+ It is possible to define sets of formulas such that normalization
+ does not terminate, that is, such that there is an infinite sequence
+ of legal application steps. This is obviously something we want to
+ avoid. There is a great deal of work on termination of the chase in
+ databases, or of sets of constraint handling rules. This
+ specification ensures termination of normalization
+ </p>
+ <p>
+ There is an important subtlety that is essential to guarantee
+ termination. This specification draws a distinction between knowing
+ that an identifier has type 'entity', 'activity', or 'agent', and having
+ an explicit entity(id), activity(id), or agent(id) statement in the instance.
+ For example, focusing on entity, it is a stronger statement to say
+ that entity(id) holds in the instance, because we can apply
+ entity-generation-invalidation-inference to infer generation and
+ invalidationevents for the entity. In contrast, if we only know
+ that 'entity' in typeOf(id), this does not imply that entity(id)
+ holds, so we cannot necessarily assume that
+ </p>
+ <p>
+ This distinction (for both entities and activities) is essential to
+ ensure termination of the inferences. If we strengthened the type
+ inference constraints to inferences that add new entity statements
+ to the instance, then we could keep generating new entities and
+ activities in an unbounded chain into the past (or future). The
+ design adopted here assumes that instances explicitly declare the
+ entities and activities that are explicitly declared in the instance
+ can be inferred to have start and end events, but not those that are
+ indirectly referenced in other relations.
+ </p>
+
+ <h2>Checking ordering, typing, and impossibility constraints</h2>
+ <p>
+ The ordering, typing, and impossibility constraints are checked
+ rather than applied. This means that they do not generate new
+ formulas expressible in PROV, but they do generate basic constraints
+ that might or might not be consistent with each other. Checking
+ such constraints follows a saturation strategy similar to that for
+ </p>
+ <ol>
+ <li>For ordering constraints, we check by generating all of the
+ precedes and strictly-precedes relationships specified by the rules. These can be thought
+ of as a directed graph whose nodes are terms, and whose edges are
+ precedes or strictly-precedes relationships. The constraints are
+ inconsistent if there is a cycle in the graph that includes a
+ strictly-precedes edge, and consistent otherwise. </li>
+ <li>For typing constraints...
+ </li>
+ <li>
+ For impossibility constraints...
+ </li>
+ </ol>
+ <p>
+ A normalized instance that satisfies all of the checked constraints
+ is called <a>valid</a>. Validity can be, but is not required to be,
+ checked by normalizing and then checking constraints. Any other
+ algorithm that provides equivalent behavior (that is, accepts the
+ same valid instances and rejects the same invalid instances) is allowed.
+ In particular, the checked constraints and the
+ applied definitions, inferences and uniqueness constraints do not
+ interfere with one another, so it is also possible to mix checking
+ and application. This may be desirable in order to detect
+ invalidity more quickly.
+ </p>
+
+
+
+ <h2>Equivalence and Isomorphism</h2>
+ <p> Given two normal forms, a natural question is whether they contain
+ the same information, that is, whether they are equivalent (if so,
+ then the original instances are also equivalent.) By analogy with
+ logic, if we consider normalized PROV instances with existential
+ variables to represent sets of possible situations, then two normal
+ forms may describe the same situation but differ in inessential
+ details such as the order of statements or of elemennts of
+ attribute-value lists. To remedy this, we can easily consider
+ instances to be equvalent up to reordering of attributes. However,
+ instances can also be equvalent if they differ only in choice of
+ names of existential variables. Because of this, the appropriate
+ notion of equivalence of normal forms is <em>isomorphism</em>. Two
+ instances <span class="math">I1</span> and <span class="math">I2</span> are isomorphic if there is an invertible
+ function on existential variables <span class="math">h</span> such that <span class="math">h(I1) = I2</span>.
+ This is similar to the notion of equivalence used in [[RDF]], where
+ blank nodes play an analogous role to existential variables. </p>
+
+<p>Equivalence can be checked by normalizing instances, checking that
+ both instances are valid, then
+ testing whether the two normal forms are isomorphic. (It is
+ technically possible for two invalid normal forms to be isomorphic,
+ but to be condered equivalent, the two instances must also be
+ valid.)
+As with validity, the algorithm suggested by this specification is
+ just one of many possible ways to implement equivalence checking; it
+ is not required that implementations compute normal forms
+ explicitly, only that their determinations of equivalence match
+ those obtained by the algorithm in this specification.
+ </p>
</section>
@@ -1105,6 +1360,7 @@
</section>
+
<section id="inferences">
<h2>Definitions and Inferences</h2>
@@ -1635,7 +1891,10 @@
<p>
<span class='conditional'>IF</span> <span
class="name">activity(a,t1,t2,_attrs)</span> <span
- class="conditional">THEN</span> there exist <span class="name">_start</span>, <span class="name">_e1</span>, <span class="name">_end</span>,
+ class="conditional">THEN</span> there exist <span
+ class="name">_start</span>, <span class="name">_e1</span>, <span
+ class="name">_a1</span>, <span class="name">_end</span>, <span
+ class="name">_a2</span>,
and <span class="name">_e2</span> such that
<span
class="name">wasStartedBy(_start; a,_e1,_a1,t1,[])</span> and <span class="name">wasEndedBy(_end; a,_e2,_a2,t2,[])</span>.