added constraints blog draft
author Paolo Missier <pmissier@acm.org>
Sat, 08 Sep 2012 13:47:28 +0100
changeset 4434 105b1152b108
parent 4433 cc1c187a3a78
child 4435 ad709adcae50
model/contraints-blog.html
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/model/contraints-blog.html	Sat Sep 08 13:47:28 2012 +0100
@@ -0,0 +1,107 @@
+<html><body>
+
+<p><strong>Note: all links point to http://www.w3.org/TR/prov-constraints/</strong>, so most of them do not land in the right place just yet.</p>
+
+<h2>Last Call: Constraints of the Provenance Data Model</h2>
+
+<p>On Sept. XX, 2012, the Provenance Working Group announced Last Call for a new document in the suite that defines the core of the PROV family of specifications: <a href="http://www.w3.org/TR/prov-constraints/">PROV-CONSTRAINTS</a>.</p>
+
+<p>This follows the recent <a href="http://www.w3.org/blog/SW/2012/07/24/last-call-3-working-drafts-for-provenance-interchange/">Last Call announcement for three other documents</a>, namely <a href="http://www.w3.org/TR/prov-dm/">PROV-DM</a>, <a href="http://www.w3.org/TR/prov-o/">PROV-O</a>, and <a href="http://www.w3.org/TR/prov-n/">PROV-N</a>. The meaning of <em>Last Call</em> is explained in the earlier announcement. Essentially, the specification is open to public comments for a set period of time. At the end of that period, the editors commit to producing the final version of the document, in which all such comments are addressed following internal group discussion.</p>
+
+<p>The <a href="http://www.w3.org/TR/prov-constraints/">PROV-CONSTRAINTS</a> document complements the first three and focuses on the notion of <em>valid</em> provenance. The intent of provenance validation is to ensure that a set of PROV statements, called a <strong>PROV instance</strong>, represents a consistent history of objects and their interactions. Such a history is thus safe to use for logical reasoning and other kinds of analysis.
+ A valid PROV instance satisfies all the definitions, inferences, and constraints found in the <a href="http://www.w3.org/TR/prov-constraints/">PROV-CONSTRAINTS</a> document.</p>
+<p>Thus, the document can be used for two main purposes:</p>
+
+<ul> 
+<li>To design a validator that checks the consistency of a PROV instance and infers new statements from the given ones by means of logical reasoning;</li>
+<li>To determine whether or not two PROV instances are <em>equivalent</em>. For this, the document defines the notion of <strong>normal form</strong> for PROV statements, along with guidelines for normalizing PROV instances and for comparing two such normalized instances.</li>
+</ul>
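+<p>The following minimal Python sketch illustrates the "normalize, then compare" idea. It is an illustration only and does not reproduce the procedure defined in the document. It assumes a hypothetical encoding of statements as tuples in which every identifier is given explicitly, so no existential variables arise. It also uses a single simplified rule modelled on Inference 11 (discussed in the next section), with times and attributes dropped. The full normal form also involves definitions and uniqueness constraints, and normal forms are compared up to renaming of existential variables. The names <code>derivation_rule</code>, <code>normalize</code>, and <code>equivalent</code> are hypothetical.</p>

```python
# Hypothetical tuple encoding, all identifiers explicit (no existential variables):
#   ("wasDerivedFrom", id, e2, e1, a, gen2, use1)
#   ("used", use1, a, e1)
#   ("wasGeneratedBy", gen2, e2, a)


def derivation_rule(stmts):
    """Simplified Inference 11: wasDerivedFrom(id; e2, e1, a, gen2, use1)
    implies used(use1; a, e1) and wasGeneratedBy(gen2; e2, a).
    Times and attributes are dropped in this sketch."""
    new = set()
    for s in stmts:
        if s[0] == "wasDerivedFrom":
            _, _id, e2, e1, a, gen2, use1 = s
            new.add(("used", use1, a, e1))
            new.add(("wasGeneratedBy", gen2, e2, a))
    return new


def normalize(stmts, rules=(derivation_rule,)):
    """Apply every rule until no new statement appears (a fixpoint)."""
    current = set(stmts)
    while True:
        inferred = set()
        for rule in rules:
            inferred |= rule(current)
        if inferred <= current:
            return frozenset(current)
        current |= inferred


def equivalent(a, b):
    """Two instances are equivalent here if their normal forms are equal."""
    return normalize(a) == normalize(b)


i1 = {("wasDerivedFrom", "d", "e2", "e1", "a", "g2", "u1")}
i2 = i1 | {("used", "u1", "a", "e1")}
print(equivalent(i1, i2))                           # True
print(equivalent(i1, {("used", "u1", "a", "e1")}))  # False
```

+<p>The two instances are equivalent because the extra <code>used</code> statement in the second one is already implied by the derivation in the first.</p>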
+
+<h3>What is in the CONSTRAINTS document?</h3>
+
+<p>The document includes <em>definitions</em>, <em>inference rules</em>, and <em>constraints</em>.</p>
+
+<ul>
+<li><strong>Definitions</strong> are rules that specify when two PROV statements are equivalent. For example, <a href="http://www.w3.org/TR/prov-constraints#optional-identifiers">Def. 1</a> stipulates that identifiers are optional, by stating that an expression of the form:<br/>
+
+<code>r(a1,...,an) </code><br/>
+
+is equivalent to: <br/>
+
+<code>r(id; a1,...,an)</code> <br/>
+
+where <code>id</code> is an identifier that is unique across all instances of relation <code>r</code>. This holds for the majority of relations in PROV, with a few exceptions noted as part of the <a href="http://www.w3.org/TR/prov-constraints#optional-identifiers">definition</a>.</li>
+
+<li><strong>Inferences</strong> are IF-THEN rules by which new valid statements can be generated from a given set of statements. For example, <a href="http://www.w3.org/TR/prov-constraints#derivation-generation-use-inference">Inference 11</a> specifies that, if entity <code>e2</code> was derived from <code>e1</code> by activity <code>a</code>, then <code>a</code> must have used <code>e1</code> at some point in time, and <code>e2</code> must have been generated by <code>a</code> at some (possibly different) time. This is expressed as follows:<br/>
+
+<code>IF wasDerivedFrom(_id; e2,e1,a,gen2,use1,_attrs), THEN there exist _t1 and _t2 such that used(use1; a,e1,_t1,[]) and wasGeneratedBy(gen2; e2,a,_t2,[]).</code>
+
+</li>
+
+<li>Finally, three types of <strong>constraints</strong> are defined.
+
+<ul>
+<li><strong>Uniqueness constraints</strong>. These include key constraints, stating for instance that <code>e</code> is a key for the statement <code>entity(e,attrs)</code>. They also include constraints that state the uniqueness of events such as the generation of an entity. <a href="http://www.w3.org/TR/prov-constraints#unique-generation">Constraint 25</a>, for example, states that only one generation event can be associated with a given generated entity and generating activity:<br/>
+
+<code>IF wasGeneratedBy(gen1; e,a,_t1,_attrs1) and wasGeneratedBy(gen2; e,a,_t2,_attrs2), THEN gen1 = gen2.</code></li>
+
+<li><strong>Event ordering constraints</strong>. These specify the possible orderings of events (generation, usage, and invalidation of entities; start and end of activities) that correspond to a sensible history. For example, an entity should not be used before it is generated (<a href="http://www.w3.org/TR/prov-constraints#generation-precedes-usage">Constraint 39</a>):<br/>
+
+<code>IF wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and used(use; _a2,e,_t2,_attrs2) THEN gen precedes use.</code></li>
+
+<li><strong>Impossibility constraints</strong>. These are used to state, for example, that the same identifier cannot be used in two different types of statement (e.g. <code>entity(foo)</code> and <code>activity(foo)</code> is an illegal combination). They are also used to state properties of relations, for example "specialization is irreflexive" (<a href="http://www.w3.org/TR/prov-constraints#impossible-specialization-reflexive">Constraint 54</a>):<br/>
+<code>IF specializationOf(e,e) THEN INVALID.</code><br/>
+and "the sets of entities and activities are disjoint" (<a href="http://www.w3.org/TR/prov-constraints#entity-activity-disjoint">Constraint 57</a>):<br/>
+<code>IF 'entity' &isin; typeOf(id) AND 'activity' &isin; typeOf(id) THEN INVALID.</code></li>
+</ul>
+
+</ul>
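+<p>The following Python sketch illustrates how a validator might check a few of these rules. It tests Constraints 54 and 57, and reports the identifier pairs that Constraint 25 requires to be equal. It is a simplified sketch under explicit assumptions. Statements use a hypothetical tuple encoding with times and attributes dropped. Only explicit <code>entity(...)</code> and <code>activity(...)</code> declarations count towards <code>typeOf</code>, whereas the document also derives types from other relations. The function name <code>check</code> is hypothetical.</p>

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tuple encoding (times and attributes dropped):
#   ("entity", id), ("activity", id), ("specializationOf", e1, e2),
#   ("wasGeneratedBy", gen, e, a)


def check(stmts):
    """Return (violations, forced_equalities) for a set of statements."""
    violations = []
    # Constraint 54: specialization is irreflexive.
    for s in stmts:
        if s[0] == "specializationOf" and s[1] == s[2]:
            violations.append(f"specializationOf({s[1]},{s[2]}) is invalid")
    # Constraint 57 (simplified): an identifier declared both as an entity
    # and as an activity is invalid; only explicit declarations are checked.
    entities = {s[1] for s in stmts if s[0] == "entity"}
    activities = {s[1] for s in stmts if s[0] == "activity"}
    for ident in sorted(entities & activities):
        violations.append(f"{ident} is both an entity and an activity")
    # Constraint 25: one generation event per (entity, activity) pair, so
    # any two generation ids recorded for the same pair must be equal.
    gens = defaultdict(set)
    for s in stmts:
        if s[0] == "wasGeneratedBy":
            _, gen, e, a = s
            gens[(e, a)].add(gen)
    forced = [pair for ids in gens.values()
              for pair in combinations(sorted(ids), 2)]
    return violations, forced


stmts = {("entity", "foo"), ("activity", "foo"),
         ("specializationOf", "e", "e"),
         ("wasGeneratedBy", "gen1", "e", "a"),
         ("wasGeneratedBy", "gen2", "e", "a")}
violations, forced = check(stmts)
print(violations)  # ['specializationOf(e,e) is invalid', 'foo is both an entity and an activity']
print(forced)      # [('gen1', 'gen2')]
```

+<p>Reporting the forced equalities, rather than rejecting the instance outright, reflects how uniqueness constraints are applied: they identify statements that must refer to the same event.</p>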
+<h3>Examples</h3>  
+
+<p>The machinery of rules and constraints just described is intended for validating PROV statements and for inferring new ones from them. Some simple examples follow.</p>
+
+<h4>Example 1</h4>  
+
+<strong>Luc maybe you can add a simple inference here? </strong>
+
+<h4>Example 2</h4>  
+
+<p>We now show an inference process involving ordering constraints, which leads to the conclusion that <em>all the events involved in the provenance must be simultaneous</em>. Although this is logically possible, it most likely indicates that some of the statements disrupt the consistency of the entire history. The example involves a case of <em>mutual derivation</em> between two entities. Suppose our PROV instance includes the following statement:</p>
+<p><code>wasDerivedFrom(e2,e1,a,gen,use)</code></p>
+
+<p>That is, <code>e2</code> was derived from <code>e1</code> through activity <code>a</code>. Here <code>gen</code> and <code>use</code> denote, respectively, the generation of <code>e2</code> and the usage of <code>e1</code> by <code>a</code>.</p>
+
+
+<p><a href="http://www.w3.org/TR/prov-constraints#derivation-usage-generation-ordering">Constraint 43</a> defines the precedence of usage over generation in the context of derivation:</p>
+
+<p><code>IF wasDerivedFrom(e2,e1,a,gen,use) THEN use precedes gen.</code></p>
+
+<p>Intuitively, <code>a</code> must have used <code>e1</code> before generating <code>e2</code>. Applied to our instance, this gives:</p>
+<p><code>use precedes gen.</code></p>
+   
+<p>Suppose we add the following statement to the instance:</p>
+<p><code>wasDerivedFrom(e1,e2,a',gen',use')</code></p>
+<p>That is, <code>e1</code> was derived from <code>e2</code> through activity <code>a'</code>. This new statement may come about because of a merge operation that blends together two independent sets of statements about the same objects. Adding it, however, creates a circular derivation between <code>e1</code> and <code>e2</code>, a fairly atypical situation, so we expect our constraint system to tell us something interesting. Indeed, by applying the same <a href="http://www.w3.org/TR/prov-constraints#derivation-usage-generation-ordering">Constraint 43</a>, the new statement entails:</p>
+<p><code>use' precedes gen'</code></p>
+
+<p>Furthermore, <a href="http://www.w3.org/TR/prov-constraints#generation-precedes-usage">Constraint 39</a>, mentioned earlier, specifies that the generation of any entity must precede all of its uses. Thus we have the additional precedence relations:</p>
+<p><code>gen precedes use'</code> (<code>e2</code> is generated before <code>a'</code> uses it) and <code>gen' precedes use</code> (<code>e1</code> is generated before <code>a</code> uses it).</p>
+
+<p>The precedence relation is a preorder between instantaneous events: a constraint of the form <code>ev1 precedes ev2</code> means that event <code>ev1</code> happened <em>at the same time as</em> or before event <code>ev2</code>. Chaining the precedences entailed so far yields the cycle <code>use precedes gen precedes use' precedes gen' precedes use</code>. By transitivity, every event therefore both precedes and follows every other one. This leads to a single logical conclusion: all the events involved in the circular derivation must have happened at the same time:</p>
+<p><code>use = gen = use' = gen'.</code></p>
+<p>Although this is feasible in theory, in practice this conclusion flags the two derivation statements as potentially inconsistent. A reasoner can of course extend the same argument to circular derivations involving more than two activities.</p>
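+<p>The reasoning in Example 2 can be mechanized. The Python sketch below shows one possible approach, not one prescribed by the document. It collects <code>precedes</code> pairs from Constraints 43 and 39, computes the reflexive-transitive closure of the preorder, and reports groups of events that precede each other and must therefore coincide. Derivations use a hypothetical tuple encoding <code>(e2, e1, a, gen, use)</code> that mirrors <code>wasDerivedFrom(e2,e1,a,gen,use)</code>. The function names are illustrative only.</p>

```python
from itertools import product

# Hypothetical encoding: a derivation is (e2, e1, a, gen, use), mirroring
# wasDerivedFrom(e2,e1,a,gen,use) with identifier and attributes omitted.


def precedes_edges(derivations):
    """Collect (x, y) pairs meaning 'x precedes y' from Constraints 43 and 39."""
    edges = set()
    generated_by = {}  # entity -> its generation events
    used_by = {}       # entity -> events in which it is used
    for e2, e1, _a, gen, use in derivations:
        edges.add((use, gen))  # Constraint 43: use precedes gen
        generated_by.setdefault(e2, set()).add(gen)
        used_by.setdefault(e1, set()).add(use)
    # Constraint 39: every generation of an entity precedes every use of it.
    for entity, gens in generated_by.items():
        for gen, use in product(gens, used_by.get(entity, ())):
            edges.add((gen, use))
    return edges


def closure(edges):
    """Reflexive-transitive closure, since 'precedes' is a preorder."""
    events = {x for edge in edges for x in edge}
    reach = {x: {x} for x in events}
    for x, y in edges:
        reach[x].add(y)
    changed = True
    while changed:
        changed = False
        for x in events:
            extra = set().union(*(reach[y] for y in reach[x])) - reach[x]
            if extra:
                reach[x] |= extra
                changed = True
    return reach


def simultaneous_groups(derivations):
    """Groups of two or more events that precede one another, hence coincide."""
    reach = closure(precedes_edges(derivations))
    groups = {frozenset(y for y in reach[x] if x in reach[y]) for x in reach}
    return [sorted(g) for g in groups if len(g) > 1]


instance = [("e2", "e1", "a", "gen", "use"),
            ("e1", "e2", "a'", "gen'", "use'")]
print(simultaneous_groups(instance))      # [['gen', "gen'", 'use', "use'"]]
print(simultaneous_groups(instance[:1]))  # []
```

+<p>The first derivation alone yields no group of simultaneous events. Adding the second closes the cycle and forces all four events together. Because the closure is computed generically, a circular derivation through more than two activities is reported in the same way.</p>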
+ 
+<h2>Conclusion</h2>
+
+
+
+</body></html>
+
+
+
+
+