prov: changeset 6128:915883486cdb

--- a/semantics/prov-sem.html	Thu Apr 11 11:12:13 2013 +0200
+++ b/semantics/prov-sem.html	Thu Apr 11 12:36:34 2013 +0100
@@ -92,6 +92,12 @@
     border: 1px solid #888;
     background: #fff;
 }
+.lemma {
+    padding:    1em;
+    margin: 1em 0em 0em;
+    border: 1px solid #888;
+    background: #fff;
+}
 
 .proof {
     padding:    1em;
@@ -620,7 +626,7 @@
 
       function updateRules() {
         var count=1;
-        $('.constraint,.definition,.inference,.semantics,.component,.theorem').each(function(index) {
+        $('.constraint,.definition,.inference,.semantics,.component,.theorem,.lemma').each(function(index) {
 
           var myid=$(this).attr('id');
           var mycount=$(this).attr('number');
@@ -911,8 +917,6 @@
 
           // if you wish the publication date to be other than today, set this
            publishDate:  "2013-04-30",
-          previousPublishDate:  "2013-03-12",
-          previousMaturity:  "WD",
 
           // if the specification's copyright date is a range of years, specify
           // the start date here:
@@ -920,9 +924,11 @@
  
           // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
           // and its maturity status
-          
+          previousPublishDate:  "2013-03-12",
+          previousMaturity:  "WD",
+         
           // if there a publicly available Editor's Draft, this is the link
-          edDraftURI:           "http://dvcs.w3.org/hg/prov/raw-file/default/semantics/prov-sem.html",
+//edDraftURI:           "http://dvcs.w3.org/hg/prov/raw-file/default/semantics/prov-sem.html",
 
 
 
@@ -988,7 +994,7 @@
 data model (called the <dfn>naive semantics</dfn>), viewing
 PROV-DM statements as atomic formulas in the sense of first-order
 logic, and viewing the constraints and inferences specified in
-PROV-CONSTRAINTS as a first-order theory. It is shown that valid PROv
+PROV-CONSTRAINTS as a first-order theory. It is shown that valid PROV
 instances (in the sense of PROV-CONSTRAINTS) correspond to satisfiable
 theories.
 This information may be useful to researchers or users of PROV to
@@ -1028,6 +1034,7 @@
 <li> <a href="http://www.w3.org/TR/2013/NOTE-prov-links-20130430/">PROV-LINKS</a> (Note) introduces a mechanism to link across bundles [[PROV-LINKS]].</li>
 </ul>
 
+
 </section>
 
 
@@ -1037,7 +1044,9 @@
       <h2>Introduction</h2> 
 
 <p>
-Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.
+Provenance is a record that describes the people, institutions,
+entities, and activities involved in producing, influencing, or
+delivering a piece of data or a thing. [[PROV-DM]]
 This document complements
   the PROV-DM specification [[PROV-DM]] that defines a data model for
   provenance on the Web, and the PROV-CONSTRAINTS specification
@@ -1072,7 +1081,7 @@
 </section>
 <p>The PROV-DM and PROV-CONSTRAINTS specifications give motivating examples that
 provide an intuition about the meaning of the constructs.  For some
-concepts, such as use, start, end, generation/invalidation, and
+concepts, such as use, start, end, generation, invalidation, and
 derivation, the meaning is either obvious or situation-dependent.
 However, during the development of PROV, the importance of additional
 concepts became evident, but the intuitive meaning or correct use of
@@ -1092,27 +1101,27 @@
 logic, principally model theory (though our use of these tools is lightweight).  This information may be useful to users for understanding the
 intent behind certain features of PROV, to researchers investigating
 richer forms of reasoning over provenance, or to future efforts
-building upon PROV.  It is intended as an exploration of <b>one</b> semantics for PROV, not a definitive specification of the  <b>only</b>
+building upon PROV.  It is intended as an exploration of <b>one</b> semantics for PROV, <b>not</b> a definitive specification of the  <b>only</b>
 semantics of PROV.  We provide a semantics that satisfies all
 of the constraints on valid PROV instances, and such that valid PROV
 instances correspond to satisfiable theories: every valid instance has
 a model, and vice versa.</p>
 
-<p> Although it is a work in progress, the naive semantics has some appealing
+<p> The <a>naive semantics</a> has some appealing
 properties.  Specifically, it provides a declarative counterpart to
 the operational definition of validity taken in PROV-CONSTRAINTS.  In
 the specification, validity is defined via a normalization process
 followed by constraint checking on the normal form.  This approach was adopted
 to keep the specification closer to implementations, although other
 implementations are possible and allowed.  In addition to providing a
-naive semantics, this document shows that the operational
+<a>naive semantics</a>, this document shows that the operational
 presentation of PROV validity checking is equivalent to the
 declarative presentation adopted here.  This could help justify
 alternative approaches to validity checking.</p>
 
 <p>This document mostly considers the semantics of PROV statements and
   instances.  PROV documents can consist of multiple instances, such
-  as named bundles. The semantics does not (as yet) cover general PROV documents, but the
+  as named bundles. The semantics do not cover general PROV documents, but the
   semantics can be used on each instance in a document separately,
   just as PROV-CONSTRAINTS specifies that each instance in a document
   is to be validated separately.
@@ -1121,7 +1130,16 @@
   The semantics of extensions of PROV, such as dictionaries
   [[PROV-DICTIONARY]] and linking across bundles [[PROV-LINKS]], are
   beyond the scope of this document.  </p>
-  
+
+
+<p>This document has been reviewed by the Working Group, but the
+theorems and proofs have not been
+formally peer-reviewed in the sense of an academic paper.  Thus, 
+the Working Group believes this document is an appropriate starting
+point for future study of the semantics of PROV, but further work may
+be needed.
+</p>
+
 <section> <h4>Structure of this document</h4> <p>
 
 </p>
@@ -1224,7 +1242,7 @@
 <h3> Identifiers </h3>
 
 <p>A lowercase symbol $x,y,...$ on its own denotes an identifier.
-Identifiers may or may not be URIs.  Identifiers are viewed as
+ Identifiers are viewed as
 variables from the point of view of logic.  Identifiers denote objects, and
 two different identifiers $x$ and $y$ may denote equal or different objects.  We write $Identifiers$ for the
 set of identifiers of interest in a given situation (typically, the
@@ -1238,7 +1256,9 @@
 <p>We assume a set $Attributes$ of attribute labels and a set $Values$
 of possible values of attributes.  To allow for the fact that some
 attributes can have undefined or multiple values, we sometimes use the set
-$P(Value)$, that is, the set of sets of values.
+$P(Values)$, that is, the set of sets of values.  Thus, we can use the
+empty set to stand for an undefined value and $\{a,b\}$ to stand for
+the set of values of a two-valued attribute.
 </p> </section>
 
 
@@ -1247,7 +1267,7 @@
 <h3> Times </h3>
 
 <p>We assume an ordered set $(Times,\leq)$ of time instants, where
-$Times \subseteq Val$ and $\leq$ is a linear order.
+$Times \subseteq Values$ and $\leq$ is a linear order.
 </p>
 
 <!--
@@ -1280,8 +1300,8 @@
 <p>The following atomic formulas correspond to the statements of PROV-DM.  We assume that definitions 1-4 of PROV-CONSTRAINTS have been applied in order to expand all optional parameters; thus, we use uniform notation $r(id,a_1,\ldots,a_n)$ instead of the semicolon notation $r(id;a_1,\ldots,a_n)$.
 </p>
 <p>Each parameter is either an identifier, a constant (e.g. a time or
-  other literal value in an attribute list), or a null symbol "-".
-  Placeholder symbols $-$ can only appear in the specified arguments
+  other literal value in an attribute list), or a null symbol "$-$".
+  Placeholder symbols "$-$" can only appear in the specified arguments
   $pl$ in $wasAssociatedWith$ and $a,g,u$ in $wasDerivedFrom$, as shown in the grammar below.
 </p>
 $$
@@ -1396,14 +1416,20 @@
 interpret PROV formulas and instances.  A structure consists of a
 collection of sets, functions and relations.  The components of a
 structure $W$ are given in the rest of the section in
-<em>components</em>, highlighted in boxes.
-
+<em>components</em>, highlighted in boxes.</p>
+
+<div class="remark">
+  <p>We use the term "component" here in a different sense than in
+  PROV-DM.  Here, the components are parts of a large definition,
+  whereas PROV-DM defines six components that group different parts of
+  the PROV data model.</p>
+  </div>
 
 <section>
 <h3> Things </h3> 
 
 <p><em>Things</em>  is a set of things in the situation being modeled.
-Each thing has an associates set of $Events$ and attributes whose
+Each thing has an associated set of $Events$ and attributes whose
 values can change over time.  Different kinds of $Events$ are specified further below.
 </p>
 <p>To model this, a structure $W$ includes:
@@ -1412,7 +1438,7 @@
   <li> a set $Things$ of things</li>
   <li> a set $Events$ of events</li>
   <li> a function $events : Things \to P(Events)$ from things to
-  sets of events.</li>
+associated   sets of events.</li>
   <li>a function $value : Things \times Attributes \times Events \to
   P(Values)$ giving the possible values of each attribute of a
   $Thing$ at the instant of a given event.
@@ -1442,6 +1468,10 @@
 Events$ and $a \in Attributes$, this does not imply that $T_0 = T_1$.
 </p>
 
+<p>$Things$ are associated with certain kinds of $Objects$ called $Entities$, defined in
+the next subsection.  Specifically, the function $thingOf$ associates
+an $Entity$ to a $Thing$.
+</p>
 
 </section>
 
@@ -1485,7 +1515,7 @@
 <p>As with <em>Things</em>, the range of $value$ is sets of values,
 making $value$ effectively a multivalued function.  It is also
 possible to have two different objects that are indistinguishable by
-their attributes and time intervals.  Objects are not things, and the
+their attributes and associated events.  Objects are not things, and the
 sets of $Objects$ and $Things$ are disjoint; however, certain objects,
 namely entities, are associated with things.
 </p>
@@ -1587,9 +1617,8 @@
   <em>collections</em>, with the following associated structure:</p>
   <div class="component" id="collections">
     <ul><li>A set $Collections \subseteq Entities$</li>
-    <li>A membership relation $Contains\subseteq Collections \times Entities$
-  indicating when an entity is a member of another (collection)
-  entity.</li>
+    <li>A membership function $members : Collections \to P( Entities)$
+mapping each collection to its set of members.</li>
   </ul>
     </div>
   </section>
@@ -1599,7 +1628,8 @@
 <h4> Activities </h4>
 
 
-<p>An <em>activity</em> is an object that encompasses a set of events.  We introduce:
+<p>An <em>activity</em> is an object corresponding to a continuing
+    process rather than an evolving thing.  We introduce:
 </p>
 <div class="component" id="activities">
   <ol><li>A set $Activities \subseteq Objects$ of activities.</li>
@@ -1661,8 +1691,8 @@
 <section>
 <h5> Events </h5>
 
-<p>An $Event$ is an influence whose events is a single time
-instant, and relates an activity to an entity (which could be an
+<p>An $Event$ is an instantaneous influence that relates an activity
+to an entity (either of which could also be an
 agent).  Events have types including usage, generation, invalidation, starting and ending.  Events are instantaneous.  We introduce:
 </p>
 <div class="component" id="events">
@@ -1676,7 +1706,7 @@
 \not\preceq e$ hold.
 </li>
 <li>A function $started : Starts \to Activities \times Entities \times
-Activities$, such that $started(st) = (a,e,a')$ implies $start \in
+Activities$, such that $started(start) = (a,e,a')$ implies $start \in
 events(a) \cap events(e) \cap events(a')$.
 </li>
 <li>A function $ended : Ends \to Activities \times Entities \times Activities$, such that $ended(end) = (a,e,a')$ implies $end \in
@@ -1797,7 +1827,7 @@
   and use step.  However, not all such potential derivation paths are associated
   with derivations; there can (and in general will) be many such paths
   that are not associated with derivation steps.  In other words, because we require derivations to be
-  explicitly assocated with derivation paths, it is not sound to infer
+  explicitly associated with derivation paths, it is not sound to infer
   the existence of a derivation from the existence of an 
   alternating generation/use chain.
   </p>
@@ -1805,8 +1835,8 @@
   The reason why we need paths and not just individual derivation
   steps is to reflect that $wasDerivedFrom(id,e_2,e_1,-,-,-,attrs)$ formulas can
   represent multiple derivation steps.  However, there is no way to
-  express a multi-step derivation path in PROV: any valid PROV
-  instance turns out to have a model in which all derivation paths are
+  force a derivation to take multiple steps.  Any valid PROV
+  instance has a model in which all derivation paths are
   one-step.</p></div>
   
 </section>
@@ -1841,11 +1871,13 @@
     </li>
     <li id="axiom5">
     If $d \in Derivations$ and $prov:Revision \in
-    value(d,prov:type)$ and $derivationPath(deriv) = e_2 \cdot w \cdot
-    e_1$ then $thingOf(e_1) = thingOf(e_2)$.
+    value(d,prov:type)$ and there exists $w \in (Generations \cup
+    Activities \cup Uses \cup Entities)^* $ such that $derivationPath(deriv) = e_2 \cdot w \cdot
+    e_1 \in DerivationPaths$  then $thingOf(e_1) = thingOf(e_2)$.
     </li>
     <li id="axiom6">
-    If $attributedTo(att) = (e,ag)$ then there exist $gen$ and $assoc$
+    If $attributedTo(att) = (e,ag)$ then there exist $gen$, $assoc$
+    and $a$
     such that $generated(gen) = (e,a)$ and $associatedWith(assoc) = (a,ag)$.
     </li>
     <li id="axiom7">
@@ -1957,8 +1989,7 @@
     </li>
     <li id="axiom36">
     If $e \in Entity$ and $prov:emptyCollection \in
-    value(e,prov:type)$ then $e \in Collections$ and there exists no
-    $e'$ such that $(e,e') \in Contains$.
+    value(e,prov:type)$ then $e \in Collections$ and $members(e) = \emptyset$.
     </ol>
 </div>
 
@@ -1975,7 +2006,11 @@
   should be a constraint analogous to Constraint 34 that specifies
   that any invalidation event in which an activity participates must
   follow the activity's start event(s) and precede its end
-  event(s).</p>
+  event(s).
+  </p>
+  <p>Here, we exempt invalidations from axioms 22 and 23 in order to
+  simplify the proof of weak completeness.</p>
+  
   </div>
   </section>
   
@@ -2157,14 +2192,14 @@
 <li>[WF] The identifier $id$ maps to an activity $act = \rho(id) \in Activities$.
 </li>
 <li>$\rho(st) \in Times$ is the activity's start time, that is:
-$startTime(id) = \rho(st)$.
+$startTime(act) = \rho(st)$.
 </li>
-<li>$\rho(et)$ is the activity's end time, that is: $endTime(id) = \rho(et)$.
+<li>$\rho(et)$ is the activity's end time, that is: $endTime(act) = \rho(et)$.
 </li>
 <li>There exists $start,e,a$ such that $started(start) = (act,e,a)$,
-and for all such start events $startTime(a) = time(start).</li>
+and for all such start events $startTime(act) = time(start).</li>
 <li>There exists $end,e',a'$ such that $ended(end) = (act,e',a')$, and
-for all such end events $endTime(a) = time(end)$.</li>
+for all such end events $endTime(act) = time(end)$.</li>
 <li>The attributes match: $match(W,act,attrs)$.
 </li>
 </ol>
@@ -2534,7 +2569,7 @@
 <!--<li>$(ent_1,ent_2) \in SpecializationOf$.-->
   <li>The two entities present aspects of the same thing, that is, $thingOf(ent_1) = thingOf(ent_2)$.
 </li>
-<li>The events of $ent_1$ is contained in that of $ent_2$, i.e. $events(ent_1) \subseteq events(ent_2)$.
+<li>The events of $ent_1$ are contained in those of $ent_2$, i.e. $events(ent_1) \subseteq events(ent_2)$.
 </li>
 <li>For each attribute $attr$ we have $value(ent_1,attr) \supseteq value(ent_2,attr)$.</li>
 <li>At least one of these inclusions is strict: that is, either
@@ -2557,7 +2592,7 @@
 -->
 <p>The second criterion says that the two Entities present (possibly different) aspects of
 the same Thing. Note that the third criterion allows $ent_1$ and
-$ent_2$ to have the same events (or that of $ent_2$ can be larger).
+$ent_2$ to have the same events (or $events(ent_2)$ can be larger).
 The last criterion allows $ent_1$ to have more defined attributes than
 $ent_2$, but they must include the attributes defined by $ent_2$.  Two
   different entities that have the same attributes can also be related
@@ -2600,7 +2635,7 @@
   = \rho(c) \in Collections$ and $ent = \rho(e) \in Entities$.
 </li>
 <li>The entity $ent$ is a member of the collection $coll$: that is,
-$(coll,ent) \in Contains$.
+$ent \in members(coll)$.
 </li></ol>
 </div>
 
@@ -2658,7 +2693,7 @@
   <section>
   <h4>notNull</h4>
   <p>The $notNull(x)$ formula is used to specify that a value may not
-  be the null value $\bot$.  The symbol $-$ always denotes the null
+  be the null value $\bot$.  The symbol "$-$" always denotes the null
   value (i.e. $\rho(-) = \bot$).
   <div class="semantics" id="notNull-semantics">
     <p>$W,\rho\models notNull(e)$ holds if and only if $\rho(e) \neq
@@ -2684,8 +2719,7 @@
       <li>$W,\rho\models typeOf(c,Collection)$ holds if and only if
       $\rho(c) \in Collections$.</li>
       <li>$W,\rho\models typeOf(c,EmptyCollection)$ holds if and only if
-      $\rho(c) \in Collections$ and there is no $e \in Entities$ such
-    that $(\rho(c),e) \in  Contains$.</li>
+      $\rho(c) \in Collections$ and $members(\rho(c)=\emptyset$.</li>
       </ol>
       
     </div>
@@ -2702,7 +2736,7 @@
     In this section we restate all of the inferences and constraints
     of PROV-CONSTRAINTS in terms of first-order logic.  For each, we
     give a proof sketch showing why the inference or constraint is
-    sound for reasoning about the naive semantics.  We exclude the
+    sound for reasoning about the <a>naive semantics</a>.  We exclude the
     definitional rules in PROV-CONSTRAINTS because they are only
     needed for expanding the abbreviated forms of PROV-N statements to
     the logical formulas used here.</p>
@@ -3729,7 +3763,7 @@
 <div class="proof">
   <p>Each part follows from the fact that the semantics of
   $wasDerivedFrom$ only allows formulas to hold when either all three
-  of $a,g,u$ are $-$ (denoting $\bot$) or none of them are.</p>
+  of $a,g,u$ are "$-$" (denoting $\bot$) or none of them are.</p>
   </div>
 <div class="constraint" number="52" id="impossible-specialization-reflexive">$\begin{array}[t]{l}
 \forall e.~
@@ -3812,7 +3846,7 @@
   <h2>Soundness and Completeness</h2>
 
   <p>Above we have presented arguments for the soundness of the
-constraints and inferences with respect to the naive semantics.
+constraints and inferences with respect to the <a>naive semantics</a>.
 Here, we relate the notions of <dfn>validity</dfn> and <dfn>normal
 form</dfn> defined in PROV-CONSTRAINTS to the semantics.  
 </p>
@@ -3863,8 +3897,31 @@
   <h3>Weak Completeness</h3>
 
 <p> In this section we give a translation from valid PROV instances to
-  structures, and show that a valid PROV instance has a model.</p>
-
+  structures, and show that a valid PROV instance has a model.  We
+  call this property <em>weak completeness</em>. 
+
+  </p>
+
+  <div class="remark">
+    <p> The term <em>weak</em> refers to the fact that
+    there are still some inferences that are sound in the semantics but not
+    enforced by validation.  For example, consider the following
+    (valid) PROV instance fragment:</p>
+    <pre>
+entity(e,[a=1])
+agent(e,[b=2])
+    </pre>
+    <p>This instance is valid and has a model, but in every model
+    satisfying the instance, it is also true that:</p>
+    <pre>
+entity(e,[a=1,b=2])
+agent(e,[a=1,b=2])
+    </pre>
+    <p>Thus, weak completeness captures the fact that every valid
+    instance has a model, but does not imply that a valid instance
+    satisfies all of the deductions possible in that model.
+    </p>
+    </div>
   
   <p> Let $I$ be a valid PROV instance that is in normal form.
   We define a structure $M(I)$ as follows, by giving the sets,
@@ -3874,49 +3931,71 @@
 
   <p>First, without loss of generality, we assume that all times
   specified in activity or event formulas in $I$ are ground values.
-  If not, set each variable in such a position to some dummy value.</p>
-
+  If not, set each variable in such a position to some dummy value.
+  This is justified by the following fact:</p>
+
+<div class="lemma" id="time-grounding">
+  <p>
+  If $I$ is valid then $S(I)$ is valid, where $S$ is any 
+  substitution that maps time variables to time constants.
+  </p>
+  </div>
+  <div class="proof">
+    <p>
+    First, consider a substitution $S = [t:=c]$ that maps a
+  single time variable to a constant.  It is straightforward to check
+  that if $I$ is in normal form, then $S(I)$ is in normal form, since
+  none of the inferences or uniqueness constraints can be enabled by
+  changing a time variable uniformly in $I$.  Similarly, the remaining
+  constraints are insensitive to the time values, so $S(I)$ is in
+  normal form and satisfies all of the remaining constraints just as
+  $I$ does.  The general case of a substitution that replaces multiple time variables
+  with constants is a straightforward generalization since we can view
+  such a substitution as a composition of single-variable substitutions.
+    </p>
+    </div>
+    
   <section>
 <h4>Sets</h4>
   <p> The sets of structure $M(I)$ are: </p>
 \[
   \begin{eqnarray*}
   Entities &=& \{id \mid I \models typeOf(id,entity)\}\\
-  Plans &=& \{pl \mid wasAssociatedWith(id,ag,act,pl,attrs) \in I, pl
+  Plans &=& \{pl \mid \exists id,ag,ac,attrs.~ wasAssociatedWith(id,ag,act,pl,attrs) \in I, pl
   \neq -\}\\
   Collections &=& \{c \mid I \models typeOf(c,prov:Collection) \text{
   or } I \models typeOf(c,prov:EmptyCollection)\} \\
   
   Activities &=& \{id \mid I \models typeOf(id, activity)\}\\
   &\cup& \{a_{id},a'_{id} \mid id \in Entities\}\\
-  &\cup& \{a_{id} \mid wasDerivedFrom(id,e_2,e_1,-,-,-,attrs) \in I\}\\
+  &\cup& \{a_{id} \mid \exists id,e_2,e_1.~wasDerivedFrom(id,e_2,e_1,-,-,-,attrs) \in I\}\\
   Agents &=& \{id \mid I \models typeOf(id,agent)\}\\
   \\
-  Usages &=&  \{id \mid used(id,a,e,t,attrs) \in I\}\\
-  &\cup& \{u_{id} \mid wasDerivedFrom(id,e_2,e_1,-,-,-,attrs) \in
+  Usages &=&  \{id \mid \exists a,e,t,attrs.~used(id,a,e,t,attrs) \in I\}\\
+  &\cup& \{u_{id} \mid \exists id,e_2,e_1,attrs.~wasDerivedFrom(id,e_2,e_1,-,-,-,attrs) \in
   I\}\\
-  Generations &=&  \{id \mid wasGeneratedBy(id,e,a,t,attrs) \in I\}\\
-  &\cup& \{g_{id} \mid wasDerivedFrom(id,e_2,e_1,-,-,-,attrs) \in
+  Generations &=&  \{id \mid \exists e,a,t,attrs.~wasGeneratedBy(id,e,a,t,attrs) \in I\}\\
+  &\cup& \{g_{id} \mid \exists id,e_2,e_1,attrs.~wasDerivedFrom(id,e_2,e_1,-,-,-,attrs) \in
   I\}\\
  & \cup & \{g_{id} \mid id \in Entities\}\\
-   Invalidations &=&  \{id \mid wasInvalidatedBy(id,e,a,t,attrs) \in
+   Invalidations &=&  \{id \mid \exists e,a,t,attrs.~wasInvalidatedBy(id,e,a,t,attrs) \in
   I\}\\
   & \cup & \{i_{id} \mid id \in Entities\}\\
-  Starts &=&  \{id \mid wasStartedBy(id,a,e,a',t,attrs) \in I\}\\
-  Ends &=&  \{id \mid wasEndedBy(id,a,e,a',t,attrs) \in I\}\\
+  Starts &=&  \{id \mid \exists a,e,a',t,attrs.~wasStartedBy(id,a,e,a',t,attrs) \in I\}\\
+  Ends &=&  \{id \mid \exists a,e,a',t,attrs.~wasEndedBy(id,a,e,a',t,attrs) \in I\}\\
   Events &=& Usages \cup Generations \cup Invalidations \cup Starts
   \cup Ends\\
   \\
-  Associations &=& \{id \mid wasAssociatedWith(id,ag,act,pl,attrs) \in
+  Associations &=& \{id \mid \exists ag,act,pl,attrs.~ wasAssociatedWith(id,ag,act,pl,attrs) \in
   I\}\\
-  Attributions &=&  \{id \mid wasAttributedTo(id,e,ag,attrs) \in I\}\\
-  Delegations &=&  \{id \mid actedOnBehalfOf(id,ag_2,ag_1,act,attrs) \in I\}\\
-  Communications &=&  \{id \mid wasInformedBy(id,a_2,a_1,attrs) \in I\}\\
-  Derivations &=&  \{id \mid wasDerivedFrom(id,e_2,e_1,a,g,u,attrs) \in I\}\\
+  Attributions &=&  \{id \mid \exists e,ag,attrs.~wasAttributedTo(id,e,ag,attrs) \in I\}\\
+  Delegations &=&  \{id \mid \exists ag_2,ag_1,attrs.~ actedOnBehalfOf(id,ag_2,ag_1,act,attrs) \in I\}\\
+  Communications &=&  \{id \mid \exists a_2,a_1,attrs.~wasInformedBy(id,a_2,a_1,attrs) \in I\}\\
+  Derivations &=&  \{id \mid \exists e_2,e_1,a,g,u,attrs.~wasDerivedFrom(id,e_2,e_1,a,g,u,attrs) \in I\}\\
 \\
   Influences &=& Events \cup Associations \cup Attributions \cup
   Communications \cup Delegations\\
-  &\cup & \{id \mid wasInfluencedBy(id,o_2,o_1,attrs) \in I\}\\
+  &\cup & \{id \mid \exists o_2,o_1,attrs.~ wasInfluencedBy(id,o_2,o_1,attrs) \in I\}\\
   \\
   Objects &=& Entities \cup Activities \cup Agents \cup Influences\\
   \end{eqnarray*}
@@ -3986,7 +4065,7 @@
 also define the set of all events involved in $e$ as the set of events
 immediately involved in $e$ or any specialization of $e$.  Similarly,
 the values of attributes of $e$ are those immediately declared for $e$
-along with those of $e'$ that $e$ specializes.  We also introduce dummy
+along with those of any $e'$ that $e$ specializes.  We also introduce dummy
 generation and invalidation events for each entity $e$, along with
 activities $a_e,a'_e$ to perform them.
 </p>
@@ -4059,7 +4138,7 @@
   <p>This definition is deterministic because the sets of identifiers
   of different $Events$ are disjoint, and the associated times are unique.</p>
   
-  <p>Finally, the functions giving the interpretations of the
+  <p>The functions giving the interpretations of the
   different identified influences are as follows:
   </p>
 \[\begin{eqnarray*}
@@ -4116,6 +4195,11 @@
 <p> It is straightforward to verify (by their definitions) that the
 event sets associated with entities and activities satisfy the
 side-conditions in <a href="#events">Component 9</a>.</p>
+
+    <p> Finally, the collection membership function $members$ is
+    defined as follows:</p>
+    \[members(c) = \{e \mid hadMember(c,e) \in I\]
+
 </section>
 <section>
 <h4>Relations</h4>
@@ -4126,11 +4210,8 @@
     \[evt \preceq evt' \iff (evt,evt') \in G_I\]
     <p> closed under reflexivity and transitivity.  Here, we are using a slight abuse of notation: we write $G_I$
   for the directed graph that is used during validation of $I$ to
-  test for cycles amond event ordering constraints.  See Sec. 7.1 of PROV-CONSTRAINTS [[PROV-CONSTRAINTS]].</p>
-
-    <p> Finally, the collection membership relation $Contains$ is
-    defined as follows:</p>
-    \[(c,e) \in Contains \iff hadMember(c,e) \in I\]
+  test for cycles among event ordering constraints.  See Sec. 7.1 of PROV-CONSTRAINTS [[PROV-CONSTRAINTS]].</p>
+
     
 </section>
     <section>
@@ -4319,7 +4400,12 @@
       <h2>Acknowledgements</h2> 
       <p> 
 
-This  document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole.
+This  document has been produced by the PROV Working Group, and its
+      contents reflect extensive discussion within the Working Group
+      as a whole.  Thanks specifically to Khalid Belhajjame, Tom De
+      Nies, Paolo
+      Missier, Simon Miles, Luc Moreau, Satya Sahoo, Joachim Van
+      Herwegen for detailed feedback.
       </p>
 <p>Thanks also to Robin Berjon for the ReSPec.js specification writing
 tool and to MathJax for their LaTeX-to-HTML conversion tools, both of

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/semantics/review/final-khalid.txt	Thu Apr 11 12:36:34 2013 +0100
@@ -0,0 +1,64 @@
+> 1. Is the purpose of the document clear and consistent with the working group's consensus about the semantics? If not, can you suggest clarifications or improvements?
+ > 
+ > Yes
+ > 
+ > 2. Are there minor issues that can be corrected easily prior to final release?
+ > 
+ > Yes (see below)
+ > 
+ > 3. Are there blocking issues that must be addressed prior to final release?
+ > 
+ > No
+ > 
+ > 4. ISSUE-579 requested that we incorporate an axiomatization using a more standard logic formalism e.g. first-order logic.  The current draft attempts to address this.  Can this issue be closed?
+ > 
+ > Yes.
+ > 
+ > 5. ISSUE-635 requested that we address the issues of soundness and completeness in the semantics.  This is currently attempted, by generalizing the semantics (which unfortunately also decreases the connection to intuitive notions of time.)  As a result, we have a soundness and weak completeness result stating that any valid PROV instance has a model and vice versa.  Can this issue be closed?
+ > 
+ > Yes.
+ > 
+ > 
+ > 
+ > C1. In Section 1.1., it will be helpful to provide a reference to Naive Semantics.
+ > 
+
+I'm not sure what you mean here.  I've made other copies of "naive semantics" into links to this definition.
+
+If you mean refer to an external resource, please suggest one.
+
+ > C2. In Section 2.1, you state that "To allow for the fact that some
+ > attributes can have undefined or multiple values, we sometimes use the set P(Value), that is, the set of sets of values." For clarification, you may add here that you use empty sets for undefined values.
+
+There is a remark to this effect later, but it's good to signpost it.
+
+ > 
+ > C3. In Section 3, you use the term component, which is also use in PROV-DM, but with different meaning. It is worth mentioning this.
+
+Done
+
+ > 
+ > C4. In the first paragraph of Section 3.2, the correspondence between Things and Objects is not clear, although it is talked about. This confusion is then lifted later on in the same section, but I think it is worth saying something about this earlier.
+
+Done
+
+ > 
+ > C5. In Section 3.3, component 14, I could not follow the 5th bullet, I suspect there is a variable that has not been declared.
+ > 
+
+Do you mean additional axiom 5?
+
+I believe the unquantified variable was $w$, and I've clarified it.  Basically it is a dummy variable for the stuff in between the beginning and end of a derivation path.
+
+
+ > C6. In component 15, I was wondering if the following inference is mentioned somewhere: "a path between two entities that is contained in a dervation path, is also a derivation path"
+ > 
+
+No, there is no inference to this effect in PROV-CONSTRAINTS, so we do not assume it here.
+
+ > C7. In the sets of structures that are listed in Section 6.2.1, there are variables that are not bound in the set. Some of these variables are associated the imprecise derivation, but there are also others that are not. In particular, if you look at the second set in the union that define Activities, you will notice that aid and aid' are not bound.
+ > 
+ > 
+
+These are new activities, generations, invalidations, and usages associated with certain ids.  The notations a_id and so on are explained later.  
+

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/semantics/review/final-luc.txt	Thu Apr 11 12:36:34 2013 +0100
@@ -0,0 +1,84 @@
+
+ > - abstract: PROv -> PROV
+
+Done
+
+ > - sotd paragraph: add the paragraph about feedback and errata.
+ > 
+
+TODO
+
+ > - section 1: paragraph 1: definition of provenance: cite prov-dm.
+ > 
+
+Done
+
+ > - section 1.1: generation/invalidation  -> generation, invalidation
+ > 
+
+Done
+
+ > - section 1.1: make 'not' bold
+ >   not a definitive specification of the ONLY semantics
+ >   ->
+ >   NOT a definitive specification of the ONLY semantics
+ > 
+
+Done
+
+ > 
+ > - I like the mapping to events.
+ > 
+ > -  remark in 3.2.4.6: assocated -> associated
+
+Done
+
+ > 
+ > - remark in 3.2.4.6: I didn't understand the last paragraph of the
+ > remark. why is it there is no way to express multi step derivation
+ > path?
+
+There is no way to write a PROV instance whose ONLY models have multiple step derivations.
+
+ > 
+ > - component 15 axioms: I was surprised to see a reference to things among
+ > the axioms (see axiom 5).  But I guess, this is OK.
+ > 
+
+This is because one of the constraints links revision and alternate in a semi-arbitrary way (that is, there is nothing about the way revisions are interpreted that forces this).  There are other constraints about alternate but these follow from the way alternate is modeled.
+
+ > 
+ > - section 3.3, remark: well spotted that we miss a constraint for
+ >  invalidation.
+ >  Did you identify it when proving properties?
+
+Yes
+
+ > 
+ > - section 4.3.2. How is \rho(st) defined?  what it 'st' is a constant
+ >  or an identifier?
+
+"st" is an ordinary parameter, i.e. either a time value constant or an identifier denoting an unknown time.  TODO
+
+ > 
+ > - section 6.2: for the reader, can you make explicit why this is "WEAK completness".
+ > 
+
+addressed in a remark/lemma
+
+ > - just before 6.2.1: why can you set each time variable to some dummy value?
+ > 
+ >  Is it because once, you have established an instance is valid, then
+ >  you have already unified all possible time variable to ground
+ >  values. Then, instantiating any time variable can no longer break
+ >  any constraints.
+
+Yes, and this is because time values (and their ordering) are "just data", and so their values are irrelevant.
+
+ > 
+ > - 6.2.3:
+ > 
+ >  amond -> among
+ > 
+ >
+Done

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/semantics/review/final-satya.txt	Thu Apr 11 12:36:34 2013 +0100
@@ -0,0 +1,45 @@
+ > 
+ > 
+ > Please address the following review questions:
+ > 
+ > 1. Is the purpose of the document clear and consistent with the working group's consensus about the semantics? If not, can you suggest clarifications or improvements?
+ > 
+ > Yes the document is easy to follow.
+ >  
+ > 2. Are there minor issues that can be corrected easily prior to final release?
+ > Yes, some minor points are listed below 
+ > 
+ > 3. Are there blocking issues that must be addressed prior to final release?
+ > No 
+ > 
+ > 4. ISSUE-579 requested that we incorporate an axiomatization using a more standard logic formalism e.g. first-order logic.  The current draft attempts to address this.  Can this issue be closed?
+ > Yes
+ > 
+ > 5. ISSUE-635 requested that we address the issues of soundness and completeness in the semantics.  This is currently attempted, by generalizing the semantics (which unfortunately also decreases the connection to intuitive notions of time.)  As a result, we have a soundness and weak completeness result stating that any valid PROV instance has a model and vice versa.  Can this issue be closed?
+ > Can be closed - was not able to review completeness properly (will need to cross-check with cited axioms in the Section 6.2.4).
+ > 
+ > Minor issues:
+ > Section 1.1
+ > As Khalid pointed out, need to clarify or cite "Naive Semantics".
+
+Again, not sure what is intended here.  Naive just means naive, i.e. there's no reason to believe that it is the only or best example.
+
+ > 
+ > Section 2.1
+ > The two sentences together are a bit confusing "Identifiers may or may not be URIs. Identifiers are viewed as variables from the point of view of logic."
+ > If they are URIs they cannot be variables - I am not sure the statement about URIs is needed here.
+
+Removed the first sentence.  The intent is to align the way people use URIs and sameAs in OWL with variables and equality here.  But if saying less is less confusing I'm happy to do that.
+
+ > 
+ > Section 2.3
+ > This may be nit picking, should "is a linear order" be replaced by more generic "partial order"?
+ > 
+
+It doesn't matter; we never use the time ordering for anything anyway.  I am thinking that timestamps (xsd:dateTime) is linearly ordered in the obvious way, but this doesn't imply anything about the ordering of the associated events, because we don't necessarily know e.g. that all the time stamps are in the same frame of reference/time zone.
+
+ > Section 4.4.12
+ > In the Remark "(or that of can be larger)" "larger" is not clear
+ >   
+
+Fixed I hope.

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/semantics/review/final-simon.txt	Thu Apr 11 12:36:34 2013 +0100
@@ -0,0 +1,69 @@
+> Yes, the document seem clear and the semantics correct. I see no blocking issues, but some more minor ones below. The two issues mentioned seem fully addressed to my (shallow) knowledge of them.
+ > 
+ > Abstract: "PROv" should be "PROV"
+
+Done
+
+ > 
+ > Section 1.1: "Although it is a work in progress" (3rd paragraph) and "(as yet)" (fourth paragraph). Is this appropriate for the final release of the note?
+
+I've adjusted this.  Basically, I'm a little uncomfortable with publishing something with "theorems" and "proofs" that hasn't been formally peer-reviewed.  So I've removed the preliminary-sounding text above and added a paragraph just noting that the WG has reviewed it and it reflects a consensus but this hasn't been peer-reviewed as far as the formal content goes.
+
+
+ > 
+ > Section 2.2: "a set Values... we sometimes use the set P(Value)" (Values vs Value)
+ > 
+
+Fixed
+
+ > Section 2.3: "Times SUBSET Val" should be "Times SUBSET Values"
+ > 
+
+Fixed
+
+
+ > Section 2.4: "a null symbol "-". Placeholder symbols - " -> They seem to be the same symbol except that null has quotes round it (though it doesn't when used later). Can this be clarified?
+ > 
+
+The quotes are for rmphasis, since otherwise it looks strange in the text.  So they should all be "-" when they occur in a sentence.  Fixed.
+
+
+ > Section 3.1: "Each thing has an associates set of Events" -> "associated set of Events"
+ > 
+
+Fixed
+
+ > Section 3.1, Component 1, point 3: "from things to sets of events." does not seem to explain the relation. Shouldn't it be "from things to the sets of events associated with those things."?
+ > 
+
+Fixed "to associated sets of events"
+
+ > Section 3.2.4.1: "An Event is an influence whose events is a single time instant" -> I just couldn't interpret this statement. Are there one or several events (it is ungrammatical)? I assume when you say "whose events" you mean the events associated to the (influence) object, but if so, I think this could be rephrased to be clearer. It seems a circular definition, as "events" refers to elements of the set "Event", right? I think I would need some rephrasing to help understand what was intended.
+ > 
+
+Sorry, this was due to a late search-and-replace replacing "time interval" with "set of events".  Fixed to " instantaneous influence that relates an activity to an entity"
+
+ > Section 3.2.4.1: "relates an activity to an entity (which could be an agent)" -> should this be "(either of which could be an agent)"?
+
+Yes, fixed.
+
+ > 
+ > Sections 4.4.*: Many of the rules use a mixture of "e" and "ent", "e1" and "ent1", "e2" and "ent2", and in Semantics 30, "ent2.gen.act.use.ent1" should be "e2.g.a.u.e1" I think.
+ > 
+
+Actually, this is intentional: the rules generally use parameter names like e,a etc. for syntax, and then define ent = rho(e), act = rho(a), etc. as the semantic values associated to the symbols by rho.  In particular, the derivation paths are paths of semantic entities, generations, activities, etc. and so we want to use ent2, gen, act, etc. there.
+
+ > Section 4.4.12: "The events of ent1 is contained" (ungrammatical) 
+
+Fixed.
+
+ > 
+ > Section 6.2.2: "the values of attributes of e are those immediately declared for e along with those of any specialization". Why? Isn't this the wrong way round? An entity's specialization has attribute values that the entity does not have (across its lifetime). I may have misunderstood the intent here.
+ > 
+
+Yes ,this was backwards in the text, updated to "... of any $e'$ that $e$ specializes"
+
+ > Section 6.2.3: "amond" -> "among".
+ > 
+
+Fixed.

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/semantics/review/final-tom.txt	Thu Apr 11 12:36:34 2013 +0100
@@ -0,0 +1,60 @@
+
+ > 1. Introduction: 
+ >   PROv -> PROV
+ >   The semantics does not (as yet) cover --> The semantics do not (as yet) cover
+ >
+
+Fixed
+
+> 3.1 
+ >   an associates set of Events --> associated
+
+Fixed
+
+ > 
+ > 3.2
+ >      It is also possible to have two different objects that are indistinguishable by their attributes and time intervals --> did you mean events instead of time intervals?
+
+Fixed - incomplete search & replace.
+
+ > 
+ > 3.2.1.2
+ >   Just wondering why you decided to use a relation for contains here, whereas for the rest of the document, you use functions. Both are correct, but this is the only place a relation was used. (unless we missed something)
+
+Yes, that change seems like a win!  (We do use relations for event ordering though.)  much better for stating emptiness constraints.
+
+ > 
+ > 3.2.2
+ >     An activity is an object that encompasses a set of events. --> Earlier in the document an object is described as encompassing a set of events, so according to this, all objects would be activities.
+
+Well, this isn't an if and only if.  But there is no reason to be so specific., clarified this.
+
+ > 3.2.4.1
+ >     started(st)=(a,e,a′) implies start∈ --> st is undefined. Change to started(start)
+ 
+Fixed
+
+> 3.3
+ >     6. 'a' is undefined --> there exist gen, assoc and a
+
+Fixed
+
+ > Also, the remark about axioms 22 and 23 is correct, but why is it arguable? If the constraint exists, why leave invalidations out here? Does it break something else?
+ > 
+
+If we omit invalidations, then the weak completeness result won't hold, or at least, I'm not sure how to prove it.  Trying to come up with a counterexample.
+
+ > 4.3.2
+ >     startTime/endTime(id): --> should be act instead of id
+
+Fixed
+
+
+ > 4.4.10.1 use event ρ(u)∈Usages. --> use event use = ρ(u) ∈Usages
+ > 
+
+Fixed
+
+ > I see most of these have been fixed already after other reviews.
+ > Overall, it's a good document, and for the most part understandable if you know the basic syntax. Chapter 6 is harder to follow, especially if you're not that familiar with formal logic, or a bit rusty. (which is probably not the audience for this document anyway)
+ >

author	jcheney@inf.ed.ac.uk
	Thu, 11 Apr 2013 12:36:34 +0100
changeset 6128	915883486cdb
parent 6127	17868f242e82
child 6129	057c642b9044

semantics/prov-sem.html
semantics/review/final-khalid.txt
semantics/review/final-luc.txt
semantics/review/final-satya.txt
semantics/review/final-simon.txt
semantics/review/final-tom.txt