* Working on Stian issue-459 etc.
authorJames Cheney <jcheney@inf.ed.ac.uk>
Tue, 07 Aug 2012 12:45:25 +0100
changeset 4296 8f63cf1f72f1
parent 4295 b5347a7bc524
child 4297 14a374300475
* Working on Stian issue-459 etc.
model/comments/issue-459-stian.txt
model/prov-constraints.html
--- a/model/comments/issue-459-stian.txt	Tue Aug 07 12:32:13 2012 +0100
+++ b/model/comments/issue-459-stian.txt	Tue Aug 07 12:45:25 2012 +0100
@@ -1,7 +1,7 @@
    > 
    > 
    > The following is my review of the PROV Constraints document from:
-   >    http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.htm   > l
+   >    http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html
    > 
    > Please accept my apologies in the delay, this document, being rather
    > formal, required a bit more time on my behalf to be reviewed.
@@ -28,6 +28,9 @@
    > document changes from a specification-style language to a "we assume"
    > paper style language. I would try to keep it at specification level.
    > 
+
+"We assume" has been reworded.
+
    > Some of the introductory text seems to assume the reader wants to do
    > normalization, however I suspect most readers will only be interested
    > in validation; in particular to ensure their own PROV instance is
@@ -158,6 +161,8 @@
    > specification might be unclear or confusing. 
    > 
    > 
+
+1.
    > 
    > Abstract: "statements and constraints that PROV instances must satisfy
    > in order to be considered valid".
@@ -173,11 +178,21 @@
    > Without the clarification of "according to this document" the meaning
    > of "MUST" here is difficult - would it not be valid PROV-DM either?
    > 
-   > 
+
+Fixed.
+
+2.
+
+  > 
    > > 1.1 Conventions
    > > Some of these ariables
    > --> variables
    > 
+
+Fixed
+
+3.
+
    > 
    > > _, by convention, to indicate that they (intentionally) appear
    > only once in the formula; thus, the textual variable name is
@@ -193,6 +208,12 @@
    > Replace with something like "such passive variables are provided
    > merely as an aid to the reader."
    > 
+
+Fixed.
+
+
+
+4.
    > 
    > 1.2 Purpose of this document
    > 
@@ -203,6 +224,11 @@
    > 
    > This statement might be in other documents as well, but I've not checked.
    > 
+
+Fixed.  I think this phrasing only stated here...
+
+5.
+
    > 
    >    > > A valid PROV instance corresponds to a consistent history
    > of objects and interactions to which logical reasoning can be
@@ -211,6 +237,11 @@
    > Again the valid, but here with a small definition. A forward-link to
    > section 6 would be helpful.
    > 
+
+Done.
+
+6. BLOCKER
+
    > 
    > 
    > > Applications should also use definitions, inferences and constraints
@@ -236,6 +267,11 @@
    > similarly applications consuming provenance MAY reject provenance that
    > is not _valid_."
    > 
+
+This is ISSUE-464.  Your wording is adopted.
+
+7.
+
    > 
    > > To summarize: compliant applications use definitions, inferences,
    >    > and uniqueness constraints to normalize PROV instances, and then
@@ -250,12 +286,25 @@
    > normalization.  It is saying you can't be compliant without doing all
    > of the above!
    > 
+
+This is ISSUE-464.  I have reworded to avoid giving the impression
+that we require normalization, validity checking, or equivalence
+checking, but kept the summary - I feel it is important to summarize
+how the components fit together.
+
+8.
+
    > 
    > 1.3 Structure of this document
    > > certain statments
    > statements
    > 
    > 
+
+Fixed.
+
+9.
+
    > 
    > 2.1 Entities, Activities and Agents
    > 
@@ -270,21 +319,40 @@
    > entirety of an entity's lifetime"
    > 
    > 
+
+Fixed.
+
+10.
+
    > 
    > > Different entities that are aspects of the same thing are called alternate,
    > 'alternate' -> 'alternates'
    > 
    > 
+
+Fixed.
+
+11.
+
    > > PROV relations of specialization and alternate can be used
    > Use actual relation names and code style:
    > 
    >   'specialization' -> '<code>specializationOf</code>'
    >   'alternate' -> '<code>alternateOf</code>'
    > 
+
+Fixed.
+
+12.
    > > express dependencies between the various entities using events
    > 
    > italics on 'events'.
    > 
+
+Done.
+
+13.
+
    > 
    >    > > An activity is delimited by its start and its end events;
    > hence, it occurs over an interval delimited by two instantaneous
@@ -292,6 +360,10 @@
    > 
    > Remove "hence, "
    > 
+Fixed.
+
+14.
+
    > 
    >    > > In contrast, an activity (..) is typically not identifiable by the characteristics it exhibits at any point during its duration.
    > 
@@ -301,6 +373,11 @@
    > use the argument that activities are not things in the world, they are
    > what happens to the things in the world.
    > 
+
+Fixed.
+
+15. 
+
    >    > > An entity usage event is the instantaneous event that
    > marks the first instant of an entity's consumption timespan by an
    > activity. 
@@ -313,6 +390,11 @@
    > this instant, although the activity could potentially have other
    > usages of the same entity starting at different times."
    > 
+
+Fixed.
+
+16.
+
    > 
    > 2.3 Summary of constraints..
    > 
@@ -328,6 +410,19 @@
    > horisontal headers? Perhaps even the Type/Relation column could be
    > done as headers?
    > 
+
+I had put this in its own section, in the non-normative part. It would
+be good to summarize the constraints somehow, but this doesn't seem to
+be working well. It would also be good to summarize the typing
+constaints in the informative section (e.g. previewing the typing
+constraint rules).
+
+Should this table / index be an appendix or something?
+
[email protected]?
+
+17.
+
    > 
    > > To be compliant:
    > 
@@ -347,6 +442,17 @@
    > 
    > Replace "When" with "If".
    > 
+
+I believe when and if are synonymous in this context.  I replaced the
+last three since these are the only ones that change the emphasis.
+The first reads strangely to me (If...may) so I left it as is.
+
+I also added a paragraph to make it clear that we do not require
+computing normal forms, only that whatever algorithms people use
+produce the same results as the specification.
+
+18.
+
    > 
    > 4. Inferences and Definitions
    > 
@@ -357,6 +463,11 @@
    > 
    > use concl1 and concln in first line. (Consistency).
    > 
+
+Fixed.
+
+19.
+
    > 
    > 4.1 Optional identifiers
    > 
@@ -365,6 +476,11 @@
    > "We note that" -> "Note that"  ("we" are not observing here, but specifying!)
    > 
    > 
+
+Fixed.
+
+20.
+
    > 
    > > then the following definition holds:
    > > r(id;a1,...,an) holds IF AND ONLY IF r(id;a1,...,an,[]) holds.
--- a/model/prov-constraints.html	Tue Aug 07 12:32:13 2012 +0100
+++ b/model/prov-constraints.html	Tue Aug 07 12:45:25 2012 +0100
@@ -421,14 +421,18 @@
 </p>
 
 
-<p> This document introduces <a>definitions</a> and <a>inferences</a>
-  that are allowed on provenance statements and <a>constraints</a>
-  that PROV instances MUST satisfy in order to be considered
-  <a>valid</a>. These definitions, inferences, and constraints are useful for
-  readers who develop applications that generate provenance or reason
-  over provenance.  They can also be used to <a title="normal form">normalize</a> PROV
-  instances to forms that can easily be compared in order to determine
-  whether two PROV instances are <a>equivalent</a>.</p>
+<p> This document defines a subset of PROV instances called
+<i><a>valid</a></i> PROV instances.  A <a>valid</a> PROV instance corresponds to a
+consistent history of objects and interactions to which logical
+reasoning can be safely applied.
+Valid PROV instances satisfy
+certain <a>definitions</a>, <a>inferences</a>, and
+<a>constraints</a>. These definitions, inferences, and constraints
+provide a measure of consistency checking for provenance and reasoning
+over provenance.  They can also be used to <a title="normal
+form">normalize</a> PROV instances to forms that can easily be
+compared in order to determine whether two PROV instances are
+<a>equivalent</a>.</p>
 
 </section>
 
@@ -501,8 +505,8 @@
     lower-case identifiers.  Some of these variables are written
     beginning with the underscore character <span
     class="name">_</span>, by convention, to indicate that they
-    (intentionally) appear only once in the formula; thus, the textual
-    variable name is mnemonic only.  </p>
+    appear only once in the formula.  Such variables are
+    provided merely as an aid to the reader.  </p>
 
 </section>
 
@@ -512,7 +516,7 @@
 <h3>Purpose of this document</h3>
 
 <p>The PROV Data Model, PROV-DM, is a conceptual data model for provenance, which is
-realizable using different serializations such as PROV-N and PROV-O.
+realizable using different representations such as PROV-N and PROV-O.
 A PROV <dfn>instance</dfn> is a set of PROV statements,
 possibly including <a>bundles</a>, or named sets of statements. For
 example, such a PROV instance could be a .provn document, the result
@@ -529,24 +533,40 @@
 <a>valid</a> PROV instances by specifying <em>constraints</em> that
 <a>valid</a> PROV instances must satisfy. There are four kinds of
 constraints: <em>uniqueness constraints</em>, <em>event ordering
-constraints</em>, <em>impossibility constraints</em>, and <em>type constraints</em>.
-Applications SHOULD produce valid
-provenance and MAY reject provenance that is not <a>valid</a>.  Applications
-SHOULD also use definitions, inferences and constraints to normalize
-PROV instances in order to determine whether two such instances convey
-the same information.
-
-
-<p>To summarize: compliant applications use definitions,
-inferences, and uniqueness constraints to normalize PROV instances,
-and then check event ordering, impossibility, and type constraints.  If these
-checks succeed, the instance is
-<a>valid</a>, and the normal form is considered <a>equivalent</a> to
-the original instance.  Also, any two PROV instances that yield the
-same normal form are considered <a>equivalent</a>.  Further discussion
+constraints</em>, <em>impossibility constraints</em>, and <em>type
+constraints</em>.
+Further discussion
 of the semantics of PROV statements, which justifies the inferences
 and constraints, can be found in the formal semantics [[PROV-SEM]].
 </p>
+
+<p>We define validity and equivalence in terms of a
+concept called <a title="normal form">normalization</a>.  Definitions, inferences,
+and uniqueness constraints can be applied to <a title="normal
+form">normalize</a> PROV instances, and event ordering, typing, and
+impossibility constraints are checked on the normal form to determine
+<a title="valid">validity</a>.  Equivalence of two PROV 
+instances can be determined by comparing their normal forms.
+</p>
+<p>
+This document outlines an algorithm for validity checking based on
+<a title="normal form">noramlization</a>.  Applications MAY implement
+validity and equivalence checking using normalization, as suggested
+here, or in
+any other way as long as the same instances are considered valid or
+equivalent, respectively.
+</p>
+
+<p>Checking validity or equivalence are RECOMMENDED, but not required, for
+applications compliant with PROV.  This specification defines how
+validity and equivalence are to be checked, if an application elects
+to support them at all.  
+Applications producing provenance SHOULD ensure that it is
+<a>valid</a>, and similarly applications consuming provenance MAY reject provenance that is not <a>valid</a>.  Applications
+which are determining whether PROV instances convey the same
+information SHOULD check equivalence as specified here, and SHOULD
+treat equivalent instances in the same way.
+</p>
 </section>
 <section>
 <h4>Structure of this document</h4>
@@ -604,8 +624,9 @@
 
 <section id='rationale' class="informative">
 <h3>Rationale</h3>
-<p> In this section we give a high-level rationale that provides some
-  further background for the constraints. </p>
+<p> This section gives a high-level rationale that provides some
+  further background for the constraints, but does not affect the
+technical content of the rest of the specification.</p>
 
 <section>
 <h4>Entities, Activities and Agents</h4>
@@ -627,15 +648,15 @@
 between its <a title="entity generation event">generation event</a>
 and its <a title="entity invalidation event">invalidation event</a>.
 An entity's attributes are established when the entity is
-created and describe the entity's situation and (partial) state
-during the entity's lifetime.</p>
+created and (partially) describe the entity's situation and state
+during the entirety of the entity's lifetime.</p>
 
 <p>
 A different entity (perhaps representing a different user or
 system perspective) may fix other aspects of the same thing, and its provenance
 may be different.  Different entities that fix aspects of the same
-thing are called <em>alternate</em>, and the PROV relations of
-specialization and alternate can be used to link such entities.</p>
+thing are called <em>alternates</em>, and the PROV relations of
+<span class="name">specializationOf</span> and <span class="name">alternateOf</span> can be used to link such entities.</p>
 
 <p>Besides entities, a variety of other PROV objects have
 attributes, including activity, generation, usage, invalidation, start, end,
@@ -657,9 +678,9 @@
   over which relevant attributes of the thing are not fixed, a PROV
   instance would describe multiple entities, each with its own
   identifier, <a>lifetime</a>, and fixed attributes, and express dependencies between
-  the various entities using events.  For example, if we want to
+  the various entities using <i><a href="instantaneous event">events</a></i>.  For example, in order to
   describe the provenance of several versions of a document, involving
-  attributes such as authorship that change over time, we need
+  attributes such as authorship that change over time, one can use
   different entities for the versions linked by appropriate
   generation, usage, revision, and invalidation events. 
 </p>
@@ -673,7 +694,9 @@
 attributes; this leads to potential ambiguity, which is mitigated through the
 use of identifiers.</p>
 
-<p>An <a>activity</a> is delimited by its <a title="activity start event">start</a> and its <a title="activity end event">end</a> events; hence, it occurs over
+<p>An <a>activity</a> is delimited by its <a title="activity start
+event">start</a> and its <a title="activity end event">end</a>
+events.  It occurs over
 an interval delimited by two <a title="instantaneous event">instantaneous
 events</a>. However, an activity statement need not mention start or end time information, because they may not be known.
 An activity's attribute-value pairs are expected to describe the activity's situation during its interval, i.e. an interval between two instantaneous events, namely its <a title="activity start event">start</a> event and its <a title="activity end event">end</a> event.
@@ -685,8 +708,7 @@
 any point in its lifetime, persists during this interval, and
 preserves the characteristics provided.  In
 contrast, an activity is something that occurs, happens, unfolds, or
-develops through time, but is typically not identifiable by the
-characteristics it exhibits at any point during its duration.  This
+develops through time.  This
 distinction is similar to the distinction between 'continuant' and
 'occurrent' in logic [[Logic]].</p>
 
@@ -699,7 +721,7 @@
 in many different contexts within individual systems and across the
 Web. Different systems may use different clocks which may not be
 precisely synchronized, so when provenance statements are combined by
-different systems, we may not be able to align the times involved to a
+different systems, an application may not be able to align the times involved to a
 single global timeline.  Hence, PROV is designed to minimize
 assumptions about time.  Instead, PROV talks about (identified)
 events. </p>
@@ -737,8 +759,9 @@
 
 <p>An <dfn id="dfn-usage-event">entity usage event</dfn> is the <a
 title="instantaneous event">instantaneous event</a> that marks the first instant of
-an entity's consumption timespan by an activity.  Before this instant
-the entity had not begun to be used by the activity.</p>
+an entity's consumption timespan by an activity.  The described usage
+had not started before this instant, although the activity could
+potentially have used the same entity at a different time.</p>
 
 
 <p>An <dfn id="dfn-invalidation-event">entity invalidation event</dfn>
@@ -1032,17 +1055,21 @@
   <ol><li>When processing provenance, an
     application MAY apply the inferences and definitions in <a
     href="#inferences" class='sectionRef'></a>.</li>
-   <li>When determining whether a PROV instance is <a>valid</a>, an
+   <li>If determining whether a PROV instance is <a>valid</a>, an
     application MUST check that all of the
     constraints of <a href="#constraints" class="sectionRef"></a> are
-  satisfied  on the <a>normal form</a> of the instance.</li>
-   <li> When producing provenance meant for other applications to
+  satisfied  on the <a>normal form</a> of the instance.  </li>
+   <li> If producing provenance meant for other applications to
     use, the application SHOULD produce <a>valid</a> provenance, as specified in <a href="#normalization-validity-equivalence" class="sectionRef"></a>. </li>
-    <li>When determining whether two PROV instances are
+    <li>If determining whether two PROV instances are
   <a>equivalent</a>, an application MUST determine whether their
   normal forms are equal, as specified in <a href="#normalization-validity-equivalence" class="sectionRef"></a>.
   </ol>
 
+  <p>Compliant applications are not required to explicitly compute
+  normal forms; however, if checking validity or equivalence, the
+  results should be the same as would be obtained by computing normal
+  forms as defined in this specification.</p>
 
   <p>
   All diagrams are for illustration purposes
@@ -1063,7 +1090,7 @@
 <section id="inferences">
 <h2>Inferences and Definitions</h2>
 <p>
-In this section, we describe <a title="inference">inferences</a> and <a title="definition">definitions</a> that MAY be used on
+This section  describes <a title="inference">inferences</a> and <a title="definition">definitions</a> that MAY be used on
   provenance data, and preserve <a>equivalence</a> on <a>valid</a>
 PROV instances (as detailed in <a href="#normalization-validity-equivalence" class="sectionRef"></a>).
 An  <dfn id="inference">inference</dfn> is a rule that can be applied
@@ -1082,16 +1109,19 @@
 <span class="name">hyp<sub>k</sub></span> <span class='conditional'>THEN</span>
   there exists <span class="name">a<sub>1</sub></span> and ... and <span
   class="name">a<sub>m</sub></span> such that <span
-  class="name">conclusion<sub>1</sub></span> and ... and <span class="name">conclusion<sub>n</sub></span>.</p>
+  class="name">concl<sub>1</sub></span> and ... and <span class="name">concl<sub>n</sub></span>.</p>
   </div>
  
-<p> This means that if all of the provenance statements matching
+<p> Inferences can be applied to PROV instances.  Applying an inference to an instance means that if all of the provenance statements matching
   <span class="name">hyp<sub>1</sub></span>... <span class="name">hyp<sub>k</sub></span>
-  can be found in a PROV instance, we can add all of the statements
+  can be found in the instance, then we check whether the conclusion 
   <span class="name">concl<sub>1</sub></span> ... <span
-  class="name">concl<sub>n</sub></span> to the instance, possibly after
+  class="name">concl<sub>n</sub></span> is satisfied for some values
+  of existential variables.  If so, application of the inference has
+  no effect on the instance.  If not, then a copy the
+  conclusion should be added to the instance, after
   generating fresh identifiers <span class="name">a<sub>1</sub></span>,...,<span
-  class="name">a<sub>m</sub></span> for unknown objects.  These fresh
+  class="name">a<sub>m</sub></span> for the existential variables.  These fresh
   identifiers might later be found to be equal to known identifiers;
   they play a similar role in PROV constraints to existential
   variables in logic, "labeled nulls" in database theory
@@ -1114,15 +1144,17 @@
   </div>
  
   <p>
-  This means that a provenance statement <span class="name">defined_exp</span> is defined in
-  terms of other statements.  This can be viewed as a two-way
-  inference:  If <span class="name">defined_exp</span>
-  can be found in a PROV instance, we can add all of the statements
+  A definition can be applied to a PROV instance, since its <span class="name">defined_exp</span> is defined in
+  terms of other statements.  Applying a
+  definition to an instance means that if an occurrence of a defined
+  provenance statement  <span class="name">defined_exp</span>
+  can be found in a PROV instance, then we can remove it and add all of the statements
 <span class="name">defining_exp<sub>1</sub></span> ... <span class="name">defining_exp<sub>n</sub></span> to the instance, possibly after generating fresh
   identifiers <span class="name">a<sub>1</sub></span>,...,<span
-  class="name">a<sub>m</sub></span> for unknown objects.  It is safe to replace
+  class="name">a<sub>m</sub></span> for existential variables.  In
+  other words, it is safe to replace
   a defined statement with
-  its definition.
+  its definition.  
 </p>
   
 <p> Definitions and inferences can be viewed as logical formulas;
@@ -2142,8 +2174,9 @@
 <p>
 <hr>
 
-  <p id='key-object_text'> We assume that the various identified objects of PROV have
-  unique statements describing them within a PROV instance, through
+  <p id='key-object_text'>The various identified objects of PROV MUST have
+  unique statements describing them within a valid PROV instance.
+  This is enforced through
   the following key constraints:
   </p>
   <div class='constraint' id='key-object'>
@@ -2217,9 +2250,9 @@
 
 
 <div id='unique-generation_text'>
-<p>We assume that an entity has exactly one generation and
+<p>An entity has exactly one generation and
 invalidation event (either or both may, however, be left implicit).
-  Note that together with the key constraints above, this implies that
+  Together with the key constraints above, this implies that
   <span class="name">e</span> is also a key for generation and
   invalidation statements.
 </p>
@@ -2252,7 +2285,7 @@
 
 <hr />
 <p id='unique-wasStartedBy_text'>
-We assume that an activity has exactly one start and
+An activity has exactly one start and
 end event (either or both may, however, be left implicit).  Again,
 together with above key constraints these constraints imply that the
 activity is a key for activity start and end statements.
@@ -2342,9 +2375,9 @@
 
 <p>Given that provenance consists of a description of past entities
 and activities, <a>valid</a> provenance instances MUST
-satisfy <em>ordering constraints</em> between instantaneous events, which we introduce in
+satisfy <em>ordering constraints</em> between instantaneous events, which are introduced in
 this section.  For instance, an entity can only be used after it was
-generated; hence, we say that an entity's <a title="entity generation
+generated; in other words, an entity's <a title="entity generation
 event">generation event</a> precedes any of this
 entity's <a title="entity usage event">usage events</a>.  Should this
 ordering constraint be violated, the associated generation and
@@ -2360,14 +2393,15 @@
 
 
 <p>Specifically, <dfn id="dfn-precedes">precedes</dfn> is a <a>preorder</a>
-between <a title="instantaneous event">instantaneous events</a>.  When
-we say <span class="name">e1</span> precedes <span
-class="name">e2</span>, this means that <span class="name">e1</span>
+between <a title="instantaneous event">instantaneous events</a>.  A
+constraint of the form
+<span class="name">e1</span> precedes <span
+class="name">e2</span> means that <span class="name">e1</span>
 happened at the same time as or before <span class="name">e2</span>.
 For symmetry, <dfn id="dfn-follows">follows</dfn> is defined as the
-inverse of <a title="precedes">precedes</a>; that is, when we say
-<span class="name">e1</span> follows <span class="name">e2</span>,
-this means that <span class="name">e1</span> happened at the same time
+inverse of <a title="precedes">precedes</a>; that is, a constraint of
+the form 
+<span class="name">e1</span> follows <span class="name">e2</span> means that <span class="name">e1</span> happened at the same time
 as or after <span class="name">e2</span>. Both relations are
 <a>preorder</a>s, meaning that they are <a>reflexive</a> and
 <a>transitive</a>.  Moreover, we sometimes consider <em>strict</em> forms of these
@@ -2428,7 +2462,7 @@
 <h3>Activity constraints</h3>
 
 <p>
-In this section we discuss constraints from the perspective of
+This section specifies ordering constraints from the perspective of
 the <a>lifetime</a> of an activity.  An activity starts, then during
 its lifetime uses, generates or invalidates entities, and communicates with  or starts
 other
@@ -2642,8 +2676,8 @@
 
 <p id='generation-precedes-invalidation_text'>
 Generation of an entity precedes its invalidation. (This
-follows from other constraints if the entity is used, but we state it
-explicitly to cover the case of an entity that is generated and
+follows from other constraints if the entity is used, but it is stated
+explicitly here to cover the case of an entity that is generated and
 invalidated without being used.)</p>
 
 <div class='constraint' id='generation-precedes-invalidation'>
@@ -2997,7 +3031,8 @@
   <p><span class="conditional">IF</span> <span class="name">hyp<sub>1</sub></span> and ... and  <span class="name">hyp<sub>n</sub></span> <span class="conditional">THEN</span> <span class="conditional">INVALID</span>.</p>
   </div>
 
-<p> To check an impossibility constraint on instance <span class="math">I</span>, we check whether there is
+<p> Checking an impossibility constraint on instance <span
+  class="math">I</span> means  checking whether there is
 any way of matching the pattern <span class="name">hyp<sub>1</sub></span>, ..., <span class="name">hyp<sub>n</sub></span>.  If there
 is, then checking the constraint on <span class="math">I</span> fails (which implies that
 <span class="math">I</span> is invalid).
@@ -3022,7 +3057,7 @@
     <!--
 <hr>
 <div class="note"> This is also a constraint, but follows from
-  irreflexivity and transitivity so we may omit it.</div>
+  irreflexivity and transitivity so it may be omitted.</div>
 
     <p id="specialization-asymmetric_text"/>