> > > The following is my review of the PROV Constraints document from: > http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html > > Please accept my apologies in the delay, this document, being rather > formal, required a bit more time on my behalf to be reviewed. > > As usual, my review is a bit long.. :-/ I did not have time to write a > shorter one. > > > Summary: > > This document is heavy material. This is partially natural as of its > nature by defining all constraints and inferences, but also I get a > feeling this document has had less refinements from the editors, and > particularly from the WG. Many of the issues I raise below I would > have expected the WG to discuss. I believe that after the split of > PROV-DM to PROV-Constraint and PROV-N, we were very happy with the new > form of the other documents, but that we might have neglected > PROV-Constraint. > > The document is structurally easy to follow, as it progresses from > definitions and inferences through time order constraints and finally > impossibility constraints, with a summary of how PROV instances can be > normalized and validated. At some point in the constraint, the > document changes from a specification-style language to a "we assume" > paper style language. I would try to keep it at specification level. > "We assume" has been reworded. > Some of the introductory text seems to assume the reader wants to do > normalization, however I suspect most readers will only be interested > in validation; in particular to ensure their own PROV instance is > valid. I would for instance be interested in finding out how I can > express most of these as OWL constructs. > > > In my review below, most of the comments are about syntactic tweaks > and some light change of wording to make the text clearer. None of > these should be considered blockers for LC. > > However I consider the following as blockers: (see below for details) > > > 1) "Applications should also use definitions, inferences and > constraints to normalize PROV instances in order to determine whether > two such instances convey the same information." -- NO! > > 2) "compliant applications use definitions, inferences, and uniqueness > constraints to normalize PROV instances (..)" -- DELETE > > 3) Inference rules with existential variables causes infinite loops - > add note about not recursing on purely existential variables > > 4) wasDerivedFrom activity and wasAssociatiedWith plan as > non-existential - ; causes headaches later on where the plan is > assumed to exist. -- REWORD later uses > > 5) Do we have WG consensus on that all entities must be invalidated, > and all activities must terminate? Seems to talk about the future, > rather than the past. > > 6) Do we have WG consensus on activity start/end requiring triggers? > Can an activity terminate itself without a trigger? Start > instantaneously? > > 7) Remove inference 12 > > 8) Remove wasStartedByActivity from figure > > 9) wasInvalidatedBy strictly follows wasGeneratedBy - do we have WG > consensus? OK with zero-length activity? Need light justification. > > 10) wasDerivedFrom(e2,e1,use,gen) requires use strictly before generation - why? > > 11) wasAttributedTo constraints WRONG in my view. > > I will create individual issues for these in the tracker. > > > > On Fri, Jul 20, 2012 at 12:02 PM, Provenance Working Group Issue > Tracker wrote: > > PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints] > > > > http://www.w3.org/2011/prov/track/issues/459 > > > > Raised by: James Cheney > > On product: prov-dm-constraints > > > > Hi, > > > > This issue is to capture review comments for the next draft of PROV-CONSTRAINTS, which will be released soon. > > > > Please answer the following review questions: > > > > > 1. Is PROV-CONSTRAINTS ready to be released as a last call working > draft (modulo editorial issues and resolution to the below issues)? > > No, see blockers above. Uncertain about WG consent on some issues. > (but I might have missed decisions on these!) > > > > 2. Regarding ISSUE-346: Is the role, meaning, and intended use of each type of inference or constraint clear? (http://www.w3.org/2011/prov/track/issues/346) > > Yes. > > > > 3. Regarding ISSUE-451: Are there any objections to the revision-is-alternate inference? (http://www.w3.org/2011/prov/track/issues/451) > > No. > > > > 4. Regarding ISSUE-454: Are the rules for disjointness clear and > appropriate? (http://www.w3.org/2011/prov/track/issues/454) > > Yes, this is clear from constraint 52 and 53. > > > 5. Regarding ISSUE-458: Should influence (and therefore all > subrelations, including communication) be irreflexive, or can it > be reflexive (i.e., can wasInfluencedBy(x,x) be valid)? > (http://www.w3.org/2011/prov/track/issues/458) > > We agreed in the WG to make it wasInfluencedBy not irreflexive, which > I agree with, thus removing this constraint. > > > > 5. Are there any objections to closing other open issues on PROV-CONSTRAINTS? They are: > > > http://www.w3.org/2011/prov/track/issues/387 > Close. > > > > http://www.w3.org/2011/prov/track/issues/394 > > Close, alternateOf *is* symmetric > > > http://www.w3.org/2011/prov/track/issues/452 > > No, not fully, see my comments for bits where plan still seems to be required.. > > > http://www.w3.org/2011/prov/track/issues/453 > > Close. (We allow same ID) > > > 6. Are there any new issues concerning definitions, > constraints, or inferences? If so, please raise as new issues to > be addressed before LC vote, ideally with a suggested change that > would address the issue. > > Yes, see blockers above. I'll create issues for these independently. > > > The following is a lengthy page by page view with details of the above > blockers, spelling/grammar correction and similar remarks. Where you > see I am discussing with myself you can conclude that the > specification might be unclear or confusing. > > 1. > > Abstract: "statements and constraints that PROV instances must satisfy > in order to be considered valid". > > A definition of 'valid' would be appropriate. When checking now I see > there is an underline and link to > http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html#dfn-valid > (which is way down, in section 6)- but somehow this underline is not > shown in the printout (using Chrome), and hence I don't understand > that this is a new term. Use italics and add "according to this > document" (as it could still be valid by PROV-DM alone). > > Without the clarification of "according to this document" the meaning > of "MUST" here is difficult - would it not be valid PROV-DM either? > Fixed. 2. > > > 1.1 Conventions > > Some of these ariables > --> variables > Fixed 3. > > > _, by convention, to indicate that they (intentionally) appear > only once in the formula; thus, the textual variable name is > mnemonic only. > > Remove "(intentionally)". I assume you mean here that _variables do > not actively take part in the formula, similar to how Python and Ruby > programmers might assign a value to the variable _ if they don't care > about it: (_,_,object) = get_triple() > > Meaning of 'mnemonic' here is unclear. A mnemonic is a way to remember > something. Why do I need to remember something that appears only once? > Replace with something like "such passive variables are provided > merely as an aid to the reader." > Fixed. 4. > > 1.2 Purpose of this document > > > > "PROV-DM, is a conceptual data model for provenance, which is realizable using different serializations such as PROV-N and PROV-O". > > PROV-O is not a serialization, it is a representation. Replace with > "representation". (PROV-N is both) > > This statement might be in other documents as well, but I've not checked. > Fixed. I think this phrasing only stated here... 5. > > > > A valid PROV instance corresponds to a consistent history > of objects and interactions to which logical reasoning can be > safely applied. By default, PROV instances need not be valid. > > Again the valid, but here with a small definition. A forward-link to > section 6 would be helpful. > Done. 6. BLOCKER > > > > Applications should also use definitions, inferences and constraints > to normalize PROV instances in order to determine whether two such > instances convey the same information. > > No, they should not! It is not a requirement for applications to > determine equivalence. > > > Reword to something like: > > > > Applications which are determining whether PROV instances convey the same information SHOULD use definitions, inferences and constraints to normalize the instances. > > > Similarly this: > > > Applications should produce valid provenance and may reject provenance that is not valid > > should be: > > "Applications producing provenance SHOULD ensure it is _valid_, and > similarly applications consuming provenance MAY reject provenance that > is not _valid_." > This is ISSUE-464. Your wording is adopted. Closed. 7. > > > To summarize: compliant applications use definitions, inferences, > > and uniqueness constraints to normalize PROV instances, and then > > apply event ordering constraints to determine whether the > > instance has a consistent event ordering. If so, the instance is > > valid, and the normal form is considered equivalent to the > > original instance. Also, any two PROV instances that yield the > > same normal form are considered equivalent. > > Delete this whole paragraph (except for PROv-SEM reference) - it is > also assuming applications of PROV-Constraint only want to do > normalization. It is saying you can't be compliant without doing all > of the above! > This is ISSUE-464. I have reworded to avoid giving the impression that we require normalization, validity checking, or equivalence checking, but kept the summary - I feel it is important to summarize how the components fit together. Closed. 8. > > 1.3 Structure of this document > > certain statments > statements > > Fixed. 9. > > 2.1 Entities, Activities and Agents > > > > An entity's attributes are established when the entity is > created and describe the entity's situation and (partial) state > during an entity's lifetime > > Ambiguous, move (partial) earlier and add 'entirety': > > "An entity's attributes are established when the entity is created and > (partially) describe the entity's situation and state during the > entirety of an entity's lifetime" > > Fixed. 10. > > > Different entities that are aspects of the same thing are called alternate, > 'alternate' -> 'alternates' > > Fixed. 11. > > PROV relations of specialization and alternate can be used > Use actual relation names and code style: > > 'specialization' -> 'specializationOf' > 'alternate' -> 'alternateOf' > Fixed. 12. > > express dependencies between the various entities using events > > italics on 'events'. > Done. 13. > > > > An activity is delimited by its start and its end events; > hence, it occurs over an interval delimited by two instantaneous > events. > > Remove "hence, " > Fixed. 14. > > > > In contrast, an activity (..) is typically not identifiable by the characteristics it exhibits at any point during its duration. > > You can't use this argument, not only have we not defined > 'identifiable by characteristics' (anymore), but you just said above > that entities are *not* identified by their attributes. I would rather > use the argument that activities are not things in the world, they are > what happens to the things in the world. > Fixed. 15. > > > An entity usage event is the instantaneous event that > marks the first instant of an entity's consumption timespan by an > activity. > > Before this instant the entity had not begun to be used by the activity. > > Not true, there could be multiple usages for the same entity/activity, > but at different times (say in different roles). > > Delete, or replace with "The described usage had not started before > this instant, although the activity could potentially have other > usages of the same entity starting at different times." > Fixed. 16. > > 2.3 Summary of constraints.. > > > This table is utterly confusing to me as it is not a summary, it does > not tell me anything, and I don't understand the columns. (Not helped > by the lack of colours and borders on a print out!) > > I can see it is useful to have a kind of index of all the constraints, > but the table needs some work to be understandable. In the web version > it is kind-of OK, since the constraints have human readable names - > but is there a reason to keep the third column rather than simple > horisontal headers? Perhaps even the Type/Relation column could be > done as headers? > I had put this in its own section, in the non-normative part. It would be good to summarize the constraints somehow, but this doesn't seem to be working well. It would also be good to summarize the typing constaints in the informative section (e.g. previewing the typing constraint rules). Should this table / index be an appendix or something? @Luc - figure 17. > > > To be compliant: > > > > When processing provenance, an application may apply the > inferences and definitions in section 4. Inferences and > Definitions. > > When determining whether a PROV instance is valid, an application > > must check that all of the constraints of section 5. Constraints > > are satisfied on the normal form of the instance. > > When producing provenance meant for other applications to use, the > > application should produce valid provenance, as specified in > > section 6. Normalization, Validity, and Equivalence. > > When determining whether two PROV instances are equivalent, an > > application must determine whether their normal forms are equal, > > as specified in section 6. Normalization, Validity, and > > Equivalence. > > Replace "When" with "If". > I believe when and if are synonymous in this context. I replaced the last three since these are the only ones that change the emphasis. The first reads strangely to me (If...may) so I left it as is. I also added a paragraph to make it clear that we do not require computing normal forms, only that whatever algorithms people use produce the same results as the specification. 18. > > 4. Inferences and Definitions > > > IF hyp1 and ... and hypk THEN there exists a1 and ... and am such > > that conclusion1 and ... and conclusionn. > > > (..) matching hyp1... hypk can be found in a PROV > instance, we can add all of the statements concl1 ... concln to > > use concl1 and concln in first line. (Consistency). > Fixed. 19. > > 4.1 Optional identifiers > > > We note that Definitions 1, 2, and 3 desugar compact PROV-N notation into a normal form > > "We note that" -> "Note that" ("we" are not observing here, but specifying!) > > Fixed. 20. > > > then the following definition holds: > > r(id;a1,...,an) holds IF AND ONLY IF r(id;a1,...,an,[]) holds. > > Too many holds! I like better the syntax without all the 'holds', as you do in: > > > Definition 3 > > activity(id,attrs) IF AND ONLY IF activity(id,-,-,attrs). > > I am confused by the 'holds'. It implies that you can simply ignore it > otherwise. > Removed unnecessary "holds". 21. > > > > There are also no expansion rules for entity, agent, > communiction, attribution, influence, alternate, or > specialization, because these have no optional parameters aside > from the identifier and attribute, which are expanded by other > rules above. > > Delete "also". > "attribute" -> "attributes" > "by other rules above" -> "by definition 1 and 2" > > What about mentionOf and bundle ? Fixed. Bundles are not relations so I don't think they need to be in the list. 22. > > > > Thus, before proceeding to apply other definitions or inferences, most occurrences of - must be replaced by fresh existential variables, distinct from any others occurring in the instance. > > As I'll explain later, the MUST here means you will never finish > processing the inferences. > > This is ISSUE-465. 23. > > "The only exceptions, where - must be left in place, are the activity > parameter in wasDerivedFrom and the plan parameter in > wasAssociatedWith." > > Add "as explained in the remark below" > Done. 24. > > > > (Table) > > wasDerivedFrom(id;e2,e1,attrs) id > > This can be removed, because you have already said in Definition 3 that > > wasDerivedFrom(id;e2,e1,attrs) IF AND ONLY IF > wasDerivedFrom(id;e2,e1,-,-,-,attrs). > Fixed. 25. > > Before definition 4, which otherwise looks like a new thing, add something like: > > "The following definition generalizes the definitions of the above > table for expanding parameters." > Clarified this transition. 26. > > For each r in {entity, activity, agent}, the following definition holds: > > r(a0,...,ai-1, -, ai+1, ...,an) IF AND ONLY IF there exists a' such that r(a0,...,ai-1,a',ai+1,...,an). > > I don't see a reason to make this very complicated general solution > for those 3 relations, as they are very simple with regards to > optional parameters: > > > activity(a1, -, -, attr) > entity(e1, attr) (no existentials!) > agent(ag1, attr) (no existentials!) > > > So change it to just a definition for activity() and its optional timestamps. > > Done. 27. > > > > For each r in { used, wasGeneratedBy, wasInfluencedBy, > wasInvalidatedBy, wasStartedBy, wasEndedBy, wasInformedBy, > wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf} > > Make this the same order as in the table. Done. 28. > > > if the ith parameter of r is an expandable parameter of r > > I would add "as explained in Table x" and give the table a name. As it > stands the table is not clearly marked as normative and the definition > on its own sounds incomplete. Done. (Also added a note that Table 2 is normative). 29. > > > A final check is required on this inference to ensure that it > does not lead to non-termination, when combined with Inference 5 > (communication-generation-use-inference). > > I assume here "this inference" should be "the following inference" - > as 'this' is inference 5 itself! > > Fixed by making it a reference. 30. > > Remark > > Make it clearer that we are talking about this particular example: > > > We cannot infer wasInformedBy(a3,a1) from these statements. > --> ".. from these statements alone." > > > > Furthermore, the illustration also shows that a3 completes before a1. > --> ".. before a1 started." > > > > So it is impossible for a3 to have used an entity generated by a1. > --> "So in this case it is..." > Fixed by adopting suggested rephrasings. 31. > > > Inference 7 (entity-generation-invalidation-inference) > > > From an entity, we can infer that existence of generation and invalidation events. > > This REQUIRES entities to become invalidated (at some point). It is > consistent with entities requiring generation, but it means I get > inferred strange wasInvalidatedBy for real life entities like: > > entity(math:pi) > entity(phys:universe) > entity(phys:vacuum) > entity(phys:energy) > entity(concept:existence) > entity(uk:2011census) > entity(uspolitics:resultOfPresidentialElection2012) > > When are these destroyed? By what? Is it certain that everything is > destroyed? What about things that are still existing at the time of > provenance being written, with this you are requiring them all to die > - I thought PROV only talked about the past. "We are all going to die" > - but you don't know when or how - so why should the PROV imply > provenance statements about the future? > > This is ISSUE-466. This is closed, since the invalidation inferences reflect the resolution of issue 331. 32. > > Inference 8 (activity-start-end-inference) > > > From an activity statemen,t we can infer that start and > end events having times matching the start and end times of the > activity. > > > --> "statement" > Delete "that" Fixed. 33. > > IF > activity(a,t1,t2,_attrs) THEN there exist _id1, _e1, _id2, and > _e2 such that wasStartedBy(_id1;a,_e1,_a1,t1,[]) and > wasEndedBy(_id2;a,_e2,_a2,t2,[]). > > So it is impossible for an activity to start or end without a trigger? > I am not so sure about this.. this creates phantom triggers, not too > dissimilar to our previous phantom agents, in particular for a > self-terminating process this can become a bit odd, "I'll tell my self > to stop now!" > > All activities must end? Same argument as for inference 7 applies. > This is ISSUE-467. 34. > > > Combining inference 7 and 8 means inference rules loops forever, > creating triggers and activities, giving a chicken-and-egg problem > going back to the beginning of the universe, and beyond. > > I only find a way out of this if an activity (phys:evolutionOfUniverse > ?) creates its own trigger, which I am glad to see is allowed as > Constraint 39 luckily does not use "strictly precedes". > > > Here's a whiteboard drawing I made (this freaked out the office..) > when trying to figure out which loops can be formed from the > inferences (I've crossed out the 'dead ends', an arrow shows 'THEN' > links between statements and/or variables of statements): > > http://www.w3.org/2011/prov/wiki/File:PROV-Constraint-inference-loops.jpg > > > I think this is problematic, at least, the document should say > something like "Care should be taken when implementing inference rules > 7 and 8 in combination, as by recursively following inference rules > for statements with only existential variables an endless loop would > be formed." (In SPARQL this could for instance be by using FILTER > (!isBlank(?c)) although you would need to exclude bnodes from the > original graph) > > As far as I can tell from my lovely picture, stopping inference at > purely existential variables should be enough, as these inferences > would quickly not pass on potential 'real' variables. > > > > I initially was not sure why we would need to require inference 7 and > 8, however I now see they are useful for the later event ordering. > > For instance you could know that: > > wasDerivedFrom(a, b) > wasDerivedFrom(b, c) > > wasGeneratedBy(a, -, t1) > wasInvalidatedBy(a, -, t2) > > wasGeneratedBy(c, -, t4) > > Then even if you know nothing about the generation or invalidation of > b, we can conclude that t3 > t1 and do validity checks, by inferring: > wasGeneratedBy(b, -, ?t3) > > and from constraints finding that t4 >= ?t3 >= t2 > t1. > > > Similarly a used(c) can tell you something about the wasInvalidatedBy > - it is not earlier than the time of > used/wasStartedBy/wasEndedBy/wasAssociatedWith and invalidation of a > specialization - meaning 'it must have existed then'. This is good. > This is ISSUE-465. 35. > > > > IF wasEndedBy(_id;a,e1,a1,_t,_attrs), THEN there exist > _gen and _t1 such that wasGeneratedBy(_gen;e1,a1,_t1,[]). > > > IF wasDerivedFrom(_id;e2,e1,a,id2,id1,_attrs), THEN there > exists _t1 and _t2 such that used(id1;a,e1,_t1,[]) and > wasGeneratedBy(id2;e2,a,_t2,[]) hold. > > General comment: > > I think the style with identifiers like _gen is much more readable > than _id1 and id2, at least when they appear multiple times. So I > would propose to change such examples to use _use1, gen1 etc.). I > would keep _id* for the first argument (if it is not used elsewhere), > but change id*. > Done. 36. > > I agree on removing inference 12 and the following paragraph and > remark, this is redundant from inference 11 and definition 4. > This is ISSUE-454 - as the inference is related to how derivation optional parameters work. Issue 454 was closed; we are determining whether we still need inference 12 given the current design of wasDerivedFrom. We are adding a constraint and conditional expansion rule that makes inf 12 superfluous - basically, if the activity is specified then we can expand the generation and use, if it is not specified then the generation and use have to be unspecified also. 37. > > However, I think it is worth pointing out somewhere that "Derivation > is not defined to be transitive either", but not by "following similar > reasoning as for wasInformedBy." - I don't see how that reasoning > would apply here. If you have: > > wasDerivedFrom(e3, e2) > wasDerivedFrom(e2, e1) > > and you are attempting to say that > > wasDerivedFrom(e3, e1) is not always possible by the reasoning of > wasInformedBy, then you are implying that e1 could have been generated > after e2. This is however not possible due to constraints on the > implied generation and usage. > > > It is however logically possible to not be wasDerivedFrom, but that is > because you can't go from: > > used(a, e1, t1) > wasGeneratedBy(e2, a, t2) > to > wasDerivedFrom(e2, e1) > > > EVEN IF t2 > t1, this is not true. wasDerivedFrom is a stronger > statement than simply saying that something used something and later > generated something else; it could be that e1 had nothing to do with > how a generated e2. > > For instance, a type of bean-to-cup coffee maker grinds coffee beans > whenever you press the button to make a cup of coffee, then brews a > cup of coffee. However, there is a buffer of pre-ground coffee (to > speed up the process), so those beans will be part of the *next* cup > of coffee made - the coffee you get is not derived from the (same) > beans that the machine just used. > > It might be worth to make a Remark about this. The remark after > Inference 12 is not this, that talks about not deriving wasGeneratedBy > from wasDerivedFrom+used. (which to me seems self-evident). > Fixed. I have made it a bare, separate remark stating that derivation is not necessarily transitive. If more needs to be said, it should probably be said in prov-dm or primer. I personally believe we should not bother with remarks justifying properties that we do NOT require, since there are an infinite number of these, and justifying why we do not want them is not the job of the spec, only explaining how to handle the ones we do want. --jrc. 38. > > > > A derivation specifying activity, generation and use events is a > special case of a derivation that leaves these unspecified. (The > converse is not the case). > > --> "not the case because a blank activity is a special case in definition 4" > Fixed. 39. > > > > > 4.4 Agents > > Attribution identifies an agent as responsible for an entity > > The DM term is now "ascribing", not "responsible": > > http://www.w3.org/TR/prov-dm/#term-attribution > > Attribution is the ascribing of an entity to an agent. > > Change sentence to match DM, "Attribution is the ascribing of an > entity to an agent". > Fixed. 40. > > > An agent can only be responsible for an entity if it was > associated with an activity that generated the entity > > "responsible for" -> "ascribed to". > Fixed. (The sentence needed to be rearranged since ascription is not symmetric.) 41. > > > Inference 15 (attribution-inference) > > IF wasAttributedTo(_att;e,ag,_attrs) holds for some identifiers > e and ag, THEN there exist a, _t, _gen, _assoc, _pl, such that > wasGeneratedBy(_gen;e,a,_t,[]) and > wasAssociatedWith(_assoc;a,ag,_pl,[]) hold. > > This enforces _pl, not allowing it to be blank. (Goes against > definition 4). Instead - should be inferred, and no 'there exists' > required for -. (it might exist, though!) > This is ISSUE-452. > > > > > Inference 16 (delegation-inference) > > IF actedOnBehalfOf(_id;ag1, ag2, a, _attrs) THEN there exist > _id1, _pl1, _id2, and _pl2 such that wasAssociatedWith(_id1;a, > ag1, _pl1, []) and wasAssociatedWith(_id2;a, ag2, _pl2, []).# > > Again, this enforces a plan to exist for the delegation. That is very, > very often not the case! It is worth pointing out here that the two > plans are different. > This is ISSUE-452. > > The following remark tries to clarify this, but as remarks are not > normative, and the inferences are, then their "there exists _pl1" > would force plans to exist. > > In particular, the remark says: > > "because the existential variables _pl1 and _pl2 can be > replaced with constant identifiers or placeholders - > independently" > ..but it is never explained anywhere before this that they can be > replaced with -. > > ISSUE-452. Yes it is, in the definition of merging. But this needs to be clearer, since merging is not explained until later. I plan to address this by covering the common concepts of variable, substitution, merging, and application of inferences clearly up front before we go into the details of def/inf/constraints. 42. > > > > Inference 17 (influence-inference) > > Various existential variables are not on the _underscore style: > > > IF wasGeneratedBy(id;e,a,t,attrs) THEN wasInfluencedBy(id; e, a, attrs). > t -> _t > Fixed. > > IF used(id;a,e,t,attrs) THEN wasInfluencedBy(id; a, e, attrs). > t -> _t > Fixed > > IF wasInformedBy(id;a2,a1,attrs) THEN wasInfluencedBy(id; a2, a1, attrs). > > IF wasStartedBy(id;a2,e,a1,t,attrs) THEN wasInfluencedBy(id; a2, e, attrs). > a1 -> _a1 > t -> _t > Fixed > > IF wasEndedBy(id;a2,e,a1,t,attrs) THEN wasInfluencedBy(id; a2, e, attrs). > a1 -> _a1 > t -> _t > Fixed > > IF wasInvalidatedBy(id;e,a,t,attrs) THEN wasInfluencedBy(id; e, a, attrs). > t -> _t > Fixed > > IF wasDerivedFrom(id; e2, e1, attrs) THEN wasInfluencedBy(id; e2, e1, attrs). > > IF wasAttributedTo(id;e,ag,attr) THEN wasInfluencedBy(id; e, ag, attrs). > > IF wasAssociatedWith(id;a,ag,pl,attrs) THEN wasInfluencedBy(id; a, ag, attrs). > pl -> - > No, this should be _pl. > > IF actedOnBehalfOf(id;ag2,ag1,a,attrs) THEN wasInfluencedBy(id; ag2, ag1, attrs). > a -> _a Fixed. 43. > > > A good question is if wasInfluencedBy can be none of the above (ie. > ex:customInfluence). I suppose yes. Does this need to be made clear in > a remark? The reason is that combined with Constraint 52 it might be > misinterpreted as disjoint subclasses (true) and equivalent with a the > union (false). We don't make influence disjoint with anything, and we say this. We also don't say that influence is equivalent to the union of its instances. I've added to the remark to say this (though elsewhere we encourage people to use specific relations). 44. > > > > 5.1 Uniqueness Constraints > > or a special symbol undefined > > Not sure why this special symbol undefined is described here, as it is > not used anywhere in PROV-N but in the next bullet point (where it is > not used as a symbol!). > > Suggest rewrite to "or it is undefined, indicating that the > merge cannot be performed." > Changed to say that merging fails (we use this term elsewhere) 45. > > > The merge of two attribute lists attrs1 and attrs2 is their > union, considered as unordered lists, written attrs1 ∪ attrs2 > > Add something like ", allowing duplicate keys, but merging equal > key-value pairs". Done. 46. > > > > If merging succeeds with a substitution S, then S is applied to the instance I, yielding result I(S). > > So you mean S(I) ? What notation is this? > S(I), correct. This is the instance obtained by replacing all existential variables x in the domain of S with S(x) wherever they appear in I - we probably need to write it down clearly, but it is a standard notion. 47. > > > In the common case that a particular field of a relation > uniquely determines the other fields, we call the uniqueness > constraint a key constraint. > > key constraint > Fixed, Made it a definition too. 48. > > > We assume that an entity has exactly one generation and invalidation event > > We don't assume, here we specify! > > --> "An entity has exactly..." > Done 49. > > > > > > > It follows from the above constraints that > the generation and invalidation times of an entity are unique, if > specified. > > Not alone. It requires also the merging procedure for Key constraints. > Could this also be boxed and made formal? Like a kind of definition? > > Change "times" to "activites and times". > Actually, I think this should be demoted to a remark: it is an observation about the implications of the specification, not an additional requirement. But I extended the remark to give an example. 50. > > > IF wasGeneratedBy(id1;e,_a1,_t1,_attrs1) and wasGeneratedBy(id2;e,_a2,_t2,_attrs2), THEN id1=id2. > .. > > IF wasInvalidatedBy(id1;e,_a1,_t1,_attrs1) and wasInvalidatedBy(id2;e,_a2,_t2,_attrs2), THEN id1=id2. > > Just to avoid confusion, I would use change the identifiers of the second: > > IF wasInvalidatedBy(id3;e,_a3,_t3,_attrs3) and > wasInvalidatedBy(id4;e,_a4,_t4,_attrs4), THEN id3=id4. > These have been changed to gen1, gen2, etc. 51. > > "We assume that an activity has exactly one start and end event" > --> > "An activity has.." > > Done. 52. > > "Again, together with above key constraints these constraints imply > that the activity is a key for activity start and end statements" > Heavy sentence. "Combined with key constraints 26 and its merging > procedure this implies that.." > > > Constraint 29 (unique-wasStartedBy) > > IF wasStartedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasStartedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'. > > id'-> id2 > Done 53. > > Constraint 30 (unique-wasEndedBy) > > IF wasEndedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasEndedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'. > > --> > IF wasEndedBy(id3;a,_e3,_a3,_t3,_attrs3) and > wasEndedBy(id4;a,_e4,_a4,_t4,_attrs4), THEN id3=id4. > Done (with more descriptive id names) 54. > > > Constraint 31 (unique-startTime) > > IF activity(a,t,_t',_attrs) and wasStartedBy(id;a,_e1,_a1,t1,_attrs1), THEN t=t1. > > --> > IF activity(a,t,_t2,_attrs) and > wasStartedBy(_id;a,_e1,_a1,t1,_attrs1), THEN t=t1. > I don't like this either, but I fixed it to improve parallelism > > > Constraint 32 (unique-endTime) > > IF activity(a,_t,t',_attrs) and wasEndedBy(_id;a,_e1,_a1,t1,_attrs1), THEN t' = t1. > > --> > IF activity(a,_t,t2,_attrs) and wasEndedBy(_id;a,_e3,_a3,t3,_attrs1), > THEN t2 = t3. > > Likewise, I don't like your version either but I fixed it to avoid the problem. 55. > > > Constraint 33 (mention-unique) > > IF mentionOf(e, e1, b1) and mentionOf(e, e2, b2), THEN e1=e2 and b1=b2. > > no need for this constraint, just add to constraint 25: > > 4. The identifier field e is a KEY for the mentionOf(e, e1, b1) statement. > I wanted to keep this separate since it is at risk. It is equivalent to say that e is a KEY (though it is not an identifier field!). We can do this later, if Mention stays in the specification, since it does not change the technical content this is editorial. 56. > > > 5.2 Event Ordering Constraints > > > This specification assumes that a preorder exists between instantaneous events > > Link to definition for "preorder" > Done 57. > > > > Moreover, we sometimes consider strict forms of these > orders: we say e1 strictly precedes e2 to indicate that e1 > happened before e2. This is a transitive relation. > > --> > > Moreover, we sometimes consider strict forms of these orders: we say > e1 strictly precedes e2 to indicate that e1 happened before e2, but > not at the same time. This is a transitive, irreflexible relation. > Done 58. > > To check ordering constraints, we generate all precedes and > strictly precedes relationships arising from the ordering > constraints to form a directed graph, with edges marked precedes > or strictly precedes, and check that there is no cycle containing > a strictly precedes edge. > > This sounds like a paper, describing one chosen way to check ordering. > > Change to "One way to check ordering constraint is to generate all ..." > Excuse me! Done 59. > > > 5.2.1 Activity constraints > > > An activity starts, then during its lifetime uses, > generates or invalidates entities, and communicates with or > starts other activities, and finally ends > > > It can also be associated with agents who affect the behaviour of the activity. > Done 60. > > Figure 2e > > Shows wasStartedByActivity - now defunct relation. Remove. > ISSUE-468. Figure change. @Luc? 61. > > > > Constraint 35 (usage-within-activity) > > IF used(use;a,e,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes use. > > IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN use precedes end. > > Change second line to avoid duplicate variables: > > IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) > THEN use precedes end. > > 62. > Same for: > > Constraint 36 (generation-within-activity) > > IF wasGeneratedBy(gen;_e,a,_t,_attrs) and > wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes > gen. > > IF wasGeneratedBy(gen;_e,a,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN gen precedes end. > > IF wasGeneratedBy(gen;_e,a,_t,_attrs) and > wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN gen precedes end. > 63. > > Add new constraint for invalidation within activity, mirroring > generation. Add to figure 2. > 64. > As for generation, invalidation implies ordering of events, since the > invalidation event had to occur during the associated activity. This > is illustrated by Figure 2 (x) and expressed by Constraint x > (invalidation-within-activity). > > Constraint x(invalidation-within-activity) > IF wasInvalidatedBy(inv;_e,a,_t,_attrs) and > wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes inv. > IF wasInvalidatedBy(inv;_e,a,_t,_attrs) and > wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN invprecedes end. > 65. > > > > Constraint 37 (wasInformedBy-ordering) > > > IF wasInformedBy(_id;a2,a1,_attrs) and > wasStartedBy(start;a1,_e1,_a1',_t1,_attrs1) and > wasEndedBy(end;a2,_e2,_a2',_t2,_attrs2) THEN start precedes end. > > This MUST follow from: > > > Inference 5 (communication-generation-use-inference) > > > IF wasInformedBy(_id;a2,a1,_attrs) holds THEN there exist > e, _id1, _t1, _id2, and _t2, such that > wasGeneratedBy(_id1;e,a1,_t1,[]) and used(_id2;a2,e,_t2,[]) > hold. > > and constraint 35 (usage-within-activity) and constraint 36 > (generation-within-activity). Thus this is a corollary rather than a > new constraint. > > I would suggest removing it (and from figure 2), or writing it in a > different corollary style. > 66. > > > > 5.2.2 Entity constraints > > Figure 3b) implies that the deriving activity starts *after* usage and > finishes *before* generation. extend blue box to extend beyond both > dotted vertical lines. > > I don't understand 3c), something seems to be missing. The lower arrow > from e2 has no label (was derived from?), and there are no activities. > @Luc? - change figures 67. > > > > Constraint 38 (generation-precedes-invalidation) > > IF wasGeneratedBy(gen;e,_a1,_t1,_attrs1) and > wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN gen strictly > precedes inv. > > Why is this relation in particular *strictly precedes*? This needs to > be justified (beyond "we need some strictly in there so we can do our > loop testing"). So an entity can't have zero lifetime, but an activity > may? > > An activity can use an entity at the same time as it was generated, > but it can't invalidate it then? This implies some kind of minimal > planck time on entities, which is probably OK for most applications of > PROV. > > > After some discussion with fellow geeks, I have however come go agree > that an entity can't have an empty lifespan, to avoid problems and to > ensure time moves forward. I think we need to formulate this by using > the description of an entity as characterizing and fixing aspects for > some duration. But currently the DM descriptions seem to imply the > opposite, entities can have zero lifespan, but activities cannot! > > http://www.w3.org/TR/prov-dm/#term-entity > > > An entity ◊ is a physical, digital, conceptual, or other > > kind of thing with some fixed aspects; entities may be real or > > imaginary. > > An activity ◊ is something that occurs over a period of time and > > acts upon or with entities; it may include consuming, processing, > > transforming, modifying, relocating, using, or generating > > entities. > > I do however see bigger use for a zero lifespan activity, because it > can be used to describe transitions of entities. So can we add to > Constraint 38 some kind of remark about why an entity must have a > non-zero lifespan? Something like: > > "Constraint 38 implies that an entity must have a non-zero lifespan by > using 'strictly precedes', that is the entity cannot be invalidated at > the same instant as it is generated. The reasoning for this is that a > meaningful entity is a thing with some aspects fixed. For these > aspects to be fixed, the entity must exist for some (possibly > infinitesimal) time. Note that this requirement does not apply to > activities." > This is ISSUE-469. Strictly has been removed from entity-generation-invalidation-ordering. Closed. 68. > Constraint 38 and 40 are not shown in Figure 3. > > Figure issue. @Luc? 69. > > First, we consider derivations, where the activity and usage are known. > --> > First, we consider derivations where the activity and usage are known. In > > > the usage of e1 has to precede the generation of e2 > > --> ".. has to strictly precede ..." > > > This needs a similar explanation for why this needs to be strictly > preceded. Use and generation do not require this on their own. I am > not sure what the reasoning here, perhaps it has to do with the > semantics of being 'derived'? > > > http://www.w3.org/TR/prov-dm/#dfn-wasderivedfrom > > > A derivation ◊ is a transformation of an entity into > another, an update of an entity resulting in a new one, or the > construction of a new entity based on a pre-existing entity. > > This definition places a strict time order, it forces an 'older' and > 'newer' entity. However I find it strange to apply this to the *usage* > event, and would rather keep only the following constraint: > > > Constraint 42 (derivation-generation-generation-ordering) > > > IF wasDerivedFrom(_d;e2,e1,attrs) and > wasGeneratedBy(gen1;e1,_a1,_t1,_attrs1) and > wasGeneratedBy(gen2;e2,_a2,_t2,_attrs2) THEN gen1 strictly > precedes gen2. > > Thus use1 precedes gen2 (but not strictly), constraint 41 can be > reformulated as a corollary with constraint 42+39. I don't see a good > reasoning for that to be strictly. > This is ISSUE-470. Strictly has been removed from derivation-generation-use-ordering. Closed. 70. > > A suggested Remark for constraint 42: > "This constraint, similar to constraint 38, requires the derived > entity to be generated strictly following the generation of the > original entity. This follows from the PROV-DM description of > derivation, A derivation is a transformation of an entity into > another, an update of an entity resulting in a new one, or the > construction of a new entity based on a pre-existing entity, thus > the derived entity must be newer than the original entity. " > > Done. 71. > Unique variables to avoid confusion: > > > Constraint 43 (wasStartedBy-ordering) > > IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and > wasGeneratedBy(gen;e,_a2,_t2,_attrs2) THEN gen precedes start. > > IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and > > wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN start precedes inv. > --> > IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and > wasInvalidatedBy(inv;e,_a3,_t3,_attrs3) THEN start precedes inv. > > Since these are two separate inferences, there is no real possibility of confusion, but fine. 72. > > > Constraint 44 (wasEndedBy-ordering) > > IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and wasGeneratedBy(gen;e,_a2,_t2,_attrs2) THEN gen precedes end. > > > IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and > wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN end precedes inv. > --> > IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and > wasInvalidatedBy(inv;e,_a3,_t3,_attrs3) THEN end precedes inv. > > Done 73. > > > > 5.2.3 Agent constraints > > This section assumes agents are entities. I think this should be > clarified, because not all agents are entities, and we don't have any > specific relations to describe the lifetime of agents. > > So perhaps it should be clarified that in the case that an agent is an > activity, then similar constraints should apply with wasStartedBy(ag, > ..) instead of wasGeneratedBy(ag, ... ) and wasEndedBy(ag, ...) > instead of wasInvalidatedBy(ag, ...), and that for non-activity > non-entity agents, appropriate constraints for their particular > lifetime would apply, but that would be out of scope for this > document. > I revised the section to point out that an agent could be an activity. We have added the missing inferences to handle the case when the agent happens to be an activity, rather than an entity. 74. > > Figure 5a) is very difficult to understand, as the extent of the two > triangles is not shown. Could this be added, such as in 2a)? > @Luc figure 75. > > > The agent may be generated, or may only become associated with the activity, after the activity start: > > This sentence can be misinterpreted as the wrong kind of may. Use RFC > 'MAY' to indicate that this is optional. Same for next sentence. > > > the agent is required to exist before the activity end > > " is required to exist" -> "MUST exist" > > > agent invalidation is required to happen after the activity start > > "is required to" -> "MUST" > Done 76. > > > Constraint 47 (wasAssociatedWith-ordering) > > > > > > > > IF > wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and > wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) and > wasInvalidatedBy(inv;ag,_a2,_t2,_attrs2) THEN start precedes > inv. > > IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and > > wasGeneratedBy(gen;ag,_a1,_t1,_attrs1) and > > wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN gen precedes end. > > Plan is not required (and is non-expandable), so replace _pl with - or > add note that _pl above MAY be -. > Done (clarified that _pl may be -). 77. > Make identifiers unique in second line: > > IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and > wasGeneratedBy(gen;ag,_a3,_t3,_attrs3) and > wasEndedBy(end;a,_e4,_a4,_t4,_attrs4) THEN gen precedes end. > > Seems unnecessary but fine. 78. > > An entity that was attributed to an agent must have some overlap with the agent. > > Why?? > > > > The agent is required to exist before the entity invalidation. > > I don't agree with that. First of all, why has the attribution need to > have anything to do with the invalidation of an entity? If you > contribute to an entity, all of that has to happen *before* the entity > is generated. It does not matter what happens after that. > > > Likewise, the entity generation must precede the agent destruction. > > This also means it is not valid to attribute a book to an author if > the book was published after the author's death. (For instance The > GIrl with the Dragon Tattoo). > > By our inferences, it is only a requirement that the agent was > associated with an activity that eventually gave birth to the entity. > The agent is not required to be there till the end of the activity, > that sounds like an artificial constraint to me. Thus I would remove > constraint 48. > > > What you can instead say that an agent's association with that > activity must precede an entity's generation, because otherwise he > can't be associated with its generating activity. This does not > directly follow from constraint 47 and Inference 15 > (attribution-inference), so we need a constraint to force the > generation to be after the *association* started, the first would then > follow. Association don't have time, unfortunately, but we can use > same reasoning as in constraint 47: > > IF wasAttributedTo(_at;e,ag,_attrs) and > wasGeneratedBy(genE;e,_a1,_t1,_attrs1) and > wasGeneratedBy(genAg;ag,_a1,_t1,_attrs1) THEN genAg precedes genE > > > We can't say anything about the entity's invalidation; attribution > relates to association with the generation, not invalidation. The > agent's invalidation after the start of the activity a1 (which does > not affect e) is covered by constraint 47+ inference 15. > > > > For delegation, two agents need to have some overlap in their lifetime. > > Why? This means a solicitor can't act on a dead person's behalf. > > > IF actedOnBehalfOf(_del;ag2,ag1,_a,_attrs) and > wasGeneratedBy(gen;ag1,_a1,_t1,_attrs1) and > wasInvalidatedBy(inv;ag2,_a2,_t2,_attrs2) THEN gen precedes inv. > > This however, is correct. There is no requirement for overlap, just > that ag1 was born before ag2 died. Thus my great grandfather can't act > on my behalf. (not assigned by me, at least!) > > Change text to: > "For delegation, the responsible agent must have been generated before > the invalidation of the delegated agent.". > This is ISSUE-471 79. > > 5.4: > > > IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ > > typeOf(c) AND 'prov:EmptyCollection' ∈ typeOf(c). > > Change to (for consistency, not needed by checks): > > IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ > typeOf(c) AND 'prov:Collection' ∈ typeOf(c) AND 'prov:EmptyCollection' > ∈ typeOf(c). Done. 80. > > > IF wasDerivedFrom(id; e2, e1, a, g2, u1, attrs) THEN 'entity' ∈ > typeOf(e2) AND 'entity' ∈ typeOf(e1) AND 'activity' ∈ typeOf(a). > > what if a is - ? Also need: > > IF wasDerivedFrom(id; e2, e1, -, -, -, attrs) THEN 'entity' ∈ > typeOf(e2) AND 'entity' ∈ typeOf(e1) > Done. 81. > > > > IF wasAssociatedWith(id;a,ag,pl,attrs) THEN 'activity' ∈ > typeOf(a) AND 'agent' ∈ typeOf(ag) AND 'entity' ∈ typeOf(pl). > > What if pl is -? Add: > > IF wasAssociatedWith(id;a,ag,-,attrs) THEN 'activity' ∈ typeOf(a) AND > 'agent' ∈ typeOf(ag) > Done. 82. > > > > Note that there is no disjointness between entities and > agents. This is because one might want to make statements about > the provenance of an agent, by making it an entity. Therefore, > users may assert both entity(a1) and agent(a1) in a valid PROV > instance. > > Similarly, an agent might in some cases be expressed rather as an activity. > Added clarification. 83. > > > 6. Normalization, Validity, and Equivalence > > > Because of the potential interaction among inferences, > definitions and constraints, the above algorithm is > recursive. Nevertheless, all of our constraints fall into a class > of tuple-generating dependencies and equality-generating > dependencies that satisfy a termination condition called weak > acyclicity that has been studied in the context of relational > databases [DBCONSTRAINTS]. > > I find I need to add a condition to not follow inferences on purely > existential variables to avoid a recursive loop. So I would add that > to point 4. This is ISSUE-465 84. > > > (Replacing two different names with equal names does change the meaning.) > > --> "... does however change ..." > Done 85. > > > D. References > > Why is PROV-DM not a normative reference? > Done (and PROV-N) > > -- Stian Soiland-Reyes, myGrid team School of Computer Science The University of Manchester > >