various comments from simon
author Luc Moreau <l.moreau@ecs.soton.ac.uk>
Thu, 06 Oct 2011 12:54:06 +0100
changeset 547 383380cbb920
parent 546 131204c4c68c
child 548 266035d8d652
model/simon-comments.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/model/simon-comments.txt	Thu Oct 06 12:54:06 2011 +0100
@@ -0,0 +1,290 @@
+>Luc, Paolo,
+>
+>Here are my comments on the current data model document, annotated with
+>(T) for typo/text clarity or (C) for content comment/question. I think
+>most/all comments are small enough that an issue need not be raised.
+>
+>Throughout:
+>(T) Sections are referred to in the text by "Section Entity", "Section
+>Process Execution" etc. Shouldn't these be the section numbers?
+
+TODO when stable
+
+>(T) There seems to be inconsistency in symbols following the change
+>from roles to qualifiers. Sometimes "q" is used in constraint
+>definitions, examples etc. and sometimes "r" is used. I suggest it
+>would be clearer to always use "q".
+
+Done (hopefully everywhere)
+
+>(T) There are a few "characterised" in amongst the majority
+>"characterized" spelling.
+
+Done
+
+>(C) At least one standard qualifier name, "role", is used in the
+>document, but it is not clear what namespace this name is in. Does it
+>mean no other "role"s from domain-specific ontologies may be used in
+>Prov data?
+
+Added section 1.2 explicitly declaring the PROV-DM namespace.
+Also added that "role" is declared in that namespace.
+
+>
+>Sec 2.1:
+>(T) paragraph 1: "Words such thing or activity" should be "Words such
+>as 'thing' or 'activity'"
+
+Done
+
+>(C) paragraph 2: The first mention of "provenance" in the document
+>proper is in the second paragraph of this section, and is a bit out of
+>the blue ("unambiguously report provenance"). Can we add some
+>intuition about what provenance is (for this data model)?
+
+
+Now, provenance is introduced in section 1.
+
+
+>(T) Example paragraph 1: "perspectives about a resource" should be
+>"perspectives on a resource"
+
+Done
+
+>(C) Example paragraph 1: "the report independent of where it is hosted
+>over time" - I suggest also saying "and of its content over time", to
+>distinguish this entity from the report version entity above it
+
+Done
+
+>(C) paragraph 6: "punctual events"? "punctual" as most commonly used
+>implies prior planning of when something should occur. I'm not sure
+>what you are intending in this context.
+
+???? I don't understand
+
+>(C) paragraph 6: "a partial order exists between events". I assume you
+>mean a temporal order? What kind of ordering do you mean?
+
+... partial ;-) .... between events... 
+What is the issue?
+
+>(C) paragraph 6: "global notion of time and Lamport's style clocks" -
+>this seems like a weirdly specific level of detail for this overview
+>section, especially considering that many other aspects of the model
+>are not mentioned at all in the overview.
+
+Given that time is so critical and the object of several issues, it's
+important to state our assumptions.
+
+>
+>Sec 2.3:
+>(C) Regarding the note (not attempting to ensure consistency of an
+>asserter) - this seems practical. I'm not sure how we could enforce
+>consistency in any circumstance, only define what it means or say it
+>is application specific.
+
+Rephrased.
+
+>
+>Sec 4.1:
+>(T) "We denote this e1." and the same for e2 etc. It is not entirely
+>clear whether "this" refers to the event or the entity.
+>
+
+TODO
+
+>Sec 4.2:
+>(C) The fact that Alice is the creator of e1 seems to be expressed
+>twice, first as an attribute "creator=Alice", and secondly as the
+>"creator" role of an agent in the creation process. I don't think it
+>is a good idea for either clarity of use of the model or for ensuring
+>interoperability for there to be multiple ways to express the same
+>thing, if it can be at all avoided. Even if we cannot stop someone
+>using either method, can't we say which they *should* use to aid
+>interoperability?
+
+TODO
+
+>(T) "Generation expressions... represent the event at which a file is
+>created". The surrounding text is generic rather than specific to the
+>example, implying this should be "entity" rather than "file".
+>Otherwise, readers may assume that all entities are files or that
+>generation only applies to files.
+
+TODO
+
+>(T) Paragraph on wasComplementOf: in "attribute content" and
+>"attribute spellchecked", fixed width font (or another font) should be
+>used for the attribute names to show they are names, else the sentence
+>can be read in strange ways.
+
+Done.
+
+>
+>Sec 4.3:
+>(T) Fig 1: The arrow from pe2 to a3 is in a different direction to the
+>other "agent" links. It is also not clear if an "agent" link is the
+>same as a "wasControlledBy" link. If so, the pe2-a3 arrow direction
+>makes most sense, as the others seem to be saying the agent was
+>controlled by the process execution.
+>
+
+TODO
+
+>Sec 5.1:
+>(T) The last sentence, regarding a "house-keeping construct" is rather
+>opaque. I'm not sure what the reader is supposed to understand from
+>this.
+>
+
+Rewritten
+
+>Sec 5.2.1:
+>(C) First sentence: "entity expression" is given exactly the same
+>definition that "entity" was in Section 4. I think having two terms
+>for the same thing will cause confusion. I like addition of
+>"expressions" to the model in general, though, as I think this greatly
+>clarifies what is intended.
+
+TODO: well the issue is that in the ER diagram, we really talk about Entity Expressions, not Entities?
+
+>(C) "the meaning of attribute in the context of a process execution
+>expression is similar to the meaning of attribute for entity
+>expression" - I think the meaning should be exactly the same, not just
+>similar, else there will be confusion.
+
+Replaced "similar" by "same", and replaced "meaning" by "interpretation".
+
+
+>(C) Following from the above point: "A process execution expression's
+>attribute remains constant for the duration of the activity" - OK, but
+>does it also characterise the process execution, e.g. is the start
+>time part of what distinguishes one execution from others?
+>(T) "noted processExecution" - I think you mean "denoted" (or
+>"written" or "expressed")
+>
+>Sec 5.2.3:
+>(T) "representation a characterized thing" - missing "of"
+>(T) Last sentence, "On the contrary" should be "On the other hand",
+>and "inferred" should be "infer"
+>
+>Sec 5.2.4:
+>(T) Last sentence: "expectede"
+>
+>Sec 5.3.3.1:
+>(C) I suggest that, as accounts are not introduced until later in the
+>document, the generation-unicity constraint will not make sense here.
+>Moreover, I think the constraint is more about accounts and what it
+>means for them to be consistent than it is about generation events or
+>process executions. Therefore, I suggest moving this constraint to the
+>section on accounts.
+>(C) Given that constraint derivation-events applies, don't we just
+>have two ways of saying the same thing? Why use the long form of
+>wasDerivedFrom when the same can be expressed using wasGeneratedBy and
+>used? Which variety *should* be used?
+>
+>Sec 5.3.3.2:
+>(T?) Constraint "derivation-linked-independent" seems to be a
+>tautology. I guess this is a typo?
+>
+>Sec 5.3.3.3:
+>(T) Paragraph 4: "In other word" should be "In other words"
+>
+>Sec 5.3.4:
+>(C) This section seems to be confusingly expressed, implying that
+>non-agent entities can control executions, whereas the control-agent
+>constraint (in the section on agents) contradicts this. It is probably
+>just a matter of clarifying the text, e.g. if you mean that a
+>non-agent entity can be asserted to be controlling an execution but
+>from this inferred to be an agent.
+>(T) The text may be read to imply that a control link has only one
+>qualifier, role, whereas I guess you mean that, like use/generate, it
+>can have multiple "modalities" as part of the qualifier?
+>
+>Sec 5.3.5:
+>(C) I can see this section causing some difficulty... While that may
+>just be the nature of the topic, there seems an important thing
+>missing: what has complementarity got to do with provenance? In other
+>words, what value (with regard to provenance) is there in asserting
+>complementarity?
+>(C) The text suddenly starts talking about "properties" from the
+>second paragraph. What are these, and do they have any relation to
+>attributes?
+>(C) Should the justification of why the complementarity relation is
+>not transitive be in this document? I would expect this document to
+>just state that it is not transitive and, for brevity and simplicity,
+>leave justifications to another document.
+>
+>Sec 5.3.6:
+>(C) Similarly to above, I'm not sure the justification of why
+>wasInformedBy is not transitive should be in this document.
+>
+>Sec 5.3.8:
+>(C) Constraint participation: This seems odd to me. In what
+>circumstances would you not know or want to assert which of the three
+>possibilities (used/controlled/complement) applied for a given entity
+>and execution? Is hadParticipant as defined really useful?
+>
+>Sec 5.3.9:
+>(C) Grammar definition: I don't understand what the
+>"relationIdentification" stuff is about or what all the identifiers
+>identify.
+>
+>Sec 5.4.1:
+>(C) This appears to be yet another way to say the same thing,
+>following the comment on Sec 4.2 above. If A is an "asserter" of
+>expression E, then we can either (i) express E to be an entity and use
+>an attribute "asserter=A"; (ii) express E to be an entity and A to be
+>an agent playing "role=asserter"; or (iii) put A in the "asserter"
+>slot of an "account" expression containing E. Why do we need all three
+>ways? Isn't method (ii) most consistent with the rest of the model?
+>
+>Sec 5.4.2:
+>(T) Second sentence: "return all the provenance assertions" - all the
+>assertions? or just "all the assertions in the container"?
+>(C) Under the definition given, you cannot have expressions in a
+>container but not in an account. Does this imply that every Prov
+>expression is made accessible as part of an account? I think this
+>would be a good thing for clarity, but it is not explicit in the
+>document (and also differs from OPM).
+>
+>Sec 5.5.1:
+>(C) I agree with the first note. If it is mandatory to say something
+>but what we say can be nothing, that means that it is not
+>mandatory at all. The "mandatory" thing seems to be just saying
+>something about the ASN, and so is irrelevant as the ASN is just there
+>to make the model concrete and readable.
+>
+>Sec 5.5.4:
+>(C) Second note: Wouldn't this mean that either account IDs or entity
+>IDs can never be URIs, as a sequence of URIs would itself not be a
+>URI? If so, that seems to make RDF serialisation difficult to achieve.
+>
+>Sec 5.5.6:
+>(C) I don't see the connection between the section's introductory text
+>and the content of the subsections.
+>
+>Sec 5.7.1:
+>(C) I think this section needs something introductory to say why it is
+>relevant to the data model, i.e. what has it to do with provenance,
+>why is it useful in the context of provenance, why is it standardised
+>rather than application-specific?
+>(C) If my record of what occurred does not start with an empty
+>container, but one with contents, how do I say that the elements are
+>part of the container? Do I have to model this as a series of
+>wasAddedTo links, even if I know nothing about how the elements were
+>added? Or is it out of scope of the standard?
+>
+>Sec 5.7.2:
+>(C) I don't see how wasQuoteOf is a sub-relation of wasRevisionOf, or
+>wasAttributedTo a sub-relation of wasEventuallyDerivedFrom, when the
+>super-relations do not contain reference to any agents but the
+>sub-relations do. What does it mean?
+>(T) Last sentence of 5.7.2.2: "wasQuoteOf" should be "wasAttributedTo"
+>
+>Thanks,
+>Simon
+>
+>-- Dr Simon Miles, Lecturer, Department of Informatics, King's College London, WC2R 2LS, UK, +44 (0)20 7848 1166
+>