--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/wd6-Graham.txt Mon May 28 17:13:53 2012 +0100
@@ -0,0 +1,415 @@
+ > > On 25/05/2012 11:16, Luc Moreau wrote:
+ > > > Hi Graham,
+ > > >
+ > > > I have produced an updated version of the prov-dm document for
+ > > > you to go through.
+ > > >
+ > > > http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120525/prov-dm.html
+ > >
+ > >
+ >
+ > I think we have the makings of a useful, compact orientation in
+ > section 2.1. Naturally, I have a number of comments, but they are
+ > increasingly stylistic rather than substantive.
+ >
+ > In my original proposal for reorganization, I suggested moving a
+ > number of sections into the separate CORE part of the specification.
+ > Your reorganization does not do this, but outlines the key concepts
+ > early on. I'm fine with this, but as the definitions are provided
+ > later in the document I see no point in also including them in section
+ > 2. So my proposals focus more on explaining how the concepts work
+ > together and not repeating the actual definitions.
+ >
+ > As I reflect on what I've read, I think it might be worth linking each
+ > of the core structure concepts to the corresponding subsection in
+ > section 5. This would provide a quick-and-easy route from the
+ > structural overview to the corresponding details.
+ >
+ > Detailed comments follow.
+ >
+ > == Abstract ==
+ >
+ > I'm not convinced that the component structure needs to be mentioned
+ > in the abstract. I might re-arrange the first sentence to lead on the
+ > functionality provided, with something like:
+ >
+ > [[ Provenance consists of information about entities, activities and
+ > people involved in producing a piece of data or thing, which can be used
+ > to form assessments about its quality, reliability or trustworthiness.
+ > PROV-DM is the conceptual data model that forms a basis for the W3C
+ > provenance (PROV) family of specifications. ... ]]
+ >
+ > Otherwise it looks pretty reasonable.
+ >
+ >
+ > == Section 1 ==
+ >
+ > Para 2: "We consider" -> "We present"
+ >
+ > Para 3: "The PROV data model" - this is first use in the body of the
+ > text, and should be defined (what's "PROV"?). Suggest the *previous*
+ > paragraph starts thus:
+ >
+ > "We present the PROV data model, a generic data model for provenance..."
+ >
+ > Para 3: suggest:
+ >
+ > "core structures form the essence of provenance descriptions, and
+ > are commonly found in various domain-specific vocabularies"
+ >
+ > to read:
+ >
+ > "core structures form the essence of provenance descriptions, and
+ > are commonly found in various domain-specific vocabularies that deal
+ > with provenance or similar kinds of information".
+ >
+ > (Examples, or informational references, could be added to back up this
+ > statement - precursor provenance models and CIDOC-CRM are examples I
+ > have used - OPM, OPMV, Provenir, PML all use broadly similar
+ > structures)
+ >
+ > Para 4 and list: I would have the derivations component immediately
+ > follow on from entities and activities (or folded in with those).
+ > More detail later in discussion of core structures.
+ >
+ > Para 5 and 6: I think these should be run together. I find that para
+ > 5 on its own doesn't convey anything useful. I would suggest even
+ > dropping para 5.
+ >
+ > Para 6: I'm not sure that "enriching" quite captures the idea. Also,
+ > "attributes and temporal information" are part of DM, not added by
+ > CONSTRAINTS. Here's my proposal for para 6:
+ >
+ > [[
+ > If something is changeable (e.g. the data from which a daily weather
+ > report is derived changes from day to day), then it is challenging to
+ > express its provenance precisely enough to support reasoning about its
+ > correctness, trustworthiness, etc. This is addressed in a companion
+ > specification [PROV-CONSTRAINTS] by proposing formal constraints on
+ > the way that provenance descriptions are related to the things they
+ > describe (such as the use of attributes, temporal information and
+ > specialization of entities), and additional conclusions that are valid
+ > to infer if those constraints are satisfied. ]]
+ >
+ >
+ > == Section 2 ==
+ >
+ > "catering for more advanced uses..." - I would suggest "catering for more specific uses...".
+ >
+ >
+ > == Section 2.1 ==
+ >
+ > Para 1: suggest replacement (trying to focus more on orienting the reader on the key ideas):
+ >
+ > [[
+ >
+ > At its core, provenance describes the use and production of
+ > /entities/ by /activities/, which may be controlled or influenced in
+ > various ways by /agents/. These core types and their relationships
+ > are illustrated in Figure 1. For a given artifact, its provenance can
+ > usually be seen as a "provenance trace" from one or more source
+ > entities via described activities. Annotations associated with the
+ > activities provide key information for assessing the reliability and
+ > trustworthiness of the result.
+ > ]]
+ >
+ > Figure 1 is a great improvement over previous incarnations, largely by
+ > virtue of the coloured boxes, but I think it could be more effective
+ > and appealing. I attach a proposed alternative (graffle and png)
+ > which follows the style of diagrams used in the examples.
+ >
+ > I think there's an inconsistency between the diagram (figure 1) and
+ > table (Table 2): relations on the diagram use values from the "Name"
+ > column of the table, but types use values from the "Concepts" column.
+ >
+ > I think it's a little confusing that there are named "concepts" and
+ > (sometimes) different names for the types and relations. This is
+ > behind my earlier comment suggesting that table 2 be moved to later
+ > in the document.
+ >
+ > I would suggest that the diagram should use the same terms as are used
+ > in the rest of section 2, then those names can also be used to locate
+ > the corresponding sections in the reference part of the document. In
+ > this arrangement, I think table 2 is redundant.
+ >
+ >
+ > == Section 2.1.1 ==
+ >
+ > Rather than focusing on the definitions of terms, which is covered
+ > later, I would aim to cover here the key relationships. In the case
+ > of entities and activities I think this is largely concerned with
+ > their inter-relationship.
+ >
+ > Suggest:
+ >
+ > [[
+ > Provenance describes /entities/, which are both generated and used by /activities/.
+ >
+ > While the main anticipated use of provenance is to describe entities
+ > that are digital artifacts, it is not constrained from describing
+ > other kinds of thing. Thus, an entity may be a broad diversity of
+ > notions, including digital objects such as a file or web page,
+ > physical things such as a mountain, a building, a printed book, or a
+ > car as well as abstract concepts and ideas.
+ >
+ > <skip entity definition: that's covered in section 5>
+ >
+ > <skip entity example: that's already covered in the text (yours and mine)>
+ >
+ > <skip activity definition: that's covered in section 5>
+ >
+ > Activities are (time-bounded) processes that consume or generate
+ > entities; they are the mechanisms by which entities are created and
+ > used in the creation of further entities. Just as entities cover a
+ > broad range of notions, activities can cover a broad range of
+ > processes, commonly related to information processing, but also
+ > covering broader notions like driving a car from Boston to Cambridge.
+ >
+ > <example 2 (activities) here>
+ >
+ > Provenance is concerned with activities that create a new state of
+ > affairs that can be described in terms of pre-existing entities, and
+ > new entities that exist as the result of the activities. Thus we have
+ > two kinds of relationship between an activity and entities:
+ >
+ > * Usage:
+ >
+ > is the relationship between an activity and the entities that it
+ > uses, which must exist in order for the activity to complete. Usage
+ > is considered to occur when an activity starts using an entity; if
+ > the entity does not exist at this time, usage cannot happen.
+ >
+ > * Generation:
+ >
+ > is the relationship between an activity and the entities that it
+ > creates, which do not exist before the activity is started and do
+ > exist by the time the activity completes. Generation is considered to
+ > occur when the entity is fully created, at which point it may be
+ > available for use by other activities.
+ >
+ > <example 3 here>
+ >
+ > <example 4 here>
+ >
+ > One might reasonably ask what entities are used and generated by
+ > driving a car from Boston to Cambridge. This is answered by
+ > considering that a single physical (or digital) artifact may
+ > correspond to several entities; in this case a car in Boston may be a
+ > different entity from a car in Cambridge (which may in turn have
+ > implications for, say, taxation purposes). Thus, among other things,
+ > an entity "car in Boston" would be used, and a new entity "car in
+ > Cambridge" would be generated by this activity of driving. The
+ > provenance trace of our car might include: designed in Japan,
+ > manufactured in Korea, shipped to Boston USA, purchased by customer,
+ > driven to Cambridge, serviced by engineer in Cambridge, etc., all of
+ > which might be important information when deciding whether or not it
+ > represents a sensible second-hand purchase. Or some of it might
+ > alternatively be relevant when trying to determine the truth of a web
+ > page reporting a traffic violation involving that car. This breadth
+ > of provenance allows descriptions of interactions between physical and
+ > digital artifacts.
+ >
+ > <I added a fair amount of explanatory text here, because I think that
+ > the whole issue of breadth of interpretation begs some explanation.>
+ >
+ > Communication is the generation of an entity by an activity and its
+ > subsequent usage by another activity.
+ >
+ > <skip definition - it just repeats and is covered later>
+ >
+ >
+ > <example 5 here; I might also add to this: the activity of purchasing a
+ > car in Boston could be informed by the activity of its being
+ > designed in Japan> ]]
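+ >
+ > To check my own reading of the timing rules for usage and generation,
+ > and of communication as "generated by one activity, then used by
+ > another", here is a minimal Python sketch. The class and method names
+ > and the integer timestamps are my own assumptions for illustration, not
+ > PROV-DM or PROV-N syntax. I have also assumed that an entity with no
+ > recorded generation pre-exists the trace, which seems the natural
+ > reading when a trace is incomplete.
+ >

```python
# Minimal sketch (hypothetical names, not PROV-DM/PROV-N syntax) of usage,
# generation and communication, using integer timestamps for simplicity.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Activity:
    id: str
    start: int
    end: int


@dataclass
class Trace:
    # entity id -> (id of the generating activity, generation time)
    generated: dict = field(default_factory=dict)
    # (id of the using activity, entity id, usage time)
    used: list = field(default_factory=list)

    def generation(self, activity: Activity, entity: str, time: int) -> None:
        # A generated entity did not exist before, and is created while the activity runs.
        if entity in self.generated:
            raise ValueError(f"{entity} has already been generated")
        if not activity.start <= time <= activity.end:
            raise ValueError(f"{entity} generated outside {activity.id}")
        self.generated[entity] = (activity.id, time)

    def usage(self, activity: Activity, entity: str, time: int) -> None:
        # Usage occurs when the activity starts using the entity, so the entity
        # must exist by then (no recorded generation = assumed to pre-exist).
        if not activity.start <= time <= activity.end:
            raise ValueError(f"{entity} used outside {activity.id}")
        generation = self.generated.get(entity)
        if generation is not None and generation[1] > time:
            raise ValueError(f"{entity} does not exist yet at time {time}")
        self.used.append((activity.id, entity, time))

    def informed_by(self, informed: str, informant: str) -> bool:
        # Communication: some entity generated by `informant` is used by `informed`.
        return any(
            activity_id == informed
            and self.generated.get(entity, (None,))[0] == informant
            for activity_id, entity, _ in self.used
        )


# The car example: shipping generates "car in Boston"; driving uses it and
# generates a distinct entity "car in Cambridge".
ship = Activity("ship-to-boston", start=10, end=14)
drive = Activity("drive-boston-to-cambridge", start=20, end=21)
trace = Trace()
trace.generation(ship, "car-in-boston", 14)
trace.usage(drive, "car-in-boston", 20)
trace.generation(drive, "car-in-cambridge", 21)
print(trace.informed_by("drive-boston-to-cambridge", "ship-to-boston"))  # True

# A service that starts before the car reaches Cambridge cannot use "car in Cambridge".
service = Activity("service-in-cambridge", start=18, end=19)
try:
    trace.usage(service, "car-in-cambridge", 18)
except ValueError as err:
    print(err)  # car-in-cambridge does not exist yet at time 18
```

+ > The sketch treats "car in Boston" and "car in Cambridge" as two
+ > entities for the same physical car, as in the proposed text, and makes
+ > no attempt to link them to each other.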
+ >
+ >
+ > == After section 2.1.1 ==
+ >
+ > I have proposed previously, and still feel, that the section on
+ > derivation should be part of section 2.1.1. A compromise position
+ > that keeps it separate would be to introduce it immediately following
+ > section 2.1.1. It is a natural part of the discussion of provenance
+ > traces, and is arguably one of the most significant uses of such traces
+ > (e.g. the weather report W was derived from meteorological datasets X,
+ > Y and Z; my Ford car was derived from a VW design).
+ >
+ > Thus, following on from the proposed revised 2.1.1:
+ >
+ > [[
+ >
+ > Derivation is the generation of an entity that is affected by some
+ > other entity that is used directly or indirectly. Derivation covers
+ > common information processing activities like transforming data,
+ > editing a document, and also extends more broadly to a canvas used for
+ > creating a painting, transporting a work of art from London to New
+ > York, or melting ice to produce water.
+ >
+ > While the basic idea is quite simple, the concept of derivation can be
+ > tricky: implicit is the notion that the generated object was affected
+ > in some way by the used object. The fact that an artifact was used by
+ > an activity which also generated a new artifact is not sufficient to
+ > say that the second artifact was derived from the first. In the activity
+ > of creating a painting, an artist may have mixed some paint that was
+ > never actually applied to the canvas - the painting would typically
+ > not be considered a derivation from the unused paint. The provenance
+ > model does not attempt to define what constitutes derivation; rather,
+ > it is considered to be something that is asserted, having been
+ > determined by unspecified means.
+ >
+ > Thus, while a chain of usage and generation is necessary for a
+ > derivation relation between entities, it is not sufficient; some
+ > knowledge of the activities involved is also needed. ]]
+ >
+ > <Again, I've added a bit of text here, because I think it's part of the
+ > orientation that's needed to avoid misunderstandings like the one I
+ > exhibited in the last teleconference.>
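+ >
+ > To illustrate the "necessary but not sufficient" point, here is a
+ > minimal Python sketch of the painting example. The activity and entity
+ > names and the dictionary layout are hypothetical. Derivations are held
+ > as assertions; a usage/generation chain, direct or indirect, is only
+ > checked as a precondition and is never used to infer a derivation.
+ >

```python
# Minimal sketch (hypothetical names and data layout): derivations are
# asserted; a usage/generation chain is necessary but not sufficient.
from collections import deque

# activity id -> (entities used, entities generated)
activities = {
    "stretch-canvas": ({"linen", "frame"}, {"canvas"}),
    "paint-picture": ({"canvas", "red-paint", "blue-paint"}, {"painting"}),
}

# The artist mixed blue paint but never applied it, so no derivation from it is asserted.
asserted_derivations = {
    ("painting", "canvas"),
    ("painting", "red-paint"),
    ("canvas", "linen"),
}


def has_chain(generated: str, source: str) -> bool:
    """True if `source` reaches `generated` via usage and generation, directly or indirectly."""
    frontier, seen = deque([generated]), {generated}
    while frontier:
        entity = frontier.popleft()
        for used, made in activities.values():
            if entity in made:
                # Walk backwards from a generated entity to the entities its activity used.
                for inp in used - seen:
                    if inp == source:
                        return True
                    seen.add(inp)
                    frontier.append(inp)
    return False


def check_derivations() -> None:
    # An asserted derivation without any supporting chain is inconsistent.
    for generated, source in asserted_derivations:
        if not has_chain(generated, source):
            raise ValueError(f"{generated} cannot be derived from {source}")


check_derivations()
# The chain exists for the unused paint, yet no derivation follows from it.
print(has_chain("painting", "blue-paint"))                 # True
print(("painting", "blue-paint") in asserted_derivations)  # False
```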
+ >
+ >
+ > == Section 2.1.2 ==
+ >
+ > I find the introduction of agents as having "responsibility" a bit
+ > bare, and it doesn't really place agents in the context of provenance usage.
+ > I'm also uneasy about describing software agents as having
+ > responsibility.
+ >
+ > You say "An agent may be a particular type of entity or activity" -
+ > can an agent *really* be an activity? While I would stop short of
+ > insisting it cannot, I'm not sure it helps to claim that it can be. I
+ > had the idea that the reason that a plan is distinct from an agent is
+ > so that provenance-of-instruments-of-agents (e.g. software) can be
+ > handled without making the agents also be entities.
+ >
+ > Rather than try to pick apart the existing text, I'll offer my suggestion:
+ >
+ > [[
+ >
+ > Provenance provides a basis for evaluating reliability or
+ > trustworthiness of an entity. For many purposes, a key consideration
+ > for deciding whether something is reliable and/or trustworthy is
+ > knowing who or what was involved in its production. Data published by
+ > a respected independent organization may be considered more
+ > trustworthy than that from a lobby organization; a claim by a
+ > well-known scientist with an established track record may be more
+ > readily believed than a claim by a new student; a calculation performed
+ > by an established software library may be more reliable than one
+ > performed by a one-off program.
+ >
+ > In provenance terms, an /agent/ is a person or entity that can
+ > initiate, control or otherwise bear responsibility for an activity.
+ >
+ > <example 6 here>
+ >
+ > An /association/ of an activity with an agent indicates that the agent
+ > had some role in the activity.
+ >
+ > <example 8 here>
+ >
+ > An /attribution/ of an entity to an agent means that the entity was
+ > generated by some (possibly unknown) activity that was associated with
+ > the agent.
+ >
+ > <example 7 here>
+ >
+ > The provenance model provides these mechanisms to express information
+ > for reliability or trustworthiness decisions, but does not specify how
+ > any such decisions should be made. ]]
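+ >
+ > To make the proposed reading of attribution concrete, here is a minimal
+ > Python sketch. All names are hypothetical, and the "_:" prefix for the
+ > unknown activity is borrowed from blank-node conventions purely for
+ > illustration. It expands an attribution into a generation by a
+ > placeholder activity plus an association of that activity with the
+ > agent, and deliberately says nothing about how trust is then assessed.
+ >

```python
# Minimal sketch (hypothetical names): attribution read as "generated by some,
# possibly unknown, activity that was associated with the agent".
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    entity: str
    activity: str


@dataclass(frozen=True)
class Association:
    activity: str
    agent: str


_placeholder_ids = itertools.count(1)


def expand_attribution(entity: str, agent: str) -> tuple:
    """Return a generation and an association linked by a placeholder activity."""
    # The activity is unknown, so mint a fresh placeholder id (assumed blank-node style).
    activity = f"_:activity{next(_placeholder_ids)}"
    return Generation(entity, activity), Association(activity, agent)


generation, association = expand_attribution("weather-report", "met-service")
print(generation)   # Generation(entity='weather-report', activity='_:activity1')
print(association)  # Association(activity='_:activity1', agent='met-service')
```

+ > Each call mints a distinct placeholder, so two attributions of the same
+ > entity are not assumed to share an activity; whether they do is left
+ > open, as in the proposed text.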
+ >
+ > .........
+ >
+ > I'm going to review the rest of section 2 more quickly. I won't necessarily try to suggest alternatives.
+ >
+ > == Section 2.2 ==
+ >
+ > The structure of section 2.2 feels a bit contrived to me: it deals with
+ > extension mechanisms (2.2.1) and some new concepts (2.2.2 and 2.2.3).
+ >
+ > My inclination would be to present:
+ > 2.2 Additional structures
+ > 2.2.1 Bundle
+ > 2.2.2 Collections
+ > 2.3 Extension mechanisms
+ > 2.3.1 Subtyping
+ > 2.3.2 Multi-way relations
+ > 2.3.3 Optional identification and new relations
+ >
+ > My further comments use the current section numbering...
+ >
+ > == 2.2.1.1 ==
+ >
+ > The styling here makes the examples look like definitions... which if
+ > nothing else is a hostage to fortune (could get out of phase with the
+ > real definition).
+ >
+ > (Suggest dropping the "defined as ..." and linking the example to the actual definition.)
+ >
+ >
+ > == 2.2.1.2 ==
+ >
+ > Para 1: the description here is highly technical, and doesn't really
+ > give an indication why a user/developer would care. I'd rather just
+ > say something along the lines of wanting to express more information
+ > than is conveniently captured by a simple relation.
+ >
+ > The text seems very long-winded. I think the entire useful content
+ > could be captured in a couple of sentences plus the example. E.g.
+ >
+ > [[
+ >
+ > Association (@@link section 2.1.2) can express a relationship between
+ > a software agent and an activity, but in its basic form cannot indicate
+ > what software is being used by the agent. An extended form of
+ > association (@@link section 5.2.3) also specifies a /plan/, which is
+ > an entity representing a set of actions or steps intended by one or
+ > more agents to achieve some goals, such as the software that is
+ > executed by a software agent.
+ >
+ > ]]
+ >
+ > (Why is the agent optional in section 5.2.3?)
+ >
+ >
+ > == 2.2.1.3 ==
+ >
+ > The section title doesn't make any sense to me. The "New relations"
+ > bit is particularly confusing.
+ >
+ > From the description, it looks to me like reification of a relation so
+ > that arbitrary further information can be added to an instance.
+ >
+ > I think a title like "Identifying relation instances" might be a
+ > little clearer.
+ >
+ > It's not clear to me whether *any* relation can be reified in this
+ > way, or whether there are some for which the optional id parameter is
+ > not allowed.
+ >
+ > Does use of this structure map 1:1 to use of qualified relations in
+ > the ontology? I think not.
+ >
+ >
+ > == 2.2.2 ==
+ >
+ > What exactly is a "provenance description"? I don't think that's been introduced yet.
+ >
+ > Maybe a sentence at the top of section 2 would help; e.g.
+ > [[
+ > A /provenance description/ is a set of assertions based on the core and extended provenance structures described below.
+ > ]]
+ >
+ > I see the term "provenance description" is used throughout the
+ > document, but I see no definition. Many of the uses are fine - being
+ > descriptive, but in this case (and maybe others) it's being used as a
+ > specific term in a definition, so I think it needs to be clarified.
+ >
+ > As it is, if the above description is correct, a bundle should be
+ > described as a named provenance description (not a set of
+ > descriptions).
+ >
+ > ....
+ >
+ > That's my review to the end of section 2, which is where I planned to
+ > focus my most intense efforts. I'll continue to look through the rest
+ > of the document in the time remaining today, but I'm going to send
+ > this off to you now.
+ >
+ > #g
+--