prov: changeset 3023:4981ab5bac69

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/wd6-Graham.txt	Mon May 28 17:13:53 2012 +0100
@@ -0,0 +1,415 @@
+  >   > On 25/05/2012 11:16, Luc Moreau wrote:
+  >   > > Hi Graham,
+  >   > >
+  >   > > I have produced an updated version of the prov-dm document for
+  >   > > you to go through.
+  >   > >
+  >   > > http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120525/prov-dm.html
+  >   > 
+  >   > 
+  > 
+  > I think we have the makings of a useful, compact orientation in
+  > section 2.1. Naturally, I have a number of comments, but they are
+  > increasingly to do with style rather than substance.
+  > 
+  > In my original proposal for reorganization, I suggested moving a
+  > number of sections into the separate CORE part of the specification.
+  > Your reorganization does not do this, but outlines the key concepts
+  > early on.  I'm fine with this, but as the definitions are provided
+  > later in the document I see no point in also including them in section
+  > 2.  So my proposals focus more on explaining how the concepts work
+  > together and not repeating the actual definitions.
+  > 
+  > As I reflect on what I've read, I think it might be worth linking each
+  > of the core structure concepts to the corresponding subsection in
+  > section 5.  This would provide a quick-and-easy route from the
+  > structural overview to the corresponding details.
+  > 
+  > Detailed comments follow.
+  > 
+  > == Abstract ==
+  > 
+  > I'm not convinced that the component structure needs to be mentioned
+  > in the abstract.  I might re-arrange the first sentence to lead on the
+  > functionality provided, with something like:
+  > 
+  > [[ Provenance consists of information about entities, activities and
+  > people involved in producing a piece of data or thing, which can be used
+  > to form assessments about its quality, reliability or trustworthiness.
+  > PROV-DM is the conceptual data model that forms a basis for the W3C
+  > provenance (PROV) family of specifications.  ...  ]]
+  > 
+  > Otherwise it looks pretty reasonable.
+  > 
+  > 
+  > == Section 1 ==
+  > 
+  > Para 2: "We consider" -> "We present"
+  > 
+  > Para 3: "The PROV data model" - this is first use in the body of the
+  > text, and should be defined (what's "PROV"?).  Suggest the *previous*
+  > paragraph starts thus:
+  > 
+  > "We present the PROV data model, a generic data model for provenance..."
+  > 
+  > Para 3:  suggest:
+  > 
+  >   "core structures form the essence of provenance descriptions, and
+  >   are commonly found in various domain-specific vocabularies"
+  > 
+  > to read:
+  > 
+  >   "core structures form the essence of provenance descriptions, and
+  >   are commonly found in various domain-specific vocabularies that deal
+  >   with provenance or similar kinds of information".
+  > 
+  > (Examples, or informational references, could be added to back up this
+  > statement - precursor provenance models and CIDOC-CRM are examples I
+  > have used - OPM, OPMV, Provenir, PML all use broadly similar
+  > structures)
+  > 
+  > Para 4 and list: I would have the derivations component immediately
+  > follow on from entities and activities (or folded in with those).
+  > More detail later in discussion of core structures.
+  > 
+  > Para 5 and 6: I think these should be run together.  I find that para
+  > 5 on its own doesn't convey anything useful.  I would suggest even
+  > dropping para 5.
+  > 
+  > Para 6: I'm not sure that "enriching" quite captures the idea.  Also,
+  > "attributes and temporal information" are part of DM, not added by
+  > CONSTRAINTS.  Here's my proposal for para 6:
+  > 
+  > [[
+  > If something is changeable, then it is challenging to express its
+  > provenance precisely (e.g. the data from which a daily weather report
+  > is derived changes from day to day) so as to support reasoning about its
+  > correctness, trustworthiness, etc.  This is addressed in a companion
+  > specification [PROV-CONSTRAINTS] by proposing formal constraints on
+  > the way that provenance descriptions are related to the things they
+  > describe (such as the use of attributes, temporal information and
+  > specialization of entities), and additional conclusions that are valid
+  > to infer if those constraints are satisfied.  ]]
+  > 
+  > 
+  > == Section 2 ==
+  > 
+  > "catering for more advanced uses..." - I would suggest "catering for more specific uses...".
+  > 
+  > 
+  > == Section 2.1 ==
+  > 
+  > Para 1:  suggest replacement (trying to focus more on orienting the reader on the key ideas):
+  > 
+  > [[
+  > 
+  > At its core, provenance describes the use and production of
+  > /entities/ by /activities/, which may be controlled or influenced in
+  > various ways by /agents/.  These core types and their relationships
+  > are illustrated in Figure 1.  For a given artifact, its provenance can
+  > usually be seen as a "provenance trace" from one or more source
+  > entities via described activities.  Annotations associated with the
+  > activities provide key information for assessing the reliability and
+  > trustworthiness of the result.
+  > ]]
+  > 
+  > Figure 1 is a great improvement over previous incarnations, largely by
+  > virtue of the coloured boxes, but I think it could be more effective
+  > and appealing.  I attach a proposed alternative (graffle and png)
+  > which follows the style of diagrams used in the examples.
+  > 
+  > I think there's an inconsistency between the diagram (figure 1) and
+  > table (Table 2): relations on the diagram use values from the "Name"
+  > column of the table, but types use values from the "Concepts" column.
+  > 
+  > I think it's a little confusing that there are named "concepts" and
+  > (sometimes) different names for the types and relations.  This is
+  > behind my earlier comment suggesting that table 2 be moved to later
+  > in the document.
+  > 
+  > I would suggest that the diagram should use the same terms as are used
+  > in the rest of section 2, then those names can also be used to locate
+  > the corresponding sections in the reference part of the document.  In
+  > this arrangement, I think table 2 is redundant.
+  > 
+  > 
+  > == Section 2.1.1 ==
+  > 
+  > Rather than focusing on the definitions of terms, which is covered
+  > later, I would aim to cover here the key relationships.  In the case
+  > of entities and activities I think this is largely concerned with
+  > their inter-relationship.
+  > 
+  > Suggest:
+  > 
+  > [[
+  > Provenance describes /entities/, which are both generated and used by /activities/.
+  > 
+  > While the main anticipated use of provenance is to describe entities
+  > that are digital artifacts, it is not constrained from describing
+  > other kinds of thing. Thus, an entity may be a broad diversity of
+  > notions, including digital objects such as a file or web page,
+  > physical things such as a mountain, a building, a printed book, or a
+  > car as well as abstract concepts and ideas.
+  > 
+  > <skip entity definition: that's covered in section 5>
+  > 
+  > <skip entity example: that's already covered in the text (yours and mine)>
+  > 
+  > <skip activity definition: that's covered in section 5>
+  > 
+  > Activities are (time-bounded) processes that consume or generate
+  > entities; they are the mechanisms by which entities are created and
+  > used in the creation of further entities.  Just as entities cover a
+  > broad range of notions, activities can cover a broad range of
+  > processes, commonly related to information processing, but also
+  > covering broader notions like driving a car from Boston to Cambridge.
+  > 
+  > <example 2 (activities) here>
+  > 
+  > Provenance is concerned with activities that create a new state of
+  > affairs that can be described in terms of pre-existing entities, and
+  > new entities that exist as the result of the activities.  Thus we have
+  > two kinds of relationship between an activity and entities:
+  > 
+  > * Usage:
+  > 
+  >   is the relationship between an activity and the entities that it
+  >   uses, which must exist in order for the activity to complete.  Usage
+  >   is considered to occur when an activity starts using an entity; if
+  >   the entity does not exist at this time, usage cannot happen.
+  > 
+  > * Generation:
+  >   relates an activity to the entities that it creates, which
+  >   do not exist before the activity is started and do exist by the time
+  >   the activity completes.  Generation is considered to occur when the
+  >   entity is fully created, at which point it may be available for use
+  >   by other activities.
+  > 
+  > <example 3 here>
+  > 
+  > <example 4 here>
+  > 
+  > One might reasonably ask what entities are used and generated by
+  > driving a car from Boston to Cambridge.  This is answered by
+  > considering that a single physical (or digital) artifact may
+  > correspond to several entities; in this case a car in Boston may be a
+  > different entity from a car in Cambridge (which may in turn have
+  > implications for, say, taxation purposes).  Thus, among other things,
+  > an entity "car in Boston" would be used, and a new entity "car in
+  > Cambridge" would be generated by this activity of driving.  The
+  > provenance trace of our car might include: designed in Japan,
+  > manufactured in Korea, shipped to Boston USA, purchased by customer,
+  > driven to Cambridge, serviced by engineer in Cambridge, etc., all of
+  > which might be important information when deciding whether or not it
+  > represents a sensible second-hand purchase.  Or some of it might
+  > alternatively be relevant when trying to determine the truth of a web
+  > page reporting a traffic violation involving that car.  This breadth
+  > of provenance allows descriptions of interactions between physical and
+  > digital artifacts.
+  > 
+  > <I added a fair amount of explanatory text here, because I think that
+  > the whole issue of breadth of interpretation begs some explanation.>
+  > 
+  > Communication is the generation of an entity by an activity and its
+  > subsequent usage by another activity.
+  > 
+  > <skip definition - it just repeats and is covered later>
+  > 
+  > 
+  > <example 5 here; I might also add to this: the activity of purchasing a
+  > car in Boston could be informed by the activity of its being
+  > designed in Japan> ]]
+  > 
+  > 
+  > == After section 2.1.1 ==
+  > 
+  > I have proposed previously, and still feel, that the section on
+  > derivation should be part of section 2.1.1.  A compromise position
+  > that keeps it separate would be to introduce it immediately following
+  > section 2.1.1.  It is a natural part of the discussion of provenance
+  > traces, and is arguably one of the most significant uses of such traces
+  > (e.g. the weather report W was derived from meteorological datasets X,
+  > Y and Z; my Ford car was derived from a VW design).
+  > 
+  > Thus, following on from the proposed revised 2.1.1:
+  > 
+  > [[
+  > 
+  > Derivation is the generation of an entity that is affected by some
+  > other entity that is used directly or indirectly.  Derivation covers
+  > common information processing activities like transforming data,
+  > editing a document, and also extends more broadly to a canvas used for
+  > creating a painting, transporting a work of art from London to New
+  > York, or melting ice to produce water.
+  > 
+  > While the basic idea is quite simple, the concept of derivation can be
+  > tricky: implicit is the notion that the generated object was affected
+  > in some way by the used object.  It is not sufficient for an artifact
+  > to be used by an activity that also generated a new artifact to say
+  > that the second artifact was derived from the first.  In the activity
+  > of creating a painting, an artist may have mixed some paint that was
+  > never actually applied to the canvas - the painting would typically
+  > not be considered a derivation from the unused paint.  The provenance
+  > model does not attempt to define what constitutes derivation; rather,
+  > it is considered to be something that is asserted, having been
+  > determined by unspecified means.
+  > 
+  > Thus, while a chain of usage and generation is necessary for a
+  > derivation relation between entities, it is not sufficient; some
+  > knowledge of the activities involved is also needed.  ]]
+  > 
+  > <Again, I've added a bit of text here, because I think it's part of the
+  > orientation that's needed to avoid misunderstandings like the one I
+  > exhibited in the last teleconference.>
+  > 
+  > 
+  > == Section 2.1.2 ==
+  > 
+  > I find the introduction of agents as having "responsibility" a bit
+  > bare, and doesn't really put it into a context of provenance usage.
+  > I'm also uneasy about describing software agents as having
+  > responsibility.
+  > 
+  > You say "An agent may be a particular type of entity or activity" -
+  > can an agent *really* be an activity?  While I would stop short of
+  > insisting it cannot, I'm not sure it helps to claim that it can be.  I
+  > had the idea that the reason that a plan is distinct from an agent is
+  > so that provenance-of-instruments-of-agents (e.g. software) can be
+  > handled without making the agents also be entities.
+  > 
+  > Rather than try to pick apart the existing text, I'll offer my suggestion:
+  > 
+  > [[
+  > 
+  > Provenance provides a basis for evaluating reliability or
+  > trustworthiness of an entity.  For many purposes, a key consideration
+  > for deciding whether something is reliable and/or trustworthy is
+  > knowing who or what was involved in its production.  Data published by
+  > a respected independent organization may be considered more
+  > trustworthy than that from a lobby organization; a claim by a
+  > well-known scientist with an established track record may be more
+  > believed than a claim by a new student; a calculation performed by an
+  > established software library may be more reliable than one performed by a one-off
+  > program.
+  > 
+  > In provenance terms, an /agent/ is a person or entity that can
+  > initiate, control or otherwise bear responsibility for an activity.
+  > 
+  > <example 6 here>
+  > 
+  > An /association/ of an activity with an agent indicates that the agent
+  > had some role in the activity.
+  > 
+  > <example 8 here>
+  > 
+  > An /attribution/ of an entity to an agent means that the entity was
+  > generated by some (possibly unknown) activity that was associated with
+  > the agent.
+  > 
+  > <example 7 here>
+  > 
+  > The provenance model provides these mechanisms to express information
+  > for reliability or trustworthiness decisions, but does not specify how
+  > any such decisions should be made.  ]]
+  > 
+  > .........
+  > 
+  > I'm going to review the rest of section 2 more quickly.  I won't necessarily try to suggest alternatives.
+  > 
+  > == Section 2.2 ==
+  > 
+  > Section 2.2 structure feels a bit contrived to me ... it deals with
+  > extension mechanisms (2.2.1) and some new concepts (2.2.2 and 2.2.3).
+  > 
+  > My inclination would be to present:
+  >   2.2 Additional structures
+  >   2.2.1 Bundle
+  >   2.2.2 Collections
+  >   2.3 Extension mechanisms
+  >   2.3.1 Subtyping
+  >   2.3.2 Multi-way relations
+  >   2.3.3 Optional identification and new relations
+  > 
+  > My further comments use the current section numbering...
+  > 
+  > == 2.2.1.1 ==
+  > 
+  > The styling here makes the examples look like definitions... which if
+  > nothing else is a hostage to fortune (could get out of phase with the
+  > real definition).
+  > 
+  > (Suggest dropping the "defined as ..." and linking the example to the actual definition.)
+  > 
+  > 
+  > == 2.2.1.2 ==
+  > 
+  > Para 1: the description here is highly technical, and doesn't really
+  > give an indication why a user/developer would care.  I'd rather just
+  > say something along the lines of wanting to express more information
+  > than is conveniently captured by a simple relation.
+  > 
+  > The text seems very long-winded.  I think the entire useful content
+  > could be captured in a couple of sentences plus the example.  E.g.
+  > 
+  > [[
+  > 
+  > Association (@@link section 2.1.2) can express a relationship between
+  > a software agent and an activity, but in its basic form cannot indicate
+  > what software is being used by the agent.  An extended form of
+  > association (@@link section 5.2.3) also specifies a /plan/, which is
+  > an entity representing a set of actions or steps intended by one or
+  > more agents to achieve some goals, such as the software that is
+  > executed by a software agent.
+  > 
+  > ]]
+  > 
+  > (Why is the agent optional in section 5.2.3?)
+  > 
+  > 
+  > == 2.2.1.3 ==
+  > 
+  > The section title doesn't make any sense to me.  The "New relations"
+  > bit is particularly confusing.
+  > 
+  > From the description, it looks to me like reification of a relation so
+  > that arbitrary further information can be added to an instance.
+  > 
+  > I think a title like "Identifying relation instances" might be a
+  > little clearer.
+  > 
+  > It's not clear to me whether *any* relation can be reified in this
+  > way, or whether there are some for which the optional id parameter is not
+  > allowed.
+  > 
+  > Does use of this structure map 1:1 to use of qualified relations in
+  > the ontology?  I think not.
+  > 
+  > 
+  > == 2.2.2 ==
+  > 
+  > What exactly is a "provenance description"?  I don't think that's been introduced yet.
+  > 
+  > Maybe a sentence at the top of section 2 would help; e.g.
+  > [[
+  > A /provenance description/ is a set of assertions based on the core and extended provenance structures described below.
+  > ]]
+  > 
+  > I see the term "provenance description" is used throughout the
+  > document, but I see no definition.  Many of the uses are fine - being
+  > descriptive, but in this case (and maybe others) it's being used as a
+  > specific term in a definition, so I think it needs to be clarified.
+  > 
+  > As it is, if the above description is correct, a bundle should be
+  > described as a named provenance description (not a set of
+  > descriptions).
+  > 
+  > ....
+  > 
+  > That's my review to the end of section 2, which is where I planned to
+  > focus my most intense efforts.  I'll continue to look through the rest
+  > of the document in the time remaining today, but I'm going to send
+  > this off to you now.
+  > 
+  > #g
+--
author	Luc Moreau <l.moreau@ecs.soton.ac.uk>
	Mon, 28 May 2012 17:13:53 +0100
changeset 3023	4981ab5bac69
parent 3022	c5da5fd3c8af
child 3024	454b2a25df6e