--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/issue-274-daniel.txt Wed Feb 29 06:37:26 2012 +0100
@@ -0,0 +1,89 @@
+ > Hi all,
+ > here are my comments after reading part 1:
+ >
+ > Objectives:
+ >
+ > - decide whether the new documents are in line with the simplification
+ > objective; recommend whether they become the new editor's draft.
+ >
+ > ---> YES, it is much simpler and easier to read now. I would
+ > take it as the new editor's draft.
+ >
+ >
+ > - Decide whether ISSUE-145, ISSUE-183, ISSUE-215, ISSUE-225 and
+ > ISSUE-234 (all relating to identifiers) can be closed
+ >
+ > ---> 145: No accounts anymore, just bundles (or AccountEntity),
+ > so it could be closed.
+ > ---> 215: It has to do with the distinction between records,
+ > accounts and minting ids. Since we don't have records and accounts, then
+ > the issue could be closed.
+ > ---> 225: All objects in the universe of discourse have been
+ > clarified. Can be closed.
+ > ---> 234: The term "record" has been dropped. Therefore, this can
+ > be closed.
+ >
+ > ***Comments from
+ > http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html
+ > ***
+ >
+ > - Does the "Hide ASN" button actually do anything?
+ >
+ > 2.3
+ > - AccountEntity? I thought it was Bundle, but ok.
+ >
+ > -Three types of agents are recognized by PROV-DM because they are commonly
+ > encountered in applications making data and documents available on the Web:
+ > persons, software agents, and organizations.--> Wasn't software supposed to
+ > be system/computingSystem?
+ >
+ > 2.5: there are arrows missing: Activity wasStartedBy Activity. Entity:
+ > alternateOf, specializationOf
+ >
+ > 3.1: It would be helpful to see the properties labelled in the figure.
+ >
+ > 3.2: Here I would suggest to simplify the figure (leave just 2 authors (as
+ > in the example), or the editors), and label the edges as well.
+ >
+ > 3.3: Ah finally a reference to metadata provenance :) This is what Kai and
+ > some of the DC community were asking for.
+ >
+ > 4.1.2: "In contrast, an activity is something that happens, unfolds or
+ > develops through time, but is typically not identifiable by the
+ > characteristics it exhibits at any point during its duration". What about
+ > the activity's ID? Why isn't that enough to characterize the activity
+ > sufficiently for it to become an entity or an agent?
+ >
+ > 4.2: wasStartedBy between activities is missing in the table. In fact I
+ > haven't seen wasStartedBy between activities in the doc. It certainly was
+ > an overloaded property in the WD4. Has it been removed?
+ >
+ > 4.2.1.2:There is a note that refers to Usage record's id. It should be just
+ > usage.
+ >
+ > 4.2.3.2: I got the feeling from discussions on the mailing list that we
+ > were going to reduce one of the derivation types (Imprecise-1 derivation).
+ > Am I wrong?
+ >
+ > 4.3.3.5: I don't understand how a path in a computer or a row and a column
+ > are a geographic place.
+ >
+ > 5.5: Example missing
+ >
+ > 5.7: Example missing.
+ >
+ > 5.8: If collections are just a kind of entity and they have their custom
+ > relationships (afterInsertion, afterRemoval), would it make sense to
+ > separate them from the core? (In a profile, best practice or example of
+ > extensibility)
+ >
+ > *********
+ > - One question that came into my mind when reading the model: How would I
+ > model a usage that lasted for 20 min? (Right now we only have the beginning
+ > of the usage). Example: My activity uses 2 files. The first one is parsed
+ > for 20 mins and the other one instantly, and I want to model this with DM.
+ > Unless I create 2 activities (which is not what happened) I don't see how.
+ >
+ > Thanks,
+ > Daniel
+ >
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/issue-274-eric.txt Wed Feb 29 06:37:26 2012 +0100
@@ -0,0 +1,60 @@
+ > My apologies for being so late on providing reviewer feedback.
+ >
+ > Overall I enjoyed the PROV-DM document, I felt that the authors have
+ > done an incredible job helping readers easily relate concepts in the
+ > data model. Here are my comments and suggestions.
+ >
+ > Eric
+ >
+ > ~~~
+ >
+ > Introduction
+ >
+ > I agreed with the discussion thread on changes to the introduction
+ > that introduced the purpose of the data model to describe provenance
+ > in natural language.
+ >
+ > Section 2.3 – I have mixed feelings about bringing out these concepts;
+ > they don’t tie into the example and collections aren’t mentioned again
+ > until section 5.8. While they are important, could this
+ > section perhaps be left out of section 2?
+ >
+ > Section 3 Example
+ >
+ > Prior to the auditor example, could an ultra simple example be added, debuting an
+ > agent, process and entity, something like “w3:Consortium publishes a
+ > technical report”?
+ >
+ > I’m wondering if the detailed auditor provenance example could be
+ > introduced first in a human readable story format prior to the
+ > bulleted list that highlights the specific provenance related
+ > concepts.
+ >
+ > In the example, the use of the somewhat cryptic working draft names
+ > “tr:WD-prov-dm-20111215” “tr:WD-prov-dm-20111018” is a bit difficult
+ > to follow because I found myself mentally parsing the document
+ > names to keep track of the different documents. While this might be
+ > less realistic something like model-rev1.html, model-rev2.html might
+ > illustrate the same ideas.
+ >
+ > I am wondering if it might be more intuitive if the provenance graphic
+ > illustration preceded the PROV-ASN notation. It provides a graphic
+ > that a person can study as they study the PROV-DM assertions in
+ > PROV-ASN notation.
+ >
+ > The graphic illustration seems to capture all the examples of
+ > provenance from the bulleted list while the PROV-DM assertions in
+ > PROV-ASN seem to be incomplete (there isn’t a one-to-one
+ > correspondence to follow from the example to the PROV-DM assertions).
+ >
+ > 3.2 Great job bringing in the concept of viewing other perspectives
+ > on the same example.
+ >
+ > 4.2 Activity names in the table need updating.
+ >
+ > 4.3.3.5 prov:location – Could we change the wording slightly to say
+ > that Location is loosely based on ISO 19112 but can also refer to
+ > non-geographic places such as a directory or row/column? The specific
+ > definition from ISO19112 is location:
+ > identifiable geographic place EXAMPLE “Eiffel Tower”, “Madrid”, “California”
+ >
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/issue-274-graham.txt Wed Feb 29 06:37:26 2012 +0100
@@ -0,0 +1,430 @@
+ > I now realize I spent all morning reviewing the WRONG DOCUMENT :(
+ >
+ > I've now taken a quick look at
+ > http://dvcs.w3.org/hg/prov/raw-file/a5f7ff3d6b30/model/working-copy/towards-wd4.html
+ > - I think this does start to address some of the provenance complexity issues,
+ > but I also think many of the comments I made do still apply:
+ >
+ > Section 2: I think much of the material here could be in the core
+ > specification. But it's much easier to follow than the previous material. The
+ > diagram is less clear to me than the older diagram, but I think that's just a
+ > placeholder. If the overview text is retained, I think it might be helpful to
+ > have the overview diagram first.
+ >
+ > Section 3: I still find the example not-very-helpful at this point. It uses
+ > ASN expressions before they have been defined. I'd suggest having it as an
+ > appendix. I find the process vs authors view approach is confusing.
+ >
+ > Section 4: many of my previous comments (to previous section 5) are addressed
+ > here, but I still think Note/annotations is superfluous, and derivation is
+ > over-complicated. I'm not seeing the syntax distinguished symbol production
+ > (that used to be provenanceContainer). I think several of my previous comments
+ > about identifiers, attributes and qualified names still apply.
+ >
+ > Out of time - need to join telecon now.
+ >
+ > #g
+ > --
+ >
+ >
+ > On 23/02/2012 13:16, Graham Klyne wrote:
+ > > Reviewing:
+ > > http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
+ > >
+ > > Summary: I'm sorry to say that I don't think the document even starts to bring
+ > > in the kind of simplification discussed at the F2F meeting, which is required if
+ > > this spec is to gain traction with web developers.
+ > >
+ > > I find the document is still difficult to read, and in a full morning of
+ > > reviewing it I've only got as far as section 5. I think further *radical*
+ > > simplification is required for the data model description, and I think it's
+ > > possible without losing any essential information about the model.
+ > >
+ > > ...
+ > >
+ > > (Nit: when I load this document from a local copy of the repository, I get an
+ > > error reported indicating a problem with fetching the CSS. It loads OK from the
+ > > above URI. Is there a problematic relative URI reference in the source document?)
+ > >
+ > > ...
+ > >
+ > > I thought we'd agreed at F2F to provide a simple "scruffy" introduction to the
+ > > DM (part 1), then introduce the requirement and refinements for more formally
+ > > tractable provenance expressions that can be used to build accurate historical
+ > > records over multiple related artifacts (part 2). The document I'm reading does
+ > > very little that I can see to make the prov-dm more approachable, as was
+ > > indicated that we need to do at the F2F. As far as I can tell, the only thing
+ > > that has been done in this direction is to *add* a new section on interpretation.
+ > > This, of itself, does nothing to simplify the DM description.
+ > >
+ > > I think we should be placing far more emphasis on making it as simple as we
+ > > possibly can for information providers to publish provenance. Consider that the
+ > > primary beneficiaries of provenance information are the *consumers* of published
+ > > information, not the *publishers*, so if we make life unnecessarily hard for
+ > > publishers we're shooting ourselves in the collective foot. From this, I think
+ > > the initial introduction to the DM needs to be radically simplified to the
+ > > extent that a developer can spend 10-15 minutes glancing at it and think "oh
+ > > yes, I can easily add this to my output data". If necessary, we push some of the
+ > > work of understanding what needs to be done to harmonize the data to make it
+ > > more suitable for building a historical record towards the consumer.
+ > >
+ > > ...
+ > >
+ > > With this in mind:
+ > >
+ > > Section 2:
+ > >
+ > > The introductory material in section 2.1 is unhelpful, and I propose it be
+ > > removed from the introduction. Most of this material is not important until we
+ > > come to consider the more formal aspects of the DM. With the exception of
+ > > 2.1.2.1 about events, which I think should be introduced in the PROV-DM core
+ > > model section. Similarly sections 2.2 and 2.3 (maybe moving the two introductory
+ > > sentences of 2.2 into section 2.4). Thus section 2 would become just a very
+ > > brief intro to the notation used for describing ASN, and maybe this could be
+ > > moved into the PROV-DM core section (sect 5).
+ > >
+ > > Section 3 looks generally useful. But it still mentions an "account record",
+ > > which I understood was being dropped. It also mentions "alternateOf" and
+ > > "specializationOf" which are not necessary for a "scruffy" introduction to
+ > > provenance, so I suggest mention of these is dropped from here. I suggest
+ > > dropping the sentence about core and common relations - it's just noise. With
+ > > the removal of accounts, I think the whole purpose of notes/annotation records
+ > > *as part of the provenance model* has become moot, and suggest that these be
+ > > dropped from the spec. There's nothing to prevent annotations being added to the
+ > > provenance data as rdfs:comment or rdfs:label values. I suggest dropping the
+ > > mention of extensibility points: again, it's just noise at this point.
+ > >
+ > > Section 4: to my mind, this example section adds no useful information and
+ > > doesn't help understanding of the model (on account of being harder to follow than the
+ > > ASN model description), and suggest that it be dropped. Alternatively, I suggest
+ > > moving it to an appendix.
+ > >
+ > > Section 5: this is the vital core of this document. Section 3 provides a very
+ > > useful high-level overview, so this section can just get down to describing the
+ > > constructs.
+ > >
+ > > I note that ASN is mis-named: it's not really an *abstract* syntax notation;
+ > > it's quite concrete, so it's more like a (technology-neutral) functional syntax
+ > > notation. @@raise separate issue for this?
+ > >
+ > > Section 5.1: prov-dm is a data model, not an implementation, right? So why do we
+ > > need to introduce "housekeeping constructs ... to facilitate their interchange"?
+ > > Suggest dropping most of the discussion of "record container", and simply
+ > > introduce the "recordContainer" and "namespaceDeclaration" productions along
+ > > with production for "record".
+ > >
+ > >
+ > > Section 5.2.1: Entity record
+ > >
+ > > Suggest drop "In PROV-DM, " - it's redundant.
+ > >
+ > > Suggest the examples focus more on web documents, with "car" as more of an
+ > > afterthought. Primary use will probably be to describe web documents, so let's
+ > > keep this at front-of-mind?
+ > >
+ > > Suggest dropping all mentions of "asserters viewpoint" and "situation in the
+ > > world" - these don't matter for the "scruffy" view of provenance.
+ > >
+ > > Suggest dropping the idea that the attributes somehow define the entity ("whose
+ > > situation in the world is represented by the attribute-value pairs"). They're
+ > > just there to provide information about the entity, and as hooks for
+ > > interoperability. (I argued previously for dropping attributes completely, but
+ > > was persuaded otherwise by the interoperability argument from the provenance
+ > > challenges - don't try to make more of them.)
+ > >
+ > > Suggest drop issue mentioning "characterization interval" - I think it's now a
+ > > non-issue.
+ > >
+ > > I think the issue of uniqueness of identifiers should be dealt with in the
+ > > introduction to ASN, not under the individual elements.
+ > >
+ > > Under "further considerations", suggest dropping all but 3rd and 6th bullets. In
+ > > the 6th bullet, I don't understand the stuff about "a namespace also declares
+ > > the number of occurrences...". I have deep concern about what this might be
+ > > trying to say. In any case, shouldn't this be covered under a description of the
+ > > namespace, if needed?
+ > >
+ > > I think the material about "activities" and "plans" really doesn't belong in
+ > > this section.
+ > >
+ > >
+ > > Section 5.2.2 Activity record
+ > >
+ > > Suggest drop "In PROV-DM, " - it's redundant.
+ > >
+ > > Didn't we discuss replacing the start, end times by events? I don't recall the
+ > > outcome - I'm just mentioning this in case it's been missed.
+ > >
+ > > For the example, I suggest leading on something to do with information on the web.
+ > >
+ > > It was a surprise to me to learn that PROV-DM has reserved attributes. If
+ > > attributes are in the model to support interoperability with other provenance
+ > > frameworks (which is my understanding from previous discussions), this feels
+ > > like a poor design choice. Maybe it should be a separate parameter? In any case,
+ > > I think the intent of this "subtyping" needs to be explained.
+ > >
+ > > If this is to be a "scruffy" introduction, I think the reference to
+ > > start-view-end is not needed here. In any case, the cross-reference is almost
+ > > impossible to locate in a printed copy of the spec.
+ > >
+ > > I think the issue of uniqueness of identifiers should be dealt with in the
+ > > introduction to ASN, not under the individual elements.
+ > >
+ > > Suggest dropping the "further considerations bullets."
+ > >
+ > > Did we not agree that activities *would* be allowable as entities (especially if
+ > > entities are just stuff that can be identified)?
+ > >
+ > >
+ > > Section 5.2.3, Agent record
+ > >
+ > > Having introduced a framework for subtyping for activities, why not use the same
+ > > approach for different types of agents ... especially considering that two major
+ > > agent types are defined by reference to existing foaf definitions? I suggest not
+ > > asserting the claim that the agent types are mutually exclusive.
+ > >
+ > > Suggest drop reference to "situation in the world".
+ > >
+ > > Suggest drop discussion of inferences of agent records - if needed, they should
+ > > come later along with a more formal ("non-scruffy") treatment of the data model.
+ > >
+ > >
+ > > Section 5.2.4, Note record
+ > >
+ > > I think this should be dropped from the data model. I don't see that it serves
+ > > any needed *provenance* function. "extra information" can be added by
+ > > format-specific extensions. As such, this record type only adds noise to the
+ > > specification.
+ > >
+ > >
+ > > Section 5.3.1.1 generation record
+ > >
+ > > I believe the ASN syntax given verges on being ambiguous, and is unnecessarily
+ > > tricky to parse by a human or machine consumer; e.g. consider:
+ > >
+ > > wasGeneratedBy(a,b)
+ > > wasGeneratedBy(a,b,)
+ > >
+ > > The presence of the trailing comma in the second example completely changes the
+ > > parse tree productions associated with a and b. I think it would be much easier
+ > > if ASN simply required a dummy activity identifier to be provided; i.e. don't
+ > > make aidentifier optional. Indeed, rather than allowing optional identifiers
+ > > anywhere in the ASN, one might use a placeholder (e.g. '_') for any unspecified
+ > > identifier, which would make the overall syntax much more regular.
+ > >
+ > > Since the id is used only for annotations, I suggest dropping it (see section
+ > > 5.2.4 comment above).
+ > >
+ > > If this is to be a "scruffy" introduction, I think the reference to
+ > > generation-within-activity is not needed here. In any case, the cross-reference
+ > > is almost impossible to locate in a printed copy of the spec. Suggest drop this.
+ > >
+ > > Similarly, suggest dropping the structural constraint here.
+ > >
+ > >
+ > > Section 5.3.1.2 Usage record
+ > >
+ > > Suggest drop "In PROV-DM, " - it's redundant.
+ > >
+ > > Why is there an identifier for a usage record?
+ > >
+ > > Suggest lead with example of consuming a web resource.
+ > >
+ > > Suggest drop reference to annotation record (see above note about 5.2.4)
+ > >
+ > > Suggest drop reference to interpretation here
+ > >
+ > >
+ > > Section 5.3.2.1 Association record
+ > >
+ > > Para 3: Suggest drop first sentence, and simplify; i.e. just say; "Activities
+ > > may reflect the execution of a plan..."
+ > >
+ > > Para 4: there's quite a bit of redundancy redundancy here. Suggest:
+ > > [[
+ > > A plan is the description of a set of actions or steps intended by one or more
+ > > agents to achieve some goal. PROV-DM is not prescriptive about the nature of
+ > > plans, their representation, the actions and steps they consist of, and their
+ > > intended goals. A plan can be a workflow for a scientific experiment, a recipe
+ > > for a cooking activity, or a list of instructions for a micro-processor
+ > > execution. Plans are entities, which may have associated provenance. An activity
+ > > may be associated with multiple plans, allowing for descriptions of activities
+ > > initially associated with a plan, which was changed, on the fly, as the activity
+ > > progresses. Plans can be successfully executed or they can fail. We expect
+ > > applications to exploit PROV-DM extensibility mechanisms to capture the rich
+ > > nature of plans and associations between activities and plans.
+ > > ]]
+ > >
+ > > Para 5: I see no value in cross-referencing the responsibility record here.
+ > > Suggest dropping this paragraph.
+ > >
+ > > Why is there an identifier for an association record?
+ > >
+ > >
+ > > Section 5.3.2.2 Start and End records
+ > >
+ > > This seems to overlap with start, end parameters on an activity. It's not
+ > > immediately clear how they play together.
+ > >
+ > > Should this record not describe an "event"? Then the id should identify the
+ > > start/end event, not the record. cf. Issue 207.
+ > >
+ > > Identifiers should denote activities and agents, *not records*.
+ > >
+ > >
+ > > Section 5.3.3.1 Responsibility record
+ > >
+ > > Suggest drop "To promote take-up... " and instead lead with a simple
+ > > introduction of what the record describes.
+ > >
+ > > Para 3: It seems to me that the responsibility record should stand independently
+ > > of any association record. Suggest drop "Given an activity association record...
+ > > (...)"
+ > >
+ > > Why is there an identifier for a responsibility record?
+ > >
+ > >
+ > > Section 5.3.3.2 Derivation record
+ > >
+ > > Suggest drop "In PROV-DM, "
+ > >
+ > > This whole section seems way too complicated. My understanding is that the
+ > > "Common relations" section is intended to cover those useful short-cut
+ > > expressions that can be expressed with less convenience in the core model. As
+ > > such, I think the derivation record should be a "common" rather than a "core"
+ > > relation.
+ > >
+ > > Aside from that, I really don't see the utility of all this stuff about precise
+ > > and imprecise derivations. I think there is just one useful relation to define,
+ > > roughly corresponding to "imprecise n-derivation record" here:
+ > >
+ > > - I note that the "imprecise 1-derivation record" and "imprecise n-derivation
+ > > record" are not syntactically distinguishable, so there's no point in discussing
+ > > the difference.
+ > >
+ > > - the "precise 1-derivation record" can be expressed using an activity, usage
+ > > and generation record: I'm not convinced this alternative syntax is really
+ > > buying anything worthwhile.
+ > >
+ > > Suggest radical simplification along these lines, and move to section 6. Don't
+ > > introduce all the formal stuff until a later section handling more formal
+ > > treatments.
+ > >
+ > >
+ > > Section 5.3.3.3 Alternate and Specialization records
+ > >
+ > > In considering a "scruffy" view of provenance, these relations aren't really
+ > > needed. However, they do underpin a more formal treatment in the face of dynamic
+ > > resources.
+ > >
+ > > I would give serious consideration to introducing these later, when the more
+ > > formal treatment of dynamic resources is considered.
+ > >
+ > >
+ > > Section 5.3.4. Annotation record
+ > >
+ > > I think this serves no needed purpose, and should be dropped. (See earlier
+ > > comments for section 5.2.4.)
+ > >
+ > >
+ > > Section 5.4.1 Account record
+ > >
+ > > I understood we'd agreed to drop this.
+ > >
+ > >
+ > > Section 5.4.2 Record container
+ > >
+ > > I think this is mainly an artifact of the ASN syntax, and should be introduced
+ > > more briefly in the introductory section 5.1 (see previous comments)
+ > >
+ > >
+ > > Section 5.5.1 Attribute
+ > >
+ > > I think the "optional-attribute-value" productions covered in section 5.2.1
+ > > (Entity) should be covered here since they apply to multiple record types.
+ > >
+ > > I would prefer to see attribute names presented as being IRIs in the data model,
+ > > with the namespace-qualified CURIE syntax available as a convenience in the ASN
+ > > presentation.
+ > >
+ > > I think the predefined attribute names should be dealt with in a separate
+ > > section. I'm actually not convinced this is the best design choice for
+ > > properties with DM-defined meaning, as opposed to (say) using separate record
+ > > parameters, but that's more of a style issue than a fundamental objection.
+ > >
+ > > As indicated earlier, I think the whole discussion of derivation steps is too
+ > > much detail, and I don't see the value, and would suggest dropping the
+ > > prov:steps attribute.
+ > >
+ > > For attribute prov:label: why not just use rdfs:label?
+ > >
+ > >
+ > > Section 5.5.2 Identifiers
+ > >
+ > > The text says they are *qualified* names, but in most of the example they are
+ > > not. Also, some identifiers are described as having local scope: this is not
+ > > compatible with using *qualified* names which are essentially IRIs.
+ > >
+ > > The text describes identifiers as denoting *records* (e.g. entity record) - I
+ > > think this is wrong, and in any case is inconsistent with text elsewhere in the
+ > > document. They should denote "entity", "activity", "agent", etc.
+ > >
+ > >
+ > > Section 5.5.3 Literal
+ > >
+ > > "A PROV-DM Literal represents a value whose interpretation is outside the scope
+ > > of PROV-DM." What a Terrible Failure... the whole point of languages introducing
+ > > literals is precisely that their interpretation *is* defined by the language.
+ > > If not, they might as well be names.
+ > >
+ > > I think the intent is that their interpretation is defined by reference to the
+ > > corresponding xsd datatype definition, or some other datatype definition, that
+ > > is effectively incorporated by reference.
+ > >
+ > > I'd suggest that an interpretation of literals is provided by:
+ > > - http://www.w3.org/TR/rdf-mt/#gddenot
+ > > - http://www.w3.org/TR/rdf-mt/#DTYPEINTERP
+ > >
+ > > Section 5.5.4 Time
+ > >
+ > > No syntax production provided or indicated.
+ > >
+ > > I think it's unnecessary and inappropriate to indicate where time is used. It's
+ > > just something to go wrong as the document evolves.
+ > >
+ > >
+ > > Section 5.5.5 Asserter
+ > >
+ > > Do we really still need this (now accounts are gone). Suggest dropping.
+ > >
+ > >
+ > > Section 5.5.6 Namespace
+ > >
+ > > I'd suggest covering this with the introduction of the record container syntax
+ > > production
+ > >
+ > >
+ > > Section 5.5.7 Location
+ > >
+ > > Do we have any explicit use of this? If not, I'd suggest dropping it.
+ > >
+ > > ...
+ > >
+ > > I'm out of time and stopping my review here. There's a general pattern here that
+ > > I'd also apply to section 6.
+ > >
+ > > I'd then take section 7 and (probably) expand it into several sections ("Part
+ > > 2") introducing and describing a more formal treatment of provenance that can be
+ > > used to bridge from and refine the "scruffy" view to something that can be
+ > > assembled and processed according to inferences that flow from the formal
+ > > semantics. A key point to introduce here would be that it is possible to create
+ > > provenance statements that cannot possibly satisfy the formal semantics, and to
+ > > indicate what additional constraints and disciplines should be applied to ensure
+ > > that they can (and hence to make the inferences that flow from those semantics
+ > > valid).
+ > >
+ > > #g
+ > > --
+ > >
+ > >
+ >
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/issue-274-jun.txt Wed Feb 29 06:37:26 2012 +0100
@@ -0,0 +1,83 @@
+ > These comments are with respect to the DM working draft 4,
+ > http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html.
+ > accessed on February 17, 2011.
+ >
+ > First of all, as my first time of reading the DM working draft, with my
+ > very fresh pair of eyes, I would like to say well done to the group.
+ > There are a lot of very interesting ideas in the model document, clearly
+ > reflecting a lot of deep thinking about the problem domain. And I like
+ > very much the position of the DM as an interchange language. So well
+ > done, guys!
+ >
+ > However, if the main goal of this new version of the working draft is to
+ > simplify what we had, particularly to enable "an upgrade path, from
+ > 'scruffy provenance' (term TBD), to 'precise provenance' (term TBD)", I
+ > am not sure this goal was achieved!
+ >
+ > Here is what I think and why:
+ >
+ > 1. In the introduction section, there is no such introduction about
+ > 'scruffy provenance' (term TBD), or 'precise provenance' (term TBD). I
+ > think this is a key point that should be brought to the front, and which
+ > should be used to structure the rest of the document. And this is not
+ > the case atm, IMO.
+ >
+ > 2. The Overview section: I am not sure I see much difference between
+ > this section and the section giving definitions to the 'core'. I would
+ > rather expect to see an overview of the model, for example, for the
+ > scruffy and precise level, what terms and properties we have at each
+ > level etc. I am sure Luc knows that the overview diagram needs an update
+ > and I couldn't read the figure properly even when I printed the doc with
+ > a high-resolution laser printer :)
+ >
+ > 3. I used the terminology of "terms" and "properties", but actually I
+ > don't know what this data model is. What do we mean by "data model"? Is it a
+ > conceptual model, logical model, entity relationship model, or something
+ > else? It's not clearly stated and I am confused about what terminology I
+ > should use when referring to the model :(
+ >
+ > 4. The Example section: Would it be a good idea to define an example up
+ > front and use it throughout the whole document? I don't find a
+ > description about an example in this section and I found it hard to
+ > follow the 'examples' given in Section 3. And in the rest of the
+ > document, examples from many different scenarios are used. I wonder
+ > whether that prevents us from simplifying the reading of the spec.
+ >
+ > 5. Section 4, the PROV-DM Core: There is a lot of repetition with the
+ > overview section. And I wonder what we mean by "core". The core almost
+ > includes "all" the DM terms (apart from the few in section 5). My
+ > understanding of "core" would be really the essential set of DM terms
+ > that are must-haves to express the minimal provenance. IMO, the current
+ > "core" is rather inclusive, and provides constructs that can be used to
+ > support some rather complex provenance expressions.
+ >
+ > If we can agree on the notion of "scruffy" (minimal??) and "precise"
+ > (extended??), maybe the core part can be used to correspond to the
+ > "scruffy" part, and make it lighter, more succinct, and easier and
+ > quicker to grasp and follow?
+ >
+ > 6. There are many cross-references that don't quite work in the current
+ > working draft, like saying some terms are mentioned in the previous or
+ > another section. I didn't include these problems here because I think
+ > these were caused by the re-structuring. I could list them out once the
+ > structure gets more stable.
+ >
+ > 7. There are also some technical points that I marked down in the
+ > review, which I didn't raise here either, because I am 'new' to the
+ > group and I don't want to re-open closed issues. What's the stage of the
+ > technical part of DM? Are there still open technical discussions?
+ >
+ >
+ > In my opinion I think the document still needs some more work on the
+ > structuring and organization front to make it simplified.
+ >
+ > I think we should make a better use of the notion of "scruffy"
+ > (minimal??) and "precise" (extended??), and use this to guide the
+ > restructuring of the document.
+ >
+ > Thoughts?
+ >
+ > HTH,
+ >
+ > -- Jun
+ >
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/issue-274-khalid.txt Wed Feb 29 06:37:26 2012 +0100
@@ -0,0 +1,110 @@
+ >
+ > Hi,
+ >
+ > I read mainly Part-1, and briefly looked at Part-2.
+ > I think that the simplification is in the right direction. I think
+ > however that part-1 can be further simplified by moving some definitions
+ > and details to part-2. I will give more details on this later on in the
+ > email.
+ >
+ > Below are the comments.
+ >
+ > - I think the title of part-2 is misleading as it does not contain only
+ > constraints but also definitions that are not present in part-1, and
+ > revises other definitions to provide more details, e.g., Entity.
+ > Therefore, I wonder if it would be better to rename part-1 and part-2. I
+ > couldn’t find better titles though. I thought of “core prov-dm” for
+ > part-1, and “extended prov-dm” for part 2, but that is not really what
+ > the two parts are about.
+ >
+ > - ASN is used in part-1, but not introduced. A brief definition when it
+ > is used for the first time, for example, may be good.
+ >
+ > - In the first paragraph of Section 2.1, it is said that “provenance of
+ > Entities, that is of things in the world”. I am not sure that is the
+ > case: provenance of entities is not the same as provenance of things.
+ >
+ > - In the same section 2.1, it is said that “The definition of agent
+ > intentionally stays away from using concepts such as enabling, causing,
+ > *initiating*, affecting…”. Isn’t wasStartedBy, which is defined in
+ > Section 4.2.2.2, used to specify that an agent initiated the execution
+ > of an activity?
+ >
+ > - The examples of generation and usage that are given in Section 2.2 are
+ > complicated. They are meant to give a precise definition of what
+ > generation and usage are by considering the time, e.g., “Examples of
+ > generation are the *completed* creation of a file by a program”. However,
+ > I think that at this stage it would be less confusing for the reader to
+ > simply know that the creation of a file is an example of generation.
+ >
+ > - In Section 2.3, plan is used in the text without being introduced before.
+ >
+ > - I have the impression that the diagram presented in Section 2.5 would
+ > be more useful if placed at the beginning of Section 2. Also, this
+ > diagram was not clear when I printed it out on paper, i.e., the quality
+ > of the image is bad.
+ >
+ > - The title of Section 3.2 “The Authors View” is confusing. A reader
+ > that is quickly browsing the document may think that this section gives
+ > the views of the prov-dm authors about the prov-dm document :-)
+ >
+ > - In Section 4, first paragraph: “We revisit each concept *introduction*
+ > in Section 2” -> introduced
+ >
+ > - In the definition of Entity in Section 4.1.1: “id: an identifier
+ > identifying an entity” -> “id: an entity identifier”.
+ >
+ > - In the definition of Entity in Section 4.1.1: “attributes: an Optional
+ > set of attribute-value pairs *representing this entity’s situation in
+ > the world*” -> characterizing the thing that the entity represents. Or
+ > something along these lines.
+ >
+ > - In the same section, the constraint that the sets of Activities and
+ > Entities are disjoint is presented; later on, in Section 4.1.2, this
+ > constraint is explained further. However, the explanation is based on
+ > details that are not present in part-1, but are presented later on in
+ > part-2, specifically that “an entity exists in full at any point in its
+ > lifetime, persists during this interval, and preserves the
+ > characteristics that makes it identifiable”. I would therefore suggest
+ > moving the discussion about the above constraint, i.e., that entities
+ > and activities are disjoint, to the constraint document.
+ >
+ > - In Section 4.2.1.1 Generation, it is said that “While each of the
+ > components activity, time, and attributes is Optional, at least one of
+ > them must be present”. I wonder if there is a straightforward way to
+ > encode this constraint in the serializations of prov-dm, in particular
+ > prov-o.
+ >
+ > - In Section 4.2.3.1 Responsibility Chain, in the definition of
+ > actedOnBehalfOf, it is specified that activity can be optional. We need
+ > to add some details to specify what the semantics of actedOnBehalfOf
+ > will be when activity is not given as an argument: does it mean
+ > that a given agent ag1 acts on behalf of another agent ag2 in all the
+ > activities that ag1 is involved in?
+ >
+ > - Section 4.2.3.2 presents derivation. If the objective is to simplify
+ > part-1, then this section needs serious simplification :-) In
+ > particular, there are three versions of derivation: precise-1, imprecise-1
+ > and imprecise-n. I was thinking of presenting only one, e.g., imprecise,
+ > without saying that it is imprecise, and giving more details about the
+ > different kinds of derivations in the constraint document. Also, I think
+ > traceability, which is presented later on in Section 5, is a first class
+ > relation, and therefore should be introduced when speaking about
+ > entity-entity relations in Section 4.2.3.
+ >
+ > - Section 4.2.3.3 on Alternate and Specialization can be moved to
+ > part-2, since to grasp these relations one needs to have more details
+ > about what an entity represents, which are given in part-2.
+ >
+ > - In Section 4.2 Relation, I think the order in which the subsections of
+ > this section are presented should be rethought. In particular, I have
+ > the impression that the reader would be interested to know about
+ > entity-entity relations, which are probably the most important relations
+ > in provenance, before getting to know what the agent-activity and
+ > agent-agent relations are.
+ >
+ > - The table presented in Section 4.2 needs some text that explains to the
+ > reader how it can be read.
+ >
+ > Hope these comments will be of help, khalid
+ >
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/model/comments/issue-274-tim.txt Wed Feb 29 06:37:26 2012 +0100
@@ -0,0 +1,201 @@
+ > I was asked to review DM WD3. This email constitutes my review.
+ > I have included supplemental notes that I hope the DM editors will review and consider in future versions.
+ > I have raised a few of the bigger issues in the tracker already.
+ >
+ > Regards,
+ > Tim
+ >
+ > Goals of the review (per http://www.w3.org/2011/prov/wiki/Meetings:Telecon2012.02.16#PROV-DM_Simplification):
+ >
+ > • decide whether the new documents are inline with the simplification objective
+ >
+ > +1
+ >
+ >
+ > • recommend whether they become the new editor's draft
+ >
+ > +1
+ >
+ > • if not, identify blocking issues
+ > • if yes, identify potential issues to be raised against these future new editor's draft
+ >
+ > • decide whether ISSUE-145, ISSUE-183, ISSUE-215, ISSUE-225 and ISSUE-234 (all relating to identifiers) can be closed
+ >
+ >
+ > ------
+ > http://www.w3.org/2011/prov/track/issues/145
+ > qualified identifiers may not work well with named graphs
+ >
+ > This issue can be CLOSED. The treatment of AccountEntities (which I hope will be renamed to prov:Provenance) and the section on provenance of provenance does not impose a scoping of identifiers.
+ > This will make it easy to implement using RDF mechanisms.
+ >
+ >
+ > ------
+ > http://www.w3.org/2011/prov/track/issues/183
+ > identifiers in prov-dm
+ >
+ >
+ > The use of identifiers is no longer confusing. They identify Entities, Activities, etc.
+ > "Records" (a dying term) are not identified; the identifier they mention identifies the Entity, Activity, Involvement, etc.
+ >
+ >
+ > ------
+ > http://www.w3.org/2011/prov/track/issues/215
+ > ProvenanceOfW3CReport
+ >
+ > The example is good because it shows two perspectives, which makes it easy to use for AccountEntity (prov:Provenance).
+ > The identifiers make it a bit dry and hard to follow, but the concrete aspect is MUCH more useful.
+ >
+ >
+ > ------
+ > http://www.w3.org/2011/prov/track/issues/225
+ > What are the objects in the universe of discourse?
+ >
+ > This can be CLOSED. It is not confusing in the current writeup.
+ >
+ > ------
+ > http://www.w3.org/2011/prov/track/issues/234
+ > id identifies entity, not the record
+ >
+ > Can be CLOSED.
+ >
+ >
+ >
+ >
+ > ------- supplemental notes --------
+ >
+ >
+ > About notes in http://www.w3.org/2011/prov/wiki/ProvDMWorkingDraft4#Design_decisions
+ >
+ > • If part 3 is now separate from part 1, there is no need to talk about 'Entity Record' (or whatever Record) in part 1. Instead, we can just mention Entity (or whatever other concept)
+ >
+ > +1 This is much more natural
+ >
+ > • Given that Part 3 is just about ASN, and therefore is a language, then we can without confusion, talk about 'Entity Expression' since now these would be Expressions of the language
+ >
+ > +1
+ >
+ > • Does this mean that we would be dropping the term record entirely? What would we bundle up though?
+ >
+ > I would say we bundle up "expressions". One could bundle ASN expressions, RDF expressions, XML expressions, etc.
+ >
+ > • What about assertions? So should still use the word?
+ >
+ > I would suggest the more general term "expression" in place of "assertion".
+ >
+ >
+ > ------- supplemental notes --------
+ >
+ > About http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html
+ >
+ >
+ > Sections entitled "Activity-Entity Relation" seem a bit unnatural. Perhaps something like "Relations between Activities and Entities" would be clearer.
+ >
+ > The phrase "when the data it is about changes" is unclear.
+ >
+ > "To address this challenge, an upgrade path is proposed to enrich simple provenance..." This paragraph is nice. I'd suggest including "specific subject" in "qualify the subject of provenance".
+ >
+ >
+ > Is it okay to use ASN before it is defined? "In section 3, PROV-DM is applied to a short scenario, encoded in PROV-ASN, and illustrated graphically."
+ >
+ > "Section 4 provides the definition of PROV-DM." is a bit ambiguous. Please elaborate.
+ >
+ >
+ > The following sentence is duplicated: "Activities that operate on digital entities may for example move, copy, or duplicate them. Activities that operate on digital entities may for example move, copy, or duplicate them."
+ >
+ > I propose to change the Agent definition from->to:
+ > "An agent is a type of entity that can be associated to an activity, to indicate that it bears some form of responsibility for the activity taking place."
+ > "An agent is a type of entity that bears some form of responsibility for an activity taking place."
+ >
+ >
+ > perhaps add the person invoking the grammar checker to the following example (to illustrate the levels of responsibility):
+ > "Software for checking the use of grammar in a document may be defined as an agent of a document preparation activity, and at the same time one can describe its provenance, including for instance the vendor and the version history."
+ >
+ > add "an" to "Generation is the completed production of a new entity by activity." -> "Generation is the completed production of a new entity by an activity."
+ >
+ > reads oddly: "the activity had not begun to consume or use to this entity"
+ >
+ >
+ > avoid parens in a definition: "(and could not have been affected by the entity)"
+ >
+ >
+ > avoid "internal" in collection definition "A collection is an entity that has internal structure." -> "A collection is an entity that provides structure to some constituents." (or something)
+ >
+ >
+ > shocked by naming of "AccountEntity" why not "PlanEntity" and "CollectionEntity" (no, I don't want that...) I propose to rename "AccountEntity" to "Provenance"
+ >
+ >
+ > This sentence is long. Suggest stopping it at the first comma. "It is important to reflect that there is a degree in the responsibility of agents, and that is a major reason for distinguishing among all the agents that have some association with an activity and determine which ones are really the originators of the entity."
+ > ("and that is a major reason for distinguishing" -> "There is a major reason for distinguishing")
+ >
+ >
+ > Suggest removing "active" in "indicating that the agent had an active role in the activity". Does RPI have an active role in the writing of this email (since I'm an RPI student...)? I'd say they have a role, but not an active one.
+ >
+ >
+ >
+ > http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html#section-UML shows Activity wasStartedBy Agent, but Luc just said in email recently that only Activity wasStartedBy Activity is the way forward. I prefer Activity wasStartedBy Agent and think that some other involvement should be named for the special informed involvement Activity ?triggered? Activity.
+ >
+ >
+ >
+ > "ex:pub2" is a bad name - is it an activity or entity? I recommend "ex:act2"
+ >
+ >
+ > why aren't the edges labeled in the example?
+ >
+ >
+ > avoid term "minted" when talking about choosing a URI for a Resource. "minted" is colloquial.
+ >
+ >
+ > "3.3 Attribution of Provenance" -- YES! :-)
+ >
+ >
+ > The definition of Activity "An activity is anything that can operate on entities." seems to talk about the future
+ >
+ >
+ >
+ > activity(id, st, et, [ attr1=val1, ...]) does not include brackets for optional constituents st and et
+ >
+ >
+ > "(This type is equivalent to a "foaf:person" [FOAF])" --> we should not bind ourselves to FOAF.
+ >
+ >
+ >
+ >
+ > Please add a note to section Note to encourage people to use Account / AccountEntity/ Provenance to annotate provenance assertions as a better practice. When using AccountEntity, the annotated thing can be described _directly_ as a single triple instead of using Notes. Notes are very much "scruffy provenance" and do not benefit from the directness afforded by AccountEntity / prov:Provenance.
+ >
+ > :prov_1 {
+ > :simon a prov:Human;
+ > prov:hasAnnotation [
+ > a prov:Note; ex3:reputation "excellent";
+ > rdfs:comment "This is a kludge way to get indirection. Use prov:Provenance instead.";
+ > ];
+ > }
+ >
+ > :prov_2 {
+ > :simon ex3:reputation "excellent" .
+ > }
+ >
+ > :prov_1 a prov:Provenance; prov:wasAttributedTo :first_asserter .
+ > :prov_2 a prov:Provenance; prov:wasAttributedTo :trust_evaluator_agent .
+ >
+ >
+ > I'm starting to agree that wasGeneratedBy(id,e,a,t,attrs) should become Generation(id,e,a,t,attrs)
+ >
+ >
+ >
+ >
+ > This starts to distract, I think: "While each of the components activity, time, and attributes is optional, at least one of them must be present."
+ > Permitting degenerate cases should not be a priority. If not much (or nothing) is said with an assertion, let it be.
+ >
+ >
+ >
+ >
+ > remove "order" from "wasGeneratedBy(e1,a1, 2001-10-26T21:32:52, [ex:port="p1", ex:order=1])" because it is distracting and encourages not using PROV for things that PROV should do.
+ > I think Paolo agreed to this before.
+ >
+ >
+ > both agents are responsible in Responsibility. Suggest to rename "responsible" to "superior" in "responsible: an identifier for the agent, on behalf of which the subordinate agent acted;" in section 4.2.3.1
+ >
+ >
+ >
+ > two wasQuotedFroms in the UML diagram in section 5
--- a/presentations/dagstuhl/prov-dm/overview/index.html Tue Feb 28 17:31:52 2012 +0100
+++ b/presentations/dagstuhl/prov-dm/overview/index.html Wed Feb 29 06:37:26 2012 +0100
@@ -333,7 +333,7 @@
<div class="slide" id="components">
- <h2>PROV-DM Structure</h2>
+ <h2>PROV Data Model Structure</h2>
<h3>PROV-DM Components</h3>
<ol>