> On Fri, Jun 29, 2012 at 1:23 PM, Provenance Working Group Issue > Tracker wrote: > > > PROV-ISSUE-438 (prov-n-post-f2f3-review ): Final review before last call vote [prov-n] > > > http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-n-20120629/prov-n.html > > > > > > Question for reviewers: Can the document be published as Last Call working draft? > > Of my comments below, the following SHOULD be addressed before > publishing as Last Call draft: > * introduce "nonterminal" Section 2.2 introduces it, and was all reformatted. > * fix syntax errors > * memberOf -> hasMember hadMembers is now defined. > * clarify/remote % encoding in local part of QUALIFIED_NAME Section 3.7.1 rewritten. In short: - %-encoding used as per IRI spec - new \-encoding captured by PN_CHARS_ESC > * clarifications about namespace scoping/overwrite > > Given a minimum of the above are addressed, then yes, the document can > be published as Last Call WD. > > Loads of details follow.. don't despair! > > > =================================== > > > > > > In 1.1, the use of the term "nonterminal" is not introduced. This > computer science term used for parsers might be obvious for anyone who > is reading the document to produce a PROV-N parser, but not for > "Readers of the [PROV-DM] and of [PROV-CONSTRAINTS] documents". I see > the term is used several times later in the document, and so should be > introduced. > > A sentence like "Those readers may find the _expression_ nonterminal a > useful entry point into the grammar" should be reformulated to > something like "Those readers may find the definition _expression_ a > useful entry point into the grammar". > TODO > > > > > > 1.3 The PROV namespace (see Section 4.7.1) > The section number should be 3.7.4 > Fixed. > > > > 2.1 > > > All PROV data model relations involve two primary elements, the subject and the object, in this order > Not true. Examples: > > > > activity(a1) > > > entity(e1) > So those might not be 'relations' - but that is not clear from the > above. Rephrase to something like "Many PROV-N predicates denote a > relationship between two primary elements, the _subject_ and the > _object_, in this order". Table 2 in PROV-DM shows that entity/activity are prov-dm types, as opposed to prov-dm relations. So I think this is OK. > > I would include an entity() example in this section, to show that not > all have subject and object. I would do this before Example 1 and the > above sentence. > > > > Example 0: > > > In this example, _e1_ is asserted to be a PROV _entity_. > > > entity(e1) > Agreed, I have added an example for entity/activity. > > > > Example 2 > > > In the following expressions, the optional activity a along with the generation and usage identifiers g2 and u1: > > > wasDerivedFrom(e2, e1, a, g2, u1) > The sentence does not parse. I suggest "The following expression > expands the above derivation relation by providing additional > elements: the activity _a_, the generation _g2_ and the usage _u1_. > Done. > > > Extended Backus-Naur Form (EBNF) notation > This is again well known in CS - but a hyperlink would be useful. Now > I had to go to Wikipedia. However the mini-intro right below is > useful. "The below is a summary of " > > > I would be more precise, that "The PROV-N grammar is specified in this > document using the .." - as knowledge of EBNF is not a requirement for > using PROV-N - just for understanding the definitions in this > document. Text was edited by Stian, thanks! > > > > > > 2.2 EBNF Grammar > > > .. > > > Each expression non-terminal expression i.e., entityExpression, activityExpression etc., corresponds to > expression non-terminal expression?? > > > Reformulate to something like "Each of the symbols included in > _expression_ above, e.g. entityExpression, corresponds to one element > (e.g. entity) in the PROV data model." > Done > > Why hyperlinks on 'expression' here in this section? It goes 1 line up > or down, it's very confusing. Use expression or similar > instead. Yep > > > > > such that the text for an element matches the corresponding expression production of the grammar. > What does this mean? That the text of the document corresponds to what > you get by parsing the text? That does not tell me anything.. > > > Is the bundle construct now required for the PROV-N document? If so, > almost none of the examples in this document or in PROV-DM are valid > PROV-N documents. That might well be the case, as they want to reuse > namespaces, etc - but that should then be stated explicitly, kind of > like I don't think we stated these examples were all documents. Very often we just discuss expressions. > > > > > 2.3 Optional attributes > Should add something like "The interpretation of an optional attribute > that is not provided is given by Section 4.1 in PROV-CONSTRAINTS" > somewhere. No action here. > > > > > 2.4 Identifiers and attributes > > > > Most expressions defined in the grammar include the use of two > terms: an identifier and a set of attribute-value pairs, delimited > by square brackets > This reads like: > > > > identifier [] attribute-value pairs > Which is not what is meant! Something more like: > > "Almost all expressions defined in this grammar may also include an > identifier for the expression. Most expression can also include a set > of attribute-value pairs, grouped within square brackets." Done > > > "By convention, optional identifiers are separated using a semi-colon > ';'." By convention.. but I might do something else? Just say that > they ARE separated like that: "Optional identifiers are separated > using a semi-colon ';', but where the identifiers are required, the > regular comma ',' is used." > > Or do you say I need to also allow parsing with ',' for the > permutations that are non-ambigious? The defition for > optionalIdentifier says no. (good!) > Done > > > > Example 7 > > > .. > > > The third example shows that one can optionally indicate the missing identifier using the - marker. > Add "This is equivalent to the first expression". Done > > > > Example 8 > > > The first and second activities have no attributes. > Add ", and are equivalent." > Done > > > > 2.5 Comments > > > ... such cooments ... > "such comments" > > > > > 3.1.3 Generation > > > generationExpression ::= "wasGeneratedBy" "(" optionalIdentifier > eIdentifier ( "," aIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")" > This does not allow aIdentifierOrMarker without timeOrMarker - is this > intentional? Yes, all optionals or none is the adopted convention. > > > The following examples are not valid: > > > > wasGeneratedBy(e2, a1, tr:WD-prov-dm-20111215) > > > wasGeneratedBy(ex:g1; e, a, tr:WD-prov-dm-20111215) > because tr:WD-prov-dm-20111215 is not a valid timeOrMarker. > yes fixed > > I would also recommend that these show-cases use the same names, so > rather than say intermixing a1 and ex:edit1 - just always use a1. > > > > > > > > Even though the production generationExpression allows for expressions wasGeneratedBy(e2, -, -) and > wasGeneratedBy(-; e2, -, -), > these expressions are not valid in PROV-N, since at least one of id, activity, time, and attributes must be present. > This is suboptimal, could it not be worked into the rules in time? > Could we rather say this follows from PROV-CONSTRAINTS and PROV-DM? > "Since" does not make sense here - where is it stated that they MUST > be present? Here. So try rather: "... are not valid in PROV-N; at > least one of... MUST be present" 'Since' is justified, since it is a prov-dm requirement. > > Equivalent comment for this kind of paragraph for usage, startExpression, etc. We see this as "semantic/post-parsing" checking. It would unecessarily make the productions complicated. > > > > > > 3.1.4 Usage > usageExpression ::= "used" "(" optionalIdentifier aIdentifier > "," ( "," eIdentifierOrMarker "," timeOrMarker )? > optionalAttributeValuePairs ")" > > This wrongly requires a double comma before > eIdentifierOrMarker-timeormarker block: > > used(a1,,e,-) > > Should be: > > > > usageExpression ::= "used" "(" optionalIdentifier aIdentifier ( "," eIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")" Correct, production changed. > > This rule also requires timeOrMarker if eIdentifierOrMarker is present > - is this intentional? I won't ask this again for the following forms > - but assume that the style is "no optionals or all optionals" - in > which case this should be clarified in section 2.3 - the position > argument hints that the old style of allowing to chop of trailing -'s > is allowed. > > > > Even though the production usageExpression allows for expressions used(a2, -, -) and used(-; e2, -, -), > should be: > used(-; a1, -, -) fixed > > > > 3.1.6 Start > > > Note: Even though the production startExpression allows for expressions wasStartedBy(e2, -, -) and wasStartedBy(-; > e2, -, -), > Should be: > wasStartedBy(a1, -, -, -) > wasStartedBy(-; a1, -, -, -) fixed > > > > > 3.1.7 End > > > wasEndedBy(e; ex:act2) > > > wasEndedBy(e; ex:act2, ex:trigger, -, 2011-11-16T16:00:00) > I would not use 'e' as id here - that is confusing as 'e' is used for > entity earlier. Try 'end'. > > > > Note:Even though the production endExpression allows for expressions wasEndedBy(e2, -, -) and wasEndedBy(-; e2, -, -), > Should be: > wasEndedBy(a1, -, -, -) > wasEndedBy(-; a1, -, -, -), Fixed. > > > 3.2.1 Derivation > > The following examples are invalid: > > wasDerivedFrom(d, e2, e1, a, g2, u1, [ex:comment="a righteous derivation"]) > wasDerivedFrom(d, e2, e1, a, g2, u1) > wasDerivedFrom(-, e2, e1, a, g2, u1) > > As they should use ; to terminate optionalIdentifier rather than , fixed > > > > 3.3.2 -> 3.2.4 (Revision, Quotation, Primary Source) > > An introduction on the top of these should be provided, to note that > the regular syntax for wasDerivedFrom is used, with the additional > prov:type attribute given to specify the type of derivation. As it > stands it can read as if only the syntax used in the example is valid. > Make these subsections of 3.2.1 instead? > > > 3.2.4 > > > [ prove:type='prov:PrimarySource' ]) > Should be: > > [ prov:type='prov:PrimarySource' ]) > fixed > > The use of 'single quotes' for the prov:type as opposed to "double > quote" for the other attribute is confusing. By reading deeply into > section 3.7.3 I see this is a short hand for "ex:value" %% > prov:QUALIFIED_NAME. Could a link be provided for this in 3.2.x? > > prov:type is not explained anywhere in PROV-N. Almost all examples > using it with custom attributes use "double quotes", for instance: > > > actedOnBehalfOf(del1; ag2, ag1, a, [prov:type="contract"]) > > > agent(ag4, [ prov:type="prov:Person", ex:name="David" ]) > > I assumed (wrongly?) that prov:type in a way has range > prov:QUALIFIED_NAME union xsd:anyURI - but I'm not sure as "a Literal > may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no > specific interpretation in the context of PROV.". I added a forwarded pointer at the first occurrence of this convenienceNotation. > > So does prov:type="contract" simply mean "contract" out of thin air > rather than bound to any namespace? We would struggle to translate > this to PROV-O where prov:type maps to rdf:type. If this is the case, > then I expect all prov:type="prov.*" should be with single quotes > instead. Yes, it's the case that all prov:type='prov:*' are in single quote. It is fine to require a conversion function if they don't use URI like types. > > > http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120628/prov-dm.html#term-attribute-type > says "PROV-DM is agnostic about the representation of types, and only > states that the value associated with a prov:type attribute must be a > PROV-DM Value." - but does not distinguish between 'single', "double" > quote and "qualified" %% prov:QUALIFIED_NAME notation. > Convenience notation appears later. > > > > > 3.4.1 Bundle declaration > > > Example 26 > > > bundle ex:author-view > > > agent(ex:Paolo, [ prov:type='prov:Person' ]) > > > agent(ex:Simon, [ prov:type='prov:Person' ]) > > > ... > > > endBundle > Nitpicking, but I would use " // ... " as "..." is not valid PROV-N. done > > > 3.4.1 should also provide a forward link to 4. Toplevel Bundle, > because otherwise the example does not make sense (prefix ex: is not > declared) prefix ex: was added since the named bundle must be self contained > > > > > > > 3.5.3 Mention > > > mentionExpression ::= "mentionOf" "(" identifier "," identifier "," bIdentifier ")" > Should be: > > mentionExpression ::= "mentionOf" "(" eIdentifier "," > eIdentifier "," bIdentifier ")" > > > Fixed. > > 3.6.1 Membership > > > > memberOf(mId, c, {e1, e2, e3}, []) // Collection membership > should be: > > > > memberOf(mId; c, {e1, e2, e3}, []) // Collection membership > > > This reads the wrong way - it says c is a member of {e1, e2, e3} by > the previously mentioned subject-relation-object rule. > > Propose renaming 'memberOf' to 'hasMembers', 'hadMembers' or > 'members'. (I do not propose to make the entity or entity set as first > argument) > hadMembers > > Opposite of all other example, this example does not start with a > 'full' expression, as the 'complete' argument is missing. Also the > attributes are empty. I would change to: > > > memberOf(mId; c, {e1, e2, e3}, true, [dct:description="All of them"]) > // Collection membership > Done > > > > // default "complete" flag is false > This is a PROV-DM or PROV-CONSTRAINT matter and should not be > described here - the remaining relations don't explain meaning of > missing optionals. > > > > > memberOf(c3, ,[]) > > > memberOf(c3, ,true, []) > Invalid syntax, as entitySet ::= "{" (eIdentifier)* "}" > > Change to: > > memberOf(c3, {} ,[]) > memberOf(c3, {} ,true, []) > > First one of these should be invalid by the same reason as for usage, > and thus should not be listed. (It would need to include some > attributes in [] to be valid) I don't understand why? > > > 3.7.1 Identifier > > > > dIdentifier ::= identifier > > This is not used anywhere and can be removed. (Dictionary?) Removed > > > A qualified name's prefix is optional. If a prefix occurs in a qualified name, > it refers to a namespace declared in a namespace declaration. > I would change it to "it MUST refer to a namespace as declared in a.." > - otherwise this would be valid: > > bundle > prefix ex > entity(fred:e1) > endBundle > > without declaring the prefix "fred". > Yes, MUST added. > > > > > > 3.7.2 Attribute > > > The reserved attributes in the PROV namespace are the following. > add "Their meaning is explained by [PROV-DM]". DONE. > > > ::= ("-")? (DIGIT)+ > > This *might* be in conflict with QUALIFIED_NAME which allows local > names like "1337" (without quotes) - but I have not checked so > thoroughly. It is at least confusing. It means that you can do: > > entity(1337, [1337=1337]) > > where the first and second 1337 is the relative IRI reference <1337> > based on the default namespace, while the third 1337 is the number > "1337" %% xsd:int. > > However I don't think there is anywhere that allows both a literal and > an identifier in the same position, so we MIGHT be safe. Parser heads > converge here. It is correct that 1337 can be a QUALIFIED_NAME and an INT_LITERAL. The context allows us to disambiguate: INT_LITERAL are only allowed as convenienceNotation for literals. > > > > > 3.7.3 Literal > Note:The productions for prov:QUALIFIED_NAME and INT_LITERAL are conflicting. In the context of a literal, > a parser should give precedence to the production for INT_LITERAL. > Again, I don't see this conflict, as the former requires 'single' > quotes and the latter does not allow that. There is misunderstanding here. This note is exactly the point you made above. The conflict is between INT_LITERAL and QUALIFIED_NAME, and not QUALIFIED_NAME_LITERAL. > > > entity(e1, [ex:value='1337']) // equivalent to > entity(e1, [ex:value="1337" %% prov:QUALIFIED_NAME]) > > entity(e1, [ex:value=1337]) // equivalent to > entity(e1, [ex:value="1337" %% xsd:int]) > > > It would be good if a (separate) example in 3.7.3 showed all of these > equivalences. Done, added an example to make the convenience notation clearer. > > > > > > 3.7.3.1 Reserved Type Values > > > The reserved type values in the PROV namespace are the following > Add "Their meaning is defined by [PROV-DM]. > > DONE > > > > 3.7.4 Namespace > > > > > namespaceDeclaration ::= "prefix" QUALIFIED_NAME namespace > > > ::= ( PN_PREFIX ":" )? PN_LOCAL > > > | PN_PREFIX ":" > This would allow: > > bundle > prefix fred:soup > endBundle > > but all the examples define prefix with only the lefthand side of a > QUALIFIED_NAME - ie. PN_PREFIX. So it should be: > > > namespaceDeclaration ::= "prefix" PN_PREFIX namespace Yes, it is a mistake in the document. The production you suggest is the one I had in my reference implementation. > > To match all valid prefixes in QUALIFIED_NAME. > > Note that QUALIFIED_NAME allows the empty prefix, ie ":ex1" (which is > different from "ex1"). ((And thus also ":")) No, this is not correct. PN_PREFIX always has at least one character. PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)? > > However this would be difficult to declare ":" in the current prefix > rule, because unlike say in Sparql and Turtle, the prefix is not > declared with the trailing ":". One would have to say: > > bundle > prefix // Two spaces before < ! > endBundle > > > I suggest to change the prefix declaration to include the trailing : - ie: > > namespaceDeclaration ::= "prefix" QUALIFIED_NAME ":" namespace > bundle > prefix : > prefix ex: > endBundle > This is not necessary given the above. > > > > Instead, they can be %-escaped or incorporated in the IRI denoted by a prefix. > > > ::= "%" HEX HEX > It is not defined what is the meaning of this escaping. What do the > HEX represent? If you mean "as in section 3.1. Mapping of IRIs to > URIs of [RFC3987]" - then include this. Yes, Done. > > Should be "in corporated in *a* namespace URI denoted by a prefix" - > as presumably that specific namespace binding does not exist yet if > you needed special characters in the local name. > > I'm still not sure what this mean. > > Assume you have: > > bundle > prefix s11: > // ... > endBundle > > And in there, you want to refer to the entity for > . Then presumably I could do: > > entity(s11:?fred%3Dsoup) > > but only if we intend for %xx to be expanded before making the URI, > rather than, as suggested by PROV-N, simply be valid parts of the URI > by concatenation. > > If we say they are passed on directly - then I don't see a way to > represent the above, as %3d escape for = is valid argument in query > parameters - such as in ?q=1%2B1%3D2 (1+1=2) - and thus %3Dsoup not > understood as the original =soup. > > Note - I am not arguing for double-escaping here, as I think that > would become very confusing - I am just wondering why % was added here > in the first place, if still only a selection of possibly local names > (given a general prefix) is valid - I think it just adds potential > complexity. This is issue is now solved by the \-escape mechanism in PN_CHARS_ESC. > > Of course the reason why this happens is because we don't have a > syntax allowed together with QUALIFIED_NAME - so the Sparql > analogy breaks down. > > > > > entity(ex:1234) // corresponds to IRI http://example.org/2/1234 > should be > > entity(ex:1234) // corresponds to IRI http://example.org/1/1234 > Yes, fixed. > > I would add: > > entity(c/) // corresponds to IRI http://example.org/2/c/ > entity(ex:/) // corresponds to IRI http://example.org/1// // > Strict concatenation > Done. > > > What does the following mean? > > > Note:The productions for qualifiedName and prefix are conflicting. In > the context of a namespaceDeclaration, a parser should give precedence > to the production for prefix. > > With 'prefix' here - do you mean PN_PREFIX or the keyword 'prefix' within I mean PN_PREFIX. abc can be a PN_PREFIX or a QUALIFIED_NAME (PN_LOCAL only, no PN_PREFIX) > > > > namespaceDeclaration ::= "prefix" QUALIFIED_NAME namespace > ? > > I still don't see any conflict. namespaceDeclaration requires the word > 'prefix', and so the following should be valid: > > bundle > prefix > prefix > > endBundle > > ie. binding the prefix "prefix:" to the relative IRI reference . > > namespaceDeclarations is optional in both bundle and namedBundle - but > "prefix" is not a valid expression, and can thus I don't see any > conflict here. Text has been rewritten to make it clear the potential conflict is for the tokenizer. > > > > 3.7.4 does not define how to interpret re-declaration of the same prefix: > > bundle > prefix ex1 > prefix ex1 > entity(ex1:fred) > endBundle > > (Personally I think they should not be allowed - Turtle will overwrite > sequentially - but nothing else in PROV-N depends on the sequence) Added: A set of namespace declarations namespaceDeclarations MUST NOT re-declare the same prefix. > > > 3.7.4 does neither not define that that prefixes and defaults are > looked up in the named bundle before the top level bundle. > > bundle > default > entity(fred) > bundle fred // same fred > default > entity(fred) // Different fred! > endBundle > endBundle An example was added to clarify how scope works. http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-n.html#anexample-namespace-scope Bundles are designed to be self contained. So in effect, no inheritance of namespace declaration. > > > > > Section 4 > > > A toplevel bundle, written _bundle decls exprs bundles endBundle_ in PROV-N, contains: > This 'written' thing is very confusing, as it is not written like > that. I would move up the bundle production rule first - use the same > style as for all the other productions. The following "Contains" > bullet points also only make sense after seeing the production rule: > > > > bundle ::= "bundle" (namespaceDeclarations)? (expression)* (namedBundle)* "endBundle" > The bullet points are confusing, because they talk about decls and > exprs vs namespaceDeclarations and expressions. Only one variable > name, please! > > > namespaceDeclarations are optional - both here and in the named > bundle. This means that this would be valid: > > bundle > entity(e1) // What is the default namespace?? > bundle b1 > // or what is it here? > entity(e2) > endBundle > endBundle > Given that bundles need to be self contained, we would generally expect some namespace to be declared for the identifiers occurring in the bundle. However, a toplevel bundle may be empty. Hence, no declaration required. I guess because a named bundle always has a name ... it would always require a namespace declaration. Should we enforce this in the grammar? I didn't make a change, I see this as "semantic" processing, beyond parsing. > > I found it confusing that the top level bundle and named bundle have > the same keyword. However I expect - although it is not stated > explicitly here - that you can't have a free-standing named bundle - > and that a PROV-N Document should have all expressions and named > bundled within a single top level bundle. I considered naming toplevel bundle differently. But ultimately, it contains expressions and namespace declarations ... it's a bundle. I made sure it was a distinct production to avoid arbitrary nesting of named bundles. > > > > > activity(a1, 2011-11-16T16:05:00, -,[prov:type="edit"]) > Should be (pretty printed ;) ) > > activity(a1, 2011-11-16T16:05:00, -, [prov:type="edit"]) > > Fixed > > > The following container > > > (..) > > > This container > s/container/bundle/g > Fixed. > > > > > > 5. Media Type > We agreed some changes to this in the telcon 2012-07-05, > "text/provenance-notation" and ".provn". > > > > Published specification: > > > This specification. > In the actual submission I assume that this will rather refer to /tr/PROV-N/. > > > > > may have the strings 'bundle' near the beginning > s/strings/string/ > > > > > Section B > I did not review section A or B in detail. > > Some whitespace needed after 'Cheney' both for [PROV-XML] and [PROV-RDF] > > > -- Stian Soiland-Reyes, myGrid team School of Computer Science The University of Manchester >