Copyright © 2011-2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
PROV-DM is a data model for provenance that describes the entities, people and activities involved in producing a piece of data or thing in the world. PROV-DM is domain-agnostic, but is equipped with extensibility points allowing further domain-specific and application-specific extensions to be defined. PROV-DM is accompanied by PROV-N, a technology-independent notation, which allows serializations of PROV-DM instances to be created for human consumption, which facilitates the mapping of PROV-DM to concrete syntax, and which is used as the basis for a formal semantics of PROV-DM. The purpose of this document is to define the PROV-N notation.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is released internally by the Provenance Working Group. This document was published by the Provenance Working Group as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-prov-wg@w3.org (subscribe, archives). All feedback is welcome.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing in the world. Two companion specifications respectively define PROV-DM, a data model for provenance that allows such descriptions to be expressed [PROV-DM], and a set of constraints that provenance descriptions are expected to satisfy [PROV-DM-CONSTRAINTS].
In this context, PROV-N was introduced as a notation to write instances of the data model, as close to its abstract syntax as possible. PROV-N is primarily aimed at human consumption. PROV-N allows serializations of PROV-DM instances to be written in a technology-independent manner. So far, PROV-N has been used in the following ways:
PROV-N was designed to be as close as possible to PROV-DM without the syntactic bias and modelling constraints that concrete technologies bring with them, e.g., XML's choice between attribute and element, RDF's reliance on triples, or JSON's usage of dictionaries.
The purpose of this document is solely to define the syntax of PROV-N. For each construct of PROV-DM, a corresponding PROV-N expression is introduced, by way of a production in the PROV-N grammar presented in this document.
This specification is one of several specifications, referred to as the PROV family of specifications, defining the various aspects that are necessary to achieve the vision of interoperable exchange of provenance:
The PROV-DM namespace is http://www.w3.org/ns/prov-dm/ (TBC).
All the elements, relations, reserved names and attributes introduced in this specification belong to the PROV-DM namespace.
The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in [RFC2119].
This specification includes a grammar for PROV-N expressed using the Extended Backus-Naur Form (EBNF) notation.
Each production rule (or production, for short) in the grammar defines one non-terminal symbol, in the form:
E ::= expression
Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

Instances of the PROV-DM data model are expressed in PROV-N by a text conforming to the top-level production expression of the grammar. These expressions are grouped in two categories: elementExpression (see section Element) and relationExpression (see section Relation).
PROV-DM elements can be entities, activities, agents, or notes. This section defines a production for the textual representation of each of these element types.
An entity's text matches the entityExpression production.
entity(tr:WD-prov-dm-20111215)
entity(tr:WD-prov-dm-20111215, [ prov:type="document" ])
entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version=2 ])
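The three forms above differ only in the optional attribute list. As an illustration, the following minimal Python sketch writes entity expressions in that shape. The function names (format_literal, entity_expression) are hypothetical and not part of PROV-N, only string and integer attribute values are covered, and the spacing simply copies the examples above.

```python
def format_literal(value):
    """Render a Python value as a PROV-N literal (sketch: str and int only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int):
        return str(value)  # written bare, as in ex:version=2
    if isinstance(value, str):
        if '"' in value:
            # Escaping rules are not addressed here, so refuse rather than guess.
            raise ValueError("embedded double quotes are not handled by this sketch")
        return '"' + value + '"'
    raise TypeError("unsupported literal type: " + type(value).__name__)


def entity_expression(identifier, attributes=None):
    """Build an entity expression; the attribute list is omitted when empty."""
    if not attributes:
        return "entity(" + identifier + ")"
    pairs = ", ".join(
        name + "=" + format_literal(value) for name, value in attributes.items()
    )
    return "entity(" + identifier + ", [ " + pairs + " ])"


if __name__ == "__main__":
    print(entity_expression("tr:WD-prov-dm-20111215"))
    print(entity_expression("tr:WD-prov-dm-20111215",
                            {"prov:type": "document", "ex:version": 2}))
```

The sketch relies on Python dictionaries preserving insertion order, so attributes are written in the order they were supplied.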
An activity's text matches the activityExpression production.
activity(ex:edit1,,)
activity(ex:edit1,,,[prov:type="edit"])
activity(ex:a0, 2011-11-16T16:00:00,,[prov:type="createFile"])
activity(ex:a0, 2011-11-16T16:00:00, 2011-11-16T16:00:01, [prov:type="createFile"])
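In the examples above, the start and end time positions are always written, even when empty, so the separating commas stay in place. The following Python sketch illustrates that pattern. The function name activity_expression is hypothetical, times are passed as already formatted strings, attribute values are assumed to be plain strings, and the compact spacing is a choice of this sketch.

```python
def activity_expression(identifier, start="", end="", attributes=None):
    """Build an activity expression, keeping empty start/end slots in place."""
    # The start and end positions are always emitted, even when empty,
    # which yields forms such as activity(ex:edit1,,).
    text = "activity(" + identifier + "," + start + "," + end
    if attributes:
        # Sketch assumption: every attribute value is a plain string.
        pairs = ",".join(
            name + '="' + value + '"' for name, value in attributes.items()
        )
        text += ",[" + pairs + "]"
    return text + ")"


if __name__ == "__main__":
    print(activity_expression("ex:edit1"))
    print(activity_expression("ex:a0", "2011-11-16T16:00:00",
                              attributes={"prov:type": "createFile"}))
```

The output omits the spaces after commas that some of the examples above contain.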
An agent's text matches the agentExpression production.
agent(ag4)
agent(ag4, [ prov:type="prov:Human" %% xsd:QName, ex:name="David" ])
A note's text matches the noteExpression production.
note(ann1,[ex:color="blue", ex:screenX=20, ex:screenY=30])
PROV-DM relations can be generation, usage, derivation, activity association, responsibility chain, activity start, activity end, alternate, specialization, or annotation. This section defines a production for the textual representation of each of these relation types.
A generation's text matches the generationExpression production.
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1)
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00)
wasGeneratedBy(ex:g1, tr:WD-prov-dm-20111215, ex:edit1)
wasGeneratedBy(e2, a1, [ex:fct="save"])
A usage's text matches the usageExpression production.
used(ex:pub2, ar3:0111)
used(ex:pub2, ar3:0111, 2011-11-16T16:00:00)
used(ex:u1, ex:pub2, ar3:0111)
used(a1,e1,[ex:fct="load"])
An activity association's text matches the activityAssociationExpression production.
wasAssociatedWith(ex:pub2, w3:Consortium)
wasAssociatedWith(ex:pub2, w3:Consortium @ pr:rec-advance)
wasAssociatedWith(ex:pub2, w3:Consortium @ pr:rec-advance, [prov:role="funder"])
Activity start and end texts match the startExpression and endExpression productions, respectively.
actedOnBehalfOf(ag1,ag2)
actedOnBehalfOf(ag1,ag2,a)
actedOnBehalfOf(ag1,ag2,[prov:type="delegation"])
actedOnBehalfOf(ag2,ag3,a,[prov:type="contract"])
A derivation record's text matches the derivationExpression production.
wasDerivedFrom(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018)
wasDerivedFrom(e2, e1, a, g2, u1)
An alternate relation's text matches the alternateExpression production.
alternateOf(tr:WD-prov-dm-20111215,ex:alternate-20111215)
A specialization relation's text matches the specializationExpression production.
specializationOf(tr:WD-prov-dm-20111215,tr:prov-dm)
An annotation's text matches the annotationExpression production.
hasAnnotation(tr:WD-prov-dm-20111215,ex2:n1)
In PROV-N, the prefix prov is reserved and denotes the PROV namespace.
An attribute's text matches the attribute production.
The reserved attributes in the PROV namespace are the following.
A Literal's text matches the Literal production.
The non-terminals stringLiteral and intLiteral are syntactic sugar for quoted strings with datatype xsd:string and xsd:int, respectively.
In particular, a PROV-DM Literal may be an IRI-typed string (with datatype xsd:anyURI); such an IRI has no specific interpretation in the context of PROV-DM.
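The syntactic sugar for stringLiteral and intLiteral can be made concrete by expanding each shorthand into the explicit typed form, written with %% as in "prov:Human" %% xsd:QName. The following Python sketch does so under stated assumptions: the function name desugar_literal is hypothetical, and the regular expressions are simplified guesses at the shape of the literals, not the productions of the grammar.

```python
import re

# Simplified, assumed shapes; the normative definitions are the grammar productions.
STRING_LITERAL = re.compile(r'"[^"]*"')
INT_LITERAL = re.compile(r"-?[0-9]+")
TYPED_LITERAL = re.compile(r'"[^"]*"\s*%%\s*\S+')


def desugar_literal(token):
    """Expand a string or integer shorthand into the explicit typed form."""
    token = token.strip()
    if STRING_LITERAL.fullmatch(token):
        return token + " %% xsd:string"
    if INT_LITERAL.fullmatch(token):
        return '"' + token + '" %% xsd:int'
    if TYPED_LITERAL.fullmatch(token):
        return token  # already explicit, left unchanged
    raise ValueError("not a literal recognised by this sketch: " + token)


if __name__ == "__main__":
    for example in ['"document"', "2", '"prov:Human" %% xsd:QName']:
        print(example, "->", desugar_literal(example))
```

Under this reading, ex:version=2 and ex:version="2" %% xsd:int denote the same literal.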
The reserved type values in the PROV namespace are the following.
Time instants are defined according to xsd:dateTime [XMLSCHEMA-2].
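As an illustration, time instants written in the form used by the examples in this document, such as 2011-11-16T16:00:00, can be read with Python's standard library. This is a minimal sketch: the function name parse_time_instant is hypothetical, and only the plain form without fractional seconds or timezone is handled, although xsd:dateTime permits both.

```python
from datetime import datetime


def parse_time_instant(text):
    """Parse a time instant such as 2011-11-16T16:00:00 (sketch: no zone, no fraction)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    start = parse_time_instant("2011-11-16T16:00:00")
    end = parse_time_instant("2011-11-16T16:00:01")
    # Duration between the two instants, in seconds.
    print((end - start).total_seconds())
```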
An expression container is a house-keeping construct of PROV-N capable of packaging up PROV-N expressions and namespace declarations. An expression container forms a self-contained package of provenance descriptions for the purpose of exchanging them. An expression container may be used to package up PROV-N expressions in response to a request for the provenance of something ([PROV-AQ]).
Given its status as a house-keeping construct for the purpose of exchanging provenance expressions, an expression container is not defined as a PROV-N expression (production expression).
An expression container, written container decls exprs endContainer in PROV-N, contains:
An expression container's text matches the expressionContainer production.
The following container contains expressions related to the provenance of entity e2.
container
  prefix ex: http://example.org/,
  entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice",
               ex:content="There was a lot of crime in London last month."])
  activity(a1, 2011-11-16T16:05:00,,[prov:type="edit"])
  wasGeneratedBy(e2, a1, [ex:fct="save"])
  wasAssociatedWith(a1, ag2, [prov:role="author"])
  agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])
endContainer
This container could for instance be returned as the result of a query to a provenance store for the provenance of entity e2 [PROV-AQ].
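A consumer receiving such a container needs to separate the expressions it packages. The following Python sketch shows one way to do this for the expression part only, by tracking parenthesis depth and skipping parentheses that occur inside quoted strings. The function name split_expressions is hypothetical, and the sketch is not a parser for the grammar: it does not handle the container keywords, the prefix declarations, or escaped quotes.

```python
def split_expressions(text):
    """Split 'name(...) name(...)' into expressions by tracking parenthesis depth."""
    expressions = []
    depth = 0
    in_string = False
    start = None
    for index, char in enumerate(text):
        if char == '"':
            in_string = not in_string
        if in_string:
            continue  # parentheses inside quoted strings are ignored
        if start is None and not char.isspace():
            start = index  # first character of the next expression
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                expressions.append(text[start:index + 1])
                start = None
    return expressions


if __name__ == "__main__":
    body = ('activity(a1, 2011-11-16T16:05:00,,[prov:type="edit"]) '
            'wasGeneratedBy(e2, a1, [ex:fct="save"])')
    for expression in split_expressions(body):
        print(expression)
```

Because only the outermost parentheses close an expression, nested constructs are returned as a single item together with their contents.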
PROV-DM has introduced a notion of account by which a set of provenance descriptions can be bundled up and named. PROV-DM assumes the existence of mechanisms to implement accounts, but such mechanisms remain outside its scope. It is suggested that specific serializations may offer solutions to name bundles of descriptions.
Given that the primary motivation for PROV-N is to provide a notation aimed at human consumption, it is appropriate to introduce a notation for accounts, which would include an account name and a bundle of expressions.
An account, written account(id, exprs) in PROV-N, contains:
In PROV-N, an account's text matches the accountExpression production of the grammar.
It is also useful to package up one or more account expressions in an expression container, for interchange purposes. Hence, expressionContainer is revised as follows.
The following container
container
  prefix ex: http://example.org/,
  account(ex:acc1,...)
  account(ex:acc2,...)
endContainer
illustrates how two accounts with identifiers ex:acc1 and ex:acc2 can be returned in a PROV-N serialization of the provenance of something.
The following container
container
  prefix ex: http://example.org/,
  ...
  account(ex:acc1,
    entity(tr:WD-prov-dm-20111018, [ prov:type="pr:RecsWD" %% xsd:QName ])
    entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
    ...
    wasAssociatedWith(ex:pub2, w3:Consortium @ pr:rec-advance))
  account(ex:acc2,
    entity(ex:acc1, [prov:type="prov:AccountEntity" %% xsd:QName ])
    wasAttributedTo(ex:acc1,w3:Consortium))
endContainer
illustrates a first account, with identifier ex:acc1, containing expressions describing the provenance of the technical report tr:WD-prov-dm-20111215, and a second account ex:acc2, describing the provenance of the first. In account ex:acc2, ex:acc1 is the identifier of an entity of type prov:AccountEntity.