PROV-DM, the PROV data model, is a data model for provenance that describes the entities, people and activities involved in producing a piece of data or thing. PROV-DM is structured in six components, dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to the same thing; (5) collections forming a logical structure for their members; (6) a simple annotation mechanism.
This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.
Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance descriptions should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.
This is the fourth public release of the PROV-DM document. Following feedback, the Working Group has decided to reorganize this document substantially, separating the data model from its constraints and the notation used to illustrate it. The PROV-DM release is synchronized with the release of the PROV-O, PROV-PRIMER, PROV-N, and PROV-CONSTRAINTS documents. We are now clarifying the entry path to the PROV family of specifications.
For users to decide whether they can place their trust in a resource, they may want to analyze the resource's provenance, but also determine who its provenance is attributed to, and when it was generated. In other words, users need to be able to determine the provenance of provenance. Hence, provenance is also regarded as an entity (of type Bundle), by which provenance of provenance can then be expressed.
This concept allows for the provenance of the collection itself to be expressed in addition to that of the members. Many different types of collections exist, such as sets, dictionaries, or lists, all of which involve a membership relationship between the constituents and the collection.
An example of a collection is an archive of documents. Each document has its own provenance, but the archive itself also has some provenance: who maintained it, which documents it contained at which point in time, how it was assembled, etc.
The two previous sections offer two different perspectives on the provenance of a document. PROV allows for multiple sources to provide the provenance of a subject. For users to decide whether they can place their trust in the document, they may want to analyze its provenance, but also determine who the provenance is attributed to, and when it was generated, etc. In other words, we need to be able to express the provenance of provenance.
PROV-DM offers a construct to name a bundle of provenance descriptions.
bundle ex:author-view
  agent(ex:Paolo, [ prov:type='prov:Person' ])
  agent(ex:Simon, [ prov:type='prov:Person' ])
  ...
endBundle
Likewise, the process view can be expressed as a separate named bundle.
bundle ex:process-view
  agent(w3:Consortium, [ prov:type='prov:Organization' ])
  ...
endBundle
To express their respective provenance, these bundles must be seen as entities, so that all PROV constructs become available to express their provenance. In the example below, ex:author-view is attributed to the agent ex:Simon, whereas ex:process-view is attributed to w3:Consortium.
entity(ex:author-view, [ prov:type='prov:Bundle' ])
wasAttributedTo(ex:author-view, ex:Simon)
entity(ex:process-view, [ prov:type='prov:Bundle' ])
wasAttributedTo(ex:process-view, w3:Consortium)
TODO: full details of bundles can be found at ex:process-view and ex:author-view.
A bundle's identifier id identifies a unique set of descriptions.
As a named bundle is a set of descriptions, it is also an entity so that its provenance can be described.
PROV defines the type prov:Bundle for bundles.
A bundle description is of the form entity(id,[prov:type='prov:Bundle', attr1=val1, ...]), where id is an identifier denoting a bundle, prov:Bundle is its type, and ((attr1, val1), ...) is an OPTIONAL set of attribute-value pairs representing additional information about this bundle.
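As a rough illustration of the form described above, the following Python sketch assembles a bundle description as a PROV-N-style string. The helper name bundle_description is hypothetical and not part of PROV; attribute values are assumed to be already formatted as PROV-N literals.

```python
def bundle_description(bundle_id, attrs=None):
    """Build a PROV-N-style entity description for a bundle.

    Hypothetical helper for illustration only; not defined by PROV-DM.
    `attrs` maps attribute names to values already formatted as PROV-N literals.
    """
    # The prov:Bundle type is always present; further pairs are optional.
    pairs = ["prov:type='prov:Bundle'"]
    for name, value in (attrs or {}).items():
        pairs.append(f"{name}={value}")
    return f"entity({bundle_id}, [{', '.join(pairs)}])"


print(bundle_description("bob:bundle1"))
print(bundle_description("bob:bundle1", {"ex:version": 1}))
```

The first call yields a description carrying only the mandatory type; the second adds one optional attribute-value pair.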
The provenance of provenance can then be described using PROV constructs, as illustrated by the following example.
Let us consider an example consisting of two entities ex:report1 and ex:report2.
entity(ex:report1, [ prov:type="report", ex:version=1 ])
wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
entity(ex:report2, [ prov:type="report", ex:version=2 ])
wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
wasDerivedFrom(ex:report2, ex:report1)
Let us assume that Bob observed the creation of ex:report1. A first bundle can be expressed.
bundle bob:bundle1
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
endBundle
In contrast, Alice observed the creation of ex:report2 and its derivation from ex:report1. A separate bundle can also be expressed.
bundle alice:bundle2
  entity(ex:report1)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, ex:report1)
endBundle
The first bundle contains the descriptions corresponding to Bob observing the creation of ex:report1. Its provenance can be described as follows.
entity(bob:bundle1, [ prov:type='prov:Bundle' ])
wasGeneratedBy(bob:bundle1, -, 2012-05-24T10:30:00)
wasAttributedTo(bob:bundle1, ex:Bob)
In contrast, the second bundle is attributed to Alice who observed the derivation of ex:report2 from ex:report1.
entity(alice:bundle2, [ prov:type='prov:Bundle' ])
wasGeneratedBy(alice:bundle2, -, 2012-05-25T11:15:00)
wasAttributedTo(alice:bundle2, ex:Alice)
A provenance aggregator could merge these two bundles, resulting in a novel bundle, whose provenance is described as follows.
bundle agg:bundle3
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, ex:report1)
endBundle
entity(agg:bundle3, [ prov:type='prov:Bundle' ])
agent(ex:aggregator01, [ prov:type='ex:Aggregator' ])
wasAttributedTo(agg:bundle3, ex:aggregator01)
wasDerivedFrom(agg:bundle3, bob:bundle1)
wasDerivedFrom(agg:bundle3, alice:bundle2)
The new bundle is given a new identifier agg:bundle3 and is attributed to the ex:aggregator01 agent.
In hasProvenanceIn(id, subject, bundle, target, service, prov, attrs), service and prov are both optional and mutually exclusive: at most one of service or prov is provided.
A provenance locator specifies a context, referred to as the located context, in which further descriptions can be found about something.
When the subject and optional target denote entities, a provenance locator not only provides a located context, but also expresses an alternate relation between the entity denoted by subject and the entity described in the located context. The two are alternates because the entity denoted by subject in the current context presents other aspects than the entity in the located one.
According to the following provenance locator, provenance descriptions about ex:report1 can be found in bundle bob:bundle1.
hasProvenanceIn(ex:report1, bob:bundle1, -, -, -)
According to the following provenance locator, provenance descriptions about ex:report1 can be found in bundle bob:bundle1, which is available from the provenance service identified by the provided URI.
hasProvenanceIn(ex:report1, bob:bundle1, -, "http://example.com/service"^xsd:anyURI, -)
According to the following provenance locator, provenance descriptions about ex:report1 can be found in the resource identified by the provided URI.
hasProvenanceIn(ex:report1, -, -, -, "http://example.com/some-provenance.pn"^xsd:anyURI)
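The locator forms above, together with the rule that service and prov are mutually exclusive, can be sketched as a small Python data class. The class name ProvenanceLocator and its method to_provn are hypothetical; the optional id and attrs arguments are omitted for brevity, and the literal syntax simply mirrors the examples in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceLocator:
    """Illustrative model of hasProvenanceIn (id and attrs omitted)."""

    subject: str
    bundle: Optional[str] = None
    target: Optional[str] = None
    service: Optional[str] = None
    prov: Optional[str] = None

    def __post_init__(self):
        # service and prov are optional, but at most one may be given.
        if self.service is not None and self.prov is not None:
            raise ValueError("service and prov are mutually exclusive")

    def to_provn(self):
        def ident(value):
            # Absent optional arguments are written as "-".
            return "-" if value is None else value

        def uri(value):
            return "-" if value is None else f'"{value}"^xsd:anyURI'

        return (
            f"hasProvenanceIn({self.subject}, {ident(self.bundle)}, "
            f"{ident(self.target)}, {uri(self.service)}, {uri(self.prov)})"
        )


in_bundle = ProvenanceLocator("ex:report1", bundle="bob:bundle1")
via_service = ProvenanceLocator(
    "ex:report1", bundle="bob:bundle1", service="http://example.com/service"
)
print(in_bundle.to_provn())
print(via_service.to_provn())
```

Rejecting the invalid combination at construction time means that no locator carrying both a service and a prov URI can be serialized.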
Let us again consider the same scenario involving two entities ex:report1 and ex:report2.
The first bundle can be expressed with all Bob's observations about the creation of ex:report1.
bundle bob:bundle4
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
endBundle
Likewise, Alice's observation about the derivation of ex:report2 from ex:report1 is expressed in a separate bundle.
bundle alice:bundle5
  entity(ex:report1)
  hasProvenanceIn(ex:report1, bob:bundle4, -, -, -)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, ex:report1)
endBundle
In bundle alice:bundle5, there is a description for entity ex:report1, and a provenance locator pointing to bundle bob:bundle4. The locator indicates that some provenance description for ex:report1 can be found in bundle bob:bundle4. The purpose of the locator is twofold. First, it allows for incremental navigation of provenance [[PROV-AQ]]. Second, it makes entity ex:report1 described in alice:bundle5 an alternate of ex:report1 described in bob:bundle4.
Alternatively, Alice may have decided to use a different identifier for ex:report1.
bundle alice:bundle6
  entity(alice:report1)
  hasProvenanceIn(alice:report1, bob:bundle4, ex:report1, -, -)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, alice:report1)
endBundle
Alice can specify the target in the provenance locator to be ex:report1. With such a statement, Alice states that provenance information about alice:report1 can be found in bundle bob:bundle4 under the name ex:report1. In effect, alice:report1 and ex:report1 are declared to be alternates.
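How a consumer might follow such a locator can be sketched in Python. This assumes, purely for illustration, that bundles are held in memory as a dictionary mapping bundle identifiers to lists of (relation, identifiers) tuples; the function name located_descriptions is hypothetical.

```python
# In-memory stand-in for bob:bundle4 (illustrative data structure only).
bundles = {
    "bob:bundle4": [
        ("entity", ("ex:report1",)),
        ("wasGeneratedBy", ("ex:report1",)),
    ],
}


def located_descriptions(bundles, subject, bundle_id, target=None):
    """Return the descriptions in the located bundle about the subject.

    When a target is given, the subject is known under that name in the
    located bundle; otherwise the subject's own identifier is used.
    """
    name = subject if target is None else target
    return [d for d in bundles.get(bundle_id, []) if name in d[1]]


# Locator from alice:bundle5: same identifier in both bundles.
same_name = located_descriptions(bundles, "ex:report1", "bob:bundle4")

# Locator from alice:bundle6: alice:report1 is known as ex:report1 in bob:bundle4.
renamed = located_descriptions(
    bundles, "alice:report1", "bob:bundle4", target="ex:report1"
)
```

Both lookups return the same descriptions from bob:bundle4, which reflects why alice:report1 and ex:report1 are treated as alternates. Without the target, a lookup for alice:report1 finds nothing, since that identifier does not occur in bob:bundle4.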
Consider the following bundle of descriptions, in which the derivation and generations have been identified.
bundle obs:bundle7
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:g1; ex:report1, -, 2012-05-24T10:00:01)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:g2; ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:d; ex:report2, ex:report1)
endBundle
entity(obs:bundle7, [ prov:type='prov:Bundle' ])
wasAttributedTo(obs:bundle7, ex:observer01)
Bundle obs:bundle7 is rendered by a visualisation tool. It may be useful for the tool configuration for this bundle to be shared along with the provenance descriptions, so that other users can render provenance as it was originally rendered. The original bundle cannot be changed. However, one can create a new bundle, as follows.
bundle tool:bundle8
  entity(tool:bundle8, [ prov:type='viz:Configuration', prov:type='prov:Bundle' ])
  wasAttributedTo(tool:bundle8, viz:Visualizer)
  entity(ex:report1, [ viz:color="orange" ])
  hasProvenanceIn(ex:report1, obs:bundle7, -, -, -)
  entity(ex:report2, [ viz:color="blue" ])
  hasProvenanceIn(ex:report2, obs:bundle7, -, -, -)
  wasDerivedFrom(ex:d; ex:report2, ex:report1, [ viz:style="dotted" ])
  hasProvenanceIn(ex:d, obs:bundle7, -, -, -)
endBundle
In bundle tool:bundle8, the prefix viz is used for naming visualisation-specific attributes, types or values.
Bundle tool:bundle8 is given type viz:Configuration to indicate that it consists of descriptions that pertain to the configuration of the visualisation tool. This type attribute can be used for searching bundles containing visualization-related descriptions.
Alternates of the entities ex:report1 and ex:report2 have a visualization attribute for the color to be used when rendering these entities. Likewise, the derivation has a style attribute. To be able to express this alternate of the derivation, it is necessary for it to have an identifier in the first place (ex:d).
The idea of bundling provenance descriptions is crucial to the PROV approach. Indeed, it allows multiple provenance perspectives to be provided for a given entity. It is also the mechanism by which provenance of provenance can be expressed. Descriptions in bundles are expected to satisfy constraints specified in the companion specification [[PROV-CONSTRAINTS]].
WG membership to be listed here.