PROV-DM, the PROV data model, is a data model for provenance that describes the entities, people and activities involved in producing a piece of data or thing. PROV-DM distinguishes core structures, forming the essence of provenance descriptions, from extended structures catering for more advanced uses of provenance. PROV-DM is organized into six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to the same thing; (5) the notion of bundle, a mechanism to support provenance of provenance; (6) collections forming a logical structure for their members.
This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.
Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance descriptions should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.
This is the fourth public release of the PROV-DM document. Following feedback, the Working Group has decided to reorganize this document substantially, separating the data model from its constraints and from the notation used to illustrate it. The PROV-DM release is synchronized with the release of the PROV-O, PROV-PRIMER, PROV-N, and PROV-CONSTRAINTS documents. We are now clarifying the entry path to the PROV family of specifications.
The fourth component of PROV-DM is concerned with the relations specialization and alternate, which link entities to one another. Figure 8 depicts the fourth component with a single class and two associations.
Two provenance descriptions about the same thing may emphasize different aspects of that thing.
User Alice writes an article. In its provenance, she wishes to refer to the precise version of the article with a date-specific URI, as she might edit the article later. Alternatively, user Bob refers to the article in general, independently of its variants over time.
The PROV data model introduces relations, called specialization and alternate, that allow entities to be linked together. They are defined as follows.
Examples of constraints include a time period, an abstraction, and a context associated with the entity.
The BBC news home page on 2012-03-23 (ex:bbcNews2012-03-23) is a specialization of the BBC news page in general (bbc:news/). This can be expressed as follows.
specializationOf(ex:bbcNews2012-03-23, bbc:news/)
We have created a new qualified name, ex:bbcNews2012-03-23, in the namespace ex, to identify the specific page carrying this day's news; without it, the only available identifier would be that of the generic bbc:news/ page.
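To make the structure of the statement above concrete, here is a minimal Python sketch that splits a simple PROV-N relation into its name and its qualified-name arguments, each consisting of a prefix (such as ex or bbc) and a local part. It is an illustration only, not a PROV-N parser: it assumes statements without attribute lists, optional identifiers, or `-` placeholders, and the names `QualifiedName`, `parse_qname`, and `parse_relation` are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class QualifiedName(NamedTuple):
    prefix: str  # namespace prefix, e.g. "ex"
    local: str   # local part, e.g. "bbcNews2012-03-23"


# Simplified pattern: "relationName(arg1, arg2, ...)" with no attribute list.
_STATEMENT = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_qname(text: str) -> QualifiedName:
    """Split "prefix:local" at the first colon."""
    prefix, sep, local = text.strip().partition(":")
    if not sep:
        raise ValueError(f"not a qualified name: {text!r}")
    return QualifiedName(prefix, local)


def parse_relation(statement: str) -> Tuple[str, List[QualifiedName]]:
    """Return the relation name and its qualified-name arguments."""
    match = _STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"not a simple PROV-N relation: {statement!r}")
    name, args = match.groups()
    return name, [parse_qname(arg) for arg in args.split(",")]


if __name__ == "__main__":
    print(parse_relation("specializationOf(ex:bbcNews2012-03-23, bbc:news/)"))
```

Because the split happens at the first colon only, local parts such as `news/` or `bbcNews2012-03-23` are kept intact.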
A given news item on the BBC News site for desktop, bbc:news/science-environment-17526723, is an alternate of bbc:news/mobile/science-environment-17526723, its counterpart for mobile devices.
entity(bbc:news/science-environment-17526723, [ prov:type="a news item for desktop"])
entity(bbc:news/mobile/science-environment-17526723, [ prov:type="a news item for mobile devices"])
alternateOf(bbc:news/science-environment-17526723, bbc:news/mobile/science-environment-17526723)
They are both specializations of an (unspecified) entity.
Consider again the two versions of the technical report, tr:WD-prov-dm-20111215 (second working draft) and tr:WD-prov-dm-20111018 (first working draft). They are alternates of each other.
entity(tr:WD-prov-dm-20111018)
entity(tr:WD-prov-dm-20111215)
alternateOf(tr:WD-prov-dm-20111018, tr:WD-prov-dm-20111215)
They are both specializations of the page http://www.w3.org/TR/prov-dm/.
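The two examples of alternateOf are accompanied by the observation that the alternates are both specializations of a common entity. A minimal Python sketch of that check follows, assuming a hypothetical in-memory store of specializationOf pairs. It looks only at directly asserted pairs and performs none of the inferences defined in the companion constraints document; the function names are illustrative.

```python
# Hypothetical store: each (specific, general) pair stands for
# specializationOf(specific, general).
SPECIALIZATIONS = {
    ("tr:WD-prov-dm-20111018", "http://www.w3.org/TR/prov-dm/"),
    ("tr:WD-prov-dm-20111215", "http://www.w3.org/TR/prov-dm/"),
}


def generalizations(entity, store):
    """Return the entities that `entity` is directly asserted to specialize."""
    return {general for specific, general in store if specific == entity}


def share_general_entity(e1, e2, store):
    """True if e1 and e2 are both asserted specializations of a common entity."""
    return bool(generalizations(e1, store) & generalizations(e2, store))


if __name__ == "__main__":
    print(share_general_entity(
        "tr:WD-prov-dm-20111018", "tr:WD-prov-dm-20111215", SPECIALIZATIONS))
```

Since the common entity in the BBC news item example is left unspecified, such a check can only succeed when that entity is actually asserted, as in the technical report example.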
Something that is a contextualization of another presents all aspects of the latter in a given context specified by descriptions found in a bundle.
In the following example, two bundles ex:run1 and ex:run2 refer to an agent ex:Bob that controlled two activities ex:a1 and ex:a2.
bundle ex:run1
  activity(ex:a1, 2011-11-16T16:00:00, 2011-11-16T17:00:00) // duration: 1 hour
  wasAssociatedWith(ex:a1, ex:Bob, [prov:role="controller"])
endBundle

bundle ex:run2
  activity(ex:a2, 2011-11-17T10:00:00, 2011-11-17T17:00:00) // duration: 7 hours
  wasAssociatedWith(ex:a2, ex:Bob, [prov:role="controller"])
endBundle
A performance rating tool reads these bundles, and rates the performance of the agent described in these bundles. The performance rating tool creates a new bundle tool:analysis01 containing the following. A new agent tool:Bob1 is declared as a contextualization of ex:Bob as described in context ex:run1, and likewise for tool:Bob2 with respect to ex:run2. The tool then defines two specializations of these contextualized agents with an associated rating. The performance of the agent in the first bundle is judged to be good since the duration of ex:a1 is one hour, whereas it is judged to be bad in the second bundle since ex:a2's duration is seven hours.
bundle tool:analysis01
  agent(tool:Bob1)
  contextualizationOf(tool:Bob1, ex:Bob, ex:run1)
  agent(tool:ratedBob1, [perf:rating="good"])
  specializationOf(tool:ratedBob1, tool:Bob1)

  agent(tool:Bob2)
  contextualizationOf(tool:Bob2, ex:Bob, ex:run2)
  agent(tool:ratedBob2, [perf:rating="bad"])
  specializationOf(tool:ratedBob2, tool:Bob2)
endBundle
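The rating step performed by the tool can be sketched in Python. This is a minimal illustration under stated assumptions: the bundles ex:run1 and ex:run2 are held in hypothetical in-memory dictionaries rather than parsed from PROV-N, each contextualized agent is mapped to the agent it refers to and the bundle providing the context, and the two-hour threshold separating "good" from "bad" is an assumption, since the example only says that one hour is good and seven hours is bad.

```python
from datetime import datetime

# Hypothetical in-memory copy of bundles ex:run1 and ex:run2:
# bundle id -> activity id -> (start time, end time, associated agent).
BUNDLES = {
    "ex:run1": {"ex:a1": ("2011-11-16T16:00:00", "2011-11-16T17:00:00", "ex:Bob")},
    "ex:run2": {"ex:a2": ("2011-11-17T10:00:00", "2011-11-17T17:00:00", "ex:Bob")},
}

# contextualizationOf statements from tool:analysis01:
# contextualized agent -> (agent it refers to, bundle providing the context).
CONTEXTUALIZATIONS = {
    "tool:Bob1": ("ex:Bob", "ex:run1"),
    "tool:Bob2": ("ex:Bob", "ex:run2"),
}

# Assumed threshold: the example only fixes 1 hour as "good" and 7 hours as "bad".
GOOD_THRESHOLD_HOURS = 2.0


def hours_associated(agent, bundle_id):
    """Sum, in hours, the durations of activities in bundle_id associated with agent."""
    total = 0.0
    for start, end, associated in BUNDLES[bundle_id].values():
        if associated == agent:
            delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
            total += delta.total_seconds() / 3600
    return total


def rate(contextualized_agent):
    """Rate the agent using only the bundle named by its contextualization."""
    agent, bundle_id = CONTEXTUALIZATIONS[contextualized_agent]
    hours = hours_associated(agent, bundle_id)
    return "good" if hours <= GOOD_THRESHOLD_HOURS else "bad"


if __name__ == "__main__":
    for name in ("tool:Bob1", "tool:Bob2"):
        print(name, rate(name))
```

The same agent ex:Bob receives two different ratings because each rating is computed only from the descriptions in the bundle that its contextualization points to.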
Consider the following bundle of descriptions, in which the derivation and the generations have been given identifiers.
bundle obs:bundle7
  entity(ex:report1, [prov:type="report", ex:version=1])
  wasGeneratedBy(ex:g1; ex:report1, -, 2012-05-24T10:00:01)
  entity(ex:report2, [prov:type="report", ex:version=2])
  wasGeneratedBy(ex:g2; ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:d; ex:report2, ex:report1)
endBundle

entity(obs:bundle7, [ prov:type='prov:Bundle' ])
wasAttributedTo(obs:bundle7, ex:observer01)

Bundle obs:bundle7 is rendered by a visualisation tool. It may be useful for the tool's configuration for this bundle to be shared along with the provenance descriptions, so that other users can render the provenance as it was originally rendered. The original bundle obviously cannot be changed. However, one can create a new bundle, as follows.
bundle tool:bundle8
  entity(tool:bundle8, [ prov:type='viz:Configuration', prov:type='prov:Bundle' ])
  wasAttributedTo(tool:bundle8, viz:Visualizer)

  entity(tool:report1, [viz:color="orange"]) // is it appropriate to add viz attributes to tool:report1 or should we specialize it?
  contextualizationOf(tool:report1, ex:report1, obs:bundle7)
  entity(tool:report2, [viz:color="blue"])
  contextualizationOf(tool:report2, ex:report2, obs:bundle7)
  wasDerivedFrom(tool:d; tool:report2, tool:report1, [viz:style="dotted"])
  contextualizationOf(tool:d, ex:d, obs:bundle7)
endBundle
In bundle tool:bundle8, the prefix viz is used for naming visualisation-specific attributes, types or values.
Bundle tool:bundle8 is given type viz:Configuration to indicate that it consists of descriptions that pertain to the configuration of the visualisation tool. This type attribute can be used for searching bundles containing visualization-related descriptions.
The visualisation tool created new identifiers tool:report1, tool:report2, and tool:d. The entities tool:report1 and tool:report2 are alternates of ex:report1 and ex:report2, respectively, as described in bundle obs:bundle7, each with a visualization attribute giving the color to be used when rendering it. Likewise, the derivation tool:d, which corresponds to ex:d, has a style attribute.
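The lookup a visualisation tool might perform with tool:bundle8 can be sketched in Python. The sketch assumes hypothetical in-memory dictionaries holding the contextualizationOf statements and viz attributes of tool:bundle8, with each contextualized identifier mapped to the original identifier and the bundle describing it; the function name `rendering_config` is illustrative. It shows that obs:bundle7 is only referenced: rendering options for its identifiers are recovered through the contextualizations.

```python
# Hypothetical in-memory copy of tool:bundle8.
# contextualizationOf statements: contextualized id -> (original id, bundle of the original).
CONTEXTUALIZATIONS = {
    "tool:report1": ("ex:report1", "obs:bundle7"),
    "tool:report2": ("ex:report2", "obs:bundle7"),
    "tool:d": ("ex:d", "obs:bundle7"),
}

# Visualization attributes asserted on the contextualized ids.
VIZ_ATTRIBUTES = {
    "tool:report1": {"viz:color": "orange"},
    "tool:report2": {"viz:color": "blue"},
    "tool:d": {"viz:style": "dotted"},
}


def rendering_config(bundle_id):
    """Map identifiers described in bundle_id to their visualization attributes.

    The original bundle is never modified: attributes are found by following
    the contextualizations recorded in the configuration bundle.
    """
    config = {}
    for local_id, (original_id, source_bundle) in CONTEXTUALIZATIONS.items():
        if source_bundle == bundle_id:
            attributes = VIZ_ATTRIBUTES.get(local_id, {})
            config.setdefault(original_id, {}).update(attributes)
    return config


if __name__ == "__main__":
    print(rendering_config("obs:bundle7"))
```

A renderer would read obs:bundle7 unchanged and consult this mapping when drawing ex:report1, ex:report2, and ex:d.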
By definition, the identifier of a derivation is optional. To express an alternate of a derivation, we must be able to refer to it by means of an identifier; hence, the derivation must have been given an identifier in the first place (here, ex:d).
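The need for an identifier can be illustrated with a small Python sketch. The `Derivation` class and the `contextualization_of` helper are hypothetical; the helper builds a (contextualized, original, bundle) triple, following the argument order of the tool:analysis01 example, and rejects a derivation without an identifier because there is nothing to refer to.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Derivation:
    """wasDerivedFrom(identifier; generated, used): the identifier is optional."""
    generated: str
    used: str
    identifier: Optional[str] = None


def contextualization_of(new_id: str, target: Derivation, bundle: str) -> Tuple[str, str, str]:
    """Build a (contextualized, original, bundle) triple for an identified derivation."""
    if target.identifier is None:
        # An anonymous derivation cannot be referenced from another bundle.
        raise ValueError("derivation has no identifier and cannot be referenced")
    return (new_id, target.identifier, bundle)


if __name__ == "__main__":
    d = Derivation("ex:report2", "ex:report1", identifier="ex:d")
    print(contextualization_of("tool:d", d, "obs:bundle7"))
```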