Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized in six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and, (6) collections forming a logical structure for its members.

This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.

Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.

Last Call

This is the fifth public release of the PROV-DM document. This is a Last Call Working Draft. The design is not expected to change significantly from here on, so now is the key time for external review.

This specification identifies one feature at risk: Mention (Section 5.5.3) might be removed from PROV if implementation experience reveals problems with supporting this construct.

PROV Family of Specifications

This document is part of the PROV family of specifications, a set of specifications defining various aspects that are necessary to achieve the vision of interoperable interchange of provenance information in heterogeneous environments such as the Web. The specifications are:

How to read the PROV Family of Specifications

prov:value

The attribute prov:value provides a value that is a direct representation of an entity as a PROV-DM Value (Section 5.7.3).

The attribute prov:value is an OPTIONAL attribute of entity. The value associated with the attribute prov:value MUST be a PROV-DM Value. The attribute prov:value MAY occur at most once in a set of attribute-value pairs.
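
The occurrence rule above can be sketched as a small check. This is a minimal illustration only, not part of PROV-DM: the function name check_prov_value and the representation of attribute-value pairs as Python tuples are assumptions made for this example, and the check does not verify that the value is a PROV-DM Value.

```python
from collections import Counter


def check_prov_value(attributes):
    """Return True if prov:value occurs at most once.

    `attributes` is a hypothetical representation of a set of
    attribute-value pairs: a list of (attribute name, value) tuples.
    """
    counts = Counter(name for name, _ in attributes)
    # prov:value is OPTIONAL, so zero occurrences is acceptable.
    return counts["prov:value"] <= 1


present_once = check_prov_value([("prov:value", "abcd"), ("prov:type", "string")])
absent = check_prov_value([("prov:type", "string")])
repeated = check_prov_value([("prov:value", 4), ("prov:value", 5)])
```

Under these assumptions, present_once and absent are accepted, while repeated is rejected because prov:value appears twice.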

The following example illustrates the provenance of the number 4 obtained by an activity that computed the length of an input string "abcd". The input and the output are expressed as entities ex:in and ex:out, respectively. They each have a prov:value attribute associated with the corresponding value.

entity(ex:in, [ prov:value="abcd" ]) 
entity(ex:out, [ prov:value=4 ]) 
activity(ex:len, [ prov:type="string-length" ])
used(ex:len, ex:in)
wasGeneratedBy(ex:out, ex:len)
wasDerivedFrom(ex:out, ex:in)
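
As an illustration of how an application might produce such statements, the following sketch computes the length of a string and returns the corresponding provenance statements as text. It is a sketch under stated assumptions: the function name record_string_length is hypothetical, the identifiers ex:in, ex:out and ex:len are fixed to match the example above, and no escaping is applied to the input string.

```python
def record_string_length(text):
    """Compute len(text) and describe the computation as provenance statements.

    Returns a (result, statements) pair. The statements reuse the fixed
    identifiers of the example above; `text` is embedded without escaping,
    so it is assumed to contain no double quotes.
    """
    result = len(text)
    statements = [
        f'entity(ex:in, [ prov:value="{text}" ])',
        f'entity(ex:out, [ prov:value={result} ])',
        'activity(ex:len, [ prov:type="string-length" ])',
        'used(ex:len, ex:in)',
        'wasGeneratedBy(ex:out, ex:len)',
        'wasDerivedFrom(ex:out, ex:in)',
    ]
    return result, statements


result, statements = record_string_length("abcd")
```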

Two different entities MAY have the same value for the attribute prov:value. This happens, for instance, when two entities with the same prov:value are generated by two different activities, as illustrated by the following example.

The string-length example above illustrates an entity, ex:out, with value 4. The following example shows that another entity with the same value may be computed differently (by an addition).

entity(ex:in1, [ prov:value=3 ]) 
entity(ex:in2, [ prov:value=1 ]) 
entity(ex:out2, [ prov:value=4 ])      // ex:out2 also has value 4
activity(ex:add1, [ prov:type="addition" ])
used(ex:add1, ex:in1)
used(ex:add1, ex:in2)
wasGeneratedBy(ex:out2, ex:add1)
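
The distinction between an entity and its value can be sketched as follows. This is an illustrative model only: the Entity class and the generated_by mapping are assumptions made for this example and are not defined by PROV-DM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical minimal model of an entity with a prov:value."""
    identifier: str
    value: object


out = Entity("ex:out", 4)    # generated by the string-length activity
out2 = Entity("ex:out2", 4)  # generated by the addition activity

# Generation records from the two examples: entity identifier -> activity identifier.
generated_by = {"ex:out": "ex:len", "ex:out2": "ex:add1"}

same_value = out.value == out2.value
same_entity = out.identifier == out2.identifier
same_activity = generated_by[out.identifier] == generated_by[out2.identifier]
```

In this model, same_value holds while same_entity and same_activity do not: equal prov:value attributes do not imply that two entities are the same or share the same provenance.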