Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized into six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and (6) collections forming a logical structure for their members.

This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.

Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.

Last Call

This is the fifth public release of the PROV-DM document. This is a Last Call Working Draft. The design is not expected to change significantly going forward, so now is the key time for external review.

This specification identifies one feature at risk: Mention (Section 5.5.3) might be removed from PROV if implementation experience reveals problems with supporting this construct.

PROV Family of Specifications

This document is part of the PROV family of specifications, a set of specifications defining various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. The specifications are:

How to read the PROV Family of Specifications

PROV-DM Types and Relations

Provenance concepts, expressed as PROV-DM types and relations, are organized according to six components that are defined in this section. The components and their dependencies are illustrated in Figure 4. A component that relies on concepts defined in another is displayed above it in the figure. So, for example, component 5 (alternate) depends on concepts defined in component 4 (bundles), itself dependent on concepts defined in component 1 (entity and activity).

[Figure 4: PROV-DM Components. Boxes shown: activities/entities, derivations, agents/responsibility, bundles, alternate, collections.]
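
The dependency reading described above can be sketched in a few lines of Python. This is only an illustration: the names `COMPONENTS`, `DEPENDS_ON` and `transitive_dependencies` are hypothetical, the component numbers follow the abstract, and only the two dependencies stated in the text are recorded (the complete set is given by Figure 4).

```python
# Component numbers and labels as listed in the abstract.
COMPONENTS = {
    1: "entities and activities",
    2: "derivations",
    3: "agents and responsibility",
    4: "bundles",
    5: "alternate",
    6: "collections",
}

# Only the dependencies stated in the prose; other edges of Figure 4
# are deliberately left out rather than guessed.
DEPENDS_ON = {5: {4}, 4: {1}}


def transitive_dependencies(component):
    """Return every component reachable by following DEPENDS_ON edges."""
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for dep in DEPENDS_ON.get(current, set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    for number in sorted(transitive_dependencies(5)):
        print(number, COMPONENTS[number])
```

Under these assumptions, component 5 (alternate) depends directly on component 4 (bundles) and, through it, on component 1 (entity and activity).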

While not all PROV-DM relations are binary, they all involve two primary elements. Hence, Table 4 indexes all relations according to their two primary elements (referred to as subject and object). The table adopts the same color scheme as Figure 4, allowing components to be readily identified. Note that for simplicity, this table does not include collection-oriented relations. Relation names appearing in bold correspond to the core structures introduced in Section 2.1.

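Table 4: PROV-DM Relations At a Glance

Subject \ Object | Entity            | Activity          | Agent
-----------------+-------------------+-------------------+----------------------
Entity           |                   | WasGeneratedBy    | WasAttributedTo
                 |                   | WasInvalidatedBy  |
                 |                   | [R, T, L]         |
-----------------+-------------------+-------------------+----------------------
Activity         | Used              | WasInformedBy     | WasAssociatedWith [R]
                 | WasStartedBy      |                   |
                 | WasEndedBy        |                   |
                 | [R, T, L]         |                   |
-----------------+-------------------+-------------------+----------------------
Agent            |                   |                   | ActedOnBehalfOf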

The letters 'R' and 'L' appearing on the right-hand side of some cells of Table 4 indicate that the attributes prov:role (TBD) and prov:location (TBD) are permitted for these relations. The letter 'T' indicates that an OPTIONAL time is also permitted.
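
As an informal illustration, the index in Table 4 can be held in a lookup keyed by (subject, object). The sketch below is not part of PROV-DM: the names `Cell`, `RELATIONS`, `relations_between` and `permits` are hypothetical, markers are recorded per cell as in the table, and only the relations listed in Table 4 as reproduced here are included.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    relations: tuple      # relation names listed in the cell
    markers: frozenset    # subset of {"R", "T", "L"} shown in the cell


# Keys are (subject, object) pairs; cells that are empty in the
# table as reproduced here are simply absent.
RELATIONS = {
    ("Entity", "Activity"): Cell(("WasGeneratedBy", "WasInvalidatedBy"), frozenset("RTL")),
    ("Entity", "Agent"): Cell(("WasAttributedTo",), frozenset()),
    ("Activity", "Entity"): Cell(("Used", "WasStartedBy", "WasEndedBy"), frozenset("RTL")),
    ("Activity", "Activity"): Cell(("WasInformedBy",), frozenset()),
    ("Activity", "Agent"): Cell(("WasAssociatedWith",), frozenset("R")),
    ("Agent", "Agent"): Cell(("ActedOnBehalfOf",), frozenset()),
}


def relations_between(subject, obj):
    """Relation names indexed under the given subject and object types."""
    cell = RELATIONS.get((subject, obj))
    return cell.relations if cell else ()


def permits(relation, marker):
    """True if the cell listing `relation` carries the marker R, T or L."""
    for cell in RELATIONS.values():
        if relation in cell.relations:
            return marker in cell.markers
    raise KeyError(relation)
```

For instance, `permits("Used", "T")` reports whether an optional time is indicated for Used, and `relations_between("Activity", "Agent")` returns the single relation WasAssociatedWith.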

Some PROV-DM relations are not binary and involve an extra optional element. These relations are summarized in Table 5, which groups secondary objects according to their type. The table also adopts the same color scheme as Figure 4, allowing components to be readily identified. None of these associations correspond to the core structures introduced in Section 2.1.

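Table 5: Secondary optional elements in PROV-DM Relations

Subject \ Secondary Object | Entity                   | Activity                   | Agent
---------------------------+--------------------------+----------------------------+------
Entity                     | MentionOf (bundle)       | WasDerivedFrom (activity)  |
                           |                          | Revision (activity)        |
                           |                          | Quotation (activity)       |
                           |                          | PrimarySource (activity)   |
---------------------------+--------------------------+----------------------------+------
Activity                   | WasAssociatedWith (plan) | WasStartedBy (starter)     |
                           |                          | WasEndedBy (ender)         |
---------------------------+--------------------------+----------------------------+------
Agent                      |                          | ActedOnBehalfOf (activity) |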
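
The secondary elements of Table 5 can be sketched in the same informal style. The mapping below is a hypothetical illustration, not a normative encoding: `SECONDARY` and `secondary_element` are invented names, each entry pairs the secondary element's name (in parentheses in the table) with the type of that element, and relations without a secondary element yield `None`.

```python
# relation name -> (name of the secondary element, its type)
SECONDARY = {
    "MentionOf": ("bundle", "Entity"),
    "WasDerivedFrom": ("activity", "Activity"),
    "Revision": ("activity", "Activity"),
    "Quotation": ("activity", "Activity"),
    "PrimarySource": ("activity", "Activity"),
    "WasAssociatedWith": ("plan", "Entity"),
    "WasStartedBy": ("starter", "Activity"),
    "WasEndedBy": ("ender", "Activity"),
    "ActedOnBehalfOf": ("activity", "Activity"),
}


def secondary_element(relation):
    """Return (element name, element type), or None if Table 5 lists none."""
    return SECONDARY.get(relation)


def relations_with_secondary_type(element_type):
    """All relations whose optional secondary element has the given type."""
    return sorted(
        name for name, (_, kind) in SECONDARY.items() if kind == element_type
    )
```

For example, `secondary_element("WasAssociatedWith")` gives `("plan", "Entity")`, reflecting that a plan is an entity optionally attached to an association.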