PROV-DM, the PROV data model, is a data model for provenance that describes the entities, people and activities involved in producing a piece of data or thing. PROV-DM is structured in six components, dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to the same thing; (5) collections forming a logical structure for their members; (6) a simple annotation mechanism.

This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.

Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance descriptions should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.

PROV Family of Specifications

This document is part of the PROV family of specifications, a set of specifications defining various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. The specifications are:

How to read the PROV Family of Specifications

Fourth Public Working Draft

This is the fourth public release of the PROV-DM document. Following feedback, the Working Group has decided to reorganize this document substantially, separating the data model from its constraints and the notation used to illustrate it. The PROV-DM release is synchronized with the release of the PROV-O, PROV-PRIMER, PROV-N, and PROV-CONSTRAINTS documents. We are now clarifying the entry path to the PROV family of specifications.

Introduction

PROV Starting Points

Illustration of PROV-DM by an Example

PROV-DM Types and Relations

Component 1: Entities and Activities

...

...

...

...

Start

Start is when an activity is deemed to have started. The activity did not exist before its start. Any usage or generation involving an activity follows the activity's start. A start may refer to an entity, known as trigger, that initiated the activity, or to an activity, known as starter, that generated the trigger.
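The ordering stated above can be illustrated with a small check. This is a minimal Python sketch, not part of PROV: the function name follows_start is hypothetical, and it assumes that "follows" means "is not earlier than". The precise ordering constraints are the subject of the companion constraints document.

```python
from datetime import datetime


def follows_start(start_time: str, event_time: str) -> bool:
    """Return True if a usage or generation time is not earlier than the start.

    Both arguments are ISO 8601 strings. They must either both carry a
    timezone offset or both omit it; mixing the two raises TypeError.
    """
    return datetime.fromisoformat(event_time) >= datetime.fromisoformat(start_time)
```

For instance, follows_start("2011-11-16T16:05:00", "2011-11-16T16:10:00") holds, whereas an event time of 2011-11-16T16:00:00 would not follow that start.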

An activity start, written wasStartedBy(id, a2, e, a1, t, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for the activity start;
  • activity: an identifier (a2) for the started activity;
  • trigger: an OPTIONAL identifier (e) for the entity triggering the activity;
  • starter: an OPTIONAL identifier (a1) for the activity that generated the (possibly unspecified) entity (e);
  • time: the OPTIONAL time (t) at which the activity was started;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this activity start.
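The elements listed above can be sketched as a small record type. The following Python illustration is not part of PROV: the class name ActivityStart and the method to_provn are hypothetical, and the rendering assumes that an absent optional element is written with the marker -, as in the examples of this section. For brevity, the sketch leaves out the optional identifier and attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActivityStart:
    """Hypothetical record mirroring the positional elements of an activity start."""
    activity: str                   # a2: the started activity (required)
    trigger: Optional[str] = None   # e: entity triggering the activity
    starter: Optional[str] = None   # a1: activity that generated the trigger
    time: Optional[str] = None      # t: time at which the activity was started

    def to_provn(self) -> str:
        """Render in a PROV-N-like form, assuming '-' marks an absent element."""
        optional = [self.trigger, self.starter, self.time]
        args = [self.activity] + [v if v is not None else "-" for v in optional]
        return "wasStartedBy(" + ", ".join(args) + ")"
```

Under these assumptions, ActivityStart("a1", trigger="e1", time="2011-11-16T16:05:00").to_provn() yields wasStartedBy(a1, e1, -, 2011-11-16T16:05:00).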

The following example contains the description of an activity a1 (a discussion), which was started at a specific time, and was triggered by an email message e1.

entity(e1, [ prov:type="email message" ])
activity(a1, [ prov:type="Discuss" ])
wasStartedBy(a1, e1, -, 2011-11-16T16:05:00)

Furthermore, if the message is also an input to the activity, this can be described as follows:

used(a1, e1, -)

Alternatively, one can also describe the activity that generated the email message.

activity(a0, [ prov:type="Write" ])
wasGeneratedBy(e1, a0)
wasStartedBy(a1, e1, a0, 2011-11-16T16:05:00)

In the following example, a race is started by a bang, and responsibility for this trigger is attributed to an agent ex:Bob.

activity(ex:foot_race)
wasStartedBy(ex:foot_race, ex:bang, -, 2012-03-09T08:05:08-05:00)
entity(ex:bang)
agent(ex:Bob)
wasAttributedTo(ex:bang,ex:Bob)

In this example, filling fuel was started as a consequence of observing low fuel. The trigger entity is unspecified; it could, for instance, have been the low fuel warning light, the fuel tank indicator needle position, or the engine not running properly.

activity(ex:filling-fuel)
activity(ex:observing-low-fuel)

agent(ex:driver, [ prov:type="prov:Person" %% xsd:QName ])
wasAssociatedWith(ex:filling-fuel, ex:driver)
wasAssociatedWith(ex:observing-low-fuel, ex:driver)

wasStartedBy(ex:filling-fuel, -, ex:observing-low-fuel, -)

The relations wasStartedBy and used are orthogonal, and thus need to be expressed independently, according to the situation being described.
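The independence of the two relations can be illustrated with a minimal Python sketch. The tuple representation and the function name was_used are assumptions made for this example only. A start description alone does not establish a usage; the usage has to be stated separately.

```python
# Descriptions are modeled as (relation, activity, entity) tuples; this
# representation is an assumption made for the example only.
descriptions = {("wasStartedBy", "a1", "e1")}


def was_used(descs, activity, entity):
    """Report a usage only if a 'used' description is explicitly present."""
    return ("used", activity, entity) in descs


started_only = was_used(descriptions, "a1", "e1")      # no usage stated yet
descriptions.add(("used", "a1", "e1"))                 # usage stated separately
started_and_used = was_used(descriptions, "a1", "e1")
```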

...

...

...

Component 2: Agents and Responsibility

Component 3: Derivations

Component 4: Alternate Entities

Component 5: Collections

Component 6: Bundles

Creating Valid Provenance

Acknowledgements

WG membership to be listed here.