Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized in six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and (6) collections forming a logical structure for their members.
This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.
Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.
This is the fifth public release of the PROV-DM document. Publication as Last Call working draft means that the Working Group believes that it has satisfied the relevant technical requirements outlined in its charter on this document. The design is not expected to change significantly, going forward, and now is the key time for external review, before the implementation phase.
The PROV Working Group seeks public feedback on this Working Draft. The end date of the Last Call review period is TBD, and we would appreciate comments by that date to public-prov-comments@w3.org.
Provenance concepts, expressed as PROV-DM types and relations, are organized according to six components that are defined in this section. The components and their dependencies are illustrated in Figure 4. A component that relies on concepts defined in another is displayed above it in the figure. So, for example, component 6 (collections) depends on concepts defined in component 3 (derivation), itself dependent on concepts defined in component 1 (entity and activity).
The sixth component of PROV-DM is concerned with the notion of collections. A collection is an entity that has some members. The members are themselves entities, and therefore their provenance can be expressed. Some applications need to be able to express the provenance of the collection itself: e.g. who maintains the collection (attribution), which members it contains as it evolves, and how it was assembled. The purpose of Component 6 is to define the types and relations that are useful to express the provenance of collections. In PROV, the concept of Collection is implemented by means of dictionaries, which we introduce in this section.
Figure 10 depicts the sixth component with four new classes (Collection, Dictionary, EmptyDictionary, and Pair) and three associations (insertion, removal, and memberOf).
The intent of these relations and types is to express the history of changes that occurred to a collection. Changes to collections are about the insertion of entities into, and the removal of entities from the collection. Indirectly, such history provides a way to reconstruct the contents of the collection.
A collection is a multiset of entities (it is a multiset, rather than a set, because it may not be possible to verify that two distinct entity identifiers do not, in fact, denote the same entity).
PROV-DM defines the following types related to collections:
entity(c0, [prov:type='prov:EmptyCollection']) // c0 is an empty collection
entity(c1, [prov:type='prov:Collection']) // c1 is a collection, with unknown content
In PROV, the concept of Collection is provided as an extensibility point for specialized kinds of collections. One of these, Dictionary, is defined next.
A collection membership relation is defined, to allow stating the members of a Collection.
Note that the attribute complete indicates that the membership relation provides a complete description of the collection membership. It is possible for different provenance descriptions to provide different membership statements regarding the same collection. The resolution of any potential conflict amongst such membership statements is defined by applications.
PROV-DM defines a specific type of collection, specified as follows.
Conceptually, a dictionary has a logical structure consisting of key-entity pairs. This structure is often referred to as a map, and is a generic indexing mechanism that can abstract commonly used data structures, including associative lists, relational tables, ordered lists, and more. The specification of such specialized structures in terms of key-value pairs is out of the scope of this document.
A given dictionary forms a given structure for its members. A different structure (obtained either by insertion or removal of members) constitutes a different dictionary. Hence, for the purpose of provenance, a dictionary entity is viewed as a snapshot of a structure. Insertion and removal operations result in new snapshots, each snapshot forming an identifiable dictionary entity.
Following the earlier definitions for generic collections, PROV-DM defines the following types related to dictionaries:
entity(d0, [prov:type='prov:EmptyDictionary']) // d0 is an empty dictionary
entity(d1, [prov:type='prov:Dictionary']) // d1 is a dictionary, with unknown content
The attribute complete is interpreted as for the general collection membership relation.
entity(d1, [prov:type='prov:Dictionary']) // d1 is a dictionary, with unknown content
entity(d2, [prov:type='prov:Dictionary']) // d2 is a dictionary, with unknown content
entity(e1)
entity(e2)
memberOf(d1, {("k1", e1), ("k2", e2)})
memberOf(d2, {("k1", e1), ("k2", e2)}, true)
From these descriptions, we conclude:
Thus, the membership of d1 is only partially known.
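The role of the complete attribute can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Membership` class and the `key_status` function are hypothetical names introduced here, not part of PROV, and entities are represented by plain identifier strings.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Membership:
    """Hypothetical record of one memberOf statement (not a PROV API)."""
    dictionary: str            # identifier of the dictionary, e.g. "d1"
    pairs: Dict[str, str]      # key -> entity identifier
    complete: bool = False     # True: the pairs are the whole content


def key_status(m: Membership, key: str) -> str:
    """Classify what a single membership statement says about a key."""
    if key in m.pairs:
        return "present"
    # Without complete=true, absence from the statement proves nothing.
    return "absent" if m.complete else "unknown"


# Mirrors the two memberOf statements of the example above.
m1 = Membership("d1", {"k1": "e1", "k2": "e2"})
m2 = Membership("d2", {"k1": "e1", "k2": "e2"}, complete=True)

status_d1 = key_status(m1, "k3")   # d1 may or may not contain "k3"
status_d2 = key_status(m2, "k3")   # d2 is fully described, so "k3" is absent
```

The sketch looks at one statement at a time; how conflicting membership statements about the same collection are reconciled is left to applications, as noted earlier.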
An Insertion relation, written derivedByInsertionFrom(id; d2, d1, {(key_1, e_1), ..., (key_n, e_n)}, attrs), has:
An Insertion relation derivedByInsertionFrom(id; d2, d1, {(key_1, e_1), ..., (key_n, e_n)}) states that d2 is the dictionary following the insertion of pairs (key_1, e_1), ..., (key_n, e_n) into dictionary d1.
entity(d0, [prov:type='prov:EmptyDictionary']) // d0 is an empty dictionary
entity(e1)
entity(e2)
entity(e3)
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])
derivedByInsertionFrom(d1, d0, {("k1", e1), ("k2", e2)})
derivedByInsertionFrom(d2, d1, {("k3", e3)})
From this set of descriptions, we conclude:
Insertion provides an "update semantics" for the keys that are already present in a dictionary, since a new pair replaces an existing pair with the same key in the new dictionary. This is illustrated by the following example.
entity(d0, [prov:type='prov:EmptyDictionary']) // d0 is an empty dictionary
entity(e1)
entity(e2)
entity(e3)
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])
derivedByInsertionFrom(d1, d0, {("k1", e1), ("k2", e2)})
derivedByInsertionFrom(d2, d1, {("k1", e3)})
This is a case of update of e1 to e3 for the same key, "k1".
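The update semantics can be made concrete with a small Python sketch. It assumes, for illustration only, that a dictionary snapshot is modelled as a plain Python `dict` from keys to entity identifiers; `apply_insertion` is a hypothetical helper, not something defined by PROV.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (key, entity identifier)


def apply_insertion(before: Dict[str, str], pairs: Iterable[Pair]) -> Dict[str, str]:
    """Return a new snapshot; the 'before' snapshot is left untouched."""
    after = dict(before)          # copy, since each snapshot is its own entity
    for key, entity in pairs:
        after[key] = entity       # an existing key is overwritten (update)
    return after


# Same sequence of insertions as in the example above.
d0: Dict[str, str] = {}
d1 = apply_insertion(d0, [("k1", "e1"), ("k2", "e2")])
d2 = apply_insertion(d1, [("k1", "e3")])
# d1 still maps "k1" to "e1"; d2 maps "k1" to "e3" and keeps "k2".
```

Returning a fresh mapping rather than mutating the argument mirrors the view that insertion yields a new, separately identifiable dictionary entity.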
A Removal relation, written derivedByRemovalFrom(id; d2, d1, {key_1, ... key_n}, attrs), has:
A Removal relation derivedByRemovalFrom(id; d2,d1, {key_1, ..., key_n}) states that d2 is the dictionary following the removal of the set of pairs corresponding to keys key_1...key_n from d1.
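As a rough illustration, removal can be sketched in Python over the same kind of snapshot. The `apply_removal` helper is hypothetical, snapshots are assumed to be plain `dict` objects, and silently ignoring keys that are not present is an assumption of this sketch rather than something stated here.

```python
from typing import Dict, Iterable


def apply_removal(before: Dict[str, str], keys: Iterable[str]) -> Dict[str, str]:
    """Return a new snapshot without the pairs whose keys are listed."""
    to_remove = set(keys)
    # Assumption: keys that are not in 'before' are simply ignored.
    return {k: v for k, v in before.items() if k not in to_remove}


before = {"k1": "e1", "k2": "e2", "k3": "e3"}
after = apply_removal(before, ["k1", "k3"])
# 'after' keeps only the pair for "k2"; 'before' is unchanged.
```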
entity(d0, [prov:type="prov:EmptyDictionary"]) // d0 is an empty dictionary
entity(e1)
entity(e2)
entity(e3)
entity(d1, [prov:type="prov:Dictionary"])
entity(d2, [prov:type="prov:Dictionary"])
derivedByInsertionFrom(d1, d0, {("k1", e1), ("k2", e2)})
derivedByInsertionFrom(d2, d1, {("k3", e3)})
derivedByRemovalFrom(d3, d2, {"k1", "k3"})
From this set of descriptions, we conclude:
Further considerations:
The following table summarizes how each constituent of a PROV-DM Membership maps to a non-terminal.
Dictionary Membership | Non-Terminal |
id | optionalIdentifier |
dictionary | dIdentifier |
key-entity-set | keyEntitySet |
complete | complete |
attributes | optionalAttributeValuePairs |
memberOf(mId, c, {e1, e2, e3}, []) // Collection membership
memberOf(mId, c, {("k4", v4), ("k5", v5)}, []) // Dictionary membership
Here mId is the optional membership identifier, c is the identifier for the collection whose membership is stated, {("k4", v4), ("k5", v5)} is the set of key-value pairs that are members of c, and [] is the optional (empty) set of attributes.
The remaining examples show cases for Dictionaries, where some of the optionals are omitted. Key-entity sets are replaced with Entity sets for the corresponding generic Collections examples.
memberOf(c3, {("k4", v4), ("k5", v5)})
memberOf(c3, {("k4", v4)})
memberOf(c3, {("k4", v4)}, false)
memberOf(c3, {("k4", v4)}, true)
memberOf(c3, {("k4", v4), ("k5", v5)}, [])
memberOf(c3, {("k4", v4), ("k5", v5)}, true, [])
The following table summarizes how each constituent of a PROV-DM Insertion maps to a non-terminal.
Insertion | Non-Terminal |
id | optionalIdentifier |
after | cIdentifier |
before | cIdentifier |
key-entity-set | keyEntitySet |
attributes | optionalAttributeValuePairs |
derivedByInsertionFrom(id; c1, c, {("k1", v1), ("k2", v2)}, [])
Here id is the optional insertion identifier, c1 is the identifier for the collection after the insertion, c is the identifier for the collection before the insertion, {("k1", v1), ("k2", v2)} is the set of key-value pairs that have been inserted in c, and [] is the optional (empty) set of attributes.
The remaining examples show cases where some of the optionals are omitted.
derivedByInsertionFrom(c1, c, {("k1", v1), ("k2", v2)})
derivedByInsertionFrom(c1, c, {("k1", v1)})
derivedByInsertionFrom(c1, c, {("k1", v1), ("k2", v2)}, [])
The following table summarizes how each constituent of a PROV-DM Removal maps to a non-terminal.
Removal | Non-Terminal |
id | optionalIdentifier |
after | cIdentifier |
before | cIdentifier |
key-set | keySet |
attributes | optionalAttributeValuePairs |
derivedByRemovalFrom(id; c3, c, {"k1", "k3"}, [])
Here id is the optional removal identifier, c3 is the identifier for the collection after the removal, c is the identifier for the collection before the removal, {"k1", "k3"} is the set of keys whose key-value pairs have been removed from c, and [] is the optional (empty) set of attributes.
The remaining examples show cases where some of the optionals are omitted.
derivedByRemovalFrom(c3, c1, {"k1", "k3"})
derivedByRemovalFrom(c3, c1, {"k1"})
derivedByRemovalFrom(c3, c1, {"k1", "k3"}, [])
Membership is a convenience notation, since it can be expressed in terms of an insertion into some dictionary. The membership definition is formalized as follows.
memberOf(d, {(k1, v1), ...}) holds IF AND ONLY IF there exists a dictionary d0, such that derivedByInsertionFrom(d, d0, {(k1, v1), ...}).
A dictionary may be obtained by insertion or removal, or said to satisfy the membership relation. To provide an interpretation of dictionaries, PROV-DM restricts a dictionary to be involved in a single derivation by insertion or removal, or in a single membership relation. PROV-DM does not provide an interpretation for statements that consist of two or more insertion, removal, or membership relations that result in the same dictionary.
The following constraint ensures unique derivation.
A dictionary MUST NOT be derived through multiple insertion, removal, or membership relations.
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])
entity(d3, [prov:type='prov:Dictionary'])
derivedByInsertionFrom(d3, d1, {("k1", e1), ("k2", e2)})
derivedByInsertionFrom(d3, d2, {("k3", e3)})
There is no interpretation for such statements since d3 is derived multiple times by insertion.
As a particular case, dictionary d is derived multiple times from the same d1.
derivedByInsertionFrom(id1, d, d1, {("k1", e1), ("k2", e2)})
derivedByInsertionFrom(id2, d, d1, {("k3", e3), ("k4", e4)})
The interpretation of such statements is also unspecified.
To describe the insertion of the 4 key-entity pairs, one would instead write:
derivedByInsertionFrom(id1, d, d1, {("k1", e1), ("k2", e2), ("k3", e3), ("k4", e4)})
The following statements
derivedByInsertionFrom(d, d1, {("k1", e1)})
derivedByRemovalFrom(d, d2, {"k2"})
have no interpretation. Nor have the following:
derivedByInsertionFrom(d, d1, {("k1", e1)})
memberOf(d, {("k2", e2)})
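A check for the unique-derivation constraint might be sketched as follows. The representation is an assumption made for illustration: each statement is reduced to a hypothetical `(kind, resulting dictionary)` tuple, and `multiply_derived` is not a function defined by any PROV specification.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (kind, identifier of the dictionary the statement results in or describes)
Statement = Tuple[str, str]


def multiply_derived(statements: Iterable[Statement]) -> List[str]:
    """List dictionaries that appear as the result of more than one
    insertion, removal, or membership statement."""
    counts = Counter(target for _kind, target in statements)
    return sorted(d for d, n in counts.items() if n > 1)


# Mirrors the last pair of statements above: d is the result of an
# insertion and is also the subject of a membership statement.
statements = [
    ("insertion", "d"),
    ("membership", "d"),
    ("insertion", "d1"),
]
violations = multiply_derived(statements)
```

Counting by resulting dictionary alone is enough here, because the constraint does not depend on which dictionary the derivation starts from.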
entity(d0, [prov:type='prov:EmptyDictionary']) // d0 is an empty dictionary
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])
entity(d3, [prov:type='prov:Dictionary'])
entity(e1)
entity(e2)
entity(e3)
derivedByInsertionFrom(d1, d0, {("k1", e1)})
derivedByInsertionFrom(d2, d0, {("k2", e2)})
derivedByInsertionFrom(d3, d1, {("k3", e3)})
From this set of statements, we conclude:
d1 = { ("k1", e1) }
d2 = { ("k2", e2) }
d3 = { ("k1", e1), ("k3", e3) }
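The reconstruction above can be reproduced with a short Python sketch that follows insertion statements back to an empty dictionary. The data layout and the `reconstruct` function are hypothetical, only insertions are handled, and the statements are assumed to be acyclic and to satisfy the unique-derivation constraint.

```python
from typing import Dict, Optional, Set, Tuple

# derived dictionary id -> (source dictionary id, inserted pairs)
Insertions = Dict[str, Tuple[str, Dict[str, str]]]


def reconstruct(d: str, insertions: Insertions,
                empty: Set[str]) -> Optional[Dict[str, str]]:
    """Rebuild the content of d, or return None if its history is unknown."""
    if d in empty:
        return {}
    if d not in insertions:
        return None                      # no statement explains d
    source, pairs = insertions[d]
    base = reconstruct(source, insertions, empty)
    if base is None:
        return None                      # the chain does not reach an empty dictionary
    return {**base, **pairs}             # later pairs override earlier ones


# The three insertion statements of the example above.
insertions: Insertions = {
    "d1": ("d0", {"k1": "e1"}),
    "d2": ("d0", {"k2": "e2"}),
    "d3": ("d1", {"k3": "e3"}),
}
d3_content = reconstruct("d3", insertions, empty={"d0"})
```

Returning `None` when the chain is broken keeps partial knowledge distinct from a dictionary that is known to be empty.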
Since a set of statements regarding a dictionary's evolution may be incomplete, so is the reconstructed state obtained by querying those statements. In general, all statements reflect partial knowledge regarding a sequence of data transformation events. In the particular case of dictionary evolution, in which some of the state changes may have been missed, the more generic derivation relation should be used to signal that some updates may have occurred, which cannot be expressed as insertions or removals. The following example illustrates this.
entity(d0, [prov:type='prov:EmptyDictionary']) // d0 is an empty dictionary
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])
entity(d3, [prov:type='prov:Dictionary'])
entity(e1)
entity(e2)
derivedByInsertionFrom(d1, d0, {("k1", e1)})
wasDerivedFrom(d2, d1)
derivedByInsertionFrom(d3, d2, {("k2", e2)})
From this set of statements, we conclude:
A prov:Dictionary is a prov:Entity that acts as a container to some members, which are themselves entities. Specifically, a dictionary is composed of a set of key-value pairs, where a literal key is used to identify a constituent entity within the dictionary. To illustrate this, the example below describes a dictionary :c1 that has as members the two key-value pairs ("k1", :e1) and ("k2", :e2).
{% escape %}{% include "includes/prov/examples/eg-26-provo-collections-narrative/rdf/membership.ttl" %}{% endescape %}
It is worth noting that :c1 MAY also have other members (i.e. prov:knownMembership is not functional). A dictionary MAY be empty and thus not have any known memberships, in which case it SHOULD be described as an instance of the subclass prov:EmptyDictionary.
To describe the provenance of a dictionary, PROV-O provides two kinds of involvements: prov:qualifiedInsertion is used to describe that a dictionary was obtained from an existing dictionary by inserting a set of key-value pairs. prov:qualifiedRemoval is used to specify that a given dictionary was obtained from an existing dictionary by removing a set of key-value pairs. The example below specifies that the dictionary :c1 was obtained from the empty dictionary :c by inserting the key-value pairs ("k1", :e1) and ("k2", :e2).
{% escape %}{% include "includes/prov/examples/eg-26-provo-collections-narrative/rdf/insertion.ttl" %}{% endescape %}
Similarly, the example below specifies that the dictionary :c3 was obtained by removing the key-value pairs associated with the keys "k1" and "k2" from the dictionary :c2. Thus, :c3 does not contain the members ("k1", :e1) and ("k2", :e2) from :c2.
{% escape %}{% include "includes/prov/examples/eg-26-provo-collections-narrative/rdf/removal.ttl" %}{% endescape %}
The terms used to describe the provenance of collections of key-value pairs are discussed in Section 3.4.