W3C

Semantics of the PROV Data Model

W3C Working Group Note 30 April 2013

This version:
http://www.w3.org/TR/2013/NOTE-prov-sem-20130430/
Latest published version:
http://www.w3.org/TR/prov-sem/
Previous version:
http://www.w3.org/TR/2013/WD-prov-sem-20130312/
Editor:
James Cheney, University of Edinburgh

This document is also available in this non-normative format: PDF


Abstract

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications.

This document presents a model-theoretic semantics for the PROV data model, viewing PROV-DM statements as atomic formulas in the sense of first-order logic, and viewing the constraints and inferences specified in PROV-CONSTRAINTS as a first-order theory. It is shown that valid PROV instances (in the sense of PROV-CONSTRAINTS) correspond to satisfiable theories. This information may be useful to researchers or users of PROV to understand the intended meaning and use of PROV for modeling information about the actual history, derivation or evolution of Web resources. It may also be useful for development of additional constraints or inferences for reasoning about PROV or integration of PROV with other Semantic Web vocabularies. It is not proposed as a canonical or required semantics of PROV and does not place any constraints on the use of PROV.

The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

PROV Family of Documents

This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. These documents are listed below. Please consult the [PROV-OVERVIEW] for a guide to reading these documents.

Implementations Encouraged

The Provenance Working Group encourages implementations that make use of or extend the semantics in this document. Although work on this document by the Provenance Working Group is complete, errors may be recorded in the errata or and these may be addressed in future revisions.

Please Send Comments

This document was published by the Provenance Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-prov-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1. Introduction

Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. [PROV-DM] This document complements the PROV-DM specification [PROV-DM] that defines a data model for provenance on the Web, and the PROV-CONSTRAINTS specification [PROV-CONSTRAINTS] that specifies definitions, inferences, and constraints that can be used to reason about PROV documents, or determine their validity. This document provides a formal semantics of PROV, providing a formal counterpart to the informal descriptions and motivations given elsewhere in PROV specifications.

1.1 Purpose of this document

The PROV-DM and PROV-CONSTRAINTS specifications give motivating examples that provide an intuition about the meaning of the constructs. For some concepts, such as use, start, end, generation, invalidation, and derivation, the meaning is either obvious or situation-dependent. However, during the development of PROV, the importance of additional concepts became evident, but the intuitive meaning or correct use of these concepts were not clear. For example, the alternateOf and specializationOf relations are used in PROV to relate different entities that present aspects of "the same thing". Over time the working group came to a consensus about these concepts and how they are to be used, but this understanding is based on abstract notions that are not explicit in PROV documents; instead, some of their properties are captured formally through certain constraints and inferences, while others are not captured in PROV specifications at all.

The purpose of this document is to present the working group's consensus view of the semantics of PROV, using tools from mathematical logic, principally model theory (though our use of these tools is lightweight). This information may be useful to users for understanding the intent behind certain features of PROV, to researchers investigating richer forms of reasoning over provenance, or to future efforts building upon PROV. It is intended as an exploration of one semantics for PROV, not a definitive specification of the only semantics of PROV. We provide a semantics that satisfies all of the constraints on valid PROV instances, and such that valid PROV instances correspond to satisfiable theories: every valid instance has a model, and vice versa.

The semantics has some appealing properties. Specifically, it provides a declarative counterpart to the operational definition of validity taken in PROV-CONSTRAINTS. In the specification, validity is defined via a normalization process followed by constraint checking on the normal form. This approach was adopted to keep the specification closer to implementations, although other implementations are possible and allowed. In addition to providing a semantics, this document shows that the operational presentation of PROV validity checking is equivalent to the declarative presentation adopted here. This could help justify alternative approaches to validity checking.

This document mostly considers the semantics of PROV statements and instances. PROV documents can consist of multiple instances, such as named bundles. The semantics do not cover general PROV documents, but the semantics can be used on each instance in a document separately, just as PROV-CONSTRAINTS specifies that each instance in a document is to be validated separately. So, in the rest of this document, we discuss only PROV instances and not PROV documents. The semantics of extensions of PROV, such as dictionaries [PROV-DICTIONARY] and linking across bundles [PROV-LINKS], are beyond the scope of this document.

This document has been reviewed by the Working Group, but the theorems and proofs have not been formally peer-reviewed in the sense of an academic paper. Thus, the Working Group believes this document is an appropriate starting point for future study of the semantics of PROV, but further work may be needed.

1.2 Structure of this document

1.3 Audience

This document assumes familiarity with [PROV-DM] and [PROV-CONSTRAINTS] and employs (a simplified form of) [PROV-N] notation. In particular it assumes familiarity with the concepts from logic, and the relationship between PROV statements and instances and first-order formulas and theories, respectively, presented in Section 2.5 of PROV-CONSTRAINTS.

This document may be useful to users of PROV who have a formal background and are interested in the rationale for some of the constructs of PROV; for researchers investigating extensions of PROV or alternative approaches to reasoning about PROV; or for future efforts on provenance standardization.

2. Basics

2.1 Identifiers

A lowercase symbol x,y,... on its own denotes an identifier. Identifiers are viewed as variables from the point of view of logic. Identifiers denote objects, and two different identifiers x and y may denote equal or different objects. We write Identifiers for the set of identifiers of interest in a given situation (typically, the set of identifiers present in the PROV instance of interest).

2.2 Attributes and Values

We assume a set Attributes of attribute labels and a set Values of possible values of attributes. To allow for the fact that some attributes can have undefined or multiple values, we sometimes use the set P(Values), that is, the set of sets of values. Thus, we can use the empty set to stand for an undefined value and {a,b} to stand for the set of values of a two-valued attribute.

2.3 Times

We assume an ordered set (Times,) of time instants, where TimesValues and is a linear order.

2.4 Atomic Formulas

The following atomic formulas correspond to the statements of PROV-DM. We assume that definitions 1-4 of PROV-CONSTRAINTS have been applied in order to expand all optional parameters; thus, we use uniform notation r(id,a1,,an) instead of the semicolon notation r(id;a1,,an).

Each parameter is either an identifier, a constant (e.g. a time or other literal value in an attribute list), or a null symbol "". Placeholder symbols "" can only appear in the specified arguments pl in wasAssociatedWith and a,g,u in wasDerivedFrom, as shown in the grammar below.

atomic_formula::=element_formula|relation_formula|auxiliary_formulaelement_formula::=entity(id,attrs)|activity(id,st,et,attrs)|agent(id,attrs)relation_formula::=wasGeneratedBy(id,e,a,t,attrs)|used(id,e,a,t,attrs)|wasInvalidatedBy(id,e,a,t,attrs)|wasStartedBy(id,a2,e,a1,t,attrs)|wasEndedBy(id,a2,e,a1,t,attrs)|wasAssociatedWith(id,ag,act,pl,attrs)|wasAssociatedWith(id,ag,act,,attrs)|wasAttributedTo(id,e,ag,attrs)|actedOnBehalfOf(if,ag2,ag1,act,attrs)|wasInformedBy(id,a2,a1,attrs)|wasDerivedFrom(id,e2,e1,act,g,u,attrs)|wasDerivedFrom(id,e2,e1,,,,attrs)|wasInfluencedBy(id,x,y,attrs)|alternateOf(e1,e2)|specializationOf(e1,e2)|hadMember(c,e)auxiliary_formula::=x strictlyPrecedes y|x precedes y|notNull(x)|typeOf(x,ty)attrs::=[attr1=v1,,attrn=vn]ty::=entity|activity|agent|Collection|EmptyCollection

We include the standard PROV collection types (Collection and EmptyCollection) and the membership relation hadMember; however, we do not model dictionaries or the insertion or deletion relations in PROV-DICTIONARY [PROV-DICTIONARY], since these are not part of the PROV recommendations. If these features are incorporated into future standards, their semantics (and the soundness of the associated constraints) should be modeled. We omit the prov prefixes from the Collection and EmptyCollection types.

As stated in the Introduction, we do not explicitly model bundles or PROV documents; however, each instance can be viewed as a set of formulas and can be modeled separately. The semantics of the standard features of PROV can be defined without talking about multiple instances; however, the mentionOf relation in PROV-LINKS [PROV-LINKS] is intended to support linking across bundles. Future editions of PROV may incorporate mentionOf or other cross-instance assertions, and if so this semantics should be generalized in order to provide a rationale for such an extension and to establish the soundness of constraints associated with mentionOf.

2.5 First-Order Formulas

We also consider the usual connectives and quantifiers of first-order logic.

ϕ::=atomic_formula|True|False|x=y|¬ ϕ|ϕ1ϕ2|ϕ1ϕ2|ϕ1ϕ2|x.ϕ|x.ϕ

3. Structures and Interpretations

In this section we define mathematical structures W that can be used to interpret PROV formulas and instances. A structure consists of a collection of sets, functions and relations. The components of a structure W are given in the rest of the section in components, highlighted in boxes.

We use the term "component" here in a different sense than in PROV-DM. Here, the components are parts of a large definition, whereas PROV-DM defines six components that group different parts of the PROV data model.

3.1 Things

Things is a set of things in the situation being modeled. Each thing has an associated set of Events and attributes whose values can change over time. Different kinds of Events are specified further below.

To model this, a structure W includes:

  1. a set Things of things
  2. a set Events of events
  3. a function events:ThingsP(Events) from things to associated sets of events.
  4. a function value:Things×Attributes×EventsP(Values) giving the possible values of each attribute of a Thing at the instant of a given event.
  5. Attributes are only defined during the events of a thing, that is, value(T,a,evt) implies evtevents(T).

The range of value is the set P(Values), indicating that value is essentially a multi-valued function that returns a set of values (possibly empty). When value(x,a,evt)=, we say that attribute a is undefined for x at event evt.

Note that this description does not say what the structure of a Thing is, only how it may be described in terms of its events and attribute values. A thing could be a record of fixed attribute values; it could be a bear; it could be the Royal Society; it could be a transcendental number like π. All that matters from our point of view is that we know how to map the Thing to its events and attribute mapping.

The identity of a Thing is not observable through its attributes or events, so it is possible for two different Things to be indistinguishable by their attribute values and events. That is, if the set of Things={T0,T1} and the attributes are specified as value(T0,a,evt)=value(T1,a,evt) for each evtEvents and aAttributes, this does not imply that T0=T1.

Things are associated with certain kinds of Objects called Entities, defined in the next subsection. Specifically, the function thingOf associates an Entity to a Thing.

3.2 Objects

Things are things in the world that have attributes that can change over time. Things may not have distinguishing features that are readily observable and permanent. In PROV, we do not talk explicitly about Things, but instead we talk about various objects that have discrete, fixed features, and relationships among these objects. Some objects, called Entities, are associated with Things, and their fixed attributes need to match those of the associated Thing during their common events. Others correspond to agents, activities, or identifiable interactions among them.

In this section, we detail the different subsets of Objects, and give disjointness constraints and associated functions. Generally, these constraints are necessary to validate disjointness constraints from PROV-CONSTRAINTS [PROV-CONSTRAINTS].

An Object is described by a set of events and attributes with fixed values. Objects encompass entities, activities, agents, and interactions (i.e., usage, generation, and other events or influence relations). To model this, a structure includes:

  1. a set Objects
  2. a function events:ObjectsP(Events) from objects to associated sets of events.
  3. a function value:Objects×AttributesP(Values).

Intuitively, events(e) is the set of events in which e participated. The set value(e,a) is the set of values of attribute a during the object's events.

As with Things, the range of value is sets of values, making value effectively a multivalued function. It is also possible to have two different objects that are indistinguishable by their attributes and associated events. Objects are not things, and the sets of Objects and Things are disjoint; however, certain objects, namely entities, are associated with things.

Disjointness between Objects and Things is not necessary but is assumed in order to avoid confusion between the different categories (time-varying Things vs fixed Objects).

3.2.1 Entities

An entity is a kind of object that fixes some aspects of a thing. We assume:

  1. a set EntitiesObjects of entities, disjoint from Activities below.
  2. a function thingOf:EntitiesThings that associates each Entity e with a Thing, such that events(e)events(thingOf(e)) and for each evtevents(e) and for each attribute a we have value(e,a)value(thingOf(e),a,evt).

Although both entities and things can have undefined or multiple attribute values, their meaning is slightly different: for a thing, value(x,a,evt)= means that the attribute a has no value at event evt, whereas for an entity, value(x,a)= only means that the thing associated to entity x need not have a fixed value for a during the events of x. This does not imply that value(thingOf(e),a,evt)= when evtevents(e).

Furthermore, all of the attribute values of the entity must be present in the associated thing throughout the events of the entity. For example, suppose value(thingOf(e),a,evt) is {1} at some event evtevents(e) and value(thingOf(e),a,evt)={2} at some other event evt. Then value(e,a) must be because there is no other set of values that is simultaneously contained in both {1} and {2}.

In the above description of how Entities relate to Things, we require value(e,a)value(thingOf(e),a,evt) whenever evtevents(e). Intuitively, this means that if we are talking about a Thing indirectly by describing an Entity, then any attributes we ascribe to the Entity must also describe the associated Thing during their common events. Attributes of both Entities and Things are multi-valued, so there is no inconsistency in saying that an entity has two different values for some attribute. In some situations, further uniqueness constraints or range constraints could be imposed on attributes.

Only Entities are associated with Things, and this association is necessary to provide an interpretation for the alternateOf and specializationOf relations. It might also make sense to associate Agents, Activities, and Interactions with Things, or with some other structures; however, this is not necessary to model any of the current features of PROV, so in the interest of simplicity we do not do this.

3.2.1.1 Plans

We identify a specific subset of the entities called plans:

A set PlansEntities of plans.

3.2.1.2 Collections

We identify another specific subset of the entities called collections, with the following associated structure:

  • A set CollectionsEntities
  • A membership function members:CollectionsP(Entities) mapping each collection to its set of members.

3.2.2 Activities

An activity is an object corresponding to a continuing process rather than an evolving thing. We introduce:

  1. A set ActivitiesObjects of activities.
  2. Functions startTime:ActivitiesTimes and endTime:ActivitiesTimes giving the start and end time of each activity.
  3. Activities are disjoint from Entities: EntitiesActivities=.

3.2.3 Agents

An agent is an object that can act, by controlling, starting, ending, or participating in activities. An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. Agents can act on behalf of other agents. An agent may be a particular type of entity or activity; an agent cannot be both entity and activity because the sets of entities and activities are disjoint. We introduce:

A set AgentsObjects of agents.

There is no requirement that every agent is either an activity or an entity.

3.2.4 Influences

We consider a set InfluencesObjects which has disjoint subsets Events connecting entities and activities, Associations between agents and activities, Attributions between entities and agents, Communications between pairs of activities, Delegations between pairs of agents, and Derivations that describe chains of generation and usage steps. These kinds of influences are discussed further below. Influences are disjoint from entities, activities and agents.

  1. A set Influences=EventsAssociationsCommunicationsDelegationsDerivationsObjects
  2. The sets Events, Associations, Communications, Delegations and Derivations are all pairwise disjoint.
  3. Influences are disjoint from entities, agents and activities: Influences(EntitiesActivitiesAgents)=
  4. An associated function influenced:InfluencesObjects×Objects giving the source and target of each influence.
3.2.4.1 Events

An Event is an instantaneous influence that relates an activity to an entity (either of which could also be an agent). Events have types including usage, generation, invalidation, starting and ending. Events are instantaneous. We introduce:

  1. A set EventsInfluences of events, partitioned into disjoint subsets Starts,Ends,Generations,Usages,Invalidations.
  2. A function time:EventsTimes.
  3. A quasi-ordering on events ⪯⊂Events×Events. We write ee when ee and e⪯̸e hold.
  4. A function started:StartsActivities×Entities×Activities, such that started(start)=(a,e,a) implies startevents(a)events(e)events(a).
  5. A function ended:EndsActivities×Entities×Activities, such that ended(end)=(a,e,a) implies endevents(a)events(e)events(a).
  6. A function used:UsagesActivities×Entities such that used(use)=(a,e) implies useevents(a)events(e).
  7. A function generated:GenerationsEntities×Activities such that generated(gen)=(a,e) implies genevents(a)events(e).
  8. A function invalidated:InvalidationsEntities×Activities such that invalidated(inv)=(a,e) implies invevents(a)events(e).
3.2.4.2 Associations

An Association is an influence relating an agent to an activity and optional plan. To model associations, we introduce:

A set AssociationsInfluences with associated function associatedWith:AssociationsAgents×Activities×Plans.

3.2.4.3 Attributions

An Attribution is an influence relating an entity to an agent. To model attributions, we introduce:

A set AttributionsInfluences with associated function attributedTo:AttributionsEntities×Agents.

3.2.4.4 Communications

A Communication is an influence indicating exchange of information between activities. To model communications, we introduce:

A set CommunicationsInfluences with associated function communicated:CommunicationsActivities×Activities.

3.2.4.5 Delegations

A Delegation is an influence relating two agents. To model delegations, we introduce:

A set DelegationsInfluences and associated function actedFor:DelegationsAgents×Agents×Activities

3.2.4.6 Derivations

A Derivation is an influence chaining one or more generation and use steps. To model derivations, we introduce an auxiliary notion of derivation path. These paths are of the form

entngnactnunentn1...ent1g1act1u1ent0

where the enti are entities, acti are activities, gi are generations, and ui are usages.

Formally, we consider the (regular) language:

DerivationPaths=Entities(GenerationsActivitiesUsagesEntities)+

with the constraints that for each derivation path:

  • for each substring entgact we have generated(g)=(ent,act), and
  • for each substring actuent we have used(u)=(act,ent).

A set DerivationsInfluences with an associated function derivationPath:DerivationsDerivationPaths linking each derivation to a derivation path.

The derivationPath function links each dDerivations to a derivation path. A derivation has exactly one associated derivation path. However, if the PROV-N statement wasDerivedFrom(e_2,e_1,-,-,-) is asserted in an instance, there may be multiple derivation paths linking e2 to e1, each corresponding to a different path, identified by different derivations dDerivations.

A derivation path implies the existence of at least one chained generation and use step. However, not all such potential derivation paths are associated with derivations; there can (and in general will) be many such paths that are not associated with derivation steps. In other words, because we require derivations to be explicitly associated with derivation paths, it is not sound to infer the existence of a derivation from the existence of an alternating generation/use chain.

The reason why we need paths and not just individual derivation steps is to reflect that wasDerivedFrom(id,e2,e1,,,,attrs) formulas can represent multiple derivation steps. However, there is no way to force a derivation to take multiple steps. Any valid PROV instance has a model in which all derivation paths are one-step.

3.3 Additional axioms

Above we have stated some properties of the components. We impose some additional properties that relate several components, as follows:

  1. If generated(g)=(e,a1) and used(u)=(a2,e) then there exists cCommunications such that communicated(c)=(a2,a1).
  2. If eEntities then there exist gen,inv,a,a such that generated(gen)=(e,a) and invalidated(inv)=(e,a).
  3. If started(start)=(a2,e,a1) then there exists gen such that generated(gen)=(e,a1).
  4. If ended(end)=(a2,e,a1) then there exists gen such that generated(gen)=(e,a1).
  5. If dDerivations and prov:Revisionvalue(d,prov:type) and there exists w(GenerationsActivitiesUsesEntities) such that derivationPath(deriv)=e2we1DerivationPaths then thingOf(e1)=thingOf(e2).
  6. If attributedTo(att)=(e,ag) then there exist gen, assoc and a such that generated(gen)=(e,a) and associatedWith(assoc)=(a,ag).
  7. If actedFor(deleg)=(ag2,ag1,act) then there exist assoc1,assoc2,pl1,pl2 such that associatedWith(assoc1)=(ag1,act,pl1) and associatedWith(assoc2)=(ag2,act,pl2).
  8. If generated(id)=(e,a) then influenced(id)=(e,a).
  9. If used(id)=(e,a) then influenced(id)=(e,a).
  10. If communicated(id)=(a2,a1) then influenced(id)=(a2,a1).
  11. If started(id)=(a2,e,a1) then influenced(id)=(a2,e).
  12. If ended(id)=(a2,e,a1) then influenced(id)=(a2,e).
  13. If invalidated(id)=(e,a) then influenced(id)=(e,a).
  14. If derivationPath(id)=e2we1 then influenced(id)=(e2,e1).
  15. If attributedTo(id)=(e,ag) then influenced(id)=(e,ag).
  16. If associatedWith(id)=(a,ag,pl) then influenced(id)=(a,ag).
  17. If actedFor(id)=(ag2,ag1) then influenced(id)=(ag2,ag1).
  18. If generated(gen)=(e,a)=generated(gen) then gen=gen.
  19. If invalidated(inv)=(e,a)=invalidated(inv) then inv=inv.
  20. If started(st)=(a,e1,a) and started(st)=(a,e2,a) then st=st.
  21. If ended(end)=(a,e1,a) and ended(end)=(a,e2,a) then end=end.
  22. If started(st)=(a,e,a) then stevt for all evtevents(a)Invalidations.
  23. If ended(end)=(a,e,a) then evtend for all evtevents(a)Invalidations.
  24. If generated(gen)=(e,a) then genevt for all evtevents(e).
  25. If invalidated(inv)=(e,a) then evtinv for all evtevents(e).
  26. For any derivation deriv, with path derivationPath(deriv)=w, if e2gaue1 is a substring of w where e1,e2Entities, gGenerations, uUsages and aActivities then ug.
  27. For any derivation deriv, with path derivationPath(deriv)=e2we1, if generated(gen1)=(e1,a1) and generated(gen2)=(e2,a2) then gen1gen2.
  28. If associatedWith(assoc)=(a,ag,pl) and started(start)=(a,e1,a1) and invalidated(inv)=(ag,a2) then startinv.
  29. If associatedWith(assoc)=(a,ag,pl) and generated(gen)=(ag,a1) and ended(end)=(a,e2,a2) then genend.
  30. If associatedWith(assoc)=(a,ag,pl) and started(start)=(a,e1,a1) and ended(end)=(ag,e2,a2) then startend.
  31. If associatedWith(assoc)=(a,ag,pl) and started(start)=(ag,e1,a1) and ended(end)=(a,e2,a2) then startend.
  32. If attributedTo(attrib)=(e,ag) and generated(gen1)=(ag1,a1) and generated(gen2)=(e,a2) then gen1gen2.
  33. If attributedTo(attrib)=(e,ag) and started(start)=(ag1,e1,a1) and generated(gen)=(e,a2) then startgen.
  34. If actedFor(deleg)=(ag2,ag1,a) and generated(gen)=(ag1,a1) and invalidated(inv)=(ag2,a2) then geninv.
  35. If actedFor(deleg)=(ag2,ag1,a) and started(start)=(ag1,e1,a1) and ended(end)=(ag2,e2,a2) then startend.
  36. If eEntity and prov:emptyCollectionvalue(e,prov:type) then eCollections and members(e)=.

These properties are called axioms, and they are needed to ensure that the PROV-CONSTRAINTS inferences and constraints hold in all structures.

Axioms 22 and 23 do not require that invalidation events originating from an activity follow the activity's start event(s) or precede its end event(s). This is because there is no such constraint in PROV-CONSTRAINTS. Arguably, there should be a constraint analogous to Constraint 34 that specifies that any invalidation event in which an activity participates must follow the activity's start event(s) and precede its end event(s).

Here, we exempt invalidations from axioms 22 and 23 in order to simplify the proof of weak completeness.

3.4 Putting it all together

A PROV structure W is a collection of sets, functions, and relations containing all of the above described components and satisfying all of the associated properties and axioms. If we need to talk about the objects or relations of more than one structure then we may write W1.Objects, W1.Things, etc.; otherwise, to decrease notational clutter, when we consider a fixed structure then the names of the sets, relations and functions above refer to the components of that model.

Some features of PROV structures are relatively obvious or routine, corresponding directly to features of PROV and associated inferences. For example, the functions used,generated,invalidated,started,ended mapping events to their associated entities or activities, and communicated,associatedWith,attributedTo,actedFor associating other types of influences with appropriate data.

On the other hand, some features are more distinctive, and represent areas where formal modeling has been used to guide the development of PROV. Derivation paths are one such distinctive feature; they correspond to an intuition that derivations may describe one or multiple generation-use steps leading from one entity to another. Another distinctive feature is the use of Things, which correspond to changing, real-world things, as opposed to Entities, which correspond to limited views or perspectives on Things, with some fixed aspects. The semantic structures of Things and Entities provide a foundation for the alternateOf and specializationOf relations.

3.5 Interpretations

We need to link identifiers to the objects they denote. We do this using a function which we shall call an interpretation. An interpretation is a function ρ:IdentifiersObjects describing which object is the target of each identifier. The mapping from identifiers to objects may not change over time; only Objects can be denoted by Identifiers.

4. Semantics

In what follows, let W be a fixed structure with the associated sets and relations discussed in the previous section, and let ρ be an interpretation of identifiers as objects in W. The annotations [WF] refer to well-formedness constraints that correspond to typing constraints.

4.1 Satisfaction

Consider a formula ϕ, a structure W and an interpretation ρ. We define notation W,ρϕ which means that ϕ is satisfied in W,ρ. For atomic formulas, the definition of the satisfaction relation is given in the next few subsections. We give the standard definition of the semantics of the other formulas:

  1. W,ρTrue always holds.
  2. W,ρFalse never holds.
  3. W,ρx=y holds if and only if ρ(x)=ρ(y).
  4. W,ρ¬ϕ holds if and only if W,ρϕ does not hold.
  5. W,ρϕψ holds if and only if W,ρϕ and W,ρψ.
  6. W,ρϕψ holds if either W,ρϕ or W,ρψ.
  7. W,ρϕψ holds if W,ρϕ implies W,ρψ.
  8. W,ρx.ϕ holds if there exists some objObjects such that W,ρ[x:=obj]ϕ.
  9. W,ρx.ϕ holds if there for every objObjects we have W,ρ[x:=obj]ϕ.

In the semantics above, note that the domain of quantification is the set of Objects; that is, quantifiers range over entities, activities, agents, or influences (which are in turn further subdivided into types of influences). Things and relations cannot be referenced directly by identifiers.

A PROV instance I consists of a set of statements, each of which can be translated to an atomic formula following the definitional rules in PROV-CONSTRAINTS, possibly by introducing fresh existential variables. Thus, we can view an instance I as a set of atomic formulas {ϕ1,,ϕn}, or equivalently a single formula x1,,xk. ϕ1ϕn, where x1,,xk are the existential variables of I.

4.2 Attribute matching

We say that an object obj matches attributes [attr1=val1,...] in structure W provided: for each attribute attri, we have valiW.value(obj,attri). This is sometimes abbreviated as: match(W,obj,attrs).

4.3 Semantics of Element Formulas

4.3.1 Entity

An entity formula is of the form entity(id,attrs) where id denotes an entity.

Entity formulas entity(id,attrs) can be interpreted as follows:

W,ρentity(id,attrs) holds if and only if:

  1. [WF] id denotes an entity ent=ρ(id)Entities.
  2. the attributes match: match(W,ent,attrs).

Not all of the attributes of an entity object are required to be present in an entity formula about that object. For example, the following formulas all hold if x denotes an entity e such that value(e,a)={4,5},value(e,b)={6} hold:

 entity(x,[])
 entity(x,[a=5])
 entity(x,[a=4,a=5])
 entity(x,[a=4,b=6])

Note that PROV-CONSTRAINTS normalization will merge these formulas to a single one:

  entity(x,[a=4,a=5,b=6])

4.3.2 Activity

An activity formula is of the form activity(id,st,et,attrs) where id is a identifier referring to the activity, st is a start time and et is an end time, and attrs are the attributes of activity id.

W,ρactivity(id,st,et,attrs) holds if and only if:

  1. [WF] The identifier id maps to an activity act=ρ(id)Activities.
  2. ρ(st)Times is the activity's start time, that is: startTime(act)=ρ(st).
  3. ρ(et) is the activity's end time, that is: endTime(act)=ρ(et).
  4. There exists start,e,a such that started(start)=(act,e,a), and for all such start events $startTime(act) = time(start).
  5. There exists end,e,a such that ended(end)=(act,e,a), and for all such end events endTime(act)=time(end).
  6. The attributes match: match(W,act,attrs).

The above definition is complicated for two reasons. First, we need to ensure that every activity has a start and end event. Second, when an activity formula is asserted, we need to make sure all of the associated start and end event times match.

4.3.3 Agent

An agent formula is of the form agent(id,attrs) where id denotes the agent and attrs describes additional attributes.

W,ρagent(id,attrs) holds if and only if:

  1. [WF] id denotes an agent ag=ρ(id)Agents.
  2. The attributes match: match(W,ag,attrs).

4.4 Semantics of Relations

4.4.1 Generation

The generation formula is of the form wasGeneratedBy(id,e,a,t,attrs) where id is an event identifier, e is an entity identifier, a is an activity identifier, attrs is a set of attribute-value pairs, and t is a time.

W,ρwasGeneratedBy(id,e,a,t,attrs) holds if and only if:

  1. [WF] The identifier id denotes a generation event evt=ρ(id)Generations.
  2. [WF] The identifier e denotes an entity ent=ρ(e)Entities.
  3. [WF] The identifier a denotes an activity act=ρ(a)Activities.
  4. The event evt occurred at time ρ(t)Times, i.e. time(evt)=ρ(t).
  5. The activity act generated ent via evt, i.e. generated(evt)=(ent,act).
  6. The attribute values match: match(W,evt,attrs).

4.4.2 Use

The use formula is of the form used(id,a,e,t,attrs) where id denotes an event, a is an activity identifier, e is an object identifier, attrs is a set of attribute-value pairs, and t is a time.

W,ρused(id,a,e,t,attrs) holds if and only if:

  1. [WF] The identifier id denotes a usage event evt=ρ(id)Usages.
  2. [WF] The identifier a denotes an activity act=ρ(id)Activities.
  3. [WF] The identifier e denotes an entity ent=ρ(e)Entities.
  4. The event evt occurred at time ρ(t)Times, i.e. time(evt)=ρ(t).
  5. The activity act used obj via evt, i.e. used(evt)=(act,ent).
  6. The attribute values match: match(W,evt,attrs).

4.4.3 Invalidation

The invalidation formula is of the form wasInvalidatedBy(id,e,a,t,attrs) where id is an event identifier, e is an entity identifier, a is an activity identifier, attrs is a set of attribute-value pairs, and t is a time.

An invalidation formula W,ρwasInvalidatedBy(id,e,a,t,attrs) holds if and only if:

  1. [WF] The identifier id denotes an invalidation event evt=ρ(id)Invalidations.
  2. [WF] The identifier e denotes an entity ent=ρ(e)Entities.
  3. [WF] The identifier a denotes an activity act=ρ(a)Activities.
  4. The event evt occurred at time ρ(t)Times, i.e. time(evt)=ρ(t).
  5. The activity act invalidated ent via evt, i.e. invalidated(evt)=(ent,act).
  6. The attribute values match: match(W,evt,attrs).

4.4.4 Association

An association formula has the form wasAssociatedWith(id,a,ag,pl,attrs).

W,ρwasAssociatedWith(id,a,ag,pl,attrs) holds if and only if:

  1. [WF] assoc denotes an association assoc=ρ(id)Associations.
  2. [WF] a denotes an activity act=ρ(a)Activities.
  3. [WF] ag denotes an agent agent=ρ(ag)Agents.
  4. [WF] pl denotes a plan plan=ρ(pl)Plans.
  5. The association associates the agent with the activity and plan, i.e. associatedWith(assoc)=(agent,act,plan).
  6. The attributes match: match(W,assoc,attrs).

W,ρwasAssociatedWith(id,a,ag,,attrs) holds if and only if:

  1. [WF] assoc denotes an association assoc=ρ(id)Associations.
  2. [WF] a denotes an activity act=ρ(a)Activities.
  3. [WF] ag denotes an agent agent=ρ(ag)Agents.
  4. The association associates the agent with the activity and no plan, i.e. associatedWith(assoc)=(agent,act,).
  5. The attributes match: match(W,assoc,attrs).

4.4.5 Start

A start formula wasStartedBy(id,a2,e,a1,t,attrs) is interpreted as follows:

W,ρwasStartedBy(id,a2,e,a1,t,attrs) holds if and only if:

  1. [WF] id denotes a start event evt=ρ(id)Starts.
  2. [WF] a2 denotes an activity act2=ρ(a2)Activities.
  3. [WF] e denotes an entity ent=ρ(e)Entities.
  4. [WF] a1 denotes an activity act1=ρ(a1)Activities.
  5. The event happened at time t, that is, ρ(t)==time(evt).
  6. The activity act1 started act2 via entity ent: that is, started(evt)=(act2,ent,act1).
  7. The attributes match: match(W,evt,attrs).

4.4.6 End

An activity end formula wasEndedBy(id,a2,e,a1,t,attrs) is interpreted as follows:

W,ρwasEndedBy(id,a2,e,a1,t,attrs) holds if and only if:

  1. [WF] id denotes an end event evt=ρ(id)Ends.
  2. [WF] a2 denotes an activity act2=ρ(a2)Activities.
  3. [WF] e denotes an entity ent=ρ(e)Entities.
  4. [WF] a1 denotes an activity act1=ρ(a1)Activities.
  5. The event happened at the end of act2, that is, ρ(t)=endTime(act2)=time(evt).
  6. The activity act1 ended act2 via entity ent: that is, ended(evt)=(act2,ent,act1).
  7. The attributes match: match(W,evt,attrs).

4.4.7 Attribution

An attribution formula wasAttributedTo(id,e,ag,attrs) is interpreted as follows:

W,ρwasAttributedTo(id,e,ag,attrs) holds if and only if:

  1. [WF] id denotes an association assoc=ρ(id)Associations.
  2. [WF] e denotes an entity ent=ρ(e)Entities.
  3. [WF] ag denotes an agent agent=ρ(ag)Agents.
  4. The entity was attributed to the agent, i.e. attributedTo(assoc)=(ent,agent).
  5. The attributes match: match(W,assoc,attrs).

4.4.8 Communication

A communication formula wasInformedBy(id,a2,a2,attrs) is interpreted as follows:

W,ρwasInformedBy(id,a2,a1,attrs) holds if and only if:

  1. [WF] id denotes a communication comm=ρ(id)Communications.
  2. [WF] a1,a2 denote activities act1=ρ(a1)Activities,act2=ρ(a2)Activities.
  3. There exist gen,use,ent such that communicated(comm)=(act2,act1) and generated(gen)=(ent,act1) and used(use)=(act2,ent).
  4. The attributes match: match(W,comm,attrs).

4.4.9 Delegation

The actedOnBehalfOf(id,ag2,ag1,act,attrs) relation is interpreted as follows:

W,ρactedOnBehalfOf(id,ag2,ag1,act,attrs) holds if and only if:

  1. [WF] id denotes a delegation deleg=ρ(id)Delegations.
  2. [WF] a denotes an activity act=ρ(a)Activities.
  3. [WF] ag1,ag2 denote agents agent1=ρ(ag1),agent2=ρ(ag2)Agents.
  4. The agent agent2 acted for the agent agent1 with respect to the activity act, i.e. actedFor(deleg)=(agent2,agent1,act).
  5. The attributes match: match(W,deleg,attrs).

4.4.10 Derivation

Derivation formulas can be of one of two forms:

  • wasDerivedFrom(id,e2,e1,a,g,u,attrs), which specifies an activity, generation and usage event. For convenience we call this a precise derivation.
  • and wasDerivedFrom(id,e2,e1,,,,attrs), which does not specify an activity, generation and usage event. For convenience we call this an imprecise derivation.
4.4.10.1 Precise

A precise derivation formula has the form wasDerivedFrom(id,e2,e1,a,g,u,attrs).

W,ρwasDerivedFrom(id,e2,e1,act,g,u,attrs) holds if and only if:

  1. [WF] id denotes a derivation deriv=ρ(id)Derivations.
  2. [WF] e1,e2 denote entities ent1=ρ(e1),ent2=ρ(e2)Entities.
  3. [WF] a denotes an activity act=ρ(a)Activities.
  4. [WF] g denotes a generation event gen=ρ(g)Generations.
  5. [WF] u denotes a use event use=ρ(u)Usages.
  6. The derivation denotes a one-step derivation path linking the entities via the activity, generation and use: derivationPath(deriv)=ent2genactuseent1.
  7. The attribute values match: match(W,deriv,attrs).
4.4.10.2 Imprecise

An imprecise derivation formula has the form wasDerivedFrom(id,e2,e1,,,,attrs).

W,ρwasDerivedFrom(id,e2,e1,,,,attrs) holds if and only if:

  1. [WF] id denotes a derivation deriv=ρ(id)Derivations.
  2. [WF] e1,e2 denote entities ent1=ρ(e1),ent2=ρ(e2)Entities .
  3. derivationPath(deriv)=ent2went1 for some w.
  4. The attribute values match: match(W,deriv,attrs).

4.4.11 Influence

W,ρwasInfluencedBy(id,o2,o1,attrs) holds if and only if at least one of the following hold:

  1. [WF] id denotes an influence inf=ρ(id)Influences.
  2. [WF] o1 and o2 denote objects obj1=ρ(o1)Objects and obj2=ρ(o2)Objects.
  3. The influence inf links o2 with o1; that is, influenced(inf)=(o2,o1).
  4. The attribute values match: match(W,deriv,attrs).

4.4.12 Specialization

The specializationOf(e1,e2) relation indicates when one entity formula presents more specific aspects of another.

W,ρspecializationOf(e1,e2) holds if and only if:

  1. [WF] Both e1 and e2 are entity identifiers, denoting entities ent1=ρ(e1)Entities and ent2=ρ(e2)Entities.
  2. The two entities present aspects of the same thing, that is, thingOf(ent1)=thingOf(ent2).
  3. The events of ent1 are contained in those of ent2, i.e. events(ent1)events(ent2).
  4. For each attribute attr we have value(ent1,attr)value(ent2,attr).
  5. At least one of these inclusions is strict: that is, either events(ent1)events(ent2) or for some attr we have value(ent1,attr)value(ent2,attr).

The second criterion says that the two Entities present (possibly different) aspects of the same Thing. Note that the third criterion allows ent1 and ent2 to have the same events (or events(ent2) can be larger). The last criterion allows ent1 to have more defined attributes than ent2, but they must include the attributes defined by ent2. Two different entities that have the same attributes can also be related by specialization. The fifth criterion (indirectly) ensures that specialization is irreflexive.

4.4.13 Alternate

The alternateOf relation indicates when two entity formulas present (possibly different) aspects of the same thing. The two entities may or may not overlap in time.

W,ρalternateOf(e1,e2) holds if and only if:

  1. [WF] Both e1 and e2 are entity identifiers, denoting ent1=ρ(e1) and ent2=ρ(e2).
  2. The two objects refer to the same underlying Thing: thingOf(ent1)=thingOf(ent2)

4.4.14 Membership

The hadMember relation relates a collection to an element of the collection.

W,ρhadMember(c,e) holds if and only if:

  1. [WF] Both e1 and e2 are entity identifiers, denoting coll=ρ(c)Collections and ent=ρ(e)Entities.
  2. The entity ent is a member of the collection coll: that is, entmembers(coll).

4.5 Semantics of Auxiliary Formulas

In this section, we define the semantics of additional formulas concerning ordering, null values, and typing. These are used in the logical versions of constraints.

4.5.1 Precedes and Strictly Precedes

The precedes relation x precedes y holds between two events, one taking place before (or simultaneously with) another. Its meaning is defined in terms of the quasiordering on events specified by . The semantics of strictly precedes (x strictlyPrecedes y) is similar, only x must take place strictly before y. It is interpreted as , which we recall is defined from as xyxy and y⪯̸x.

  1. W,ρx precedes y holds if and only if ρ(x),ρ(y)Events and ρ(x)ρ(y).
  2. W,ρx strictlyPrecedes y holds if and only if ρ(x),ρ(y)Events and ρ(x)ρ(y).

The ordering of time values associated to events is unrelated to the event ordering. For example:

entity(e)
activity(a1)
activity(a2)
wasGeneratedBy(gen1; e, a1, 2011-11-16T16:05:00)
wasGeneratedBy(gen2; e, a2, 2012-11-16T16:05:00) //different date

This instance is valid, and must satisfy precedence constraints gen1 precedes gen2 and gen2 precedes gen1, but this does not imply anything about the relative orderings of the associated times, or vice versa.

4.5.2 notNull

The notNull(x) formula is used to specify that a value may not be the null value . The symbol "" always denotes the null value (i.e. ρ()=).

W,ρnotNull(e) holds if and only if ρ(e).

4.5.3 typeOf

The typing formula typeOf(x,t) constrains the type of the value of x.

  1. W,ρtypeOf(e,entity) holds if and only if ρ(e)Entities.
  2. W,ρtypeOf(a,activity) holds if and only if ρ(a)Activities.
  3. W,ρtypeOf(ag,agent) holds if and only if ρ(ag)Agents.
  4. W,ρtypeOf(c,Collection) holds if and only if ρ(c)Collections.
  5. W,ρtypeOf(c,EmptyCollection) holds if and only if ρ(c)Collections and members(ρ(c)=.

5. Inferences and Constraints

In this section we restate all of the inferences and constraints of PROV-CONSTRAINTS in terms of first-order logic. For each, we give a proof sketch showing why the inference or constraint is sound for reasoning about the semantics. We exclude the definitional rules in PROV-CONSTRAINTS because they are only needed for expanding the abbreviated forms of PROV-N statements to the logical formulas used here.

5.1 Inferences

id,a2,a1,attrs. wasInformedBy(id,a2,a1,attrs)e,gen,t1,use,t2. wasGeneratedBy(gen,e,a1,t1,[])used(use,a2,e,t2,[])

This follows immediately from the semantics of wasInformedBy.

gen,e,a1,t1,attrs1,use,a2,t2,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)used(use,a2,e,t2,attrs2)id. wasInformedBy(id,a2,a1,[])

This follows from the semantics of wasInformedBy and Axiom 1.

e,attrs. entity(e,attrs)gen,a1,t1,inv,a2,t2. wasGeneratedBy(gen,e,a1,t1,[])wasInvalidatedBy(inv,e,a2,t2,[])

This follows from Axiom 2, which requires that generation and invalidation events exist for each entity.

a,t1,t2,attrs. activity(a,t1,t2,attrs)start,e1,a1,end,a2,e2. wasStartedBy(start,a,e1,a1,t1,[])wasEndedBy(end,a,e2,a2,t2,[])

This follows from the semantics of activity formulas, specifically the requirement that start and end events exist for the activity.

id,a,e1,a1,t,attrs. wasStartedBy(id,a,e1,a1,t,attrs)gen,t1. wasGeneratedBy(gen,e1,a1,t1,[])

This follows from Axiom 3.

id,a,e1,a1,t,attrs. wasEndedBy(id,a,e1,a1,t,attrs)gen,t1. wasGeneratedBy(gen,e1,a1,t1,[])

This follows from Axiom 4.

id,e2,e1,a,gen2,use1,attrs. notNull(a)notNull(gen2)notNull(use1)wasDerivedFrom(id,e2,e1,a,gen2,use1,attrs)t1,t2. used(use1,a,e1,t1,[])wasGeneratedBy(gen2,e2,a,t2,[])

This follows from the semantics of precise derivation steps.

id,e1,e2,a,g,u. wasDerivedFrom(id,e2,e1,a,g,u,[prov:type=prov:Revision]))alternateOf(e2,e1)

This follows from the semantics of derivation steps (precise or imprecise) and Axiom 5.

att,e,ag,attrs. wasAttributedTo(att,e,ag,attrs)a,t,gen,assoc,pl. wasGeneratedBy(gen,e,a,t,[])wasAssociatedWith(assoc,a,ag,pl,[])

This follows from the semantics of generation, association, and attribution, by Axiom 6.

id,ag1,ag2,a,attrs. actedOnBehalfOf(id,ag1,ag2,a,attrs)id1,pl1,id2,pl2. wasAssociatedWith(id1,a,ag1,pl1,[])wasAssociatedWith(id2,a,ag2,pl2,[])

This follows from the semantics of association and delegation, by Axiom 7.


  1. id,e,a,t,attrs. wasGeneratedBy(id,e,a,t,attrs)wasInfluencedBy(id,e,a,attrs)

  2. id,a,e,t,attrs. used(id,a,e,t,attrs)wasInfluencedBy(id,a,e,attrs)

  3. id,a2,a1,attrs. wasInformedBy(id,a2,a1,attrs)wasInfluencedBy(id,a2,a1,attrs)

  4. id,a2,e,a1,t,attrs. wasStartedBy(id,a2,e,a1,t,attrs)wasInfluencedBy(id,a2,e,attrs)

  5. id,a2,e,a1,t,attrs. wasEndedBy(id,a2,e,a1,t,attrs)wasInfluencedBy(id,a2,e,attrs)

  6. id,e,a,t,attrs. wasInvalidatedBy(id,e,a,t,attrs)wasInfluencedBy(id,e,a,attrs)

  7. id,e2,e1,a,g,u,attrs. wasDerivedFrom(id,e2,e1,a,g,u,attrs)wasInfluencedBy(id,e2,e1,attrs)

  8. id,e,ag,attrs. wasAttributedTo(id,e,ag,attrs)wasInfluencedBy(id,e,ag,attrs)

  9. id,a,ag,pl,attrs. wasAssociatedWith(id,a,ag,pl,attrs)wasInfluencedBy(id,a,ag,attrs)

  10. id,ag2,ag1,a,attrs. actedOnBehalfOf(id,ag2,ag1,a,attrs)wasInfluencedBy(id,ag2,ag1,attrs)

This follows via Axioms 8 through 17.

e. entity(e)alternateOf(e,e)

Suppose ent=ρ(e). Clearly entEntities and thingOf(ent)=thingOf(ent), so W,ρalternateOf(e,e).

e1,e2,e3. alternateOf(e1,e2)alternateOf(e2,e3)alternateOf(e1,e3)

Suppose ent1=ρ(e1) and ent2=ρ(e2) and ent3=ρ(e3). Then by assumption ent1, ent2, and ent3 are in Entities and thingOf(e1)=thingOf(e2) and thingOf(e2)=thingOf(e3), so thingOf(e1)=thingOf(e3), as required to conclude W,ρalternateOf(e1,e3).

e1,e2. alternateOf(e1,e2)alternateOf(e2,e1)

Suppose ent1=ρ(e1) and ent2=ρ(e2). Then by assumption both ent1 and ent2 are in Entities and thingOf(e1)=thingOf(e2), as required to conclude W,ρalternateOf(e2,e1).

e1,e2,e3. specializationOf(e1,e2)specializationOf(e2,e3)specializationOf(e1,e3)

Suppose the conditions for specialization hold of ent1 and ent2 and for ent2 and ent3, where ent1=ρ(e1) and ent2=ρ(e2) and ent3=ρ(e3). Then events(e1)events(e2)events(e3). Moreover, value(obj2,attr)value(obj3,attr), and similarly value(obj1,attr)value(obj2,attr) so value(obj1,attr)value(obj3,attr). Finally, at least one of the inclusions between obj1 and obj2 is strict, so the same is the case for obj1 and obj3.

e1,e2. specializationOf(e1,e2)alternateOf(e1,e2)

If ent1=ρ(e1) and ent2=ρ(e2) are specializations, then thingOf(ent1)=thingOf(ent2).

e1,attrs,e2. entity(e1,attrs)specializationOf(e2,e1)entity(e2,attrs)

Suppose ent1=ρ(e1) and ent2=ρ(e2). Suppose (att,v) is an attribute-value pair in attrs. Since entity(e1,attrs) holds, we know that vvalue(ent1,att). Thus vvalue(ent2,att) since value(ent2,att)value(ent1,att). Since this is the case for all attribute-value pairs in attrs, and since e2 obviously denotes an entity, we can conclude W,ρentity(e2,attrs).

5.2 Constraints

5.2.1 Uniqueness constraints


  1. id,attrs1,attrs2.entity(id,attrs1)entity(id,attrs2)entity(id,attrs1attrs2)

  2. id,t1,t1,t2,t2,attrs1,attrs2. activity(id,t1,t2,attrs1)activity(id,t1,t2,attrs2)activity(id,t1,t2,attrs1attrs2)t1=t1t2=t2

  3. id,attrs1,attrs2.agent(id,attrs1)agent(id,attrs2)agent(id,attrs1attrs2).

These properties follow immediately from the definitions of the semantics of the respective assertions, because functions are used for the underlying data.


  1. id,e,e,a,a,t,t,attrs1,attrs2. wasGeneratedBy(id,e,a,t,attrs)wasGeneratedBy(id,e,a,t,attrs2)wasGeneratedBy(id,e,a,t,attrs1attrs2)e=ea=at=t

  2. id,e,e,a,a,t,t,attrs1,attrs2. used(id,a,e,t,attrs)used(id,a,e,t,attrs2)used(id,a,e,t,attrs1attrs2)e=ea=at=t

  3. id,a1,a2,a1,a2,attrs1,attrs2. wasInformedBy(id,a1,a2,attrs)wasInformedBy(id,a1,a2,attrs2)wasInformedBy(id,a1,a2,attrs1attrs2)a1=a1a2=a2

  4. id,e,ea1,a2,a1,a2,t,t,attrs1,attrs2. wasStartedBy(id,a2,e,a1,t,attrs1)wasStartedBy(id,a2,e,a1,t,attrs2)wasStartedBy(id,a2,e,a1,t,attrs1attrs2)a1=a1e=ea2=a2t=t

  5. id,e,ea1,a2,a1,a2,t,t,attrs1,attrs2. wasEndedBy(id,a2,e,a1,t,attrs1)wasEndedBy(id,a2,e,a1,t,attrs2)wasEndedBy(id,a2,e,a1,t,attrs1attrs2)a1=a1e=ea2=a2t=t

  6. id,e,e,a,a,t,t,attrs1,attrs2. wasInvalidatedBy(id,e,a,t,attrs1)wasInvalidatedBy(id,e,a,t,attrs2)wasInvalidatedBy(id,e,a,t,attrs1attrs2)e=ea=at=t

  7. id,e1,e1,e2,e2,a,a,g,g,u,u,attrs1,attrs2. wasDerivedFrom(id,e2,e1,a,g2,u1,attrs1)wasDerivedFrom(id,e2,e1,a,g2,u1,attrs2)wasDerivedFrom(id,e2,e1,a,g2,u1,attrs1attrs2)e1=e1e2=e2a=ag=gu=u

  8. id,e,e,ag,ag,attrs1,attrs2. wasAttributedTo(id,e,ag,attrs1)wasAttributedTo(id,e,ag,attrs2)wasAttributedTo(id,e,ag,attrs1attrs2)e=eag=ag

  9. id,a,a,ag,ag,pl,pl,attrs1,attrs2. wasAssociatedWith(id,a,ag,pl,attrs1)wasAssociatedWith(id,a,ag,pl,attrs2)wasAssociatedWith(id,a,ag,pl,attrs1attrs2)a=aag=agpl=pl

  10. id,ag1,ag1,ag2,ag2,a,a,attrs1,attrs2. actedOnBehalfOf(id,ag2,ag1,a,attrs1)actedOnBehalfOf(id,ag2,ag1,a,attrs2)actedOnBehalfOf(id,ag2,ag1,a,attrs1attrs2)ag1=ag1ag2=ag2a=a

  11. id,o1,o2,o1,o2,attrs1,attrs2. wasInfluencedBy(id,o2,o1,attrs1)wasInfluencedBy(id,o2,o1,attrs2)wasInfluencedBy(id,o2,o1,attrs1attrs2)o1=o1o2=o2

These properties follow immediately from the definitions of the semantics of the respective assertions, again because functions are used for the underlying data.

gen1,gen2,e,a,t1,t2,attrs1,attrs2. wasGeneratedBy(gen1,e,a,t1,attrs1)wasGeneratedBy(gen2,e,a,t2,attrs2)gen1=gen2

This follows from Axiom 18.

inv1,inv2,e,a,t1,t2,attrs1,attrs2. wasInvalidatedBy(inv1,e,a,t1,attrs1)wasInvalidatedBy(inv2,e,a,t2,attrs2)inv1=inv2

This follows from Axiom 19.

start1,start2,a,e1,e2,a0,t1,t2,attrs1,attrs2. wasStartedBy(start1,a,e1,a0,t1,attrs1)wasStartedBy(start2,a,e2,a0,t2,attrs2)start1=start2

This follows from Axiom 20.

end1,end2,a,e1,e2,a0,t1,t2,attrs1,attrs2. wasEndedBy(end1,a,e1,a0,t1,attrs1)wasEndedBy(end2,a,e2,a0,t2,attrs2)end1=end2

This follows from Axiom 21.

start,a1,a2,t,t1,t2,e,attrs,attrs1. activity(a2,t1,t2,attrs)wasStartedBy(start,a2,e,a1,t,attrs1)t1=t

This follows from the semantics of wasStartedBy, since the start times must both match that of the activity.

end,a1,a2,t,t1,t2,e,attrs,attrs1. activity(a2,t1,t2,attrs)wasEndedBy(end,a2,e,a1,t,attrs1)t2=t

This follows from the semantics of wasEndedBy, since the end times must both match that of the activity.

5.2.2 Ordering constraints

start,end,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)wasEndedBy(end,a,e2,a2,t2,attrs2)start precedes end

This follows from Axiom 22.

start1,start2,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start1,a,e1,a1,t1,attrs1)wasStartedBy(start2,a,e2,a2,t2,attrs2)start1 precedes start2

This follows from Axiom 22.

end1,end2,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasEndedBy(end1,a,e1,a1,t1,attrs1)wasEndedBy(end2,a,e2,a2,t2,attrs2)end1 precedes end2

This follows from Axiom 23.


  1. start,use,a,e1,e2,a1,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)used(use,a,e2,t2,attrs2)start precedes use

  2. use,end,a,e1,e2,a2,t1,t2,attrs1,attrs2. used(use,a,e1,t1,attrs1)wasEndedBy(end,a,e2,a2,t2,attrs2)use precedes end

Part 1 follows from Axiom 22 and part 2 follows from Axiom 23.


  1. start,gen,e1,e2,a,a1,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)wasGeneratedBy(gen,e2,a,t2,attrs2)start precedes gen

  2. gen,end,e,e1,a,a1,t,t1,attrs,attrs1. wasGeneratedBy(gen,e,a,t,attrs)wasEndedBy(end,a,e1,a1,t1,attrs1)gen precedes end

Part 1 follows from Axiom 22 and part 2 follows from Axiom 23.

id,start,end,a1,a1,a2,a2,e1,e2,t1,t2,attrs,attrs1,attrs2. wasInformedBy(id,a2,a1,attrs)wasStartedBy(start,a1,e1,a1,t1,attrs1)wasEndedBy(end,a2,e2,a2,t2,attrs2)start precedes end

This follows from the semantics of wasInformedBy, Axiom 24, and the previous two constraints, because wasInformedBy implies the existence of intermediate generation and usage events linking a1 and a2 through an entity e. The generation of e must precede its use.

gen,inv,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)gen precedes inv

This follows from Axiom 24.

gen,use,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)used(use,a2,e,t2,attrs2)gen precedes use

This follows from Axiom 24.

use,inv,a1,a2,e,t1,t2,attrs1,attrs2. used(use,a1,e,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)use precedes inv

This follows from Axiom 25.

gen1,gen2,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen1,e,a1,t1,attrs1)wasGeneratedBy(gen2,e,a2,t2,attrs2)gen1 precedes gen2

This follows from Axiom 24.

inv1,inv2,e,a1,a2,t1,t2,attrs1,attrs2. wasInvalidatedBy(inv1,e,a1,t1,attrs1)wasInvalidatedBy(inv2,e,a2,t2,attrs2)inv1 precedes inv2

This follows from Axiom 25.

d,e1,e2,a,gen2,use1,attrs. notNull(a)notNull(gen2)notNull(use1)wasDerivedFrom(d,e2,e1,a,gen2,use1,attrs)use1 precedes gen2

This follows from Axiom 26.

d,gen1,gen2,e1,e2,a,a1,a2,g,u,t1,t2,attrs,attrs1,attrs2. wasDerivedFrom(d,e2,e1,a,g,u,attrs)wasGeneratedBy(gen1,e1,a1,t1,attrs1)wasGeneratedBy(gen2,e2,a2,t2,attrs2)gen1 strictlyPrecedes gen2

This follows from Axiom 27.


  1. gen,start,e,a,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)wasStartedBy(start,a,e,a2,t2,attrs2)gen precedes start

  2. start,inv,e,a,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e,a1,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)start precedes inv

Part 1 follows from Axiom 24. Part 2 follows from Axiom 25.


  1. gen,end,e,a,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)wasEndedBy(end,a,e,a2,t2,attrs2)gen precedes end

  2. end,inv,e,a,a1,a2,t1,t2,attrs1,attrs2. wasEndedBy(end,a,e,a1,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)end precedes inv

Part 1 follows from Axiom 24. Part 2 follows from Axiom 25.

gen1,gen2,e1,e2,a1,a2,t1,t2,attrs1,attrs2. specializationOf(e2,e1)wasGeneratedBy(gen1,e1,a1,t1,attrs1)wasGeneratedBy(gen2,e2,a2,t2,attrs2)gen1 precedes gen2

This follows from Axiom 24 and the fact that if e2 specializes e1 then all of the events of e2 are events of e1. Thus, the generation of e1 precedes all events of e2.

inv1,inv2,e1,e2,a1,a2,t1,t2,attrs1,attrs2. specializationOf(e1,e2)wasInvalidatedBy(inv1,e1,a1,t1,attrs1)wasInvalidatedBy(inv2,e2,a2,t2,attrs2)inv1 precedes inv2

This follows from Axiom 25 and the fact that if e2 specializes e1 then all of the events of e2 are events of e1. Thus, the invalidation of e1 follows all events of e2.

In the following inferences, pl may be a placeholder -.


  1. assoc,start1,inv2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasStartedBy(start1,a,e1,a1,t1,attrs1)wasInvalidatedBy(inv2,ag,a2,t2,attrs2)start1 precedes inv2

  2. assoc,gen1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasGeneratedBy(gen1,ag,a1,t1,attrs1)wasEndedBy(end2,a,e2,a2,t2,attrs2)gen1 precedes end2

  3. assoc,start1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasStartedBy(start1,a,e1,a1,t1,attrs1)wasEndedBy(end2,ag,e2,a2,t2,attrs2)start1 precedes end2

  4. assoc,start1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasStartedBy(start1,ag,e1,a1,t1,attrs1)wasEndedBy(end2,a,e2,a2,t2,attrs2)start1 precedes end2

The four parts follow from Axiom 28 through Axiom 31 respectively.


  1. att,gen1,gen2,e,a1,a2,t1,t2,ag,attrs,attrs1,attrs2. wasAttributedTo(att,e,ag,attrs)wasGeneratedBy(gen1,ag,a1,t1,attrs1)wasGeneratedBy(gen2,e,a2,t2,attrs2)gen1 precedes gen2

  2. att,start1,gen2,e,e1,a1,a2,ag,t1,t2,attrs,attrs1,attrs2. wasAttributedTo(att,e,ag,attrs)wasStartedBy(start1,ag,e1,a1,t1,attrs1)wasGeneratedBy(gen2,e,a2,t2,attrs2)start1 precedes gen2

These properties follow from Axiom 32 and Axiom 33.


  1. del,gen1,inv2,ag1,ag2,a,a1,a2,t1,t2,attrs,attrs1,attrs2. actedOnBehalfOf(del,ag2,ag1,a,attrs)wasGeneratedBy(gen1,ag1,a1,t1,attrs1)wasInvalidatedBy(inv2,ag2,a2,t2,attrs2)gen1 precedes inv2

  2. del,start1,end2,ag1,ag2,a,a1,a2,e1,e2,t1,t2,attrs,attrs1,attrs2. actedOnBehalfOf(del,ag2,ag1,a,attrs)wasStartedBy(start1,ag1,e1,a1,t1,attrs1)wasEndedBy(end2,ag2,e2,a2,t2,attrs2)start1 precedes end2

These properties follow from Axiom 34 and Axiom 35.

5.2.3 Typing constraints


  1. e,attrs. entity(e,attrs)typeOf(e,entity)

  2. ag,attrs. agent(ag,attrs)typeOf(ag,agent)

  3. a,t1,t2,attrs. activity(a,t1,t2,attrs)typeOf(a,activity)

  4. u,a,e,t,attrs. used(u,a,e,t,attrs)typeOf(a,activity)typeOf(e,entity)

  5. g,a,e,t,attrs. wasGeneratedBy(g,e,a,t,attrs)typeOf(a,activity)typeOf(e,entity)

  6. inf,a2,a1,t,attrs. wasInformedBy(inf,a2,a1,t,attrs)typeOf(a1,activity)typeOf(a2,activity)

  7. start,a2,e,a1,t,attrs. wasStartedBy(start,a2,e,a1,t,attrs)typeOf(a1,activity)typeOf(a2,activity)typeOf(e,entity)

  8. end,a2,e,a1,t,attrs. wasEndedBy(end,a2,e,a1,t,attrs)typeOf(a1,activity)typeOf(a2,activity)typeOf(e,entity)

  9. inv,a,e,t,attrs. wasInvalidatedBy(inv,e,a,t,attrs)typeOf(a,activity)typeOf(e,entity)

  10. id,e2,e1,a,g2,u1,attrs. notNull(a)notNull(g2)notNull(u1)wasDerivedFrom(id,e2,e1,a,g2,u1,attrs)typeOf(e2,entity)typeOf(e1,activity)typeOf(a,activity)

  11. id,e2,e1,attrs. wasDerivedFrom(id,e2,e1,,,,attrs)typeOf(e2,entity)typeOf(e1,activity)

  12. id,e,ag,attrs. wasAttributedTo(id,e,ag,attrs)typeOf(e,entity)typeOf(ag,agent)

  13. id,a,ag,pl,attrs. notNull(pl)wasAssociatedWith(id,a,ag,pl,attrs)typeOf(a,activity)typeOf(ag,agent)typeOf(pl,entity)

  14. id,a,ag,attrs. wasAssociatedWith(id,a,ag,,attrs)typeOf(a,activity)typeOf(ag,agent)

  15. id,ag2,ag1,a,attrs. actedOnBehalfOf(id,ag2,ag1,a,attrs)typeOf(ag2,agent)typeOf(ag1,agent)typeOf(a,activity)

  16. e2,e1. alternateOf(e2,e1)typeOf(e2,entity)typeOf(e1,entity)

  17. e2,e1. specializationOf(e2,e1)typeOf(e2,entity)typeOf(e1,entity)

  18. c,e. hadMember(c,e)typeOf(c,Collection)typeOf(e,entity)

  19. c. entity(c,[prov:type=prov:emptyCollection]))typeOf(c,entity)typeOf(c,Collection)typeOf(c,EmptyCollection)

Each typing constraint follows immediately from well-formedness criteria marked [WF] in the corresponding semantics for formulas. The final constraint requires Axiom 36.

5.2.4 Impossibility constraints


  1. id,e1,e2,g,attrs. notNull(g)wasDerivedFrom(id,e2,e1,,g,,attrs)False

  2. id,e1,e2,u,attrs. notNull(u)wasDerivedFrom(id,e2,e1,,,u,attrs)False

  3. id,e1,e2,g,u,attrs. notNull(g)notNull(u)wasDerivedFrom(id,e2,e1,,g,u,attrs)False

Each part follows from the fact that the semantics of wasDerivedFrom only allows formulas to hold when either all three of a,g,u are "" (denoting ) or none of them are.

e. specializationOf(e,e)False

This follows from the fact that in the semantics of specializationOf, the requirement that one of the inclusions is strict implies that the two entities cannot be the same.

For each r and s{used,wasGeneratedBy,wasInvalidatedBy,wasStartedBy,wasEndedBy,wasInformedBy,wasAttributedTo,wasAssociatedWith,actedOnBehalfOf} such that r and s are different relation names, the following constraint holds:

id,a1,,am,b1,,bn. r(id,a1,,am)s(id,b1,,bn)False

This follows from the assumption that the different kinds of influences are disjoint sets, characterized by their types. Note that generic influences are allowed to overlap with more specific kinds of influence.

For each p{entity,activity,agent} and each r{used,wasGeneratedBy,wasInvalidatedBy,wasStartedBy,wasEndedBy,wasInformedBy,wasAttributedTo,wasAssociatedWith,actedOnBehalfOf,wasInfluencedBy}, the following constraint holds:

id,a1,,am,b1,,bn. p(id,a1,,am)r(id,b1,,bn)False

This follows from the assumption that influences are distinct from other objects (entities, activities or agents).

id. typeOf(id,entity)typeOf(id,activity)False

This follows from the assumption that entities and activities are disjoint.

c,e. hasMember(c,e)typeOf(c,EmptyCollection)False

This follows from the definition of the semantics of typeOf(c,EmptyCollection), which requires that there are no members of the collection denoted by c.

6. Soundness and Completeness

Above we have presented arguments for the soundness of the constraints and inferences with respect to the semantics. Here, we relate the notions of validity and normal form defined in PROV-CONSTRAINTS to the semantics.

6.1 Soundness

Our main soundness result is:

Let W be a PROV structure, that is, a structure providing all of the components above and satisfying all of the axioms.

  1. If I is an instance and WI and I is obtained from I by applying one of the PROV inferences, then WI.
  2. If I is an instance and WI and I is obtained from I by applying one of the PROV key or uniqueness constraints, then WI.
  3. If I is an instance and WI then I has a normal form I and WI.
  4. If I is a normal form and WI then I satisfies all of the ordering, typing and impossibility constraints.
  5. If WI then I is valid.

For part 1, the arguments are as in the previous section.

For part 2, if WI then since W satisfies the logical forms of all uniqueness and key constraints, constraint application cannot fail on I and WI.

For part 3, proceed by induction on a terminating sequence of inference or uniqueness constraint steps: if I is in normal form then we are done. If I is not in normal form then if an inference is applicable, then use part 1; if a uniqueness constraint is applicable, then use part 2.

For part 4, the arguments are as in the previous section for each constraint.

Finally, for part 5, suppose WI. Then WI where I is the normal form of I by part 2. By part 3, I satisfies all of the remaining constraints, so I is valid.

6.2 Weak Completeness

In this section we give a translation from valid PROV instances to structures, and show that a valid PROV instance has a model. We call this property weak completeness.

The term weak refers to the fact that there are still some inferences that are sound in the semantics but not enforced by validation. For example, consider the following (valid) PROV instance fragment:

entity(e,[a=1])
agent(e,[b=2])
    

This instance is valid and has a model, but in every model satisfying the instance, it is also true that:

entity(e,[a=1,b=2])
agent(e,[a=1,b=2])
    

Thus, weak completeness captures the fact that every valid instance has a model, but does not imply that a valid instance satisfies all of the deductions possible in that model.

Let I be a valid PROV instance that is in normal form. We define a structure M(I) as follows, by giving the sets, functions and relations specified in the components in Section 3, and finally verifying that the axioms hold.

First, without loss of generality, we assume that all times specified in activity or event formulas in I are ground values. If not, set each variable in such a position to some dummy value. This is justified by the following fact:

If I is valid then S(I) is valid, where S is any substitution that maps time variables to time constants.

First, consider a substitution S=[t:=c] that maps a single time variable to a constant. It is straightforward to check that if I is in normal form, then S(I) is in normal form, since none of the inferences or uniqueness constraints can be enabled by changing a time variable uniformly in I. Similarly, the remaining constraints are insensitive to the time values, so S(I) is in normal form and satisfies all of the remaining constraints just as I does. The general case of a substitution that replaces multiple time variables with constants is a straightforward generalization since we can view such a substitution as a composition of single-variable substitutions.

6.2.1 Sets

The sets of structure M(I) are:

Entities={idItypeOf(id,entity)}Plans={plid,ag,ac,attrs. wasAssociatedWith(id,ag,act,pl,attrs)I,pl}Collections={cItypeOf(c,prov:Collection) or ItypeOf(c,prov:EmptyCollection)}Activities={idItypeOf(id,activity)}{aid,aididEntities}{aidid,e2,e1. wasDerivedFrom(id,e2,e1,,,,attrs)I}Agents={idItypeOf(id,agent)}Usages={ida,e,t,attrs. used(id,a,e,t,attrs)I}{uidid,e2,e1,attrs. wasDerivedFrom(id,e2,e1,,,,attrs)I}Generations={ide,a,t,attrs. wasGeneratedBy(id,e,a,t,attrs)I}{gidid,e2,e1,attrs. wasDerivedFrom(id,e2,e1,,,,attrs)I}{gididEntities}Invalidations={ide,a,t,attrs. wasInvalidatedBy(id,e,a,t,attrs)I}{iididEntities}Starts={ida,e,a,t,attrs. wasStartedBy(id,a,e,a,t,attrs)I}Ends={ida,e,a,t,attrs. wasEndedBy(id,a,e,a,t,attrs)I}Events=UsagesGenerationsInvalidationsStartsEndsAssociations={idag,act,pl,attrs. wasAssociatedWith(id,ag,act,pl,attrs)I}Attributions={ide,ag,attrs. wasAttributedTo(id,e,ag,attrs)I}Delegations={idag2,ag1,attrs. actedOnBehalfOf(id,ag2,ag1,act,attrs)I}Communications={ida2,a1,attrs. wasInformedBy(id,a2,a1,attrs)I}Derivations={ide2,e1,a,g,u,attrs. wasDerivedFrom(id,e2,e1,a,g,u,attrs)I}Influences=EventsAssociationsAttributionsCommunicationsDelegations{ido2,o1,attrs. wasInfluencedBy(id,o2,o1,attrs)I}Objects=EntitiesActivitiesAgentsInfluences

In the definitions of Entities, Collections, Activities and Agents we use the notation ItypeOf(id,t) to indicate that id must have type t in I according to the typing constraints. For example, for entities, this means that the set Entities contains all identifiers e,e appearing in the entity(e,attrs), alternateOf(e,e), or specializationOf(e,e) formulas, as well as all tose appearing in the appropriate positions of other formulas, as specified in the typing constraints.

In the definitions of Activities, Generations, Invalidations, and Usages we write aid, gid, iid and uid respectively to indicate additional activities, generations and usages added for imprecise derivations or entities.

In addition, to define the set of Things, we introduce an equivalence relation on Entities as follows:

e1e2alternateOf(e1,e2)I

The fact that this is an equivalence relation follows from the fact that I is in normal form, since the constraints on alternateOf ensure that it is an equivalence relation. Recall that given an equivalence relation on some set X, the equivalence class of xX is the set [x]={yXxy}. The quotient of X by an equivalence relation on X is the set of equivalence classes, X={[x]xX}. Now we define the set of Things as the quotient of -equivalence classes of Entities.

Things=Entities/={[e]eEntities}

Observe that since I is normalized and valid, entities and activities are disjoint, the influences are disjoint from entities, activities, and agents, and the different subsets of events and influences are pairwise disjoint, as required.

6.2.2 Functions

First, we consider the functions associated with Entities.

events(e)={idused(id,a,e,t,attrs)I}{idwasGeneratedBy(id,e,a,t,attrs)I}{idwasInvalidatedBy(id,e,a,t,attrs)I}{idwasStartedBy(id,a,e,a,t,attrs)I}{idwasEndedBy(id,a,e,a,t,attrs)I}{ge,ie}events(e)=events(e)specializationOf(e,e)Ievents(e)value(e,a)={ventity(e,attrs)I,(a=v)attrs}(auniq)value(e,uniq)={uniqe}value(e,a)=value(e)specializationOf(e,e)Ivalue(e)thingOf(e)=[e]

Above, we introduce a fresh attribute name uniq, not already in use in I, along with a fresh value e and for each entity e we add a value uniqe to values(e,uniq). This construction ensures that if an entity is a specialization of another in I then the specialization relationship will hold in M(I). We also define the set of all events involved in e as the set of events immediately involved in e or any specialization of e. Similarly, the values of attributes of e are those immediately declared for e along with those of any e that e specializes. We also introduce dummy generation and invalidation events for each entity e, along with activities ae,ae to perform them.

Similarly, for Things, we employ an auxiliary function events:ThingsP(Events) that collects the set of all events in which one of the entities constituting the thing participated.

events(T)=eTevents(e)value(T,a,evt)=eT,evtevents(e)value(e,a)

The functions events, startTime and endTime mapping activities to their start and end times are defined as follows:

events(a)={idused(id,a,e,t,attrs)I}{idwasGeneratedBy(id,e,a,t,attrs)I}{idwasInvalidatedBy(id,e,a,t,attrs)I}{idwasStartedBy(id,a,e,a,t,attrs)I}{idwasEndedBy(id,a,e,a,t,attrs)I}{ge,ie}startTime(id)=t1 where activity(a,t1,t2,attrs)IendTime(id)=t2 where activity(a,t1,t2,attrs)I

The start and end times are arbitrary (say, some zero value) for activities with no activity formula declaring the times. The above definitions of startTime and endTime ignore any start times asserted in wasStartedBy or wasEndedBy formulas. If both activity and wasStartedBy/wasEndedBy statements are present, then they must match, but PROV-CONSTRAINTS does not require that the times of multiple start or end events match for an activity with no activity statement.

The following valid instance exemplifies the above discussion, when t1t2:

wasStartedBy(id1;a,e1,a1,t1,[])
wasStartedBy(id2;a,e2,a2,t2,[])

This instance becomes invalid if we add an activity(a,[]) statement, because it expands to activity(a,T1,T2,[]) where T1,T2 are existential variables, and uniqueness constraints require that t1=T1=t2, which leads to uniqueness constraint failure.

For other Objects besides Entities and Activities, the associated sets of Events are defined to be empty. (An Agent that happens to be an Entity or Activity will have the set of events defined above for the appropriate kind of object. Note that since Entities and Activities are disjoint, this definition is unambiguous.)

The function time mapping Events to their Times is defined as follows:

time(id)=t where used(id,a,e,t,attrs)Itime(id)=t where wasGeneratedBy(id,e,a,t,attrs)Itime(id)=t where wasInvalidatedBy(id,e,a,t,attrs)Itime(id)=t where wasStartedBy(id,a,e,a,t,attrs)Itime(id)=t where wasEndedBy(id,a,e,a,t,attrs)I

This definition is deterministic because the sets of identifiers of different Events are disjoint, and the associated times are unique.

The functions giving the interpretations of the different identified influences are as follows:

used(id)=(a,e) where used(id,a,e,t,attrs)Iused(uid)=(aid,e1) where wasDerivedFrom(id,e2,e1,,,,attrs)Igenerated(id)=(e,a) where wasGeneratedBy(id,e,a,t,attrs)Igenerated(gid)=(e2,aid) where wasDerivedFrom(id,e2,e1,,,,attrs)Igenerated(ge)=(e,ae) where eEntitiesinvalidated(id)=(e,a) where wasInvalidatedBy(id,e,a,t,attrs)Iinvalidated(ie)=(e,ae) where eEntitiesstarted(id)=(a,e,a) where wasStartedBy(id,a,e,a,t,attrs)Iended(id)=(a,e,a) where wasEndedBy(id,a,e,a,t,attrs)IassociatedWith(id)=(ag,act,pl) where wasAssociatedWith(id,ag,act,pl,attrs)IattributedTo(id)=(e,ag) where wasAttributedTo(id,e,ag,attrs)IactedFor(id)=(ag2,ag1,act) where actedOnBehalfOf(id,ag2,ag1,act,attrs)Icommunicated(id)=(a2,a1) where wasInformedBy(id,a2,a1,attrs)IderivationPath(id)=e2gaue1 where wasDerivedFrom(id,e2,e1,a,g,u,attrs)IderivationPath(id)=e2gidaiduide1 where wasDerivedFrom(id,e2,e1,,,,attrs)I

Note that since I is normalized and valid, by the uniqueness constraints these functions are all well-defined. In the case for imprecise derivations, we generate additional activities, generations and usages linking e2 to e1.

The definition of the influenced function is more involved, and is as follows:

influenced(id)=used(id)generated(id)invalidated(id){(a,e)(a,e,a)started(id)}{(a,e)(a,e,a)ended(id)}{(ag,act)(ag,act,pl)associatedWith(id)}attributedTo(id){(ag2,ag1)(ag2,ag1,act)actedFor(id)}communicated(id){(e2,e1)e2we1derivationPath(id)}{(o2,o1)wasInfluencedBy(id,o2,o1)I}

This definition ensures that by construction influenced(id) contains all of the other associated relationships. For any specific id, however, most of the above sets will be empty, and the final line will often be redundant. It is not always redundant, because it is possible to assert an unspecified influence in I.

It is straightforward to verify (by their definitions) that the event sets associated with entities and activities satisfy the side-conditions in Component 9.

Finally, the collection membership function members is defined as follows:

members(c)={ehadMember(c,e)I

6.2.3 Relations

We introduced a relation corresponding to alternateOf above, in defining Things, but this relation is not a component of the semantics.

The event ordering relation is defined as follows:

evtevt(evt,evt)GI

closed under reflexivity and transitivity. Here, we are using a slight abuse of notation: we write GI for the directed graph that is used during validation of I to test for cycles among event ordering constraints. See Sec. 7.1 of PROV-CONSTRAINTS [PROV-CONSTRAINTS].

6.2.4 Axioms

To verify that the construction of M(I) yields a PROV structure, we must ensure that all of the axioms and side-conditions in the components are satisfied. As noted above, the disjointness constraints are satisfied by construction.

For each axiom we give the corresponding justification:

  1. Axiom 1 follows because I is normalized with respect to Inference 6.
  2. Axiom 2 follows from the construction, since we add dummy generation and invalidation events for every entity.
  3. Axioms 3 and 4 follow because I is normalized with respect to Inference 9 and 10 respectively.
  4. Axiom 5 follows because I is normalized with respect to Inference 12.
  5. Axioms 6 and 7 follow because I is normalized with respect to Inference 13 and 14 respectively.
  6. Axioms 8 through 17 follow because I is normalized with respect to Inference 15.
  7. Axioms 18 through 21 follow because I is normalized with respect to uniqueness constraints 24 through 27.
  8. Axiom 22 follows because constraints 30, 31, 33, 34 ensure that a start event for an activity precedes any other start, end, usage or generation events involving that activity.
  9. Axiom 23 follows because constraints 30, 32, 33, 34 ensure that an end event for an activity follows any other events involving that activity.
  10. Axiom 24 follows because constraints 34, 36, 37, 39 ensure that a generation event for an entity precedes any other events involving that entity.
  11. Axiom 25 follows because constraints 36, 38, 40, 43, 44 ensure that an invalidation event for an entity follows any other generation, usage, or invalidation events involving that entity.
  12. Axiom 26 follows from constraint 41.
  13. Axiom 27 follows from constraint 42 and from the fact that the event ordering constraint graph GI associated with a valid instance I cannot have any cycles involving a strict precedence edge.
  14. Axioms 28 through 31 follow from Constraint 47.
  15. Axioms 32 and 33 follow from Constraint 48.
  16. Axioms 34 and 35 follow from Constraint 49.
  17. Axiom 36 follows from Constraint 50, part 19, and the semantics of typeof.

6.2.5 Main results

The main results of this section are that if a valid PROV instance I has a model MI that satisfies all of the inferences and constraints. Thus, a form of completeness holds: every valid PROV instance has a model.

Suppose J is a valid PROV instance. Then there exists a PROV structure M such that MJ.

First, we consider the case where J itself is a valid, normalized PROV instance I, with no existential variables, and let M(I) be the corresponding structure. Then M(I) is a PROV structure, satisfying all of the axioms (and hence all of the inferences and constraints) stated above.

Moreover, M(I)I, as can be verified on a case-by-case basis for each type of formula by considering its semantics and the definition of the construction of M. Most cases are straightforward; we consider the cases of alternateOf and specializationOf since they are among the most interesting.

  • Suppose alternateOf(e1,e2)I. We wish to show that M(I)alternateOf(e1,e2). Since there are no existential variables in I, we know that e1,e2M(I).Entities. Moreover, e1e2 according to the equivalence relation defined above, and so thingOf(e1)=[e1]=[e2]=thingOf(e2), so we can conclude that M(I)alternateOf(e1,e2).
  • Suppose specializationOf(e1,e2)I. We wish to show that M(I)specializationOf(e1,e2). Again, clearly e1,e2Entities, and since I satisfies all inferences, we know that alternateOf(e1,e2)I so clearly thingOf(e2)=thingOf(e1) as argued above. Next,
    events(e1)=events(e1)specializationOf(e,e1)Ievents(e)events(e2)specializationOf(e,e2)Ievents(e2)=events(e2)
    because specializationOf(e1,e2)I and all e that are specializations of e1 are also specializations of e2. Furthermore, for each attr,
    value(e1,attr)=value(e1,attr)specializationOf(e1,e)Ivalue(e,attr)value(e2,attr)specializationOf(e2,e)Ivalue(e,attr)=value(e2,attr)
    for the same reason. Finally, by construction uniqe1value(e1,uniq) and uniqe1value(e2,uniq) so the inclusion is strict for the special attribute uniq. Thus, we have verified all of the conditions necessary to conclude M(I)specializationOf(e1,e2).

Next, we show how to handle a normalized, valid I contains existential variables x1,,xn. Choose fresh constants c1,,cn of appropriate types for the existential variables and define ρ(xi)=ci. Then M(ρ(I))ρ(I) by the above argument. Moreover, M(ρ(I)),ρI. So M(ρ(I)) is itself the desired model.

Finally, to handle the case where J is an arbitrary valid instance, we need to show that if J is not in normal form, and normalizes to some I such that MI, then MJ. We can prove this by induction on the length of the sequence of normalization steps. The base case, when J=I, is established already. Suppose J normalizes in n+1 steps and we can perform one normalization step on it to obtain J, which normalizes to I in n steps. By induction, we know that MJ. For each possible normalization step, we must show that if MJ then MJ.

First consider inference steps. These add information, that is, JJ. Hence it is immediate that MJ since every formula in J is in J, and all formulas of J are satisfied in M.

Next consider uniqueness constraint steps, which may involve merging formulas. That is, J=J0{r(id,a1,,an,attrs1),r(id,b1,,bn,attrs2)} and J=S(J0){r(id,S(a1),,S(an),attrs1attrs2)}, where S is a unifying substitution making S(ai)=S(bi) for each i{1,,n}. Since MJ, we must have M,ρJ for some ρ, and therefore we must also have that M,ρS(J0) and M,ρr(id,S(a1),,S(an),attrs1attrs2). We can extend ρ to a valuation ρ such that M,ρS(x1)=x1S(xk)=xk where dom(S)={x1,,xk}. Also, M,ρJ0 and M,ρr(id,a1,,an,attrs1attrs2). Moreover, since S is a unifier, we also have M,ρr(id,b1,,bn,attrs1attrs2). Finally, since we can always remove attributes from an atomic formula without damaging its satisfiability, we can conclude that M,ρr(id,a1,,an,attrs1)r(id,b1,,bn,attrs2). To conclude, we have shown MJ0{r(id,a1,,an,attrs1),r(id,b1,,bn,attrs2)}, that is, MJ, as desired.

A. Acknowledgements

This document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole as well as feedback and comments from external reviewers. Thanks specifically to Khalid Belhajjame, Tom De Nies, Paolo Missier, Simon Miles, Luc Moreau, Satya Sahoo, Jan van den Bussche, Joachim Van Herwegen, and Antoine Zimmermann for detailed feedback.

We would also like to acknowledge Schloss Dagstuhl - Leibniz Center for Informatics, because significant progress was made on this document at Dagstuhl Seminar 12091 (Principles of Provenance) that took place from February 26 to March 2, 2012.

Thanks also to Robin Berjon for the ReSPec.js specification writing tool and to MathJax for their LaTeX-to-HTML conversion tools, both of which aided in the preparation of this document.

Members of the Provenance Working Group at the time of publication of this document were: Ilkay Altintas (Invited expert), Reza B'Far (Oracle Corporation), Khalid Belhajjame (University of Manchester), James Cheney (University of Edinburgh, School of Informatics), Sam Coppens (iMinds - Ghent University), David Corsar (University of Aberdeen, Computing Science), Stephen Cresswell (The National Archives), Tom De Nies (iMinds - Ghent University), Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland), Simon Dobson (Invited expert), Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)), Kai Eckert (Invited expert), Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER), James Frew (Invited expert), Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)), Daniel Garijo (Universidad Politécnica de Madrid), Yolanda Gil (Invited expert), Ryan Golden (Oracle Corporation), Paul Groth (Vrije Universiteit), Olaf Hartig (Invited expert), David Hau (National Cancer Institute, NCI), Sandro Hawke (W3C/MIT), Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh), Ivan Herman, (W3C/ERCIM), Ralph Hodgson (TopQuadrant), Hook Hua (Invited expert), Trung Dong Huynh (University of Southampton), Graham Klyne (University of Oxford), Michael Lang (Revelytix, Inc.), Timothy Lebo (Rensselaer Polytechnic Institute), James McCusker (Rensselaer Polytechnic Institute), Deborah McGuinness (Rensselaer Polytechnic Institute), Simon Miles (Invited expert), Paolo Missier (School of Computing Science, Newcastle university), Luc Moreau (University of Southampton), James Myers (Rensselaer Polytechnic Institute), Vinh Nguyen (Wright State University), Edoardo Pignotti (University of Aberdeen, Computing Science), Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute), Carl Reed (Open Geospatial Consortium), Adam Retter (Invited Expert), Christine Runnegar (Invited expert), Satya Sahoo (Invited expert), David Schaengold (Revelytix, Inc.), Daniel Schutzer (FSTC, Financial Services Technology Consortium), Yogesh Simmhan (Invited expert), Stian Soiland-Reyes (University of Manchester), Eric Stephan (Pacific Northwest National Laboratory), Linda Stewart (The National Archives), Ed Summers (Library of Congress), Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)), Ted Thibodeau (OpenLink Software Inc.), Curt Tilmes (National Aeronautics and Space Administration), Craig Trim (IBM Corporation), Stephan Zednik (Rensselaer Polytechnic Institute), Jun Zhao (University of Oxford), Yuting Zhao (University of Aberdeen, Computing Science).

B. References

B.1 Informative references

[PROV-AQ]
Graham Klyne; Paul Groth; eds. Provenance Access and Query. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-aq-20130430/
[PROV-CONSTRAINTS]
James Cheney; Paolo Missier; Luc Moreau; eds. Constraints of the PROV Data Model. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-constraints-20130430/
[PROV-DC]
Daniel Garijo; Kai Eckert; eds. Dublin Core to PROV Mapping. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-dc-20130430/
[PROV-DICTIONARY]
Tom De Nies; Sam Coppens; eds. PROV-Dictionary: Modeling Provenance for Dictionary Data Structures. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-dictionary-20130430/
[PROV-DM]
Luc Moreau; Paolo Missier; eds. PROV-DM: The PROV Data Model. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-dm-20130430/
Luc Moreau; Timothy Lebo; eds. Linking Across Provenance Bundles. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-links-20130430/
[PROV-N]
Luc Moreau; Paolo Missier; eds. PROV-N: The Provenance Notation. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-n-20130430/
[PROV-O]
Timothy Lebo; Satya Sahoo; Deborah McGuinness; eds. PROV-O: The PROV Ontology. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-o-20130430/
[PROV-OVERVIEW]
Paul Groth; Luc Moreau; eds. PROV-OVERVIEW: An Overview of the PROV Family of Documents. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
[PROV-PRIMER]
Yolanda Gil; Simon Miles; eds. PROV Model Primer. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/
[PROV-XML]
Hook Hua; Curt Tilmes; Stephan Zednik; eds. PROV-XML: The PROV XML Schema. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/