Abstract
Provenance is information about entities, activities, and people
involved in producing a piece of data or thing, which can be used to
form assessments about its quality, reliability or
trustworthiness. PROV-DM is the conceptual data model that forms a
basis for the W3C provenance (PROV) family of specifications.
 This document presents a model-theoretic semantics for the PROV
data model (called the reference semantics), viewing
PROV-DM statements as atomic formulas in the sense of first-order
logic, and viewing the constraints and inferences specified in
PROV-CONSTRAINTS as a first-order theory. It is shown that the
first-order theory is sound with respect to the reference semantics.
This information may be useful to researchers or users of PROV to
understand the intended meaning and use of PROV for modeling
information about the actual history, derivation or evolution of Web
resources.  It may also be useful for development of additional
constraints or inferences for reasoning about PROV or integration of
PROV with other Semantic Web vocabularies.  It is not proposed
as a canonical or required semantics of PROV and does not place any
constraints on use of PROV.
The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.
Status of This Document
  
    
      
        
          This section describes the status of this document at the time of its publication. Other
          documents may supersede this document. A list of current W3C publications and the latest revision
          of this technical report can be found in the W3C technical reports
          index at http://www.w3.org/TR/.
        
        
PROV Family of Documents
This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of inter-operable
interchange of provenance information in heterogeneous environments such as the Web.  These documents are listed below. Please consult the [PROV-OVERVIEW] for a guide to reading these documents. 
-  PROV-OVERVIEW (To be published as Note), an overview of the PROV family of documents [PROV-OVERVIEW];
-  PROV-PRIMER (To be published as Note), a primer for the PROV data model [PROV-PRIMER];
-  PROV-O (Proposed Recommendation), the PROV ontology, an OWL2 ontology allowing the mapping of PROV to RDF [PROV-O];
-  PROV-DM (Proposed Recommendation), the PROV data model for provenance [PROV-DM];
-  PROV-N (Proposed Recommendation), a notation for provenance aimed at human consumption [PROV-N];
-  PROV-CONSTRAINTS (Proposed Recommendation), a set of constraints applying to the PROV data model  (this document);
-  PROV-XML (To be published as Note),  an XML schema for the PROV data model [PROV-XML];
-  PROV-AQ (To be published as Note), the mechanisms for accessing and querying provenance [PROV-AQ]; 
-  PROV-DICTIONARY (To be published as Note) introduces a specific type of collection, consisting of key-entity pairs [PROV-DICTIONARY];
-  PROV-DC (To be published as Note) provides a mapping between PROV and Dublic Core Terms [PROV-DC];
-  PROV-SEM (To be published as Note), a declarative specification in terms of first-order logic of the PROV data model [PROV-SEM];
-  PROV-LINKS (To be published as Note) introduces a mechanism to link across bundles [PROV-LINKS].
          This document was published by the Provenance Working Group as an Editor's Draft.
          
          
          If you wish to make comments regarding this document, please send them to 
          public-prov-comments@w3.org 
          (subscribe,
          archives).
          
          
          
          
        All comments are welcome.
        
        
          
            Publication as an Editor's Draft does not imply endorsement by the W3C Membership.
            This is a draft document and may be updated, replaced or obsoleted by other documents at 
            any time. It is inappropriate to cite this document as other than work in progress.
          
        
        
        
          
            This document was produced by a group operating under the 
            5 February 2004 W3C Patent Policy.
          
          
          
            
              W3C maintains a public list of any patent disclosures 
            
            made in connection with the deliverables of the group; that page also includes instructions for 
            disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains
            Essential Claim(s) must disclose the
            information in accordance with section
            6 of the W3C Patent Policy.
          
          
        
        
      
    
  
     
      1. Introduction
 
Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.
This document complements
  the PROV-DM specification [PROV-DM] that defines a data model for
  provenance on the Web, and the PROV-CONSTRAINTS specification
[PROV-CONSTRAINTS] that
specifies definitions, inferences, and constraints that can be used to
reason about PROV documents, or determine their validity.  This document
provides a reference formal semantics of PROV, providing a formal
counterpart to the informal descriptions and motivations given
elsewhere in PROV specifications.
1.1 Purpose of this document
The PROV-DM and PROV-CONSTRAINTS give motivating examples that
provide an intuition about the meaning of the constructs.  For some
concepts, such as use, start, end, generation/invalidation, and
derivation, the meaning is either obvious or situation-dependent.
However, during the development of PROV, the importance of additional
concepts became evident, but the intuitive meaning or correct use of
these concepts were not clear.  For example, the alternateOf and
specializationOf relations are used in PROV to relate different
entities that present aspects of "the same thing".  
Over time the working group came to a
consensus about these concepts and how they are to be used, but this
understanding is based on abstract notions that are not explicit in
PROV documents; instead, some of their properties are captured
formally through certain constraints and inferences, while others are
not captured in PROV specifications at all.
The purpose of this document is to present the working group's
consensus view of the semantics of PROV, using tools from mathematical
logic, principally model theory (though our use of these tools is lightweight).  This information may be useful to users for understanding the
intent behind certain features of PROV, to researchers investigating
richer forms of reasoning over provenance, or to future efforts
building upon PROV.  It is intended as an exploration of a reference
semantics for PROV, not a definitive specification of the  only
semantics of PROV.  
 Nevertheless, the reference semantics does have appealing
properties.  Specifically, it provides a declarative counterpart to
the operational definition of validity taken in PROV-CONSTRAINTS.  In
the specification, validity is defined via a normalization process
followed by constraint checking on the normal form.  This approach was adopted
to keep the specification closer to implementations, although other
implementations are possible and allowed.  In addition to providing a
reference semantics, this document shows that the operational
presentation of PROV validity checking is sound with respect to
declarative presentation adopted here.  This could help justify
alternative approaches to validity checking.
 1.2 Structure of this document
 
  - Section 2 summarizes the basic concepts
  from mathematical logic used in the semantics,  recapitulates how
  PROV statements can be viewed as atomic formulas, and introduces
  some auxiliary formulas.
- Section 3 presents the mathematical
  structures used for situations that PROV statements can
  describe.
- Section 4 defines the semantics of PROV
  statements and auxiliary formulas, indicating when a given formula
  is satisfied in a structure.
- Section 5 presents the inferences and
  constraints from PROV-CONSTRAINTS as first-order formulas, and gives
  brief justifications for their soundness.
- Section 6 relates the PROV notion of
  validity to the logical notion of consistency.  A PROV instance is
  valid if and only if its corresponding first-order theory is
  satisfiable.
Note
  Completeness direction above doesn't hold for current semantics,
  because attributes assumed to be single-valued; for
  example, entity(id,[a=1,a=2]) is valid but doesn't have a
  reference model.
  
1.3  Audience 
This document assumes familiarity with [PROV-DM] and
  [PROV-CONSTRAINTS] and employs (a
  simplified form of) 
[PROV-N] notation.  In particular it assumes familiarity with the concepts
  from logic, and the relationship between PROV statements and
  instances and first-order formulas and theories, respectively,
  presented in Section 2.5 of PROV-CONSTRAINTS.
  This document may be useful to users of PROV who have a formal
  background and are interested in the rationale for some of the
  constructs of PROV; for resaerchers investigating extensions of PROV
  or alternative approaches to reasoning about PROV; or for future
  efforts on provenance standardization.  
  
  
  
2.  Basics 
2.1  Identifiers 
A lowercase symbol x,y,... on its own denotes an identifier.
Identifiers may or may not be URIs.  Identifiers are viewed as variables in logic (or blank nodes in RDF): just because we have two different identifiers x and y doesn't tell us that they denote different things, since we could discover that they are actually the same later.  We write Identifiers for the set of identifiers of interest in a given situation (typically, the set of identifiers present in the PROV instance of interest).
2.2  Times and Intervals 
We assume a linearly ordered set (Times,≤) of time instants.  For convenience we assume the order is total or linear order, corresponding to a linear timeline; however, PROV does not assume that time is linear and events could be partially ordered and not necessarily reconciled to a single global clock.  
We also consider a set Intervals of closed intervals of the form {t∣t1≤t≤t2}.
2.3  Attributes and Values 
We assume a set Attributes of attribute labels and a set Values
of possible values of attributes.  To allow for the fact that some
attributes can have undefined or multiple values, we sometimes use the set
P(Value), that is, the set of sets of values.
 
2.4 Atomic Formulas
The following atomic formulas correspond to the statements of PROV-DM.  We assume that definitions 1-4 of PROV-CONSTRAINTS have been applied in order to expand all optional parameters; thus, we use uniform notation r(id,a1,…,an) instead of the semicolon notation r(id;a1,…,an).
Each parameter is either an identifier, a constant (e.g. a time or other literal value in an attribute list), or a null symbol "-".  Null symbols can only appear in the specified arguments in wasAssociatedWith and wasDerivedFrom, as shown in the grammar below.
atomic_formulaelement_formularelation_formulaauxiliary_formulaattrsty::=||::=||::=|||||||||||||||::=||||::=::=||||element_formularelation_formulaauxiliary_formulaentity(id,attrs)activity(id,st,et,attrs)agent(id,attrs)wasGeneratedBy(id,e,a,t,attrs)used(id,e,a,t,attrs)wasInvalidatedBy(id,e,a,t,attrs)wasStartedBy(id,a2,e,a1,attrs)wasEndedBy(id,a2,e,a1,attrs)wasAssociatedWith(id,ag,act,pl,attrs)wasAssociatedWith(id,ag,act,−,attrs)wasAttributedTo(id,e,ag,attrs)actedOnBehalfOf(if,ag2,ag1,act,attrs)wasInformedBy(id,a2,a1,attrs)wasDerivedFrom(id,e2,e1,act,g,u,attrs)wasDerivedFrom(id,e2,e1,−,−,−,attrs)wasInfluencedBy(id,x,y,attrs)alternateOf(e1,e2)specializationOf(e1,e2)hadMember(c,e)x=yx≺yx⪯ynotNull(x)typeOf(x,ty)[attr1=v1,…,attrn=vn]entityactivityagentprov:Collectionprov:EmptyCollection
3.  Structures and Interpretations 
3.1  Things 
 
Note
TODO: Containment of things / collections? (for hadMember).
Things are things in the world.  Each thing has a lifetime during which it exists and attributes whose values can change over time.
To model this, a structure W includes:
  -  a set Things of things
-  a function lifetime:Things→Intervals from objects to time intervals
- a function value:Things×Attributes×Times→P(Values)
The range of value is the set P(Values), indicating that value
is essentially a multi-valued that returns a set of values (possibly empty).    When value(x,a,t)=∅, we say that attribute a is undefined for x at time t.
Note that this description does not say what the structure of a
Thing is, only how it may be described in terms of its time interval
and attribute values.  An object could just be a record of fixed
attribute values; it could be a bear; it could be the Royal Society;
it could be a transcendental number like π.  All that matters from
our point of view is that we know how to map the Thing to its time interval and attribute mapping.
It is possible for two Things to be indistinguishable by their attribute values and lifetime, but have different identity.
3.2  Objects 
An Object is described by a time interval and attributes with unchanging values.  Objects encompass entities, interactions, and activities.
To model this, a structure includes:
  -  a set Objects 
-  a function lifetime:Objects→Intervals from objects to time intervals
-  a function value:Objects×Attributes→P(Values)
 
Intuitively, lifetime(e) is the time interval during which object
e exists.  The set value(e,a) is the set of values of attribute a during the object's lifetime.
As with Things, the range of value is sets of values,
making value effectively a multivalued function.  It is also possible to have two different objects that are indistinguishable by their attributes and time intervals.  Objects are not things, and the sets of Objects and Things are disjoint; however, certain objects, namely entities, are linked to things.
3.2.1  Entities 
An entity is a kind of object that describes a time-slice of a thing, during which some of the thing's attributes are fixed. We assume:
  -  a set Entities⊆Objects of entities, disjoint from Activities and Events below.
-  a function thingOf:Entities→Things that associates each Entity with a Thing, such that for each t∈lifetime(obj), and for each attribute a we have value(obj,a)⊆value(thingOf(obj),a,t).
-  lifetime(e)⊆lifetime(t).
 
  
3.2.1.1  Plans 
We identify a specific subset of the entities called
  plans:
  A set Plans⊆Entities of plans.
   
  
    
3.2.2  Actvities 
An activity is an object that encompasses a set of events.  We introduce
  - A set Activities⊆Objects of activities.
-  Activities are disjoint from Entities: Entities∩Activities=∅.
 
  
  
3.2.3  Agents 
An agent is an object that can act, by controlling, starting,
  ending, or participating in activities.  Agents can act on behalf of
  other agents. An agent can be an entity, an activity, or neither; an
  agent cannot be both entity and activity because the sets of
  entities and activities are disjoint.  We introduce:
  A set Agents⊆Objects of agents.
   
3.2.4  Interactions 
We consider a set Interactions⊆Objects which are split into
  Events connecting entities and activities,
  Associations between agents and activities,
  Communications between pairs of activities,
  Delegations between pairs of agents, and Derivations that describe chains of generation and usage steps.  (The first two sets may overlap.)  Interactions are disjoint from entities, activities and agents.
  -  A set Interactions=Events∪Associations∪Communications∪Delegations∪Derivations⊆Objects
-  A function type:Interactions→{start,end,usage,generation,invalidation,derivation,version,quotation,primarySource,attribution,delegation}.
-  The sets Events, Associations, Communications, Delegations
  and Derivations are all disjoint.
-  Interactions are disjoint from entities, agents and
activities:  Interactions∩(Entities∪Activities∪Agents)=∅
 
3.2.4.1  Events 
An Event is an interaction whose lifetime is a single time
instant, and relates an activity to an entity (which could be an
agent).  Events have types including usage, generation, invalidation, starting and ending.  Events are instantaneous.  We introduce:
-  A set Events⊆Interactions of events, such that
  type(evt)∈{start,end,generation,usage,invalidation} if and
  only of evt∈Events.
-  A function time:Events→Times giving the time of each event; i.e. lifetime(evt)={time(t)}.
-  The derived ordering on events given by evt1≤evt2⟺time(evt1)≤time(evt2)
 
3.2.4.2  Associations 
An Association is an interaction relating an agent to an activity.  To model associations, we introduce:
  A set Associations⊆Interactions, such that
  type(assoc)=association if and only if assoc∈Associations.
   
  
Associations are used below in the ActsFor and AssociatedWith relations.
  
  
  
  
  3.2.4.5  Derivations 
A Derivation is an interaction chaining one or more generation and use steps.  
  A set Derivations⊆Interactions, such that
  type(deriv)∈{derivation,version,primarySource,quotation}
  if and only if deriv∈Derivations.
 
See below for the associated derivation path and DerivedFrom relation.
3.3  Relations 
 Simple relations 
The entities, interactions, and activities in a structure are related in the following ways:
-  A relation Used⊆Events×Entities saying when an event used an entity.  An event can use at most one entity, and if (evt,e)∈Used then time(evt)∈lifetime(e) and type(g)=use must hold.
-  A relation Generated⊆Events×Entities saying when an event generated an entity.  An event can generate at most one entity, and if (evt,e)∈Generated then min(lifetime(e))=time(evt) and type(g)=generation must hold.
-  A relation Invalidated⊆Events×Entities saying when an event invalidated an entity.  An event can invalidate at most one entity, and if (evt,e)∈Invalidated then min(lifetime(e))=time(evt) and type(g)=invalidation must hold.
-  A relation EventActivity⊆Events×Activities associating activities with events, such that (act,evt)∈EventActivity implies time(evt)∈lifetime(act).
-  A relation AssociatedWith⊆Association×Agents×Activities×Plans⊥ indicating when an agent is associated with an activity, and giving the identity of the association relationship, and an optional plan. 
-  A relation ActsFor⊆Delegations×Agents×Agents×Activities indicating when one agent acts on behalf of another with respect to a given activity.  
Note
  TODO: Communication, start, end relations
  
Note
  TODO: Specialization relation
  
Note
    TODO: Explicit axioms concerning the relations
  
3.3.1  Derivation paths and DerivedFrom 
Recall that above we introduced a subset of interactions called Derivations.  These identify paths of the form 
entn⋅gn⋅actn⋅un⋅entn−1⋅...⋅ent1⋅g1⋅act1⋅u1⋅ent0
where the enti are entities, acti are activities, gi are generations, and ui are usages.
Formally, we consider the (regular) language:
DerivationPaths=Entities⋅(Events⋅Activities⋅Events⋅Entities)+
with the constraints that for each derivation path:
- for each substring ent⋅g⋅act we have (g,ent)∈Generated  and (g,act)∈EventActivities, and
- for each substring act⋅u⋅ent we have (u,ent)∈Used and (u,act)∈EventActivities.
We also consider a function derivedFrom:Derivations→DerivationPaths  linking each derivation to its path.
3.4  Putting it all together 
A structure W is a structure containing all of the above
described data.  If we need to talk about the objects or relations of
more than one structure then we may write W1.Objects, W1.Things,
etc.; otherwise, to
decrease notational clutter, when we consider a fixed structure then the names of the sets, relations and functions above refer to the components of that model.
3.5  Interpretations 
We need to link identifiers to the objects they denote.  We do this using a function which we shall call an interpretation.
 Thus, we consider interpretations as follows:
An interpretation is a function  ρ:Identifiers→Objects describing
which object is the target of each identifier. The mapping from identifiers to objects may not change over time. 
4.  Semantics 
In what follows, let W be a fixed structure with the associated sets and relations discussed in the previous section, and let I be an interpretation of identifiers as objects in W.
The annotations [WF] refer to well-formedness constraints that correspond to typing constraints.
4.1  Satisfaction 
Consider a formula ϕ, a structure W and an interpretation
 I.
We define notation W,ρ⊨ϕ which means that ϕ is
 satisfied in W,ρ. For atomic formulas, the definition of the
 satisfaction relation is given in the next few subsections.  We give
 the standard definition of the semantics of the other formulas:
  
    
    
4.2  Attribute matching 
We say that an object obj matches attributes [attr1=val1,...] in structure W provided:
for each attribute attri, we have vali∈W.value(obj,attri).
This is sometimes abbreviated as: match(W,obj,attrs).
    
4.4  Semantics of Relations 
4.4.1  Generation 
The generation formula is of the form
wasGeneratedBy(id,e,a,t,attrs) where id is an event identifier,
e is an entity identifier, a is an activity identifier, attrs is
a set of attribute-value pairs, and t is a time.
4.4.2  Use 
The use formula is of the form used(id,a,e,t,attrs) where id denotes an event, a is an activity identifier, e is an object identifier, attrs is a set of attribute-value pairs, and t is an optional time.
4.4.3  Invalidation 
The invalidation formula is of the form wasInvalidatedBy(id,e,a,t,attrs) where id is an event identifier, e is an entity identifier, a is an activity identifier, attrs is a set of attribute-value pairs, and t is an optional time.
4.4.4  Association 
An association formula has the form wasAssociatedWith(id,a,ag,pl,attrs).
4.4.6  End 
An activity end formula wasEndedBy(id,a2,e,a1,attrs) is interpreted as follows:
4.4.7  Attribution 
An attribution formula wasAttributedTo(id,e,ag,attrs) is interpreted as follows:
4.4.9  Responsibility 
The actedOnBehalfOf(id,ag2,ag1,act,attrs) relation is interpreted using the ActsFor relation as follows:
4.4.10  Derivation 
4.4.10.1  Precise 
A precise derivation formula has the form wasDerivedFrom(id,e2,e1,a,g,u,attrs).
4.4.10.2  Imprecise 
An imprecise derivation formula has the form wasDerivedFrom(id,e2,e1,−,−,−,attrs).
  
  4.4.11 Influence
  Note
TODO: Define influence semantics.
4.4.12  Specialization 
The specializationOf(e1,e2) relation indicates when one entity formula presents more specific aspects of another.  
  Note
    TODO: The content of this definition may be moved into the
    structure W via an irreflexive/transitive specialization relation, since by
    itself this definition is not transitive.
     The second criterion says that the two Entities present aspects of
the same Thing. Note that the third criterion allows obj1 and
obj2 to have the same lifetime (or that of obj2 can be larger).
The last criterion allows obj1 to have more defined attributes than
obj2, but they must include the attributes defined by obj2.
4.4.13  Alternate 
The alternateOf relation indicates when two entity formulas present (possibly different) aspects of the same thing.  The two entities may or may not overlap in time.
4.4.14  Membership 
The hadMember relation relates a collection to an element of the collection.
Note
  Additional constraints needed above to refer to (not yet defined)
collection   structure of entities/things.
   5.  Inferences and Constraints 
    
    In this section we restate all of the inferences and constraints
    of PROV-CONSTRAINTS in terms of first-order logic.  For each, we
    give a proof sketch showing why the inference or constraint is
    sound for reasoning about the reference semantics.  We exclude the
    definitional rules in PROV-CONSTRAINTS because they are only
    needed for expanding the abbreviated forms of PROV-N statements to
    the logical formulas used here.
5.1 Inferences
∀id,a2,a1,attrs. wasInformedBy(id,a2,a1,attrs)⇒∃e,gen,t1,use,t2. wasGeneratedBy(gen,e,a1,t1,[])∧used(use,a2,e,t2,[]) 
∀gen,e,a1,t1,attrs1,id2,a2,t2,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)∧used(id2,a2,e,t2,attrs2)⇒∃id. wasInformedBy(id,a2,a1,[]) 
∀e,attrs. entity(e,attrs)⇒∃gen,a1,t1,inv,a2,t2. wasGeneratedBy(gen,e,a1,t1,[])∧wasInvalidatedBy(inv,e,a2,t2,[]) 
∀a,t1,t2,attrs. activity(a,t1,t2,attrs)⇒∃start,e1,a1,end,a2,e2. wasStartedBy(start,a,e1,a1,t1,[])∧wasEndedBy(end,a,e2,a2,t2,[]) 
∀id,a,e1,a1,t,attrs. wasStartedBy(id,a,e1,a1,t,attrs)⇒∃gen,t1. wasGeneratedBy(gen,e1,a1,t1,[]) 
∀id,a,e1,a1,t,attrs. wasEndedBy(id,a,e1,a1,t,attrs)⇒∃gen,t1. wasGeneratedBy(gen,e1,a1,t1,[]) 
In this inference, none of a,gen2, or use1 can be placeholders -.
∀id,e2,e1,a,gen2,use1,attrs. notNull(a)∧notNull(gen2)∧notNull(use1)∧wasDerivedFrom(id,e2,e1,a,gen2,use1,attrs)⇒∃t1,t2. used(use1,a,e1,t1,[])∧wasGeneratedBy(gen2,e2,a,t2,[])In this inference, any of a,gen2, or use1 can be placeholders -.
∀id,e1,e2,a,g,u. wasDerivedFrom(id,e2,e1,a,g,u,[prov:type=prov:Revision]))⇒alternateOf(e2,e1)∀att,e,ag,attrs. wasAttributedTo(att,e,ag,attrs)⇒∃a,t,gen,assoc,pl. wasGeneratedBy(gen,e,a,t,[])∧wasAssociatedWith(assoc,a,ag,pl,[]) 
∀id,ag1,ag2,a,attrs. actedOnBehalfOf(id,ag1,ag2,a,attrs)⇒∃id1,pl1,id2,pl2. wasAssociatedWith(id1,a,ag1,pl1,[])∧wasAssociatedWith(id2,a,ag2,pl2,[]) 
- ∀id,e,a,t,attrs. wasGeneratedBy(id,e,a,t,attrs)⇒wasInfluencedBy(id,e,a,attrs)
- ∀id,a,e,t,attrs. used(id,a,e,t,attrs)⇒wasInfluencedBy(id,a,e,attrs)
- ∀id,a2,a1,attrs. wasInformedBy(id,a2,a1,attrs)⇒wasInfluencedBy(id,a2,a1,attrs)
- ∀id,a2,e,a1,t,attrs. wasStartedBy(id,a2,e,a1,t,attrs)⇒wasInfluencedBy(id,a2,e,attrs)
- ∀id,a2,e,a1,t,attrs. wasEndedBy(id,a2,e,a1,t,attrs)⇒wasInfluencedBy(id,a2,e,attrs)
- ∀id,e,a,t,attrs. wasInvalidatedBy(id,e,a,t,attrs)⇒wasInfluencedBy(id,e,a,attrs)
- ∀id,e2,e1,a,g,u,attrs. wasDerivedFrom(id,e2,e1,a,g,u,attrs)⇒wasInfluencedBy(id,e2,e1,attrs)
- In this rule, a,g, or u may be placeholders -. ∀id,e,ag,attrs. wasAttributedTo(id,e,ag,attrs)⇒wasInfluencedBy(id,e,ag,attrs)
- In this rule, pl may be a placeholder -. ∀id,a,ag,pl,attrs. wasAssociatedWith(id,a,ag,pl,attrs)⇒wasInfluencedBy(id,a,ag,attrs)
- ∀id,ag2,ag1,a,attrs. actedOnBehalfOf(id,ag2,ag1,a,attrs)⇒wasInfluencedBy(id,ag2,ag1,attrs)
∀e. entity(e)⇒alternateOf(e,e) 
   Suppose ent=rho(e).  Clearly e∈Entities and
  thingOf(e)=thingOf(e), so W,ρ⊨alternateOf(e,e).
   
∀e1,e2,e3. alternateOf(e1,e2)∧alternateOf(e2,e3)⇒alternateOf(e1,e3) 
   Suppose ent1=rho(e1) and ent2=ρ(e2) and ent3=ρ(e3).  Then  by
  assumption ent1, ent2, and ent3 are in Entities and
  thingOf(e1)=thingOf(e2) and thingOf(e2)=thingOf(e3), so
  thingOf(e1)=thingOf(e3), as required to conclude W,ρ⊨alternateOf(e2,e1).
   
∀e1,e2. alternateOf(e1,e2)⇒alternateOf(e2,e1) 
   Suppose ent1=rho(e1) and ent2=ρ(e2).  Then  by
  assumption both ent1 and ent2 are in Entities and
  thingOf(e1)=thingOf(e2), as required to conclude W,ρ⊨alternateOf(e2,e1).
   
  
∀e1,e2,e3. specializationOf(e1,e2)∧specializationOf(e2,e3)⇒specializationOf(e1,e3) 
   Suppose the conditions for specialization hold of ent1 and
  ent2 and for ent2 and ent3, where ent1=ρ(e1) and ent2=ρ(e2) and ent3=ρ(e3). Then lifetime(e1)⊆lifetime(e2)⊆lifetime(e3).  Moreover, 
  value(obj2,attr)⊇value(obj3,attr), and similarly
  value(obj1,attr)⊇value(obj2,attr) so value(obj1,attr)⊇value(obj3,attr).  (TODO:
  How do we know e3≠e1?  Need strict ordering on entities in semantics.)
 
∀e1,e2. specializationOf(e1,e2)⇒alternateOf(e1,e2) 
   If ent1=ρ(e1) and ent2=ρ(e2) are
  specializations, then thingOf(ent1)=thingOf(ent2).
 
∀e1,attrs,e2. entity(e1,attrs)∧specializationOf(e2,e1)⇒entity(e2,attrs) 
   Suppose ent1=ρ(e1) and ent2=ρ(e2).  Suppose
  (att,v) is an attribute-value pair in attrs.  Since
  entity(e1,attrs) holds, we know that v∈value(ent1,att).
  Thus v∈value(ent2,att) since value(ent2,att)⊇value(ent1,att).  Since
  this is the case for all attribute-value pairs in attrs, and since
  e2 obviously denotes an entity, we can conclude W,ρ⊨entity(e,attrs).
  
 
5.2 Constraints
5.2.1 Uniqueness constraints
  - The
  identifier field id is a KEY for
  the entity(id,attrs) statement.
- The identifier field id
  is a KEY for the
  activity(id,t1,t2,attrs) statement.
- The identifier field id is a KEY for the agent(id,attrs) statement.
 
  - The identifier field id is a KEY for the
    wasGeneratedBy(id,e,a,t,attrs) statement.
- The identifier field id is a KEY for the used(id,a,e,t,attrs)
    statement.
- The identifier field id is a KEY for the
    wasInformedBy(id,a2,a1,attrs) statement.
- The identifier field id is a KEY for the
    wasStartedBy(id,a2,e,a1,t,attrs) statement.
- The identifier field id is a KEY for the
    wasEndedBy(id,a2,e,a1,t,attrs) statement.
- The identifier field id is a KEY for the
    wasInvalidatedBy(id,e,a,t,attrs) statement.
- The identifier field id is a KEY for the
    wasDerivedFrom(id,e2,e1,a,g2,u1,attrs) statement.
- The identifier field id is a KEY for the
    wasAttributedTo(id,e,ag,attrs) statement.
- The identifier field id is a KEY for the
    wasAssociatedWith(id,a,ag,pl,attrs) statement.
- The identifier field id is a KEY for the
    actedOnBehalfOf(id,ag2,ag1,a,attrs) statement.
- The identifier field id is a KEY for the
    wasInfluencedBy(id,o2,o1,attrs) statement.
 
∀gen1,gen2,e,a,t1,t2,attrs1,attrs2. wasGeneratedBy(gen1,e,a,t1,attrs1)∧wasGeneratedBy(gen2,e,a,t2,attrs2)⇒gen1=gen2 
∀inv1,inv2,e,a,t1,t2,attrs1,attrs2. wasInvalidatedBy(inv1,e,a,t1,attrs1)∧wasInvalidatedBy(inv2,e,a,t2,attrs2)⇒inv1=inv2 
∀start1,start2,a,e1,e2,a0,t1,t2,attrs1,attrs2. wasStartedBy(start1,a,e1,a0,t1,attrs1)∧wasStartedBy(start2,a,e2,a0,t2,attrs2)⇒start1=start2 
∀end1,end2,a,e1,e2,a0,t1,t2,attrs1,attrs2. wasEndedBy(end1,a,e1,a0,t1,attrs1)∧wasEndedBy(end2,a,e2,a0,t2,attrs2)⇒end1=end2 
∀start,a1,a2,t,t1,t2,e,attrs,attrs1. activity(a2,t1,t2,attrs)∧wasStartedBy(start,a2,e,a1,t,attrs1)⇒t1=t 
∀end,a1,a2,t,t1,t2,e,attrs,attrs1. activity(a2,t1,t2,attrs)∧wasEndedBy(end,a2,e,a1,t,attrs1)⇒t2=t 
5.2.2 Ordering constraints
∀start,end,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)∧wasEndedBy(end,a,e2,a2,t2,attrs2)⇒start⪯end 
∀start1,start2,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start1,a,e1,a1,t1,attrs1)∧wasStartedBy(start2,a,e2,a2,t2,attrs2)⇒start1⪯start2 
∀end1,end2,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasEndedBy(end1,a,e1,a1,t1,attrs1)∧wasEndedBy(end2,a,e2,a2,t2,attrs2)⇒end1⪯end2 
- ∀start,use,a,e1,e2,a1,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)∧used(use,a,e2,t2,attrs2)⇒start⪯use
- ∀use,end,a,e1,e2,a2,t1,t2,attrs1,attrs2. used(use,a,e1,t1,attrs1)∧wasEndedBy(end,a,e2,a2,t2,attrs2)⇒use⪯end
- ∀start,gen,e1,e2,a,a1,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)∧wasGeneratedBy(gen,e2,a,t2,attrs2)⇒start⪯gen
- ∀gen,end,e,e1,a,a1,t,t1,attrs,attrs1. wasGeneratedBy(gen,e,a,t,attrs)∧wasEndedBy(end,a,e1,a1,t1,attrs1)⇒gen⪯end
∀gen,inv,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)∧wasInvalidatedBy(inv,e,a2,t2,attrs2)⇒gen⪯inv 
∀gen,use,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)∧used(use,a2,e,t2,attrs2)⇒gen⪯use 
∀use,inv,a1,a2,e,t1,t2,attrs1,attrs2. used(use,a1,e,t1,attrs1)∧wasInvalidatedBy(inv,e,a2,t2,attrs2)⇒use⪯inv 
∀gen1,gen2,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen1,e,a1,t1,attrs1)∧wasGeneratedBy(gen2,e,a2,t2,attrs2)⇒gen1⪯gen2 
∀inv1,inv2,e,a1,a2,t1,t2,attrs1,attrs2. wasInvalidatedBy(inv1,e,a1,t1,attrs1)∧wasInvalidatedBy(inv2,e,a2,t2,attrs2)⇒inv1⪯inv2 
In this constraint, a,gen2, or use1 must not be placeholders -.
∀d,e1,e2,a,gen2,use1,attrs. notNull(a)∧notNull(gen2)∧notNull(use1)∧wasDerivedFrom(d,e2,e1,a,gen2,use1,attrs)⇒use1⪯gen2In this constraint, any of a,g, or u may be placeholders -.
∀d,gen1,gen2,e1,e2,a,a1,a2,g,u,t1,t2,attrs,attrs1,attrs2. wasDerivedFrom(d,e2,e1,a,g,u,attrs)∧wasGeneratedBy(gen1,e1,a1,t1,attrs1)∧wasGeneratedBy(gen2,e2,a2,t2,attrs2)⇒gen1≺gen2- ∀gen,start,e,a,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)∧wasStartedBy(start,a,e,a2,t2,attrs2)⇒gen⪯start
- ∀start,inv,e,a,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e,a1,t1,attrs1)∧wasInvalidatedBy(inv,e,a2,t2,attrs2)⇒start⪯inv
- ∀gen,end,e,a,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)∧wasEndedBy(end,a,e,a2,t2,attrs2)⇒gen⪯end
- ∀end,inv,e,a,a1,a2,t1,t2,attrs1,attrs2. wasEndedBy(end,a,e,a1,t1,attrs1)∧wasInvalidatedBy(inv,e,a2,t2,attrs2)⇒end⪯inv
∀gen1,gen2,e1,e2,a1,a2,t1,t2,attrs1,attrs2. specializationOf(e2,e1)∧wasGeneratedBy(gen1,e1,a1,t1,attrs1)∧wasGeneratedBy(gen2,e2,a2,t2,attrs2)⇒gen1⪯gen2 
∀inv1,inv2,e1,e2,a1,a2,t1,t2,attrs1,attrs2. specializationOf(e1,e2)∧wasInvalidatedBy(inv1,e1,a1,t1,attrs1)∧wasInvalidatedBy(inv2,e2,a2,t2,attrs2)⇒inv1⪯inv2 
In the following inferences, pl may be a placeholder -.
- ∀assoc,start1,inv2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)∧wasStartedBy(start1,a,e1,a1,t1,attrs1)∧wasInvalidatedBy(inv2,ag,a2,t2,attrs2)⇒start1⪯inv2
- ∀assoc,gen1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)∧wasGeneratedBy(gen1,ag,a1,t1,attrs1)∧wasEndedBy(end2,a,e2,a2,t2,attrs2)⇒gen1⪯end2
- ∀assoc,start1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)∧wasStartedBy(start1,a,e1,a1,t1,attrs1)∧wasEndedBy(end2,ag,e2,a2,t2,attrs2)⇒start1⪯end2
- ∀assoc,start1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)∧wasStartedBy(start1,ag,e1,a1,t1,attrs1)∧wasEndedBy(end2,a,e2,a2,t2,attrs2)⇒start1⪯end2
- ∀att,gen1,gen2,e,a1,a2,t1,t2,ag,attrs,attrs1,attrs2. wasAttributedTo(att,e,ag,attrs)∧wasGeneratedBy(gen1,ag,a1,t1,attrs1)∧wasGeneratedBy(gen2,e,a2,t2,attrs2)⇒gen1⪯gen2
- ∀att,start1,gen2,e,e1,a1,a2,ag,t1,t2,attrs,attrs1,attrs2. wasAttributedTo(att,e,ag,attrs)∧wasStartedBy(start1,ag,e1,a1,t1,attrs1)∧wasGeneratedBy(gen2,e,a2,t2,attrs2)⇒start1⪯gen2
- ∀del,gen1,inv2,ag1,ag2,a,a1,a2,t1,t2,attrs,attrs1,attrs2. actedOnBehalfOf(del,ag2,ag1,a,attrs)∧wasGeneratedBy(gen1,ag1,a1,t1,attrs1)∧wasInvalidatedBy(inv2,ag2,a2,t2,attrs2)⇒gen1⪯inv2
- ∀del,start1,end2,ag1,ag2,a,a1,a2,e1,e2,t1,t2,attrs,attrs1,attrs2. actedOnBehalfOf(del,ag2,ag1,a,attrs)∧wasStartedBy(start1,ag1,e1,a1,t1,attrs1)∧wasEndedBy(end2,ag2,e2,a2,t2,attrs2)⇒start1⪯end2
5.2.3 Typing constraints
- ∀e,attrs. entity(e,attrs)⇒typeOf(e,entity)
- ∀ag,attrs. agent(ag,attrs)⇒typeOf(ag,agent)
- ∀a,t1,t2,attrs. activity(a,t1,t2,attrs)⇒typeOf(a,activity)
- ∀u,a,e,t,attrs. used(u,a,e,t,attrs)⇒typeOf(a,activity)∧typeOf(e,entity)
- ∀g,a,e,t,attrs. wasGeneratedBy(g,e,a,t,attrs)⇒typeOf(a,activity)∧typeOf(e,entity)
- ∀inf,a2,a1,t,attrs. wasInformedBy(inf,a2,a1,t,attrs)⇒typeOf(a1,activity)∧typeOf(a2,activity)
- ∀start,a2,e,a1,t,attrs. wasStartedBy(start,a2,e,a1,t,attrs)⇒typeOf(a1,activity)∧typeOf(a2,activity)∧typeOf(e,entity)
- ∀end,a2,e,a1,t,attrs. wasEndedBy(end,a2,e,a1,t,attrs)⇒typeOf(a1,activity)∧typeOf(a2,activity)∧typeOf(e,entity)
- ∀inv,a,e,t,attrs. wasInvalidatedBy(inv,e,a,t,attrs)⇒typeOf(a,activity)∧typeOf(e,entity)
- ∀id,e2,e1,a,g2,u1,attrs. notNull(a)∧notNull(g2)∧notNull(u1)∧wasDerivedFrom(id,e2,e1,a,g2,u1,attrs)⇒typeOf(e2,entity)∧typeOf(e1,activity)∧typeOf(a,activity)
- ∀id,e2,e1,attrs. wasDerivedFrom(id,e2,e1,−,−,−,attrs)⇒typeOf(e2,entity)∧typeOf(e1,activity)
- ∀id,e,ag,attrs. wasAttributedTo(id,e,ag,attrs)⇒typeOf(e,entity)∧typeOf(ag,agent)
- ∀id,a,ag,pl,attrs. notNull(pl)∧wasAssociatedWith(id,a,ag,pl,attrs)⇒typeOf(a,activity)∧typeOf(ag,agent)∧typeOf(pl,entity)
- ∀id,a,ag,attrs. wasAssociatedWith(id,a,ag,−,attrs)⇒typeOf(a,activity)∧typeOf(ag,agent)
- ∀id,ag2,ag1,a,attrs. actedOnBehalfOf(id,ag2,ag1,a,attrs)⇒typeOf(ag2,agent)∧typeOf(ag1,agent)∧typeOf(a,activity)
- ∀e2,e1. alternateOf(e2,e1)⇒typeOf(e2,entity)∧typeOf(e1,entity)
- ∀e2,e1. specializationOf(e2,e1)⇒typeOf(e2,entity)∧typeOf(e1,entity)
- ∀c,e. hadMember(c,e)⇒typeOf(c,prov:Collection)∧typeOf(e,entity)
- ∀c. entity(c,[prov:type=prov:emptyCollection]))⇒typeOf(c,entity)∧typeOf(c,prov:Collection)∧typeOf(c,prov:EmptyCollection)
  Each typing constraint follows immediately from well-formedness
  criteria marked [WF] in the corresponding semantics for formulas.
  
 
   
5.2.4 Impossibility constraints
- ∀id,e1,e2,g,attrs. notNull(g)∧wasDerivedFrom(id,e2,e1,−,g,−,attrs)⇒False
- ∀id,e1,e2,u,attrs. notNull(u)∧wasDerivedFrom(id,e2,e1,−,−,u,attrs)⇒False
- ∀id,e1,e2,g,u,attrs. notNull(g)∧notNull(u)∧wasDerivedFrom(id,e2,e1,−,g,u,attrs)⇒False
  Each part follows from the fact that the semantics of
  wasDerivedFrom only allows formulas to hold when either all three
  of a,g,u are − (denoting ⊥) or none of them are.
   
∀e. specializationOf(e,e)⇒False 
  This follows from the fact that in the semantics of
  specializationOf, the two entities denoted by the first and second
  arguments are required to be distinct.
  
   
For each r  and  s∈{used,wasGeneratedBy,wasInvalidatedBy,wasStartedBy,wasEndedBy,wasInformedBy,wasAttributedTo,wasAssociatedWith,actedOnBehalfOf} such that r  and  s are different relation names, the following constraint holds:
∀id,a1,…,am,b1,…,bn. r(id,a1,…,am)∧s(id,b1,…,bn)⇒False
  This follows from the assumption that the different classes of
  interactions are disjoint sets, characterized by their types.
  
   
For each p∈{entity,activity,agent}  and each r∈{used,wasGeneratedBy,wasInvalidatedBy,wasStartedBy,wasEndedBy,wasInformedBy,wasAttributedTo,wasAssociatedWith,actedOnBehalfOf}, the following constraint holds:
∀id,a1,…,am,b1,…,bn. p(id,a1,…,am)∧r(id,b1,…,bn)⇒False
  This follows from the assumption that interactions are distinct
  from other objects (entities, activities or agents).
  
   
∀id. typeOf(id,entity)∧typeOf(id,activity)⇒False 
  This follows from the assumption that entities and activities are disjoint.
  
   
∀c,e. hasMember(c,e)∧typeOf(c,prov:EmptyCollection)⇒False 
6. Soundness
Above we have presented arguments for the soundness of the
constraints and inferences with respect to the reference semantics.
This implies the following:
  
  - If I is an instance and W⊨I and I′ is obtained from I by applying one
  of the PROV inferences, then W⊨I′.
- If I is an instance and W⊨I then I has a normal
  form I′ and   W⊨I′.
- If I is a normal form and W⊨I then I satisfies all of the ordering,
  typing and impossibility constraints.
  
- If W⊨I then I is valid.
 
  For part 1, the arguments are as in the previous section.  
  For
  part 2, proceed by induction on a terminating sequence of inference
  or uniqueness constraint steps: if I is in normal form them we are
  done. If I is not in normal form then if an inference is  applicable, then use part 1; if a uniqueness constraint is
  applicable, then from W⊨I the uniqueness constraint cannot
  fail on I and W⊨I′.
  For part 3, the arguments are as
  in the previous section for each constraint. 
  Finally, for part 4,
  suppose W⊨I.  Then W⊨I′ where I′ is the normal
  form of I by part 2.  By part 3, I′ satisfies all of the
  remaining constraints, so I is valid.
   
  
 
      A. Acknowledgements
 
       
This  document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole.
      
Thanks also to Robin Berjon for the ReSPec.js specification writing
tool and to MathJax for their LaTeX-to-HTML conversion tools, both of
      which aided in the preparation of this document.
Members of the PROV Working Group at the time of publication of this document were:
Ilkay Altintas (Invited expert),
Reza B'Far (Oracle Corporation),
Khalid Belhajjame (University of Manchester),
James Cheney (University of Edinburgh, School of Informatics),
Sam Coppens (IBBT),
David Corsar (University of Aberdeen, Computing Science),
Stephen Cresswell (The National Archives),
Tom De Nies (IBBT),
Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland),
Simon Dobson (Invited expert),
Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)),
Kai Eckert (Invited expert),
Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER),
James Frew (Invited expert),
Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)),
Daniel Garijo (Universidad Politécnica de Madrid),
Yolanda Gil (Invited expert),
Ryan Golden (Oracle Corporation),
Paul Groth (Vrije Universiteit),
Olaf Hartig (Invited expert),
David Hau (National Cancer Institute, NCI),
Sandro Hawke (W3C/MIT),
Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh),
Ivan Herman, (W3C/ERCIM),
Ralph Hodgson (TopQuadrant),
Hook Hua (Invited expert),
Trung Dong Huynh (University of Southampton),
Graham Klyne (University of Oxford),
Michael Lang (Revelytix, Inc.),
Timothy Lebo (Rensselaer Polytechnic Institute),
James McCusker (Rensselaer Polytechnic Institute),
Deborah McGuinness (Rensselaer Polytechnic Institute),
Simon Miles (Invited expert),
Paolo Missier (School of Computing Science, Newcastle university),
Luc Moreau (University of Southampton),
James Myers (Rensselaer Polytechnic Institute),
Vinh Nguyen (Wright State University),
Edoardo Pignotti (University of Aberdeen, Computing Science),
Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute),
Carl Reed (Open Geospatial Consortium),
Adam Retter (Invited Expert),
Christine Runnegar (Invited expert),
Satya Sahoo (Invited expert),
David Schaengold (Revelytix, Inc.),
Daniel Schutzer (FSTC, Financial Services Technology Consortium),
Yogesh Simmhan (Invited expert),
Stian Soiland-Reyes (University of Manchester),
Eric Stephan (Pacific Northwest National Laboratory),
Linda Stewart (The National Archives),
Ed Summers (Library of Congress),
Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)),
Ted Thibodeau (OpenLink Software Inc.),
Curt Tilmes (National Aeronautics and Space Administration),
Craig Trim (IBM Corporation),
Stephan Zednik (Rensselaer Polytechnic Institute),
Jun Zhao (University of Oxford),
Yuting Zhao (University of Aberdeen, Computing Science).
     
 
ˆ