Constraints of the Provenance Data Model

Abstract

PROV-DM, the PROVProvenance is information about entities, activities, and people involved in producing a piece of data model,or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a data model basis for the W3C provenance that describes the entities, people and activities involved in producing a piece(PROV) family of data or thing. PROV-DM is structured in six components, dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to a same thing; (5) collections forming a logical structure for its members; (6) a simple annotation mechanism. specifications.

This document introducesdefines a further setsubset of concepts usefulPROV instances called valid PROV instances. The intent of validation is ensure that a PROV instance represents a history of objects and their interactions which is consistent, and thus safe to use for understanding the the purpose of logical reasoning and other kinds of analysis. Valid PROV data model and defines instances satisfy certain definitions, inferences , and constraints. These definitions, inferences, and constraints provide a measure of consistency checking for provenance and reasoning over provenance. They can also be used to normalize PROV instances to forms that can easily be compared in order to determine whether two PROV instances are allowed on provenance statements and validity constraints that equivalent. Validity and equivalence are also defined for PROV instances should follow. These inferences and constraints are useful for readers who develop applications that generate provenance bundles (that is, named instances) and documents (that is, a toplevel instance together with zero or reason over provenance. more bundles).

2. Rationale

This section is non-normative.

This section gives a high-level rationale that provides some further background for the constraints, but does not affect the technical content of the rest of the specification.

2.1 Entities, Activities and Agents

One of the central challenges in representing provenance information is how to deal with change. Real-world objects, information objects and Web resources change over time, and the characteristics that make them identifiable in a given situation are sometimes subject to change as well. PROV allows for things to be described in different ways, with different descriptions of their state.

An entity is a thing one wants to provide provenance for and whose situation in the world is described by some fixed attributes. An entity has a lifetime, defined as the period between its generation event and its invalidation event. An entity's attributes are established when the entity is created and (partially) describe the entity's situation and state during the entirety of the entity's lifetime.

A different entity (perhaps representing a different user or system perspective) may fix other aspects of the same thing, and its provenance may be different. Different entities that fix aspects of the same thing are called alternates, and the PROV relations of specializationOf and alternateOf can be used to link such entities.

Besides entities, a variety of other PROV objects have attributes, including activity, generation, usage, invalidation, start, end, communication, attribution, association, delegation, and derivation. Each object has an associated duration interval (which may be a single time point), and attribute-value pairs for a given object are expected to be descriptions that hold for the object's duration.

However, the attributes of entities have special meaning because they are considered to be fixed aspects of underlying, changing things. This motivates constraints on alternateOf and specializationOf relating the attribute values of different entities.

In order to describe the provenance of something during an interval over which relevant attributes of the thing are not fixed, a PROV instance would describe multiple entities, each with its own identifier, lifetime, and fixed attributes, and express dependencies between the various entities using events. For example, in order to describe the provenance of several versions of a document, involving attributes such as authorship that change over time, one can use different entities for the versions linked by appropriate generation, usage, revision, and invalidation events.

There is no assumption that the set of attributes listed in an entity statement is complete, nor that the attributes are independent or orthogonal of each other. Similarly, there is no assumption that the attributes of an entity uniquely identify it. Two different entities that present the same aspects of possibly different things can have the same attributes; this leads to potential ambiguity, which is mitigated through the use of identifiers.

An activity's lifetime is delimited by its start and its end events. It occurs over an interval delimited by two instantaneous events. However, an activity statement need not mention start or end time information, because they may not be known. An activity's attribute-value pairs are expected to describe the activity's situation during its lifetime.

An activity is not an entity. Indeed, an entity exists in full at any point in its lifetime, persists during this interval, and preserves the characteristics provided. In contrast, an activity is something that occurs, happens, unfolds, or develops through time. This distinction is similar to the distinction between 'continuant' and 'occurrent' in logic [Logic].

2.2 Events

Although time is important for provenance, provenance can be used in many different contexts within individual systems and across the Web. Different systems may use different clocks which may not be precisely synchronized, so when provenance statements are combined by different systems, an application may not be able to align the times involved to a single global timeline. Hence, PROV is designed to minimize assumptions about time. Instead, PROV talks about (identified) events.

The PROV data model is implicitly based on a notion of instantaneous events (or just events), that mark transitions in the world. Events include generation, usage, or invalidation of entities, as well as start or end of activities. This notion of event is not first-class in the data model, but it is useful for explaining its other concepts and its semantics [PROV-SEM]. Thus, events help justify inferences on provenance as well as validity constraints indicating when provenance is self-consistent.

Five kinds of instantaneous events are used in PROV. The activity start and activity end events delimit the beginning and the end of activities, respectively. The entity generation, entity usage, and entity invalidation events apply to entities, and the generation and invalidation events delimit the lifetime of an entity. More precisely:

An activity start event is the instantaneous event that marks the instant an activity starts.

An activity end event is the instantaneous event that marks the instant an activity ends.

An entity generation event is the instantaneous event that marks the final instant of an entity's creation timespan, after which it is available for use. The entity did not exist before this event.

An entity usage event is the instantaneous event that marks the first instant of an entity's consumption timespan by an activity. The described usage had not started before this instant, although the activity could potentially have used the same entity at a different time.

An entity invalidation event is the instantaneous event that marks the initial instant of the destruction, invalidation, or cessation of an entity, after which the entity is no longer available for use. The entity no longer exists after this event.

2.3 Types

As set out in other specifications, the identifiers used in PROV documents have associated type information. An identifier can have more than one type, reflecting subtyping or allowed overlap between types, and so we define a set of types of each identifier, typeOf(id). Some types are, however, required not to overlap (for example, no identifier can describe both an entity and an activity). In addition, an identifier cannot be used to identify both an object (that is, an entity, activity or agent) and a property (that is, a named event such as usage, generation, or a relationship such as attribution.) This specification includes disjointness and typing constraints that check these requirements. Here, we summarize the type constraints in Table 1.

Table 1: Summary of Typing Constraints
In relation...	identifier	has type(s)...

entity(e,attrs)	e	'entity'
activity(a,t1,t2,attrs)	a	'activity'
agent(ag,attrs)	ag	'agent'
used(id; a,e,t,attrs)	e	'entity'
used(id; a,e,t,attrs)	a	'activity'
wasGeneratedBy(id; e,a,t,attrs)	e	'entity'
wasGeneratedBy(id; e,a,t,attrs)	a	'activity'
wasInformedBy(id; a2,a1,attrs)	a2	'activity'
wasInformedBy(id; a2,a1,attrs)	a1	'activity'
wasStartedBy(id; a2,e,a1,t,attrs)	a2	'activity'
	e	'entity'
	a1	'activity'
wasEndedBy(id; a2,e,a1,t,attrs)	a2	'activity'
	e	'entity'
	a1	'activity'
wasInvalidatedBy(id; e,a,t,attrs)	e	'entity'
wasInvalidatedBy(id; e,a,t,attrs)	a	'activity'
wasDerivedFrom(id; e2,e1,a,g,u,attrs)	e2	'entity'
	e1	'entity'
	a	'activity'
wasAttributedTo(id; e,ag,attr)	e	'entity'
wasAttributedTo(id; e,ag,attr)	ag	'agent'
wasAssociatedWith(id; a,ag,pl,attrs)	a	'activity'
	ag	'agent'
	pl	'entity'
actedOnBehalfOf(id; ag2,ag1,a,attrs)	ag2	'agent'
	ag1	'agent'
	a	'activity'
alternateOf(e1,e2)	e1	'entity'
alternateOf(e1,e2)	e2	'entity'
specializationOf(e1,e2)	e1	'entity'
specializationOf(e1,e2)	e2	'entity'
mentionOf(e1,e2,b)	e1	'entity'
	e2	'entity'
	b	'entity'
hadMember(c,e)	c	'entity' 'prov:Collection'
hadMember(c,e)	e	'entity'
entity(c,[prov:type='prov:EmptyCollection,...])	c	'entity' 'prov:Collection' 'prov:EmptyCollection'

2.4 Validation Process Overview

This section collects common concepts and operations that are used throughout the specification, and relates them to background terminology and ideas from logic [Logic], constraint programming [CHR], and database constraints [DBCONSTRAINTS]. This section does not attempt to provide a complete introduction to these topics, but it is provided in order to aid readers familiar with one or more of these topics in understanding the specification, and to clarify some of the motivations for choices in the specification to all readers.

Constants, Variables and Placeholders

PROV statements involve identifiers, literals, placeholders, and attribute lists. Identifiers are, according to PROV-N, expressed as qualified names which can be mapped to URIs [IRI]. However, in order to specify constraints over PROV instances, we also need variables that represent unknown identifiers, literals, or placeholders. These variables are similar to those in first-order logic [Logic]. A variable is a symbol that can be replaced by other symbols, including either other variables or constant identifiers, literals, or placeholders. In a few special cases, we also use variables for unknown attribute lists. To help distinguish identifiers and variables, we also term the former 'constant identifiers' to highlight their non-variable nature.

Several definitions and inferences conclude by saying that some objects exist such that some other formulas hold. Such an inference introduces fresh existential variables into the instance. An existential variable denotes a fixed object that exists, but its exact identity is unknown. Existential variables can stand for unknown identifiers or literal values only; we do not allow existential variables that stand for unknown attribute lists.

In particular, many occurrences of the placeholder symbol - stand for unknown objects; these are handled by expanding them to existential variables. Some placeholders, however, indicate the absence of an object, rather than an unknown object. In other words, the placeholder is overloaded, with different meanings in different places.

An expression is called a term if it is either a constant identifier, literal, placeholder, or variable. We write t to denote an arbitrary term.

Substitution

A substitution is a function that maps variables to terms. Concretely, since we only need to consider substitutions of finite sets of variables, we can write substitutions as [x₁ = t₁,...,x_n=t_n]. A substitution S = [x₁ = t₁,...,x_n=t_n] can be applied to a term as follows.

If the term is a variable x_i, one of the variables in the domain of S, then S(x_i) = t_i.
If the term is a constant identifier or literal c, then S(c) = c.

In addition, a substitution can be applied to an atomic formula (PROV statement) p(t₁,...,t_n) by applying it to each term, that is, S(p(t₁,...,t_n)) = p(S(t₁),...,S(t_n)). Likewise, a substitution S can be applied to an instance I by applying it to each atomic formula (PROV statement) in I, that is, S(I) = {S(A) | A ∈ I}.

Formulas

For the purpose of constraint checking, we view PROV statements (possibly involving existential variables) as formulas. An instance is analogous to a "theory" in logic, that is, a set of formulas all thought to describe the same situation. The set can also be thought of a single, large formula: the conjunction of all of the atomic formulas.

The atomic constraints considered in this specification can be viewed as atomic formulas:

Uniqueness constraints employ atomic equational formulas t = t'.
Ordering constraints employ atomic precedence relations that can be thought of as binary formulas precedes(t,t') or strictly_precedes(t,t')
Typing constraints 'type' ∈ typeOf(id) can be represented as a atomic formulas typeOf(id,'type').
Impossibility constraints employ the conclusion INVALID, which is equivalent to the logical constant False.

Similarly, the definitions, inferences, and constraint rules in this specification can also be viewed as logical formulas, built up out of atomic formulas, logical connectives "and" (∧), "implies" (⇒), and quantifiers "for all" (∀) and "there exists" (∃). For more background on logical formulas, see a logic textbook such as [Logic].

A definition of the form A IF AND ONLY IF there exists y₁...y_m such that B₁ and ... and B_k can be thought of as a formula ∀ x₁,....,x_n. A ⇔ ∃ y₁...y_m . B₁ ∧ ... ∧ B_k, where x₁...x_n are the free variables of the definition.
An inference of the form IF A₁ and ... and A_l THEN there exists y₁...y_m such that B₁ and ... and B_k can be thought of as a formula ∀ x₁,....,x_n. A₁ ∧ ... ∧ A_l ⇒ ∃ y₁...y_m . B₁ ∧ ... ∧ B_k, where x₁...x_n are the free variables of the inference.
A uniqueness, ordering, or typing constraint of the form IF A₁ ∧ ... ∧ A_l THEN C can be viewed as a formula ∀ x₁...x_n. A₁ ∧ ... ∧ A_l ⇒ C.
A constraint of the form IF A₁ ∧ ... ∧ A_l THEN INVALID can be viewed as a formula ∀ x₁...x_n. A₁ ∧ ... ∧ A_l ⇒ False.

Satisfying definitions, inferences, and constraints

In logic, a formula's meaning is defined by saying when it is satisfied. We can view definitions, inferences, and constraints as being satisfied or not satisfied in a PROV instance, augmented with information about the constraints.

A logical equivalence as used in a definition is satisfied when the formula ∀ x₁,....,x_n. A ⇔ ∃ y₁...y_m . B₁ ∧ ... ∧ B_k holds, that is, for any substitution of the variables x₁,....,x_n, formula A and formula ∃ y₁...y_m. B₁ ∧ ... ∧ B_k are either both true or both false.
A logical implication as used in an inference is satisfied with the formula ∀ x₁,....,x_n. A₁ ∧ ... ∧ A_l ⇒ ∃ y₁...y_m . B₁ ∧ ... ∧ B_k holds, that is, for any substitution of the variables x₁,....,x_n, if A₁ ∧ ... ∧ A_l is true, then for some further substitution of terms for variables y₁...y_m, formula B₁ ∧ ... ∧ B_k is also true.
A uniqueness, ordering, or typing constraint is satisfied when its associated formula ∀ x₁...x_n. A₁ ∧ ... ∧ A_l ⇒ C holds, that is, for any substitution of the variables x₁,....,x_n, if A₁ ∧ ... ∧ A_l is true, then C is also true.
An impossibility constraint is satisfied when the formula ∀ x₁...x_n. A₁ ∧ ... ∧ A_l ⇒ False holds. This is logically equivalent to ∄ x₁...x_n. A₁ ∧ ... ∧ A_l, that is, there exists no substitution for x₁...x_n making A₁ ∧ ... ∧ A_l true.

Merging

Merging is an operation that takes two terms and compares them to see if they are equal, or can be made equal by substituting an existential variable with another term. This operation is a special case of unification, a common operation in logical reasoning, including logic programming and automated reasoning. Merging two terms t,t' results in either substitution S such that S(t) = S(t'), or failure indicating that there is no substitution that can be applied to both t and t' to make them equal.

Applying definitions, inferences, and constraints

Formulas can also be interpreted as having computational content. That is, if an instance does not satisfy a formula, we can often apply the formula to the instance to produce another instance that does satisfy the formula. Definitions, inferences, and uniqueness constraints can be applied to instances:

A definition of the form ∀ x₁,....,x_n. A ⇔ ∃ y₁...y_m . B₁ ∧ ... ∧ B_k can be applied by searching for any occurrences of A in the instance and adding B₁, ..., B_k, generating fresh existential variables y₁,...,y_m, and conversely, whenever there is an occurrence of B₁, ..., B_k, adding A. In our setting, the defined formulas A are never used in other formulas, so it is sufficient to replace all occurrences of A with their definitions. The formula A is then redundant, and can be removed from the instance.
An inference of the form ∀ x₁,....,x_n. A₁ ∧ ... ∧ A_p ⇒ ∃ y₁...y_m . B₁ ∧ ... ∧ B_k can be applied by searching for any occurrences of A₁ ∧ ... ∧ A_p in the instance and, for each such match, for which the entire conclusion does not already hold (for some y₁,...,y_m), adding B₁ ∧ ... ∧ B_k to the instance, generating fresh existential variables y₁,...,y_m.
A uniqueness constraint of the form ∀ x₁...x_n. A₁ ∧ ... ∧ A_p ⇒ t = t' can be applied by searching for an occurrence A₁ ∧ ... ∧ A_p in the instance, and if one is found, merging the terms t and t'. If successful, the resulting substitution is applied to the instance; otherwise, the application of the uniqueness constraint fails.
A key constraint can similarly be applied by searching for different occurrences of a statement with the same identifier, merging the corresponding parameters of the statements, and concatenating their attribute lists. The substitutions obtained by merging are applied to the instance.

As noted above, uniqueness or key constraint application can fail, if a required merging step fails. Failure of constraint application means that there is no way to add information to the instance to satisfy the constraint, which in turn implies that the instance is invalid.

The process of applying definitions, inferences, and constraints to a PROV instance until all of them are satisfied is similar to what is sometimes called chasing [DBCONSTRAINTS] or saturation [CHR]. We call this process normalization.

Termination

In general, applying sets of logical formulas of the above definition, inference, and constraint forms is not guaranteed to terminate. A simple example is the inference R(x,y) ⇒ ∃z. R(x,z) ∧R(z,y), which can be applied to {R(a,b)} to generate an infinite sequence of larger and larger instances. To ensure that normalization, validity, and equivalence are decidable, we require that normalization terminates. There is a great deal of work on termination of the chase in databases, or of sets of constraint handling rules. The termination of the notion of normalization defined in this specification is guaranteed because the definitions, inferences and uniqueness/key constraints correspond to a weakly acyclic set of tuple-generating and equality-generating dependencies, in the terminology of [DBCONSTRAINTS]. The termination of the remaining ordering, typing, and impossibility constraints is easy to show. Appendix C gives a proof that the definitions, inferences, and uniqueness and key constraints are weakly acyclic and therefore terminating.

There is an important subtlety that is essential to guarantee termination. This specification draws a distinction between knowing that an identifier has type 'entity', 'activity', or 'agent', and having an explicit entity(id), activity(id), or agent(id) statement in the instance. For example, focusing on entity statements, we can infer 'entity' ∈ typeOf(id) if entity(id) holds in the instance. In contrast, if we only know that 'entity' ∈ typeOf(id), this does not imply that entity(id) holds.

This distinction (for both entities and activities) is essential to ensure termination of the inferences, because we allow inferring that a declared entity(id,attrs) has a generation and invalidation event, using Inference 7 (entity-generation-invalidation-inference). Likewise, for activities, we allow inferring that a declared activity(id,t1,t2,attrs) has a generation and invalidation event, using Inference 8 (activity-start-end-inference). These inferences do not apply to identifiers whose types are known, but for which there is not an explicit entity or activity statement. If we strengthened the type inference constraints to add new entity or activity statements for the entities and activities involved in generating or starting other declared entities or activities, then we could keep generating new entities and activities in an unbounded chain into the past (as in the "chicken and egg" paradox). The design adopted here requires that instances explicitly declare the entities and activities that are relevant for validity checking, and only these can be inferred to have invalidation/generation and start/end events. This inference is not supported for identifiers that are indirectly referenced in other relations and therefore have type 'entity' or 'activity'.

Figure 1^◊: Overview of the Validation Process

Checking ordering, typing, and impossibility constraints

The ordering, typing, and impossibility constraints are checked rather than applied. This means that they do not generate new formulas expressible in PROV, but they do generate basic constraints that might or might not be consistent with each other. Checking such constraints follows a saturation strategy similar to that for normalization:

For ordering constraints, we check by generating all of the precedes and strictly-precedes relationships specified by the rules. These can be thought of as a directed graph whose nodes are terms, and whose edges are precedes or strictly-precedes relationships. An ordering constraint of the form ∀ x₁...x_n. A₁ ∧ ... ∧ A_p ⇒ precedes(t,t') can be applied by searching for occurrences of A₁ ∧ ... ∧ A_p and for each such match adding the atomic formula precedes(t,t') to the instance, and similarly for strictly-precedes constraints. After all such constraints have been checked, and the resulting edges added to the graph, the ordering constraints are violated if there is a cycle in the graph that includes a strictly-precedes edge, and satisfied otherwise.
For typing constraints, we check by constructing a function typeOf(id) mapping identifiers to sets of possible types. We start with a function mapping each identifier to the empty set, reflecting no constraints on the identifiers' types. A typing constraint of the form ∀ x₁...x_n. A₁ ∧ ... ∧ A_p ⇒ 'type' ∈ typeOf(id) is checked by adjusting the function by adding 'type' to typeOf(id) for each conclusion 'type' ∈ typeOf(id) of the rule. Typing constraints with multiple conclusions are handled analogously. Once all constraints have been checked in all possible ways, we check that the disjointness constraints hold of the resulting typeOf function. (These are essentially impossibility constraints).
For impossibility constraints, we check by searching for the forbidden pattern that the impossibility constraint describes. Any match of this pattern leads to failure of the constraint checking process. An impossibility constraint of the form ∀ x₁...x_n. A₁ ∧ ... ∧ A_p ⇒ False can be applied by searching for occurrences of A₁ ∧ ... ∧ A_p in the instance, and if any such occurrence is found, signaling failure.

A normalized instance that satisfies all of the checked constraints is called valid. Validity can be, but is not required to be, checked by normalizing and then checking constraints. Any other algorithm that provides equivalent behavior (that is, accepts the same valid instances and rejects the same invalid instances) is allowed. In particular, the checked constraints and the applied definitions, inferences and uniqueness constraints do not interfere with one another, so it is also possible to mix checking and application. This may be desirable in order to detect invalidity more quickly.

Equivalence and Isomorphism

Given two normal forms, a natural question is whether they contain the same information, that is, whether they are equivalent (if so, then the original instances are also equivalent.) By analogy with logic, if we consider normalized PROV instances with existential variables to represent sets of possible situations, then two normal forms may describe the same situation but differ in inessential details such as the order of statements or of elements of attribute-value lists. To remedy this, we can easily consider instances to be equivalent up to reordering of attributes. However, instances can also be equivalent if they differ only in choice of names of existential variables. Because of this, the appropriate notion of equivalence of normal forms is isomorphism. Two instances I₁ and I₂ are isomorphic if there is an invertible substitution S mapping existential variables to existential variables such that S(I₁) = I₂. This is similar to the notion of equivalence used in [RDF], where blank nodes play an analogous role to existential variables.

Equivalence can be checked by normalizing instances, checking that both instances are valid, then testing whether the two normal forms are isomorphic. (It is technically possible for two invalid normal forms to be isomorphic, but to be considered equivalent, the two instances must also be valid.) As with validity, the algorithm suggested by this specification is just one of many possible ways to implement equivalence checking; it is not required that implementations compute normal forms explicitly, only that their determinations of equivalence match those obtained by the algorithm in this specification.

From Instances to Bundles and Documents

PROV documents can contain multiple instances: a toplevel instance consisting of the set of statements not appearing within a bundle, and zero or more named instances called bundles. For the purpose of inference and constraint checking, these instances are treated independently. That is, a PROV document is valid provided that each instance in it is valid and the names of its bundles are distinct. Similarly, a PROV document is equivalent to another if their toplevel instances are equivalent, they have the same number of bundles with the same names, and the instances of their corresponding bundles are equivalent. Analogously to blank nodes in [RDF], the scope of an existential variable in PROV is the instance level, so existential variables with the same name occurring in different instances do not necessarily denote the same term. This is a consequence of the fact that the instances of two equivalent documents only need to be pairwise isomorphic; this is a weaker property than requiring that there be a single isomorphism that works for all of the corresponding instances.

2.5 Summary of inferences and constraints

Table 2 summarizes the inferences, and constraints specified in this document, broken down by component and type or relation involved.

Table: work in progress; these entries might change when the document is updated.

Table 2: Summary of inferences and constraints for PROV Types and Relations
Type or Relation Name	Inferences and Constraints	Component

Entity	Inference 7 (entity-generation-invalidation-inference) Inference 21 (specialization-attributes-inference) Constraint 23 (key-object) Constraint 56 (impossible-object-property-overlap) Constraint 57 (entity-activity-disjoint)	1
Activity	Inference 8 (activity-start-end-inference) Constraint 23 (key-object) Constraint 29 (unique-startTime) Constraint 30 (unique-endTime) Constraint 56 (impossible-object-property-overlap) Constraint 57 (entity-activity-disjoint)
Generation	Inference 6 (generation-use-communication-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 25 (unique-generation) Constraint 36 (generation-within-activity) Constraint 38 (generation-precedes-invalidation) Constraint 39 (generation-precedes-usage) Constraint 41 (generation-generation-ordering) Constraint 43 (derivation-usage-generation-ordering) Constraint 44 (derivation-generation-generation-ordering) Constraint 45 (wasStartedBy-ordering) Constraint 46 (wasEndedBy-ordering) Constraint 47 (specialization-generation-ordering) Constraint 49 (wasAssociatedWith-ordering) Constraint 50 (wasAttributedTo-ordering) Constraint 51 (actedOnBehalfOf-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Usage	Inference 6 (generation-use-communication-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 35 (usage-within-activity) Constraint 39 (generation-precedes-usage) Constraint 40 (usage-precedes-invalidation) Constraint 43 (derivation-usage-generation-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Communication	Inference 5 (communication-generation-use-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 37 (wasInformedBy-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Start	Inference 9 (wasStartedBy-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 27 (unique-wasStartedBy) Constraint 29 (unique-startTime) Constraint 32 (start-precedes-end) Constraint 35 (usage-within-activity) Constraint 36 (generation-within-activity) Constraint 37 (wasInformedBy-ordering) Constraint 33 (start-start-ordering) Constraint 45 (wasStartedBy-ordering) Constraint 49 (wasAssociatedWith-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
End	Inference 10 (wasEndedBy-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 28 (unique-wasEndedBy) Constraint 30 (unique-endTime) Constraint 32 (start-precedes-end) Constraint 35 (usage-within-activity) Constraint 36 (generation-within-activity) Constraint 37 (wasInformedBy-ordering) Constraint 34 (end-end-ordering) Constraint 46 (wasEndedBy-ordering) Constraint 49 (wasAssociatedWith-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Invalidation	Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 26 (unique-invalidation) Constraint 38 (generation-precedes-invalidation) Constraint 40 (usage-precedes-invalidation) Constraint 42 (invalidation-invalidation-ordering) Constraint 45 (wasStartedBy-ordering) Constraint 46 (wasEndedBy-ordering) Constraint 48 (specialization-invalidation-ordering) Constraint 49 (wasAssociatedWith-ordering) Constraint 50 (wasAttributedTo-ordering) Constraint 51 (actedOnBehalfOf-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)

Derivation	Inference 11 (derivation-generation-use-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 43 (derivation-usage-generation-ordering) Constraint 44 (derivation-generation-generation-ordering) Constraint 52 (typing)	2
Revision	Inference 12 (revision-is-alternate-inference)
Quotation	No specific constraints
Primary Source	No specific constraints
Influence	No specific constraints

Agent	Constraint 23 (key-object) Constraint 56 (impossible-object-property-overlap)	3
Attribution	Inference 13 (attribution-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 50 (wasAttributedTo-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Association	Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 49 (wasAssociatedWith-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Delegation	Inference 14 (delegation-inference) Inference 15 (influence-inference) Constraint 24 (key-properties) Constraint 51 (actedOnBehalfOf-ordering) Constraint 55 (impossible-property-overlap) Constraint 52 (typing)
Influence	Inference 15 (influence-inference) Constraint 24 (key-properties)

Bundle constructor	No specific constraints; see section 6.2 Bundles and Documents	4
Bundle type	No specific constraints; see section 6.2 Bundles and Documents	4

Alternate	Inference 16 (alternate-reflexive) Inference 17 (alternate-transitive) Inference 18 (alternate-symmetric) Constraint 52 (typing)	5
Specialization	Inference 19 (specialization-transitive) Inference 20 (specialization-alternate-inference) Inference 21 (specialization-attributes-inference) Constraint 47 (specialization-generation-ordering) Constraint 48 (specialization-invalidation-ordering) Constraint 54 (impossible-specialization-reflexive) Constraint 52 (typing)
Mention	Inference 22 (mention-specialization-inference) Constraint 31 (unique-mention) Constraint 52 (typing)

Collection	No specific constraints	6
Membership	Constraint 58 (membership-empty-collection) Constraint 52 (typing)	6

2.3. Compliance with this document

TODO: Add collection and account constraint sections to the compliance list as appropriate.

For the purpose of compliance, the normative sections of this document are section 2.3. Compliance with this document, section 3.4. Definitions and Inferences and Definitions, section 4. Equivalence, and section 5. Validity Constraints, and section 6. Normalization, Validity, and Equivalence. To be compliant:

When processing provenance, an application may apply the inferences and definitions in section 3.4. Definitions and Inferences and Definitions.
If determining whether a PROV instance or document is valid, an application must check that all of the constraints of section 5. Constraints are satisfied on the normal form of the instance or document.
If producing provenance meant for other applications to use, the application should produce valid provenance, as specified in section 6. Normalization, Validity, and Equivalence.
WhenIf determining whether two PROV instances or documents are equivalent, an application must determine whether their normal forms are equal, as specified in section 4.6. Normalization, Validity, and Equivalence.

When determining whether

Compliant applications are not required to explicitly compute normal forms; however, if checking validity or equivalence, the results should be the same as would be obtained by computing normal forms as defined in this specification.

All figures are for illustration purposes only. Information in tables is normative if it appears in a PROV instance normative section; specifically, Table 3 is valid, an application must check that all of the constraints of section 5. Validity Constraints are satisfiednormative. on the normal form of the instance. When producing provenance meant for other applications to use, the application should produce valid provenance. Should we specifyText in appendices and in boxes labeled "Remark" is informative. Where there is any apparent ambiguity between the descriptive text and the formal text in a way for PROV instances to say whether they are meant to be validated "definition", "inference" or not? Seems outside the scope of this document, may require changes to PROV-N. "constraint" box, the formal text takes priority.

3. 4. Definitions and Inferences and Definitions

In this section, we describeThis section describes definitions and inferences and definitions that may be used on provenance data, and preserve equivalence on valid PROV instances (as detailed in section 6. Normalization, Validity, and Equivalence). A definition is a notion of equivalence on rule that can be applied to PROV instances. instances to replace defined expressions with definitions. An inference is a rule that can be applied to PROV instances to add new PROV statements. A definition states that a provenance statement is equivalent to some other statements, whereas an inference only states one direction of an implication; thus, defined provenance statements can be replaced by their definitions.

Definitions have the following general form:

Definition-example NNN (definition-example)

defined_stmt IF AND ONLY IF there exists a rule₁,..., a_m such that statesdefining_stmt₁ and ... and defining_stmt_n.

A definition can be applied to a PROV instance, since its defined_stmt is defined in terms of other statements. Applying a definition to an instance means that if an occurrence of a defined provenance expressionstatement defined_stmt can be found in a PROV instance, then we can remove it and add all of the statements defining_stmt₁ ... defining_stmt_n to the instance, possibly after generating fresh identifiers a₁,...,a_m for existential variables. In other words, it is equivalentsafe to replace a defined statement with its definition.

Inferences have the following general form:

Inference-example NNN (inference-example)

IF hyp₁ and ... and hyp_k THEN there exists a₁ and ... and a_m such that concl₁ and ... and concl_n.

Inferences can be applied to PROV instances. Applying an inference to an instance means that if all of the provenance statements matching hyp₁... hyp_k can be found in the instance, then we check whether the conclusion concl₁ ... concl_n is satisfied for some other expressions; thus, defined provenance expressions canvalues of existential variables. If so, application of the inference has no effect on the instance. If not, then a copy the conclusion should be replaced by their definitions, and vice versa. added to the instance, after generating fresh identifiers a₁,...,a_m for the existential variables. These fresh identifiers might later be found to be equal to known identifiers; they play a similar role in PROV constraints to existential variables in logic, to "labeled nulls" in database theory [DBCONSTRAINTS], or to blank nodes in [RDF]. In general, omitted optional parameters to [PROV-N] statements, or explicit - markers, are placeholders for existentially quantified variables; that is, they denote unknown values. There are a few exceptions to this general rule, which are specified in Definition 4 (optional-placeholders).

InferencesDefinitions and inferences can be viewed as logical formulas; similar formalisms are often used in rule-based reasoning [CHR] and in databases [DBCONSTRAINTS]. In particular, the identifiers a₁ ... a_n should be viewed as existentially quantified variables, meaning that through subsequent reasoning steps they may turn out to be equal to other identifiers that are already known, or to other existentially quantified variables. Their treatment is analogous to that of blank nodes in RDF. In contrast, distinct URIs or literal values in PROV are assumed to be distinct for the purpose of checking validity or inferences. This issue is discussed in more detail under Uniqueness Constraints.

In a [definition|inference], term symbols such as id, start, end, e, a, attrs, are assumed to be variables unless otherwise specified. These variables are scoped at the [definition|inference|constraint] level, so the rule is equivalent to any one-for-one renaming of the variable names. When several rules are collected within a [definition|inference] as an ordered list, the scope of the variables in each rule is at the level of list elements, and so reuse of variable names in different rules does not affect the meaning.

4.1 Optional Identifiers and Attributes

Many PROV relation statements have an identifier, identifying a link between two or more related objects. Identifiers can sometimes be omitted in [PROV-O] notation. For the purpose of inference and validity checking, we generate special identifiers called existential variables denoting the unknown values.

Existential variables can be substituted with other terms. Specifically, a substitution is a function from a set of existential variables to identifiers, literals, the placeholder -, or other existential variables. A substitution S can be applied to an instance I by replacing all occurrences of existential variables x in the instance with S(x).

Definition 1 (optional-identifiers), Definition 2 (optional-attributes), and Definition 3 (definition-short-forms), explain how to expand the compact forms of PROV-N notation into a normal form. Definition 4 (optional-placeholders) indicates when other optional parameters can be replaced by existential variables.

Definition 1 (optional-identifiers)

For each r in { used, wasGeneratedBy, wasInvalidatedBy, wasInfluencedBy, wasStartedBy, wasEndedBy, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf}, the following general form: definitional rules hold:

r(a₁,...,a_n) IF hyp_1 and ... and hyp_k AND ONLY IF there exists id such that r(id; a₁,...,a_n).
r(-; a₁,...,a_n) THENIF AND ONLY IF there exists a_1 and ... and a_mid such that conclusion_1 and ... and conclusion_n r(id; a₁,...,a_n).

This meansLikewise, many PROV-N statements allow for an optional attribute list. If it is omitted, this is the same as specifying an empty attribute list:

Definition 2 (optional-attributes)

For each p in {entity, activity, agent}, if a_n is not an attribute list parameter then the following definitional rule holds:
p(a₁,...,a_n) IF AND ONLY IF p(a₁,...,a_n,[]).
For each r in { used, wasGeneratedBy, wasInvalidated, wasInfluencedBy, wasStartedBy, wasEndedBy, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf}, if a_n is not an attribute list parameter then the following definition holds:
r(id; a₁,...,a_n) IF AND ONLY IF r(id; a₁,...,a_n,[]).

Definitions Definition 1 (optional-identifiers) and Definition 2 (optional-attributes). do not apply to alternateOf, specializationOf, and mentionOf, which do not have identifiers and attributes.

Finally, many PROV statements have other optional arguments or short forms that can be used if allnone of the provenance expressions matching hyp_1... hyp_k can be found in a PROV instance, we can add all of the expressions concl_1 ... concl_n to the instance, possibly after generating fresh identifiers a_1,...,a_m for unknown objects.optional arguments is present. These are handled by specific rules listed below.

Definition 3 (definition-short-forms)

activity(id,attrs) IF AND ONLY IF activity(id,-,-,attrs).
wasGeneratedBy(id; e,attrs) IF AND ONLY IF wasGeneratedBy(id; e,-,-,attrs).
used(id; a,attrs) IF AND ONLY IF used(id; a,-,-,attrs).
wasStartedBy(id; a,attrs) IF AND ONLY IF wasStartedBy(id; a,-,-,-,attrs).
wasEndedBy(id; a,attrs) IF AND ONLY IF wasEndedBy(id; a,-,-,-,attrs).
wasInvalidatedBy(id; e,attrs) IF AND ONLY IF wasInvalidatedBy(id; e,-,-,attrs).
wasDerivedFrom(id; e2,e1,attrs) IF AND ONLY IF wasDerivedFrom(id; e2,e1,-,-,-,attrs).
wasAssociatedWith(id; e,attrs) IF AND ONLY IF wasAssociatedWith(id; e,-,-,attrs).
actedOnBehalfOf(id; a2,a1,attrs) IF AND ONLY IF actedOnBehalfOf(id; a2,a1,-,attrs).

There are no expansion rules for entity, agent, communication, attribution, influence, alternate, specialization, or mention relations, because these have no optional parameters aside from the identifier and attributes, which are expanded by the rules in Definition 1 (optional-identifiers) and Definition 2 (optional-attributes).

Finally, most optional parameters (written -) are, for the purpose of this document, considered to be distinct, fresh existential variables. Optional parameters are defined in [PROV-DM] and in [PROV-N] for each type of PROV statement. Thus, before proceeding to apply other definitions or inferences, most occurrences of - are to be replaced by fresh existential variables, distinct from any others occurring in the instance. The only exceptions to this general rule, where - are to be left in place, are the activity, generation, and usage parameters in wasDerivedFrom and the plan parameter in wasAssociatedWith. This is further explained in remarks below.

The treatment of optional parameters is specified formally using the auxiliary concept of expandable parameter. An expandable parameter is one that can be omitted using the placeholder -, and if so, it is to be replaced by a fresh existential identifier. Table 3 defines the expandable parameters of the properties of PROV, needed in Definition 4 (optional-placeholders). For emphasis, the four optional parameters that are not expandable are also listed. Parameters that cannot have value -, and identifiers might later be found to be equal to known identifiers; these fresh identifiers play that are expanded by Definition 1 (optional-identifiers), are not listed.

Table 3: Expandable and Non-Expandable Parameters
Relation	Expandable	Non-expandable

used(id; a,e,t,attrs)	e,t
wasGeneratedBy(id; e,a,t,attrs)	a,t
wasStartedBy(id; a2,e,a1,t,attrs)	e,a1,t
wasEndedBy(id; a2,e,a1,t,attrs)	e,a1,t
wasInvalidatedBy(id; e,a,t,attrs)	a,t
wasDerivedFrom(id; e2,e1,-,g,u,attrs)		g,u
wasDerivedFrom(id; e2,e1,a,g,u,attrs) (where a similar role in PROV constraints to existential variables in logic. TODO: Is this re-inventing blank nodes in PROV-DM, and do we want to do this? A lot of the inferences have existentially quantified conclusions (and there is some theory that supports this). TODO: Make sure conjunctive reading of conclusion is clear. not placeholder -)	g,u	a
wasAssociatedWith(id; a,ag,pl,attrs)	ag	pl
actedOnBehalfOf(id; ag2,ag1,a,attrs)	a

DefinitionsDefinition 4 (optional-placeholders) states how parameters are to be expanded, using the expandable parameters defined in Table 3. The last two parts, 4 and 5, indicate how to handle expansion of parameters for wasDerivedFrom expansion, which is only allowed for the generation and use parameters when the activity is specified. Essentially, the definitions state that parameters g,u are expandable only if the activity is specified, i.e., if parameter a is provided. The rationale for this is that when a is provided, then there have to be two events, namely u and g, which account for the usage of e1 and the generation of e2, respectively, by a. Conversely, if a is not provided, then one cannot tell whether one or more activities are involved in the derivation, and the explicit introduction of such events, which correspond to a single acitivity, would therefore not be justified.

A later constraint, Constraint 53 (impossible-unspecified-derivation-generation-use), forbids specifying generation and use parameters when the activity is unspecified.

Definition 4 (optional-placeholders)

activity(id,-,t2,attrs) IF AND ONLY IF there exists t1 such that activity(id,t1,t2,attrs). Here, t2 may be a placeholder.
activity(id,t1,-,attrs) IF AND ONLY IF there exists t2 such that activity(id,t1,t2,attrs). Here, t1 must not be a placeholder.
For each r in { used, wasGeneratedBy, wasStartedBy, wasEndedBy, wasInvalidatedBy, wasAssociatedWith, actedOnBehalfOf }, if the ith parameter of r is an expandable parameter of r as specified in Table 3 then the following general form:definition holds:
r(a₀;...,a_i-1, -, a_i+1, ...,a_n) IF AND ONLY IF there exists a' such that r(a₀;...,a_i-1,a',a_i+1,...,a_n).
If a is not the placeholder -, and u is any term, then the following definition holds:
wasDerivedFrom(id;e2,e1,a,-,u,attrs) IF AND ONLY IF there exists g such that wasDerivedFrom(id; e2,e1,a,g,u,attrs).
If a is not the placeholder -, and g is any term, then the following definition holds:
wasDerivedFrom(id;e2,e1,a,g,-,attrs) IF AND ONLY IF there exists u such that wasDerivedFrom(id; e2,e1,a,g,u,attrs).

In an association of the form wasAssociatedWith(id; a,ag,-,attr), the absence of a plan means: either no plan exists, or a plan exists but it is not identified. Thus, it is not equivalent to wasAssociatedWith(id; a,ag,p,attr) where a plan p is given.

A derivation wasDerivedFrom(id; e2,e1,a,gen,use,attrs) that specifies an activity explicitly indicates that this activity achieved the derivation, with a usage use of entity e1, and a generation gen of entity e2. It differs from a derivation of the form wasDerivedFrom(id; e2,e1,-,-,-,attrs) with missing activity, generation, and usage. In the latter form, it is not specified if one or more activities are involved in the derivation.

Let us consider a system, in which a derivation is underpinned by multiple activities. Conceptually, one could also model such a system with a new activity that encompasses the two original activities and underpins the derivation. The inferences defined in this specification do not allow the latter modelling to be inferred from the former. Hence, the two modellings of the same system are regarded as different in the context of this specification.

4.2 Entities and Activities

Communication between activities implies the existence of an underlying entity generated by one activity and used by the other, and vice versa.

defined_exp holds

Inference 5 (communication-generation-use-inference)

IF AND ONLY IF wasInformedBy(_id; a2,a1,_attrs) THEN there exists a_1,..., a_m such that defining_exp_1 andexist e, ... and defining_exp_n. This means that a provenance expression defined_exp is defined in terms of other expressions. This can be viewed as a two-way inference: If defined_exp can be found in a PROV instance, we can add all of the expressions defining_exp_1 ... defining_exp_n to the instance, possibly after generating fresh identifiers a_1,...,a_m for unknown objects. Conversely, if there exist identifiers a_1...a_m such that defining_exp_1 and ... and defining_exp_n hold in the instance, we can add the defined expression def_exp. When an expression is defined in terms of others, it is in a sense redundant; it is safe to replace it with its definition. 3.1 Component 1: Entities and Activities Communication between activities is defined in terms as the existence of an underlying entity generated by one activity and used by the other. Given two activities identified by a1 and a2_gen, _t1, _use, and _t2, wasInformedBy(a2,a1) holds such that wasGeneratedBy(_gen; e,a1,_t1,[]) and used(_use; a2,e,_t2,[]) hold.

Inference 6 (generation-use-communication-inference)

IF AND ONLY IF wasGeneratedBy(_gen; e,a1,_t1,_attrs1) and used(_id2; a2,e,_t2,_attrs2) hold THEN there is an entity with some identifier e and some sets of attribute-value pairs attrs1 and attrs2,exists _id such that wasGeneratedBy(-,e,a1,-,attrs1) and used(-,a2,e,-,attrs2) hold. wasInformedBy(_id; a2,a1,[])

The relationship wasInformedBy is not transitive.transitive. Indeed, consider the following statements.

wasInformedBy(a2,a1)
wasInformedBy(a3,a2)

We cannot infer wasInformedBy(a3,a1) from these expressions.statements alone. Indeed, from wasInformedBy(a2,a1), we know that there exists e1 such that e1 was generated by a1 and used by a2. Likewise, from wasInformedBy(a3,a2), we know that there exists e2 such that e2 was generated by a2 and used by a3. The following illustration shows a case for which transitivity cannot hold.counterexample to transitivity. The horizontal axis represents the event line. We see that e1 was generated after e2 was used. Furthermore, the illustration also shows that a3 completes before a1. started. So in this example (with no other information) it is impossible for a3 to have used an entity generated by a1. This is illustrated in Figure 2.

Figure 2^◊: Counter-example for transitivity of wasInformedBy

From an entity statement, we can infer the existence of generation and invalidation events.

Inference 7 (entity-generation-invalidation-inference)

IF entity(e,_attrs) THEN there exist _gen, _a1, _t1, _inv, _a2, and _t2 such that wasGeneratedBy(_gen; e,_a1,_t1,[]) and wasInvalidatedBy(_inv; e,_a2,_t2,[]).

From an activity statement, we can infer start and end events whose times match the start and end times of the activity, respectively.

Inference 8 (activity-start-end-inference)

Start IF activity(a,t1,t2,_attrs) THEN there exist _start, _e1, _a1, _end, _a2, and _e2 such that wasStartedBy(_start; a,_e1,_a1,t1,[]) and wasEndedBy(_end; a,_e2,_a2,t2,[]).

The start of a2an activity a triggered by entity e1 implies that e1 was generated by the starting activity a1 is defined as follows..

Given two activities with identifiers a1 and a2, wasStartedByActivity(a2,a1) holds

Inference 9 (wasStartedBy-inference)

IF AND ONLY wasStartedBy(_id; a,e1,a1,_t,_attrs), THEN there exist _gen and _t1 such that wasGeneratedBy(_gen; e1,a1,_t1,[]).

Likewise, the ending of activity a by triggering entity e1 implies that e1 was generated by the ending activity a1.

Inference 10 (wasEndedBy-inference)

IF wasEndedBy(_id; a,e1,a1,_t,_attrs), THEN there exists an entity e exist _gen and _t1 such that wasGeneratedBy(-,e,a1,-,-) and wasStartedBy(-,a2,e,-,-) hold.wasGeneratedBy(_gen; e1,a1,_t1,[]).

3.2 Component 2: 4.3 Derivations

Derivations with explicit activity, generation, and usage admit the following inference:

Inference 11 (derivation-generation-use-inference)

In this inference, none of a, gen2 or use1 can be placeholders -.

IF wasDerivedFrom(_id; e2,e1,a,gen2,use1,_attrs), THEN there exists _t1 and _t2 such that used(use1; a,e1,_t1,[]) and wasGeneratedBy(gen2; e2,a,_t2,[]).

A revision admits the following inference, stating that the two entities linked by a revision are also alternates.

Inference 12 (revision-is-alternate-inference)

In this inference, any of _a, _g or _u may be placeholders.

IF wasDerivedFrom(_id; e2,e1,_a,_g,_u,[prov:type='prov:Revision']), THEN alternateOf(e2,e1).

There is no inference stating that wasDerivedFrom is transitive.

4.4 Agents

Attribution identifiesis the ascribing of an agent as responsible forentity to an entity.agent. An entity can only be ascribed to an agent can only be responsible for an entity if itthe agent was associated with an activity that generated the entity. If the activity, generation and association events are not explicit in the instance, they can be inferred.

Inference 13 (attribution-inference)

IF wasAttributedTo(e,ag) holds for some identifiers e and ag, wasAttributedTo(_att; e,ag,_attrs) THEN there exists an activity with some identifier exist a , _t, _gen, _assoc, _pl, such that the following statements hold: activity(a, -, -,-) wasGeneratedBy(-,e, a, -,_) wasAssociatedWith(-,a, ag, -, -) wasGeneratedBy(_gen; e,a,_t,[]) and wasAssociatedWith(_assoc; a,ag,_pl,[]).

Responsibility

In the above inference, _pl is an existential variable, so it can be merged with a constant identifier, another existential variable, or a placeholder -, as explained in the definition of merging.

Delegation relates agents where one agent acts on behalf of another, in the context of some activity. The supervising agent delegates some responsibility for part of the activity to the subordinate agent, while retaining some responsibility for the overall activity. Both agents are associated with this activity.

@@TODO: Could this be an inference? Does it imply

Inference 14 (delegation-inference)

IF actedOnBehalfOf(_id; ag1, ag2, a, _attrs) THEN there exist _id1, _pl1, _id2, and _pl2 such that a1 is associated with all activities a2 is associated with? wasAssociatedWith(_id1; a, ag1, _pl1, []) and wasAssociatedWith(_id2; a, ag2, _pl2, []).

The two associations between the agents and the activity may have different identifiers, different plans, and different attributes. In particular, the plans of the two agents need not be the same, and one, both, or neither can be the placeholder - indicating that there is no plan, because the existential variables _pl1 and _pl2 can be replaced with constant identifiers, existential variables, or placeholders - independently, as explained in the definition of merging.

The wasInfluencedBy relation is implied by other relations, including usage, start, end, generation, invalidation, communication, derivation, attribution, association, and delegation. To capture this explicitly, we allow the following inferences:

Inference 15 (influence-inference)

IF wasGeneratedBy(id; e,a,_t,attrs) THEN wasInfluencedBy(id; e, a, attrs).
IF used(id; a,e,_t,attrs) THEN wasInfluencedBy(id; a, e, attrs).
IF wasInformedBy(id; a2,a1,attrs) THEN wasInfluencedBy(id; a2, a1, attrs).
IF wasStartedBy(id; a2,e,a1,_t,attrs) THEN wasInfluencedBy(id; a2, e, attrs).
IF wasEndedBy(id; a2,e,_a1,_t,attrs) THEN wasInfluencedBy(id; a2, e, attrs).
IF wasInvalidatedBy(id; e,a,_t,attrs) THEN wasInfluencedBy(id; e, a, attrs).
IF wasDerivedFrom(id; e2, e1, a, g, u, attrs) THEN wasInfluencedBy(id; e2, e1, attrs). Here, a, g, u may be placeholders -.
IF wasAttributedTo(id; e,ag,attrs) THEN wasInfluencedBy(id; e, ag, attrs).
IF wasAssociatedWith(id; a,ag,_pl,attrs) THEN wasInfluencedBy(id; a, ag, attrs). Here, _pl may be a placeholder -.
IF actedOnBehalfOf(id; ag2,ag1,_a,attrs) THEN wasInfluencedBy(id; ag2, ag1, attrs).

The inferences above permit the use of same identifier for an influence relationship and a more specific relationship.

3.3 Component 3: Derivations4.5 Alternate and Specialized Entities

The relation alternateOf is an equivalence relation: that is, it is reflexive, transitive and symmetric. As a consequence, the following inferences can be applied:

Inference 16 (alternate-reflexive)

Derivations with an explicit activity and no usage admit the following inference: Given an activity a, entities denoted by e1 and e2, IF wasDerivedFrom(-,e2,e1, a, -) and wasGeneratedBy(-,e2,a,-,-) hold, entity(e) THEN used(-,a,e1,-,-) also holds. alternateOf(e,e).

This inference is justified by the fact that the entity denoted by e2 is generated by at most one activity in a given account (see generation-uniqueness). Hence, this activity is also the one referred to by the usage of e1. There is some redundancy in the following discussion. The converse inference does not hold. From wasDerivedFrom(e2,e1) and used(a,e1,-), one cannot derive wasGeneratedBy(e2,a,-) because identifier e1 may occur in usages performed by many activities, which may have not generated the entity denoted by e2. Note that derivation cannot in general be inferred from the existence of related usage and generation events. Indeed, when a generation wasGeneratedBy(g, e2, a, -, attrs2) precedes used(u, a, e1, -, attrs1), for some e1, e2, attrs1, attrs2, and a, one cannot infer derivation wasDerivedFrom(e2, e1, a, g, u) or wasDerivedFrom(e2,e1) since e2 cannot possibly be derived from e1, given the creation of e2 precedes the use of e1. That is, if e1 is generated by an activity before e2 is used, then obviously e2 cannot have been derived from e1. However, even if e2 happens used before e1 is generated, it is not safe to assume that e2 was derived from e1. Derivation is not defined to be transitive. Domain-specific specializations of derivation may be defined in such a way that the transitivity property holds. A revision admits the following inference, linking the two entities by a derivation, and stating them to be alternates. Given two identifiers e1 and e2 identifying two entities, and an identifier ag identifying an agent, IF wasRevisionOf(-,e2,e1,ag) holds, THEN the following hold: wasDerivedFrom(-,e2,e1,-) alternateOf(e1,e2) wasAttributedTo(e2,ag) The following doesn't make sense because wasRevisionOf and wasDerivedFrom have different types. wasRevisionOf is a strict sub-relation of wasDerivedFrom since two entities e2 and e1 may satisfy wasDerivedFrom(e2,e1) without being a variant of each other. Motivation for quotation inference IF wasQuotedFrom(e2,e1,ag2,ag1,attrs) holds for some identifiers e2, e1, ag2, ag1, THEN the following hold: wasDerivedFrom(e2,e1) wasAttributedTo(e2,ag2) wasAttributedTo(e1,ag1) Traceability allows an entity to be transitively linked to another entity it is derived from, to an agent it is attributed to, or another agent having some responsibility, or a trigger of an activity that generated it. Traceability can be inferred from existing statements, or can be asserted stating that a dependency path exists without its individual steps being expressed. This is captured by the following inferences: Given two identifiers e2 and e1 for entities, the following statements hold: IF wasDerivedFrom(e2,e1,a,g2,u1) holds, for some a, g2, u1, THEN tracedTo(e2,e1) also holds. IF wasDerivedFrom(e2,e1) holds, THEN tracedTo(e2,e1) also holds. IF wasAttributedTo(e2,ag1,aAttr) holds, THEN tracedTo(e2,ag1) also holds. IF wasAttributedTo(e2,ag2,aAttr), wasGeneratedBy(-,e2,a,-,gAttr), and actedOnBehalfOf(ag2,ag1,a,rAttr) hold, for some a, ag2, ag1, aAttr, gAttr, and rAttr, THEN tracedTo(e2,ag1) also holds. IF wasGeneratedBy(e2,a,gAttr) and wasStartedBy(a,e1,sAttr) hold, for some a, gAttr , sAttr then tracedTo(e2,e1) holds. IF tracedTo(e2,e) and tracedTo(e,e1) hold for some e, THEN tracedTo(e2,e1) also holds. We note that the inference rule traceability-inference does not allow us to infer anything about the attributes of the related entities, agents or events. 3.4 Component 4: Alternate Entities TODO: There is currently no consensus what inferences on alternate or specialization should be assumed. The following section lists possible inferences that may or may not be adopted. Section is under review, pending ISSUE-29. The relation alternateOf is an equivalence relation: reflexive, transitive and symmetric. For any entity e, we have alternateOf(e,e). For any entities e1, e2, e3, IF alternateOf(e1,e2) and alternateOf(e2,e3) THEN alternateOf(e1,e3). For any entity e1, e2, IF alternateOf(e1,e2) THEN alternateOf(e2,e1).

Inference 17 (alternate-transitive)

IF alternateOf(e1,e2) and alternateOf(e2,e3) THEN alternateOf(e1,e3).

Inference 18 (alternate-symmetric)

IF alternateOf(e1,e2) THEN alternateOf(e2,e1).

Similarly, specialization is a strict partial order:order: it is irreflexive, anti-symmetric and transitive. Irreflexivity is handled later as Constraint 54 (impossible-specialization-reflexive)

Inference 19 (specialization-transitive)

IF specializationOf(e1,e2) For any entity e, it is not the case that have specializationOf(e,e)and specializationOf(e2,e3) THEN specializationOf(e1,e3).

For any entities e1, e2, it is not the case that specializationOf(e1,e2) and specializationOf(e2,e1). For any entities e1, e2, e3, IF specializationOf(e1,e2) and specializationOf(e2,e3) THEN specializationOf(e1,e3). Finally, if

If one entity specializes another, then they are also alternates:

For any entities e1, e2,

Inference 20 (specialization-alternate-inference)

IF specializationOf(e1,e2) THEN alternateOf(e1,e2).

If one entity specializes another then all attributes of the more general entity are also attributes of the more specific one.

Inference 21 (specialization-attributes-inference)

IF entity(e1, attrs) and specializationOf(e2,e1), THEN entity(e2, attrs).

TODO: Possible inferences about attributes, generation, invalidation?

Note: The following inference is associated with a feature "at risk" and may be removed from this specification based on feedback. Please send feedback to public-prov-comments@w3.org.

The following sections are retained from an older version, and are not consistent with the above constraints. This will be revised once the consensus on ISSUE-29

If one entity is clearer.a mention of another in a bundle, then the former is also a specialization of the latter:

Inference 22 (mention-specialization-inference)

IF mentionOf(e2,e1,b) THEN specializationOf(e2,e1).

3.4.1 Specialization Specialization is neither symmetric nor anti-symmetric. "Alice's toyota car on fifth Avenue" is a specialization of "Alice's toyota car", but the converse does not hold.

anti-symmetric counter-example???

5. Validity Constraints

This section defines a collection of constraints on PROV instances. A PROV instance is valid if, after applying all possible inference and definition rules from Section 2, the resulting instance satisfies all of the constraints specified in this section. There are twothree kinds of constraints:

uniqueness constraints that say that a PROV instance can contain at most one expression or that multiple expressions aboutstatement of each kind with a given identifier. For example, if we describe the same objects need togeneration event twice, then the two statements should have the same values (for example, if we describe the same generation event twice, then the two expressions should have the same times);times;
and event ordering constraints that say that it should be possible to arrange the events (generation, usage, invalidation, start, end) described in a PROV instance into a partial orderpreorder that corresponds to a sensible "history" (for example, an entity should not be generated after it is used).used); and
impossibility constraints, which forbid certain patterns of statements in valid PROV instances.

The PROV data modelAs in a [definition|inference], term symbols such as id, start, end, e, a, attrs in a constraint, are assumed to be variables unless otherwise specified. These variables are scoped at the constraint level, so the rule is implicitly based onequivalent to any one-for-one renaming of the variable names. When several rules are collected within a notionconstraint as an ordered list, the scope of instantaneous events (or just events), that mark transitionsthe variables in the world. Events include generation, usage, or invalidationeach rule is at the level of entities, as well as starting or endinglist elements, and so reuse of activities. This notion of event is variable names in different rules does not first-class in the data model, but it is useful for explaining its other concepts and its semantics [PROV-SEM]. Thus, events help justify inferences on provenance as well as validity constraints indicating when provenance is self-consistent. In section 8.4 Events and Time we discuss the motivation for instantaneous events and their relationship to time in greater detail.affect the meaning.

PROV-DM identifies five kinds of instantaneous events, namely entity generation event, entity usage event, entity invalidation event, activity start event and activity end event. PROV-DM adopts Lamport's clock assumptions [CLOCK] in the form of a reflexive, transitive partial order follows (and its inverse precedes) between instantaneous events. Furthermore, PROV-DM assumes the existence of a mapping from instantaneous events to time clocks, though the actual mapping is not in scope of this specification. TODO: More about what it means for constraints to be satisfied; constraint template(s)

5.1 Uniqueness Constraints

Attribute

In the absence of existential variables, uniqueness constraints? constraints could be checked directly by checking that no identifier appears more than once for a given statement. However, in the presence of existential variables, we need to be more careful to combine partial information that might be present in multiple compatible statements, due to inferences. Uniqueness constraints are enforced through merging pairs of statements subject to equalities. For example, suppose we have two activity statements activity(a,2011-11-16T16:00:00,_t1,[a=1]) and activity(a,_t2,2011-11-16T18:00:00,[b=2]), with existential variables _t1 and _t2. The merge of these two statements (describing the same activity a) is activity(a,2011-11-16T16:00:00,2011-11-16T18:00:00,[a=1,b=2]).

Merging is an operation that can be applied to a pair of terms, or a pair of attribute lists. The result of merging is either a substitution (mapping existentially quantified variables to terms) or failure, indicating that the merge cannot be performed. Merging of pairs of terms, attribute lists, or statements is defined as follows.

If t and t' are constant identifiers or values (including the placeholder -), then their merge exists only if they are equal, otherwise merging fails.
If x is an existential variable and t' is any term (identifier, constant, placeholder -, or existential variable), then their merge is t', and the resulting substitution is [x=t']. In the special case where t'=x, the merge is x and the resulting substitution is empty.
If t is any term (identifier, constant, placeholder -, or existential variable) and x' is an existential variable, then their merge is the same as the merge of x and t.
The merge of two attribute lists attrs1 and attrs2 is their union, considered as sets of key-value pairs, written attrs1 ∪ attrs2. Duplicate keys with different are allowed, but equal key-value pairs are merged.

Merging for terms is analogous to unification in logic programming and theorem proving, restricted to flat terms with no function symbols. No occurs check is needed because there are no function symbols.

We assume A typical uniqueness constraint is as follows:

Constraint-example NNN (uniqueness-example)

IF hyp₁ and ... and hyp_n THEN t₁ = u₁ and ... and t_n = u_n.

Such a constraint is enforced as follows:

Suppose PROV instance I contains all of the hypotheses hyp₁ and ... and hyp_n.
Attempt to merge all of the equated terms in the conclusion t₁ = u₁ and ... and t_n = u_n.
If merging fails, then the constraint is unsatisfiable, so application of the constraint to I fails. If this failure occurs during normalization prior to validation, then I is invalid, as explained in Section 6.
If merging succeeds with a substitution S, then S is applied to the instance I, yielding result S(I).

Key constraints are uniqueness constraints that the various identified objectsspecify that a particular key field of PROV-DM have unique statements describing them within a PROV instance.relation uniquely determines the other parameters. Key constraints are written as follows:

Given an entity identifier e, there is at most one expression entity(e,attrs), where attrs is some set of attribute-values. Given an activity identifier a, there is at most one expression activity(a,t1,t2,attrs), where attrs is some set of attribute-values.

Constraint-example NNN (key-example)

TODO: Same goes

The a_k field is a KEY for relation r(a₀; a₁,...,a_n).

Because of the presence of attributes, key constraints do not reduce directly to uniqueness constraints. Instead, we enforce key constraints as follows.

Suppose r(a₀; a₁,...a_n,attrs1) and r(b₀; b₁,...b_n,attrs2) hold in PROV instance I, where the key fields a_k = b_k are equal.
Attempt to merge all of the corresponding parameters a₀ = b₀ and ... and a_n = b_n.
If merging fails, then the constraint is unsatisfiable, so application of the key constraint to I fails.
If merging succeeds with substitution S, then we remove r(a₀; a₁,...a_n,attrs1) and r(b₀; b₁,...b_n,attrs2) from I, obtaining instance I', and return instance {r(S(a₀); S(a₁),...S(a_n),attrs1 ∪ attrs2)} ∪ S(I').

Thus, if a PROV instance contains an apparent violation of a uniqueness constraint or key constraint, merging can be used to determine whether the constraint can be satisfied by instantiating some existential variables with other objects: agent, note, generation, usage, invalidation, start, end, communication, start by, attribution, association, responsibility, derivation, revision, quotation.terms. We should findFor key constraints, this is the same as merging pairs of statements whose keys are equal and whose corresponding arguments are compatible, because after merging respective arguments and attribute lists, the two statements become equal and one can be omitted.

The various identified objects of PROV must have unique statements describing them within a way valid PROV instance. This is enforced through the following key constraints:

Constraint 23 (key-object)

The identifier field e is a KEY for the entity(e,attrs) statement.
The identifier field a is a KEY for the activity(a,t1,t2,attrs) statement.
The identifier field ag is a KEY for the agent(ag,attrs) statement.

Likewise, the statements in a valid PROV instance must provide consistent information about each identified object or relationship. The following key constraints require that all of saying this once concisely. the information about each identified statement can be merged into a single, consistent statement:

Constraint 24 (key-properties)

We assume that an

The identifier field id is a KEY for the wasGeneratedBy(id; e,a,t,attrs) statement.
The identifier field id is a KEY for the used(id; a,e,t,attrs) statement.
The identifier field id is a KEY for the wasInformedBy(id; a2,a1,attrs) statement.
The identifier field id is a KEY for the wasStartedBy(id; a2,e,a1,t,attrs) statement.
The identifier field id is a KEY for the wasEndedBy(id; a2,e,a1,t,attrs) statement.
The identifier field id is a KEY for the wasInvalidatedBy(id; e,a,t,attrs) statement.
The identifier field id is a KEY for the wasDerivedFrom(id; e2, e1, a, g2, u1, attrs) statement.
The identifier field id is a KEY for the wasAttributedTo(id; e,ag,attr) statement.
The identifier field id is a KEY for the wasAssociatedWith(id; a,ag,pl,attrs) statement.
The identifier field id is a KEY for the wasAssociatedWith(id; a,ag,-,attrs) statement.
The identifier field id is a KEY for the actedOnBehalfOf(id; ag2,ag1,a,attrs) statement.
The identifier field id is a KEY for the wasInfluencedBy(id; o2,o1,attrs) statement.

Entities may have multiple generation or invalidation events (either or both may, however, be left implicit). An entity has exactlycan be generated by more than one activity, with one generation and invalidation event (either or both may, however,per each entity-activity pair. These events must be left implicit). So, PROV-DM allows for two distinct generations g1 and g2 referencing the same entity provided they occur simultaneouslysimultaneous, as required by Constraint 41 (generation-generation-ordering) and Constraint 42 (invalidation-invalidation-ordering). This implies that the two generation events are actually the same and caused by the same activity, though provenance may contain several statements for the same world activity.

Given an entity denoted by e, two activities denoted by a1 and a2, two time instants t1 and t2, and two sets of attribute-value pairs attrs1 and attrs2,

Constraint 25 (unique-generation)

IF wasGeneratedBy(id1, e, a1, t1, attrs1) and wasGeneratedBy(id2, e, a2, t2, attrs2) exist,wasGeneratedBy(gen1; e,a,_t1,_attrs1) and wasGeneratedBy(gen2; e,a,_t2,_attrs2), THEN id1gen1 =id2, a1=a2, t1=t2 and attrs1=attrs2 gen2. Wouldn't the above constraint violate uniqueness? Invalidation uniqueness? A generation can be used to indicate a generation time without having to specify the involved activity. A generation time is unique, as specified by the following constraint.

Seems redundant given generation-uniqueness Given an entity denoted by e and two time instants t1 and t2, IF wasGeneratedBy(e, -, t1) and wasGeneratedBy(e, -, t2) hold, THEN t1=t2.

Constraint 26 (unique-invalidation)

IF wasInvalidatedBy(inv1; e,a,_t1,_attrs1) and wasInvalidatedBy(inv2; e,a,_t2,_attrs2), THEN inv1 = inv2.

It follows from the above uniqueness and key constraints that the generation and invalidation events linking an entity and activity are unique, if specified. However, because we apply the constraints by merging, it is possible for a valid PROV instance to contain multiple statements about the same generation or invalidation event, for example:

wasGeneratedBy(id1; e,a,-,[prov:location="Paris"])
wasGeneratedBy(-; e,a,-,[color="Red"])

When the uniqueness and key constraints are applied, the instance is normalized to the following form:

wasGeneratedBy(id1; e,a,_t,[prov:location="Paris",color="Red"])

where _t is a new existential variable.

An activity may have more than one start and end event, each having a different activity (either or both may, however, be left implicit). However, the triggering entity linking any two activities in a start or end event is unique. That is, an activity may be started by several other activities, with shared or separate triggering entities. If an activity is started or ended by multiple events, they must all be simultaneous, as specified in Constraint 33 (start-start-ordering) and Constraint 34 (end-end-ordering).

Constraint 27 (unique-wasStartedBy)

IF wasStartedBy(start1; a,_e1,a0,_t1,_attrs1) and wasStartedBy(start2; a,_e2,a0,_t2,_attrs2), THEN start1 = start2.

Constraint 28 (unique-wasEndedBy)

IF wasEndedBy(end1; a,_e1,a0,_t1,_attrs1) and wasEndedBy(end2; a,_e2,a0,_t2,_attrs2), THEN end1 = end2.

An activity start event is the instantaneous event that marks the instant an activity starts. It allows for an optional time attribute. Activities also allow for an optional start time attribute. If both are specified, they must be the same, as expressed by the following constraint.

Constraint 29 (unique-startTime)

IF activity(a,t1,t2,-) and wasStartedBy(id,a,e,t,-)activity(a2,t1,_t2,_attrs) and wasStartedBy(_start; a2,_e,_a1,t,_attrs), THEN t1=t=t1.

An activity end event is the instantaneous event that marks the instant an activity ends. It allows for an optional time attribute. Activities also allow for an optional end time attribute. If both are specified, they must be the same, as expressed by the following constraint.

Constraint 30 (unique-endTime)

IF activity(a,t1,t2,-) and wasEndedBy(id,a,e,t,-)activity(a2,_t1,t2,_attrs) and wasEndedBy(_end; a2,_e,_a1,t,_attrs1), THEN t2 = t .

Note: The following constraint is associated with a feature "at risk" and may be removed from this specification based on feedback. Please send feedback to public-prov-comments@w3.org.

An entity can be the subject of at most one mention relation.

Constraint 31 (unique-mention)

IF mentionOf(e, e1, b1) and mentionOf(e, e2, b2), THEN e1= t2e2 and b1=b2.

5.2 Event Ordering Constraints

Given that provenance consists of a description of past entities and activities, valid provenance instances must satisfy ordering constraints between instantaneous events, which we introduceare introduced in this section. For instance, an entity can only be used after it was generated; hence, we say thatin other words, an entity's generation event precedes any of this entity's usage events. Should this ordering constraint be violated, the associated generation and usage couldwould not be credible. The rest of this section defines the temporal interpretation of provenance instances as a set of instantaneous event ordering constraints.

To allow for minimalistic clock assumptions, like Lamport [CLOCK], PROV-DMPROV relies on a notion of relative ordering of instantaneous events, without using physical clocks. This specification assumes that a partial orderpreorder exists between instantaneous events.

Specifically, precedes is a partial order preorder between instantaneous events. When we sayA constraint of the form e1 precedes e2, this means that either the two events are equale1 happened at the same time as or e1 happened before e2. For symmetry, follows is defined as the inverse of precedes; that is, whena constraint of the form e1 follows e2 means that e1 happened at the same time as or after e2. Both relations are preorders, meaning that they are reflexive and transitive. Moreover, we sometimes consider strict forms of these orders: we say e1 followsstrictly precedes e2 to indicate that e1 happened before e2, this means that either the two events are equal or e1 happened after e2. Both relations are partial orders, meaning that they are reflexive, transitive, and antisymmetric. but not at the same time. This is a transitive relation.

Do we want

PROV also allows for time observations to allow an event to "precede" itself? Perhaps precedes should be strict. The following discussion is unclear: what is being said here, and why? PROV-DM also allowsinserted in specific provenance statements, for time observations to be inserted in specific provenance statements, for each of the five kinds of instantaneous events introduced in this specification. Times in provenance records arising from different sources might be with respect to different timelines (e.g. different time zones) leading to apparent inconsistencies. For the purpose of checking ordering constraints, the times associated with events are irrelevant; thus, there is no inference that time ordering implies event ordering, or vice versa. However, an application may flag time values that appear inconsistent with the event ordering as possible inconsistencies. When generating provenance, an application should use a consistent imeline for related PROV statements within an instance.

A typical ordering constraint is as follows.

Constraint-example NNN (ordering-example)

IF hyp₁ and ... and hyp_n THEN evt1 precedes/strictly precedes evt2.

The presence conclusion of an ordering constraint is either precedes or strictly precedes. One way to check ordering constraints is to generate all precedes and strictly precedes relationships arising from the ordering constraints to form a time observation fordirected graph, with edges marked precedes or strictly precedes, and check that there is no cycle containing a given instantaneous event fixes the mapping of this instantaneous event to the timeline. The presence of time information in a provenance statement instantiates the ordering constraint with that time information. It is expected that such instantiated constraints can help corroborate provenance information. We anticipate that verification algorithms could be developed, though this verification is outside the scope of this specification. strictly precedes edge.

The following figure summarizes the ordering constraints in a graphical manner. For each subfigure, an event time line points to the right. Activities are represented by rectangles, whereas entities are represented by circles. Usage, generation and derivation are represented by the corresponding edges between entities and activities. The five kinds of instantaneous events are represented by vertical dotted lines (adjacent to the vertical sides of an activity's rectangle, or intersecting usage and generation edges). The ordering constraints are represented by triangles: an occurrence of a triangle between two instantaneous event vertical dotted lines represents that the event denoted by the left line precedes the event denoted by the right line. Summary of instantaneous event ordering constraints for activities

5.2.1 Activity constraints

In thisThis section we discussspecifies ordering constraints from the perspective of the lifetime of an activity. An activity starts, then during its lifetime uses, generates entities and communicatescan use, generate or invalidate entities, communicate with, start, or end other activities, or be associated with or starts other activities, agents, and finally it ends. The following constraints amount to checking that all of the events associated with an activity take place within the activity's lifetime, and the start and end events mark the start and endpoints of its lifetime.

Figure 3 summarizes the ordering constraints on activities in a graphical manner. For this and subsequent figures, an event time line points to the right. Activities are represented by rectangles, whereas entities are represented by circles. Usage, generation and invalidation are represented by the corresponding edges between entities and activities. The five kinds of instantaneous events are represented by vertical dotted lines (adjacent to the vertical sides of an activity's rectangle, or intersecting usage and generation edges). The ordering constraints are represented by triangles: an occurrence of a triangle between two instantaneous event vertical dotted lines represents that the event denoted by the left line precedes the event denoted by the right line.

Miscellaneous suggestions about figures (originally from Tim Lebo):

I think it would help if the "corresponding edges between entities and activities" where the same visual style as the vertical line marking the time the Usage, generation and derivation occurred. A matching visual style provides a Gestalt that matches the concept. I am looking at subfigures b and c in 5.2.

Figure 3^◊: Summary of instantaneous event ordering constraints for activities

The existence of an activity implies that the activity start event always precedes the corresponding activity end event. This is illustrated by Subfigure ordering-activity-fig Figure 3 (a) and expressed by constraint start-precedes-endConstraint 32 (start-precedes-end).

Constraint 32 (start-precedes-end)

IF wasStartedBy(start,a,-,-)wasStartedBy(start; a,_e1,_a1,_t1,_attrs1) and wasEndedBy(end,a,-,-)wasEndedBy(end; a,_e2,_a2,_t2,_attrs2) THEN start precedes end.

If an activity is started by more than one activity, the events must all be simultaneous. The following constraint requires that if there are two start events that start the same activity, then one precedes end the other. Using this constraint in both directions means that each event precedes the other.

Constraint 33 (start-start-ordering)

IF wasStartedBy(start1; a,_e1,_a1,_t1,_attrs1) and wasStartedBy(start2; a,_e2,_a2,_t2,_attrs2) THEN start1 precedes start2.

If an activity is ended by more than one activity, the events must all be simultaneous. The following constraint requires that if there are two end events that end the same activity, then one precedes the other. Using this constraint in both directions means that each event precedes the other, that is, they are simultaneous.

Constraint 34 (end-end-ordering)

IF wasEndedBy(end1; a,_e2,_a1,_t1,_attrs1) and wasEndedBy(end2; a,_e2,_a2,_t2,_attrs2) THEN end1 precedes end2.

A usage implies ordering of events, since the usage event had to occur during the associated activity. This is illustrated by Subfigure ordering-activity-figFigure 3 (b) and expressed by constraint usage-within-activityConstraint 35 (usage-within-activity).

Constraint 35 (usage-within-activity)

IF used(use,a,e,-,-) and wasStartedBy(start,a,-,-)wasStartedBy(start; a,_e1,_a1,_t1,_attrs1) and used(use; a,_e2,_t2,_attrs2) THEN start precedes use.
IF used(use,a,e,-,-)used(use; a,_e1,_t1,_attrs1) and wasEndedBy(end,a,-,-)wasEndedBy(end; a,_e2,_a2,_t2,_attrs2) THEN use precedes end.

A generation implies ordering of events, since the generation event had to occur during the associated activity. This is illustrated by Figure 3 (c) and expressed by Constraint 36 (generation-within-activity).

Constraint 36 (generation-within-activity)

IF wasStartedBy(start; a,_e1,_a1,_t1,_attrs1) and wasGeneratedBy(gen; _e2,a,_t2,_attrs2) THEN start precedes gen.
IF wasGeneratedBy(gen; _e,a,_t,_attrs) and wasEndedBy(end; a,_e1,_a1,_t1,_attrs1) THEN gen precedes end.

Communication between two activities a1 and a2 also implies ordering of events, since some entity must have been generated by the former and used by the latter, which implies that the start event of a1 cannot follow the end event of a2. This is illustrated by Figure 3 (d) and expressed by Constraint 37 (wasInformedBy-ordering).

Constraint 37 (wasInformedBy-ordering)

IF wasInformedBy(_id; a2,a1,_attrs) and wasStartedBy(start; a1,_e1,_a1',_t1,_attrs1) and wasEndedBy(end; a2,_e2,_a2',_t2,_attrs2) THEN start precedes end.

5.2.2 Entity constraints

The figure(s) in this section should have vertical lines with visual styles that match the diagonal arrow that they go with.

As with activities, entities have lifetimes: they are generated, then can be used, other entities can be derived from them, and finally they can be invalidated. The constraints on these events are illustrated graphically in Figure 4 and Figure 5.

Figure 4^◊: Summary of instantaneous event ordering constraints for entities

Generation of an entity precedes its invalidation. (This follows from other constraints if the entity is used, but it is stated explicitly here to cover the case of an entity that is generated and invalidated without being used.)

Constraint 38 (generation-precedes-invalidation)

IF wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv; e,_a2,_t2,_attrs2) THEN gen precedes inv.

A usage and a generation for a given entity implies ordering of events, since the generation event had to precede the usage event. This is illustrated by Figure 4(a) and expressed by Constraint 39 (generation-precedes-usage).

Constraint 39 (generation-precedes-usage)

IF wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and used(use; _a2,e,_t2,_attrs2) THEN gen precedes use.

All usages of an entity precede its invalidation, which is captured by Constraint 40 (usage-precedes-invalidation) (without any explicit graphical representation).

Constraint 40 (usage-precedes-invalidation)

IF used(use; _a1,e,_t1,_attrs1) and wasInvalidatedBy(inv; e,_a2,_t2,_attrs2) THEN use precedes inv.

If an entity is generated by more than one activity, the events must all be simultaneous. The following constraint requires that if there are two generation events that generate the same entity, then one precedes the other. Using this constraint in both directions means that each event precedes the other.

Constraint 41 (generation-generation-ordering)

A generation implies ordering of events, since the generation event had to occur during the associated activity. This is illustrated by Subfigure ordering-activity-fig (c) and expressed by constraint generation-within-activity. IF wasGeneratedBy(gen,a,e,-,-)wasGeneratedBy(gen1; e,_a1,_t1,_attrs1) and wasStartedBy(start,a,-,-)wasGeneratedBy(gen2; a,_a2,_t2,_attrs2) THEN startgen1 precedes gengen2. IF wasGeneratedBy(gen,a,e,-,-) and wasEndedBy(end,a,-,-) THEN gen precedes end. Communication between two activities a1 and a2 also implies ordering of events, since some entity must have been generated by the former and used by the latter, which implies that the start event of a1 cannot follow the end event of a2. This is illustrated by Subfigure ordering-activity-fig (d) and expressed by constraint wasInformedBy-ordering.

IF wasInformedBy(a2,a1) and wasStartedBy(start,a1,-,-) and wasEndedBy(end,a2,-,-) THEN start precedes end.

If an entity is invalidated by more than one activity, the events must all be simultaneous. The following constraint requires that if there are two invalidation events that invalidate the same entity, then one precedes the other. Using this constraint in both directions means that each event precedes the other, that is, they are simultaneous.

Constraint 42 (invalidation-invalidation-ordering)

Start IF wasInvalidatedBy(inv1; a,_e2,_a1,_t1,_attrs1) and wasInvalidatedBy(inv2; a,_e2,_a2,_t2,_attrs2) THEN inv1 precedes inv2.

If there is a derivation relationship linking e2 and e1, then this means that the entity e1 had some influence on the entity e2; for this to be possible, some event ordering must be satisfied. First, we consider derivations, where the activity and usage are known. In that case, the usage of a2e1 has to precede the generation of e2. This is illustrated by activity a1 also implies ordering of events, since a1 must have been active before a2 started. This is illustrated by Subfigure ordering-activity-fig (e)Figure 4 (b) and expressed by constraint wasStartedByActivity-ordering Constraint 43 (derivation-usage-generation-ordering).

Constraint 43 (derivation-usage-generation-ordering)

In this constraint, _a, gen2, use1 must not be placeholders.

IF wasStartedByActivity(a2,a1) and wasStartedBy(start1,a1,-,-) and wasStartedBy(start2,a2,-,-)wasDerivedFrom(_d; _e2,_e1,_a,gen2,use1,_attrs) THEN start1use1 precedes start2gen2.

5.2.2 Entity constraints

As with activities, entities have lifetimes: they are generated, then can be used, revised, or other entities can be derived from them, and finally are invalidated.

Summary of instantaneous event ordering constraints for entities

Generation of an entity precedes

When the activity, generation or usage is unknown, a similar constraint exists, except that the constraint refers to its invalidation. (This follows from other constraints if the entity is used, but we state it explicitly to cover the case of an entity that is generated and invalidated without being used.) generation event, as illustrated by Figure 4 (c) and expressed by Constraint 44 (derivation-generation-generation-ordering).

Constraint 44 (derivation-generation-generation-ordering)

In this constraint, any _a, _g, _u may be placeholders.

IF wasGeneratedBy(gen,e,_,_) and wasInvalidatedBy(inv,e,-,-) wasDerivedFrom(_d; e2,e1,_a,_g,_u,attrs) and wasGeneratedBy(gen1; e1,_a1,_t1,_attrs1) and wasGeneratedBy(gen2; e2,_a2,_t2,_attrs2) THEN gengen1 strictly precedes inv. A usage and a generation for a given entity implies ordering of events, since the generation event had to precede the usage event. This is illustrated by Subfigure ordering-entity-fig (a) and expressed by constraint generation-precedes-usage. IF wasGeneratedBy(gen,e,_,_) and used(use,_,e,-) THEN gen precedes usegen2.

This constraint requires the derived entity to be generated strictly following the generation of the original entity. This follows from the [PROV-DM] definition of derivation: A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity, thus the derived entity must be newer than the original entity.

The event ordering is between generations of e1 and e2, as opposed to derivation where usage is known, which implies ordering between the usage of e1 and generation of e2.

All usages

The entity that triggered the start of an entity precede its invalidation, whichactivity must exist before the activity starts. This is captured illustrated by constraint usage-precedes-invalidation (without any explicit graphical representation). IF used(use,_,e,-) and wasInvalidatedBy(inv,e,_,_) THEN use precedes inv. If there is a derivation between e2 and e1, then this means that the entity e1 had some form of influence on the entity e2; for this to be possible, some event ordering must be satisfied. First, we consider derivations, where the activity and usage are known. In that case, the usage of e1 has to precede the generation of e2. This is illustrated by Subfigure ordering-entity-fig (b)Figure 5(a) and expressed by constraint derivation-usage-generation-ordering. IF wasDerivedFrom(d,e2,e1,a,g2,u1,-) THEN u1 precedes g2. When the usage is unknown, a similar constraint exists, except that the constraint refers to its generation event, as illustrated by Subfigure ordering-entity-fig (c) and expressed by constraint derivation-generation-generation-ordering. IF wasDerivedFrom(e2,e1,attrs) and wasGeneratedBy(gen1,e1,_,_) and wasGeneratedBy(gen2,e2,_,_) THEN gen1 precedes gen2. Note that event ordering is between generations of e1 and e2, as opposed to derivation where usage is known, which implies ordering ordering between the usage of e1 and generation of e2. The entity that triggered the start of an activity must exist before the activity starts. This is illustrated by Subfigure ordering-entity-trigger-fig (a) and expressed by constraint wasStartedBy-orderingConstraint 45 (wasStartedBy-ordering).

Constraint 45 (wasStartedBy-ordering)

IF wasStartedBy(start,a,e,-)wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and wasGeneratedBy(gen,e,-,-)wasStartedBy(start; _a,e,_a2,_t2,_attrs2) THEN gen precedes start.
IF wasStartedBy(start,a,e,-)wasStartedBy(start; _a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv,e,-,-)wasInvalidatedBy(inv; e,_a2,_t2,_attrs2) THEN start precedes inv.

Similarly, the entity that triggered the end of an activity must exist before the activity ends, as illustrated by Subfigure ordering-entity-trigger-fig Figure 5(b).

Constraint 46 (wasEndedBy-ordering)

IF wasEndedBy(end,a,e,-)wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and wasGeneratedBy(gen,e,-,-) wasEndedBy(end; _a,e,_a2,_t2,_attrs2) THEN gen precedes end.
IF wasEndedBy(end,a,e,-)wasEndedBy(end; _a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv,e,-,-)wasInvalidatedBy(inv; e,_a2,_t2,_attrs2) THEN end precedes inv.

Figure 5^◊: Summary of instantaneous event ordering constraints for trigger entities

If an entity is a specialization of another, then the more specific entity must have been generated after the less specific entity was generated.

Constraint 47 (specialization-generation-ordering)

IF specializationOf(e2,e1) and wasGeneratedBy(gen1; e1,_a1,_t1,_attrs1) and wasGeneratedBy(gen2; e2,_a2,_t2,_attrs2) THEN gen1 precedes inv gen2.

Summary

Similarly, if an entity is a specialization of instantaneousanother entity, and then the invalidation event ordering constraints for trigger entities of the more specific entity precedes that of the less specific entity.

Constraint 48 (specialization-invalidation-ordering)

IF specializationOf(e1,e2) and wasInvalidatedBy(inv1; e1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv2; e2,_a2,_t2,_attrs2) THEN inv1 precedes inv2.

5.2.3 Agent constraints

Like entities and activities, agents have lifetimes that follow a familiar pattern:pattern. An agent that is also an entity can be generated and invalidated; an agent that is generated,also an activity can be started or ended. During its lifetime, an agent can participate in interactions such as starting,starting or ending other activities, association with an activity, attribution, or association with an activity, attribution, or delegation, and finally the agent is invalidated. delegation.

Further constraints associated with agents appear in Figure ordering-agents6 and are discussed below.

Figure 6^◊: Summary of instantaneous event ordering constraints (continued) for agents

An activity that was associated with an agent must have some overlap with the agent. The agent must have been generated (or started), or must have become associated with the activity, after the activity start: so, the agent must exist before the activity end. Likewise, the agent may be generated,destructed (or ended), or may only become associatedterminate its association with the activity, before the activity end: hence, the agent invalidation (or end) is required to happen after the activity start: so, the agentstart. This is required to exist before the activity end. Likewise, the agent may be destructed, or may terminate its association with the activity, before the activity end: hence, the agent invalidation is required to happen after the activity start. This is illustrated by Subfigure ordering-agentsFigure 6 (a) and expressed by constraint wasAssociatedWith-orderingConstraint 49 (wasAssociatedWith-ordering).

Constraint 49 (wasAssociatedWith-ordering)

In the following inferences, _pl may be a placeholder -.

IF wasAssociatedWith(a,ag)wasAssociatedWith(_assoc; a,ag,_pl,_attrs) and wasStartedBy(start,a,-,-)wasStartedBy(start1; a,_e1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv,ag,-,-)wasInvalidatedBy(inv2; ag,_a2,_t2,_attrs2) THEN startstart1 precedes invinv2.
IF wasAssociatedWith(a,ag)wasAssociatedWith(_assoc; a,ag,_pl,_attrs) and wasGeneratedBy(gen,ag,-,-)wasGeneratedBy(gen1; ag,_a1,_t1,_attrs1) and wasEndedBy(end,a,-,-)wasEndedBy(end2; a,_e2,_a2,_t2,_attrs2) THEN gengen1 precedes end. An entity that was attributed to an agent must have some overlap with the agent. The agent is required to exist before the entity invalidation. Likewise, the entity generation must precede the agent destruction. This is illustrated by Subfigure ordering-agents (b) and expressed by constraint wasAttributedTo-ordering. IF wasAttributedTo(e,ag) and wasGeneratedBy(gen,e,-,-) and wasInvalidatedBy(inv,ag,-,-) THEN gen precedes invend2.
IF wasAttributedTo(e,ag)wasAssociatedWith(_assoc; a,ag,_pl,_attrs) and wasGeneratedBy(gen,ag,-,-)wasStartedBy(start1; a,_e1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv,e,-,-)wasEndedBy(end2; ag,_e2,_a2,_t2,_attrs2) THEN genstart1 precedes invend2.
IF wasAssociatedWith(_assoc; a,ag,_pl,_attrs) and wasStartedBy(start1; ag,_e1,_a1,_t1,_attrs1) and wasEndedBy(end2; a,_e2,_a2,_t2,_attrs2) THEN start1 precedes end2.

An agent to which an entity was attributed, must exist before this entity was generated. This is illustrated by Figure 6 (b) and expressed by Constraint 50 (wasAttributedTo-ordering).

Constraint 50 (wasAttributedTo-ordering)

IF wasAttributedTo(_at; e,ag,_attrs) and wasGeneratedBy(gen1; ag,_a1,_t1,_attrs1) and wasGeneratedBy(gen2; e,_a2,_t2,_attrs2) THEN gen1 precedes gen2.
IF wasAttributedTo(_at; e,ag,_attrs) and wasStartedBy(start1; ag,_e1,_a1,_t1,_attrs1) and wasGeneratedBy(gen2; e,_a2,_t2,_attrs2) THEN start1 precedes gen2.

For responsibility,delegation, two agents need to have some overlap in their lifetime.

Constraint 51 (actedOnBehalfOf-ordering)

IF actedOnBehalfOf(ag2,ag1)actedOnBehalfOf(_del; ag2,ag1,_a,_attrs) and wasGeneratedBy(gen,ag1,-,-)wasGeneratedBy(gen1; ag1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv,ag2,-,-)wasInvalidatedBy(inv2; ag2,_a2,_t2,_attrs2) THEN gengen1 precedes invinv2.
IF actedOnBehalfOf(_del; ag2,ag1,_a,_attrs) and wasStartedBy(start1; ag1,_e1,_a1,_t1,_attrs1) and wasEndedBy(end2; ag2,_e2,_a2,_t2,_attrs2) THEN start1 precedes end2.

5.3 Type Constraints

The following rule establishes types denoted by identifiers from their use within expressions. The function typeOf gives the set of types denoted by an identifier. That is, typeOf(e) returns the set of types associated with identifier e. The function typeOf is not a term of PROV, but a construct introduced to validate PROV statements.

For any identifier id, typeOf(id) is a subset of {'entity', 'activity', 'agent', 'prov:Collection', 'prov:EmptyCollection'}. For identifiers that do not have a type, typeOf gives the empty set. Identifiers can have more than one type, because of subtyping (e.g. 'prov:EmptyCollection' is a subtype of 'prov:Collection') or because certain types are not disjoint (such as 'agent' and 'entity'). The set of types does not reflect all of the distinctions among objects, only those relevant for checking validity. In particular, subtypes such as 'plan' and 'bundle' are omitted, and statements such as wasAssociatedWith and mentionOf that have plan or bundle parameters only check that these parameters are entities.

To check if a PROV instance satisfies type constraints, one obtains the types of identifiers by application of Constraint 52 (typing) and check that none of the impossibility constraints Constraint 57 (entity-activity-disjoint) and Constraint 58 (membership-empty-collection) are violated as a result.

Constraint 52 (typing)

IF entity(e,attrs) THEN 'entity' ∈ typeOf(e).
IF agent(ag,attrs) THEN 'agent' ∈ typeOf(ag).
IF activity(a,attrs) THEN 'activity' ∈ typeOf(a).
IF used(u; a,e,t,attrs) THEN 'activity' ∈ typeOf(a) AND 'entity' ∈ typeOf(e).
IF wasGeneratedBy(gen; e,a,t,attrs) THEN 'entity' ∈ typeOf(e) AND 'activity' ∈ typeOf(a).
IF wasInformedBy(id; a2,a1,attrs) THEN 'activity' ∈ typeOf(a2) AND 'activity' ∈ typeOf(a1).
IF wasStartedBy(id; a2,e,a1,t,attrs) THEN 'activity' ∈ typeOf(a2) AND 'entity' ∈ typeOf(e) AND 'activity' ∈ typeOf(a1).
IF wasEndedBy(id; a2,e,a1,t,attrs) THEN 'activity' ∈ typeOf(a2) AND 'entity' ∈ typeOf(e) AND 'activity' ∈ typeOf(a1).
IF wasInvalidatedBy(id; e,a,t,attrs) THEN 'entity' ∈ typeOf(e) AND 'activity' ∈ typeOf(a).
IF wasDerivedFrom(id; e2, e1, a, g2, u1, attrs) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1) AND 'activity' ∈ typeOf(a). In this constraint, a, g2, and u1 must not be placeholders.
IF wasDerivedFrom(id; e2, e1, -, -, -, attrs) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1).
IF wasAttributedTo(id; e,ag,attr) THEN 'entity' ∈ typeOf(e) AND 'agent' ∈ typeOf(ag).
IF wasAssociatedWith(id; a,ag,pl,attrs) THEN 'activity' ∈ typeOf(a) AND 'agent' ∈ typeOf(ag) AND 'entity' ∈ typeOf(pl). In this constraint, pl must not be a placeholder.
IF wasAssociatedWith(id; a,ag,-,attrs) THEN 'activity' ∈ typeOf(a) AND 'agent' ∈ typeOf(ag).
IF actedOnBehalfOf(id; ag2,ag1,a,attrs) THEN 'agent' ∈ typeOf(ag2) AND 'agent' ∈ typeOf(ag1) AND 'activity' ∈ typeOf(a).
IF alternateOf(e2, e1) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1).
IF specializationOf(e2, e1) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1).
IF mentionOf(e2,e1,b) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1) AND 'entity' ∈ typeOf(b).
IF hadMember(c,e) THEN 'prov:Collection' ∈ typeOf(c) AND 'entity' ∈ typeOf(c) AND 'entity' ∈ typeOf(e).
IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ typeOf(c) AND 'prov:Collection' ∈ typeOf(c)AND 'prov:EmptyCollection' ∈ typeOf(c).

5.4 Impossibility constraints

Impossibility constraints require that certain patterns of statements never appear in valid PROV instances. Impossibility constraints have the following general form:

Constraint-example NNN (impossible-example)

IF hyp₁ and ... and hyp_n THEN INVALID.

Checking an impossibility constraint on instance I means checking whether there is any way of matching the pattern hyp₁, ..., hyp_n. If there is, then checking the constraint on I fails (which implies that I is invalid).

A derivation with unspecified activity wasDerivedFrom(id;e1,e2,-,g,u,attrs) represents a derivation that takes one or more steps, whose activity, generation and use events are unspecified. It is forbidden to specify a generation or use event without specifying the activity.

Constraint 53 (impossible-unspecified-derivation-generation-use)

In the following rules, g and u must not be -.

IF wasDerivedFrom(_id;_e2,_e1,-,g,-,attrs) THEN INVALID.
IF wasDerivedFrom(_id;_e2,_e1,-,-,u,attrs) THEN INVALID.
IF wasDerivedFrom(_id;_e2,_e1,-,g,u,attrs) THEN INVALID.

As noted previously, specialization is a strict partial order: it is irreflexive and transitive.

Constraint 54 (impossible-specialization-reflexive)

IF specializationOf(e,e) THEN INVALID.

Furthermore, identifiers of basic relationships are disjoint.

Constraint 55 (impossible-property-overlap)

For each r and s in { used, wasGeneratedBy, wasInvalidatedBy, wasStartedBy, wasEndedBy, wasInformedBy, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf} such that r and s are different relation names, the following constraint holds:

IF r(id; a₁,...,a_n) and s(id; b₁,...,b_n) THEN INVALID.

Since wasInfluencedBy is a superproperty of many other properties, it is excluded from the set of properties whose identifiers are required to be pairwise disjoint. The following example illustrates this observation:

wasInfluencedBy(id;e2,e1)
wasDerivedFrom(id;e2,e1)

This satisfies the disjointness constraint.

There is, however, no constraint requiring that every influence relationship is accompanied by a more specific relationship having the same identifier. The following valid example illustrates this observation:

wasInfluencedBy(id; e2,e1)

This is valid; there is no inferrable information about what kind of influence relates e2 and e1, other than its identity.

Identifiers of entities, agents and activities cannot also be identifiers of properties.

Constraint 56 (impossible-object-property-overlap)

For each p in {entity, activity or agent} and for each r in { used, wasGeneratedBy, wasInvalidatedBy, wasInfluencedBy, wasStartedBy, wasEndedBy, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf}, the following impossibility constraint holds:

IF p(id,a₁,...,a_n) and r(id; b₁,...,b_n) THEN INVALID.

The set of entities and activities are disjoint, expressed by the following constraint:

Constraint 57 (entity-activity-disjoint)

IF 'entity' ∈ typeOf(id) AND 'activity' ∈ typeOf(id) THEN INVALID.

There is no disjointness between entities and agents. This is because one might want to make statements about the provenance of an agent, by making it an entity. For example, one can assert both entity(a1) and agent(a1) in a valid PROV instance. Similarly, there is no disjointness between activities and agents, and one can assert both activity(a1) and agent(a1) in a valid PROV instance. However, one should keep in mind that some specific types of agents may not be suitable as activities. For example, asserting statements such as agent(Bob, [type=prov:Person]) and activity(Bob) is discouraged. In these cases, disjointness can be ensured by explicitly asserting the agent as both agent and entity, and applying Constraint 57 (entity-activity-disjoint).

An empty collection cannot contain any member, expressed by the following constraint:

Constraint 58 (membership-empty-collection)

IF hasMember(c,e) and 'prov:EmptyCollection' ∈ typeOf(c) THEN INVALID.

6. Collection ConstraintsNormalization, Validity, and Equivalence

Work on collections and on

We define the notions of normalization, validity and equivalence of PROV documents and instances. We first define these constraints is deferred until after the next working draft, so this section may not be stable. Membership is a convenience notation, since it can be expressed in terms of an insertion into some collection. The membership definition is formalized by constraint membership-as-insertion.concepts for PROV instances and then extend them to PROV documents.

memberOf(c, {(k1, v1), ...}) holds IF AND ONLY IF there exists

6.1 Instances

Implementations should decide up front what reasoning about co-reference should be applied, and rewrite the instance (by replacing co-referent identifiers with a collection c0, suchsingle common identifier) to make this explicit, before doing validation, equivalence checking, or normalization. All of the following definitions assume that derivedByInsertionFrom(c, c0, {(k1, v1), ...}). the application has already determined which URIs in the PROV instance are co-referent (e.g. owl:sameAs as a result of OWL reasoning).

We define the normal form of a PROV instance as the set of provenance statements resulting from applying all definitions, inferences, and uniqueness constraints.

Apply all definitions to I by replacing each defined statement by its definition (possibly introducing fresh existential variables in the process), yielding an instance I₁.
Apply all inferences to I₁ by adding the conclusion of each inference whose hypotheses are satisfied and whose entire conclusion does not already hold (again, possibly introducing fresh existential variables), yielding an instance I₂.
Apply all uniqueness constraints to I₂ by merging terms or statements and applying the resulting substitution to the instance, yielding an instance I₃. If some uniqueness constraint cannot be applied, then normalization fails.
If no definitions, inferences, or uniqueness constraints can be applied to instance I₃, then I₃ is the normal form of I.
Otherwise, the normal form of I is the same as the normal form of I₃ (that is, proceed by recursively normalizing I₃).

Because of the potential interaction among definitions, inferences, and constraints, the above algorithm is recursive. Nevertheless, all of our constraints fall into a class of tuple-generating dependencies and equality-generating dependencies that satisfy a termination condition called weak acyclicity that has been studied in the context of relational databases [DBCONSTRAINTS]. Therefore, the above algorithm terminates, independently of the order in which inferences and constraints are applied. Appendix C gives a proof that normalization terminates and produces a unique (up to isomorphism) normal form.

A collection mayPROV instance is valid if its normal form exists and satisfies all of the validity constraints; this implies that the instance satisfies all of the definitions, inferences, and constraints. The following algorithm can be used to test validity:

Normalize the instance I, obtaining normalized instance I'. If normalization fails, then I is not valid.
Apply all event ordering constraints to I' to build a graph G whose nodes are event identifiers and edges are labeled by "precedes" and "strictly precedes" relationships among events induced by the constraints.
Determine whether there is a cycle in G that contains a "strictly precedes" edge. If so, then I is not valid.
Apply the type constraints (section 5.3) to determine whether there are any violations of disjointness. If so, then I is not valid.
Check that none of the impossibility constraints (section 5.4) are violated. If any are violated, then I is not valid. Otherwise, I is valid

A normal form of a PROV instance does not exist when a uniqueness constraint fails due to merging failure.

Two PROV instances are equivalent if they have isomorphic normal forms (that is, after applying all possible inference rules, the two instances produce the same set of PROV statements, up to reordering of statements and attributes within attribute lists, and renaming of existential variables). Equivalence has the following characteristics:

The order of provenance statements is irrelevant to the meaning of a PROV instance. That is, a PROV instance is equivalent to any other instance obtained by insertion reordering its statements.
The order of attribute-value pairs in attribute lists is irrelevant to the meaning of a PROV statement. That is, a PROV statement carrying attributes is equivalent to any other statement obtained by reordering attribute-value pairs and eliminating duplicate pairs.
The particular choices of names of existential variables are irrelevant to the meaning of an instance; that is, the names can be renamed without changing the meaning, as long as different names are always replaced with different names. (Replacing two different names with equal names, however, can change the meaning, so does not preserve equivalence.)
Applying inference rules, definitions, and uniqueness constraints preserves equivalence. That is, a PROV instance is equivalent to the instance obtained by applying any inference rule or removal,definition, or saidby merging two statements to satisfy the membership relation. To provide an interpretation enforce a uniqueness constraint.
Equivalence is reflexive, symmetric, and transitive.

An application that processes PROV data should handle equivalent instances in the same way. (Common exceptions to this rule include, for example, pretty-printers that seek to preserve the original order of collections, PROV-DM restricts one collection to be involvedstatements in a single derivation by insertion or removal, or to one membership relation. PROV-DM does not provide an interpretation for statements that consist of two (or more) insertion, removal, membership relations that result in the same collection.file and avoid expanding inferences.)

The following constraint ensures unique derivation. The following constraint is unclear. A collection must not be derived through multiple insertions, removal, or membership relations.

Consider the following statements about three collections. entity(c1, [prov:type="prov:Collection" %% xsd:QName]) entity(c2, [prov:type="prov:Collection" %% xsd:QName]) entity(c3, [prov:type="prov:Collection" %% xsd:QName]) derivedByInsertionFrom(c3, c1, {("k1", e1), ("k2", e2)}) derivedByInsertionFrom(c3, c2, {("k3", e3)}) There is no interpretation for such statements since c3 is derived multiple times by insertion. As a particular case, collection c is derived multiple times from the same c1. derivedByInsertionFrom(id1, c, c1, {("k1", e1), ("k2", e2)}) derivedByInsertionFrom(id2, c, c1, {("k3", e3), ("k4", e4)}) The interpretation of such statements is also unspecified. To describe the insertion of the 4 key-entity pairs, one would instead write: derivedByInsertionFrom(id1, c, c1, {("k1", e1), ("k2", e2), ("k3", e3), ("k4", e4)}) The same is true for any combination of insertions, removals, and membership relations: The following statements derivedByInsertionFrom(c, c1, {("k1", e1)}) derivedByRemovalFrom(c, c2, {"k2"}) have no interpretation. Nor have the following: derivedByInsertionFrom(c, c1, {("k1", e1)}) memberOf(c, {"k2"}). 6.1 Collection branching It is allowed to have multiple derivations from a single root collection, as long as the resulting entities are distinct, as shown in the following example. entity(c0, [prov:type="prov:EmptyCollection" %% xsd:QName]) // c0 is an empty collection entity(c1, [prov:type="prov:Collection" %% xsd:QName]) entity(c2, [prov:type="prov:Collection" %% xsd:QName]) entity(c3, [prov:type="prov:Collection" %% xsd:QName]) entity(e1) entity(e2) entity(e3) derivedByInsertionFrom(c1, c0, {("k1", e1)}) derivedByInsertionFrom(c2, c0, {("k2", e2)}) derivedByInsertionFrom(c3, c1, {("k3", e3)}) From this set of statements, we conclude: c1 = { ("k1", e1) } c2 = { ("k2", e2) } c3 = { ("k1", e1), ("k3", e3)}

6.2 Collections and Weaker Derivation RelationBundles and Documents

The statedefinitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a collectionPROV document that consists only of a toplevel instance, containing all PROV statements in the top level of the document (that is, not enclosed in a named bundle). In this section, we describe how to deal with general PROV documents, possibly including multiple named bundles. Briefly, each bundle is only known to the extent that a chain handled independently; there is no interaction between bundles from the perspective of derivations starting from an empty collection can be found. Since a set of statements regarding a collection's evolution may be incomplete, so is the reconstructed state obtained by querying those statements. In general, all statements reflect partial knowledge regarding a sequence of data transformation events. In the particular case of collection evolution, in which some of the state changes may have been missed, the more generic derivation relation should be used to signal that some updates may have occurred, which cannot be expressed as insertionsapplying definitions, inferences, or removals. The following example illustrates this. In the example, the state of c2 is only partially known because the collection is constructed from partially known other collections. entity(c0, [prov:type="prov:EmptyCollection" %% xsd:QName]) // c0 is an empty collection entity(c1, [prov:type="prov:Collection" %% xsd:QName]) entity(c2, [prov:type="prov:Collection" %% xsd:QName]) entity(c3, [prov:type="prov:Collection" %% xsd:QName]) entity(e1) entity(e2) derivedByInsertionFrom(c1, c0, {("k1", e1)}) wasDerivedFrom(c2, c1) derivedByInsertionFrom(c3, c2, {("k2", e2)}) From this set of statements, we conclude: c1 = { ("k1", e1) } c2 is somehow derived from c1, but the precise sequence of updates is unknown c3 includes ("k2", e2) but the earlier "gap" leaves uncertainty regarding ("k1", e1) (it may have been removed)constraints, computing normal forms, or any other pair that may have been added as part of the derivation activities. Do the insertion/removal derivation steps imply wasDerivedFrom, wasVersionOf, alternateOf? 7. Account Constraints Work on accounts has been deferred until after the next working draft, so this section is very unstable PROV-DM allows for multiple descriptions of entities (and in general any identifiable object) to be expressed. Let us consider two statements about the same entity, which we have taken from two different contexts. A working draft published by the w3:Consortium: entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ]) The second version of a document edited by some authors: entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version="2" ]) Both statements are about the same entity identified by tr:WD-prov-dm-20111215, but they contain different attributes, describing the situationchecking validity or partial state of the these entities according to the context in which they occur. Two different statements about the same entity cannot co-exist in PROV instance as formalized in entity-unique. In some cases, there may be a requirement for two different statements concerning the same entity to be included in the same account. To satisfy the constraint entity-unique, we can adopt a different identifier for one of them, and relate the two statements with the alternateOf relation. We now reconsider the same two statements of a same entity, but we change the identifier for one of them: entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ]) entity(ex:alternate-20111215, [ prov:type="document", ex:version="2" ]) alternateOf(tr:WD-prov-dm-20111215,ex:alternate-20111215) Since we are not specifying ways to take the union of two accounts, we may drop this discussion Taking the union of two accounts is another account, formed by the union of the statements they respectively contain. We note that the resulting union may or may not invalidate some constraints: Two entity statements with the same identifier but different sets of attributes exist in each PROV instance may invalidate entity-unique in the union, unless some form of statement merging or renaming (as per Example) occurs. Structurally well-formed accounts are not closed under union because the constraint generation-uniqueness may no longer be satisfied in the resulting union. How to reconcile such accounts is beyond the scope of this specification. Material transplanted from old structural well-formedness constraints section. This example isn't very clear, since the sub-workflow-ness isn't represented in the data. According to what was written above, we should conclude that a0 and a2 are equal! In the following statements, a workflow execution a0 consists of two sub-workflow executions a1 and a2. Sub-workflow execution a2 generates entity e, so does a0. activity(a0, [prov:type="workflow execution"]) activity(a1, [prov:type="workflow execution"]) activity(a2, [prov:type="workflow execution"]) wasInformedBy(a2,a1) wasGeneratedBy(e,a0) wasGeneratedBy(e,a2) So, we have two different generations for entity e. Such an example is permitted in PROV-DM if the two activities denoted by a0 and a2 are a single thing happening in the world but described from different perspectives. While this example is permitted in PROV-DM, it does not make the inter-relation between activities explicit, and it mixes statements expressed from different perspectives together. While this may acceptable in some specific applications, it becomes challenging for inter-operability. Indeed, PROV-DM does not offer any relation describing the structure of activities. Such instances are said not to be structurally well-formed.equivalence.

Structurally well-formed provenance can We model a general PROV document, containing n named bundles b₁...b_n, as a tuple (B₀,[b₁=B₁,...,b_n=B_n]) where B₀ is the set of statements of the toplevel instance, and for each i, B_i is the set of statements of bundle b_i. Names b₁...b_n are assumed to be obtained by partitioning the generations into different accounts. distinct. This makes it clear that these generations provide alternative descriptions of the same real-world generation event, rather than describing two distinct generation eventsnotation is shorthand for the same entity. When accounts are used, the example can be encoded as follows. following PROV-N syntax:

The same example is now revisited, with the following statements that are structurally well-formed. Two accounts are introduced, and there is a single generation for entity e per account.

B₀
bundle b₁
  B₁
endBundle
...
bundle b_n
  B_n
endBundle

In The normal form of a first account, entitled "summary", we find: activity(a0,t1,t2,[prov:type="workflow execution"]) wasGeneratedBy(e,a0,-) In a second account, entitled "detail", we find: activity(a1,t1,t3,[prov:type="workflow execution"]) activity(a2,t3,t2,[prov:type="workflow execution"]) wasInformedBy(a2,a1) wasGeneratedBy(e,a2,-) Structurally well-formed provenance satisfies some constraints, which force the structurePROV document (B₀,[b₁=B₁,...,[b_n=B_n]) is (B'₀,[b₁=B'₁,...,b_n=B'_n]) where B'_i is the normal form of statements to be exposed by means of accounts. With these constraints satisfied, further inferences can be made about structurally well-formed statements. The uniqueness of generations in accounts is formulated as follows. 8. RationaleB_i for inferences and constraintsThis section is non-normative. This section collects all of the explanatory material that I was not certain how to interpret as an unambiguous inference or constraint. Some of these observations may need to be folded into the explanatory text in respective sections (for example for events, accounts or collections). Editing is also needed to decrease redundancy. 8.1 Entities and Attributes When we talk about things in the world in natural language and even when we assign identifiers, we are often imprecise in ways that make it difficult to clearly and unambiguously report provenance: a resource with a URL may be understood as referring to a report available at that URL, the version of the report available there today, the report independent of where it is hosted over time, etc. However, to write precise descriptions of the provenance of things that change over time, we need ways of disambiguating which versions of things we are talking about. To describe the provenance of things that can change over time, PROV-DM uses the concept of entities with fixed attributes. From a provenance viewpoint, it is important to identify a partial state of something, i.e. something with some aspects that have been fixed, so that it becomes possible to express its provenance (i.e. what caused the thing with these specific aspects). An entity encompasses a part of a thing's history during which some of the attributes are fixed. An entity can thus be thought of as a part of a thing with some associated partial state. Attributes in PROV-DM are used to fix certain aspects of entities. An entity is a thing one wants to provide provenance for and whose situation in the world is described by some fixed attributes. An entity has a characterization interval, or lifetime, defined as the period each i between its generation event and its invalidation event0 and n. An entity's attributes are established when the entity is created and describe the entity's situation and (partial) state during an entity's lifetime. A different entity (perhaps representing a different user or system perspective) may fix other aspects of the same thing, and its provenance may be different. Different entities that are aspects of the same thing are called alternate, and the PROV-DM relations of specialization and alternate can be used to link such entities. Different users may take different perspectives on a resource with a URL. A provenance record might use one (or more) different entities to talk about different perspectives, such as: a report available at a URL: fixes the nature of the thing, i.e. a document, and its location; the version of the report available there today: fixes its version number, contents, and its date; the report independent of where it is hosted and of its content over time: fixes the nature of the thing as a conceptual artifact. The provenance of these three entities may differ, and may be along the following lines: the provenance of a report available at a URL may include: the act of publishing it and making it available at a given location, possibly under some license and access control; the provenance of the version of the report available there today may include: the authorship of the specific content, and reference to imported content; the provenance of the report independent of where it is hosted over time may include: the motivation for writing the report, the overall methodology for producing it, and the broad team involved in it. We do not assume that any entity is a better or worse description of reality than any other. That is, we do not assume an absolute ground truth with respect to which we can judge correctness or completeness of descriptions. In fact, it is possible to describe the processing that occurred for the report to be commissioned, for individual versions to be created, for those versions to be published at the given URL, etc., each via a different entity with attribute-value pairs that fix some aspects of the report appropriately. Besides entities, a variety of other PROV-DM objects have attributes, including activity, generation, usage, start, end, communication, attribution, association, responsibility, and derivation. Each object has an associated duration interval (which may be a single time point), and attribute-value pairs for a given object are expected to be descriptions that hold for the object's duration. However, the attributes of entities have special meaning because they are considered to be fixed aspects of underlying, changing things. This motivates constraints on alternateOf and specializationOf relating the attribute values of different entities. TODO: Constraints on alternateOf/specializationOf for this? TODO: Further discussion of entities moved from the old "Definitional constraints" section. Should merge with the surrounding discussion to avoid repetition. An entity is a thing one wants to provide provenance for and whose situation in the world is described by some attribute-value pairs. An entity's attribute-value pairs are established as part of the entity statement and their values remain unchanged for the lifetime of the entity. An entity's attribute-value pairs are expected to describe the entity's situation and (partial) state during an entity's characterization interval. If an entity's situation or state changes, this may result in its statement being invalid, because one or more attribute-value pairs no longer hold. In that case, from the PROV viewpoint, there exists a new entity, which needs to be given a distinct identifier, and associated with the attribute-value pairs that reflect its new situation or state. Further considerations: In order to describe the provenance of something during an interval over which relevant attributes of the thing are not fixed, it is required to create multiple entities, each with its own identifier, characterization interval, and fixed attributes, and express dependencies between the various entities using events. For example, if we want to describe the provenance of several versions of a document, involving attributes such as authorship that change over time, we need different entities for the versions linked by appropriate generation, usage, revision, and invalidation events. There is no assumption that the set of attributes is complete, nor that the attributes are independent or orthogonal of each other. There is no assumption that the attributes of an entity uniquely identify it. Two different entities that are aspects of different things can have the same attributes. A characterization interval may collapse into a single instant. 8.2 Activities TODO: Further discussion of activities moved from old "Definitional constraints and inferences" section. Edit to avoid repeating information. An activity is delimited by its start and its end events; hence, it occurs over an interval delimited by two instantaneous events. However, an activity record need not mention start or end time information, because they may not be known. An activity's attribute-value pairs are expected to describe the activity's situation during its interval, i.e. an interval between two instantaneous events, namely its start event and its end event. Further considerations: An activity is not an entity. Indeed, an entity exists in full at any point in its lifetime, persists during this interval, and preserves the characteristics that makes it identifiable. In contrast, an activity is something that occurs, happens, unfolds, or develops through time, but is typically not identifiable by the characteristics it exhibits at any point during its duration. This distinction is similar to the distinction between 'continuant' and 'occurrent' in logic [Logic]. 8.3 Description, Assertion, and Inference PROV-DM is a provenance data model designed to express descriptions of the world. A file at some point during its lifecycle, which includes multiple edits by multiple people, can be described by its type, its location in the file system, a creator, and content. The data model is designed to capture activities that happened in the past, as opposed to activities that may or will happen. However, this distinction is not formally enforced. Therefore, PROV-DM descriptions are intended to be interpreted as what has happened, as opposed to what may or will happen. This specification does not prescribe the means by which descriptions can be arrived at; for example, descriptions can be composed on the basis of observations, reasoning, or any other means. Sometimes, inferences about the world can be made from descriptions conformant to the PROV-DM data model. This specification defines some such inferences, allowing new descriptions to be inferred from existing ones. Hence, descriptions of the world can result either from direct assertion or from inference by application of inference rules defined by this specification. 8.4 Events and Time Time is critical in the context of provenance, since it can help corroborate provenance claims. For instance, if an entity is claimed to be obtained by transforming another, then the latter must have existed before the former. If it is not the case, then there is something wrong with such a provenance claim. Although time is critical, we should also recognize that provenance can be used in many different contexts within individual systems and across the Web. Different systems may use different clocks which may not be precisely synchronized, so when provenance records are combined by different systems, we may not be able to align the times involved to a single global timeline. Hence, PROV-DM is designed to minimize assumptions about time. Hence, to talk about the constraints on valid PROV-DM data, we refer to instantaneous events that correspond to interactions between activities and entities. The term "event" is commonly used in process algebra with a similar meaning. For instance, in CSP [CSP], events represent communications or interactions; they are assumed to be atomic and instantaneous. 8.4.1 Event Ordering The following paragraphs are unclear and need to be revised, to address review concerns: if we aren't saying anything about how events and time relate, and time is the only concrete information about event ordering in PROV-DM, then how can implementations check that event ordering constraints are satisfied? How the precedes partial order is implemented in practice is beyond the scope of this specification. This specification only assumes that each instantaneous event can be mapped to an instant in some form of timeline. The actual mapping is not in scope of this specification. Likewise, whether this timeline is formed of a single global timeline or whether it consists of multiple Lamport-style clocks is also beyond this specification. The follows and precedes orderings of events should be consistent with the ordering of their associated times over these timelines. This specification defines event ordering constraints between instantaneous events associated with provenance descriptions. PROV-DM data must satisfy such constraints.

PROV-DM also allows for time observations to be inserted in specific statements, forA PROV document is valid if each recognized instantaneous event introduced in this specification. The presence of a time observation for a given instantaneous event fixes the mappingthe bundles B₀, ..., B_n are valid and none of this instantaneous event to the timeline. It can also help with the verification of associated ordering constraints (though, again, this verification is outside the scope of this specification). 8.4.2 Types of Events Five kinds of instantaneous eventsthe bundle identifiers b_i are used for the PROV-DM data model. The activity start and activity end events delimit the beginning and the end of activities, respectively. The entity usage, entity generation, and entity invalidation events apply to entities, and the generation and invalidation events delimit the characterization interval of an entity. More specifically: repeated.

An activity start eventTwo (valid) PROV documents (B₀,[b₁=B₁,...,b_n=B_n]) and (B'₀,[b₁'=B'₁,...,b'_m=B'_m]) are equivalent if B₀ is the instantaneous event equivalent to B'₀ and n = m and there exists a permutation P : {1..n} -> {1..n} such that marks the instant an activity starts. An activity end eventfor each i, b_i = b'_P(i) and B_i is the instantaneous event that marks the instant an activity ends. An entity usage event is the instantaneous event that marks the first instant of an entity's consumption timespan by an activity. Before this instant the entity had not begunequivalent to be used by the activity. An entity generation event is the instantaneous event that marks the final instant of an entity's creation timespan, after which it is available for use. The entity did not exist before this event. An entity invalidation event is the instantaneous event that marks the initial instant of the destruction, invalidation, or cessation of an entity, after which the entity is no longer available for use. The entity no longer exists after this event.B'_P(i).

Stage #	Inference	Hypotheses	Conclusions
1	19, 20, 21, 22	specializationOf, mentionOf	specializationOf, entity
2	7, 8, 13, 14	entity, activity, wasAttributedTo, actedOnBehalfOf	wasInvalidatedBy, wasStartedBy, wasEndedBy
3	9, 10	wasStartedBy, wasEndedBy	wasGeneratedBy
4	11, 12	wasDerivedFrom	wasGeneratedBy, used, alternateOf
5	16, 17, 18	alternateOf, entity	alternateOf
6	5, 6	wasInformedBy, generated, used	wasInformedBy, generated, used
7	15	many	wasInfluencedBy

Constraints of the Provenance Data Model

W3C Working Draft 03 May11 September 2012

Abstract

Status of This Document

Last Call

PROV Family of Specifications

How to read the PROV Family of Specifications

Table of Contents

1. Introduction

1.1 Conventions

1.2 Purpose of this document

1.3 Structure of this document

1.4 Audience

2. Rationale

2.1 Entities, Activities and Agents

2.2 Events

2.3 Types

2.4 Validation Process Overview

Constants, Variables and Placeholders

Substitution

Formulas

Satisfying definitions, inferences, and constraints

Merging

Applying definitions, inferences, and constraints

Termination

Checking ordering, typing, and impossibility constraints

Equivalence and Isomorphism

From Instances to Bundles and Documents

2.5 Summary of inferences and constraints

2.3. Compliance with this document

3. 4. Definitions and Inferences and Definitions

4.1 Optional Identifiers and Attributes

4.2 Entities and Activities

3.2 Component 2: 4.3 Derivations

4.4 Agents

3.3 Component 3: Derivations4.5 Alternate and Specialized Entities

5. Validity Constraints

5.1 Uniqueness Constraints

5.2 Event Ordering Constraints

5.2.1 Activity constraints

5.2.2 Entity constraints

5.2.3 Agent constraints

5.3 Type Constraints

5.4 Impossibility constraints

6. Collection ConstraintsNormalization, Validity, and Equivalence

6.1 Instances

6.2 Collections and Weaker Derivation RelationBundles and Documents

B.7. Glossary

A. Termination of normalization

B. Acknowledgements

C. References

C.1 Normative references

C.2 Informative references