This document defines a data model for Provenance.
Derivation expresses that some characterized thing is transformed from, created from, or affected by another characterized thing.
PIL offers two different kinds of assertions by which asserters can formulate derivations. The first one is tightly connected to the notion of process execution, whereas the second one is not. The first kind of assertion is particularly suitable for asserters who have an intimate knowledge of process executions, and offers a more precise description of derivation, whereas the second does not put such a requirement on the asserter, and allows a less precise description of derivation to be asserted. From these assertions, further derivations can be inferred by transitive closure.
A process execution linked derivation, which, by definition of derivation, expresses that some characterized thing is transformed from, created from, or affected by another characterized thing, entails a process execution that transforms, creates or affects this characterized thing.
In its full form, a process-execution linked derivation assertion, written isDerivedFrom(e2,e1,pe,r2,r1):
The following inference rule allows generation and use assertions to be inferred.
For convenience, PIL allows for a compact, process-execution linked derivation assertion, written isDerivedFrom(e2,e1), which:
The compact version has the same meaning as the fully formed process-execution linked derivation, except that a process execution is known to exist, though it does not need to be asserted.
This is formalized by the following inference rule, referred to as process execution introduction:
If e2 is derived from e1, then this means that the thing represented by e1 has an influence on the thing represented by e2, which is captured by a dependency between their attribute values; it also implies temporal ordering. These are specified as follows:
Given a process execution pe, entities e1 and e2, roles r1 and r2, the assertion isDerivedFrom(e2,e1,pe,r2,r1) or isDerivedFrom(e2,e1) holds if and only if the values of some attributes of e2 are partly or fully determined by the values of some attributes of e1.
Given a process execution pe, entities e1 and e2, roles r1 and r2, if the assertion isDerivedFrom(e2,e1,pe,r2,r1) or isDerivedFrom(e2,e1) holds, then the use of the characterized thing denoted by e1 precedes the generation of the characterized thing denoted by e2.
Note that derivation cannot, in general, be inferred from use and generation. Indeed, when a generation isGeneratedBy(e2,pe,r2) precedes uses(pe,e1,r1), for some e1, e2, r1, r2, and pe, one cannot infer the derivation isDerivedFrom(e2,e1,pe,r2,r1) or isDerivedFrom(e2,e1), since the values of attributes of e2 cannot possibly be determined by the values of attributes of e1, given that the creation of e2 precedes the use of e1.
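The ordering constraint and the note above can be sketched in Python. The record types, the integer timestamps, the strict comparison, and the helper name ordering_allows_derivation are illustrative assumptions, not part of PIL. The check covers only the temporal condition, which is necessary but not sufficient for derivation; it says nothing about attribute dependency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """uses(pe, e, r), with a hypothetical integer timestamp."""
    pe: str
    entity: str
    role: str
    time: int


@dataclass(frozen=True)
class Generation:
    """isGeneratedBy(e, pe, r), with a hypothetical integer timestamp."""
    entity: str
    pe: str
    role: str
    time: int


def ordering_allows_derivation(use: Use, gen: Generation) -> bool:
    """True when both events belong to the same process execution and
    the use strictly precedes the generation (assumed reading of "precedes")."""
    return use.pe == gen.pe and use.time < gen.time


use_e1 = Use(pe="pe", entity="e1", role="r1", time=5)
gen_late = Generation(entity="e2", pe="pe", role="r2", time=8)
gen_early = Generation(entity="e2", pe="pe", role="r2", time=3)

print(ordering_allows_derivation(use_e1, gen_late))   # True
print(ordering_allows_derivation(use_e1, gen_early))  # False
```

In the second call e2 is generated before e1 is used, so the sketch rules out isDerivedFrom(e2,e1,pe,r2,r1), matching the note.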
isDerivedFrom(e5,e3)
A further inference is permitted from the compact version of derivation:
We note that the "symmetric" inference does not hold. From isDerivedFrom(e2,e1) and uses(pe,e1), one cannot derive isGeneratedBy(e2,pe,r2), because e1 may be used by many process executions, not all of them generating e2.
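A small counterexample makes this failure concrete. The assertion sets below and the process execution names pe1 and pe2 are hypothetical, and roles are omitted for brevity; the function deliberately implements the unsound rule in order to show the spurious conclusion it produces.

```python
# Hypothetical assertions: e1 is used by two process executions,
# but only pe1 generates e2.
uses = {("pe1", "e1"), ("pe2", "e1")}
is_generated_by = {("e2", "pe1")}
is_derived_from = {("e2", "e1")}


def naive_symmetric_inference(derived, used):
    """Unsound rule: from isDerivedFrom(e2,e1) and uses(pe,e1),
    conclude isGeneratedBy(e2,pe)."""
    return {
        (e2, pe)
        for (e2, e1) in derived
        for (pe, used_entity) in used
        if used_entity == e1
    }


inferred = naive_symmetric_inference(is_derived_from, uses)
spurious = inferred - is_generated_by
print(sorted(spurious))  # [('e2', 'pe2')]
```

The rule wrongly concludes that pe2 generated e2, even though pe2 merely used e1.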
A process execution independent derivation states the existence of a derivation, by any means whether direct or not, and regardless of any process executions.
A process execution independent derivation, written isEventuallyDerivedFrom(e2,e1),
If e2 is derived (isEventuallyDerivedFrom) from e1, then this means that the thing represented by e1 has an influence on the thing represented by e2, which at the minimum implies temporal ordering, specified as follows:
Given two entities e1 and e2, if the assertion isEventuallyDerivedFrom(e2,e1) holds, then the generation of the characterized thing denoted by e1 precedes the generation of the characterized thing denoted by e2.
Note that the temporal ordering is between the generations of e1 and e2, as opposed to process execution linked derivation, which implies temporal ordering between the use of e1 and the generation of e2. Indeed, in the case of isEventuallyDerivedFrom, nothing is known about the use of e1, since there is no associated process execution.
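As a rough illustration of this generation-to-generation constraint, the following Python sketch reports asserted pairs whose generation order contradicts it. The integer generation times, the strict comparison, and the function name ordering_violations are assumptions made for the example.

```python
def ordering_violations(eventually_derived_from, generation_time):
    """Return the (e2, e1) pairs for which the generation of e1
    does not strictly precede the generation of e2."""
    return [
        (e2, e1)
        for (e2, e1) in eventually_derived_from
        if not generation_time[e1] < generation_time[e2]
    ]


# Hypothetical generation times and isEventuallyDerivedFrom(e2,e1) assertions.
generation_time = {"e1": 1, "e2": 4, "e3": 2}
assertions = [("e2", "e1"), ("e3", "e2")]

print(ordering_violations(assertions, generation_time))  # [('e3', 'e2')]
```

No use events appear in the check, reflecting that nothing is known about the use of e1 in this kind of derivation.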
If isDerivedFrom(e2,e1) holds because attribute a2.1 of e2 is determined by attribute a1.1 of e1, and if isDerivedFrom(e3,e2) holds because attribute a3.1 of e3 is determined by attribute a2.2 of e2, it is not necessarily the case that an attribute of e3 is determined by an attribute of e1, so an asserter may not be able to assert isDerivedFrom(e3,e1). Hence, constraints on attributes invalidate transitivity in the general case.
However, there is a sense in which e3 still depends on e1, since e3 could not be generated without e1 existing. Hence, we introduce a weaker notion of derivation, which is transitive.
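The attribute-level argument can be traced with a short Python sketch. The dictionary encoding and the helper attribute_chain are hypothetical, and a2.2 is taken to be an attribute of e2, distinct from a2.1.

```python
# Hypothetical encoding: (derived_entity, attribute) -> (source_entity, attribute)
determined_by = {
    ("e2", "a2.1"): ("e1", "a1.1"),  # basis for isDerivedFrom(e2,e1)
    ("e3", "a3.1"): ("e2", "a2.2"),  # basis for isDerivedFrom(e3,e2)
}


def attribute_chain(entity, attr, determined_by):
    """Follow attribute-level dependencies back from (entity, attr)
    until an attribute with no recorded determinant is reached."""
    chain = [(entity, attr)]
    while chain[-1] in determined_by:
        chain.append(determined_by[chain[-1]])
    return chain


print(attribute_chain("e3", "a3.1", determined_by))
# [('e3', 'a3.1'), ('e2', 'a2.2')]
```

The chain stops at a2.2 of e2 and never reaches an attribute of e1, because the attribute of e2 that e1 determines (a2.1) is not the one that determines a3.1.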
The relationship dependsOn is defined as follows: