This document defines a data model for Provenance.

Derivation

This section remains very much a work in progress. Many issues have been raised and discussed, and for several of them consensus remains difficult to reach. The presentation of derivation has been altered, and new names adopted, in the hope of clarifying this notion. Key outstanding issues include:

Derivation expresses that some characterized thing is transformed from, created from, or affected by another characterized thing.

PIL offers two different kinds of assertions by which asserters can formulate derivations. The first one is tightly connected to the notion of process execution, whereas the second one is not. The first kind of assertion is particularly suitable for asserters who have an intimate knowledge of process executions, and offers a more precise description of derivation, whereas the second does not put such a requirement on the asserter, and allows a less precise description of derivation to be asserted. From these assertions, further derivations can be inferred by transitive closure.

Process Execution Linked Derivation

A process execution linked derivation, which, by definition of derivation, expresses that some characterized thing is transformed from, created from, or affected by another characterized thing, entails a process execution that transforms, creates or affects this characterized thing.

In its full form, a process-execution linked derivation assertion, written isDerivedFrom(e2,e1,pe,r2,r1):

This assertion expresses that the process execution pe, by using the thing denoted by e1 with role r1, derived the thing denoted by e2 and generated it with role r2.

The following inference rule allows generation and use assertions to be inferred.

If isDerivedFrom(e2,e1,pe,r2,r1) holds, then isGeneratedBy(e2,pe,r2) and uses(pe,e1,r1) also hold.
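The inference rule above is purely syntactic, so it can be sketched as a function over assertions. The sketch below assumes a hypothetical encoding of PIL assertions as Python tuples (the `DerivedFrom` type and `infer_generation_and_use` function are illustrative names, not part of PIL).

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical encoding: an assertion is a tuple whose first element names the relation.
Assertion = Tuple[str, ...]

class DerivedFrom(NamedTuple):
    """Full form isDerivedFrom(e2,e1,pe,r2,r1)."""
    e2: str
    e1: str
    pe: str
    r2: str
    r1: str

def infer_generation_and_use(d: DerivedFrom) -> Set[Assertion]:
    """isDerivedFrom(e2,e1,pe,r2,r1) entails isGeneratedBy(e2,pe,r2) and uses(pe,e1,r1)."""
    return {
        ("isGeneratedBy", d.e2, d.pe, d.r2),
        ("uses", d.pe, d.e1, d.r1),
    }

inferred = infer_generation_and_use(DerivedFrom("e2", "e1", "pe", "r2", "r1"))
print(sorted(inferred))
# [('isGeneratedBy', 'e2', 'pe', 'r2'), ('uses', 'pe', 'e1', 'r1')]
```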

For convenience, PIL allows for a compact, process-execution linked derivation assertion, written isDerivedFrom(e2,e1), which:

The compact version has the same meaning as the fully formed process-execution linked derivation, except that a process execution is known to exist, though it does not need to be asserted. This is formalized by the following inference rule, referred to as process execution introduction:

If isDerivedFrom(e2,e1) holds, then there exists a process execution pe, and roles r1,r2, such that: isGeneratedBy(e2,pe,r2) and uses(pe,e1,r1).
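Process execution introduction only states that some pe, r1 and r2 exist. One common way to make such existentials concrete is to mint fresh placeholder identifiers (a Skolemization-style device). The sketch below assumes that approach; the `_:` prefix and the `fresh` helper are hypothetical and carry no meaning in PIL. The fresh names are unknowns: nothing here claims r1 and r2 are different roles.

```python
import itertools
from typing import Iterator, Set, Tuple

Assertion = Tuple[str, ...]

# Hypothetical counter used to mint placeholder identifiers for unknown pe, r1, r2.
_counter: Iterator[int] = itertools.count()

def fresh(prefix: str) -> str:
    return f"_:{prefix}{next(_counter)}"

def introduce_process_execution(e2: str, e1: str) -> Set[Assertion]:
    """From isDerivedFrom(e2,e1), introduce placeholders pe, r1, r2 such that
    isGeneratedBy(e2,pe,r2) and uses(pe,e1,r1) hold."""
    pe, r1, r2 = fresh("pe"), fresh("r"), fresh("r")
    return {("isGeneratedBy", e2, pe, r2), ("uses", pe, e1, r1)}

facts = introduce_process_execution("e2", "e1")
```

The key property is that the same placeholder pe appears in both the generation and the use assertion.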

If e2 is derived from e1, then this means that the thing represented by e1 has an influence on the thing represented by e2, which is captured by a dependency between their attribute values; it also implies temporal ordering. These are specified as follows:

Given a process execution pe, entities e1 and e2, and roles r1 and r2, the assertion isDerivedFrom(e2,e1,pe,r2,r1) or isDerivedFrom(e2,e1) holds if and only if the values of some attributes of e2 are partly or fully determined by the values of some attributes of e1.

Should this dependency of attributes be made explicit as argument of the derivation? By making it explicit, we would allow someone to verify the validity of the derivation.
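To illustrate what an explicit dependency could look like, the sketch below attaches to a derivation the attribute of e2, the attributes of e1, and a function claimed to compute the former from the latter. Everything here (`ExplicitDerivation`, `verify`, representing entities as attribute dictionaries) is a hypothetical design, not a proposal from this document, and it only covers full determination: a partial dependency cannot be checked by recomputation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Attrs = Dict[str, object]  # hypothetical: an entity's attribute-value pairs

@dataclass(frozen=True)
class ExplicitDerivation:
    """Hypothetical isDerivedFrom variant: attribute `target` of e2 is claimed
    to be computed by `fn` from attributes `sources` of e1."""
    target: str
    sources: Tuple[str, ...]
    fn: Callable[..., object]

def verify(d: ExplicitDerivation, e2: Attrs, e1: Attrs) -> bool:
    """Recompute the claimed dependency from e1's values and compare it with e2's value."""
    if d.target not in e2 or any(s not in e1 for s in d.sources):
        return False
    return d.fn(*(e1[s] for s in d.sources)) == e2[d.target]

d = ExplicitDerivation(target="length", sources=("text",), fn=len)
print(verify(d, {"length": 5}, {"text": "hello"}))  # True
```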

Given a process execution pe, entities e1 and e2, and roles r1 and r2, if the assertion isDerivedFrom(e2,e1,pe,r2,r1) or isDerivedFrom(e2,e1) holds, then the use of the characterized thing denoted by e1 precedes the generation of the characterized thing denoted by e2.

Note that derivation cannot, in general, be inferred from use and generation. Indeed, when a generation isGeneratedBy(e2,pe,r2) precedes uses(pe,e1,r1), for some e1, e2, r1, r2, and pe, one cannot infer the derivation isDerivedFrom(e2,e1,pe,r2,r1) or isDerivedFrom(e2,e1), since the values of attributes of e2 cannot possibly be determined by the values of attributes of e1, given that the generation of e2 precedes the use of e1.
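The temporal constraint yields a cheap necessary check on a candidate derivation. The sketch below assumes that every use and generation carries a numeric timestamp and models "precedes" as strictly earlier; both are assumptions of this illustration, since the text only speaks of ordering. Passing the check is not sufficient: the attribute dependency must still hold.

```python
from dataclasses import dataclass

# Hypothetical event records with numeric timestamps (an assumption of this sketch).
@dataclass(frozen=True)
class Use:
    pe: str
    entity: str
    role: str
    time: float

@dataclass(frozen=True)
class Generation:
    entity: str
    pe: str
    role: str
    time: float

def derivation_is_admissible(gen: Generation, use: Use) -> bool:
    """Necessary (not sufficient) condition for isDerivedFrom(gen.entity, use.entity, pe, ...):
    both events belong to the same process execution, and the use of e1
    strictly precedes the generation of e2."""
    return gen.pe == use.pe and use.time < gen.time

u = Use("pe", "e1", "r1", time=1.0)
print(derivation_is_admissible(Generation("e2", "pe", "r2", time=2.0), u))  # True
print(derivation_is_admissible(Generation("e2", "pe", "r2", time=0.5), u))  # False
```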

isDerivedFrom(e5,e3)

A further inference is permitted from the compact version of derivation:

Given a process execution pe, entities e1 and e2, and role r2, if isDerivedFrom(e2,e1) and isGeneratedBy(e2,pe,r2) hold, then there exists a role r1, such that uses(pe,e1,r1) also holds.
This inference is justified by the fact that e2 is generated by at most one process execution. Hence, this process execution is also the one that uses e1.
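This inference can be sketched by indexing generations by entity. Storing them in a dictionary keyed by entity encodes the assumption that each entity is generated by at most one process execution, which is exactly what justifies the rule. The unknown role r1 is represented by a freshly minted placeholder; the function name and the `_:r` naming are hypothetical.

```python
import itertools
from typing import Dict, Set, Tuple

Assertion = Tuple[str, ...]
_roles = itertools.count()  # hypothetical source of placeholder role names

def infer_uses(derivations: Set[Tuple[str, str]],
               generated_by: Dict[str, Tuple[str, str]]) -> Set[Assertion]:
    """For each isDerivedFrom(e2,e1) whose e2 has an isGeneratedBy(e2,pe,r2),
    infer uses(pe,e1,r1) for some role r1 (here a fresh placeholder).
    generated_by maps an entity to its unique (pe, r2)."""
    inferred: Set[Assertion] = set()
    for e2, e1 in derivations:
        if e2 in generated_by:
            pe, _r2 = generated_by[e2]
            inferred.add(("uses", pe, e1, f"_:r{next(_roles)}"))
    return inferred

out = infer_uses({("e2", "e1"), ("e4", "e3")}, {"e2": ("pe", "r2")})
```

Only the derivation whose generated entity has a known generation produces a use; isDerivedFrom(e4,e3) yields nothing here.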

There is a suggestion by Simon that this notion of derivation is only meaningful in the context of an account. See email. It is no longer clear that this is the case. However, the inference above is only meaningful if unicity of generation holds.

We note that the "symmetric" inference does not hold. From isDerivedFrom(e2,e1) and uses(pe,e1,r1), one cannot infer isGeneratedBy(e2,pe,r2), because e1 may be used by many process executions, not all of which generate e2.
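A small counterexample makes the failure concrete. In the hypothetical scenario below, e1 is used by two process executions, pe1 and pe2, but only pe1 generates e2. Applying the invalid rule claims that e2 is generated by both, which contradicts the actual generation and also breaks unicity of generation.

```python
# Hypothetical scenario: e1 is used by pe1 and pe2; only pe1 generates e2.
uses = {("pe1", "e1", "r1"), ("pe2", "e1", "r1")}
generated_by = {"e2": ("pe1", "r2")}
derived = {("e2", "e1")}

def naive_symmetric_inference(uses, derived):
    """The INVALID rule: from isDerivedFrom(e2,e1) and uses(pe,e1,r1),
    conclude that pe generates e2."""
    return {(e2, pe) for (e2, e1) in derived for (pe, used, _r) in uses if used == e1}

claimed = naive_symmetric_inference(uses, derived)
wrong = {(e2, pe) for (e2, pe) in claimed if generated_by[e2][0] != pe}
print(wrong)  # {('e2', 'pe2')}
```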

Process Execution Independent Derivation

A process execution independent derivation states the existence of a derivation, by any means whether direct or not, and regardless of any process executions.

A process execution independent derivation, written isEventuallyDerivedFrom(e2,e1):

If e2 is derived (isEventuallyDerivedFrom) from e1, then this means that the thing represented by e1 has an influence on the thing represented by e2, which at the minimum implies temporal ordering, specified as follows:

Given two entities e1 and e2, if the assertion isEventuallyDerivedFrom(e2,e1) holds, then the generation of the characterized thing denoted by e1 precedes the generation of the characterized thing denoted by e2.

Note that the temporal ordering is between the generations of e1 and e2, as opposed to process execution linked derivation, which implies a temporal ordering between the use of e1 and the generation of e2. Indeed, in the case of isEventuallyDerivedFrom, nothing is known about the use of e1, since there is no associated process execution.
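The corresponding check for isEventuallyDerivedFrom consults only generation events, since no use of e1 is known. As before, the sketch assumes hypothetical numeric generation timestamps and treats "precedes" as strictly earlier; it is a necessary condition only.

```python
from typing import Dict

def eventually_derived_is_consistent(e2: str, e1: str,
                                     generation_time: Dict[str, float]) -> bool:
    """Necessary condition for isEventuallyDerivedFrom(e2,e1): the generation of e1
    precedes the generation of e2. No use of e1 is consulted, because this kind of
    derivation has no associated process execution."""
    return generation_time[e1] < generation_time[e2]

times = {"e1": 1.0, "e2": 5.0}  # hypothetical generation timestamps
print(eventually_derived_is_consistent("e2", "e1", times))  # True
print(eventually_derived_is_consistent("e1", "e2", times))  # False
```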

Should we link isEventuallyDerivedFrom to attributes as we did for isDerivedFrom? If so, this type of inference should be presented upfront, for both.

Transitivity

If isDerivedFrom(e2,e1) holds because attribute a2.1 of e2 is determined by attribute a1.1 of e1, and if isDerivedFrom(e3,e2) holds because attribute a3.1 of e3 is determined by attribute a2.2 of e2, it is not necessarily the case that an attribute of e3 is determined by an attribute of e1, so an asserter may not be able to assert isDerivedFrom(e3,e1). Hence, constraints on attributes invalidate transitivity in the general case.
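The attribute-level argument can be checked mechanically by comparing two transitive closures: one over attribute dependencies and one over the entities alone. The sketch encodes the example above, reading a2.2 as an attribute of e2 (as the chain through e2 requires); representing attributes as (entity, name) pairs is an assumption of this illustration.

```python
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (target, source): target is determined by source

def closure(pairs: Set[Pair]) -> Set[Pair]:
    """Transitive closure of a relation given as (target, source) pairs."""
    result = set(pairs)
    while True:
        extra = {(t, s) for (t, m) in result for (m2, s) in result if m == m2} - result
        if not extra:
            return result
        result |= extra

# Attributes as (entity, attribute) pairs: a2.1 of e2 <- a1.1 of e1; a3.1 of e3 <- a2.2 of e2.
deps = {
    (("e2", "a2.1"), ("e1", "a1.1")),
    (("e3", "a3.1"), ("e2", "a2.2")),
}

attr_closure = closure(deps)
entity_closure = closure({(t[0], s[0]) for (t, s) in deps})

print(any(t[0] == "e3" and s[0] == "e1" for (t, s) in attr_closure))  # False
print(("e3", "e1") in entity_closure)                                 # True
```

No attribute of e3 is determined by an attribute of e1, because the chain breaks at e2 (a2.1 differs from a2.2), yet chaining at the level of entities still links e3 to e1.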

However, there is a sense in which e3 still depends on e1, since e3 could not be generated without e1 existing. Hence, we introduce a weaker notion of derivation, which is transitive.
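The formal definition of dependsOn is not reproduced here, so the following is only a sketch under an assumption: that dependsOn contains every asserted derivation (e2, e1) and is closed under transitivity. Under that assumption, computing it amounts to graph reachability over derivation edges, done here with an iterative depth-first search. The function name `depends_on` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def depends_on(derivations: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Hypothetical: all pairs (later, earlier) such that later dependsOn earlier,
    assuming dependsOn includes each derivation (e2, e1) and is transitive."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    for e2, e1 in derivations:
        sources[e2].add(e1)
    result: Set[Tuple[str, str]] = set()
    for start in list(sources):
        stack = list(sources[start])
        seen: Set[str] = set()
        while stack:  # depth-first search from `start` along derivation edges
            e = stack.pop()
            if e in seen:
                continue
            seen.add(e)
            result.add((start, e))
            stack.extend(sources.get(e, ()))  # .get avoids creating new keys
    return result

print(sorted(depends_on([("e2", "e1"), ("e3", "e2")])))
# [('e2', 'e1'), ('e3', 'e1'), ('e3', 'e2')]
```

The `seen` set keeps the search terminating even if the asserted derivations happen to contain a cycle.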

The relationship dependsOn is defined as follows: