Note: all links use http://www.w3.org/TR/prov-constraints/ so most of them don't land in the right place just yet
On Sept. 11, 2012, the Provenance Working Group announced Last Call on a new document, PROV-CONSTRAINTS, in the suite that defines the core of the PROV family of specifications.
This follows the recent Last Call announcement for three other documents, namely PROV-DM, PROV-O, and PROV-N. The meaning of Last Call is clarified in the earlier announcement. Essentially, it means that the specification document is open to public comments for a set period of time, at the end of which the editors commit to producing the final version of the document, in which all such comments are accounted for following internal group discussions.
The PROV-CONSTRAINTS document complements the first three, and is focused on the notion of valid provenance. The intent of provenance validation is to ensure that a set of PROV statements represents a history of objects and their interactions which is consistent, and thus safe to use for the purpose of logical reasoning and other kinds of analysis.
Thus, the document can be used to design a validator that checks the consistency of a set of PROV statements.
Three types of constraints are defined.
Uniqueness constraints include key constraints, for example e is key for statement entity(e,attrs), but also constraints that state the uniqueness of events such as the generation of an entity by an activity. Constraint 25, for example, states that only one generation event can be associated with a given generated entity and generating activity:
IF wasGeneratedBy(gen1; e,a,_t1,_attrs1) and wasGeneratedBy(gen2; e,a,_t2,_attrs2), THEN gen1 = gen2.
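Event ordering constraints state that one event must precede another. For example, the generation of an entity precedes any usage of that entity:
IF wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and used(use; _a2,e,_t2,_attrs2) THEN gen precedes use.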
Impossibility constraints are used to rule out illegal combinations of statements (for example, entity(foo) and activity(foo) is an illegal combination), but also to state properties of relations, for example "specialization is irreflexive" (Constraint 54):
IF specializationOf(e,e) THEN INVALID.
and "the set of entities and activities are disjoint" (Constraint 57):
IF 'entity' ∈ typeOf(id) AND 'activity' ∈ typeOf(id) THEN INVALID.
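As a rough illustration of how a validator might check Constraints 25, 54 and 57, here is a minimal Python sketch. The tuple encoding of statements and the `check` function are hypothetical and are not part of PROV-N or of any PROV library. The sketch also simplifies Constraint 25: it reports two different generation identifiers for the same entity and activity as a violation, whereas a full validator would first try to merge the two statements.

```python
from collections import defaultdict

# Hypothetical, simplified encoding: each statement is a tuple whose first
# element is the relation name. This is not the PROV-N grammar.
statements = [
    ("entity", "foo"),
    ("activity", "foo"),                    # same identifier typed twice
    ("specializationOf", "e", "e"),         # reflexive specialization
    ("wasGeneratedBy", "gen1", "e", "a"),
    ("wasGeneratedBy", "gen2", "e", "a"),   # second generation of e by a
]


def check(statements):
    """Return a list of human-readable violations found in `statements`."""
    violations = []
    types = defaultdict(set)        # identifier -> {"entity", "activity"}
    generations = defaultdict(set)  # (entity, activity) -> generation ids

    for stmt in statements:
        kind = stmt[0]
        if kind in ("entity", "activity"):
            types[stmt[1]].add(kind)
        elif kind == "specializationOf" and stmt[1] == stmt[2]:
            # Constraint 54: specialization is irreflexive.
            violations.append(f"specializationOf({stmt[1]},{stmt[2]}) is reflexive")
        elif kind == "wasGeneratedBy":
            gen_id, entity, activity = stmt[1:4]
            generations[(entity, activity)].add(gen_id)

    # Constraint 57: entities and activities are disjoint.
    for ident, kinds in types.items():
        if {"entity", "activity"} <= kinds:
            violations.append(f"{ident} is both an entity and an activity")

    # Constraint 25 (simplified): one generation per (entity, activity) pair.
    for (entity, activity), gen_ids in generations.items():
        if len(gen_ids) > 1:
            ids = ", ".join(sorted(gen_ids))
            violations.append(f"{entity} has several generations by {activity}: {ids}")

    return violations


for violation in check(statements):
    print(violation)
```

On the sample statements, the sketch reports one violation per constraint.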
We now show an inference process involving ordering constraints, which leads to the conclusion that a set of provenance statements is invalid because it cannot satisfy those constraints. The example involves two entities that are each derived from the other. Consider the following statements:
entity(e1)
entity(e2)
activity(a1)
activity(a2)
wasGeneratedBy(gen2; e2,a2,t2)
wasGeneratedBy(gen1; e1,a1,t1)
wasDerivedFrom(d1; e2,e1,-,-,-)
That is, e2 was derived from e1, with e2 and e1 generated by activities a2 and a1 at times t2 and t1, respectively. This set of statements is illustrated by the following figure, in which activities are rectangles and entities ellipses.
Constraint 44 defines the precedence of generation of the first entity over generation of the second entity in the context of derivation:
IF wasDerivedFrom(d; e2,e1,a,g,u,attrs) and wasGeneratedBy(gen1; e1,a1,t1,attrs1) and wasGeneratedBy(gen2; e2,a2,t2,attrs2) THEN gen1 strictly precedes gen2.
Intuitively, e1 must be generated prior to generating e2:
gen1 strictly precedes gen2.
Suppose we add the following statement to our set of statements:
wasDerivedFrom(d2; e1,e2,-,-,-)
This would form the following overall PROV graph.
Adding this new statement, however, creates a circular derivation between e1 and e2, an invalid situation. We therefore expect our constraint system to detect it. Indeed, by application of the same Constraint 44, this new statement entails:
gen2 strictly precedes gen1.
Hence, we obtain that gen2 strictly precedes gen1, which in turn strictly precedes gen2. This is impossible, since no event can strictly precede itself.
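The reasoning above can be sketched in Python as a cycle check over the "strictly precedes" relation. The dictionary and list encodings and the function names below are hypothetical, chosen only for illustration. The sketch covers only the instances of Constraint 44 that arise in this example, so every edge is a strict one and any cycle signals invalidity.

```python
# Hypothetical, simplified encoding of the statements in the example.
generations = {"e1": "gen1", "e2": "gen2"}   # entity -> generation event
derivations = [("e2", "e1"), ("e1", "e2")]   # (derived, source) pairs: d1, d2


def strict_precedence_edges(generations, derivations):
    """Apply Constraint 44: the generation of the source entity strictly
    precedes the generation of the derived entity."""
    edges = set()
    for derived, source in derivations:
        if derived in generations and source in generations:
            edges.add((generations[source], generations[derived]))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
        graph.setdefault(after, set())

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:          # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


edges = strict_precedence_edges(generations, derivations)
print("invalid" if has_cycle(edges) else "no ordering conflict found")
```

With only the first derivation (d1), the edge set contains a single pair and no cycle is found; adding d2 introduces the reverse edge and the check reports the statements as invalid.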
This example is simple, and detecting the problem hardly requires an automated validator. However, as graph patterns become more complex, an automated validator becomes an essential tool for provenance users, whether they intend to publish provenance or consume it. The PROV-CONSTRAINTS document defines the set of constraints that validators are expected to implement.
We encourage developers to implement these constraints. Several people are already working on validators, and we invite you to join them.