Editors' working copy can change at any time.
Following F2F2 guidance, in this document we try to:
For the purpose of this specification, provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing in the world. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, where users find information that is often contradictory or questionable, provenance can help those users to make trust judgements.
The idea that a single way of representing and collecting provenance could be adopted internally by all systems does not seem to be realistic today. Instead, a pragmatic approach is to consider a core data model for provenance that allows domain and application specific representations of provenance to be translated into such a data model and exchanged between systems. Heterogeneous systems can then export their provenance into such a core data model, and applications that need to make sense of provenance in heterogeneous systems can then import it, process it, and reason over it.
Thus, the vision is that different provenance-aware systems natively adopt their own model for representing their provenance, but a core provenance data model can be readily adopted as a provenance interchange model across such systems.
A set of specifications, referred to as the PROV family of specifications, define the various aspects that are necessary to achieve this vision in an interoperable way:
The PROV-DM data model for provenance consists of a set of core concepts, and a few common relations, based on these core concepts. PROV-DM is a domain-agnostic model, but with clear extensibility points allowing further domain-specific and application-specific extensions to be defined.
This specification intentionally presents the key concepts of the PROV Data Model, without drilling down into all its subtleties. Using these key concepts, it becomes possible to write useful provenance assertions very quickly, and publish or embed them alongside the data they relate to.
However, like any other form of metadata, provenance becomes harder to manage when the data it describes changes. To address this challenge, an upgrade path is proposed that enriches simple provenance with extra descriptions, in the form of attributes and intervals, that help qualify the subject of provenance and provenance itself, and that are intended to satisfy a comprehensive set of constraints. These aspects are covered in the companion specification [[PROV-DM-CONSTRAINTS]].
Section 2 provides an overview of PROV-DM listing its core types and their relations.
In section 3, PROV-DM is applied to a short scenario, encoded in PROV-ASN, and illustrated graphically.
Section 4 provides the definition of PROV-DM.
Section 5 introduces further relations offered by PROV-DM, including relations for data collections and domain-independent common relations.
Section 6 summarizes PROV-DM extensibility points.
Section 7 introduces constraints that can be applied to the PROV data model and that are covered in [[PROV-DM-CONSTRAINTS]].
The PROV-DM namespace is http://www.w3.org/ns/prov-dm/ (TBC).
All the elements, relations, reserved names and attributes introduced in this specification belong to the PROV-DM namespace.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [[!RFC2119]].
PROV-DM is a data model for describing the provenance of Entities, that is, of things in the world. The term "things" encompasses a broad diversity of concepts, including digital objects such as a file or a web page, physical things such as a building, a printed book, or a car, as well as abstract concepts and ideas. One can regard any Web resource as an example of an Entity in this context.
An entity may be the document at URI http://www.w3.org/TR/prov-dm/, a file in a file system, a car or an idea.
Activities that operate on digital entities may for example move, copy, or duplicate them.
An activity may be the publishing of a document on the web, sending a Twitter message, extracting metadata embedded in a file, driving a car from Boston to Cambridge, assembling a data set based on a set of measurements, performing a statistical analysis over a data set, sorting news items according to some criteria, running a SPARQL query over a triple store, or editing a file.
The key purpose of agents is to assign responsibility for activities. The definition of agent intentionally stays away from using concepts such as enabling, causing, initiating, affecting, etc., because many entities also enable, cause, initiate, and affect in some way the activities. So the notion of having some degree of responsibility is really what makes an agent.
An agent is a particular type of Entity. This means that the model can be used to express provenance of the agents themselves.
Software for checking the use of grammar in a document may be defined as an agent of a document preparation activity, and at the same time one can describe its provenance, including for instance the vendor and the version history.
Activities and entities are associated with each other in two different ways: activities are consumers of entities and activities are producers of entities. For the purpose of provenance, we define the following notions of generation and usage.
Activities are consumers of entities and producers of entities. In some cases, the consumption of an entity influences the creation of another in some way. This notion is captured by derivations, defined as follows.
Examples of derivation include the transformation of a relational table into a linked data set, the transformation of a canvas into a painting, the transportation of a work of art from London to New York, and a physical transformation such as the melting of ice into water.
There are some useful types of entities and agents that are commonly encountered in applications making data and documents available on the Web; we introduce them in this section.
PROV-DM is not prescriptive about the nature of plans, their representation, the actions or steps they consist of, or their intended goals. Since plans may evolve over time, it may become necessary to track their provenance, so plans themselves are entities.
A plan can be a blog post tutorial for how to set up a web server, a list of instructions for a micro-processor execution, a cook's written recipe for a chocolate cake, or a workflow for a scientific experiment.
This concept allows the provenance of the collection, and also of its constituents, to be expressed. Such a notion of collection corresponds to a wide variety of concrete data structures, such as maps, dictionaries, or associative arrays.
An example of collection is an archive of documents. Each document has its own provenance, but the archive itself also has some provenance: who maintained it, which documents it contained at which point in time, how it was assembled, etc.
An AccountEntity is an entity that contains a bundle of provenance assertions.
Having found a resource, a user may want to retrieve its provenance. For users to decide whether they can place their trust in that resource, they may want to analyze its provenance, but also determine who the provenance is attributed to, and when it was generated. Hence, in the PROV-DM data model, provenance is itself regarded as an entity, an AccountEntity, for which provenance can be sought.
Three types of agents are recognized by PROV-DM because they are commonly encountered in applications making data and documents available on the Web: persons, software agents, and organizations.
Even software agents can be assigned some responsibility for the effects they have in the world. For example, if one is using a text editor and one's laptop crashes, then one would say that the text editor was responsible for crashing the laptop. If one invokes a service to buy a book, that service can be considered responsible for drawing funds from one's bank to make the purchase. The company that runs the service and the web site would also be responsible, but the point here is that some measure of responsibility is assigned to software as well.
So when someone models software as an agent for an activity in the PROV-DM model, they mean the agent has some responsibility for that activity.
Responsibility comes in degrees, and this is a major reason for distinguishing among all the agents that have some association with an activity and for determining which ones are really the originators of the entity. For example, a programmer and a researcher could both be associated with running a workflow, but it may not matter which programmer clicked the button to start the workflow while it would matter a lot which researcher told the programmer to do so. So there is some notion of responsibility that needs to be captured.
Provenance reflects activities that have occurred. In some cases, those activities reflect the execution of a plan that was designed in advance to guide the execution. PROV-DM allows attaching a plan to an activity, which represents what was intended to happen. Representing the plan explicitly in the provenance can be useful for various tasks: for example, to validate the execution as represented in the provenance record, to manage expectation failures, or to provide explanations.
Examples of association between an activity and agent are:
The nature of this relation is intended to be broad, including delegation or a contractual relation.
A student publishing a web page describing an academic department could result in both the student and the department being agents associated with the activity, and it may not matter which student published a web page but it matters a lot that the department told the student to put up the web page.
The following diagram summarizes the elements and relations just described.
In this example, we consider the second version of the PROV-DM document http://www.w3.org/TR/2011/WD-prov-dm-20111215. Its provenance can be expressed from several perspectives, which we present. In the first one, provenance is concerned with the W3C process, whereas in the second one, it takes the authors' viewpoint.
In this section, we show the kind of provenance record that the WWW Consortium could keep for auditors to check that due processes are followed. All entities involved in this example are Web resources, with well defined URIs (some of which locating archived email messages, available to W3C Members).
We now paraphrase some PROV-DM assertions, and illustrate them with the PROV-ASN notation, a notation for PROV-DM aimed at human consumption. Full details of the provenance record can be found here.
entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
activity(ex:pub2,,,[prov:type="publish"])
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:pub2)
wasDerivedFrom(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018)
used(ex:pub2,ar3:0111)
wasAssociatedWith(ex:pub2, w3:Consortium @ pr:rec-advance)
Provenance descriptions can be illustrated graphically. The illustration is not intended to represent all the details of the model, but it is intended to show the essence of a set of provenance statements. Therefore, it should not be seen as an alternate notation for expressing provenance.
The graphical illustration takes the form of a graph. Entities, activities and agents are represented as nodes, with oval, rectangular, and octagonal shapes, respectively. Usage, Generation, Derivation, and Activity Association are represented as directed edges.
Entities are laid out according to the ordering of their generation event. We endeavor to show time progressing from top to bottom. This means that edges for Usage, Generation and Derivation typically point upwards.
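The graphical conventions above can be sketched in a few lines of Python. This is only an illustration, not part of PROV-DM: the choice of Graphviz DOT as output format, the function name to_dot, and the mapping of "rectangular" to the DOT shape box are assumptions. The statements are taken from the W3C process example.

```python
# Shapes follow the convention described above: oval, rectangular ("box"), octagonal.
SHAPES = {"entity": "oval", "activity": "box", "agent": "octagon"}


def to_dot(nodes, edges):
    """Render nodes (name -> kind) and edges (relation, source, target) as DOT text."""
    lines = ["digraph provenance {"]
    for name, kind in nodes.items():
        lines.append(f'  "{name}" [shape={SHAPES[kind]}];')
    for relation, source, target in edges:
        # Each relation becomes a directed edge labelled with its name.
        lines.append(f'  "{source}" -> "{target}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


nodes = {
    "tr:WD-prov-dm-20111215": "entity",
    "tr:WD-prov-dm-20111018": "entity",
    "ar3:0111": "entity",
    "ex:pub2": "activity",
    "w3:Consortium": "agent",
}
edges = [
    ("wasGeneratedBy", "tr:WD-prov-dm-20111215", "ex:pub2"),
    ("wasDerivedFrom", "tr:WD-prov-dm-20111215", "tr:WD-prov-dm-20111018"),
    ("used", "ex:pub2", "ar3:0111"),
    ("wasAssociatedWith", "ex:pub2", "w3:Consortium"),
]
dot = to_dot(nodes, edges)
```

The sketch does not attempt the top-to-bottom time ordering; that would be left to the layout tool or to explicit rank constraints.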
This simple example has shown a variety of PROV-DM constructs, such as Entity, Agent, Activity, Usage, Generation, Derivation, and ActivityAssociation. In this example, it happens that all entities were already Web resources, with readily available URIs, which we used. We note that some of the resources are public, whereas others have restricted access: provenance statements only make use of their identifiers. If identifiers do not pre-exist, e.g. for activities, then they can be minted, for instance ex:pub2, occurring in the namespace identified by prefix ex. We note that the URI scheme developed by W3C is particularly suited for expressing provenance of these reports, since each URI denotes a specific version of a report. It then becomes very easy to relate the various versions, with PROV-DM constructs.
In this section, we consider another perspective on technical report http://www.w3.org/TR/2011/WD-prov-dm-20111215. Here, provenance is concerned with the document editing activity, as perceived by authors. This kind of information could be used by authors in their CV or in a narrative about this document.
Again, we paraphrase some PROV-DM assertions, and illustrate them with the PROV-ASN notation. Full details of the provenance record can be found here.
entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version="2" ])
While this description is about the same report tr:WD-prov-dm-20111215, its details differ from the author's perspective: it is a document and it has a version number.
activity(ex:edit1,,,[prov:type="edit"])
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1)
agent(ex:Paolo, [ prov:type="Human" ])
agent(ex:Simon, [ prov:type="Human" ])
wasAssociatedWith(ex:edit1, ex:Paolo, [prov:role="editor"])
wasAssociatedWith(ex:edit1, ex:Simon, [prov:role="contributor"])
The two previous sections provide two different perspectives on the provenance of a technical report. By design, the PROV approach allows for the provenance of a subject to be provided by multiple sources. For users to decide whether they can place their trust in the technical report, they may want to analyze its provenance, but also determine who the provenance is attributed to, and when it was generated, etc. In other words, we need to be able to express the provenance of provenance.
No new mechanism is required to support this requirement. PROV-DM makes the assumption that provenance statements have been bundled up, and named, by some mechanism outside the scope of PROV-DM. For instance, in this case, provenance statements were put in a file and exposed on the Web, respectively at ex:prov1 and ex:prov3. To express their respective provenance, these resources must be seen as entities, and all the constructs of PROV-DM are now available to characterize their provenance. In the example below, ex:prov1 is attributed to the agent w3:Consortium, whereas ex:prov3 to ex:Simon.
entity(ex:prov1, [prov:type="prov:AccountEntity" %% xsd:QName ])
wasAttributedTo(ex:prov1,w3:Consortium)
entity(ex:prov3, [prov:type="prov:AccountEntity" %% xsd:QName ])
wasAttributedTo(ex:prov3,ex:Simon)
In this section, we revisit each concept introduced in Section 2, and provide its detailed definition in the PROV data model, in terms of its various constituents.
In PROV-DM, we distinguish elements from relations, which are respectively discussed in Section 4.1 and Section 4.2.
The following expression
entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version="2" ])
states the existence of an entity, denoted by identifier tr:WD-prov-dm-20111215, with type document and version number 2. The attribute ex:version is application specific, whereas the attribute type is reserved in the PROV-DM namespace.
Further considerations:
The following expression
activity(a1,2011-11-16T16:05:00,2011-11-16T16:06:00, [ex:host="server.example.org",prov:type="ex:edit" %% xsd:QName])
states the existence of an activity with identifier a1, start time 2011-11-16T16:05:00, and end time 2011-11-16T16:06:00, running on host server.example.org, and of type edit. The attribute host is application specific (declared in some namespace with prefix ex). The attribute type is a reserved attribute of PROV-DM, allowing for sub-typing to be expressed.
Further considerations:
From an interoperability perspective, it is useful to define some basic categories of agents since it will improve the use of provenance by applications. There should be very few of these basic categories to keep the model simple and accessible. There are three types of agents in the model since they are common across most anticipated domains of use:
These types are mutually exclusive, though they do not cover all kinds of agent.
The following expression is about an agent identified by e1, which is a person, named Alice, with employee number 1234.
agent(e1, [ex:employee="1234", ex:name="Alice", prov:type="prov:Human" %% xsd:QName])
It is optional to specify the type of an agent. When present, it is expressed using the prov:type attribute.
As provenance descriptions are exchanged between systems, it may be useful to add extra-information to what they are describing. For instance, a "trust service" may add value-judgements about the trustworthiness of some of the entities or agents involved. Likewise, an interactive visualization component may want to enrich a set of provenance descriptions with information helping reproduce their visual representation. To help with interoperability, PROV-DM introduces a simple annotation mechanism allowing anything that is identifiable to be associated with notes.
A note, written note(id, [ attr1=val1, ...]) in PROV-ASN, contains:
A separate PROV-DM relation is used to associate a note with something that is identifiable (see Section on annotation). A given note may be associated with multiple identifiable things.
The following note consists of a set of application-specific attribute-value pairs, intended to help the rendering of what it is associated with, by specifying its color and its position on the screen.
note(ex2:n1,[ex2:color="blue", ex2:screenX=20, ex2:screenY=30])
hasAnnotation(tr:WD-prov-dm-20111215,ex2:n1)
The note is associated with the entity tr:WD-prov-dm-20111215 previously introduced (hasAnnotation is discussed in Section Annotation). The note's identifier and attributes are declared in a separate namespace denoted by prefix ex2.
Alternatively, a reputation service may enrich a provenance record with notes providing reputation ratings about agents. In the following fragment, both agents ex:Simon and ex:Paolo are rated "excellent".
note(ex3:n2,[ex3:reputation="excellent"])
hasAnnotation(ex:Simon,ex3:n2)
hasAnnotation(ex:Paolo,ex3:n2)
The note's identifier and attributes are declared in a separate namespace denoted by prefix ex3.
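As an informal illustration of how an application might consume notes, the Python sketch below stores the two notes from the examples above and looks them up in both directions. The names notes, annotations, notes_for, and subjects_of are hypothetical, and merging the attributes of all notes attached to one subject is a design choice of this sketch, not something PROV-DM prescribes.

```python
# Notes from the examples above: identifier -> attribute-value pairs.
notes = {
    "ex2:n1": {"ex2:color": "blue", "ex2:screenX": 20, "ex2:screenY": 30},
    "ex3:n2": {"ex3:reputation": "excellent"},
}

# hasAnnotation(subject, note): one note may be attached to several subjects.
annotations = [
    ("tr:WD-prov-dm-20111215", "ex2:n1"),
    ("ex:Simon", "ex3:n2"),
    ("ex:Paolo", "ex3:n2"),
]


def notes_for(subject):
    """Collect the attributes of every note attached to `subject`."""
    merged = {}
    for target, note_id in annotations:
        if target == subject:
            merged.update(notes[note_id])
    return merged


def subjects_of(note_id):
    """List everything a given note is attached to, in assertion order."""
    return [target for target, attached in annotations if attached == note_id]
```

For instance, notes_for("ex:Simon") yields the reputation rating, and subjects_of("ex3:n2") lists both rated agents.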
This section describes all the PROV-DM relations between the elements introduced in Section Element. While these relations are not binary, they all involve two primary elements. They can be summarized as follows.
| | Entity | Activity | Agent | Note |
| Entity | wasDerivedFrom, alternateOf, specializationOf | wasGeneratedBy | — | hasAnnotation |
| Activity | used | — | wasStartedBy, wasEndedBy, wasAssociatedWith | hasAnnotation |
| Agent | — | — | actedOnBehalfOf | hasAnnotation |
| Note | — | — | — | hasAnnotation |
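The summary table can also be read as a set of permitted (first element, second element) kinds per relation. The Python sketch below transcribes it that way. The names SIGNATURES, kinds_of, and is_well_typed are hypothetical, and the sketch only covers annotation of elements, not of identified relations such as usages.

```python
# (kind of first element, kind of second element) permitted for each relation,
# transcribed from the summary table above.
SIGNATURES = {
    "wasDerivedFrom": {("entity", "entity")},
    "alternateOf": {("entity", "entity")},
    "specializationOf": {("entity", "entity")},
    "wasGeneratedBy": {("entity", "activity")},
    "used": {("activity", "entity")},
    "wasStartedBy": {("activity", "agent")},
    "wasEndedBy": {("activity", "agent")},
    "wasAssociatedWith": {("activity", "agent")},
    "actedOnBehalfOf": {("agent", "agent")},
    "hasAnnotation": {(k, "note") for k in ("entity", "activity", "agent", "note")},
}


def kinds_of(kind):
    # An agent is a particular type of entity, so it may appear wherever an entity may.
    return {"agent", "entity"} if kind == "agent" else {kind}


def is_well_typed(relation, first_kind, second_kind):
    """Check a relation against the table; unknown relations are rejected."""
    allowed = SIGNATURES.get(relation, set())
    return any(
        (first, second) in allowed
        for first in kinds_of(first_kind)
        for second in kinds_of(second_kind)
    )
```

Treating agents as entities here follows the statement that an agent is a particular type of Entity; an application with stricter needs could drop kinds_of.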
While each of the components activity, time, and attributes is OPTIONAL, at least one of them MUST be present.
The following expressions
wasGeneratedBy(e1,a1, 2001-10-26T21:32:52, [ex:port="p1", ex:order=1])
wasGeneratedBy(e2,a1, 2001-10-26T10:00:00, [ex:port="p1", ex:order=2])
state the existence of two generations (with respective times 2001-10-26T21:32:52 and 2001-10-26T10:00:00), at which new entities, identified by e1 and e2, are created by an activity, identified by a1. The first one is available as the first value on port p1, whereas the other is the second value on port p1. The semantics of port and order are application specific.
In some cases, we may want to record the time at which an entity was generated without having to specify the activity that generated it. To support this requirement, the activity component in generation is optional. Hence, the following expression indicates the time at which an entity is generated, without naming the activity that did it.
wasGeneratedBy(e,,2001-10-26T21:32:52)
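A minimal Python sketch of the rule that the activity, time, and attributes of a generation are each OPTIONAL but not all absent. The function name was_generated_by and the exact spacing of its PROV-ASN-like output are assumptions of this sketch; it is not a PROV-ASN serializer.

```python
def _render_attrs(attrs):
    # Strings are quoted; other values (e.g. integers) are written as-is.
    return ", ".join(
        f'{name}="{value}"' if isinstance(value, str) else f"{name}={value}"
        for name, value in attrs.items()
    )


def was_generated_by(entity, activity=None, time=None, attrs=None):
    """Render a generation, leaving the activity slot empty when it is not named."""
    if activity is None and time is None and not attrs:
        raise ValueError("at least one of activity, time, or attributes must be present")
    parts = [entity, activity or ""]
    if time is not None:
        parts.append(time)
    if attrs:
        parts.append(f"[{_render_attrs(attrs)}]")
    return "wasGeneratedBy(" + ",".join(parts) + ")"
```

Calling was_generated_by("e", time="2001-10-26T21:32:52") reproduces the activity-less form shown above, while was_generated_by("e") is rejected.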
A reference to a given entity MAY appear in multiple usages that share a given activity identifier.
The following usages
used(a1,e1,2011-11-16T16:00:00,[ex:parameter="p1"])
used(a1,e2,2011-11-16T16:00:01,[ex:parameter="p2"])
state that the activity identified by a1 consumed two entities identified by e1 and e2, at times 2011-11-16T16:00:00 and 2011-11-16T16:00:01, respectively; the first one was found as the value of parameter p1, whereas the second was found as value of parameter p2. The semantics of parameter is application specific.
A usage record's id is OPTIONAL. It MUST be present when annotating usage records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation).
As far as responsibility is concerned, PROV-DM offers two kinds of constructs. The first, introduced in this section, is a relation between an agent, a plan, and an activity; the second, introduced in Section Responsibility, is a relation between agents expressing that an agent was acting on behalf of another, in the context of an activity.
activity(ex:a,[prov:type="workflow execution"])
agent(ex:ag1,[prov:type="operator"])
agent(ex:ag2,[prov:type="designer"])
wasAssociatedWith(ex:a,ex:ag1,[prov:role="loggedInUser", ex:how="webapp"])
wasAssociatedWith(ex:a,ex:ag2,ex:wf,[prov:role="designer", ex:context="project1"])
entity(ex:wf,[prov:type="prov:Plan" %% xsd:QName, ex:label="Workflow 1", ex:url="http://example.org/workflow1.bpel" %% xsd:anyURI])
Since the workflow ex:wf is itself an entity, its provenance can also be expressed in PROV-DM: it can be generated by some activity and derived from other entities, for instance.
An activity start is a representation of an agent starting an activity. An activity end is a representation of an agent ending an activity. Both relations are specialized forms of wasAssociatedWith. They contain attributes describing the modalities of starting/ending activities.
An activity start, written wasStartedBy(id,a,ag,attrs) in PROV-ASN, contains:
An activity end, written wasEndedBy(id,a,ag,attrs) in PROV-ASN, contains:
In the following example,
wasStartedBy(a,ag,[ex:mode="manual"])
wasEndedBy(a,ag,[ex:mode="manual"])
there is an activity denoted by a that was started and ended by an agent denoted by ag, in "manual" mode, an application specific characterization of these relations.
PROV-DM offers a mild version of responsibility in the form of a relation to represent when an agent acted on another agent's behalf. In the example of someone running a mail program, the program is an agent of that activity and the person is also an agent of the activity, but we would also add that the mail software agent is running on the person's behalf. In the other example, the student acted on behalf of his supervisor, who acted on behalf of the department chair, who acted on behalf of the university. All those agents are responsible in some way for the activity taking place, but we do not say explicitly who bears responsibility and to what degree.
We could also say that an agent can act on behalf of several other agents (a group of agents). This would also make it possible to indirectly reflect chains of responsibility. This also indirectly reflects control without requiring that control be explicitly indicated. In some contexts there will be a need to represent responsibility explicitly, for example to indicate legal responsibility, and that could be added as an extension to this core model. The same holds for control, since in particular contexts there might be a need to define specific aspects of control that various agents exert over a given activity.
activity(a,[prov:type="workflow"])
agent(ag1,[prov:type="programmer"])
agent(ag2,[prov:type="researcher"])
agent(ag3,[prov:type="funder"])
wasAssociatedWith(a,ag1,[prov:role="loggedInUser"])
wasAssociatedWith(a,ag2)
actedOnBehalfOf(ag1,ag2,a,[prov:type="delegation"])
actedOnBehalfOf(ag2,ag3,a,[prov:type="contract"])
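To illustrate how a chain of responsibility can be read off actedOnBehalfOf assertions, here is a small Python sketch over the example above. The function name responsibility_chain is hypothetical, and the sketch assumes that, for a given activity, each agent acts on behalf of at most one other agent; an agent acting for a group would need a graph traversal instead.

```python
def responsibility_chain(agent, activity, delegations):
    """Follow actedOnBehalfOf links for one activity, starting from `agent`.

    delegations: list of (subordinate, responsible, activity) triples.
    """
    chain = [agent]
    seen = {agent}
    while True:
        # Find the agent that the last agent in the chain acted on behalf of.
        responsible = next(
            (r for s, r, a in delegations if s == chain[-1] and a == activity),
            None,
        )
        if responsible is None or responsible in seen:  # end of chain, or a cycle
            return chain
        chain.append(responsible)
        seen.add(responsible)


# actedOnBehalfOf(ag1,ag2,a,...) and actedOnBehalfOf(ag2,ag3,a,...)
delegations = [("ag1", "ag2", "a"), ("ag2", "ag3", "a")]
chain = responsibility_chain("ag1", "a", delegations)
```

The chain lists who acted for whom; as noted above, it does not say who bears responsibility or to what degree.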
According to Section Conceptualization, for an entity to be transformed from, created from, or affected by another in some way, there must be some underpinning activities performing the necessary actions resulting in such a derivation. However, asserters may not assert or have knowledge of these activities and associated details: they may not assert or know their number, they may not assert or know their identity, they may not assert or know the attributes characterizing how the relevant entities are used or generated. To accommodate the varying circumstances of the various asserters, PROV-DM allows more or less precise derivations to be asserted. Hence, PROV-DM uses the terms precise and imprecise to characterize the different kinds of derivations. We note that the derivation itself is exact (i.e., deterministic, non-probabilistic), but it is its description, expressed in a derivation assertion, that may be imprecise.
The lack of precision may come from two sources:
Hence, we can consider two axes. An activity number axis has values single, multiple, and unknown, respectively representing the cases where one activity is known to have occurred, more than one activity is known to have occurred, or an unknown number of activities have occurred. Likewise, we can consider another axis to cover other details (identities, generation, usage, and attributes), with values asserted and not asserted. We can then form a matrix of possible derivations. Out of the six possibilities, PROV-DM offers three forms of derivations to cater for five of them, while the remaining one is not meaningful. The following table summarizes names for the three kinds of derivation, which we then explain.
| activity number axis | other details: asserted | other details: not asserted |
| single | precise-1 derivation | imprecise-1 derivation |
| multiple | imprecise-n derivation | imprecise-n derivation |
| unknown | — | imprecise-n derivation |
We note that the remaining theoretical case cannot occur, since asserting the details of an unknown number of activities is a contradiction.
In order to represent the number of activities in a derivation, we introduce a PROV-DM attribute steps, which can take two possible values: single and any. When prov:steps="single", derivation is due to one activity; when prov:steps="any", the number of activities is multiple or not known.
The three kinds of derivations are successively introduced. Making use of the attribute steps, we can distinguish the various derivation types.
It is OPTIONAL to include the attribute prov:steps in a precise-1 derivation since it already refers to the one and only one activity underpinning the derivation.
An imprecise-1 derivation, written wasDerivedFrom(id, e2,e1, t, attrs) in PROV-ASN, contains:
An imprecise-1 derivation MUST include the attribute prov:steps, since it is the only means to distinguish this derivation from an imprecise-n derivation.
An imprecise-n derivation, written wasDerivedFrom(id, e2, e1, t, attrs) in PROV-ASN, contains:
It is OPTIONAL to include the attribute prov:steps in an imprecise-n derivation. It defaults to prov:steps="any".
None of the three kinds of derivation is defined to be transitive. Domain-specific specializations of these derivations may be defined in such a way that the transitivity property holds.
The following descriptions state the existence of derivations.
wasDerivedFrom(e5,e3,a4,g2,u2)
wasDerivedFrom(e5,e3,a4,g2,u2,[prov:steps="single"])
wasDerivedFrom(e3,e2,[prov:steps="single"])
wasDerivedFrom(e2,e1,[])
wasDerivedFrom(e2,e1,[prov:steps="any"])
wasDerivedFrom(e2,e1,2012-01-18T16:00:00, [prov:steps="any"])
The first two are precise-1 derivations expressing that the activity identified by a4, by using the entity denoted by e3 according to usage u2 derived the entity denoted by e5 and generated it according to generation g2.
The third line describes an imprecise-1 derivation, which is similar for e3 and e2, but it leaves the activity and associated attributes implicit. The fourth and fifth lines are about imprecise-n derivations between e2 and e1, but no information is provided as to the number and identity of activities underpinning the derivation. The sixth derivation extends the fifth with the derivation time of e2.
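The way the three kinds of derivation are told apart can be sketched as a short Python function. The name classify_derivation and its keyword arguments are hypothetical; the sketch only looks at whether an activity is named and at the value of prov:steps, and does not check whether a precise-1 derivation carries a contradictory prov:steps value.

```python
def classify_derivation(activity=None, steps=None):
    """Return the kind of derivation, given the named activity (if any) and prov:steps."""
    if steps not in (None, "single", "any"):
        raise ValueError('prov:steps must be "single" or "any"')
    if activity is not None:
        # The derivation names the one activity underpinning it.
        return "precise-1"
    if steps == "single":
        return "imprecise-1"
    # prov:steps="any", or absent (which defaults to "any").
    return "imprecise-n"


# The six derivations listed above, in order.
examples = [
    {"activity": "a4"},
    {"activity": "a4", "steps": "single"},
    {"steps": "single"},
    {},
    {"steps": "any"},
    {"steps": "any"},  # same as the fifth, with a derivation time
]
kinds = [classify_derivation(**example) for example in examples]
```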
The purpose of this section is to introduce relations between two entities that refer to the same thing in the world. Consider for example three entities:
These entities refer to the same real person Bob, either in different contexts, or at different levels of abstraction. Specifically:
The following two relations are introduced for expressing alternative or specialized entities.
An alternate relation, written alternateOf(id, alt1, alt2, attrs) in PROV-ASN, addresses case (1). It has the following constituents:
The following expressions describe two persons, respectively holder of a Facebook account and a Twitter account, and their relation as alternate.
entity(facebook:ABC, [ prov:type="person with Facebook account" ])
entity(twitter:XYZ, [ prov:type="person with Twitter account" ])
alternateOf(facebook:ABC, twitter:XYZ)
A specialization relation, written specializationOf(id, sub, super, attrs) in PROV-ASN, addresses case (2). It has the following constituents:
The following expressions describe two persons, the second of which is holder of a Twitter account. The second entity is a specialization of the first.
entity(ex:Bob, [ prov:type="person", ex:name="Bob" ])
entity(twitter:XYZ, [ prov:type="person with Twitter account" ])
specializationOf(twitter:XYZ, ex:Bob)
An annotation relation, written hasAnnotation(id,r,n,attrs) in PROV-ASN, has the following constituents:
The following expressions
entity(e1,[prov:type="document"])
entity(e2,[prov:type="document"])
activity(a,t1,t2)
used(u1,a,e1,[ex:file="stdin"])
wasGeneratedBy(e2, a, [ex:file="stdout"])
note(n1,[ex:icon="doc.png"])
hasAnnotation(e1,n1)
hasAnnotation(e2,n1)
note(n2,[ex:style="dotted"])
hasAnnotation(u1,n2)
describe two documents (attribute-value pair: prov:type="document") identified by e1 and e2, and their annotation with a note indicating that the icon (an application specific way of rendering provenance) is doc.png. The example also includes an activity, its usage of the first entity, and its generation of the second entity. The usage is annotated with a style (an application specific way of rendering this edge graphically). To be able to express this annotation, the usage was provided with an identifier u1, which was then referred to in hasAnnotation(u1,n2).
A PROV-DM namespace is identified by an IRI reference [[!IRI]]. In PROV-DM, attributes, identifiers, and literals with qualified names as data type can be placed in a namespace using the mechanisms described in this specification.
A namespace declaration consists of a binding between a prefix and a namespace. Every qualified name with this prefix in the scope of this declaration refers to this namespace. A default namespace declaration consists of a namespace. Every un-prefixed qualified name in the scope of this default namespace declaration refers to this namespace.
The PROV-DM namespace is http://www.w3.org/ns/prov-dm/ (TBC).
An identifier is a qualified name.
A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name.
PROV-DM stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part.
A qualified name's prefix is OPTIONAL. If a prefix occurs in a qualified name, it refers to a namespace declared in a namespace declaration. In the absence of prefix, the qualified name refers to the default namespace.
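The mapping from qualified names to IRIs by concatenation can be sketched as follows in Python. The function name expand and the prefix table are illustrative assumptions: the binding of tr to http://www.w3.org/TR/2011/ is inferred from the example report URI, and http://example.org/ is a placeholder default namespace.

```python
def expand(qname, prefixes, default=None):
    """Map a qualified name to an IRI by concatenating namespace and local name."""
    prefix, colon, local = qname.partition(":")
    if not colon:
        # No prefix: the name refers to the default namespace.
        if default is None:
            raise KeyError("no default namespace declared")
        return default + qname
    # A prefix must have been bound by a namespace declaration.
    return prefixes[prefix] + local


prefixes = {
    "prov": "http://www.w3.org/ns/prov-dm/",
    "tr": "http://www.w3.org/TR/2011/",  # assumed binding, see lead-in
}
iri = expand("tr:WD-prov-dm-20111215", prefixes)
```

An undeclared prefix raises KeyError, which mirrors the requirement that a prefix refers to a namespace declared in a namespace declaration.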
An attribute is a qualified name.
The PROV data model introduces a pre-defined set of attributes in the PROV-DM namespace, which we define below. The interpretation of any attribute declared in another namespace is out of scope.
The attribute prov:role denotes the function of an entity with respect to an activity, in the context of a usage, generation, activity association, activity start, and activity end. The attribute prov:role is allowed to occur multiple times in a list of attribute-value pairs. The value associated with a prov:role attribute MUST be a PROV-DM Literal.
The following activity start describes the role of the agent identified by ag in this start relation with activity a.
wasStartedBy(a,ag, [prov:role="program-operator"])
The attribute prov:type provides further typing information for an element or relation. PROV-DM liberally defines a type as a category of things having common characteristics. PROV-DM is agnostic about the representation of types, and only states that the value associated with a prov:type attribute MUST be a PROV-DM Literal. The attribute prov:type is allowed to occur multiple times.
The following describes an agent of type software agent.
agent(ag, [prov:type="prov:ComputingSystem" %% xsd:QName])
The attribute prov:steps defines the level of precision associated with a derivation. The value associated with a prov:steps attribute MUST be "single" or "any". The attribute prov:steps occurs at most once in a derivation. A derivation without attribute prov:steps is considered to be equivalent to the same derivation extended with an extra attribute prov:steps and associated value "any".
The following expression declares an imprecise-1 derivation, which is known to involve one activity, though its identity, usage details of ex:e1, and generation details of ex:e2 are not explicit.
wasDerivedFrom(ex:e2, ex:e1, [prov:steps="single"])
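The rules for prov:steps (allowed values, at most one occurrence, and the implicit default "any") can be sketched as a small normalization step. The Python below is an illustration only; the function name normalize_steps and the representation of attributes as a list of (attribute, value) pairs are assumptions, not part of PROV-DM.

```python
from typing import List, Tuple

# Assumed representation: attribute-value pairs as (attribute, value) tuples.
Attrs = List[Tuple[str, str]]

ALLOWED_STEPS = ("single", "any")


def normalize_steps(attrs: Attrs) -> Attrs:
    """Return the derivation attributes with prov:steps made explicit."""
    steps = [value for (name, value) in attrs if name == "prov:steps"]
    if len(steps) > 1:
        raise ValueError("prov:steps occurs at most once in a derivation")
    if steps and steps[0] not in ALLOWED_STEPS:
        raise ValueError('prov:steps MUST be "single" or "any"')
    if not steps:
        # A derivation without prov:steps is equivalent to the same
        # derivation extended with prov:steps="any".
        return list(attrs) + [("prov:steps", "any")]
    return list(attrs)


implicit = normalize_steps([])
explicit = normalize_steps([("prov:steps", "single")])
```

After normalization, a derivation written without prov:steps and the same derivation written with prov:steps="any" carry identical attribute lists.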
The attribute prov:label provides a human-readable representation of a PROV-DM element or relation. The value associated with the attribute prov:label MUST be a string.
Location is an identifiable geographic place (ISO 19112). As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, row, column, and so forth. This document does not specify how to concretely express locations, but instead provides a mechanism to introduce locations, by means of attributes.
The attribute prov:location is an OPTIONAL attribute of entity and activity. The value associated with the attribute prov:location MUST be a PROV-DM Literal, expected to denote a location.
The following expression describes entity Mona Lisa, a painting, with a location attribute.
entity(ex:MonaLisa, [prov:location="Le Louvre, Paris", prov:type="StillImage"])
A PROV-DM Literal represents a data value such as a particular string or number. A PROV-DM Literal represents a value whose interpretation is outside the scope of PROV-DM.
The following examples are, respectively, the string "abc", the integer number 1, and the IRI "http://example.org/foo".
"abc"
1
"http://example.org/foo" %% xsd:anyURI
The following example shows a literal of type xsd:QName (see QName [[!XMLSCHEMA-2]]). The prefix ex MUST be bound to a namespace declared in a namespace declaration.
"ex:value" %% xsd:QName
Time instants are defined according to xsd:dateTime [[!XMLSCHEMA-2]].
Time is OPTIONAL in usage, generation, and activity.
The following figure summarizes the additional relations described in this section.
A revision is the result of revising an entity into a revised version. Deciding whether something is made available as a revision of something else usually involves an agent who takes responsibility for approving that the former is a due variant of the latter. The agent who is responsible for the revision may optionally be specified. Revision is a particular case of derivation of an entity into its revised version.
A revision relation, written wasRevisionOf(id,e2,e1,ag,attrs) in PROV-ASN, contains:
Revisiting the example of Section 3.1, we can now state that the report tr:WD-prov-dm-20111215 is a revision of the report tr:WD-prov-dm-20111018, approved by agent w3:Consortium.
entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
entity(tr:WD-prov-dm-20111018, [ prov:type="pr:RecsWD" %% xsd:QName ])
wasRevisionOf(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, w3:Consortium)
Attribution is the ascribing of an entity to an agent. More precisely, when an entity e is attributed to agent ag, entity e was generated by some activity a, which in turn was associated with agent ag. Thus, this relation is useful when the activity is not known, or irrelevant.
An attribution relation, written wasAttributedTo(id,e,ag,attrs) in PROV-ASN, contains the following elements:
Revisiting the example of Section 3.2, we can ascribe tr:WD-prov-dm-20111215 to some agents without having to make an activity explicit.
agent(ex:Paolo, [ prov:type="Human" ])
agent(ex:Simon, [ prov:type="Human" ])
entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
wasAttributedTo(tr:WD-prov-dm-20111215, ex:Paolo, [prov:role="editor"])
wasAttributedTo(tr:WD-prov-dm-20111215, ex:Simon, [prov:role="contributor"])
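The reading of attribution given above (some activity generated the entity and was associated with the agent) can be illustrated by looking for such a "witness" activity among explicit records. The Python sketch below is an illustration only: the function name witness_activities, the tuple representation of generation and association records, and the activity identifier ex:edit1 are all hypothetical. An empty result does not invalidate an attribution, since the activity may simply be unknown.

```python
from typing import Iterable, Set, Tuple


def witness_activities(
    entity: str,
    agent: str,
    generations: Iterable[Tuple[str, str]],   # (entity, activity) pairs
    associations: Iterable[Tuple[str, str]],  # (activity, agent) pairs
) -> Set[str]:
    """Return the activities that generated `entity` and were associated
    with `agent`, according to the explicit records supplied."""
    generating = {a for (e, a) in generations if e == entity}
    associated = {a for (a, ag) in associations if ag == agent}
    return generating & associated


# Hypothetical explicit records; ex:edit1 is an invented activity identifier.
generations = [("tr:WD-prov-dm-20111215", "ex:edit1")]
associations = [("ex:edit1", "ex:Paolo")]

paolo = witness_activities(
    "tr:WD-prov-dm-20111215", "ex:Paolo", generations, associations
)
simon = witness_activities(
    "tr:WD-prov-dm-20111215", "ex:Simon", generations, associations
)
```

Here the attribution to ex:Paolo is backed by an explicit activity, whereas the attribution to ex:Simon has no explicit witness and stands on its own, which is the situation the attribution relation is intended for.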
The following relations express dependencies amongst activities.
An information flow ordering relation, written as wasInformedBy(id,a2,a1,attrs) in PROV-ASN, contains:
Relation wasInformedBy is not transitive.
Consider two long running services, which we represent by activities s1 and s2.
activity(s1,,,[prov:type="service"])
activity(s2,,,[prov:type="service"])
wasInformedBy(s2,s1)
The last line indicates that some entity was generated by s1 and used by s2.
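The reading of wasInformedBy as "some entity was generated by one activity and used by the other" can be sketched by pairing usage and generation records. The Python below is an illustration under assumptions: the function name informed_by, the tuple representation of records, and the identifiers e1, e2, and s3 are hypothetical. It also shows why the relation is not transitive.

```python
from typing import Dict, Iterable, Set, Tuple


def informed_by(
    used: Iterable[Tuple[str, str]],       # (activity, entity) pairs
    generated: Iterable[Tuple[str, str]],  # (entity, activity) pairs
) -> Set[Tuple[str, str]]:
    """Return pairs (a2, a1) such that a2 used an entity generated by a1."""
    producers: Dict[str, Set[str]] = {}
    for entity, activity in generated:
        producers.setdefault(entity, set()).add(activity)
    return {
        (a2, a1)
        for (a2, entity) in used
        for a1 in producers.get(entity, set())
    }


# Hypothetical records: s1 generates e1, s2 uses e1 and generates e2, s3 uses e2.
generated = [("e1", "s1"), ("e2", "s2")]
used = [("s2", "e1"), ("s3", "e2")]

pairs = informed_by(used, generated)
```

In this sketch, s2 is informed by s1 and s3 is informed by s2, but no single entity links s3 to s1, so the pair (s3, s1) is not produced.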
A control ordering relation, written as wasStartedBy(id, a2, a1, attrs) in PROV-ASN, contains:
Suppose activities a1 and a2 are computer processes that are executed on different hosts, and that a1 started a2. This can be expressed as in the following fragment:
activity(a1,t1,t2,[ex:host="server1.example.org",prov:type="workflow"])
activity(a2,t3,t4,[ex:host="server2.example.org",prov:type="subworkflow"])
wasStartedBy(a2,a1)
A traceability relation between two entities e2 and e1 is a generic dependency of e2 on e1 that indicates either that e1 was necessary for e2 to be created, or that e1 bears some responsibility for e2's existence.
A traceability relation, written tracedTo(id,e2,e1,attrs) in PROV-ASN, contains:
We note that the ancestor is allowed to be an agent since agents are entities.
We refer to the example of Section 3.1, and specifically to Figure prov-tech-report. We can see that there is a path from tr:WD-prov-dm-20111215 to w3:Consortium or to pr:rec-advance. This is expressed as follows.
tracedTo(tr:WD-prov-dm-20111215,w3:Consortium)
tracedTo(tr:WD-prov-dm-20111215,pr:rec-advance)
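The notion of a path between two nodes can be illustrated as reachability over direct dependency edges. The Python sketch below is not a definition of tracedTo: which relations count as edges is not settled by this sketch, and the function name traced_to, the adjacency-dictionary representation, and the identifiers ex:e1, ex:e2, ex:e3, and ex:ag are hypothetical.

```python
from typing import Dict, List, Set


def traced_to(start: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every node reachable from `start` by following one or more
    direct dependency edges."""
    seen: Set[str] = set()
    stack = list(depends_on.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(depends_on.get(node, []))
    return seen


# Hypothetical direct dependencies: ex:e3 depends on ex:e2, which in turn
# depends on entity ex:e1 and on agent ex:ag.
edges = {
    "ex:e3": ["ex:e2"],
    "ex:e2": ["ex:e1", "ex:ag"],
}

ancestors = traced_to("ex:e3", edges)
```

Because agents are entities, the agent ex:ag appears among the ancestors of ex:e3 alongside the entities ex:e2 and ex:e1.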
Further considerations:
A quotation is the repetition of an entity (such as text or image) by someone other than its original author. Quotation is a particular case of derivation in which entity e2 is derived from entity e1 by copying, or "quoting", parts of it.
A quotation relation, written wasQuotedFrom(id,e2,e1,ag2,ag1,attrs) in PROV-ASN, contains:
An original source relation is a particular case of derivation that states that an entity e2 (derived) was originally part of some other entity e1 (the original source).
An original source relation, written hadOriginalSource(id,e2,e1,attrs), contains:
Collection relations address the need to describe the evolution of entities that have a collection structure, that is, which may contain other entities. Specifically, this section exploits the built-in type for entities, called collection, and two relations to describe the effect of adding elements to, and removing elements from, a collection entity. The intent of these relations and entity types is to capture the history of changes that occurred to a collection.
A collection is an entity that has a logical internal structure consisting of key-value pairs, often referred to as a map. More precisely, the following entity types are introduced:
entity(c, [prov:type="EmptyCollection"])        // c is an empty collection
entity(v1)
entity(v2)
entity(c1, [prov:type="Collection"])
entity(c2, [prov:type="Collection"])
CollectionAfterInsertion(c1, c, "k1", v1)       // c1 = { ("k1",v1) }
CollectionAfterInsertion(c2, c1, "k2", v2)      // c2 = { ("k1",v1), ("k2",v2) }
CollectionAfterDeletion(c3, c2, "k1")           // c3 = { ("k2",v2) }
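The effect of the insertion and removal steps in the example above can be replayed with a minimal Python sketch. It assumes that a collection is modelled as an immutable snapshot of a key-value map, so that each relation yields a new collection and leaves the earlier one untouched. The function names after_insertion and after_deletion, and the use of strings to stand in for the entities v1 and v2, are assumptions made for illustration.

```python
from typing import Dict

# Assumed model: a collection state is a plain mapping from keys to values.
Collection = Dict[str, str]


def after_insertion(before: Collection, key: str, value: str) -> Collection:
    """Return a new collection equal to `before` plus the pair (key, value)."""
    after = dict(before)  # copy, so the earlier state is preserved
    after[key] = value
    return after


def after_deletion(before: Collection, key: str) -> Collection:
    """Return a new collection equal to `before` without `key`."""
    after = dict(before)
    after.pop(key, None)
    return after


c: Collection = {}                         # the empty collection
c1 = after_insertion(c, "k1", "v1")        # c1 = { ("k1", v1) }
c2 = after_insertion(c1, "k2", "v2")       # c2 = { ("k1", v1), ("k2", v2) }
c3 = after_deletion(c2, "k1")              # c3 = { ("k2", v2) }
```

Keeping every intermediate state as a separate value mirrors the intent of the collection relations, which is to capture the history of changes rather than only the final content.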
A relation CollectionAfterInsertion, written CollectionAfterInsertion(collAfter, collBefore, key, value), contains:
A relation CollectionAfterDeletion, written CollectionAfterDeletion(collAfter, collBefore, key), contains:
Further considerations:
The PROV data model provides several extensibility points that allow designers to specialize it to specific applications or domains. We summarize these extensibility points here:
The PROV-DM namespace declares a set of reserved attributes catering for extensibility: type, location.
To this end, the PROV-DM namespace declares a reserved attribute: role.
The PROV data model is designed to be application- and technology-independent, but specializations of PROV-DM are welcome and encouraged. To ensure interoperability, specializations of the PROV data model that exploit the extensibility points summarized in this section MUST preserve the semantics specified in the PROV-DM documents (parts 1 to 3).
The example of section 3 contains identifiers such as tr:WD-prov-dm-20111215, which denotes a specific version of a technical report. On the other hand, a URI such as http://www.w3.org/TR/prov-dm/ points to the latest version of a document. One needs to ensure that provenance descriptions for the latter document remain valid as denoted resources change.
To this end, PROV-DM allows asserters to describe "partial states" of entities by means of attributes and associated values. Some further constraints apply to the use of these attributes, since the values associated with them are expected to remain unchanged for some period of time. The constraints associated with attributes are also specified in the companion specification [[PROV-DM-CONSTRAINTS]].
Even though a mechanism for bundling up provenance descriptions and naming them is not part of PROV-DM, the idea of a bundle of descriptions is crucial to the PROV approach. Indeed, it allows multiple provenance perspectives to be provided for a given entity. It is also the mechanism by which provenance of provenance can be expressed. Such a named bundle is referred to as an account and is regarded as an AccountEntity so that its provenance can be expressed. The notion of account is specified in the companion specification [[PROV-DM-CONSTRAINTS]], as well as the constraints that structurally well-formed descriptions are expected to satisfy.