W3C

Linking Across Provenance Bundles

W3C Working Draft 11 December 2012

This version:
http://www.w3.org/TR/2012/WD-prov-links-20121211/
Latest published version:
http://www.w3.org/TR/prov-links/
Latest editor's draft:
http://dvcs.w3.org/hg/prov/raw-file/default/links/prov-links.html
Previous version:
Editors:
Luc Moreau, University of Southampton
Timothy Lebo, Rensselaer Polytechnic Institute

Abstract

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. Bundles, defined in [PROV-DM] as sets of provenance descriptions, were introduced in PROV as the mechanism by which provenance of provenance can be expressed. Because the validity of each bundle is established in isolation [PROV-CONSTRAINTS], bundles are essentially independent of each other and act as islands of provenance descriptions.

In applications where provenance is created by multiple parties over time, it is useful for provenance descriptions created by one party to link to provenance descriptions created by another party. Such a mechanism would allow the "stitching" of provenance descriptions together. Given that provenance descriptions are expected to be contained in bundles, this would require a capability to link entity descriptions across bundles. To address this requirement, this document introduces a relation Mention allowing an entity description to be linked to another entity description occurring in another bundle.

The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

PROV Family of Documents

This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. These documents are:

How to read the PROV Family of Documentation

This document was published by the Provenance Working Group as a First Public Working Draft. If you wish to make comments regarding this document, please send them to public-prov-comments@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. The specifications [PROV-O], [PROV-DM], [PROV-N], and [PROV-XML] have respectively defined the PROV ontology, the PROV conceptual model, the PROV notation, and the PROV XML schema, allowing provenance descriptions to be expressed, represented in various representations, and interchanged between systems across the Web.

The provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. To support this, provenance itself should be trusted, and therefore, provenance of provenance is itself a critical aspect of an information infrastructure such as the Web. To this end, PROV introduces the concept of Bundle: defined as a set of provenance descriptions, it is a mechanism by which provenance of provenance can be expressed (see also Bundle [PROV-O] and Bundle [PROV-XML]). With bundles, blobs of provenance descriptions can be given names and can themselves be regarded as entities, whose provenance can in turn be described using PROV. These blobs of provenance descriptions are independent of each other, as illustrated by [PROV-CONSTRAINTS] which determines their validity by examining them in isolation of each other.

In a distributed environment, applications commonly involve multiple parties: one party creates some data and its provenance, whereas another party consumes the data and its provenance. In such a situation, the consumer, when it in turn generates provenance, often wants to augment the descriptions of entities generated by the producer. It is not appropriate for the consumer to copy the provenance created by the producer and then alter the copy to suit its needs. Instead, the consumer wants to refer to the description as created by the producer in situ, i.e. in its bundle, and specialize it, allowing the consumer to add its own view on this entity. Such a capability would allow parties to "stitch together" provenance descriptions that would otherwise be disconnected.

This document introduces a new concept Mention allowing an entity to be described as the specialization of another entity, itself described in another bundle. The document provides not only a conceptual definition of Mention, but also the corresponding ontological, schema, and notational definitions, for the various representations of PROV. It also includes constraints that apply to this construct specifically. It is our aim to promote interoperability by defining Mention conceptually and in the representations of PROV.

Note
The concept Mention is experimental, and for this reason was not defined in PROV recommendation-track documents. The Provenance Working Group is seeking feedback from the community on its usefulness in practical scenarios.

2. Conceptual Definition of Mention

An entity e1 may be mentioned in a bundle b, which contains some descriptions about this entity e1: how e1 was generated and used, which activities e1 is involved with, the agents e1 is attributed to, etc. Other bundles may contain other descriptions about the same entity e1. Some applications may want to augment the descriptions of entity e1 found in bundle b with other information. To this end, PROV allows a new entity e2 to be created and defined as a specialization of the preceding entity e1, and which presents at least an additional aspect: the bundle b containing some descriptions of e1. With this relation, applications that process e2 can know that the attributes of e2 may have been computed according to the descriptions of e1 in b. (The term 'aspect' should be understood informally as "a particular part or feature of something"; the term is used in [PROV-DM]'s definitions of entity (Section 5.1.1), specialization (Section 5.5.1), alternate (Section 5.5.2), and in section 2.1 of [PROV-CONSTRAINTS]).

Figure 1 depicts the relation MentionOf (mention) as a ternary relation.

Figure 1: UML Diagram for Mention

Thus, Mention is a relation between two entities with regard to a bundle. It is a special case of specialization.

The mention of an entity in a bundle (containing a description of this entity) is another entity that is a specialization of the former and that presents at least the bundle as a further additional aspect.

An entity is interpreted with respect to a bundle's description in a domain specific manner. The mention of this entity with respect to this bundle offers the opportunity to specialize it according to some domain-specific interpretation.

A mention of an entity in a bundle results in a specialization of this entity with extra fixed aspects, including the bundle that it is described in.

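A mention relation, written mentionOf(infra, supra, b) in PROV-N, has:

specificEntity: an identifier (infra) of the entity that is a mention of the general entity (supra);
generalEntity: an identifier (supra) of the entity that is being mentioned;
bundle: an identifier (b) of a bundle that contains a description of supra and constitutes an additional aspect presented by infra.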

Like specialization, a mention is not, as defined here, an influence, and therefore does not have an id and attributes. Its grammar, in the provenance notation, is written as follows.

    mentionExpression    ::=    "mentionOf" "(" eIdentifier "," eIdentifier "," bIdentifier ")"
    bIdentifier    ::=    identifier
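As an informal illustration, which is not part of this specification, the production above can be recognised with a short Python sketch. The function name parse_mention is hypothetical, and identifiers are simplified to prefix:local qualified names, which is an assumption; PROV-N identifiers are richer than this pattern.

```python
import re

# Simplified qualified name (assumption): prefix, colon, local part.
_QNAME = r"[A-Za-z_][\w.-]*:[\w.-]+"

# mentionExpression ::= "mentionOf" "(" eIdentifier "," eIdentifier "," bIdentifier ")"
_MENTION = re.compile(
    rf"^\s*mentionOf\s*\(\s*({_QNAME})\s*,\s*({_QNAME})\s*,\s*({_QNAME})\s*\)\s*$"
)

def parse_mention(expression):
    """Return (specificEntity, generalEntity, bundle), or None if there is no match."""
    match = _MENTION.match(expression)
    return match.groups() if match else None

print(parse_mention("mentionOf(tool:Bob-2011-11-16, ex:Bob, ex:run1)"))
# ('tool:Bob-2011-11-16', 'ex:Bob', 'ex:run1')
```

An expression with only two arguments, such as mentionOf(ex:e2, ex:e1), is rejected, since the bundle identifier is mandatory in the grammar.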

The following table summarizes how each constituent of a Mention maps to a syntax element, in the provenance notation.

Mention          Non-Terminal
specificEntity   eIdentifier
generalEntity    eIdentifier
bundle           bIdentifier

Let us consider a bundle and the expression specializationOf(e2,e1) occurring in this bundle. The entity e1 may be described in multiple other bundles bi. From specializationOf(e2,e1), one cannot infer mentionOf(e2,e1,b) for a given b, since it is unknown which bi's descriptions were used to compute the additional aspects of e2. Hence, mentionOf has to be asserted.

Section 5 presents constraints applicable to Mention, and in particular, the fact that an entity can be a specific entity of a Mention at most once.

Example 1

This example is concerned with a performance rating tool that reads and processes provenance to determine the performance of agents. To keep the example simple, an agent's performance is determined by the duration of the activities it is associated with.

As an illustration, we consider that two bundles ex:run1 and ex:run2 refer to an agent ex:Bob that controlled two activities ex:a1 and ex:a2.

bundle ex:run1
    activity(ex:a1, 2011-11-16T16:00:00, 2011-11-16T17:00:00)  //duration: 1hour
    wasAssociatedWith(ex:a1, ex:Bob, [prov:role="controller"])
endBundle

bundle ex:run2
    activity(ex:a2, 2011-11-17T10:00:00, 2011-11-17T17:00:00)  //duration: 7hours
    wasAssociatedWith(ex:a2, ex:Bob, [prov:role="controller"])
endBundle

The performance rating tool reads these bundles, and rates the performance of the agent described in these bundles. The performance rating tool creates a new bundle tool:analysis01 containing the following. A new agent tool:Bob-2011-11-16 is declared as a mention of ex:Bob as described in bundle ex:run1, and likewise for tool:Bob-2011-11-17 with respect to ex:run2. The tool adds a domain-specific performance attribute to each of these specialized entities as follows: the performance of the agent in the first bundle is judged to be good since the duration of ex:a1 is one hour, whereas it is judged to be bad in the second bundle since ex:a2's duration is seven hours. The attribute perf:rating is an example of an additional attribute of the specialized agents tool:Bob-2011-11-16 and tool:Bob-2011-11-17.

bundle tool:analysis01
    agent(tool:Bob-2011-11-16, [perf:rating="good"])
    mentionOf(tool:Bob-2011-11-16, ex:Bob, ex:run1)

    agent(tool:Bob-2011-11-17, [perf:rating="bad"])
    mentionOf(tool:Bob-2011-11-17, ex:Bob, ex:run2)
endBundle
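
The behaviour of such a rating tool can be sketched informally in Python. This is an illustration under stated assumptions, not a normative algorithm: the in-memory layout of the bundles, the function name rate, and the four-hour threshold are all hypothetical (Example 1 only states that one hour is rated good and seven hours bad), and each agent is assumed to have a single activity per bundle.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory view of ex:run1 and ex:run2:
# bundle id -> list of (activity, start, end, associated agent).
BUNDLES = {
    "ex:run1": [("ex:a1", "2011-11-16T16:00:00", "2011-11-16T17:00:00", "ex:Bob")],
    "ex:run2": [("ex:a2", "2011-11-17T10:00:00", "2011-11-17T17:00:00", "ex:Bob")],
}

# Assumed cut-off between "good" and "bad"; not given by the example.
THRESHOLD = timedelta(hours=4)

def rate(bundles, threshold=THRESHOLD):
    """Return PROV-N statements for the analysis bundle, one mention per bundle."""
    lines = []
    for bundle_id, activities in bundles.items():
        for _activity, start, end, agent in activities:
            started = datetime.fromisoformat(start)
            duration = datetime.fromisoformat(end) - started
            rating = "good" if duration <= threshold else "bad"
            # Fresh identifier for the specialized agent, named after the day.
            local = agent.split(":", 1)[1]
            mention_id = f"tool:{local}-{started.date().isoformat()}"
            lines.append(f'agent({mention_id}, [perf:rating="{rating}"])')
            lines.append(f"mentionOf({mention_id}, {agent}, {bundle_id})")
    return lines

print("\n".join(rate(BUNDLES)))
```

With the data above, the sketch prints the four statements contained in bundle tool:analysis01.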
Example 2

Consider the following bundle of descriptions, in which derivation and generations have been identified.

 
bundle obs:bundle1
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:g1; ex:report1, -, 2012-05-24T10:00:01)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:g2; ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, ex:report1)
endBundle
entity(obs:bundle1, [ prov:type='prov:Bundle' ])
wasAttributedTo(obs:bundle1, ex:observer01)
Bundle obs:bundle1 is rendered by a visualisation tool. It may be useful for the visualization layout of this bundle to be shared along with the provenance descriptions, so that other users can render provenance as it was originally rendered. The original bundle obviously cannot be changed. However, one can create a new bundle, as follows.
 
bundle tool:bundle2
  entity(tool:bundle2, [ prov:type='viz:Configuration', prov:type='prov:Bundle' ])
  wasAttributedTo(tool:bundle2, viz:Visualizer)

  entity(tool:report1, [ viz:color="orange" ])
  mentionOf(tool:report1, ex:report1, obs:bundle1)

  entity(tool:report2, [ viz:color="blue" ])              
  mentionOf(tool:report2, ex:report2, obs:bundle1)
endBundle

In bundle tool:bundle2, the prefix viz is used for naming visualisation-specific attributes, types or values.

This example is typical of a common situation in distributed environments, where the consumer and producer of provenance are different.

Bundle tool:bundle2 is given type viz:Configuration to indicate that it consists of descriptions that pertain to the configuration of the visualisation tool. This type attribute can be used for searching bundles containing visualization-related descriptions.

The visualisation tool created new identifiers tool:report1 and tool:report2. They denote entities which are specializations of ex:report1 and ex:report2, described in bundle obs:bundle1, with a visualization attribute for the color to be used when rendering these entities.
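
A consumer of tool:bundle2 might combine the two views of a report by following the mention back to obs:bundle1. The Python sketch below shows one possible application-level reading; PROV does not prescribe such a merge, and the dictionary layout and the function name stitched_view are hypothetical.

```python
# Hypothetical in-memory model: bundle id -> {entity id -> attributes},
# mirroring obs:bundle1 and tool:bundle2 from Example 2.
DESCRIPTIONS = {
    "obs:bundle1": {
        "ex:report1": {"prov:type": "report", "ex:version": 1},
        "ex:report2": {"prov:type": "report", "ex:version": 2},
    },
    "tool:bundle2": {
        "tool:report1": {"viz:color": "orange"},
        "tool:report2": {"viz:color": "blue"},
    },
}

# (specificEntity, generalEntity, bundle) tuples asserted in tool:bundle2.
MENTIONS = [
    ("tool:report1", "ex:report1", "obs:bundle1"),
    ("tool:report2", "ex:report2", "obs:bundle1"),
]

def stitched_view(specific, home_bundle, descriptions, mentions):
    """Combine a mention's own attributes with those of the entity it mentions."""
    view = {}
    for infra, supra, bundle in mentions:
        if infra == specific:
            # Only the descriptions of supra found in the named bundle are used.
            view.update(descriptions.get(bundle, {}).get(supra, {}))
    # Attributes asserted on the specific entity are applied last.
    view.update(descriptions[home_bundle][specific])
    return view

print(stitched_view("tool:report1", "tool:bundle2", DESCRIPTIONS, MENTIONS))
# {'prov:type': 'report', 'ex:version': 1, 'viz:color': 'orange'}
```

The original bundle is only read, never modified, which matches the requirement that obs:bundle1 cannot be changed.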

3. Ontological Definition of Mention

The ternary relation mentionOf is encoded as two properties: prov:mentionOf and prov:asInBundle, defined as follows.

Property: prov:mentionOf (object property)

IRI: http://www.w3.org/ns/prov#mentionOf

When :x prov:mentionOf :y and :y is described in Bundle :b, the triple :x prov:asInBundle :b is also asserted to cite the Bundle in which :y was described.

prov:asInBundle is used to cite the Bundle in which the generalization was mentioned.

has super-properties: prov:specializationOf
has domain: prov:Entity
has range: prov:Entity
PROV-DM term: mention

Property: prov:asInBundle (object property)

IRI: http://www.w3.org/ns/prov#asInBundle

When :x prov:mentionOf :y and :y is described in Bundle :b, the triple :x prov:asInBundle :b is also asserted to cite the Bundle in which :y was described.

has domain: prov:Entity
has range: prov:Bundle
PROV-DM term: mention
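
The encoding of the ternary relation as two properties can be sketched informally in Python, using plain (subject, predicate, object) tuples rather than any RDF library. The function names encode_mention and decode_mentions are hypothetical.

```python
PROV = "http://www.w3.org/ns/prov#"

def encode_mention(specific, general, bundle):
    """Encode mentionOf(specific, general, bundle) as two (s, p, o) triples."""
    return [
        (specific, PROV + "mentionOf", general),
        (specific, PROV + "asInBundle", bundle),
    ]

def decode_mentions(triples):
    """Recover (specific, general, bundle) tuples from a collection of triples."""
    general = {s: o for s, p, o in triples if p == PROV + "mentionOf"}
    bundle = {s: o for s, p, o in triples if p == PROV + "asInBundle"}
    # Pairing the two properties by subject assumes one mention per subject.
    return [(s, general[s], bundle[s]) for s in general if s in bundle]

triples = encode_mention("tool:bob-2011-11-17", ":bob", ":run2")
print(decode_mentions(triples))
# [('tool:bob-2011-11-17', ':bob', ':run2')]
```

Pairing the two triples by their shared subject is unambiguous only because an entity can be the specific entity of at most one mention (Section 5).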
Example 3

We revisit Example 1, encoding in RDF the rating of Bob in the context of the second activity.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix tool: <http://example.com/tool/> .
@prefix perf: <http://example.com/performance/> .
@prefix :     <http://example.com/> .

:run2 {
   :activity_2
      a prov:Activity;
      prov:startedAtTime "2011-11-17T10:00:00"^^xsd:dateTime;
      prov:endedAtTime   "2011-11-17T17:00:00"^^xsd:dateTime; 
      prov:wasAssociatedWith :bob;
   .
}

tool:analysis_01 {
   tool:bob-2011-11-17
      a prov:Agent;
      prov:mentionOf  :bob;
      prov:asInBundle :run2;
      perf:rating     perf:very-slow;
   .
}

# This is inferred from prov:mentionOf
tool:bob-2011-11-17 prov:specializationOf :bob . 

# This is inferred from prov:specializationOf
tool:bob-2011-11-17 prov:alternateOf      :bob . 
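
The two inferences shown in the comments above can be sketched informally in Python over plain tuples with prefixed names. The function name infer is hypothetical, and the sketch covers only these two inferences, not PROV inference in general.

```python
def infer(triples):
    """Add the specializationOf and alternateOf triples implied by mentionOf."""
    closure = set(triples)
    # Inferred from prov:mentionOf.
    for s, p, o in triples:
        if p == "prov:mentionOf":
            closure.add((s, "prov:specializationOf", o))
    # Inferred from prov:specializationOf, whether asserted or just derived.
    for s, p, o in list(closure):
        if p == "prov:specializationOf":
            closure.add((s, "prov:alternateOf", o))
    return closure

asserted = {
    ("tool:bob-2011-11-17", "prov:mentionOf", ":bob"),
    ("tool:bob-2011-11-17", "prov:asInBundle", ":run2"),
}
for triple in sorted(infer(asserted) - asserted):
    print(triple)
```

Note that nothing is inferred in the opposite direction: a specialization does not imply a mention, because the bundle cannot be recovered.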

4. XML Schema for Mention

Type definition in XML Schema:

<xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="Mention">
  <xs:sequence>
    <xs:element name="specificEntity" type="prov:EntityRef"/>
    <xs:element name="generalEntity" type="prov:EntityRef"/>
    <xs:element name="bundle" type="prov:EntityRef"/>
  </xs:sequence>
</xs:complexType>

Usage in XML:

<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="mentionOf" type="prov:Mention"/>
Example 4
<prov:document
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#"
    xmlns:perf="http://example.com/ns/perf#"
    xmlns:tool="http://example.com/ns/tool#">

  <prov:bundle prov:id="ex:run1">
    <prov:activity prov:id="ex:a1">
      <prov:startTime>2011-11-16T16:00:00</prov:startTime>
      <prov:endTime>2011-11-16T17:00:00</prov:endTime>
    </prov:activity>

    <prov:wasAssociatedWith>
      <prov:activity prov:ref="ex:a1" />
      <prov:agent prov:ref="ex:Bob" />
      <prov:role xsi:type="xsd:QName">controller</prov:role>
    </prov:wasAssociatedWith>
  </prov:bundle>

  <prov:bundle prov:id="ex:run2">
    <prov:activity prov:id="ex:a2">
      <prov:startTime>2011-11-17T10:00:00</prov:startTime>
      <prov:endTime>2011-11-17T17:00:00</prov:endTime>
    </prov:activity>

    <prov:wasAssociatedWith>
      <prov:activity prov:ref="ex:a2" />
      <prov:agent prov:ref="ex:Bob" />
      <prov:role xsi:type="xsd:QName">controller</prov:role>
    </prov:wasAssociatedWith>
  </prov:bundle>

  <prov:bundle prov:id="tool:analysis01">
    <prov:agent prov:id="tool:Bob-2011-11-16">
      <perf:rating>good</perf:rating>
    </prov:agent>

    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-16" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run1" />
    </prov:mentionOf>

    <prov:agent prov:id="tool:Bob-2011-11-17">
      <perf:rating>bad</perf:rating>
    </prov:agent>

    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-17" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run2" />
    </prov:mentionOf>
  </prov:bundle>

</prov:document>
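
As an informal illustration, the mentions in such a document can be extracted with the Python standard library. The function name extract_mentions is hypothetical, and the embedded XML is a trimmed copy of the tool:analysis01 bundle above that keeps only the mentionOf elements.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
NS = {"prov": PROV_NS}
REF = f"{{{PROV_NS}}}ref"  # ElementTree key for the prov:ref attribute

XML = """
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:bundle prov:id="tool:analysis01">
    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-16" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run1" />
    </prov:mentionOf>
    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-17" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run2" />
    </prov:mentionOf>
  </prov:bundle>
</prov:document>
"""

def extract_mentions(xml_text):
    """Return (specificEntity, generalEntity, bundle) for each prov:mentionOf."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for mention in root.iter(f"{{{PROV_NS}}}mentionOf"):
        found.append(tuple(
            mention.find(f"prov:{child}", NS).get(REF)
            for child in ("specificEntity", "generalEntity", "bundle")
        ))
    return found

print(extract_mentions(XML))
```

The references are returned as the literal attribute strings; resolving prefixes such as ex and tool to full IRIs is left out of this sketch.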

5. Constraints associated with Mention

If one entity is a mention of another in a bundle, then the former is also a specialization of the latter:

IF mentionOf(e2,e1,b) THEN specializationOf(e2,e1).

An entity can be the subject of at most one mention relation.

IF mentionOf(e, e1, b1) and mentionOf(e, e2, b2), THEN e1=e2 and b1=b2.
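
A check for the second constraint can be sketched informally in Python. The function name check_unique_mentions is hypothetical, and mentions are assumed to be given as (specificEntity, generalEntity, bundle) tuples.

```python
def check_unique_mentions(mentions):
    """Return the specific entities that appear in two different mentions."""
    seen = {}
    violations = set()
    for specific, general, bundle in mentions:
        # Remember the first (general, bundle) pair seen for each specific entity.
        previous = seen.setdefault(specific, (general, bundle))
        if previous != (general, bundle):
            violations.add(specific)
    return sorted(violations)

valid = [
    ("tool:Bob-2011-11-16", "ex:Bob", "ex:run1"),
    ("tool:Bob-2011-11-17", "ex:Bob", "ex:run2"),
]
invalid = valid + [("tool:Bob-2011-11-16", "ex:Bob", "ex:run2")]

print(check_unique_mentions(valid))    # []
print(check_unique_mentions(invalid))  # ['tool:Bob-2011-11-16']
```

A repeated, identical mention is not reported, since the constraint only requires that e1=e2 and b1=b2.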

A. Acknowledgements

This document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole. The editors extend special thanks to Ivan Herman (W3C/ERCIM).

Members of the PROV Working Group at the time of publication of this document were: Ilkay Altintas (Invited expert), Reza B'Far (Oracle Corporation), Khalid Belhajjame (University of Manchester), James Cheney (University of Edinburgh, School of Informatics), Sam Coppens (IBBT), David Corsar (University of Aberdeen, Computing Science), Stephen Cresswell (The National Archives), Tom De Nies (IBBT), Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland), Simon Dobson (Invited expert), Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)), Kai Eckert (Invited expert), Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER), James Frew (Invited expert), Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)), Daniel Garijo (Universidad Politécnica de Madrid), Yolanda Gil (Invited expert), Ryan Golden (Oracle Corporation), Paul Groth (Vrije Universiteit), Olaf Hartig (Invited expert), David Hau (National Cancer Institute, NCI), Sandro Hawke (W3C/MIT), Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh), Ivan Herman (W3C/ERCIM), Ralph Hodgson (TopQuadrant), Hook Hua (Invited expert), Trung Dong Huynh (University of Southampton), Graham Klyne (University of Oxford), Michael Lang (Revelytix, Inc.), Timothy Lebo (Rensselaer Polytechnic Institute), James McCusker (Rensselaer Polytechnic Institute), Deborah McGuinness (Rensselaer Polytechnic Institute), Simon Miles (Invited expert), Paolo Missier (School of Computing Science, Newcastle University), Luc Moreau (University of Southampton), James Myers (Rensselaer Polytechnic Institute), Vinh Nguyen (Wright State University), Edoardo Pignotti (University of Aberdeen, Computing Science), Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute), Carl Reed (Open Geospatial Consortium), Adam Retter (Invited Expert), Christine Runnegar (Invited expert), Satya Sahoo (Invited expert), David Schaengold (Revelytix, Inc.), Daniel Schutzer (FSTC, Financial Services Technology Consortium), Yogesh Simmhan (Invited expert), Stian Soiland-Reyes (University of Manchester), Eric Stephan (Pacific Northwest National Laboratory), Linda Stewart (The National Archives), Ed Summers (Library of Congress), Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)), Ted Thibodeau (OpenLink Software Inc.), Curt Tilmes (National Aeronautics and Space Administration), Craig Trim (IBM Corporation), Stephan Zednik (Rensselaer Polytechnic Institute), Jun Zhao (University of Oxford), Yuting Zhao (University of Aberdeen, Computing Science).

B. References

B.1 Informative references

[PROV-AQ]
Graham Klyne; Paul Groth; eds. Provenance Access and Query. 19 June 2012, Working Draft. URL: http://www.w3.org/TR/2012/WD-prov-aq-20120619/
[PROV-CONSTRAINTS]
James Cheney; Paolo Missier; Luc Moreau; eds. Constraints of the PROV Data Model. 11 December 2012, W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-prov-constraints-20121211/
[PROV-DM]
Luc Moreau; Paolo Missier; eds. PROV-DM: The PROV Data Model. 11 December 2012, W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-prov-dm-20121211/
[PROV-N]
Luc Moreau; Paolo Missier; eds. PROV-N: The Provenance Notation. 11 December 2012, W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-prov-n-20121211/
[PROV-O]
Timothy Lebo; Satya Sahoo; Deborah McGuinness; eds. PROV-O: The PROV Ontology. 11 December 2012, W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-prov-o-20121211/
[PROV-OVERVIEW]
Paul Groth; Luc Moreau; eds. PROV-OVERVIEW: An Overview of the PROV Family of Documents. 11 December 2012, Working Draft. URL: http://www.w3.org/TR/2012/WD-prov-overview-20121211/
[PROV-PRIMER]
Yolanda Gil; Simon Miles; eds. PROV Model Primer. 11 December 2012, Working Draft. URL: http://www.w3.org/TR/2012/WD-prov-primer-20121211/
[PROV-XML]
Hook Hua; Curt Tilmes; Stephan Zednik; eds. PROV-XML: The PROV XML Schema. 11 December 2012, Working Draft. URL: http://www.w3.org/TR/2012/WD-prov-xml-20121211/