Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. This document provides an overview of this family of documents.
The Provenance Working Group encourages implementation of the specifications overviewed in this document. Although work on this document by the Provenance Working Group is complete, errors may be recorded in the errata and these may be addressed in future revisions.
The design of PROV stems from the recommendations of the Provenance Incubator Group ([[PROV-XG]]), which performed an extensive information-gathering process including use case cataloging, requirements elicitation and a literature survey. From this process, eight broad recommendations were defined. Summarizing, the report recommends that a provenance framework should support:
Figure 1 shows the organization of PROV and how the documents (roughly) depend on each other. The coloring scheme corresponds to the document roadmap below.
At its core is a conceptual data model (PROV-DM), which defines a common vocabulary used to describe provenance. This is instantiated by various serializations. These serializations are used by implementations to interchange provenance. To help developers and users express valid provenance, a set of constraints (PROV-Constraints) is defined, which can be used to implement provenance validators. This is complemented by a formal semantics (PROV-SEM). Finally, to further support the interchange of provenance, additional specifications are provided for protocols to locate and access provenance (PROV-AQ), connect bundles of provenance descriptions (PROV-Links), represent dictionary-style collections (PROV-Dictionary) and define how to interoperate with the widely used Dublin Core vocabulary (PROV-DC).
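To make the relationship between the conceptual model and its serializations more concrete, the following is a minimal sketch in Python. It records a few statements about an entity, an activity and an agent, and writes them out in a PROV-N-style text form. The class `ToyProvDocument`, its methods, the `ex` prefix and the `http://example.org/` namespace are all assumptions made for illustration; they are not defined by any PROV document, and the output has not been checked against the PROV-N grammar or PROV-Constraints.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvDocument:
    """Hypothetical container for a handful of PROV-style statements."""
    prefix: str
    namespace: str
    statements: list = field(default_factory=list)

    def _qualified(self, local_name: str) -> str:
        # Build a prefixed name such as "ex:report".
        return f"{self.prefix}:{local_name}"

    def declare(self, kind: str, local_name: str) -> None:
        # Only the three core element kinds are handled in this sketch.
        if kind not in ("entity", "activity", "agent"):
            raise ValueError(f"unsupported kind: {kind}")
        self.statements.append(f"{kind}({self._qualified(local_name)})")

    def relate(self, relation: str, subject: str, obj: str) -> None:
        # Only relations written here with a trailing "-" marker are handled.
        if relation not in ("used", "wasGeneratedBy", "wasAssociatedWith"):
            raise ValueError(f"unsupported relation: {relation}")
        self.statements.append(
            f"{relation}({self._qualified(subject)}, {self._qualified(obj)}, -)"
        )

    def to_text(self) -> str:
        # Wrap the statements in a document block with one prefix declaration.
        lines = ["document", f"  prefix {self.prefix} <{self.namespace}>"]
        lines += [f"  {statement}" for statement in self.statements]
        lines.append("endDocument")
        return "\n".join(lines)


doc = ToyProvDocument(prefix="ex", namespace="http://example.org/")
doc.declare("entity", "dataset")
doc.declare("entity", "report")
doc.declare("activity", "compile")
doc.declare("agent", "alice")
doc.relate("used", "compile", "dataset")
doc.relate("wasGeneratedBy", "report", "compile")
doc.relate("wasAssociatedWith", "compile", "alice")
print(doc.to_text())
```

A real implementation would more likely rely on an existing PROV library and one of the defined serializations rather than hand-built strings; the sketch only shows how a small vocabulary of elements and relations can be kept separate from the way it is written out.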
In the table below and Figure 1, we denote whether the document is a W3C Recommendation or a Working Group Note. In Figure 1, bold bordered boxes signal a W3C Recommendation.
Part | Audience | Type | Document |
---|---|---|---|
1 | Users | Note | PROV-PRIMER is the entry point to PROV, offering an introduction to the provenance data model. This is where you should start, and for many readers it may be the only document needed. |
2 | Developers | Rec | PROV-O defines a light-weight OWL2 ontology for the provenance data model. This is intended for the Linked Data and Semantic Web community. |
3 | Developers | Note | PROV-XML defines an XML schema for the provenance data model. This is intended for developers who need a native XML serialization of the PROV data model. |
4 | Advanced | Rec | PROV-DM defines a conceptual data model for provenance including UML diagrams. PROV-O, PROV-XML and PROV-N are serializations of this conceptual model. |
5 | Advanced | Rec | PROV-N defines a human-readable notation for the provenance model. It is used for the examples in the conceptual model and in the definition of PROV-CONSTRAINTS. |
6 | Advanced | Rec | PROV-CONSTRAINTS defines a set of constraints on the PROV data model that specifies a notion of valid provenance. It is specifically aimed at the implementors of validators. |
7 | Developers | Note | PROV-AQ defines how to use Web-based mechanisms to locate and retrieve provenance information. |
8 | Developers | Note | PROV-DC defines a mapping between Dublin Core and PROV-O. |
9 | Developers | Note | PROV-DICTIONARY defines constructs for expressing the provenance of dictionary-style data structures. |
10 | Advanced | Note | PROV-SEM defines a declarative specification of the PROV data model in terms of first-order logic. |
11 | Advanced | Note | PROV-LINKS defines extensions to PROV to enable linking provenance information across bundles of provenance descriptions. |
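As an illustration of what a validator of the kind PROV-CONSTRAINTS targets might check, the sketch below tests a single ordering rule: an entity should not be used before it is generated. The function name, the dictionary-based input format and the sample timestamps are assumptions made for this example; an actual validator would cover the full set of constraints and work from a parsed provenance document.

```python
from datetime import datetime


def find_usage_before_generation(generated_at, used_at):
    """Return (entity, usage_time) pairs where usage precedes generation.

    generated_at maps an entity id to its generation time.
    used_at maps an entity id to a list of usage times.
    Both structures are hypothetical and exist only for this sketch.
    """
    violations = []
    for entity, usage_times in used_at.items():
        generation_time = generated_at.get(entity)
        if generation_time is None:
            # No generation time recorded, so there is nothing to compare.
            continue
        for usage_time in usage_times:
            if usage_time < generation_time:
                violations.append((entity, usage_time))
    return violations


generated_at = {
    "ex:report": datetime(2013, 4, 30, 12, 0),
    "ex:chart": datetime(2013, 4, 30, 9, 0),
}
used_at = {
    "ex:report": [datetime(2013, 4, 30, 11, 0), datetime(2013, 4, 30, 13, 0)],
    "ex:chart": [datetime(2013, 4, 30, 10, 0)],
    "ex:dataset": [datetime(2013, 4, 29, 8, 0)],
}
violations = find_usage_before_generation(generated_at, used_at)
for entity, usage_time in violations:
    print(f"{entity} was used at {usage_time.isoformat()} before its generation")
```

Here only the first usage of `ex:report` is flagged, since it is the only usage that falls before the recorded generation time; `ex:dataset` is skipped because no generation time is known for it.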
This document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole.
Members of the PROV Working Group at the time of publication of this document were: Ilkay Altintas (Invited expert), Reza B'Far (Oracle Corporation), Khalid Belhajjame (University of Manchester), James Cheney (University of Edinburgh, School of Informatics), Sam Coppens (iMinds - Ghent University), David Corsar (University of Aberdeen, Computing Science), Stephen Cresswell (The National Archives), Tom De Nies (iMinds - Ghent University), Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland), Simon Dobson (Invited expert), Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)), Kai Eckert (Invited expert), Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER), James Frew (Invited expert), Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)), Daniel Garijo (Ontology Engineering Group, Universidad Politécnica de Madrid, Spain), Yolanda Gil (Invited expert), Ryan Golden (Oracle Corporation), Paul Groth (VU University Amsterdam), Olaf Hartig (Invited expert), David Hau (National Cancer Institute, NCI), Sandro Hawke (W3C/MIT), Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh), Ivan Herman (W3C/ERCIM), Ralph Hodgson (TopQuadrant), Hook Hua (Invited expert), Trung Dong Huynh (University of Southampton), Graham Klyne (University of Oxford), Michael Lang (Revelytix, Inc.), Timothy Lebo (Rensselaer Polytechnic Institute), James McCusker (Rensselaer Polytechnic Institute), Deborah McGuinness (Rensselaer Polytechnic Institute), Simon Miles (Invited expert), Paolo Missier (School of Computing Science, Newcastle University), Luc Moreau (University of Southampton), James Myers (Rensselaer Polytechnic Institute), Vinh Nguyen (Wright State University), Edoardo Pignotti (University of Aberdeen, Computing Science), Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute), Carl Reed (Open Geospatial Consortium), Adam Retter (Invited expert), Christine Runnegar (Invited expert), Satya Sahoo (Invited expert), David Schaengold (Revelytix, Inc.), Daniel Schutzer (FSTC, Financial Services Technology Consortium), Yogesh Simmhan (Invited expert), Stian Soiland-Reyes (University of Manchester), Eric Stephan (Pacific Northwest National Laboratory), Linda Stewart (The National Archives), Ed Summers (Library of Congress), Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)), Ted Thibodeau (OpenLink Software Inc.), Curt Tilmes (National Aeronautics and Space Administration), Craig Trim (IBM Corporation), Stephan Zednik (Rensselaer Polytechnic Institute), Jun Zhao (University of Oxford), Yuting Zhao (University of Aberdeen, Computing Science).