- Why Provenance?
- The W3C Working Group on Provenance
- A Quick Introduction to the Data Model
- An Overview of the Specifications
- We should be able to express special “meta” information on the data
- who played what role in creating the data (author, reviewer, etc.)
- view of the full revision chain of the data
- in case of integrated data which part comes from which original data and under what process
- what vocabularies/ontologies/rules were used to generate some portions of the data
A Definition of Provenance
Lots of application areas need provenance
- Open Information Systems
- origin of the data, who was responsible for its creation
- Science applications
- how the results were obtained
- origins and references of blogs, news items
- licensing attribution of documents, data
- privacy information
Provenance is not a new subject
- There has been a lot of prior work in
- workflow systems
- knowledge representation
- information retrieval
- There are communities and vocabularies out there
- Open Provenance Model (OPM)
- Dublin Core
- Provenir ontology
- Provenance vocabulary
- SWAN provenance ontology
Need to Interchange Provenance
The idea that a single way of representing and collecting provenance could be adopted internally by all systems does not seem to be realistic today.
Instead, a pragmatic approach is to consider a core data model for provenance that allows domain and application specific representations of provenance to be translated into such a data model and exchanged between systems.
Heterogeneous systems can then export their provenance into such a core data model, and applications that need to make sense of provenance in heterogeneous systems can then import it, process it, and reason over it.
Thus, the vision is that different provenance-aware systems natively adopt their own model for representing their provenance, but a core provenance data model can be readily adopted as a provenance interchange model across such systems.
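The export/import idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CoreProvenance`, both exporter functions, the record layouts, and the `ex:` identifiers are hypothetical and are not part of any PROV specification. The relation names are borrowed from PROV purely as labels.

```python
from dataclasses import dataclass, field


@dataclass
class CoreProvenance:
    """A deliberately tiny stand-in for a core interchange model (hypothetical)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    # Each relation is a (kind, subject, object) triple.
    relations: set = field(default_factory=set)

    def merge(self, other):
        """Combine two exports so a consumer can reason over both at once."""
        return CoreProvenance(
            self.entities | other.entities,
            self.activities | other.activities,
            self.agents | other.agents,
            self.relations | other.relations,
        )


def export_workflow_log(log):
    """Translate a hypothetical workflow-engine log into the core model."""
    core = CoreProvenance()
    for step in log:
        core.activities.add(step["step"])
        core.entities.add(step["output"])
        core.agents.add(step["user"])
        core.relations.add(("wasGeneratedBy", step["output"], step["step"]))
        core.relations.add(("wasAssociatedWith", step["step"], step["user"]))
    return core


def export_document_metadata(records):
    """Translate hypothetical document-metadata records into the core model."""
    core = CoreProvenance()
    for doc, meta in records.items():
        core.entities.add(doc)
        core.agents.add(meta["creator"])
        core.relations.add(("wasAttributedTo", doc, meta["creator"]))
        if "source" in meta:
            core.entities.add(meta["source"])
            core.relations.add(("wasDerivedFrom", doc, meta["source"]))
    return core


# Two systems keep their native formats and only share the exported view.
workflow_log = [{"step": "ex:aggregate", "output": "ex:dataset", "user": "ex:alice"}]
metadata = {"ex:report": {"creator": "ex:bob", "source": "ex:dataset"}}

combined = export_workflow_log(workflow_log).merge(export_document_metadata(metadata))
for relation in sorted(combined.relations):
    print(relation)
```

Neither system changes its internal representation. The consumer sees one merged graph in which `ex:report` can be traced back through `ex:dataset` to the activity and agent that produced it.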
- DERI Galway
- European Broadcasting Union
- Financial Services Technology Consortium
- Library of Congress
- Mayo Clinic
- Open Geospatial Consortium
- OpenLink Software
- Pacific Northwest National Laboratory
- Rensselaer Polytechnic Institute
- Revelytix, Inc
- Newcastle University
- The National Archives
- Universidad Politecnica de Madrid
- University of Aberdeen
- University of Edinburgh
- University of Manchester
- University of Oxford
- University of Southampton
- VU University Amsterdam
- Wright State University
What is PROV?
- PROV is a family of specifications that help define how to interchange provenance
- PROV-DM, the PROV data model for provenance
- PROV-CONSTRAINTS, a set of constraints applying to the PROV data model
- PROV-O, the PROV ontology, an OWL 2 ontology allowing the mapping of PROV to RDF
- PROV-N, a notation for provenance aimed at human consumption
- PROV-AQ, the mechanisms for accessing and querying provenance
- PROV-PRIMER, a primer for the PROV data model
Where should I start?
- The PROV-PRIMER provides an overview of and introduction to the data model.
- It covers the basic components of the data model from three different perspectives.
- It includes a worked example of using provenance in online news publication.
- The examples show PROV in both RDF (Turtle) and the PROV-N notation.
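To give a feel for the PROV-N notation mentioned above, here is a minimal Python sketch that assembles a small PROV-N document as text for a news-style scenario. The helper functions, the `ex:` identifiers, and the `http://example.org/` namespace are assumptions made for illustration, and the scenario is not the primer's own example. The statement forms follow PROV-N as I understand it, so check them against the specification before relying on them.

```python
def attrs(pairs):
    """Render optional attributes as a PROV-N attribute list, e.g. ', [k="v"]'."""
    if not pairs:
        return ""
    return ", [" + ", ".join(f"{key}={value}" for key, value in pairs.items()) + "]"


def provn_document(prefixes, statements):
    """Wrap statements in a PROV-N document with namespace declarations."""
    lines = ["document"]
    lines += [f"  prefix {name} <{iri}>" for name, iri in prefixes.items()]
    lines += [f"  {statement}" for statement in statements]
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical scenario: alice writes an article based on a dataset.
# "-" marks an optional positional argument (here: time or plan) left unspecified.
statements = [
    "entity(ex:dataset)",
    "entity(ex:article)",
    "activity(ex:writing)",
    "agent(ex:alice" + attrs({"prov:type": "'prov:Person'"}) + ")",
    "used(ex:writing, ex:dataset, -)",
    "wasGeneratedBy(ex:article, ex:writing, -)",
    "wasAssociatedWith(ex:writing, ex:alice, -" + attrs({"prov:role": '"author"'}) + ")",
    "wasDerivedFrom(ex:article, ex:dataset)",
    "wasAttributedTo(ex:article, ex:alice)",
]

provn_text = provn_document({"ex": "http://example.org/"}, statements)
print(provn_text)
```

The three kinds of statements at the top (entity, activity, agent) name the things involved, and the remaining lines relate them to one another.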
PROV Starting Points
Some Other Concepts
PROV contains a variety of other concepts that help express rich provenance
Relation to Dublin Core
- Dublin Core and PROV can be mapped to each other
- In some areas, PROV expands on Dublin Core, allowing for richer provenance
- PROV also adds information that cannot be expressed in Dublin Core.
- A Mappings working draft is at:
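As a rough illustration of what such a mapping could look like in code, the sketch below translates a Dublin Core style record into PROV-like statements. The two correspondences in `DC_TO_PROV` are a simplified assumption for this example; the mapping document itself is the authority on which terms map and how. The function name, record layout, and `ex:` identifiers are hypothetical.

```python
# Illustrative, simplified correspondences (an assumption of this sketch,
# not a complete or authoritative mapping).
DC_TO_PROV = {
    "dcterms:creator": "wasAttributedTo",
    "dcterms:source": "wasDerivedFrom",
}


def dc_to_prov(resource, dc_record):
    """Split a Dublin Core record into PROV-style statements and leftovers."""
    mapped, unmapped = [], {}
    for term, value in dc_record.items():
        relation = DC_TO_PROV.get(term)
        if relation is None:
            # Purely descriptive terms stay as plain metadata in this sketch.
            unmapped[term] = value
        else:
            mapped.append(f"{relation}({resource}, {value})")
    return mapped, unmapped


record = {
    "dcterms:creator": "ex:bob",
    "dcterms:title": "Quarterly report",
    "dcterms:source": "ex:dataset",
}

mapped, unmapped = dc_to_prov("ex:report", record)
print(mapped)
print(unmapped)
```

A flat record like this says who created a resource and what it came from, but not which activity produced it or what role each agent played. That is the kind of detail PROV can add on top.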
- There are already a number of implementations using PROV
- See http://www.w3.org/2011/prov/wiki/ProvImplementations
- We encourage you to mark up your data using PROV and to write software that outputs PROV
Help us make a provenance-aware Web!
- The Rec Track documents are close to Candidate Recommendation (CR)
- The plan is to finish the work in March 2013