W3C PROV - A Quick Introduction

The W3C Provenance Working Group

Outline

Why Provenance?
The W3C Working Group on Provenance
A Quick Introduction to the Data Model
An Overview of the Specifications

The Goal

We should be able to express special “meta” information about the data:

who played what role in creating the data (author, reviewer, etc.)
a view of the full revision chain of the data
for integrated data, which part comes from which original data and through what process
which vocabularies, ontologies, or rules were used to generate portions of the data
etc.

A Definition of Provenance

Lots of application areas need provenance

Open Information Systems

origin of the data, who was responsible for its creation

Science applications

how the results were obtained

News

origins and references of blogs, news items

Law

licensing and attribution of documents and data
privacy information

Etc.

Provenance is not a new subject

There has been a lot of prior work in:

workflow systems
databases
knowledge representation
information retrieval

There are communities and vocabularies out there

Open Provenance Model (OPM)
Dublin Core
Provenir ontology
Provenance vocabulary
SWAN provenance ontology
etc.

Need to Interchange Provenance

The idea that a single way of representing and collecting provenance could be adopted internally by all systems does not seem to be realistic today.

Instead, a pragmatic approach is to consider a core data model for provenance that allows domain and application specific representations of provenance to be translated into such a data model and exchanged between systems.

Heterogeneous systems can then export their provenance into such a core data model, and applications that need to make sense of provenance in heterogeneous systems can then import it, process it, and reason over it.

Thus, the vision is that different provenance-aware systems natively adopt their own model for representing their provenance, but a core provenance data model can be readily adopted as a provenance interchange model across such systems.
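
As a rough illustration of this interchange idea, the sketch below has two hypothetical systems, each with its own native record shape, export into one shared structure that a consumer can then query. The `CoreRecord` type, the two `export_*` functions, and all field names are assumptions made for this example; they are not part of PROV or of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreRecord:
    """Hypothetical core interchange record: what was produced, how, and by whom."""
    entity: str
    activity: str
    agent: str


def export_workflow_run(run: dict) -> CoreRecord:
    """Translate a (made-up) workflow-system record into the core model."""
    return CoreRecord(entity=run["output_file"], activity=run["step"], agent=run["operator"])


def export_db_audit_row(row: tuple) -> CoreRecord:
    """Translate a (made-up) database audit row of the form (table, statement, user)."""
    table, statement, user = row
    return CoreRecord(entity=table, activity=statement, agent=user)


def agents_responsible(records: list, entity: str) -> set:
    """A consumer reasons over the shared model without knowing the source system."""
    return {r.agent for r in records if r.entity == entity}


records = [
    export_workflow_run({"output_file": "results.csv", "step": "aggregate", "operator": "alice"}),
    export_db_audit_row(("results.csv", "UPDATE", "bob")),
]
print(sorted(agents_responsible(records, "results.csv")))
```

Each system keeps its native representation; only the export functions know about the shared model, which is the division of labour the interchange vision describes.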

Working Group Charter

http://lists.w3.org/Archives/Public/public-prov-wg/

Participants

DERI Galway
European Broadcasting Union
FORTH
Financial Services Technology Consortium
DFKI
IBBT
IBM
Library of Congress
Mayo Clinic
NASA
OCLC
Open Geospatial Consortium
OpenLink Software
Oracle
Pacific Northwest National Laboratory
Rensselaer Polytechnic Institute
Revelytix, Inc
Newcastle University
The National Archives
TopQuadrant
Universidad Politecnica de Madrid
University of Aberdeen
University of Edinburgh
University of Manchester
University of Oxford
University of Southampton
VU University Amsterdam
Wright State University

What is PROV?

PROV is a family of specifications that together define how to interchange provenance:

PROV-DM, the PROV data model for provenance;

PROV-CONSTRAINTS, a set of constraints applying to the PROV data model;

PROV-O, the PROV ontology, an OWL 2 ontology allowing the mapping of PROV to RDF;

PROV-N, a notation for provenance aimed at human consumption;

PROV-AQ, the mechanisms for accessing and querying provenance;

PROV-PRIMER, a primer for the PROV data model.

Where should I start?

The prov-primer provides an overview of and introduction to the data model.
It covers the basic components of the data model from three different perspectives.
It includes a worked example of using provenance in online news publication.
The examples show PROV expressed in RDF (Turtle) and in the PROV-N notation.
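
To give a feel for the PROV-N notation, here is a minimal Python sketch that prints a few statements about a news article. The `ex:` names, the namespace URI, and the helper `provn_document` are invented for illustration and are not taken from the primer; the PROV-N specification remains the authority on the grammar.

```python
def provn_document(prefixes: dict, statements: list) -> str:
    """Wrap statements in a PROV-N document block with prefix declarations."""
    lines = ["document"]
    for name, uri in prefixes.items():
        lines.append(f"  prefix {name} <{uri}>")
    lines.extend(f"  {s}" for s in statements)
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical example: a reporter writes an article.
# A "-" marks an optional argument that is left unspecified.
statements = [
    "entity(ex:article)",
    "activity(ex:writing)",
    "agent(ex:reporter)",
    "wasGeneratedBy(ex:article, ex:writing, -)",
    "wasAssociatedWith(ex:writing, ex:reporter, -)",
    "wasAttributedTo(ex:article, ex:reporter)",
]
print(provn_document({"ex": "http://example.org/"}, statements))
```

The sketch only assembles text; it does not validate the statements against the PROV-N grammar or the PROV constraints.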

PROV Starting Points

Some Other Concepts

PROV contains a variety of other concepts that help express rich provenance

Relation to Dublin Core

Dublin Core terms can be mapped to PROV

In some areas, PROV expands on Dublin Core, allowing for richer provenance.
PROV also adds information that cannot be expressed in Dublin Core.
A working draft of the mapping is at:

https://dvcs.w3.org/hg/prov/raw-file/6b795ed2e6c9/dc-note/Overview.html
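
One way to picture such a mapping is a lookup from Dublin Core terms to PROV relations. The sketch below is illustrative only: the two entries shown (`dct:creator` to `prov:wasAttributedTo`, `dct:source` to `prov:wasDerivedFrom`) reflect one reading of the mapping draft and should be checked against it, and the `to_prov` helper and the sample triples are hypothetical.

```python
# Assumed subset of a Dublin Core -> PROV mapping; verify against the mapping draft.
DC_TO_PROV = {
    "dct:creator": "prov:wasAttributedTo",
    "dct:source": "prov:wasDerivedFrom",
}


def to_prov(triples):
    """Rewrite mapped predicates; pass unmapped triples through unchanged."""
    return [(s, DC_TO_PROV.get(p, p), o) for s, p, o in triples]


# Hypothetical input: (subject, predicate, object) triples using Dublin Core terms.
triples = [
    ("ex:report", "dct:creator", "ex:alice"),
    ("ex:report", "dct:source", "ex:dataset"),
    ("ex:report", "dct:title", '"Annual report"'),
]
for triple in to_prov(triples):
    print(*triple)
```

Terms without a counterpart in the table, such as `dct:title` here, are left as they are rather than dropped.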

Implementations

There are already a number of implementations using PROV

Vocabularies
Software

See http://www.w3.org/2011/prov/wiki/ProvImplementations
We encourage you to mark up your data using PROV and to write software that outputs PROV.

Help us make a provenance-aware Web!

Conclusion

The Recommendation-track documents are close to Candidate Recommendation (CR)
The plan is to finish the work in March 2013
Specifications:

prov-primer	http://www.w3.org/TR/prov-primer/
prov-o	http://www.w3.org/TR/prov-o/
prov-dm	http://www.w3.org/TR/prov-dm/
prov-constraints	http://www.w3.org/TR/prov-constraints/
prov-n	http://www.w3.org/TR/prov-n/
prov-aq	http://www.w3.org/TR/prov-aq/
prov-sem	work in progress
prov-xml	work in progress
best practice	PROV-DC mapping (work in progress)