Include Pragmatic Provenance
author Boris Villazon-Terrazas <bvillazon@fi.upm.es>
Wed, 07 Mar 2012 19:16:58 +0100
changeset 113 f563a941aea6
parent 112 9c926a1308b0
child 115 28b2ee4a6f27
bp/index.html
--- a/bp/index.html	Fri Mar 02 19:35:36 2012 +0100
+++ b/bp/index.html	Wed Mar 07 19:16:58 2012 +0100
@@ -789,12 +789,99 @@
 <!--    Pragmatic Provenance  -->
 <!-- Note to Editors: This section is not part of our charter and probably will be folded into another section.  Yet to be determined. -->
 
-<section>
-<h3>TBD - Pragmatic Provenance - Boris</h3>
+<section><!-- << Pragmatic Provenance -->
+<h3>Pragmatic Provenance - Boris</h3>
 <p class='responsible'>John Erickson (RPI)</p>
-<p class="todo">Integrate Wiki <a href="http://www.w3.org/2011/gld/wiki/228_Best_Practices_Pragmatic_Provenance">content</a>. 
+
+<p>This section provides best practice recommendations for stakeholders on documenting the provenance of their linked government data, and on interpreting that data, so that consumers know what they are looking at.</p>
+
+<section><h4>Background</h4>
+<p>In 1997 Tim Berners-Lee called for pervasive provenance on the Web:</p>
+<p class="highlight">
+<i>At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.</i>
+</p>
+<p>W3C GLD therefore seeks to recommend practices that enable government providers to create the metadata necessary to answer their users' "oh yeah?" questions about the linked data they publish. Our recommendations may include processes as well as the application of specific vocabularies/ontologies.
+</p>
+</section>
+<section><h3>What do we mean by "Provenance?"</h3>
+<p>The W3C's Provenance Incubator Group (2010) <a href="http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#What_is_provenance" target="_blank">provides</a> this simple definition of provenance:</p>
+<p class="highlight">Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.</p>
+<p>More recently the W3C Provenance WG (PROV-WG) defines "provenance" for their work:</p>
+<p class="highlight"><i>The <b>provenance</b> of digital objects represents their origins. The PROV Data Model (<a href="http://www.w3.org/TR/2012/WD-prov-primer-20120110/#bib-PROV-DM" target="_blank">PROV-DM</a>) is a proposed standard to represent provenance records, which contain assertions about the entities and activities involved in producing and delivering or otherwise influencing a given object. By knowing the provenance of an object, we can make determinations about how to use it. Provenance records can be used for many purposes, such as understanding how data was collected so it can be meaningfully used, determining ownership and rights over an object, making judgments about information to determine whether to trust it, verifying that the process and steps used to obtain a result complies with given requirements, and reproducing how something was generated... As a standard for provenance, PROV-DM accommodates all those different uses of provenance. Different people may have different perspectives on provenance, and as a result different types of information might be captured in provenance records.</i></p>
+</section>	
+
+
+<section><h3>What do we mean by "Pragmatic Provenance?"</h3>
+<p>The W3C Government Linked Data WG accepts the PROV WG's definition of provenance but recognizes that PROV-DM is a powerful tool. The GLD WG seeks to provide best practice recommendations that are useful to government data stakeholders, make sense for GLD use cases, and are easily adopted by practitioners.</p>
+<p>W3C GLD could recommend a simple <b>provenance scoring system</b> for GLD analogous to TBL's 5 stars for linked data. Such a system might include:</p>
+<ul>
+	<li><b>One star:</b> Using the basic <a href="http://bit.ly/imvRX1" target="_blank">W3C DCAT</a> for Linked Data at the catalog and dataset level</li>
+	<li><b>Two stars:</b> DCAT enhanced with more complete Dublin Core and other metadata</li>
+	<li><b>Three stars:</b> Above, but with basic provenance metadata "within" the datasets</li>
+	<li><b>More stars:</b> More rigorous use of PROV-DM</li>
+</ul>
 </section>
 
+<section><h3>Use cases for provenance in GLD</h3>
+<p>Provide use cases here...</p>
+<ul>
+	<li>Specifying catalog- and dataset-level provenance</li>
+	<li>Specifying provenance within datasets
+	<ul>
+		<li>Preserving and encoding pre-existing provenance data</li>
+		<li>Generating provenance when processing data (e.g. during the Linked Data creation process)</li>
+	</ul></li>
+</ul>
+<p>Possible organization of use cases (adapted from <a href="http://bit.ly/wlKOEF" target="_blank">Trust and Linked Data</a>):</p>
+
+<ul>
+	<li>Simple "Oh Yeah?" scenario
+	<ul>
+		<li>User retrieves a dataset, then clicks on the "Oh Yeah?" button; the site then returns a provenance record</li>
+	</ul></li>
+</ul>
+
+<ul>
+	<li>Licensing scenario
+	<ul>
+		<li>User retrieves a dataset, then wants to check whether they have permission to use it</li>
+	</ul></li>
+</ul>
+
+<ul>
+	<li>Referral scenario
+	<ul>
+		<li>Site answers queries about provenance by pointing to another site’s provenance facilities</li>
+	</ul></li>
+</ul>
+
+<ul>
+	<li>Repeated queries scenario
+	<ul>
+		<li>Service repeatedly queries a site and wants provenance for all the answers</li>
+		<li>This is similar to the PROV WG example, where a user follows a provenance record, asking follow-up questions based on previous answers</li>
+	</ul></li>
+</ul>
+
+<ul>
+	<li>Versioning scenario
+	<ul>
+		<li>User retrieves a dataset, then wants to see its provenance, but the dataset (and its provenance) has since been updated on the original site</li>
+	</ul></li>
+</ul>
+
+<ul>
+	<li>Dynamic scenario
+	<ul>
+		<li>User retrieves a resource that is dynamically created</li>
+	</ul></li>
+</ul>
+</section>
+
+
+
+</section> <!-- Pragmatic Provenance >> -->
+
 
 <!--    Epilogue: The Social Contract of a Linked Open Data Publisher   -->
 <!-- Note to Editors: This section is not part of our charter and probably will be folded into another section.  Yet to be determined. -->