Added perspectives on provenance
author Simon Miles <simon.miles@kcl.ac.uk>
Wed, 05 Oct 2011 18:12:41 +0100
changeset 516 e3ac35e2969f
parent 414 428dcafc38d4
child 517 3e885c4ed0ee
Added perspectives on provenance
Changed PIDM to Prov-DM
primer/Primer.html
--- a/primer/Primer.html	Thu Sep 29 13:54:30 2011 +0100
+++ b/primer/Primer.html	Wed Oct 05 18:12:41 2011 +0100
@@ -1,285 +1,334 @@
 <!DOCTYPE html>
 <html><head> 
-        <title>Prov Model Primer</title>
-        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-        <!--
-          === NOTA BENE ===
-          For the three scripts below, if your spec resides on dev.w3 you can check them
-          out in the same tree and use relative links so that they'll work offline,
-        -->
-        <!-- PM -->
-        <style type="text/css">
-            .note { font-size:small; margin-left:50px }
-        </style>
-
-        <script src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
-
-        <script class="remove">
-            var respecConfig = {
-                // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
-                specStatus:           "ED",
-          
-                // the specification's short name, as in http://www.w3.org/TR/short-name/
-                shortName:            "Prov-Primer",
- 
-                // if your specification has a subtitle that goes below the main
-                // formal title, define it here
-                subtitle   :  "Initial draft for internal discussion",
- 
-                // if you wish the publication date to be other than today, set this
-                // publishDate:  "2009-08-06",
- 
-                // if the specification's copyright date is a range of years, specify
-                // the start date here:
-                // copyrightStart: "2005"
- 
-                // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
-                // and its maturity status
-                // previousPublishDate:  "1977-03-15",
-                // previousMaturity:  "WD",
- 
-                // if there a publicly available Editor's Draft, this is the link
-                edDraftURI:           "http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html",
- 
-                // if this is a LCWD, uncomment and set the end of its review period
-                // lcEnd: "2009-08-05",
- 
-                // if you want to have extra CSS, append them to this list
-                // it is recommended that the respec.css stylesheet be kept
-                extraCSS:             ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css", "./extra.css"],
- 
-                // editors, add as many as you like
-                // only "name" is required
-                editors:  [
-                    { name: "Yolanda Gil", url: "http://www.isi.edu/~gil/",
-                        company: "Information Sciences Institute, University of Southern California, US" },
-                    { name: "Simon Miles", url: "http://www.inf.kcl.ac.uk/~simonm",
-                        company: "King's College London, UK" },
-                ],
- 
-                // authors, add as many as you like.
-                // This is optional, uncomment if you have authors as well as editors.
-                // only "name" is required. Same format as editors.
- 
-                authors:  [
-                    { name: "TBD"},
-                ],
-          
-                // name of the WG
-                wg:           "Provenance Working Group",
-          
-                // URI of the public WG page
-                wgURI:        "http://www.w3.org/2011/prov/wiki/Main_Page",
-          
-                // name (with the @w3c.org) of the public mailing to which comments are due
-                wgPublicList: "public-prov-wg",
-          
-                // URI of the patent status for this WG, for Rec-track documents
-                // !!!! IMPORTANT !!!!
-                // This is important for Rec-track documents, do not copy a patent URI from a random
-                // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
-                // Team Contact.
-                wgPatentURI:  "",
-            };
-        </script>
-    </head>
-    <body>
-        <section id="abstract">
-            <p>This document aims to provide an intuitive guide to the Prov Data Model,
-                with worked examples.</p>
-
-            <p>
-                This is a document for internal discussion, which will ultimately
-                evolve in the first Public Working Draft of the Primer.</p>
-        </section> 
-
-        <section> 
-            <h2>Introduction</h2>
-            <p>The Prov Data Model (Prov-DM) is used to describe the provenance of things, i.e.
-                how something came to be, from what sources, its history, etc. As such, Prov-DM data consists
-                of assertions about the past. These assertions are not assessments, e.g. as to something's
-                authenticity, but the plain facts from which such assessments might be derived.</p>
-
-            <p>This guide aims to ease the adoption of the standard by providing:</p>
-            <ul>
-                <li>An intuitive explanation of how Prov-DM models provenance.</li>
-                <li>Worked examples that can be followed to produce your own Prov-DM data.</li>
-                <li>Answers to frequently asked questions regarding how the model should be applied.</li>
-            </ul>
-        </section>
-
-        <section>
-            <h2>Intuitive overview of Prov-DM</h2>
-
-            <p><i>This section provides an intuitive explanation of the concepts in Prov-DM.
-                    As with the rest of this document, it should be treated as a starting point for understanding the model, and not normative in itself.
-                    The model specification provides the precise definitions and constraints to be followed in using Prov-DM.</i></p>
-
-            <section>
-                <h2>Provenance</h2>
-
-                <p>Provenance has many meanings depending on what one is interested with regards to the object or resource in question.  Different people may have different perspectives, focusing on different types of information that might be captured in a provenance record.</p>
-
-                <p>One perspective might focus on entity-centered provenance, that is, what entities were involved in generating or manipulating the information in question.  Examples of entities include author, editor, publisher, curator, etc.</p>
-
-                <p>A second perspective might be one to focus on document-centered provenance, by tracing the origins of portions of a document to other documents. An example is referring to other news sources, quoting statistics from reports by some government or non-government agencies, etc.</p>
+  <title>Prov Model Primer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <!--
+    === NOTA BENE ===
+    For the three scripts below, if your spec resides on dev.w3 you can check them
+    out in the same tree and use relative links so that they'll work offline,
+  -->
+  <!-- PM -->
+  <style type="text/css">
+   .note { font-size:small; margin-left:50px }
+  </style>
 
-                <p>A third perspective one might take is on process-centered provenance, capturing the actions and steps taken to generate the information in question.   (e.g., a data transformation, an edit, etc.).  An example is the records of execution of processes as workflows of web services.</p>
-
-            </section>
-
-                    
-            <section>
-                <h3>Provenance as data</h3>
-                <p>Describes a common pattern of production of Prov-DM data, e.g. the asserter being software
-                    enacting the process that it is asserting about, to clarify who might be using the model
-                    and in what context.</p>
-            </section>
-
-            <section>
-                <h3>Prov-DM perspective on the world</h3>
-
-                <p>A brief and intuitive description of the way of thinking about the world when modelling it in Prov-DM.
-                    In particular, this should contrast with other things users of provenance models may commonly have in mind,
-                    such as Dublin Core-style attribution metadata or lab management system logs of closed, well-defined experiments.</p>
-
-                <p>This will introduce things/entities (explained in more detail below), activities/process executions, agents,
-                    was generated by, used, was controlled by, and was generated by.
-                    The explanation will be on what these model in the world, not the data structure contents.
-                </p>
-            </section>
-
-            <section>
-
-                <h3>Entities, attributes and perspectives</h3>
-
-                <p>An intuitive overview of how to think about entities and their defining attributes in Prov-DM.</p>
-
-            </section>
-
-            <section>
-                <h3>Roles</h3>
-
-                <p>An intuitive description of how the roles of entities in processes are expressed.</p>
-
-            </section>
-
-            <section>
-                <h3>Serialisation in RDF</h3>
-
-                <p>A brief introduction to the formal model used to illustrate the examples below.</p>
-            </section>
-        </section>
-
-        <section>
-            <h2>Worked Examples</h2>
-
-            <p>In the following sections, we show how Prov-DM can be used to model 
-                provenance in specific examples.</p>
-
-            <p>We include examples of how the formal ontology 
-                can be used to represent the Prov-DM assertions as RDF triples.
-                These are shown using the Turtle notation. In 
-                the latter depictions, the namespace prefix <b>po</b> denotes 
-                terms from the Prov ontology, while <b>ex1</b>, <b>ex2</b>, etc. 
-                denote terms specific to the example.</p>
+  <script src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
 
-            <p>We also provide a representation of the examples in the Abstract
-                Syntax Model used in the conceptual model document. The full ASM data is
-                included in the appendix.</p>
-
-            <section>
-                <h3>News article example, part 1</h3>
-
-                <p>Charlie has published an article he has written as a web page,
-                    and wishes to make available Prov-DM data describing the history of that page.
-                    He considers expressing that he was the article's publisher and that its content draws on a data set made available by the government.</p>
-
-                <p>First, he identifies the entities, agents, and process executions present in the scenario.
-                    The key fact is that the article was published, so he identifies this as a process execution.
-                    By modelling what occurred this way, he can distinguish the article before and after publication.
-                    Therefore, there are three entities: the unpublished article, the published web page,
-                    and the government data set on which the article was based.
-                    Finally, there is one agent that both wrote and publishes the article, himself.</p>
-
-                <p>If encoded using the Prov ontology, we create identifiers for each, and declare their type:</p>
-                <blockquote>
-                    ex1:webpage1  a  prov:Entity .<br/>
-                    ex1:unpublishedArticle1  a  prov:Entity .<br/>
-                    ex1:dataSet1  a  prov:Entity .<br/>
-                    ex1:published1  a  prov:ProcessExecution .<br/>
-                    ex1:charlie  a  prov:Agent .<br/>
-                </blockquote>
+  <script class="remove">
+   var respecConfig = {
+    // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
+    specStatus:           "ED",
+          
+    // the specification's short name, as in http://www.w3.org/TR/short-name/
+    shortName:            "Prov-Primer",
+ 
+    // if your specification has a subtitle that goes below the main
+    // formal title, define it here
+    subtitle   :  "Initial draft for internal discussion",
+ 
+    // if you wish the publication date to be other than today, set this
+    // publishDate:  "2009-08-06",
+ 
+    // if the specification's copyright date is a range of years, specify
+    // the start date here:
+    // copyrightStart: "2005"
+ 
+    // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
+    // and its maturity status
+    // previousPublishDate:  "1977-03-15",
+    // previousMaturity:  "WD",
+ 
+    // if there a publicly available Editor's Draft, this is the link
+    edDraftURI:           "http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html",
+ 
+    // if this is a LCWD, uncomment and set the end of its review period
+    // lcEnd: "2009-08-05",
+ 
+    // if you want to have extra CSS, append them to this list
+    // it is recommended that the respec.css stylesheet be kept
+    extraCSS:             ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css", "./extra.css"],
+ 
+    // editors, add as many as you like
+    // only "name" is required
+    editors:  [
+     { name: "Yolanda Gil", url: "http://www.isi.edu/~gil/",
+      company: "Information Sciences Institute, University of Southern California, US" },
+     { name: "Simon Miles", url: "http://www.inf.kcl.ac.uk/~simonm",
+      company: "King's College London, UK" },
+    ],
+ 
+    // authors, add as many as you like.
+    // This is optional, uncomment if you have authors as well as editors.
+    // only "name" is required. Same format as editors.
+ 
+    authors:  [
+     { name: "TBD"},
+    ],
+          
+    // name of the WG
+    wg:           "Provenance Working Group",
+          
+    // URI of the public WG page
+    wgURI:        "http://www.w3.org/2011/prov/wiki/Main_Page",
+          
+    // name (with the @w3c.org) of the public mailing to which comments are due
+    wgPublicList: "public-prov-wg",
+          
+    // URI of the patent status for this WG, for Rec-track documents
+    // !!!! IMPORTANT !!!!
+    // This is important for Rec-track documents, do not copy a patent URI from a random
+    // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
+    // Team Contact.
+    wgPatentURI:  "",
+   };
+  </script>
+ </head>
+ <body>
+  <section id="abstract">
+   <p>This document aims to provide an intuitive guide to the Prov Data Model,
+    with worked examples.</p>
 
-                <p>The entities, which includes the agent, are distinguished by their attributes.
-                    The article, and so the webpage, is entitled 'Crime wave in London',
-                    while the data set is called 'London crime statistics'.
-                    These attributes, part of the identity of the entities, are asserted as follows.</p>
-                <blockquote>
-                    ex1:webpage1  dc:title  "Crime wave in London" .<br/>
-                    ex1:unpublishedArticle1  dc:title  "Crime wave in London" .<br/>
-                    ex1:charlie  foaf:givenName  "Charlie" .<br/>
-                </blockquote>
+   <p>
+    This is a document for internal discussion, which will ultimately
+    evolve into the first Public Working Draft of the Primer.</p>
+  </section> 
 
-                <p>So far, there is no connection asserted between the entities, execution and agent,
-                    so the provenance of the webpage has not yet been expressed.
-                    To assert that the webpage is the result of publishing the webpage,
-                    Charlie makes two assertions: that the publication process execution used the
-                    unpublished article, and that it generated the published webpage.
-                    He expresses the fact that the article draws on, was derived from, the data set.
-                    Finally, he asserts that he published the article, i.e. controlled the publishing process execution.</p>
-                <blockquote>
-                    ex1:webpage1  prov:wasGeneratedBy  ex1:published1 .<br/>
-                    ex1:published1  prov:used  ex1:unpublishedArticle1 .<br/>
-                    ex1:unpublishedArticle  prov:wasGeneratedBy  ex1:wrote1 .<br/>
-                    ex1:unpublishedArticle1  prov:wasDerivedFrom  ex1:dataSet1 .<br/>
-                    ex1:published1  prov:wasControlledBy  ex1:charlie .<br/>
-                </blockquote>
+  <section> 
+   <h2>Introduction</h2>
+   <p>The Prov Data Model (Prov-DM) is used to describe the provenance of things, i.e.
+    how something came to be, from what sources, its history, etc. As such, Prov-DM data consists
+    of assertions about the past. These assertions are not assessments, e.g. as to something's
+    authenticity, but the plain facts from which such assessments might be derived.</p>
 
-            </section><section>
-                <h3>News article example, part 2</h3>
-
-                <p>Include roles, e.g. the role of the government data set vs the role of the unpublished article.</p>
-            </section><section>
+   <p>This guide aims to ease the adoption of the standard by providing:</p>
+   <ul>
+    <li>An intuitive explanation of how Prov-DM models provenance.</li>
+    <li>Worked examples that can be followed to produce your own Prov-DM data.</li>
+    <li>Answers to frequently asked questions regarding how the model should be applied.</li>
+   </ul>
 
-                <h3>News article example, part 3</h3>
+   <section>
+    <h3>Provenance</h3>
 
-                <p>Include multiple perspectives on an entity, e.g. "the article" versus "version 1 of the article".</p>
+    <p>Provenance has many meanings, depending on what one is interested in with regard to the object or resource in question.  Different people may have different perspectives, focusing on different types of information that might be captured in a provenance record.</p>
 
-            </section>
-            <section>
-                <h3>A File Scenario</h3>
+    <p>One perspective might focus on entity-centered provenance, that is, what entities were involved in generating or manipulating the information in question.  Examples of entities include author, editor, publisher, curator, etc.</p>
+
+    <p>A second perspective might focus on document-centered provenance, tracing the origins of portions of a document to other documents. Examples include referring to other news sources or quoting statistics from reports by government or non-government agencies.</p>
+
+    <p>A third perspective might focus on process-centered provenance, capturing the actions and steps taken to generate the information in question (e.g., a data transformation or an edit). An example is the record of the execution of processes as workflows of web services.</p>
+
+   </section>
 
 
-                <p>A large complete example, taken from conceptual model document.</p>
-
-                <section>
-                    <h3>Graphical Illustration</h3>
-
-                    Provenance assertions can be illustrated as a graph.
-                    Details about the graphical illustration can be found in <a href="#illustration-convention">appendix</a>.
-
-                    <img src="example-graphical.png"/>
-                    <p/>
-                    <img src="timeline.png"/>
-                </section>
-
-            </section>
-        </section>
+   <section>
+    <h3>Provenance as data</h3>
+    <p>Describes a common pattern of production of Prov-DM data, e.g. the asserter being software
+     enacting the process that it is asserting about, to clarify who might be using the model
+     and in what context.</p>
+   </section>
+  </section>
 
-        <section>
-            <h2>Frequently asked questions</h2>
-        </section>
+  <section>
+   <h2>Intuitive overview of Prov-DM</h2>
 
-        <section>
-            <h2>Abstract Syntax Notation for Examples</h2>
-        </section>
-        
-        <section class="appendix">
-            <h2>Acknowledgements</h2>
-            <p>
-                WG membership to be listed here.
-            </p>
-        </section>
+   <p><i>This section provides an intuitive explanation of the concepts in Prov-DM. 
+     As with the rest of this document, it should be treated as a starting point for
+     understanding the model, and not normative in itself. The model specification
+     provides the precise definitions and constraints to be followed in using Prov-DM.</i></p>
 
-    </body></html>
+   <section>
+    <h3>Entities</h3>
+
+    <p>An intuitive overview of how to think about entities and their characterising attributes in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Process Executions</h3>
+
+    <p>An intuitive overview of how to think about process executions in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Used and WasGeneratedBy</h3>
+
+    <p>An intuitive overview of how to think about use and generation events in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Agents</h3>
+
+    <p>An intuitive overview of how to think about agents in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Accounts</h3>
+
+    <p>An intuitive overview of how to think about accounts in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Roles</h3>
+
+    <p>An intuitive description of how the roles of entities in processes are expressed.</p>
+   </section>
+
+   <section>
+    <h3>Revision</h3>
+
+    <p>An intuitive overview of how to think about revision relations in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Complementarity</h3>
+
+    <p>An intuitive overview of how to think about complementarity in Prov-DM.</p>
+   </section>
+
+   <section>
+    <h3>Derivation</h3>
+
+    <p>An intuitive overview of how to think about the different kinds of derivation relation in Prov-DM.</p>
+   </section>
+  </section>
+
+  <section>
+   <h2>Worked Examples</h2>
+
+   <p>In the following sections, we show how Prov-DM can be used to model 
+    provenance in specific examples.</p>
+
+   <p>We include examples of how the formal ontology 
+    can be used to represent the Prov-DM assertions as RDF triples.
+    These are shown using the Turtle notation. In 
+    these depictions, the namespace prefix <b>prov</b> denotes 
+    terms from the Prov ontology, while <b>ex1</b>, <b>ex2</b>, etc. 
+    denote terms specific to the example.</p>
+
+   <p>We also provide a representation of the examples in the Abstract
+    Syntax Model used in the conceptual model document. The full ASM data is
+    included in the appendix.</p>
+
+   <section>
+    <h3>Entities</h3>
+
+    <p>
+     An online newspaper publishes an article making use of data (GovData) provided through a government portal in England. 
+     The article includes a chart based on GovData.
+     A blogger, Betty, looking at the chart, spots what she thinks to be an error.
+     Betty retrieves the provenance of the chart, to determine from where the facts presented derive.
+    </p>
+    <p>The Prov data includes the assertions:</p>
+    <blockquote>
+     ex1:chart1 a prov:Entity .<br/>
+     ex1:dataSet1 a prov:Entity .<br/>
+    </blockquote>
+    <p>These statements, in order, assert that the chart (ex1:chart1)
+     is an entity and that the data set (ex1:dataSet1) is an entity.</p>
+
+   </section>
+
+   <section>
+    <h3>Process Executions</h3>
+
+    <p>Further, the Prov data asserts that there was
+     a process execution (ex1:compiled) denoting the compilation of the
+     chart from the data set:</p>
+    <blockquote>
+     ex1:compiled a prov:ProcessExecution .<br/>
+    </blockquote>
+   </section>
+
+   <section>
+    <h3>Used and WasGeneratedBy</h3>
+
+    <p>Finally, the Prov data asserts that the chart was generated by this compilation
+     process, the compilation process made use of GovData, and the chart was
+     derived from the data set (more on derivation below).</p>
+
+    <blockquote>
+     ex1:chart1 prov:wasGeneratedBy ex1:compiled .<br/>
+     ex1:compiled prov:used ex1:dataSet1 .<br/>
+     ex1:chart1 prov:wasDerivedFrom ex1:dataSet1 .<br/>
+    </blockquote>
+
+    <p>From this information Betty can see that
+     the mistake could have been in the original data set or else was introduced
+     in the compilation process, and sets out to discover which.</p>
+
+   </section>
+
+   <section>
+    <h3>Agents</h3>
+
+    <p><i>Suggested example:</i> Digging deeper, Betty wants to know who compiled
+     the chart. This turns out to be an independent analyst, Derek.</p>
+   </section>
+
+   <section>
+    <h3>Accounts</h3>
+
+    <p><i>Suggested example:</i> The analyst provides his own record of how he compiled GovData to create 
+     the chart, which provides more detail than in the newspaper's provenance data. 
+     Specifically, the analyst's account separates compilation into two stages: aggregating 
+     data by region and then producing the graphic. Therefore, there are two separate 
+     accounts of the same events.</p>
+   </section>
+
+   <section>
+    <h3>Roles</h3>
+
+    <p><i>Suggested example:</i> For Betty to know where the error lies, she needs 
+     to understand what other information the compilation process was based on. The 
+     aggregation step of the process used a list of regions not present in the original 
+     data, but determined by the analyst. How does she distinguish the roles played by 
+     the two inputs to the aggregation process?</p>
+   </section>
+
+   <section>
+    <h3>Revision</h3>
+
+    <p><i>Suggested example:</i> After looking at the detail of the compilation process, there appears
+     to be nothing wrong, so Betty concludes the error is in GovData. She contacts
+     the government, and a new version of GovData is created. How does the provenance
+     document that the new version is a revision of the old version?</p>
+   </section>
+
+   <section>
+    <h3>Complementarity</h3>
+
+    <p><i>Suggested example:</i> Betty lets Derek know that a new version of the data set exists,
+     and he looks at the provenance of the new data to understand what he needs to
+     reanalyse. When understanding how the new data differs from the old, how does he
+     interpret the relation between the two versions and GovData considered independently of version?</p>
+   </section>
+
+   <section>
+    <h3>Derivation</h3>
+
+    <p><i>Suggested example:</i> Derek creates a new chart based on the revised data, 
+     using the same compilation process as before. Betty checks the article again at a
+     later point, and wants to know if it is based on the old or new GovData. The newspaper's
+     provenance data says that the article is "derived from" the updated GovData, while the
+     analyst's provenance data says it is "eventually derived from" the same. How should she
+     interpret this?</p>
+   </section>
+
+  </section>
+
+  <section>
+   <h2>Frequently asked questions</h2>
+  </section>
+
+  <section class="appendix">
+   <h2>Abstract Syntax Notation for Examples</h2>
+  </section>
+
+  <section class="appendix">
+   <h2>Acknowledgements</h2>
+   <p>
+    WG membership to be listed here.
+   </p>
+  </section>
+
+ </body></html>