Removed accounts section, added revision and derivation intuition, updated use and generation examples
author Simon Miles <simon.miles@kcl.ac.uk>
Sun, 13 Nov 2011 15:29:27 +0000
changeset 885 5dd3df81a4b4
parent 884 700299077a9f
child 886 dd1c61167cbc
primer/Primer.html
--- a/primer/Primer.html	Sun Nov 13 12:15:47 2011 +0000
+++ b/primer/Primer.html	Sun Nov 13 15:29:27 2011 +0000
@@ -121,10 +121,10 @@
    </section>
 
 
-   <section>
+   <!-- section>
     <h3>Provenance as data</h3>
     <p>Explains the contexts in which the reader may see or create Prov-DM data.</p>
-   </section>
+   </section -->
   </section>
 
   <section>
@@ -207,11 +207,11 @@
      was used to create the graph.</p>
    </section>
 
-   <section>
+   <!--section>
     <h3>Accounts</h3>
 
     <p>An intuitive overview of how to think about accounts in Prov-DM.</p>
-   </section>
+   </section -->
 
    <section>
     <h3>Roles</h3>
@@ -226,7 +226,18 @@
    <section>
     <h3>Revision</h3>
 
-    <p>An intuitive overview of how to think about revision relations in Prov-DM.</p>
+    <p>
+     A single resource, such as a document, may go through multiple <i>revisions</i> (also called versions and
+     other comparable terms) over time. Between revisions, several changes may have
+     taken place to the resource, possibly controlled by different agents.
+     Each revision is, in Prov-DM terms, an entity, and Prov-DM allows one to assert
+     that one entity is a revision of another.
+    </p>
+    <p>
+     In some contexts, an entity is considered a new revision of something
+     represented by an earlier entity only once some agent has declared it to be
+     a new revision, thus 'signing off' the changes since the prior revision.
+    </p>
    </section>
 
    <section>
@@ -291,7 +302,39 @@
    <section>
     <h3>Derivation</h3>
 
-    <p>An intuitive overview of how to think about the different kinds of derivation relation in Prov-DM.</p>
+    <p>
+     When one entity's existence, content, characteristics and so on are
+     at least partly due to another entity, then we say that the former is
+     derived from the latter. For example, one document may contain
+     material copied from another, a child is derived from his/her
+     ancestors, and a page displayed in a browser is derived from the same
+     page on the web server from which it was downloaded, as well as from
+     the designer's original sketches of what the page would look like.
+    </p>
+    <p>
+     There are different kinds of derivation expressible in Prov-DM.
+     Consider the case of the page in the browser above. It is derived from
+     the designer's sketch in the strictest sense, i.e. if the sketch had
+     been different so would the page. On the other hand, there are
+     entities that are part of the page's history but which did not inform
+     the content of that page, i.e. the page would have been the same even
+     if the earlier entity changed. For example, on creating the original
+     draft of the page, the designer may have included a banner image
+     saying "DRAFT - FOR REVIEW ONLY". This banner was not part of the
+     sketch, nor part of the published page downloaded to the browser, but
+     was part of the page's history, and while not affecting the browsed
+     page's content may have been a factor in its existence. Finally, in
+     some cases, we may be able to say not only that one entity was derived
+     from another, but also how it was derived, i.e. by what process
+     execution. For example, the page in the browser is derived from the
+     page on the web server because a download process sent the bytes of
+     the latter across an HTTP connection to the browser client.
+    </p>
+    <p>
+     In Prov-DM terms, we say that the page in the browser <i>was eventually
+      derived from</i> the sketch, <i>depended on</i> the banner image, and <i>was derived
+     from</i> the page on the web server due to the download process.
+    </p>
    </section>
   </section>
 
@@ -317,47 +360,99 @@
 
     <p>
      An online newspaper publishes an article making use of data (GovData) provided through a government portal, in England. 
-     The article includes a chart based on GovData.
+     The article includes a chart based on GovData, with data values aggregated by
+     regions of the country.
+    </p>
+    <p>
      A blogger, Betty, looking at the chart, spots what she thinks to be an error.
      Betty retrieves the provenance of the chart, to determine from where the facts presented derive.
     </p>
     <p>The Prov data includes the assertions:</p>
     <pre class="turtle example">
-     ex1:chart1   a prov:Entity .
-     ex1:dataSet1 a prov:Entity .
+     ex1:dataSet1   a prov:Entity .
+     ex1:aggregate1 a prov:Entity .
+     ex1:chart1     a prov:Entity .
     </pre>
-    <p>These statements, in order, assert that the chart (ex1:chart1)
-     is an entity, the data set (ex1:dataSet1) is an entity.</p>
+    <p>
+     These statements, in order, assert that the original data set is an entity (<code>ex1:dataSet1</code>),
+     the data aggregated by region is an entity (<code>ex1:aggregate1</code>), and
+     the chart is an entity (<code>ex1:chart1</code>).
+    </p>
 
    </section>
 
    <section>
     <h3>Process Executions</h3>
 
-    <p>Further, the Prov data asserts that there was
+    <p>
+     Further, the Prov data asserts that there was
      a process execution (ex1:compiled) denoting the compilation of the
-     chart from the data set</p>
+     chart from the data set.
+    </p>
     <pre class="turtle example">
      ex1:compiled a prov:ProcessExecution .
     </pre>
+    <p>
+     The provenance also includes references to the steps involved in compilation:
+     aggregating the data by region and generating the chart graphic.
+    </p>
+    <pre class="turtle example">
+     ex1:aggregated a prov:ProcessExecution .
+     ex1:illustrated a prov:ProcessExecution .
+    </pre>
    </section>
 
    <section>
     <h3>Use and Generation</h3>
 
-    <p>Finally, the Prov data asserts that the chart was generated by this compilation
-     process, the compilation process made use of GovData, and the chart was
-     derived from the data set (more on derivation below).</p>
-
+    <p>
+     Finally, the Prov data asserts the key events that connected the above
+     entities and process executions, i.e. the use of an entity by a process,
+     or the generation of an entity by a process.
+    </p>
+    <p>
+     For example, the provenance declares the event (of type <code>prov:Usage</code>)
+     where the aggregation process execution used the GovData data set, and the event
+     (of type <code>prov:Generation</code>) where the same process execution generated
+     the data aggregated by region.
+    </p>
     <pre class="turtle example">
+     ex1:dataSet1Usage        a prov:Usage .
+     ex1:aggregate1Generation a prov:Generation .
+    </pre>
+    <p>
+     To describe these events, the provenance says within which process execution
+     they occur and what entity is used or generated.
+    </p>
+    <pre class="turtle example">
+     ex1:aggregated prov:qualifiedUsage      ex1:dataSet1Usage .
+     ex1:aggregated prov:qualifiedGeneration ex1:aggregate1Generation .
+     ex1:dataSet1Usage        prov:entity ex1:dataSet1 .
+     ex1:aggregate1Generation prov:entity ex1:aggregate1 .
+    </pre>
+    <p>
+     Comparable events are described for the process of generating the chart image
+     from the aggregated data.
+    </p>
+    <pre class="turtle example">
+     ex1:aggregate1Usage  a prov:Usage .
+     ex1:chart1Generation a prov:Generation .
+     ex1:illustrated prov:qualifiedUsage      ex1:aggregate1Usage .
+     ex1:illustrated prov:qualifiedGeneration ex1:chart1Generation .
+     ex1:aggregate1Usage  prov:entity ex1:aggregate1 .
+     ex1:chart1Generation prov:entity ex1:chart1 .
+    </pre>
+    
+    <!--pre class="turtle example">
      ex1:chart1   prov:wasGeneratedBy ex1:compiled .
      ex1:compiled prov:used           ex1:dataSet1 .
      ex1:chart1   prov:wasDerivedFrom ex1:dataSet1 .
-    </pre>
-
-    <p>From this information Betty can see that
+    </pre -->
+    <p>
+     From this information Betty can see that
      the mistake could have been in the original data set or else was introduced
-     in the compilation process, and sets out to discover which.</p>
+     in the compilation process, and sets out to discover which.
+    </p>
 
    </section>
 
@@ -368,7 +463,7 @@
      the chart. This turns out to be an independent analyst, Derek.</p>
    </section>
 
-   <section>
+   <!-- section>
     <h3>Accounts</h3>
 
     <p><i>Suggested example:</i> The analyst provides his own record of how he compiled GovData to create 
@@ -376,7 +471,7 @@
      Specifically, the analyst's account separates compilation into two stages: aggregating 
      data by region and then producing the graphic. Therefore, there are two separate 
      accounts of the same events.</p>
-   </section>
+   </section -->
 
    <section>
     <h3>Roles</h3>
@@ -391,10 +486,17 @@
    <section>
     <h3>Revision</h3>
 
-    <p><i>Suggested example:</i> After looking at the detail of the compilation process, there appears
+    <p>
+     After looking at the detail of the compilation process, there appears
      to be nothing wrong, so Betty concludes the error is in GovData. She contacts
-     the government, and a new revision of GovData is created. How does the provenance
-     document that the new revision is a revision of the old revision?</p>
+     the government, and a new version of GovData is created, declared to be the
+     next revision of the data by Edith. The provenance data now includes a statement
+     that the new data set, <code>ex1:dataSet2</code>, is a new revision of the
+     old data set, <code>ex1:dataSet1</code>.
+    </p>
+    <pre class="turtle example">
+     ex1:dataSet2 prov:wasRevisionOf ex1:dataSet1 .
+    </pre>
    </section>
 
    <section>
@@ -565,7 +667,7 @@
      raw data has changed. He can't from this provenance assertion
      alone tell if that is merely from the addition of the post code
      NW2, or if data for the other post codes have changed as well.
-     Derek desides to redo the aggregation by region using
+     Derek decides to redo the aggregation by region using
      <code>ex1:dataSet2</code> and regenerate the
      graphics using the same layout.
     </p>
@@ -574,14 +676,29 @@
    <section>
     <h3>Derivation</h3>
 
-    <p><i>Suggested example:</i> Derek creates a new chart based on the revised data, 
+    <p>
+     Derek creates a new chart based on the revised data, 
      using the same compilation process as before. Betty checks the article again at a
-     later point, and wants to know if it is based on the old or new GovData. The newspaper's
-     provenance data says that the article is "derived from" the updated GovData, while the
-     analyst's provenance data says it is "eventually derived from" the same. How should she
-     interpret this?</p>
+     later point, and wants to know if it is based on the old or new GovData.
+     She sees three new assertions about derivation in the provenance data, plus
+     an assertion about how the new chart was generated.
+    </p>
+    <pre class="example turtle">
+     ex1:chart2 prov:dependedOn               ex1:dataSet2 .
+     ex1:chart2 prov:wasEventuallyDerivedFrom ex1:dataSet2 .
+     ex1:chart2 prov:wasDerivedFrom           ex1:dataSet2 .
+     ex1:chart2 prov:wasGeneratedBy           ex1:compiled2 .
+    </pre>
+    <p>
+     She interprets these assertions as follows. The first says that the new chart included,
+     somewhere in the history of its creation, the revised data set.
+     The second says further that the new chart is as it is because of the revised
+     data set, i.e. there is an explicit influence of the data on the chart.
+     Finally, the third and fourth assertions together say further that it was
+     the process execution <code>ex1:compiled2</code> that derived the new chart
+     from the revised data set.
+    </p>
    </section>
-
   </section>
 
   <section>