Merge with HEAD
author Graham Klyne
Tue, 15 Nov 2011 14:08:24 +0000
changeset 894 1f9e090e2909
parent 893 3ccbbd31db5c
parent 892 12c715b99fe6
child 895 2d0d23794f80
Merge with HEAD
--- a/primer/Primer.html	Tue Nov 15 14:07:16 2011 +0000
+++ b/primer/Primer.html	Tue Nov 15 14:08:24 2011 +0000
@@ -85,7 +85,7 @@
  </head>
  <body>
   <section id="abstract">
-   <p>This document aims to provide an intuitive guide to the Prov Data Model,
+   <p>This document aims to provide an intuitive guide to the PROV Data Model,
     with worked examples.</p>
 
    <p>
@@ -95,80 +95,111 @@
 
   <section> 
    <h2>Introduction</h2>
-   <p>The Prov Data Model (Prov-DM) is used to describe the provenance of things, i.e.
-    how something came to be, from what sources, its history, etc. As such, Prov-DM data consists
-    of assertions about the past. These assertions are not assessments, e.g. as to something's
-    authenticity, but the plain facts from which such assessments might be derived.</p>
+   <p>
+    This primer document provides an accessible introduction to the PROV Data Model
+    (PROV-DM) standard for representing provenance on the Web.  Provenance describes
+    the origins of things, so PROV-DM data consists of assertions about the past.
+   </p>
 
-   <p>This guide aims to ease the adoption of the standard by providing:</p>
+   <p>
+    This primer document aims to ease the adoption of the standard by providing:
+   </p>
    <ul>
-    <li>An intuitive explanation of how Prov-DM models provenance.</li>
-    <li>Worked examples that can be followed to produce your own Prov-DM data.</li>
+    <li>An intuitive explanation of how PROV-DM models provenance.</li>
+    <li>Worked examples that can be followed to produce your own PROV-DM data.</li>
     <li>Answers to frequently asked questions regarding how the model should be applied.</li>
    </ul>
 
-   <section>
-    <h3>Provenance</h3>
-
-    <p>Provenance has many meanings depending on what one is interested with regards to the object or resource in question.  Different people may have different perspectives, focusing on different types of information that might be captured in a provenance record.</p>
-
-    <p>One perspective might focus on entity-centered provenance, that is, what entities were involved in generating or manipulating the information in question.  Examples of entities include author, editor, publisher, curator, etc.</p>
-
-    <p>A second perspective might be one to focus on document-centered provenance, by tracing the origins of portions of a document to other documents. An example is referring to other news sources, quoting statistics from reports by some government or non-government agencies, etc.</p>
+   <p>
+    The provenance of digital objects represents their origins.  The PROV-DM is a 
+    proposed standard to represent provenance records, which reflect the entities 
+    and activities involved in producing and delivering or otherwise influencing a 
+    given object.  By knowing the provenance of an object, we can make determinations 
+    about how to use it.  Provenance records can be used for many purposes, such as 
+    understanding how data was collected so it can be meaningfully used, determining 
+    ownership and rights over an object, making judgments about information to 
+    determine whether to trust it, verifying that the activity used to obtain a 
+    result complies with given requirements, and reproducing how something was generated.
+   </p>
 
-    <p>A third perspective one might take is on process-centered provenance, capturing the actions and steps taken to generate the information in question.   (e.g., a data transformation, an edit, etc.).  An example is the records of execution of processes as workflows of web services.</p>
-
-   </section>
+   <p>
+    As a standard for provenance, PROV-DM accommodates all those different uses 
+    of provenance.  However, different people may have different perspectives on provenance, 
+    and as a result different types of information might be captured in a provenance record.  
+    One perspective might focus on <i>agent-centered provenance</i>, that is, what entities 
+    were involved in generating or manipulating the information in question.  For example, 
+    in the provenance of a picture in a news article we might capture the photographer who 
+    took it, the person that edited it, and the newspaper that published it. A second perspective 
+    might focus on <i>object-centered provenance</i>, by tracing the origins of portions of a 
+    document to other documents. An example is having a web page that was assembled from content
+    from a news article, quotes of interviews with experts, and a graph that plots data from a 
+    government agency.  A third perspective one might take is on <i>process-centered provenance</i>, 
+    capturing the actions and steps taken to generate the information in question.  For example, a 
+    graph may have been generated by invoking a service to retrieve data from a database, and then 
+    extracting certain statistics from the data using some statistics package.
+   </p>
 
+   <p>
+    Provenance records are metadata.  There are other kinds of metadata that are
+    not provenance.  For example, the size of an image is a metadata property of 
+    that image but it is not provenance.
+   </p>
 
-   <!-- section>
-    <h3>Provenance as data</h3>
-    <p>Explains the contexts in which the reader may see or create Prov-DM data.</p>
-   </section -->
+   <p>
+    A comprehensive overview of requirements, use cases, prior research, and proposed 
+    vocabularies for provenance is available from the 
+    <a href="http://www.w3.org/2005/Incubator/prov/XGR-prov/">Final Report of the W3C Provenance Incubator Group</a>.  
+    The document contains three general scenarios 
+    that may help identify the provenance aspects of your planned applications and 
+    help plan the design of your provenance system.
+   </p>
+   <p>
+    For a detailed description of PROV-DM, please refer to the 
+    <a href="http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html">PROV Data Model and Abstract Syntax Notation Document</a>.
+   </p>
   </section>
 
   <section>
-   <h2>Intuitive overview of Prov-DM</h2>
+   <h2>Intuitive overview of PROV-DM</h2>
 
-   <p><i>This section provides an intuitive explanation of the concepts in Prov-DM. 
+   <p><i>This section provides an intuitive explanation of the concepts in PROV-DM. 
      As with the rest of this document, it should be treated as a starting point for
      understanding the model, and not normative in itself. The model specification
-     provides the precise definitions and constraints to be followed in using Prov-DM.</i></p>
+     provides the precise definitions and constraints to be followed in using PROV-DM.</i></p>
 
    <section>
     <h3>Entities</h3>
 
     <p>
-     In Prov-DM, the things that you ask the provenance of are called <i>entities</i>,
-     and one entity may have many others in the description of its provenance, e.g. the
-     provenance of a building could include the stones that formed its bricks and the
-     spades that dug its foundations. In contrast to process executions, below, entities
-     are understood by their states rather than the activities they perform.
+     In PROV-DM, the things that you may ask the provenance of are called <i>entities</i>.
+     An entity’s provenance may refer to many other entities.  For example, a document D is
+     an entity whose provenance refers to other entities such as a graph inserted into D,
+     the dataset that was used to create that graph, or the author of the document.
     </p>
     <p>
-     Each thing in the world can be viewed from different perspectives and, given that
-     the provenance of one entity may be composed from assertions made by multiple parties,
-     there must be care that the entity referred to by one party is the same as for another.
-     For example, a document D may have document ID 123 throughout its existence, but
-     the first version of the document Dv1 is in HTML 4.0, while the next version, Dv2,
-     is in XHTML 1.1. If one party, describing D, asserts that the document has ID 123, but
-     then another asserter, looking at Dv2, adds that the document is in XHTML 1.1, then
-     there is ambiguity: did the document have ID 123 and was it in XHTML 1.1 throughout its
-     lifetime or just in one version? We have to characterize each entity in Prov-DM data
-     by stating the attributes that define it, e.g. D is defined by its ID being 123, while Dv2 is
-     defined by its ID being 123 <i>and</i> its version being 2.
+     Entities are described and identified by their properties, may be more
+     or less specific, and may be described from different perspectives.  For example,
+     document D, the second version of document D, and document D as stored on my file system,
+     are three distinct entities for which we may describe the provenance. They
+     may all be perspectives on the same thing in the world (document D may exist only
+     in its second version and on my file system), but are <i>characterized</i> in
+     different ways by being described using different <i>attributes</i> (version, location, and 
+     so on).
     </p>
    </section>
 
    <section>
-    <h3>Process Executions</h3>
+    <h3>Activities</h3>
 
     <p>
-     A Prov-DM process execution is an activity that has occurred. Most importantly,
-     process executions are how entities come into existence, often making use of
-     existing entities to achieve this. Continuing the example from above, document Dv2
-     was generated by a translation from HTML 4.0 to XHTML 1.1 that makes use of document Dv1,
-     and this translation activity is a process execution.
+     While entities are static aspects in the world (things), <i>activities</i> are
+     dynamic aspects (actions, processes, etc.).
+     An activity is something that has either occurred or is still 
+     taking place. Most importantly, activities are how entities come into 
+     existence, often making use of previously existing entities to achieve this. 
+     For example, if the second version of document D was generated 
+     by a translation from the first version of the document in another language,
+     then this translation is an activity.
     </p>
    </section>
 
@@ -176,73 +207,89 @@
     <h3>Use and Generation</h3>
 
     <p>
-     The event of an entity coming into existence is called its <i>generation</i>.
-     The entity will be generated as part of some process execution, e.g. writing
-     a document brings the document into existence, while revising the document brings a new
-     version into existence.
-    </p>
-    <p>
-     Process executions also make use of entities. For example, revising a document
-     requires the original version of the document, the corrections to be made,
-     the new material to be added, etc.
-    </p>
-    <p>
-     In Prov-DM, assertions are made that particular process executions used or generated
-     particular entities.
+     Every entity is created by an activity, which is called the <i>generation</i> of the entity.
+     For example, writing a document brings the document into existence, while
+     revising the document brings a new version into existence.
+     Activities also make <i>use</i> of entities. For example, revising a document
+     to fix spelling mistakes uses the original version of the document as well
+     as a list of corrections. In PROV-DM, assertions can be made to state that 
+     particular activities used or generated particular entities.
     </p>
    </section>
 
    <section>
     <h3>Agents</h3>
 
-    <p>An agent can be a person, a piece of software or an inanimate object
-     that was involved in the creation or transformation of an entity or
-     collection of entities. Consider a graph displaying some statistics
-     regarding crime rates over time in a linear regression.
-
-     To represent the provenance of a particular graph in a newspaper to
-     visually summarize and represent a dataset about crime statistics, the
-     agent can either be the person who analyzed or summarized the data in
-     order to process that into a visualization or a piece of software that
-     was used to create the graph.</p>
+    <p>
+     An agent is a type of entity that takes an active role in an activity such 
+     that it can be assigned some degree of responsibility for the activity taking 
+     place. An agent can be a person, a piece of software, or an inanimate object.
+     In PROV-DM, agents are a kind of entity, and it is therefore possible to 
+     associate provenance with agents.  Consider a graph displaying some statistics 
+     regarding crime rates over time in a linear regression.  To represent the 
+     provenance of that graph, we could state that the person who created the 
+     graph was an agent involved in its creation, and that the software used to 
+     create the graph was also an agent involved in that activity.  We 
+     can also represent the provenance of that software and the agents involved in 
+     that, such as the vendor of that software.
+    </p>     
    </section>
 
    <!--section>
     <h3>Accounts</h3>
-
-    <p>An intuitive overview of how to think about accounts in Prov-DM.</p>
+  
+    <p>An intuitive overview of how to think about accounts in PROV-DM.</p>
    </section -->
 
    <section>
     <h3>Roles</h3>
 
-    <p>A role is a characterization of the function or part a characterized thing played in an activity.  In Prov-DM, roles are qualifiers on relations between entities and process executions that provide application context to the relationship.
-
-     Examples of roles an entity can take while participating in or controlling an activity include author, co-author, editor, manager, curator, publisher, etc.  Role can also be used to describe how an entity was used in an activity, examples include ingredient, source, evidence, etc.
-
-     Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies.  Role interpretation is application specific.</p>
+    <p>
+     A role is a description of the function or part an entity 
+     played in an activity.  In PROV-DM data, roles are qualifying, application-specific
+     information about the relationship between an entity and an activity, whether
+     that is how an activity used an entity, generated an entity, or was controlled by an agent.
+     For example, an agent may play the role of "editor" in an activity that uses
+     one entity in the role of "document to be edited" and another in the role of "edits
+     to be made", to generate a further entity in the role of "edited document".
+    </p>
+    <!--p>Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies.  Role interpretation is application specific.</p -->
    </section>
 
    <section>
-    <h3>Revision</h3>
+    <h3>Revisions</h3>
 
     <p>
-     A single resource, such as a document, may go through multiple <i>revisions</i> (also called versions and
-     other comparable terms) over time. Between revisions, several changes may have
-     taken place to the resource, possibly controlled by different agents.
-     Each revision is, in Prov-DM terms, an entity, and Prov-DM allows one to assert the relation
-     between entities that one is a revision of another.
-    </p>
-    <p>
-     In some contexts, for one entity to be considered to be a new revision
-     of something represented by an earlier entity, may require it to be declared a
-     new revision by some agent, thus 'signing off' the changes since the prior revision.
+     A single resource, such as a document, may go through multiple revisions 
+     (also called versions and other comparable terms) over time. Between revisions, 
+     several changes may have taken place to the resource, each possibly controlled 
+     by different agents. The result of each revision is, in PROV-DM terms, an entity, 
+     and PROV-DM allows one to relate those entities by making an assertion that 
+     one is a revision of another.
     </p>
    </section>
 
    <section>
     <h3>Complementarity</h3>
     <p>
+     As described above, entities can be described from different perspectives,
+     by being characterized by different attributes. For example, "document D",
+     "the second version of document D" and "document D as stored on my filesystem"
+     are different entities
+     because they are characterized in different ways. However, for some period of time
+     they may all refer to the same thing in the world, e.g. for a while the copy of 
+     D on my filesystem <i>was</i> the second version.
+    </p>
+    <p>
+     In PROV-DM, we say there is <i>complementarity</i> between one entity and another
+     if everything that characterizes the first is also true of the second.
+     So, both "the second version of document D" and "document D as stored on my filesystem"
+     are complements of "document D", because they are both more specific characterizations
+     of D. If a version of D stored on my filesystem is the second one, then
+     "document D as stored on my filesystem" is a complement of "the second version of document D".
+    </p>
+
+    <!-- p>
      Several asserted entities could be characterizing the same thing, in
      particular when entities are asserted by different <em>accounts</em> or over
      different time periods. If two such entities have <em>overlapping
@@ -257,7 +304,7 @@
      In addition, if <code>:A prov:wasComplementOf :B</code>, then all the
      attributes of the entity <code>:A</code> which can be <em>mapped</em> to
      <em>compatible</em> attributes of <code>:B</code> MUST be <em>matching</em>
-     for the contiuous duration of the overlap of <code>:A</code> and
+     for the continuous duration of the overlap of <code>:A</code> and
      <code>:B</code>'s lifespans.
      It is out of scope for PROV to specify or assert the nature of
      the <em>compatibility mapping</em> and <em>matching</em>, the exact
@@ -296,7 +343,7 @@
      two entity timespans overlap, this could be anything from
      complete one-to-one match (where all attributes are always true for
      both entities) to merely touching overlaps. 
-    </p>
+    </p -->
    </section>
 
    <section>
@@ -312,7 +359,7 @@
      the designer's original sketches of what the page would look like.
     </p>
     <p>
-     There are different kinds of derivation expressible in Prov-DM.
+     There are different kinds of derivation expressible in PROV-DM.
      Consider the case of the page in the browser above. It is derived from
      the designer's sketch in the strictest sense, i.e. if the sketch had
      been different so would the page. On the other hand, there are
@@ -325,15 +372,15 @@
      was part of the page's history, and while not affecting the browsed
      page's content may have been a factor in its existence. Finally, in
      some cases, we may be able to say not only that one entity was derived
-     from another, but also how it was derived, i.e. by what process
-     execution. For example, the page in the browser is derived from the
-     page on the web server because a download process sent the bytes of
+     from another, but also how it was derived, i.e. by what activity.
+     For example, the page in the browser is derived from the
+     page on the web server because a download activity sent the bytes of
      the latter across an HTTP connection to the browser client.
     </p>
     <p>
-     In Prov-DM terms, we say that the page in the browser <i>was eventually
+     In PROV-DM terms, we say that the page in the browser <i>was eventually
       derived from</i> the sketch, <i>depended on</i> the banner image, and <i>was derived
-     from</i> the page on the web server due to the download process.
+      from</i> the page on the web server due to the download activity.
     </p>
    </section>
   </section>
@@ -341,11 +388,11 @@
   <section>
    <h2>Worked Examples</h2>
 
-   <p>In the following sections, we show how Prov-DM can be used to model 
+   <p>In the following sections, we show how PROV-DM can be used to model 
     provenance in specific examples.</p>
 
    <p>We include examples of how the formal ontology 
-    can be used to represent the Prov-DM assertions as RDF triples.
+    can be used to represent the PROV-DM assertions as RDF triples.
     These are shown using the Turtle notation. In 
     the latter depictions, the namespace prefix <b>prov</b> denotes 
     terms from the Prov ontology, while <b>ex1</b>, <b>ex2</b>, etc. 
@@ -384,23 +431,23 @@
    </section>
 
    <section>
-    <h3>Process Executions</h3>
+    <h3>Activities</h3>
 
     <p>
      Further, the Prov data asserts that there was
-     a process execution (ex1:compiled) denoting the compilation of the
+     an activity (<code>ex1:compiled</code>) denoting the compilation of the
      chart from the data set.
     </p>
     <pre class="turtle example">
-     ex1:compiled a prov:ProcessExecution .
+     ex1:compiled a prov:Activity .
     </pre>
     <p>
      The provenance also includes reference to the steps involved in compilation,
      aggregating the data by region and generating the chart graphic.
     </p>
     <pre class="turtle example">
-     ex1:aggregated a prov:ProcessExecution .
-     ex1:illustrated a prov:ProcessExecution .
+     ex1:aggregated  a prov:Activity .
+     ex1:illustrated a prov:Activity .
     </pre>
    </section>
 
@@ -409,13 +456,13 @@
 
     <p>
      Finally, the Prov data asserts the key events that connected the above
-     entities and process executions, i.e. the use of an entity by a process,
-     or the generation of an entity by a process.
+     entities and activities, i.e. the use of an entity by an activity,
+     or the generation of an entity by an activity.
     </p>
     <p>
-      For example, the data below states that the aggregation process execution
-      (<code>ex1:aggregated</code>) used the data set, that it used the list of
-      regions, and that the aggregated data was generated by this process.
+     For example, the data below states that the aggregation activity
+     (<code>ex1:aggregated</code>) used the data set, that it used the list of
+     regions, and that the aggregated data was generated by this activity.
     </p>
     <pre class="turtle example">
      ex1:aggregated prov:used           ex1:dataSet1 ;
@@ -423,8 +470,8 @@
      ex1:aggregate1 prov:wasGeneratedBy ex1:aggregated .
     </pre>
     <p>
-      Similarly, the chart graphic creation process (<code>ex1:illustrated</code>)
-      used the aggregated data, and the chart was generated by this process.
+     Similarly, the chart graphic creation activity (<code>ex1:illustrated</code>)
+     used the aggregated data, and the chart was generated by this activity.
     </p>
     <pre class="turtle example">
      ex1:illustrated prov:used           ex1:aggregate1 .
@@ -433,8 +480,8 @@
 
     <!-- p>
      For example, the provenance declares the event (of type <code>prov:Usage</code>)
-     where the aggregation process execution used the GovData data set, and the event
-     (of type <code>prov:Generation</code>) where the same process execution generated
+     where the aggregation activity used the GovData data set, and the event
+     (of type <code>prov:Generation</code>) where the same activity generated
      the data aggregated by region.
     </p>
     <pre class="turtle example">
@@ -442,7 +489,7 @@
      ex1:aggregate1Generation a prov:Generation .
     </pre>
     <p>
-     To describe these events, the provenance says within which process execution
+     To describe these events, the provenance says within which activity
      they occur and what entity is used or generated.
     </p>
     <pre class="turtle example">
@@ -452,7 +499,7 @@
      ex1:aggregate1Generation prov:entity ex1:aggregate1 .
     </pre>
     <p>
-     Comparable events are described for the process of generating the chart image
+     Comparable events are described for the activity of generating the chart image
      from the aggregated data.
     </p>
     <pre class="turtle example">
@@ -466,7 +513,7 @@
     <p>
      From this information Betty can see that
      the mistake could have been in the original data set or else was introduced
-     in the compilation process, and sets out to discover which.
+     in the compilation activity, and sets out to discover which.
     </p>
 
    </section>
@@ -476,7 +523,7 @@
 
     <p>
      Digging deeper, Betty wants to know who compiled the chart.  Betty sees 
-     that both the aggregation and chart creation process executions were controlled 
+     that both the aggregation and chart creation activities were controlled 
      by Derek.
     </p>
     <pre class="turtle example">
@@ -485,8 +532,8 @@
     </pre>
     <p>
      The record for Derek provides the
-     following information, of which the first line is a Prov-DM statement that
-     Derek is an agent.
+     following information, of which the first line is a PROV-O statement that
+     Derek is a (PROV-DM) agent.
     </p>
     <pre class="turtle example">
      ex1:derek a prov:Agent ;
@@ -498,7 +545,7 @@
 
    <!-- section>
     <h3>Accounts</h3>
-
+  
     <p><i>Suggested example:</i> The analyst provides his own record of how he compiled GovData to create 
      the chart, which provides more detail than in the newspaper's provenance data. 
      Specifically, the analysts account separates compilation into two stages: aggregating 
@@ -512,16 +559,16 @@
     <p>
      For Betty to understand where the error lies, she needs to have more detailed 
      information on how entities have been used in, participated in, and generated 
-     by process executions.  Betty has determined that <code>ex1:aggregated</code> used 
+     by activities.  Betty has determined that <code>ex1:aggregated</code> used 
      entities <code>ex1:regionList1</code> and <code>ex1:dataSet1</code>, but she does not 
      know what function these entities played in the processing.  Betty 
-     also knows that <code>ex1:derek</code> controlled the process executions, but she does 
+     also knows that <code>ex1:derek</code> controlled the activities, but she does 
      not know if Derek was the analyst responsible for determining how the data 
      should be aggregated.
     </p>
     <p>
      The above information is described as roles in the provenance data. The aggregation
-     process involved entities in four roles: the data to be aggregated (<code>ex1:dataToAggregate</code>),
+     activity involved entities in four roles: the data to be aggregated (<code>ex1:dataToAggregate</code>),
      the regions to aggregate by (<code>ex1:regionsToAggregateBy</code>), the
      resulting aggregated data (<code>ex1:aggregatedData</code>), and the
      analyst doing the aggregation (<code>ex1:analyst</code>).
@@ -533,43 +580,43 @@
      ex1:analyst              a prov:Role .
     </pre>
     <p>
-     In addition to the simple facts that the aggregation process used, generated or
+     In addition to the simple facts that the aggregation activity used, generated or
      was controlled by entities/agents as described in the sections above, the
      provenance data contains more details of <i>how</i> these entities and agents
      were involved, i.e. the roles they played. For example, the data below states
-     that the aggregation process (<code>ex1:aggregated</code>) included the usage
+     that the aggregation activity (<code>ex1:aggregated</code>) included the usage
      of the GovData data set (<code>ex1:dataSet1</code>) in the role of the data
      to be aggregated (<code>ex1:dataToAggregate</code>).
     </p>
     <pre class="turtle example">
      ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
-            prov:hadQualifiedEntity    ex1:dataSet1 ;
-            prov:roleOfQualifiedEntity ex1:dataToAggregate ] .
+            prov:hadQualifiedEntity ex1:dataSet1 ;
+            prov:hadRole            ex1:dataToAggregate ] .
     </pre>
     <p>
-     This can then be distinguished from the same execution's usage of the list of
+     This can then be distinguished from the same activity's usage of the list of
      regions because the roles played are different.
     </p>
     <pre class="turtle example">
      ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
-            prov:hadQualifiedEntity    ex1:regionList1 ;
-            prov:roleOfQualifiedEntity ex1:regionsToAggregateBy ] .
+            prov:hadQualifiedEntity ex1:regionList1 ;
+            prov:hadRole            ex1:regionsToAggregateBy ] .
     </pre>
     <p>
-     Similarly, the provenance includes assertions that the same process was
+     Similarly, the provenance includes assertions that the same activity was
      controlled in a particular way (<code>ex1:analyst</code>) by Derek, and that
      the entity <code>ex1:aggregate1</code> took the role of the aggregated
-     data in what the process generated.
+     data in what the activity generated.
     </p>
     <pre class="turtle example">
      ex1:aggregated
         prov:hadQualifiedControl [ a prov:Control ;
-            prov:hadQualifiedEntity    ex1:derek ;
-            prov:roleOfQualifiedEntity ex1:analyst
+            prov:hadQualifiedEntity ex1:derek ;
+            prov:hadRole            ex1:analyst
         ] ;
         prov:hadQualifiedGeneration [ a prov:Generation ;
             prov:hadQualifiedEntity ex1:aggregate1 ;
-            prov:roleOfQualifiedEntity ex1:aggregatedData
+            prov:hadRole            ex1:aggregatedData
         ] .
     </pre>
    </section>
@@ -578,7 +625,7 @@
     <h3>Revision</h3>
 
     <p>
-     After looking at the detail of the compilation process, there appears
+     After looking at the detail of the compilation activity, there appears
      to be nothing wrong, so Betty concludes the error is in GovData. She contacts
      the government, and a new version of GovData is created, declared to be the
      next revision of the data by Edith. The provenance data now includes a statement
@@ -595,7 +642,7 @@
 
     <p>Betty lets Derek know that a new revision of the data set exists,
      and he looks at the provenance of the new data to understand what he needs to
-     reanalyse. </p>
+     re-analyze. </p>
     <p>In addition to specifying that 
      <code>ex1:dataSet2</code> is a new revision of
      <code>ex1:dataSet1</code>, the provenance from DataGov also 
@@ -614,14 +661,14 @@
     -->
     <p>
      This assertion means that <code>ex1:dataSet1</code> at some point shared
-     its characterising attributes with <code>ex1:dataSet</code>, and the same for
+     its characterizing attributes with <code>ex1:dataSet</code>, and the same for
      <code>ex1:dataSet2</code>. Thus the <em>entity</em>
      <code>ex1:dataSet1</code> did at some point represent the same
      thing as characterized by the entity <code>ex1:dataSet</code>. The same is
-     true for <code>ex1:dataSet2</code> - but not neccessarily at the
+     true for <code>ex1:dataSet2</code>, though not necessarily at the
      same point in time. 
     </p>
-    <p>
+    <!-- p>
      The term <em>was complement of</em> here means that
      <code>ex1:dataSet1</code>
      provides additional details that add to the details of
@@ -633,48 +680,49 @@
      <em>Compatible</em> here means that some kind of mapping can be
      established between the attributes; they don't necessarily have to
      match directly.
-    </p>
-    <p>   
-     Derek then looks at the characterization of 
-     <code>ex1:dataSet</code> to find these compatible attributes:
+    </p -->
+    <p>
+     Derek then looks at the characterization of the generalized data set
+     (<code>ex1:dataSet</code>) to find the attributes shared with the first 
+     and second versions of the data set. The assertions below give the generalized
+     data set's attributes: it is of type <code>ex1:DataSet</code>, it covers
+     three named regions, it was created by <code>ex1:DataGov</code>, and
+     has a given title.
     </p>
     <pre class="example turtle">
      ex1:dataSet a ex1:DataSet ;
-         ex1:regions ( ex1:North, ex1:NorthWest, ex1:East ) ;
-         dc:creator ex1:DataGov ;
-         dc:title "Regional incidence dataset 2011" .
+           ex1:regions ( ex1:North ex1:NorthWest ex1:East ) ;
+           dc:creator  ex1:DataGov ;
+           dc:title    "Regional incidence dataset 2011" .
     </pre>
-    <!--
-    <pre class="example asn">
-    entity(ex1:dataSet, [
-       type="ex1:DataSet",
-       ex1:regions="North,NorthWest,East",
-       dc:creator="ex1:DataGov",
-       dc:title="Regional incidence dataset 2011"])
-    </pre>
-    -->
-    <p>Derek can from this deduce that both datasets had at some point
-     the same creator and title.  Derek then compares this to the
-     attributes for each of the complementing entities:
+    <p>
+     As <code>ex1:dataSet1</code> and <code>ex1:dataSet2</code> complement
+     <code>ex1:dataSet</code>,
+     Derek can deduce from the above attributes that both revisions had
+     these same attributes at some point, i.e.
+     the creator <code>ex1:DataGov</code> and so on.  Derek compares the above
+     assertions to the
+     attributes of <code>ex1:dataSet1</code>.
     </p>
     <pre class="example turtle">
      ex1:dataSet1 a ex1:DataSet ;
-         ex1:postCodes ( "N1", "N2", "NW1", "E1", "E2" ) ;
-         ex1:totalIncidents 141 ;
-         dc:creator ex1:DataGov ;
-         dc:title "Regional incidence dataset 2011" .
+           ex1:postCodes      ( "N1" "N2" "NW1" "E1" "E2" ) ;
+           ex1:totalIncidents 141 ;
+           dc:creator         ex1:DataGov ;
+           dc:title           "Regional incidence dataset 2011" .
     </pre>
-    <!--
-    <pre class="example asn">
-    entity(ex1:dataSet1, [
-       type="ex1:DataSet",
-       ex1:postCodes="N1,N2,NW1,E1,E2",
-       ex1:totalIncidents="141",
-       dc:creator="ex1:DataGov",
-       dc:title="Regional incidence dataset 2011"])
-    </pre>
-    -->
     <p>
+     Shared characterizing attributes are not necessarily represented in the same way
+     in the serialized assertions of different entities. For example, the creator
+     and title are exactly the same for <code>ex1:dataSet</code> and <code>ex1:dataSet1</code>,
+     but the regions covered by the data set are described in a different way:
+     "regions" for <code>ex1:dataSet</code> and "postCodes" for <code>ex1:dataSet1</code>.
+     Whether these are equivalent is a domain-specific judgment.
+     We can also see that, while <code>ex1:dataSet1</code> complements <code>ex1:dataSet</code>,
+     the inverse is not true. <code>ex1:dataSet1</code> is more specific, because
+     it has a "totalIncidents" attribute specific to that version of the data set.
+    </p>
+    <!-- p>     
      Derek sees that the creator and title are directly mappable and 
      equal between these entities. He also knows (from his region
      aggregation method) that the <code>ex1:postCodes</code> <code>N1</code> and
@@ -719,16 +767,6 @@
          dc:creator ex1:DataGov ;
          dc:title "Regional incidence dataset 2011" .
     </pre>
-    <!--
-    <pre class="example asn">
-    entity(ex1:dataSet1, [
-       type="ex1:DataSe2",
-       ex1:postCodes="N1,N2,NW1,NW2,E1,E2",
-       ex1:totalIncidents="158",
-       dc:creator="ex1:DataGov",
-       dc:title="Regional incidence dataset 2011"])
-    </pre>
-    -->
     <p>
      In this revision, the new postcode <kbd>NW2</kbd> appears; this is still
      <em>compatible</em> with the region <code>ex1:NorthWest</code>
@@ -752,7 +790,7 @@
      of) the timespans of the two revisions.
     </p>
     <p>
-     From this Derek concludes that he can still use the regions Nort,
+     From this Derek concludes that he can still use the regions North,
      North West and East in the diagram layout, but as the
      <code>ex1:totalIncidents</code> differ, something in the
      raw data has changed. He can't from this provenance assertion
@@ -761,7 +799,7 @@
      Derek decides to redo the aggregation by region using
      <code>ex1:dataSet2</code> and regenerate the
      graphics using the same layout.
-    </p>
+    </p -->
    </section>
 
    <section>
@@ -769,7 +807,7 @@
 
     <p>
      Derek creates a new chart based on the revised data, 
-     using the same compilation process as before. Betty checks the article again at a
+     using the same compilation activity as before. Betty checks the article again at a
      later point, and wants to know if it is based on the old or new GovData.
      She sees three new assertions about derivation in the provenance data, plus
      an assertion about how the new chart was generated.
@@ -786,7 +824,7 @@
      The second says further that the new chart is as it is because of the revised
      data set, i.e. there is an explicit influence of the data on the chart.
      Finally, the third and fourth assertions together say further that it was
-     the process execution <code>ex1:compiled2</code> that derived the new chart
+     the activity <code>ex1:compiled2</code> that derived the new chart
      from the revised data set.
     </p>
    </section>