Primer: added PROV-O example for time; corrections and improvements throughout
author Simon Miles <simon.miles@kcl.ac.uk>
Sun, 01 Apr 2012 15:30:06 +0100
changeset 2139 66c668fa069c
parent 2138 304d0cc3f06c
child 2141 754d7a53db29
primer/Primer.html
--- a/primer/Primer.html	Sun Apr 01 08:41:02 2012 +0100
+++ b/primer/Primer.html	Sun Apr 01 15:30:06 2012 +0100
@@ -1,5 +1,6 @@
 <!DOCTYPE html>
-<html><head> 
+<html>
+ <head> 
   <title>PROV Model Primer</title>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   <!--
@@ -54,8 +55,8 @@
  
     // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
     // and its maturity status
-    // previousPublishDate:  "1977-03-15",
-    // previousMaturity:  "WD",
+    previousPublishDate:  "2012-01-10",
+    previousMaturity:  "WD",
  
     // if there is a publicly available Editor's Draft, this is the link
     edDraftURI:           "http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html",
@@ -167,8 +168,11 @@
     The <i>provenance</i> of digital objects represents their origins.  The PROV-DM is a 
     proposed standard to represent provenance records, which contain <i>descriptions</i> of the entities 
     and activities involved in producing and delivering or otherwise influencing a 
-    given object.  By knowing the provenance of an object, we can make determinations 
-    about how to use it.  Provenance records can be used for many purposes, such as 
+    given object.
+    For the remainder of this document, we use the term 'provenance' to refer also
+    to records of provenance, except where the distinction is important for clarity.
+    By knowing the provenance of an object, we can make determinations 
+    about how to use it.  Provenance can be used for many purposes, such as 
     understanding how data was collected so it can be meaningfully used, determining 
     ownership and rights over an object, making judgments about information to 
     determine whether to trust it, verifying that the process and steps used to obtain a 
@@ -195,7 +199,7 @@
 
    <p>
     Provenance records are metadata.  There are other kinds of metadata that are 
-    not provenance.  For example, the size of an image is a metadata property of 
+    not provenance.  For example, the size of an image is metadata of 
     that image but it is not provenance.
    </p>
 
@@ -220,21 +224,17 @@
   <section>
    <h2>Intuitive overview of PROV-DM</h2>
 
-   <p>This section provides an intuitive explanation of the concepts in PROV-DM. 
+   <p>
+    This section provides an intuitive explanation of the concepts in PROV-DM. 
     As with the rest of this document, it should be treated as a starting point for
     understanding the model, and not normative in itself. The PROV-DM model specification
-    provides precise definitions and constraints to be used.</p>
-   <div class='note'>
-    Please note that, as they
-    are being developed in parallel, there will be points at which this document
-    does not yet exactly match the current data model or ontology.
-   </div>
-
+    provides precise definitions and constraints to be used.
+   </p>
    <p>
-    The following ER diagram provides a high level overview of the <strong>structure of PROV-DM records</strong>.
+    The following ER diagram provides a high level overview of the structure of PROV-DM records,
+    limited to some key PROV-DM concepts discussed in this document.
     The diagram is the same that appears in the [[PROV-DM]].
    </p>
-
    <div style="text-align: center;">
     <img src="OverviewDiagram.png" alt="PROV-DM overview"/>
    </div>
@@ -245,31 +245,13 @@
     <p>
      In PROV-DM, the things that one may ask the provenance of are called <i>entities</i>.
      Examples of such entities are a web page, a chart, and a spellchecker.
-    </p>
-    <p>
      An entity’s provenance may refer to many other entities.  For example, a document D is
      an entity whose provenance refers to other entities such as a chart inserted into D,
      the dataset that was used to create that chart, or the author of the document.
-    </p>
-    <p>
      Entities may be described from different perspectives that may be more or less specific.  For example,
-     document D as stored in my file system, the second version of document D after someone edited it, 
+     document D as stored in my file system, the second version of document D, 
      and D as an evolving document,
-     are three distinct entities for which we may describe the provenance.
-     <!-- They
-     may all be perspectives on the same thing in the world (document D may exist only
-     in its second version and on my file system), but are <i>characterized</i> in
-     different ways by being described using different <i>attributes</i> (version, location, and 
-     so on).
-    </p>
-    <p>
-     The characterization of an entity means that the provenance assertions
-     about the entity are only about the thing when it has those attributes.
-     For example, the second version of document D is characterized by being the
-     second version, and so assertions about who reviewed that entity apply only
-     to the document as it is in its second version. When the document becomes
-     the third version, a new entity exists (the third version of D) and the
-     provenance assertions about who reviewed the second version do not apply.-->
+     are three distinct entities for which we may describe provenance.
     </p>
    </section>
 
@@ -280,13 +262,11 @@
      <i>Activities</i> are how entities come into 
      existence and how their attributes change to become new entities, 
      often making use of previously existing entities to achieve this. 
+     They are
+     dynamic aspects of the world, such as actions, processes, etc.
      For example, if the second version of document D was generated 
      by a translation from the first version of the document in another language,
      then this translation is an activity.
-     An activity may have either already occurred or be still 
-     taking place when a new entity is generated. 
-     While entities are static aspects in the world (things), <i>activities</i> are
-     dynamic aspects (actions, processes, etc.)
     </p>
    </section>
 
@@ -296,15 +276,13 @@
      Activities <i>generate</i> new entities.
      For example, writing a document brings the document into existence, while
      revising the document brings a new version into existence.
-    </p>
-    <p>
+     An activity may complete with the generation of an entity or generate entities
+     mid-way through occurring.
      Activities also make <i>use</i> of entities. For example, revising a document
      to fix spelling mistakes uses the original version of the document as well
      as a list of corrections. 
-    </p>
-    <p>
-     Descriptions can be made in a provenance record to state that
-     particular activities used or generated particular entities.
+     PROV-DM data can include descriptions of
+     particular activities using or generating particular entities.
     </p>
    </section>
 
@@ -331,13 +309,6 @@
      for saying that the agent was responsible for the activity which generated
      the entity.
     </p>
-    <!-- p>
-     Since agents are a kind of entity, it is therefore possible to 
-     associate provenance records with the agents themselves.  
-     In the running example, we 
-     can also represent the provenance of the software used to create the chart, and specify the agents involved in 
-     producing that software, such as the vendor.
-    </p -->
    </section>
 
    <section>
@@ -352,7 +323,7 @@
      For example, an agent may play the role of "editor" in an activity that uses
      one entity in the role of "document to be edited" and another in the role of
      "addition to be made to the document", to generate a further entity in the role of "edited document".
-     Roles are application specific.
+     Roles are application specific, so PROV-DM does not define any particular roles.
     </p>
     <!--p>Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies.  Role interpretation is application specific.</p -->
    </section>
@@ -367,12 +338,15 @@
      and a chart is derived from the data that is used to create it.
     </p>
     <p>
-     A given entity, such as a document, may go through multiple <i>revisions</i> 
+     PROV allows some common, specialized kinds of derivation to be described.
+     For example, a given entity, such as a document, may go through multiple <i>revisions</i> 
      (also called versions and other comparable terms) over time. Between revisions,
      one or more attributes of the entity may change. 
      The result of each revision is a new entity, 
      and PROV-DM allows one to relate those entities by making a description that 
      one is a revision of another.
+     Another specialized kind of derivation is to say that one entity, commonly
+     a document, <i>quotes</i> from another.
     </p>
    </section>
 
@@ -388,11 +362,11 @@
    <section>
     <h3>Time</h3>
     <p>
-     Time is critical information in many provenance records.
+     Time is often a critical aspect of provenance.
      PROV-DM allows the timing of significant events to be described, including
      when an entity was generated or used, or when an activity started
      and finished. For example, the model can be used to describe facts such as when a new
-     version of a document was created (generation time), when a document was
+     version of a document was created (generation time), or when a document was
      edited (start and end of the editing activity).
     </p>
    </section>
@@ -424,7 +398,7 @@
      the other (the more general).
     </p>
    </section>   
-   
+
   </section>
 
   <section>
@@ -461,12 +435,13 @@
     </p>
     <p>Betty finds the following descriptions of entities in the provenance:</p>
     <pre class="turtle example">
-     ex:article      a prov:Entity ; dcterms:title "Crime rises in cities" .
-     ex:dataset1     a prov:Entity .
-     ex:regionList   a prov:Entity .
-     ex:composition  a prov:Entity .
-     ex:chart1       a prov:Entity .
-   </pre>
+     ex:article     a prov:Entity ;
+                    dcterms:title "Crime rises in cities" .
+     ex:dataSet1    a prov:Entity .
+     ex:regionList  a prov:Entity .
+     ex:composition a prov:Entity .
+     ex:chart1      a prov:Entity .
+    </pre>
     <p>
      These statements, in order, describe that there was an article (<code>ex:article</code>),
      an original data set (<code>ex:dataSet1</code>),
@@ -483,16 +458,17 @@
     <h3>Activities</h3>
 
     <p>
-     Further, the provenance record describes that there was
-     an activity (<code>ex:compiled</code>) denoting the compilation of the
+     Further, the provenance describes that there was
+     an activity (<code>ex:compile</code>) denoting the compilation of the
      chart from the data set.
     </p>
     <pre class="turtle example">
-     ex:compiled a prov:Activity .
+     ex:compile a prov:Activity .
     </pre>
     <p>
-     The provenance record also includes reference to the more specific steps involved in this compilation,
-     which are first composing the data by region and then generating the chart graphic.
+     The provenance also includes reference to the more specific steps involved in this compilation,
+     which are first composing the data by region (<code>ex:compose</code>) and then generating the
+     chart graphic (<code>ex:illustrate</code>).
     </p>
     <pre class="turtle example">
      ex:compose    a prov:Activity .
@@ -504,7 +480,8 @@
     <h3>Use and Generation</h3>
 
     <p>
-     Finally, the provenance record describes the key relations among the above
+     Concluding the basic description of what occurred, the provenance 
+     describes the key relations among the above
      entities and activities, i.e. the use of an entity by an activity,
      or the generation of an entity by an activity.
     </p>
@@ -519,7 +496,7 @@
      ex:composition  prov:wasGeneratedBy ex:compose .
     </pre>
     <p>
-     Similarly, the chart graphic creation activity (<code>ex:illustrated</code>)
+     Similarly, the chart graphic creation activity (<code>ex:illustrate</code>)
      used the composed data, and the chart was generated by this activity.
     </p>
     <pre class="turtle example">
@@ -529,7 +506,7 @@
    </section>
 
    <section>
-    <h3>Agents</h3>
+    <h3>Agents and Responsibility</h3>
 
     <p>
      Digging deeper, Betty wants to know who compiled the chart.
@@ -553,7 +530,7 @@
               foaf:mbox      &lt;mailto:derek@example.org&gt; .
     </pre>
     <p>
-     Derek works as part of an organization, Chart Generators, and so the provenance
+     Derek works as part of an organization, Chart Generators Inc, and so the provenance
      declares that he acts on their behalf. Note that the organization is itself
      an agent.
     </p>
@@ -599,10 +576,6 @@
      ex:analyst              a prov:Role .
     </pre>
     <p>
-     In addition to the simple facts that the composition activity used, was generated by or
-     was associated with entities/agents as described in the sections above, the
-     provenance record contains more details of <i>how</i> these entities and agents
-     were involved, i.e. the roles they played. For example, the descriptions below state
      Examples in the sections above show descriptions of the simple facts that the
      composition activity used, generated and was controlled by entities/agents.
      For example, the usage of the data set by the compose activity is expressed
@@ -613,29 +586,35 @@
     </pre>
     <p>     
      The
-     provenance record can contain more details of <i>how</i> these entities and agents
-     were involved in the activity. One example is the roles the entities played.
-     To do this, PROV-O refers to <i>qualified usage</i>, <i>qualified generation</i>, etc.,
+     provenance can contain more details of exactly how these entities and agents
+     were involved in the activity. 
+     To express this, PROV-O refers to <i>qualified usage</i>, <i>qualified generation</i>, etc.,
      which are descriptions consisting of several statements about how use, generation, etc. took place.
-     For example, the descriptions below state
+     For example, we may describe the plan followed by an agent in performing an activity, or
+     the time at which an activity generated an entity, both illustrated later.
+     Another example of qualified involvement is the role an entity played in an activity.
+     The descriptions below state
      that the composition activity (<code>ex:compose</code>) included the usage
      of the government data set (<code>ex:dataSet1</code>) in the role of the data
      to be composed (<code>ex:dataToCompose</code>).
     </p>
     <pre class="turtle example">
-     ex:compose prov:hadQualifiedUsage [ a prov:Usage ;
+     ex:compose prov:qualifiedUsage [
+                   a prov:Usage ;
                    prov:entity  ex:dataSet1 ;
-                   prov:hadRole ex:dataToCompose ] .
+                   prov:hadRole ex:dataToCompose 
+     ] .
     </pre>
     <p>
      This can then be distinguished from the same activity's usage of the list of
      regions because the roles played are different.
     </p>
     <pre class="turtle example">
-     ex:compose  prov:qualifiedUsage [
-                   a  prov:Usage ;
-                   prov:entity   ex:regionList ;
-                   prov:hadRole  ex:regionsToAggregateBy ] .
+     ex:compose prov:qualifiedUsage [
+                   a prov:Usage ;
+                   prov:entity  ex:regionList ;
+                   prov:hadRole ex:regionsToAggregateBy
+     ] .
     </pre>
     <p>
      Similarly, the provenance includes descriptions that the same activity was
@@ -666,7 +645,7 @@
      She looks at the dataset <code>ex:dataSet1</code>, 
      and sees that it is missing data from one of the zipcodes in the area.  She contacts
      the government, and a new version of GovData is created, declared to be the
-     next revision of the data. The provenance record of this new dataset,
+     next revision of the data. The provenance of this new dataset,
      <code>ex:dataSet2</code>, states that it is a revision of the
      old data set, <code>ex:dataSet1</code>.
     </p>
@@ -685,7 +664,6 @@
                prov:wasDerivedFrom ex:dataSet2 .
     </pre>
    </section>
-  </section>
 
    <section>
     <h3>Plans</h3>
@@ -706,34 +684,61 @@
     <p>
      The connection between them is expressed in PROV-O using a qualified association giving details of
      how Edith was associated with the correction activity,
-     including that she adopted the above corrections plan.
+     including that she followed the above corrections plan.
     </p>
     <pre class="turtle example">
      ex:correct prov:qualifiedAssociation [
-                    prov:agent   ex:edith .
-                    prov:hadPlan ex:corrections .
+                    a prov:Association ;
+                    prov:agent   ex:edith ;
+                    prov:hadPlan ex:corrections
                 ] .
      ex:dataSet2 prov:wasGeneratedBy ex:correct .
     </pre>
    </section>
-  
+
    <section>
     <h3>Time</h3>
-    
+
     <p>
-     
+     The government agency that produced GovData is concerned to know how long
+     the incorrect chart was in circulation before the corrected chart was created.
+     That is, they wish to compare the times at which the original and the corrected
+     charts were generated. Time of generation is expressed in PROV-O using a qualified
+     description of the generation. The snippet below shows that the second chart
+     was generated roughly a month after the first.
     </p>
+    <pre class="turtle example">
+     ex:chart1 prov:qualifiedGeneration [
+                    a prov:Generation ;
+                    prov:activity ex:compile ;
+                    prov:atTime   "2012-03-02T10:30:00"^^xsd:dateTime
+     ] .
+     ex:chart2 prov:qualifiedGeneration [
+                    a prov:Generation ;
+                    prov:activity ex:compile2 ;
+                    prov:atTime   "2012-04-01T15:21:00"^^xsd:dateTime
+     ] .
+    </pre>
+    <p>
+     To ensure their procedures are efficient, the agency also wishes to know how long the
+     corrections took once the error was discovered. That is, they wish to know the
+     start and end times of the correction activity (<code>ex:correct</code>).
+     These details are expressed as follows, showing that the corrections took a
+     little over a day.
+    </p>
+    <pre class="turtle example">
+     ex:correct prov:startedAtTime "2012-03-31T09:21:00"^^xsd:dateTime ;
+                prov:endedAtTime   "2012-04-01T15:21:00"^^xsd:dateTime .
+    </pre>
    </section>
-   
+
    <section>
     <h3>Alternate Entities and Specialization</h3>
-    
+
     <p>
      Before noticing anything wrong with the government data, Betty had already
      posted a blog entry about the article. The blog entry had its own published
-     provenance, stating that it quoted from the article. This was expressed
-     using a PROV property, <code>wasQuotedFrom</code>, which is a kind of
-     derivation.
+     provenance, stating that it quoted from the article.
     </p>
     <pre class="turtle example">
      ex:blogEntry a prov:Entity ;
@@ -745,7 +750,7 @@
      (<code>ex:article</code>) and the first version of the article (<code>ex:articleV1</code>),
      allowing both to be referred to as entities in provenance data. The article
      discussed the GovData data set, and so the provenance data published by the
-     newspaper asserts that the first version of the article was derived from that data set.
+     newspaper describes the first version of the article as being derived from that data set.
     </p>
     <pre class="turtle example">
      ex:articleV1 a prov:Entity ;
@@ -762,7 +767,7 @@
      ex:articleV1 prov:specializationOf ex:article .
     </pre>
     <p>
-     Later, after the data set is corrected and new chart generated, a new version
+     Later, after the data set is corrected and the new chart generated, a new version
      of the article is created, <code>ex:articleV2</code>. To ensure that those
      consulting the provenance of <code>ex:articleV2</code> understand that it
      is connected with the provenance of <code>ex:article</code> and <code>ex:articleV1</code>,
@@ -776,9 +781,10 @@
      Here, <code>alternateOf</code> expresses that the first and second versions
      are specializations of the same thing (the article).
     </p>
-    
+
    </section>
-  
+  </section>
+
   <section class="appendix">
    <h2>PROV-N Examples</h2>
    <p>
@@ -829,7 +835,7 @@
    </section>
 
    <section>
-    <h3>Agents</h3>
+    <h3>Agents and Responsibility</h3>
     <pre class="example asn">
      entity(ex:derek, [ type="prov:Person", foaf:givenName = "Derek", 
             foaf:mbox= "&lt;mailto:derek@example.org&gt;"]).
@@ -897,7 +903,9 @@
     <li>Updated examples to latest PROV-O terms</li>
     <li>Added PROV-O examples for attribution </li>
     <li>Added PROV-O examples for plans, adoptedPlan </li>
-    <li>Added PROV-O examples for specialization and alternate </li>
+    <li>Added PROV-O examples for specialization, alternate and quotation</li>
+    <li>Added intuition section on quotation.</li>
+    <li>Added PROV-O examples for time</li>
    </ul>
   </section>
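
Review note: the Time hunk above claims the second chart was generated "roughly a month" after the first, and that the corrections took "a little over a day". The following is a minimal sketch for checking both claims against the timestamps in the Turtle examples. It assumes the xsd:dateTime literals are timezone-less values that can be compared directly. The helper name `elapsed` is hypothetical and is not part of PROV-O or the primer.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two timezone-less xsd:dateTime lexical values."""
    return datetime.fromisoformat(end) - datetime.fromisoformat(start)


# prov:atTime values of the qualified generations of ex:chart1 and ex:chart2
chart_gap = elapsed("2012-03-02T10:30:00", "2012-04-01T15:21:00")

# prov:startedAtTime and prov:endedAtTime of ex:correct
correction = elapsed("2012-03-31T09:21:00", "2012-04-01T15:21:00")

print("Gap between chart generations:", chart_gap)
print("Duration of the correction activity:", correction)
```

Under these assumptions the gap between the two generations is 30 days plus a few hours, and the correction activity lasts 30 hours, so both prose claims match the example data.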