--- a/primer/Primer.html Wed Dec 14 00:11:16 2011 +0000
+++ b/primer/Primer.html Wed Dec 14 10:28:01 2011 +0000
@@ -1,1010 +1,700 @@
-<!DOCTYPE html>
-<html><head>
- <title>Prov Model Primer</title>
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
- <!--
- === NOTA BENE ===
- For the three scripts below, if your spec resides on dev.w3 you can check them
- out in the same tree and use relative links so that they'll work offline,
- -->
- <!-- PM -->
- <style type="text/css">
- .note { font-size:small; margin-left:50px }
- </style>
-
- <script src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
-
- <script class="remove">
- var respecConfig = {
- // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
- specStatus: "ED",
-
- // the specification's short name, as in http://www.w3.org/TR/short-name/
- shortName: "Prov-Primer",
-
- // if your specification has a subtitle that goes below the main
- // formal title, define it here
- subtitle : "Initial draft for internal discussion",
-
- // if you wish the publication date to be other than today, set this
- // publishDate: "2009-08-06",
-
- // if the specification's copyright date is a range of years, specify
- // the start date here:
- // copyrightStart: "2005"
-
- // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
- // and its maturity status
- // previousPublishDate: "1977-03-15",
- // previousMaturity: "WD",
-
- // if there is a publicly available Editor's Draft, this is the link
- edDraftURI: "http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html",
-
- // if this is a LCWD, uncomment and set the end of its review period
- // lcEnd: "2009-08-05",
-
- // if you want to have extra CSS, append them to this list
- // it is recommended that the respec.css stylesheet be kept
- extraCSS: ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css", "./extra.css"],
-
- // editors, add as many as you like
- // only "name" is required
- editors: [
- { name: "Yolanda Gil", url: "http://www.isi.edu/~gil/",
- company: "Information Sciences Institute, University of Southern California, US" },
- { name: "Simon Miles", url: "http://www.inf.kcl.ac.uk/~simonm",
- company: "King's College London, UK" },
- ],
-
- // authors, add as many as you like.
- // This is optional, uncomment if you have authors as well as editors.
- // only "name" is required. Same format as editors.
-
- authors: [
- { name: "TBD"},
- ],
-
- // name of the WG
- wg: "Provenance Working Group",
-
- // URI of the public WG page
- wgURI: "http://www.w3.org/2011/prov/wiki/Main_Page",
-
- // name (with the @w3c.org) of the public mailing to which comments are due
- wgPublicList: "public-prov-wg",
-
- // URI of the patent status for this WG, for Rec-track documents
- // !!!! IMPORTANT !!!!
- // This is important for Rec-track documents, do not copy a patent URI from a random
- // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
- // Team Contact.
- wgPatentURI: "",
- };
- </script>
- </head>
- <body>
- <section id="abstract">
- <p>This document aims to provide an intuitive guide to the PROV Data Model,
- with worked examples.</p>
-
- <p>
- This is a document for internal discussion, which will ultimately
- evolve into the first Public Working Draft of the Primer.</p>
- </section>
-
- <section>
- <h2>Introduction</h2>
- <p>
- This primer document provides an accessible introduction to the PROV Data Model
- (PROV-DM) standard for representing provenance on the Web, and its representation
- in the PROV Ontology (PROV-O). Provenance describes
- the origins of things, so PROV-DM data consists of assertions about the past.
- </p>
-
- <p>
- This primer document aims to ease the adoption of the standard by providing:
- </p>
- <ul>
- <li>An intuitive explanation of how PROV-DM models provenance.</li>
- <li>Worked examples that can be followed to produce your own PROV-DM data.</li>
- <li>Answers to frequently asked questions regarding how the model should be applied.</li>
- </ul>
-
- <p>
- The <i>provenance</i> of digital objects represents their origins. The PROV-DM is a
- proposed standard to represent provenance records, which contain <i>assertions</i> about the entities
- and activities involved in producing and delivering or otherwise influencing a
- given object. By knowing the provenance of an object, we can make determinations
- about how to use it. Provenance records can be used for many purposes, such as
- understanding how data was collected so it can be meaningfully used, determining
- ownership and rights over an object, making judgments about information to
- determine whether to trust it, verifying that the process and steps used to obtain a
- result comply with given requirements, and reproducing how something was generated.
- </p>
-
- <p>
- As a standard for provenance, PROV-DM accommodates all those different uses
- of provenance. Different people may have different perspectives on provenance,
- and as a result different types of information might be captured in provenance records.
- One perspective might focus on <i>agent-centered provenance</i>, that is, what entities
- were involved in generating or manipulating the information in question. For example,
- in the provenance of a picture in a news article we might capture the photographer who
- took it, the person that edited it, and the newspaper that published it. A second perspective
- might focus on <i>object-centered provenance</i>, by tracing the origins of portions of a
- document to other documents. An example is having a web page that was assembled from content
- from a news article, quotes of interviews with experts, and a chart that plots data from a
- government agency. A third perspective one might take is on <i>process-centered provenance</i>,
- capturing the actions and steps taken to generate the information in question. For example, a
- chart may have been generated by invoking a service to retrieve data from a database, then
- extracting certain statistics from the data using some statistics package, and finally
- processing these results with a graphing tool.
- </p>
-
- <p>
- Provenance records are metadata. There are other kinds of metadata that are
- not provenance. For example, the size of an image is a metadata property of
- that image but it is not provenance.
- </p>
-
- <p>
- For general background on provenance, a
- comprehensive overview of requirements, use cases, prior research, and proposed
- vocabularies for provenance is available from the
- <a href="http://www.w3.org/2005/Incubator/prov/XGR-prov/">Final Report of the W3C Provenance Incubator Group</a>.
- That document contains three general scenarios
- that may help identify the provenance aspects of your planned applications and
- help plan the design of your provenance system.
- </p>
- <p>
- The next section gives an introductory overview of PROV-DM using simple examples.
- The following section shows how the formal ontology PROV-O can be used to represent the PROV-DM assertions
- as RDF triples. The document also contains frequently asked questions, and an appendix giving example
- snippets of the PROV-DM Abstract Syntax Notation (ASN).
- For a detailed description of PROV-DM, please refer to the
- <a href="http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html">PROV Data Model and Abstract Syntax Notation document</a>.
- For a detailed description of PROV-O, refer to the
- <a href="http://dvcs.w3.org/hg/prov/raw-file/default/ontology/ProvenanceFormalModel.html">PROV Ontology Model and Formal Semantics document</a>.
- </p>
- </section>
-
- <section>
- <h2>Intuitive overview of PROV-DM</h2>
-
- <p><i>This section provides an intuitive explanation of the concepts in PROV-DM.
- As with the rest of this document, it should be treated as a starting point for
- understanding the model, and not normative in itself. The PROV-DM model specification
- provides precise definitions and constraints to be used.</i></p>
-
-<p>
-The following ER diagram provides a high level overview of the <strong>structure of PROV-DM records</strong>.
-The diagram is the same one that appears in the
-<a href="http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html">PROV Data Model and Abstract Syntax Notation document</a>,
-but note that this primer document only describes some of the terms shown in the diagram.
-</p>
-
-<div style="text-align: center;">
- <img src="overview.png" alt="PROV-DM overview"/>
-</div>
-
- <section>
- <h3>Entities</h3>
-
- <p>
- In PROV-DM, the things that one may ask the provenance of are called <i>entities</i>.
- Examples of such entities are a web page, a chart, and a spellchecker.
- </p>
- <p>
- An entity’s provenance may refer to many other entities. For example, a document D is
- an entity whose provenance refers to other entities such as a chart inserted into D,
- the dataset that was used to create that chart, or the author of the document.
- </p>
- <p>
- Entities may be described from different perspectives that may be more or less specific. For example,
- document D as stored in my file system, the second version of document D after someone edited it,
- and D as an evolving document,
- are three distinct entities for which we may describe the provenance. They
- may all be perspectives on the same thing in the world (document D may exist only
- in its second version and on my file system), but are <i>characterized</i> in
- different ways by being described using different <i>attributes</i> (version, location, and
- so on).
- </p>
- <p>
- The characterization of an entity means that the provenance assertions
- about the entity are only about the thing when it has those attributes.
- For example, the second version of document D is characterized by being the
- second version, and so assertions about who reviewed that entity apply only
- to the document as it is in its second version. When the document becomes
- the third version, a new entity exists (the third version of D) and the
- provenance assertions about who reviewed the second version do not apply.
- </p>
- </section>
-
- <section>
- <h3>Activities</h3>
-
- <p>
- Activities are how entities come into
- existence and how their attributes change to become new entities,
- often making use of previously existing entities to achieve this.
- For example, if the second version of document D was generated
- by a translation from the first version of the document in another language,
- then this translation is an activity.
- An activity may have either already occurred or be still
- taking place when a new entity is generated.
- While entities are static aspects in the world (things), <i>activities</i> are
- dynamic aspects (actions, processes, etc.).
- </p>
- </section>
-
- <section>
- <h3>Use and Generation</h3>
-
- <p>
- Activities <i>generate</i> new entities.
- For example, writing a document brings the document into existence, while
- revising the document brings a new version into existence.
- </p>
- <p>
- Activities also make <i>use</i> of entities. For example, revising a document
- to fix spelling mistakes uses the original version of the document as well
- as a list of corrections.
- </p>
- <p>
- Assertions can be made in a provenance record to state that
- particular activities used or generated particular entities.
- </p>
- </section>
-
- <section>
- <h3>Agents</h3>
-
- <p>
- An agent is a type of entity that takes an active role in an activity such
- that it can be assigned some degree of responsibility for the activity taking
- place. An agent can be a person, a piece of software, or an inanimate object.
- Several agents can be associated with an activity.
- Consider a chart displaying some statistics
- regarding crime rates over time in a linear regression. To represent the
- provenance of that chart, we could state that the person who created the
- chart was an agent involved in its creation, and that the software used to
- create the chart was also an agent involved in that activity.
- </p>
- <p>
- Since agents are a kind of entity, it is possible to
- associate provenance records with the agents themselves.
- In the running example, we
- can also represent the provenance of the software used to create the chart, and specify the agents involved in
- producing that software, such as the vendor.
- </p>
- </section>
-
- <!--section>
- <h3>Accounts</h3>
-
- <p>An intuitive overview of how to think about accounts in PROV-DM.</p>
- </section -->
-
- <section>
- <h3>Roles</h3>
-
- <p>
- A <i>role</i> is a description of the function or the part that an entity
- played in an activity. Roles specify
- the relationship between an entity and an activity, that is,
- how an activity used or generated the entity. Roles also specify how agents are
- involved in an activity, qualifying their participation in the activity or
- specifying what agents controlled it.
- For example, an agent may play the role of "editor" in an activity that uses
- one entity in the role of "document to be edited" and another in the role of "document
- editor", to generate a further entity in the role of "edited document".
- Roles are application specific.
- </p>
- <!--p>Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies. Role interpretation is application specific.</p -->
- </section>
-
- <section>
- <h3>Revisions and Derivation</h3>
-
- <p>
- A given entity, such as a document, may go through multiple <i>revisions</i>
- (also called versions and other comparable terms) over time. Between revisions,
- one or more attributes of the entity may change.
- The result of each revision is a new entity,
- and PROV-DM allows one to relate those entities by making an assertion that
- one is a revision of another.
- </p>
- <p>
- When one entity's existence, content, characteristics and so on are
- at least partly due to another entity, then we say that the former is
- <i>derived</i> from the latter. For example, one document may contain
- material copied from another,
- and a chart is derived from the data that is used to create it.
- </p>
- <!-- p>
- <p>
- There are different kinds of derivation expressible in PROV-DM.
- Consider the case of the page in the browser above. It is derived from
- the designer's sketch in the strictest sense, i.e. if the sketch had
- been different so would the page. On the other hand, there are
- entities that are part of the page's history but which did not inform
- the content of that page, i.e. the page would have been the same even
- if the earlier entity changed. For example, on creating the original
- draft of the page, the designer may have included a banner image
- saying "DRAFT - FOR REVIEW ONLY". This banner was not part of the
- sketch, nor part of the published page downloaded to the browser, but
- was part of the page's history, and while not affecting the browsed
- page's content may have been a factor in its existence. Finally, in
- some cases, we may be able to say not only that one entity was derived
- from another, but also how it was derived, i.e. by what activity.
- For example, the page in the browser is derived from the
- page on the web server because a download activity sent the bytes of
- the latter across an HTTP connection to the browser client.
- </p>
- </p -->
- <p>
- There are different kinds of derivation expressible in PROV-DM. For
- example, the data may be normalized before creating the chart.
- In PROV-DM terms, we say that the chart <i>was eventually
- derived from</i> the original data, <i>depended on</i> the settings
- of the graphing software, and <i>was derived
- from</i> the normalized data.
- </p>
- </section>
-
- <!-- p>
- <section>
- <h3>Complementarity</h3>
- <p>
- As described above, entities can be described from different perspectives,
- by being characterized by different attributes. For example, "document D",
- "the second version of document D" and "document D as stored on my filesystem"
- are different entities
- because they are characterized in different ways. However, for some period of time
- they may all refer to the same thing in the world, e.g. for a while the copy of
- D on my filesystem <i>was</i> the second version.
- </p>
- <p>
- In PROV-DM, we say there is <i>complementarity</i> between one entity and another
- if, in some period of time, they have the same or compatible characterization.
- So, both "the second version of document D" and "document D as stored on my filesystem"
- are complements of "document D", because they are both characterized by being
- document D, but with specific additional attributes.
- If, at some point in time, a version of D stored on my filesystem is the second one, then
- "document D as stored on my filesystem" and "the second version of document D" are
- complements of each other.
- </p>
- Several asserted entities could be characterizing the same thing, in
- particular when entities are asserted by different <em>accounts</em> or over
- different time periods. If two such entities have <em>overlapping
- lifespans</em>, and the first entity has some <em>attributes</em> that
- have not been asserted (and not necessarily always true) for the second entity,
- then the first entity is said to be <em>complementing</em> the second
- entity, that is the first entity helps form a more detailed
- description of the second entity, at least for the duration of the
- overlapping lifespan.
- </p>
- <p>
- In addition, if <code>:A prov:wasComplementOf :B</code>, then all the
- attributes of the entity <code>:A</code> which can be <em>mapped</em> to
- <em>compatible</em> attributes of <code>:B</code> MUST be <em>matching</em>
- for the continuous duration of the overlap of <code>:A</code> and
- <code>:B</code>'s lifespans.
- It is out of scope for PROV to specify or assert the nature of
- the <em>compatibility mapping</em> and <em>matching</em>; the exact
- interpretation of these is left to the asserter of
- <code>wasComplementOf</code>.
- </p>
- <p>
- If <code>:B</code> also has some attributes which
- are not asserted (or not always true) about <code>:A</code>,
- then this MAY be asserted using the
- inverse relation <code>:B prov:wasComplementOf :A</code>. If two entities
- both complement each other in this manner, both MUST have some
- attributes the other does not have, although those attributes MAY
- not have been asserted in the provenance. Note that the
- <em>lack</em> of such an inverse assertion does not necessarily
- mean that <code>:B</code> did not have any additional attributes
- for <code>:A</code> in the timespan, only that this has not
- been asserted.
- </p>
- <p>
- In the simplest case, both entities are described using the same
- attributes, in which case <em>matching</em> means the values SHOULD
- literally be the same (matching by identity). On the other hand an
- attribute like <code>ex1:speed_in_mph</code> can be <em>mapped</em> to
- a compatible <code>ex2:speed_in_kmh</code> attribute. Not all
- attributes might be mappable in both directions, for instance
- <code>ex1:city</code> to <code>ex2:country</code>, but not vice
- versa.
- </p>
- <p>
- Note that it is out of scope for PROV to assert or explain any
- mapping of compatible attributes. This is merely a conclusion
- that can be drawn from the assertion that the two entities both
- described the same thing in the overlapping time spans. Also note
- that asserting a complementary relationship does not detail how the
- two entity timespans overlap; this could be anything from a
- complete one-to-one match (where all attributes are always true for
- both entities) to merely touching overlaps.
- </section>
- </p -->
-
-
- </section>
-
- <section>
- <h2>Examples of Use of the PROV-O Ontology</h2>
-
- <p>In the following sections, we show how PROV-DM can be used to model
- provenance in specific examples.</p>
-
- <p>We include examples of how the formal ontology PROV-O
- can be used to represent the PROV-DM assertions as RDF triples.
- These are shown using the Turtle notation. In
- these depictions, the namespace prefix <b>prov</b> denotes
- terms from the PROV Ontology, while <b>ex1</b>, <b>ex2</b>, etc.
- denote terms specific to the example.</p>
-
- <p>We also provide a representation of the examples in the Abstract
- Syntax Notation (ASN) used in the conceptual model document. The full ASN data
- for the examples in this section is
- included in the appendix.</p>
-
- <section>
- <h3>Entities</h3>
-
- <p>
- An online newspaper publishes an article about crime statistics that makes use of data (GovData) provided through a government portal.
- The article includes a chart based on the data, with data values aggregated by
- geographical regions.
- </p>
- <p>
- A blogger, Betty, looking at the article, spots what she thinks to be an error in the chart.
- Betty retrieves the provenance record of the article, which describes how it was created.
- </p>
- <p>Betty would find the following assertions about entities in the provenance record:</p>
- <pre class="turtle example">
- ex1:newspaper1 a prov:Entity .
- ex1:article1 a prov:Entity .
- ex1:dataSet1 a prov:Entity .
- ex1:regionList1 a prov:Entity .
- ex1:aggregate1 a prov:Entity .
- ex1:chart1 a prov:Entity .
- </pre>
- <p>
- These statements, in order, assert that there is a newspaper (<code>ex1:newspaper1</code>) and an article (<code>ex1:article1</code>),
- that the original data set is an entity (<code>ex1:dataSet1</code>),
- there is a list of regions
- (<code>ex1:regionList1</code>) that is an entity, that the data aggregated by region is an entity (<code>ex1:aggregate1</code>),
- and that the chart (<code>ex1:chart1</code>) is an entity.
- </p>
-
- </section>
-
- <section>
- <h3>Activities</h3>
-
- <p>
- Further, the provenance record asserts that there was
- an activity (<code>ex1:compiled</code>) denoting the compilation of the
- chart from the data set.
- </p>
- <pre class="turtle example">
- ex1:compiled a prov:Activity .
- </pre>
- <p>
- The provenance record also includes references to the more specific steps involved in this compilation,
- which are first aggregating the data by region and then generating the chart graphic.
- </p>
- <pre class="turtle example">
- ex1:aggregated a prov:Activity .
- ex1:illustrated a prov:Activity .
- </pre>
- </section>
-
- <section>
- <h3>Use and Generation</h3>
-
- <p>
- Finally, the provenance record asserts the key relations among the above
- entities and activities, i.e. the use of an entity by an activity,
- or the generation of an entity by an activity.
- </p>
- <p>
- For example, the assertions below state that the aggregation activity
- (<code>ex1:aggregated</code>) used the original data set, that it used the list of
- regions, and that the aggregated data was generated by this activity.
- </p>
- <pre class="turtle example">
- ex1:aggregated prov:used ex1:dataSet1 ;
- prov:used ex1:regionList1 .
- ex1:aggregate1 prov:wasGeneratedBy ex1:aggregated .
- </pre>
- <p>
- Similarly, the chart graphic creation activity (<code>ex1:illustrated</code>)
- used the aggregated data, and the chart was generated by this activity.
- </p>
- <pre class="turtle example">
- ex1:illustrated prov:used ex1:aggregate1 .
- ex1:chart1 prov:wasGeneratedBy ex1:illustrated .
- </pre>
-
- <!--p>
- For example, the provenance declares the event (of type <code>prov:Usage</code>)
- where the aggregation activity used the GovData data set, and the event
- (of type <code>prov:Generation</code>) where the same activity generated
- the data aggregated by region.
- </p>
- <pre class="turtle example">
- ex1:dataSet1Usage a prov:Usage .
- ex1:aggregate1Generation a prov:Generation .
- </pre>
- <p>
- To describe these events, the provenance says within which activity
- they occur and what entity is used or generated.
- </p>
- <pre class="turtle example">
- ex1:aggregated prov:qualifiedUsage ex1:dataSet1Usage .
- ex1:aggregated prov:qualifiedGeneration ex1:aggregate1Generation .
- ex1:dataSet1Usage prov:entity ex1:dataSet1 .
- ex1:aggregate1Generation prov:entity ex1:aggregate1 .
- </pre>
- <p>
- Comparable events are described for the activity of generating the chart image
- from the aggregated data.
- </p>
- <pre class="turtle example">
- ex1:aggregate1Usage a prov:Usage .
- ex1:chart1Generation a prov:Generation .
- ex1:illustrated prov:qualifiedUsage ex1:aggregate1Usage .
- ex1:illustrated prov:qualifiedGeneration ex1:chart1Generation .
- ex1:aggregate1Usage prov:entity ex1:aggregate1 .
- ex1:chart1Generation prov:entity ex1:chart1 .
- </pre>
- <p>
- From this information Betty can see that
- the mistake could have been in the original data set or else was introduced
- in the compilation activity, and sets out to discover which.
- </p>
- </p -->
-
- </section>
-
- <section>
- <h3>Agents</h3>
-
- <p>
- Digging deeper, Betty wants to know who compiled the chart.
- Betty sees that Derek was involved in both the aggregation and
- chart creation activities:
- </p>
- <pre class="turtle example">
- ex1:aggregated prov:wasAssociatedWith ex1:derek .
- ex1:illustrated prov:wasAssociatedWith ex1:derek .
- </pre>
- <p>
- The record for Derek provides the
- following information, of which the first line is a PROV-O statement that
- Derek is an agent, followed by statements about general properties of Derek.
- </p>
- <pre class="turtle example">
- ex1:derek a prov:Agent ;
- a foaf:Person ;
- foaf:givenName "Derek"^^xsd:string ;
- foaf:mbox &lt;mailto:derek@example.org&gt; .
- </pre>
- </section>
-
- <!-- section>
- <h3>Accounts</h3>
-
- <p><i>Suggested example:</i> The analyst provides his own record of how he compiled GovData to create
- the chart, which provides more detail than in the newspaper's provenance data.
- Specifically, the analyst's account separates compilation into two stages: aggregating
- data by region and then producing the chart. Therefore, there are two separate
- accounts of the same events.</p>
- </section -->
-
- <section>
- <h3>Roles</h3>
-
- <p>
- For Betty to understand where the error lies, she needs to have more detailed
- information on how entities have been used in, participated in, and generated
- by activities. Betty has determined that <code>ex1:aggregated</code> used
- entities <code>ex1:regionList1</code> and <code>ex1:dataSet1</code>, but she does not
- know what function these entities played in the processing. Betty
- also knows that <code>ex1:derek</code> controlled the activities, but she does
- not know if Derek was the analyst responsible for determining how the data
- should be aggregated.
- </p>
- <p>
- The above information is described as roles in the provenance records. The aggregation
- activity involved entities in four roles: the data to be aggregated (<code>ex1:dataToAggregate</code>),
- the regions to aggregate by (<code>ex1:regionsToAggregateBy</code>), the
- resulting aggregated data (<code>ex1:aggregatedData</code>), and the
- analyst doing the aggregation (<code>ex1:analyst</code>).
- </p>
- <pre class="turtle example">
- ex1:dataToAggregate a prov:Role .
- ex1:regionsToAggregateBy a prov:Role .
- ex1:aggregatedData a prov:Role .
- ex1:analyst a prov:Role .
- </pre>
- <p>
- In addition to the simple facts that the aggregation activity used, generated or
- was controlled by entities/agents as described in the sections above, the
- provenance record contains more details of <i>how</i> these entities and agents
- were involved, i.e. the roles they played. For example, the assertions below state
- that the aggregation activity (<code>ex1:aggregated</code>) included the usage
- of the government data set (<code>ex1:dataSet1</code>) in the role of the data
- to be aggregated (<code>ex1:dataToAggregate</code>).
- </p>
- <pre class="turtle example">
- ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
- prov:hadQualifiedEntity ex1:dataSet1 ;
- prov:hadRole ex1:dataToAggregate ] .
- </pre>
- <p>
- This can then be distinguished from the same activity's usage of the list of
- regions because the roles played are different.
- </p>
- <pre class="turtle example">
- ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
- prov:hadQualifiedEntity ex1:regionList1 ;
- prov:hadRole ex1:regionsToAggregateBy ] .
- </pre>
- <p>
- Similarly, the provenance includes assertions that the same activity was
- controlled in a particular way (<code>ex1:analyst</code>) by Derek, and that
- the entity <code>ex1:aggregate1</code> took the role of the aggregated
- data in what the activity generated.
- </p>
- <pre class="turtle example">
- ex1:aggregated
- prov:hadQualifiedControl [ a prov:Control ;
- prov:hadQualifiedEntity ex1:derek ;
- prov:hadRole ex1:analyst
- ] ;
- prov:hadQualifiedGeneration [ a prov:Generation ;
- prov:hadQualifiedEntity ex1:aggregate1 ;
- prov:hadRole ex1:aggregatedData
- ] .
- </pre>
- </section>
-
- <section>
- <h3>Revision</h3>
-
- <p>
- After looking at the detail of the compilation activity, Betty finds
- nothing wrong, so she concludes the error is in the government dataset.
- She looks at the characterization of the dataset <code>ex1:dataSet1</code>,
- and sees that it is missing data from one of the zip codes in the area. She contacts
- the government, and a new version of GovData is created, declared to be the
- next revision of the data by Edith. The provenance record of this new dataset,
- <code>ex1:dataSet2</code>, states that it is a revision of the
- old data set, <code>ex1:dataSet1</code>.
- </p>
- <pre class="turtle example">
- ex1:dataSet2 prov:wasRevisionOf ex1:dataSet1 .
- </pre>
- </section>
-
- <!-- section>
- <h3>Complementarity</h3>
-
- <p>Betty lets Derek know that a new revision of the data set exists,
- and he looks at the provenance of the new data to understand what he needs to
- re-analyze. </p>
- <p>In addition to specifying that
- <code>ex1:dataSet2</code> is a new revision of
- <code>ex1:dataSet1</code>, the provenance from DataGov also
- asserts that both of these entities were a <em>complement of</em>
- another entity <code>ex1:dataSet</code>.
- </p>
- <pre class="turtle example">
- ex1:dataSet1 prov:wasComplementOf ex1:dataSet .
- ex1:dataSet2 prov:wasComplementOf ex1:dataSet .
- </pre>
- <!--
- <pre class="asn example">
- wasComplementOf(ex1:dataSet1, ex1:dataSet)
- wasComplementOf(ex1:dataSet2, ex1:dataSet)
- </pre>
- -->
- <p>
- This assertion means that <code>ex1:dataSet1</code> at some point shared
- its characterizing attributes with <code>ex1:dataSet</code>, and the same for
- <code>ex1:dataSet2</code>. Thus the <em>entity</em>
- <code>ex1:dataSet1</code> did at some point represent the same
- thing as characterized by the entity <code>ex1:dataSet</code>. The same is
- true for <code>ex1:dataSet2</code>, though not necessarily at the
- same point in time.
- </p>
- <!-- p>
- The term <em>was complement of</em> here means that the
- <code>ex1:dataSet1</code>
- provides additional details that add to the details of
- <code>ex1:dataSet</code> (complementing it), and that both of these
- entities represented the same thing.
- Characterizing attributes of <code>ex1:dataSet</code> are from this
- asserted to have been <em>compatible</em> with the properties of
- <code>ex1:dataSet1</code> and <code>ex1:dataSet2</code>.
- <em>Compatible</em> here means that some kind of mapping can be
- established between the attributes; they don't necessarily have to
- match directly.
- </p -->
- <p>
- Derek then looks at the characterization of the generalized data set
- (<code>ex1:dataSet</code>) to find the attributes shared with the first
- and second versions of the data set. The assertions below give the generalized
- data set's attributes: it is of type <code>ex1:DataSet</code>, it covers
- three named regions, it was created by <code>ex1:DataGov</code>, and
- has a given title.
- </p>
- <pre class="example turtle">
- ex1:dataSet a ex1:DataSet ;
- ex1:regions ( ex1:North ex1:NorthWest ex1:East ) ;
- dc:creator ex1:DataGov ;
- dc:title "Regional incidence dataset 2011" .
- </pre>
- <p>
- As <code>ex1:dataSet1</code> and <code>ex1:dataSet2</code> complement
- <code>ex1:dataSet</code>,
- Derek can deduce from the above attributes that both the former had
- these same attributes at some point, i.e.
- the creator <code>ex1:DataGov</code> and so on. Derek compares the above
- assertions to the
- attributes of <code>ex1:dataSet1</code>.
- </p>
- <pre class="example turtle">
- ex1:dataSet1 a ex1:DataSet ;
- ex1:postCodes ( "N1", "N2", "NW1", "E1", "E2" ) ;
- ex1:totalIncidents 141 ;
- dc:creator ex1:DataGov ;
- dc:title "Regional incidence dataset 2011" .
- </pre>
- <p>
- Shared characterizing attributes are not necessarily represented in
- the serialized assertions of different entities. For example, the creator
- and title are exactly the same for <code>ex1:dataSet</code> and <code>ex1:dataSet1</code>,
- but the regions covered by the data set are described in a different way:
- "regions" for <code>ex1:dataSet</code> and "postCodes" for <code>ex1:dataSet1</code>.
- Whether these are equivalent is a domain-specific judgment.
- We can also see that, while <code>ex1:dataSet1</code> complements <code>ex1:dataSet</code>,
- the inverse is not true. <code>ex1:dataSet1</code> is more specific, because
- it has a "totalIncidents" attribute specific to that version of the data set.
- </p>
- <!-- p>
- Derek sees that the creator and title are directly mappable and
- equal between these entities. He also knows (from his region
- aggregation method) that the <code>ex1:postCodes</code> <code>N1</code> and
- <code>N2</code> are in the
- region <code>ex1:North</code>, and so on, and can confirm that although
- this regional characterisation of the data is not expressed
- using the same attributes in the two entities, they are <em>compatible</em>.
- </p>
- <p>Derek notes that <code>ex1:totalIncidents</code> is not stated
- for <code>ex1:dataSet</code>, and not mappable to any of the
- other existing attributes. Thus this could be one of the
- complementing attributes that makes <code>ex1:dataSet1</code>
- more specific than <code>ex1:dataSet</code>.
-
- Derek can from the assertion <code>ex1:dataSet1
- prov:wasComplementOf ex1:dataSet</code>
- see that <code>ex1:dataSet</code>
- did have 141 incidents when its characterization interval
- overlapped that of <code>ex1:dataSet1</code>, but not neccessarily
- throughout its lifetime. Note that in this example the provenance
- assertions are not providing any direct description of the
- characterization interval of the entities.
- </p>
- <p>
- Due to the open world assumption (more
- information might be added later) he can not conclude
- from this alone that <code>ex1:dataSet</code> at any point did
- <strong>not</strong> have 141 incidents. He therefore does not know
- for sure that <code>ex1:totalIncidents</code> is a complementing
- attribute which <code>ex1:dataSet</code> does not have in its
- characterisation.
- </p>
- <p>
- Derek finally compares the newer revision
- <code>ex1:dataSet2</code> with
- <code>ex1:dataSet</code>:
- </p>
- <pre class="example turtle">
- ex1:dataSet2 a ex1:DataSet ;
- ex1:postCodes ( "N1", "N2", "NW1", "NW2", "E1", "E2" ) ;
- ex1:totalIncidents 158 ;
- dc:creator ex1:DataGov ;
- dc:title "Regional incidence dataset 2011" .
- </pre>
- <p>
- In this revision, the new postcode <kbd>NW2</kbd> appears, this is still
- <em>compatible</em> with the region <code>ex1:NorthWest</code>
- of <code>ex1:dataSet</code>
- On the other hand, the attribute <code>prov:totalIncidents</code> have gone up to 158.
- </p>
- <p>
- From the <code>prov:wasComplementOf</code> assertion Derek knows that
- <code>ex1:dataSet2</code> also provides additional attributes for
- <code>ex1:dataSet</code>, but because the total incidents can't
- both be 141 and 158, the attribute <code>ex1:totalIncidents</code>
- is a complementing attribute, and changes over the
- characterisation interval (lifespan) of <code>ex1:dataSet</code>,
- and is thus not one of its characterising attributes. He also now
- knows that <code>ex1:dataSet</code> is a common characterisation
- of the dataset that spans (parts of) both revisions. It has
- however not been asserted explicitly that the
- <code>ex1:dataSet</code> is a somewhat more general
- characterisation, just that it allows mutability on the
- <code>prov:totalIncidents</code> attribute and overlapped (parts
- of) the timespans of the two revisions.
- </p>
- <p>
- From this Derek concludes that he can still use the regions North,
- North West and East in the diagram layout, but as the
- <code>ex1:totalIncidents</code> differ, something in the
- raw data has changed. He can't from this provenance assertion
- alone tell if that is merely from the addition of the post code
- NW2, or if data for the other post codes have changed as well.
- Derek decides to redo the aggregation by region using
- <code>ex1:dataSet2</code> and regenerate the
- chart using the same layout.
- </p -->
- </section -->
-
- <section>
- <h3>Derivation</h3>
-
- <p>
- Derek notices that there is a new dataset available and creates a new chart based on the revised data,
- using the same compilation activity as before. Betty checks the article again at a
- later point, and wants to know if it is based on the old or new GovData.
- She sees three new assertions about derivation in the provenance data, plus
- an assertion about how the new chart was generated.
- </p>
- <pre class="example turtle">
- ex1:chart2 prov:dependedOn ex1:dataSet2 .
- ex1:chart2 prov:wasEventuallyDerivedFrom ex1:dataSet2 .
- ex1:chart2 prov:wasDerivedFrom ex1:dataSet2 .
- ex1:chart2 prov:wasGeneratedBy ex1:compiled2 .
- </pre>
- <p>
- She interprets these assertions as follows. The first says that the new chart included,
- somewhere in the history of its creation, the revised data set.
- The second says further that the new chart is as it is because of the revised
- data set, i.e. there is an explicit influence of the data on the chart.
- Finally, the third and fourth assertions together say further that it was
- the activity <code>ex1:compiled2</code> that derived the new chart
- from the revised data set.
- </p>
- </section>
- </section>
-
- <section>
- <h2>Frequently asked questions</h2>
- </section>
-
- <section class="appendix">
- <h2>Abstract Syntax Notation for Examples</h2>
- <p>
- Below we give translations of the working example snippets into the PROV-DM
- abstract syntax notation (ASN).
- </p>
- <section>
- <h3>Entities</h3>
- <pre class="example asn">
- entity(ex1:dataSet1).
- entity(ex1:regionList1).
- entity(ex1:aggregate1).
- entity(ex1:chart1).
- </pre>
- </section>
-
- <section>
- <h3>Activities</h3>
- <pre class="example asn">
- activity(ex1:compiled).
- activity(ex1:aggregated).
- activity(ex1:illustrated).
- </pre>
- <!--
- <p>
- In the first assertion above, 'compilation_step' is an optional reference to the 'recipe' that describes
- what the 'compiled' activity did. The interpretation of its name,
- 'compilation_step', is left to applications (it is not further resolved within PROV-DM).
- </p>
- <p>
- In the second assertion, optional 'recipe' has been omitted.
- </p>
- -->
- <!--PM comment: here readers will be confused by the processExecutiion / activity disconnect!
- also this does not show start/end times, optional attributes. At least one example would be useful-->
- </section>
-
- <section>
- <h3>Use and Generation</h3>
- <pre class="example asn">
- used(ex1:aggregated, ex1:dataSet1).
- used(ex1:aggregated, ex1:regionList1).
- wasGeneratedBy(ex1:aggregate1, ex1:aggregated).
-
- used(ex1:illustrated, ex1:aggregate1).
- wasGeneratedBy(ex1:chart1, ex1:illustrated).
- </pre>
- </section>
-
- <section>
- <h3>Agents</h3>
- <pre class="example asn">
- entity(ex1:derek, [ type="foaf:Person", foaf:givenName = "Derek",
- foaf:mbox= "<mailto:derek@example.org>"]).
- agent(ex1:derek).
-
- wasControlledBy(ex1:aggregated, ex1:derek).
- wasControlledBy(ex1:illustrated, ex1:derek).
- </pre>
- </section>
-
- <section>
- <h3>Roles</h3>
- <p>
- Roles are not declared directly in PROV-DM, rather they are attributes of
- relations. Thus, the entire Turtle example in sec. 3.5 is rendered as follows:
- </p>
- <pre class="example asn">
- used(ex1:aggregated, ex1:dataSet1, [ prov:role = "dataToAggregate"]).
- used(ex1:aggregated, ex1:regionList1, [ prov:role = "regionsToAggregteBy"]).
- </pre>
- <p>
- In the first assertion above, note that this adds a "role" attribute to the first 'used' assertion of Ex. 3.
- Similarly in the second assertion, we have added a "role" attribute to the second 'used' assertion of Ex. 3.
- </p>
- </section>
-
- <section>
- <h3>Revision</h3>
- <pre class="example asn">
- wasRevisionOf(ex1:dataSet2, ex1:dataSet1).
- </pre>
- </section>
-
- <!--
- <section>
- <h3>Complementarity</h3>
- <pre class="example asn">
- entity(ex1:dataSet, [ type="ex1:DataSet", ex1:regions ="(ex1:North, ex1:NorthWest, ex1:East)",
- dc:creator="ex1:DataGov", dc:title="Regional incidence dataset 2011" ]).
-
- wasComplementOf(dataSet1, dataSet).
- wasComplementOf(dataSet2, dataSet).
-
- entity(ex1:dataSet1, [ type="ex1:DataSet", ex1:postCodes="( 'N1', 'N2', 'NW1', 'E1', 'E2' ) ",
- ex1:totalIncidents = "141", dc:creator = " ex1:DataGov",
- dc:title = "Regional incidence dataset 2011" ]).
- </pre>
- </section>
- -->
-
- <section>
- <h3>Derivation</h3>
- <pre class="example asn">
- dependedOn(ex1:chart2, ex1:dataSet2).
- wasEventuallyDerivedFrom(ex1:chart2, ex1:dataSet2).
- wasDerivedFrom(ex1:chart2, ex1:dataSet2).
- wasGeneratedBy(ex1:chart2, ex1:compiled2).
- </pre>
- </section>
- </section>
-
- <section class="appendix">
- <h2>Acknowledgements</h2>
- <p>
- WG membership to be listed here.
- </p>
- </section>
-
- </body></html>
+<!DOCTYPE html>
+<html><head>
+ <title>Prov Model Primer</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <!--
+ === NOTA BENE ===
+ For the three scripts below, if your spec resides on dev.w3 you can check them
+ out in the same tree and use relative links so that they'll work offline,
+ -->
+ <!-- PM -->
+ <style type="text/css">
+ .note { font-size:small; margin-left:50px }
+ </style>
+
+ <script src="http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js" class="remove"></script>
+
+ <script class="remove">
+ var respecConfig = {
+ // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
+ specStatus: "ED",
+
+ // the specification's short name, as in http://www.w3.org/TR/short-name/
+ shortName: "Prov-Primer",
+
+ // if your specification has a subtitle that goes below the main
+ // formal title, define it here
+ subtitle : "Initial draft for internal discussion",
+
+ // if you wish the publication date to be other than today, set this
+ // publishDate: "2009-08-06",
+
+ // if the specification's copyright date is a range of years, specify
+ // the start date here:
+ // copyrightStart: "2005"
+
+ // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
+ // and its maturity status
+ // previousPublishDate: "1977-03-15",
+ // previousMaturity: "WD",
+
+ // if there a publicly available Editor's Draft, this is the link
+ edDraftURI: "http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html",
+
+ // if this is a LCWD, uncomment and set the end of its review period
+ // lcEnd: "2009-08-05",
+
+ // if you want to have extra CSS, append them to this list
+ // it is recommended that the respec.css stylesheet be kept
+ extraCSS: ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css", "./extra.css"],
+
+ // editors, add as many as you like
+ // only "name" is required
+ editors: [
+ { name: "Yolanda Gil", url: "http://www.isi.edu/~gil/",
+ company: "Information Sciences Institute, University of Southern California, US" },
+ { name: "Simon Miles", url: "http://www.inf.kcl.ac.uk/~simonm",
+ company: "King's College London, UK" },
+ ],
+
+ // authors, add as many as you like.
+ // This is optional, uncomment if you have authors as well as editors.
+ // only "name" is required. Same format as editors.
+
+ authors: [
+ { name: "TBD"},
+ ],
+
+ // name of the WG
+ wg: "Provenance Working Group",
+
+ // URI of the public WG page
+ wgURI: "http://www.w3.org/2011/prov/wiki/Main_Page",
+
+ // name (with the @w3c.org) of the public mailing to which comments are due
+ wgPublicList: "public-prov-wg",
+
+ // URI of the patent status for this WG, for Rec-track documents
+ // !!!! IMPORTANT !!!!
+ // This is important for Rec-track documents, do not copy a patent URI from a random
+ // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
+ // Team Contact.
+ wgPatentURI: "",
+ };
+ </script>
+ </head>
+ <body>
+ <section id="abstract">
+ <p>This document aims to provide an intuitive guide to the PROV Data Model,
+ with worked examples.</p>
+
+ <p>
+ This is a document for internal discussion, which will ultimately
+ evolve into the first Public Working Draft of the Primer.</p>
+ </section>
+
+ <section>
+ <h2>Introduction</h2>
+ <p>
+ This primer document provides an accessible introduction to the PROV Data Model
+ (PROV-DM) standard for representing provenance on the Web, and its representation
+ in the PROV Ontology (PROV-O). Provenance describes
+ the origins of things, so PROV-DM data consists of assertions about the past.
+ </p>
+
+ <p>
+ This primer document aims to ease the adoption of the standard by providing:
+ </p>
+ <ul>
+ <li>An intuitive explanation of how PROV-DM models provenance.</li>
+ <li>Worked examples that can be followed to produce your own PROV-DM data.</li>
+ <li>Answers to frequently asked questions regarding how the model should be applied.</li>
+ </ul>
+
+ <p>
+ The <i>provenance</i> of digital objects represents their origins. The PROV-DM is a
+ proposed standard to represent provenance records, which contain <i>assertions</i> about the entities
+ and activities involved in producing and delivering or otherwise influencing a
+ given object. By knowing the provenance of an object, we can make determinations
+ about how to use it. Provenance records can be used for many purposes, such as
+ understanding how data was collected so it can be meaningfully used, determining
+ ownership and rights over an object, making judgments about information to
+ determine whether to trust it, verifying that the process and steps used to obtain a
+ result comply with given requirements, and reproducing how something was generated.
+ </p>
+
+ <p>
+ As a standard for provenance, PROV-DM accommodates all those different uses
+ of provenance. Different people may have different perspectives on provenance,
+ and as a result different types of information might be captured in provenance records.
+ One perspective might focus on <i>agent-centered provenance</i>, that is, what entities
+ were involved in generating or manipulating the information in question. For example,
+ in the provenance of a picture in a news article we might capture the photographer who
+ took it, the person that edited it, and the newspaper that published it. A second perspective
+ might focus on <i>object-centered provenance</i>, by tracing the origins of portions of a
+ document to other documents. An example is having a web page that was assembled from content
+ from a news article, quotes of interviews with experts, and a chart that plots data from a
+ government agency. A third perspective one might take is on <i>process-centered provenance</i>,
+ capturing the actions and steps taken to generate the information in question. For example, a
+ chart may have been generated by invoking a service to retrieve data from a database, then
+ extracting certain statistics from the data using some statistics package, and finally
+ processing these results with a graphing tool.
+ </p>
+
+ <p>
+ Provenance records are metadata. There are other kinds of metadata that are
+ not provenance. For example, the size of an image is a metadata property of
+ that image but it is not provenance.
+ </p>
+
+ <p>
+ For general background on provenance, a
+ comprehensive overview of requirements, use cases, prior research, and proposed
+ vocabularies for provenance is available from the
+ <a href="http://www.w3.org/2005/Incubator/prov/XGR-prov/">Final Report of the W3C Provenance Incubator Group</a>.
+ That document contains three general scenarios
+ that may help identify the provenance aspects of your planned applications and
+ help plan the design of your provenance system.
+ </p>
+ <p>
+ The next section gives an introductory overview of PROV-DM using simple examples.
+ The following section shows how the formal ontology PROV-O can be used to represent the PROV-DM assertions
+ as RDF triples. The document also contains frequently asked questions, and an appendix giving example
+ snippets of the PROV-DM Abstract Syntax Notation (ASN).
+ For a detailed description of PROV-DM, please refer to the
+ <a href="http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html">PROV Data Model and Abstract Syntax Notation document</a>.
+ For a detailed description of PROV-O, refer to the
+ <a href="http://dvcs.w3.org/hg/prov/raw-file/default/ontology/ProvenanceFormalModel.html">PROV Ontology Model and Formal Semantics document</a>.
+ </p>
+ </section>
+
+ <section>
+ <h2>Intuitive overview of PROV-DM</h2>
+
+ <p><i>This section provides an intuitive explanation of the concepts in PROV-DM.
+ As with the rest of this document, it should be treated as a starting point for
+ understanding the model, and not normative in itself. The PROV-DM model specification
+ provides precise definitions and constraints to be used.</i></p>
+
+<p>
+The following ER diagram provides a high level overview of the <strong>structure of PROV-DM records</strong>.
+The diagram is the same one that appears in the
+<a href="http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html">PROV Data Model and Abstract Syntax Notation document</a>,
+but note that this primer document only describes some of the terms shown in the diagram.
+</p>
+
+<div style="text-align: center;">
+ <img src="overview.png" alt="PROV-DM overview"/>
+</div>
+
+ <section>
+ <h3>Entities</h3>
+
+ <p>
+ In PROV-DM, the things that one may ask the provenance of are called <i>entities</i>.
+ Examples of such entities are a web page, a chart, and a spellchecker.
+ </p>
+ <p>
+ An entity’s provenance may refer to many other entities. For example, a document D is
+ an entity whose provenance refers to other entities such as a chart inserted into D,
+ the dataset that was used to create that chart, or the author of the document.
+ </p>
+ <p>
+ Entities may be described from different perspectives that may be more or less specific. For example,
+ document D as stored in my file system, the second version of document D after someone edited it,
+ and D as an evolving document,
+ are three distinct entities for which we may describe the provenance. They
+ may all be perspectives on the same thing in the world (document D may exist only
+ in its second version and on my file system), but are <i>characterized</i> in
+ different ways by being described using different <i>attributes</i> (version, location, and
+ so on).
+ </p>
+ <p>
+ The characterization of an entity means that the provenance assertions
+ about the entity are only about the thing when it has those attributes.
+ For example, the second version of document D is characterized by being the
+ second version, and so assertions about who reviewed that entity apply only
+ to the document as it is in its second version. When the document becomes
+ the third version, a new entity exists (the third version of D) and the
+ provenance assertions about who reviewed the second version do not apply.
+ </p>
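+ <p>
+ As a rough sketch of how these perspectives might be written down in Turtle notation,
+ the statements below declare each perspective on document D as a separate entity.
+ The <code>ex:</code> prefix, the identifiers, and the attribute names
+ <code>ex:version</code> and <code>ex:location</code> are hypothetical and chosen only
+ for illustration.
+ </p>
+ <pre class="turtle example">
+ ex:documentD a prov:Entity .
+ ex:documentD_v2 a prov:Entity ;
+ ex:version 2 .
+ ex:documentD_onMyDisk a prov:Entity ;
+ ex:location "/home/me/D.html" .
+ </pre>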
+ </section>
+
+ <section>
+ <h3>Activities</h3>
+
+ <p>
+ Activities are how entities come into
+ existence and how their attributes change to become new entities,
+ often making use of previously existing entities to achieve this.
+ For example, if the second version of document D was generated
+ by a translation from the first version of the document in another language,
+ then this translation is an activity.
+ An activity may have either already occurred or be still
+ taking place when a new entity is generated.
+ While entities are static aspects in the world (things), <i>activities</i> are
+ dynamic aspects (actions, processes, etc.).
+ </p>
+ </section>
+
+ <section>
+ <h3>Use and Generation</h3>
+
+ <p>
+ Activities <i>generate</i> new entities.
+ For example, writing a document brings the document into existence, while
+ revising the document brings a new version into existence.
+ </p>
+ <p>
+ Activities also make <i>use</i> of entities. For example, revising a document
+ to fix spelling mistakes uses the original version of the document as well
+ as a list of corrections.
+ </p>
+ <p>
+ Assertions can be made in a provenance record to state that
+ particular activities used or generated particular entities.
+ </p>
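+ <p>
+ A minimal sketch of such assertions in Turtle notation, for the spelling correction
+ example above, could look as follows. The <code>ex:</code> prefix and all identifiers
+ are hypothetical.
+ </p>
+ <pre class="turtle example">
+ ex:writing a prov:Activity .
+ ex:documentD_v1 prov:wasGeneratedBy ex:writing .
+
+ ex:spellFix a prov:Activity ;
+ prov:used ex:documentD_v1 ;
+ prov:used ex:corrections1 .
+ ex:documentD_v2 prov:wasGeneratedBy ex:spellFix .
+ </pre>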
+ </section>
+
+ <section>
+ <h3>Agents</h3>
+
+ <p>
+ An agent is a type of entity that takes an active role in an activity such
+ that it can be assigned some degree of responsibility for the activity taking
+ place. An agent can be a person, a piece of software, or an inanimate object.
+ Several agents can be associated with an activity.
+ Consider a chart displaying some statistics
+ regarding crime rates over time in a linear regression. To represent the
+ provenance of that chart, we could state that the person who created the
+ chart was an agent involved in its creation, and that the software used to
+ create the chart was also an agent involved in that activity.
+ </p>
+ <p>
+ Since agents are a kind of entity, it is therefore possible to
+ associate provenance records with the agents themselves.
+ In the running example, we
+ can also represent the provenance of the software used to create the chart, and specify the agents involved in
+ producing that software, such as the vendor.
+ </p>
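+ <p>
+ The chart example might be sketched in Turtle notation as below. The <code>ex:</code>
+ prefix and all identifiers are hypothetical. The last three statements illustrate that
+ the software, being an entity, can have provenance of its own.
+ </p>
+ <pre class="turtle example">
+ ex:chartAuthor a prov:Agent .
+ ex:chartSoftware a prov:Agent .
+ ex:chartCreation a prov:Activity ;
+ prov:wasAssociatedWith ex:chartAuthor ;
+ prov:wasAssociatedWith ex:chartSoftware .
+
+ ex:vendor1 a prov:Agent .
+ ex:softwareDevelopment a prov:Activity ;
+ prov:wasAssociatedWith ex:vendor1 .
+ ex:chartSoftware prov:wasGeneratedBy ex:softwareDevelopment .
+ </pre>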
+ </section>
+
+ <section>
+ <h3>Roles</h3>
+
+ <p>
+ A <i>role</i> is a description of the function or the part that an entity
+ played in an activity. Roles specify
+ the relationship between an entity and an activity, that is,
+ how an activity used or generated an entity. Roles also specify how agents are
+ involved in an activity, qualifying their participation in the activity or
+ specifying what agents controlled it.
+ For example, an agent may play the role of "editor" in an activity that uses
+ one entity in the role of "document to be edited" and another in the role of
+ "addition to be made to the document", to generate a further entity in the role of "edited document".
+ Roles are application specific.
+ </p>
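+ <p>
+ The editing example above might be written roughly as follows in Turtle notation,
+ reusing the qualified usage, control and generation pattern that this primer applies
+ to its worked example. The <code>ex:</code> prefix and all identifiers, including the
+ role names, are hypothetical.
+ </p>
+ <pre class="turtle example">
+ ex:editor a prov:Role .
+ ex:documentToBeEdited a prov:Role .
+ ex:additionToDocument a prov:Role .
+ ex:editedDocument a prov:Role .
+
+ ex:editing a prov:Activity ;
+ prov:hadQualifiedControl [ a prov:Control ;
+ prov:hadQualifiedEntity ex:alice ;
+ prov:hadRole ex:editor ] ;
+ prov:hadQualifiedUsage [ a prov:Usage ;
+ prov:hadQualifiedEntity ex:draft1 ;
+ prov:hadRole ex:documentToBeEdited ] ;
+ prov:hadQualifiedUsage [ a prov:Usage ;
+ prov:hadQualifiedEntity ex:paragraph1 ;
+ prov:hadRole ex:additionToDocument ] ;
+ prov:hadQualifiedGeneration [ a prov:Generation ;
+ prov:hadQualifiedEntity ex:draft2 ;
+ prov:hadRole ex:editedDocument ] .
+ </pre>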
+ <!--p>Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies. Role interpretation is application specific.</p -->
+ </section>
+
+ <section>
+ <h3>Revisions and Derivation</h3>
+
+ <p>
+ A given entity, such as a document, may go through multiple <i>revisions</i>
+ (also called versions and other comparable terms) over time. Between revisions,
+ one or more attributes of the entity may change.
+ The result of each revision is a new entity,
+ and PROV-DM allows one to relate those entities by making an assertion that
+ one is a revision of another.
+ </p>
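+ <p>
+ As a small sketch in Turtle notation, two successive versions of a document could be
+ related as shown below. The <code>ex:</code> prefix and the identifiers are hypothetical.
+ </p>
+ <pre class="turtle example">
+ ex:documentD_v1 a prov:Entity .
+ ex:documentD_v2 a prov:Entity ;
+ prov:wasRevisionOf ex:documentD_v1 .
+ </pre>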
+ <p>
+ When one entity's existence, content, characteristics and so on are
+ at least partly due to another entity, then we say that the former is
+ <i>derived</i> from the latter. For example, one document may contain
+ material copied from another,
+ and a chart is derived from the data that is used to create it.
+ </p>
+ <p>
+ There are different kinds of derivation expressible in PROV-DM. For
+ example, the data may be normalized before creating the chart.
+ In PROV-DM terms, we say that the chart <i>was derived from</i>
+ the normalized data and <i>was eventually derived from</i> the original data.
+ </p>
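+ <p>
+ The chart example can be sketched in Turtle notation as below. The <code>ex:</code>
+ prefix and the identifiers are hypothetical, and the second statement (that the
+ normalized data was itself derived from the original data) is an assumption added
+ to complete the picture.
+ </p>
+ <pre class="turtle example">
+ ex:chart prov:wasDerivedFrom ex:normalizedData .
+ ex:normalizedData prov:wasDerivedFrom ex:originalData .
+ ex:chart prov:wasEventuallyDerivedFrom ex:originalData .
+ </pre>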
+ </section>
+
+
+ </section>
+
+ <section>
+ <h2>Examples of Use of the PROV-O Ontology</h2>
+
+ <p>In the following sections, we show how PROV-DM can be used to model
+ provenance in specific examples.</p>
+
+ <p>We include examples of how the formal ontology PROV-O
+ can be used to represent the PROV-DM assertions as RDF triples.
+ These are shown using the Turtle notation. In
+ the latter depictions, the namespace prefix <b>prov</b> denotes
+ terms from the Prov ontology, while <b>ex1</b>, <b>ex2</b>, etc.
+ denote terms specific to the example.</p>
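+ <p>The Turtle snippets in this section omit prefix declarations. A minimal sketch of
+ the declarations they would need is given here. The <b>prov</b> and <b>ex1</b>
+ namespace URIs are assumptions made for illustration: the <b>prov</b> URI may differ
+ from the one defined by the PROV-O draft in use, and the <b>ex1</b> URI is hypothetical.</p>
+ <pre class="turtle example">
+ @prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .
+ @prefix ex1: &lt;http://example.org/ex1#&gt; .
+ @prefix foaf: &lt;http://xmlns.com/foaf/0.1/&gt; .
+ @prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
+ </pre>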
+
+ <p>We also provide a representation of the examples in the Abstract
+ Syntax Notation (ASN) used in the conceptual model document. The full ASN data
+ for the examples in this section is
+ included in the appendix.</p>
+
+ <section>
+ <h3>Entities</h3>
+
+ <p>
+ An online newspaper publishes an article with a chart about crime statistics making use of data (GovData) provided through a government portal.
+ The article includes a chart based on the data, with data values aggregated by
+ geographical regions.
+ </p>
+ <p>
+ A blogger, Betty, looking at the article, spots what she thinks to be an error in the chart.
+ Betty retrieves the provenance record of the article, how it was created.
+ </p>
+ <p>Betty would find the following assertions about entities in the provenance record:</p>
+ <pre class="turtle example">
+ ex1:newspaper1 a prov:Entity .
+ ex1:article1 a prov:Entity .
+ ex1:dataSet1 a prov:Entity .
+ ex1:regionList1 a prov:Entity .
+ ex1:aggregate1 a prov:Entity .
+ ex1:chart1 a prov:Entity .
+ </pre>
+ <p>
+ These statements, in order, assert that there is a newspaper (<code>ex1:newspaper1</code>) and an article (<code>ex1:article1</code>),
+ that the original data set is an entity (<code>ex1:dataSet1</code>),
+ that there is a list of regions
+ (<code>ex1:regionList1</code>) that is an entity, that the data aggregated by region is an entity (<code>ex1:aggregate1</code>),
+ and that the chart (<code>ex1:chart1</code>) is an entity.
+ </p>
+
+ </section>
+
+ <section>
+ <h3>Activities</h3>
+
+ <p>
+ Further, the provenance record asserts that there was
+ an activity (<code>ex1:compiled</code>) denoting the compilation of the
+ chart from the data set.
+ </p>
+ <pre class="turtle example">
+ ex1:compiled a prov:Activity .
+ </pre>
+ <p>
+ The provenance record also includes reference to the more specific steps involved in this compilation,
+ which are first aggregating the data by region and then generating the chart graphic.
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated a prov:Activity .
+ ex1:illustrated a prov:Activity .
+ </pre>
+ </section>
+
+ <section>
+ <h3>Use and Generation</h3>
+
+ <p>
+ Finally, the provenance record asserts the key relations among the above
+ entities and activities, i.e. the use of an entity by an activity,
+ or the generation of an entity by an activity.
+ </p>
+ <p>
+ For example, the assertions below state that the aggregation activity
+ (<code>ex1:aggregated</code>) used the original data set, that it used the list of
+ regions, and that the aggregated data was generated by this activity.
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated prov:used ex1:dataSet1 ;
+ prov:used ex1:regionList1 .
+ ex1:aggregate1 prov:wasGeneratedBy ex1:aggregated .
+ </pre>
+ <p>
+ Similarly, the chart graphic creation activity (<code>ex1:illustrated</code>)
+ used the aggregated data, and the chart was generated by this activity.
+ </p>
+ <pre class="turtle example">
+ ex1:illustrated prov:used ex1:aggregate1 .
+ ex1:chart1 prov:wasGeneratedBy ex1:illustrated .
+ </pre>
+
+ <!--p>
+ For example, the provenance declares the event (of type <code>prov:Usage</code>)
+ where the aggregation activity used the GovData data set, and the event
+ (of type <code>prov:Generation</code>) where the same activity generated
+ the data aggregated by region.
+ </p>
+ <pre class="turtle example">
+ ex1:dataSet1Usage a prov:Usage .
+ ex1:aggregate1Generation a prov:Generation .
+ </pre>
+ <p>
+ To describe these events, the provenance says within which activity
+ they occur and what entity is used or generated.
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated prov:qualifiedUsage ex1:dataSet1Usage .
+ ex1:aggregated prov:qualifiedGeneration ex1:aggregate1Generation .
+ ex1:dataSet1Usage prov:entity ex1:dataSet1 .
+ ex1:aggregate1Generation prov:entity ex1:aggregate1 .
+ </pre>
+ <p>
+ Comparable events are described for the activity of generating the chart image
+ from the aggregated data.
+ </p>
+ <pre class="turtle example">
+ ex1:aggregate1Usage a prov:Usage .
+ ex1:chart1Generation a prov:Generation .
+ ex1:illustrated prov:qualifiedUsage ex1:aggregate1Usage .
+ ex1:illustrated prov:qualifiedGeneration ex1:chart1Generation .
+ ex1:aggregate1Usage prov:entity ex1:aggregate1 .
+ ex1:chart1Generation prov:entity ex1:chart1 .
+ </pre>
+ <p>
+ From this information Betty can see that
+ the mistake could have been in the original data set or else was introduced
+ in the compilation activity, and sets out to discover which.
+ </p>
+ </p -->
+
+ </section>
+
+ <section>
+ <h3>Agents</h3>
+
+ <p>
+ Digging deeper, Betty wants to know who compiled the chart.
+ Betty sees that Derek was involved in both the aggregation and
+ chart creation activities:
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated prov:wasAssociatedWith ex1:derek .
+ ex1:illustrated prov:wasAssociatedWith ex1:derek .
+ </pre>
+ <p>
+ The record for Derek provides the
+ following information, of which the first line is a PROV-O statement that
+ Derek is an agent, followed by statements about general properties of Derek.
+ </p>
+ <pre class="turtle example">
+ ex1:derek a prov:Agent ;
+ a foaf:Person ;
+ foaf:givenName "Derek"^^xsd:string ;
+ foaf:mbox &lt;mailto:derek@example.org&gt; .
+ </pre>
+ </section>
+
+ <section>
+ <h3>Roles</h3>
+
+ <p>
+ For Betty to understand where the error lies, she needs to have more detailed
+ information on how entities have been used in, participated in, and generated
+ by activities. Betty has determined that <code>ex1:aggregated</code> used
+ entities <code>ex1:regionList1</code> and <code>ex1:dataSet1</code>, but she does not
+ know what function these entities played in the processing. Betty
+ also knows that <code>ex1:derek</code> controlled the activities, but she does
+ not know if Derek was the analyst responsible for determining how the data
+ should be aggregated.
+ </p>
+ <p>
+ The above information is described as roles in the provenance records. The aggregation
+ activity involved entities in four roles: the data to be aggregated (<code>ex1:dataToAggregate</code>),
+ the regions to aggregate by (<code>ex1:regionsToAggregateBy</code>), the
+ resulting aggregated data (<code>ex1:aggregatedData</code>), and the
+ analyst doing the aggregation (<code>ex1:analyst</code>).
+ </p>
+ <pre class="turtle example">
+ ex1:dataToAggregate a prov:Role .
+ ex1:regionsToAggregateBy a prov:Role .
+ ex1:aggregatedData a prov:Role .
+ ex1:analyst a prov:Role .
+ </pre>
+ <p>
+ In addition to the simple facts that the aggregation activity used, generated or
+ was controlled by entities/agents as described in the sections above, the
+ provenance record contains more details of <i>how</i> these entities and agents
+ were involved, i.e. the roles they played. For example, the assertions below state
+ that the aggregation activity (<code>ex1:aggregated</code>) included the usage
+ of the government data set (<code>ex1:dataSet1</code>) in the role of the data
+ to be aggregated (<code>ex1:dataToAggregate</code>).
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
+ prov:hadQualifiedEntity ex1:dataSet1 ;
+ prov:hadRole ex1:dataToAggregate ] .
+ </pre>
+ <p>
+ This can then be distinguished from the same activity's usage of the list of
+ regions because the roles played are different.
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
+ prov:hadQualifiedEntity ex1:regionList1 ;
+ prov:hadRole ex1:regionsToAggregateBy ] .
+ </pre>
+ <p>
+ Similarly, the provenance includes assertions that the same activity was
+ controlled by Derek in the role of analyst (<code>ex1:analyst</code>), and that
+ the entity <code>ex1:aggregate1</code> was generated by the activity in the role
+ of the aggregated data (<code>ex1:aggregatedData</code>).
+ </p>
+ <pre class="turtle example">
+ ex1:aggregated
+ prov:hadQualifiedControl [ a prov:Control ;
+ prov:hadQualifiedEntity ex1:derek ;
+ prov:hadRole ex1:analyst
+ ] ;
+ prov:hadQualifiedGeneration [ a prov:Generation ;
+ prov:hadQualifiedEntity ex1:aggregate1 ;
+ prov:hadRole ex1:aggregatedData
+ ] .
+ </pre>
+ </section>
+
+ <section>
+ <h3>Revision and Derivation</h3>
+
+ <p>
+ After looking at the detail of the compilation activity, Betty finds
+ nothing wrong, so she concludes the error is in the government dataset.
+ She looks at the characterization of the dataset <code>ex1:dataSet1</code>,
+ and sees that it is missing data from one of the zip codes in the area. She contacts
+ the government, and a new version of GovData is created, declared to be the
+ next revision of the data by Edith. The provenance record of this new dataset,
+ <code>ex1:dataSet2</code>, states that it is a revision of the
+ old data set, <code>ex1:dataSet1</code>.
+ </p>
+ <pre class="turtle example">
+ ex1:dataSet2 prov:wasRevisionOf ex1:dataSet1 .
+ </pre>
+ <p>
+ Derek notices that there is a new dataset available and creates a new chart based on the revised data,
+ using the same compilation activity as before. Betty checks the article again at a
+ later point, and wants to know if it is based on the old or new GovData.
+ She sees two new assertions about derivation in the provenance data, plus
+ an assertion about how the new chart was generated.
+ </p>
+ <pre class="example turtle">
+ ex1:chart2 prov:wasEventuallyDerivedFrom ex1:dataSet2 .
+ ex1:chart2 prov:wasDerivedFrom ex1:dataSet2 .
+ ex1:chart2 prov:wasGeneratedBy ex1:compiled2 .
+ </pre>
+ <p>
+ She interprets these assertions as follows. The first says that the new chart
+ was eventually derived from the revised data set. The second says, more
+ specifically, that the new chart is as it is because of the revised
+ data set, i.e. there is an explicit influence of the data on the chart.
+ Finally, the second and third assertions together say further that it was
+ the activity <code>ex1:compiled2</code> that derived the new chart
+ from the revised data set.
+ </p>
+ </section>
+
+
+ <section>
+ <h2>Frequently asked questions</h2>
+ </section>
+
+ <section class="appendix">
+ <h2>Abstract Syntax Notation for Examples</h2>
+ <p>
+ Below we give translations of the working example snippets into the PROV-DM
+ abstract syntax notation (ASN).
+ </p>
+ <section>
+ <h3>Entities</h3>
+ <pre class="example asn">
+ entity(ex1:dataSet1).
+ entity(ex1:regionList1).
+ entity(ex1:aggregate1).
+ entity(ex1:chart1).
+ </pre>
+ </section>
+
+ <section>
+ <h3>Activities</h3>
+ <pre class="example asn">
+ activity(ex1:compiled).
+ activity(ex1:aggregated).
+ activity(ex1:illustrated).
+ </pre>
+ <!--
+ <p>
+ In the first assertion above, 'compilation_step' is an optional reference to the 'recipe' that describes
+ what the 'compiled' activity did. The interpretation of its name,
+ 'compilation_step', is left to applications (it is not further resolved within PROV-DM).
+ </p>
+ <p>
+ In the second assertion, optional 'recipe' has been omitted.
+ </p>
+ -->
+ <!--PM comment: here readers will be confused by the processExecution / activity disconnect!
+ also this does not show start/end times, optional attributes. At least one example would be useful-->
+ </section>
+
+ <section>
+ <h3>Use and Generation</h3>
+ <pre class="example asn">
+ used(ex1:aggregated, ex1:dataSet1).
+ used(ex1:aggregated, ex1:regionList1).
+ wasGeneratedBy(ex1:aggregate1, ex1:aggregated).
+
+ used(ex1:illustrated, ex1:aggregate1).
+ wasGeneratedBy(ex1:chart1, ex1:illustrated).
+ </pre>
+ </section>
+
+ <section>
+ <h3>Agents</h3>
+ <pre class="example asn">
+ entity(ex1:derek, [ type="foaf:Person", foaf:givenName = "Derek",
+ foaf:mbox= "<mailto:derek@example.org>"]).
+ agent(ex1:derek).
+
+ wasControlledBy(ex1:aggregated, ex1:derek).
+ wasControlledBy(ex1:illustrated, ex1:derek).
+ </pre>
+ </section>
+
+ <section>
+ <h3>Roles</h3>
+ <p>
+ Roles are not declared directly in PROV-DM; rather, they are attributes of
+ relations. Thus, the usage assertions of the Turtle example in sec. 3.5 are rendered as follows:
+ </p>
+ <pre class="example asn">
+ used(ex1:aggregated, ex1:dataSet1, [ prov:role = "dataToAggregate"]).
+ used(ex1:aggregated, ex1:regionList1, [ prov:role = "regionsToAggregateBy"]).
+ </pre>
+ <p>
+ The first assertion above adds a "role" attribute to the first 'used' assertion of Ex. 3.
+ Similarly, the second assertion adds a "role" attribute to the second 'used' assertion of Ex. 3.
+ </p>
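+ <p>
+ The control and generation assertions of the same Turtle example are not rendered above.
+ As a sketch only, and assuming that <code>wasControlledBy</code> and <code>wasGeneratedBy</code>
+ accept the same optional attribute list as <code>used</code> (an assumption that should be
+ checked against PROV-DM), they might be written as follows:
+ </p>
+ <pre class="example asn">
+ wasControlledBy(ex1:aggregated, ex1:derek, [ prov:role = "analyst"]).
+ wasGeneratedBy(ex1:aggregate1, ex1:aggregated, [ prov:role = "aggregatedData"]).
+ </pre>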
+ </section>
+
+ <section>
+ <h3>Revision and Derivation</h3>
+ <pre class="example asn">
+ wasRevisionOf(ex1:dataSet2, ex1:dataSet1).
+ </pre>
+
+ <pre class="example asn">
+ wasEventuallyDerivedFrom(ex1:chart2, ex1:dataSet2).
+ wasDerivedFrom(ex1:chart2, ex1:dataSet2).
+ wasGeneratedBy(ex1:chart2, ex1:compiled2).
+ </pre>
+ </section>
+ </section>
+
+ <section class="appendix">
+ <h2>Acknowledgements</h2>
+ <p>
+ WG membership to be listed here.
+ </p>
+ </section>
+
+ </body></html>