--- a/primer/Primer.html Wed Apr 18 14:01:16 2012 +0100
+++ b/primer/Primer.html Wed Apr 18 17:59:50 2012 +0100
@@ -33,6 +33,12 @@
"13 December 2011. W3C Working Draft. (Work in progress.) "+
"URL: <a href=\"http://www.w3.org/TR/2011/WD-prov-o-20111213/\">http://www.w3.org/TR/2011/WD-prov-o-20111213</a>",
+ "PROV-N":
+ "Luc Moreau, Paolo Missier"+
+ "<a href=\"http://www.w3.org/TR/2011/WD-prov-dm-20111215/\"><cite>PROV-N: The PROV Notation</cite></a>. "+
+ "15 December 2011. W3C Working Draft. (Work in progress.) "+
+ "URL: <a href=\"http://www.w3.org/TR/2011/WD-prov-dm-20111215/\">http://www.w3.org/TR/2011/WD-prov-dm-20111215/</a>",
+
"TURTLE":
"Eric Prud'hommeaux, Gavin Carothers"+
"<a href=\"http://www.w3.org/TR/2011/WD-turtle-20110809/\"><cite>Turtle: Terse RDF Triple Language</cite></a>. "+
@@ -129,13 +135,12 @@
<section id="abstract">
<p>
This document provides an intuitive introduction and guide to the
- PROV data model for provenance [[PROV-DM]]. PROV-DM is a core data model for
+ PROV specification for provenance on the Web. PROV is a core data model for
provenance for building representations of the entities, people and
processes involved in producing a piece of data or thing in the world.
- This primer explains the fundamental PROV-DM concepts in an illustrative
- fashion, and provides examples applying the PROV-O OWL2
- ontology [[PROV-O]]. The primer is intended as a starting point for those wishing
- to create or make use of PROV-DM data.
+ This primer explains the fundamental PROV concepts and provides examples
+ of its use. The primer is intended as a starting point for those wishing
+ to create or use PROV data.
</p>
<!-- p>
@@ -148,31 +153,44 @@
various aspects that are necessary to achieve the vision of
interoperable interchange of provenance information in heterogeneous
environments such as the Web. This document is an
- intuitive introduction and guide to the [[PROV-DM]] data model for
- provenance. It includes simple examples applying the [[PROV-O]]
- OWL2 ontology.
+ intuitive introduction and guide with simple illustrative examples
+ of the core aspects of PROV.
+
+ <h4>PROV Family of Specifications</h4>
+The PROV family of specifications aims to define the various aspects that are necessary to achieve the vision of interoperable
+interchange of provenance information in heterogeneous environments such as the Web.
+The specifications are as follows.
+<ul>
+<li> PROV-PRIMER, a primer for the PROV data model (this document),</li>
+<li> PROV-DM, the PROV data model for provenance,</li>
+<li> PROV-DM-CONSTRAINTS, a set of constraints applying to the PROV data model,</li>
+<li> PROV-N, a notation for provenance aimed at human consumption,</li>
+<li> PROV-O, the PROV ontology, an OWL-RL ontology allowing the mapping of PROV to RDF,</li>
+<li> PROV-AQ, the mechanisms for accessing and querying provenance,</li>
+<li> PROV-SEM, a formal semantics for the PROV data model,</li>
+<li> PROV-XML, an XML schema for the PROV data model.</li>
+</ul>
+<h4>How to read the PROV Family of Specifications</h4>
+<ul>
+<li>The primer is the entry point to PROV, offering a pedagogical presentation of the provenance model.</li>
+<li>The Linked Data and Semantic Web community should focus on PROV-O, which defines PROV classes and properties in an OWL-RL ontology. For further details, PROV-DM and PROV-DM-CONSTRAINTS specify the constraints applicable to the data model and its interpretation. PROV-SEM provides a mathematical semantics.</li>
+<li>The XML community should focus on PROV-XML, which defines an XML schema for PROV-DM. Further details can also be found in PROV-DM, PROV-DM-CONSTRAINTS, and PROV-SEM.</li>
+<li>Developers seeking to retrieve or publish provenance should focus on PROV-AQ.</li>
+<li>Readers seeking to implement other PROV serializations
+should focus on PROV-DM and PROV-DM-CONSTRAINTS. PROV-O, PROV-N, and PROV-XML offer examples of mapping to RDF, text, and XML, respectively.</li>
+</ul>
+
+
</section>
<section>
<h2>Introduction</h2>
<p>
- This primer document provides an accessible introduction to the PROV Data Model
- ([[PROV-DM]]) specification for representing provenance on the Web, and its expression
- in the PROV Ontology ([[PROV-O]]). Provenance describes
- the origins of things, so PROV-DM data consists of descriptions about the past.
- </p>
-
- <p>
- This primer document aims to ease the adoption of the specifications by providing:
- </p>
- <ul>
- <li>An intuitive explanation of how PROV-DM models provenance.</li>
- <li>Examples that can be followed to produce new PROV-DM data.</li>
- </ul>
-
- <p>
- The <i>provenance</i> of digital objects represents their origins. The PROV-DM is a
- proposed specification to represent provenance records, which contain <i>descriptions</i> of the entities
+ This primer document provides an accessible introduction to the PROV
+ specification for provenance on the Web.
+ The <i>provenance</i> of digital objects represents their origins. PROV is a
+ proposed specification to represent provenance records,
+ which contain <i>descriptions</i> of the entities
and activities involved in producing and delivering or otherwise influencing a
given object.
For the remainder of this document, we use the term 'provenance' to refer also
@@ -186,7 +204,7 @@
</p>
<p>
- As a specification for provenance, PROV-DM accommodates all those different uses
+ As a specification for provenance, PROV accommodates all those different uses
of provenance. Different people may have different perspectives on provenance,
and as a result different types of information might be captured in provenance records.
One perspective might focus on <i>agent-centered provenance</i>, that is, what entities
@@ -207,8 +225,8 @@
Provenance records are metadata. There are other kinds of metadata that is
not provenance. For example, the size of an image is metadata of
that image but it is not provenance.
- </p>
-
+ </p>
+
<p>
For general background on provenance, a
comprehensive overview of requirements, use cases, prior research, and proposed
@@ -218,29 +236,46 @@
that may help identify the provenance aspects of planned applications and
help plan the design of a provenance system.
</p>
+
<p>
- The next section gives an introductory overview of PROV-DM using simple examples.
- The following section shows how the formal ontology PROV-O can be used to represent the PROV-DM descriptions
- as RDF triples. The document also contains frequently asked questions, and an appendix giving example
- snippets of the Provenance Notation (PROV-N).
- For a detailed description of [[PROV-DM]] and [[PROV-O]], please refer to the respective documents.
+ This primer document aims to ease the adoption of the PROV specifications by providing:
</p>
+ <ul>
+ <li>An intuitive explanation of how PROV models provenance. A detailed description of
+ all the concepts and relations in the PROV Data Model is provided in [[PROV-DM]].</li>
+ <li>A simple self-contained example that illustrates how to produce and use PROV assertions, highlighting how
+ to combine PROV with other popular vocabularies such as FOAF and Dublin Core. A description
+ of the formal PROV ontology (PROV-O) can be found in [[PROV-O]].</li>
+ <li>Example snippets using a notation of PROV designed for human
+ consumption (PROV-N). Details of this notation can be found in [[PROV-N]].</li>
+ </ul>
+
+ <p>There are additional reference documents for PROV that are not covered in this
+ primer, including the PROV Access and Query aspects of the specification (PROV-AQ),
+ the constraints on the PROV data model (PROV-DM-CONSTRAINTS),
+ a formal semantics of the PROV data model (PROV-SEM), and the PROV XML notation
+ (PROV-XML). </p>
+
</section>
<section>
- <h2>Intuitive overview of PROV-DM</h2>
+ <h2>Intuitive overview of PROV</h2>
<p>
- This section provides an intuitive explanation of the concepts in PROV-DM.
+ This section provides an intuitive explanation of the main concepts in PROV.
As with the rest of this document, it should be treated as a starting point for
- understanding the model. The PROV-DM model specification
+ understanding the model. The PROV-DM data model document [[PROV-DM]]
provides precise definitions and constraints to be used.
</p>
<p>
- The following diagram provides a high level overview of the structure of PROV-DM records,
- limited to some key PROV-DM concepts discussed in this document.
- The diagram is the same that appears in the [[PROV-DM]].
+ The following diagram provides a high level overview of the structure of PROV records,
+ limited to some key PROV concepts discussed in this document.
+ The diagram is the same as the one that appears in the [[PROV-DM]] document.
+ Note that because PROV is meant to describe how things were created or delivered,
+ PROV relations are named so they can be used in assertions about the past.
+ This also affects the domain and range of the relations in PROV.
</p>
+
<div style="text-align: center;">
<img src="OverviewDiagram.png" alt="PROV-DM overview"/>
</div>
@@ -249,13 +284,13 @@
<h3>Entities</h3>
<p>
- In PROV-DM, physical, digital, conceptual, or other kinds of thing are called
+ In PROV, physical, digital, conceptual, or other kinds of thing are called
<i>entities</i>.
Examples of such entities are a web page, a chart, and a spellchecker.
Provenance records can describe the provenance of entities, and
an entity’s provenance may refer to many other entities. For example, a document D is
an entity whose provenance refers to other entities such as a chart inserted into D,
- the dataset that was used to create that chart, or the author of the document.
+ and the dataset that was used to create that chart.
Entities may be described as having different attributes and
be described from different perspectives. For example,
document D as stored in my file system, the second version of document D,
@@ -296,12 +331,12 @@
<section>
<h3>Agents and Responsibility</h3>
<p>
- An <i>agent</i> is a type of entity that takes an role in an activity such
- that it can be assigned some degree of <i>responsibility</i> for the activity taking
+ An <i>agent</i> takes a role in an activity such
+ that the agent can be assigned some degree of <i>responsibility</i> for the activity taking
place.
An agent can be a person, a piece of software, an inanimate object, an organization, or
other entities that may be ascribed responsibility.
- When an agent has some responsibility for an activity, PROV-DM says the agent was
+ When an agent has some responsibility for an activity, PROV says the agent was
<i>associated</i> with the activity, where several agents may be associated with
an activity and vice-versa.
Consider a chart displaying some statistics
@@ -319,6 +354,13 @@
for saying that the agent was responsible for the activity which generated
the entity.
</p>
+ <p>
+ One may want to describe the provenance of an agent. For example, an organization
+ responsible for the creation of a report may evolve over time while the report is written, as
+ some members leave and others join. To make provenance assertions about an agent in PROV,
+ the agent must be declared explicitly both as an agent and as an entity.
+ </p>
+
</section>
<section>
@@ -333,7 +375,7 @@
For example, an agent may play the role of "editor" in an activity that uses
one entity in the role of "document to be edited" and another in the role of
"addition to be made to the document", to generate a further entity in the role of "edited document".
- Roles are application specific, so PROV-DM does not define any particular roles.
+ Roles are application specific, so PROV does not define any particular roles.
</p>
<!--p>Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies. Role interpretation is application specific.</p -->
</section>
@@ -352,8 +394,8 @@
For example, a given entity, such as a document, may go through multiple <i>revisions</i>
(also called versions and other comparable terms) over time. Between revisions,
one or more attributes of the entity may change.
- The result of each revision is a new entity,
- and PROV-DM allows one to relate those entities by making a description that
+ In PROV, the result of each revision is a new entity.
+ PROV allows one to relate those entities by making a description that
one was a revision of another.
Another specialized kind of derivation is to say that one entity, commonly
a document, <i>quotes</i> from another.
@@ -364,7 +406,7 @@
<h3>Plans</h3>
<p>
Activities may follow pre-defined procedures, such as recipes, tutorials, instructions, or workflows.
- PROV-DM refers to these, in general, as <i>plans</i>, and allows the description that a plan was followed, by agents,
+ PROV refers to these, in general, as <i>plans</i>, and allows the description that a plan was followed, by agents,
in executing an activity.
</p>
</section>
@@ -373,7 +415,7 @@
<h3>Time</h3>
<p>
Time is often a critical aspect of provenance.
- PROV-DM allows the timing of significant events to be described, including
+ PROV allows the timing of significant events to be described, including
when an entity was generated or used, or when an activity started
and finished. For example, the model can be used to describe facts such as when a new
version of a document was created (generation time), or when a document was
@@ -384,47 +426,47 @@
<section>
<h3>Alternate Entities and Specialization</h3>
<p>
- Entities are defined in a flexible way in PROV-DM, allowing for different
- perspectives to be taken as appropriate for the application. For example,
- some PROV-DM descriptions may refer to a document D, other descriptions may be
- more specifically about the second version of D, while another set may
- concern the copy of D stored on a particular hard disk. All three are
- different entities referred to with different identifiers, but are also perspectives
- or abstractions on the same thing. Because of
- this, the entities are said to be <i>alternates</i> of each other, and
- described as such. Being aware that two entities are alternates allows those
- consuming the PROV-DM data to know that understanding the provenance of one entity is salient
- to understanding the provenance of the other.
- </p>
- <p>
- In some cases, we can be more informative still. Where one entity is a more
- general or longer term perspective on the same thing as another, we can say that the latter
- is a <i>specialization</i> of the former. For example, both the second version
- of document D and the copy of D on the hard disk are specializations of document D in
- general. That is, D's period of existence will contain the periods in which the
- second version existed, and where a copy of D was on the hard disk. It is helpful
- to describe specialization in provenance data, because it indicates that everything
- which was true of one entity (the more specialized) was at some point true of
- the other (the more general).
+ Entities are defined in a flexible way in PROV, allowing for different
+ perspectives to be taken as appropriate for the application. One case is when
+ the same entity appears with different descriptions in a provenance record
+ because each appearance emphasizes different aspects of the entity. For example,
+ a book may be described by its title in one place and by its author and publication date
+ in another. A second case is when the same entity evolves over time into different
+ versions. An example is a document that is continuously updated and has
+ subsequent releases over time. A third case is when the same entity is copied
+ or replicated. For example, a document may be copied to several directories.
+ A fourth case is when an entity goes through different incarnations. For example,
+ a committee producing a report may have a set of members when the report
+ is first released and have a different set of members when each update of
+ the report is released. In all these situations,
+ the entities can be said in PROV to be <i>specializations</i> of the more general entity,
+ and to be <i>alternates</i> of each other. Being aware that two entities are alternates allows those
+ consuming the PROV data to know that understanding the provenance of one entity is salient
+ to understanding the provenance of the other. Knowing that alternate entities are
+ specializations of another allows a consumer of PROV to refer to the general entity
+ with a unique identifier even though it is specified as different alternates
+ throughout the provenance records.
</p>
</section>
</section>
<section>
- <h2>Examples</h2>
+ <h2>Examples of Key Concepts in PROV</h2>
<p>
- In the following sections, we show how PROV-DM can be used to model
+ In the following sections, we show how PROV can be used to model
provenance in a specific example scenario.
</p>
<p>
- We include samples of how the formal ontology PROV-O
- can be used to represent the PROV-DM descriptions as RDF triples.
+ We include samples of how the formal ontology (PROV-O)
+ can be used to represent the PROV descriptions as RDF triples.
These are shown using the Turtle notation [[TURTLE]]. In
the latter depictions, the namespace prefix <b>prov</b> denotes
terms from the PROV ontology, while <b>ex</b> denotes terms specific to the example.
- </p>
+ We also illustrate in these examples how PROV can be used in combination with other
+ vocabularies, such as FOAF and Dublin Core (with namespace prefixes <b>foaf</b> and
+ <b>dcterms</b>, respectively). </p>
<p>We also provide a representation of the examples in the Provenance
Notation, PROV-N, used in the data model document. The full PROV-N data
@@ -453,7 +495,7 @@
ex:chart1 a prov:Entity .
</pre>
<p>
- These statements, in order, describe that there was an article (<code>ex:article</code>),
+ These statements, in order, refer to the article (<code>ex:article</code>),
an original data set (<code>ex:dataSet1</code>),
a list of regions (<code>ex:regionList</code>),
data aggregated by region (<code>ex:composition</code>),
@@ -646,9 +688,9 @@
</pre>
<p>
Similarly, the provenance includes descriptions that the same activity was
- enacted in a particular way (<code>ex:analyst</code>) by Derek, and that
- the entity <code>ex:composition</code> took the role of the composed
- data in what the activity generated.
+ enacted in a particular way by Derek, who had the role of
+ <code>ex:analyst</code>, and that the entity <code>ex:composition</code> took the role of the composed
+ data in what the activity generated:
</p>
<pre class="turtle example">
ex:compose prov:qualifiedAssociation [
@@ -656,7 +698,7 @@
prov:agent ex:derek ;
prov:hadRole ex:analyst
] .
- ex:composition prov:qualifiedGeneration [
+ ex:composition prov:qualifiedGeneration [
a prov:Generation ;
prov:activity ex:compose ;
prov:hadRole ex:composedData
@@ -695,6 +737,12 @@
ex:chart2 a prov:Entity ;
prov:wasDerivedFrom ex:dataSet2 .
</pre>
+ <p>and that the new chart is a revision of the original one:
+ </p>
+ <pre class="turtle example">
+ ex:chart2 a prov:Entity ;
+ prov:wasRevisionOf ex:chart1 .
+ </pre>
<p>
Derivation and revision are connections between entities, and so depicted
with arrows in our visualization.
@@ -794,20 +842,9 @@
<p>
The newspaper, from past experience, anticipated that there could be revisions
to the article, and so created identifiers for both the article in general
- (<code>ex:article</code>) and the first version of the article (<code>ex:articleV1</code>),
- allowing both to be referred to as entities in provenance data. The article
- discussed the GovData data set, and so the provenance data published by the
- newspaper describes the first version of the article as being derived from that data set.
- </p>
- <pre class="turtle example">
- ex:articleV1 a prov:Entity ;
- prov:wasDerivedFrom ex:dataSet1 .
- </pre>
- <p>
- Without some way to know entities <code>ex:article</code> and <code>ex:articleV1</code>
- are related, anyone looking at Betty's and the newspaper's PROV data above would
- not know that the blog entry was written about an article derived from GovData.
- Therefore, the newspaper also describes the connection between the two: that
+ (<code>ex:article</code>), a URI that was redirected to the first version of the article, and that first version (<code>ex:articleV1</code>),
+ allowing both to be referred to as entities in provenance data.
+ In the provenance records, the newspaper describes the connection between the two: that
the first version of the article is a specialization of the article in general.
</p>
<pre class="turtle example">
@@ -815,7 +852,8 @@
</pre>
<p>
Later, after the data set is corrected and the new chart generated, a new version
- of the article is created, <code>ex:articleV2</code>. To ensure that those
+ of the article is created, <code>ex:articleV2</code>, with its own URI to which the article
+ is redirected. To ensure that those
consulting the provenance of <code>ex:articleV2</code> understand that it
is connected with the provenance of <code>ex:article</code> and <code>ex:articleV1</code>,
the newspaper describes how these entities are related.
@@ -825,10 +863,6 @@
ex:articleV2 prov:alternateOf ex:articleV1 .
</pre>
<p>
- Here, <code>alternateOf</code> expresses that the first and second versions
- are specializations of the same thing (the article).
- </p>
- <p>
Specialization and alternate relations connect entities, and so are visualized
as links between them.
</p>
@@ -902,7 +936,7 @@
<section>
<h3>Roles</h3>
<p>
- Roles are not declared directly in PROV-DM, rather they are attributes of
+ Roles are not declared directly in PROV; rather, they are attributes of
relations. Thus, the entire Turtle example in Section 3.5 is rendered as follows:
</p>
<pre class="example asn">