prov: changeset 4513:37f9692fb7a0

--- a/dc-note/Overview.html	Mon Oct 08 23:06:29 2012 +0200
+++ b/dc-note/Overview.html	Mon Oct 08 23:07:04 2012 +0200
@@ -548,10 +548,9 @@
    This document is part of a set of specifications aiming to define the
    various aspects that are necessary to achieve the vision of
    interoperable interchange of provenance information in heterogeneous
-   environments such as the Web. This document is a non-normative,
-   intuitive introduction and guide to the [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-DM">PROV-DM</a></cite>] data model for
-   provenance. It includes simple worked examples applying the [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-O">PROV-O</a></cite>]
-   OWL2 ontology. The document is expected to become a Note once it is stable.
+   environments such as the Web. This document is a non-normative
+   mapping between the [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-O">PROV-O</a></cite>] OWL ontology
+   and the Dublin Core Terms vocabulary [<a href="#bib-DCTERMS">DCTERMS</a>]. The document is expected to become a Note once it is stable.
   <p>This document was published by the <a href="http://www.w3.org/2011/prov/">Provenance Working Group</a> as an Editor's Draft.
    If you wish to make comments regarding this document, please send them to 
    <a href="mailto:public-prov-wg@w3.org">public-prov-wg@w3.org</a>
@@ -588,7 +587,7 @@
 								<li class="tocline"><a href="#term_creator" class="tocxref"><span class="secno">2.5.1.1 </span>dct:creator</a></li>
 								<li class="tocline"><a href="#term_contributor" class="tocxref"><span class="secno">2.5.1.2 </span>dct:contributor</a></li>
 								<li class="tocline"><a href="#term_publisher" class="tocxref"><span class="secno">2.5.1.3 </span>dct:publisher</a></li>
-								<li class="tocline"><a href="#term_rights_holder" class="tocxref"><span class="secno">2.5.1.4 </span>dct:rightsHolder</a></li>
+								<!--<li class="tocline"><a href="#term_rights_holder" class="tocxref"><span class="secno">2.5.1.4 </span>dct:rightsHolder</a></li>-->
 							</ul>
 						<li class="tocline"><a href="#entity_date_mappings" class="tocxref"><span class="secno">2.5.2 </span>Entity-Date (When) mappings</a></li>
 							<ul class="toc">
@@ -606,9 +605,9 @@
 								<li class="tocline"><a href="#term_replaces" class="tocxref"><span class="secno">2.5.3.3 </span>dct:replaces/replacedBy</a></li>
 								<li class="tocline"><a href="#term_source" class="tocxref"><span class="secno">2.5.3.4 </span>dct:source</a></li>
 							</ul>-->
-					</ul>
-				<li class="tocline"><a href="#cleanup" class="tocxref"><span class="secno">2.6 </span>Cleanup</a></li>
-				<li class="tocline"><a href="#list_of_excluded_terms" class="tocxref"><span class="secno">2.7 </span>List of the terms excluded from the mapping</a></li>
+						<li class="tocline"><a href="#cleanup" class="tocxref"><span class="secno">2.5.3 </span>Cleanup</a></li>
+					</ul>				
+				<li class="tocline"><a href="#list_of_excluded_terms" class="tocxref"><span class="secno">2.6 </span>List of the terms excluded from the mapping</a></li>
 			</ul>
 		<li class="tocline"><a href="#acknowledgements" class="tocxref"><span class="secno">A. </span>Acknowledgements</a></li>
 		<li class="tocline"><a href="#references" class="tocxref"><span class="secno">B. </span>References</a></li>
@@ -650,16 +649,16 @@
 </pre>
 <p>
 Clearly not all metadata statements deal with provenance. 
-For instance, <code>dct:title</code>, <code>dct:subject</code> and <code>dct:format</code> are descriptions of the resource <code>ex:document1</code>. 
+ <code>dct:title</code>, <code>dct:subject</code> and <code>dct:format</code> are descriptions of the resource <code>ex:document1</code>. 
 They do not provide any information how the resource was created or modified in the past.
- On the other hand, some statements imply provenance-related information, e.g., <code>dct:creator</code> 
+ On the other hand, some statements imply provenance-related information. For example, <code>dct:creator</code> 
  implies that the document has been created and refers to the author. Similarly, the existence 
  of the <code>dct:issued</code> date implies that the document has been published. This information is redundantly 
  implied by the dct:publisher statement as well. Finally, <code>dct:replaces</code> relates 
- our document to another document <code>ex:doc2</code> and it can be inferred that this document had probably
- some kind of influence on our document <code>ex:document1</code>, which also gives us some provenance related information.
+ our document to another document <code>ex:doc2</code>, which probably had
+ some kind of influence on <code>ex:document1</code>.
 </p><p>
-This is a pattern that applies generally to metadata, i.e., we can distinguish 
+Following this pattern, we can generally categorize the existing metadata as 
 description metadata and provenance metadata. To be more precise, we define provenance 
 metadata as metadata providing provenance information according to the definition of 
 the Provenance Working Group [<a href="#bib-PROV-DEF">PROV-DEF</a>] and description metadata as all other metadata.
@@ -678,23 +677,21 @@
  dateSubmitted, hasFormat, hasVersion, isFormatOf, isReferencedBy, isReplacedBy, issued, isVersionOf, license, modified,
  provenance, publisher, references, replaces, rightsHolder, rights, source, valid.
 </p><p>
-This classification can certainly be questioned and was already subject to many discussions. We use a very
- conservative strategy: if the group can't reach consensus about if an element should be mapped to PROV or not, 
- we exclude it from the mapping list. This way, we want to ensure that rather less, but correct provenance
- data is created than more, but possibly incorrect data.
+This is a conservative classification of provenance metadata. It can be argued that other elements contain 
+provenance information as well, depending on their usage in a concrete implementation or application.
 </p><p>
 According to our classification, there are 25 terms out of 55 that can be considered as provenance related.
- As a next step, we consider sub-categories of the provenance related terms as follows:
+ The terms can further be categorized according to the question they answer regarding the provenance of a resource:
 </p><p>
-<b>Who?</b> (contributor, creator, publisher, rightsHolder) Category that includes all properties that have the range <code>dct:Agent</code>,
+<b>Who?</b> (contributor, creator, publisher, rightsHolder): Category that includes all properties that have <code>dct:Agent</code> as range,
  i.e., a resource that acts or has the power to act. The contributor, creator, and publisher clearly influence
  the resource and therefore are important for its origin. This is not immediately clear for the rightsHolder,
- but as ownership is considered the important provenance information for artworks, we have decided to include it in this category.
+ but as ownership is considered important provenance information for many resources, such as artworks, it is included in this category.
 </p><p>
-<b>When?</b> (available, created, date, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified, valid)
+<b>When?</b> (available, created, date, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified, valid):
  Dates typically belong to the provenance record of a resource. It can be questioned whether a resource changes by
- being published or not. However, we consider the publication as an action that affects the state of the resource and
- therefore it is relevant for the provenance. Two dates can be considered special regarding their relevance for
+ being published or not. Depending on the application, however, the publication can be seen as an action that changes 
+ the state of the resource. Two dates can be considered special regarding their relevance for
  provenance: available and valid. They are different from the other dates as by definition they can represent a
  date range. Often, the range of availability or validity of a resource is inhererent to the resource and known
  beforehand – consider the validity of a passport or a credit card or the availability of a limited special offer.
@@ -702,10 +699,10 @@
  by the validity range. On the other hand, if an action is involved, e.g., a resource is declared invalid because
  a mistake has been found, this is relevant for its provenance.
 </p><p>
-<b>How?</b> (isVersionOf, hasVersion, isFormatOf, hasFormat, references, isReferencedBy, replaces, isReplacedBy, source, rights, license)
+<b>How?</b> (isVersionOf, hasVersion, isFormatOf, hasFormat, references, isReferencedBy, replaces, isReplacedBy, source, rights, license):
  Resources are often derived from other resources. In this case, the original resource becomes part of the provenance
- record of the derived resource. Derivations can be further classified as isVersionOf, isFormatOf, replaces, source.
- references is a weaker relation, but it can be assumed that a referenced resource influenced the described resource
+ record of the derived resource. Derivations can be further classified as <code>dct:isVersionOf</code>, <code>dct:isFormatOf</code>, <code>dct:replaces</code> and <code>dct:source</code>.
+  <code>dct:references</code> is a weaker relation, but it can be assumed that a referenced resource influenced the described resource
  and therefore it is relevant for its provenance. The respective inverse properties do not necessarily contribute to
  the provenance of the described resource, e.g., a resource is usually not directly affected by being referenced or
  by being used as a source – at most indirectly, as the validity state can change if a resource is replaced by a new
@@ -764,8 +761,8 @@
 <p>
 This leaves one very special term: <i>provenance</i>. It is defined as a "statement of any changes in ownership and
  custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation" [<a href="#bib-DCTERMS">DC-TERMS</a>],
-which refers to the traditional definition of provenance for artworks. Despite being relevant for provenance from the
-W3C Provenance Incubator Group's persepctive, this definition may overlap partially with almost half of the DCMI terms, which
+which refers to the traditional definition of provenance for artworks. Despite being relevant for provenance,
+ this definition may overlap partially with almost half of the DCMI terms, which
 specify concrete aspects of provenance of a resource.
 </p><p>
 In summary, the DCMI terms – and therefore any Dublin Core metadata record – hold a lot of provenance information and
@@ -778,8 +775,7 @@
  why a resource was affected, lacks – apart from subtle hints from terms like replaces – as usual a satisfying answer. -->
 </p>
 <h3 id ="namespaces">1.1 Namespaces</h3> 
-<p>In this document we use namespaces from different vocabularies to create the mapping.
- The namespaces we will be using through the document can be seen in <a href="#ns"> Table 2</a> below:
+<p>The namespaces used throughout the document are listed in <a href="#ns"> Table 2</a> below:
  <div id="ns" ALIGN="center">
  <table>
 	<caption> <a href="#ns"> Table 2</a>: Namespaces used in the document </caption>
@@ -795,8 +791,8 @@
 </div>
 <div id="Mapping" class="section"> 
 <h2>2. Mapping from Dublin Core to PROV</h2>
-<p>Why are we concerned with a mapping between Dublin Core and PROV? First, such a mapping can provide valuable insights
- into the different characteristics of both data models, in particular it "explains" PROV from a Dublin Core view point.
+<p>A mapping between Dublin Core Terms and PROV-O has many advantages. First, it can provide valuable insights
+ into the different characteristics of both data models (in particular, it explains PROV from a Dublin Core viewpoint).
  Second, such a mapping can be used to extract PROV data from the huge amount of Dublin Core data that is available on 
  the Web today. Third, it can translate PROV data to Dublin Core and therefore make it accessible for applications that
  understand Dublin Core. And not least, it can lower the barrier to adopt PROV, as simple Dublin Core statements can be
@@ -804,18 +800,17 @@
 <div id="basic" class="section">
 <h3>2.1 Basic considerations </h3>
 <p>
-Substantially, a complete mapping from Dublin Core to PROV consists of three parts:
+Substantially, a complete mapping from Dublin Core to PROV consists of three parts, the last of which is followed by a clean-up phase:
 </p><p>
     1) <b>Direct mappings</b> between terms that can be expressed in form of subclass or subproperty relationships in RDFS
 	– or equivalent relationships in OWL.
 </p><p>
     2) Definition of new <b>refinements</b> (subclasses or subproperties) of the target vocabulary to reflect the expressiveness of the source vocabulary.
 </p><p>
-    3) Provision of <b>complex mappings</b> that create statements in the target vocabulary based on statements in the source vocabulary.
-</p><p>
-
+    3) Provision of <b>complex mappings</b> that create statements in the target vocabulary based on statements in the source vocabulary. Since
+	the mapping produces blank nodes for each <code>dct</code> statement, a clean-up phase with strategies for reducing the blank nodes is also necessary.
 </p>
-<p>
+<!--<p>
 For the third part (complex mappings), we provide context free mappings that do not depend on the existence of any other statements.
  We briefly describe strategies on how to refine and clean the complex mapping results taking the context into account.
 </p><p>
@@ -829,13 +824,14 @@
 Clean-up. The context free mapping produces blank nodes for each <code>dct</code> statement. The number of blank nodes can be reduced 
 by applying reasoning patterns to clean up the data, e.g. by conflating nodes that are actually the same (e.g., an issued document could
 be the same as the created document).
-</p>
+</p>-->
 <p>
 </div>
 <div id="entities_in_dc" class="section">
 <h3><span class="secno">2.2 </span>What is ex:document1? Entities in Dublin Core</h3>
 <p>
-Consider the example metadata record above (<a href="#example1">example 1)</a>. As a <code>dc</code> metadata record describes the resulting document as a whole,
+Consider the example metadata record shown at the beginning of the document (<a href="#example1">example 1</a>). As a <code>dc</code> 
+metadata record describes the resulting document as a whole,
  it is not clear how this document relates to the different states that the document had until it reached its final state.
  For example, a document can have assigned a <code>dct:created</code> date and a <code>dct:issued</code> date. According to
  the PROV ontology, the activity of issuing a document involves two different states of the document: the document beffore it was issued
@@ -844,26 +840,38 @@
 </p><p>
     1) Create new instances of entities, typically as blank nodes, that are all related to the original
 	document by means of <code>prov:specializationOf</code>. This leads to bloated and not very intuitive data models, e.g. think
-	about the translation of a single <code>dct:creator</code> statement, where you would expect to somehow find some activity and 
+	about the translation of a single <code>dct:publisher</code> statement, where one would expect to find some activity and 
 	agent that are directly related to the document (as in <a href="#figure_mapping_example">Figure 1</a>).
-</p><p>	
-    2) Use the original document as the instance that is used as <code>prov:Entity</code>. The implications regarding
-	the semantics of a <code>prov:Activity</code> are not yet totally clear, however, it contradicts the above mentioned definition
-	to have an activity that uses an entity and generates the same entity. If an activity actually generates an entity,
-	it is semantically incorrect to have several activities that all generate the same entity at different points in time.
-	<!--<b>This has to be investigated and discussed further. For references, see PROV-DM Generation, PROV-DM Derivation,
-	PROV-O Activity</b>.-->
 </p><p>
 	<div id = "figure_mapping_example" class="figure" style="text-align: center;">
-	<img src="img/ComplexMapping.png"></img>
+	<img src="img/example1.png"></img>
 	<div style="text-align: center;">
-<a href="#figure_mapping_example">Figure 1</a>. A mapping example	
+	<a href="#figure_mapping_example">Figure 1</a>. A mapping example creating blank nodes for each state of the resource.	
 	</div>
 	</div>
 </p><p>	
-As the first option is the more conservative one with respect to the underlying semantics, our proposal is to use
- it in for the context-free mapping. We will use blank nodes, although any naming mechanism could be provided if necessary,
-leaving the conflating of nodes to the clean-up phase. Here, we can deal with more specific questions like the following:
+    2) Use the original document as the instance used as <code>prov:Entity</code>. However, having an activity that uses 
+	an entity and generates the same entity is not compliant with the PROV constraints [<a href="#bib-Constraints">PROV-CONSTRAINTS</a>]. 
+	Also, if an activity actually generates an entity,
+	it is semantically incorrect to have several activities that generate the same entity at different points in time.
+	<!--<b>This has to be investigated and discussed further. For references, see PROV-DM Generation, PROV-DM Derivation,
+	PROV-O Activity</b>.
+	Removed this from the text above: The implications regarding
+	the semantics of a <code>prov:Activity</code> are not yet totally clear, h
+	-->	
+</p><p>
+	<div id = "figure_mapping_example_conflating" class="figure" style="text-align: center;">
+	<img src="img/mapping-example - conflating.png"></img>
+	<div style="text-align: center;">
+	<a href="#figure_mapping_example_conflating">Figure 2</a>. A mapping example conflating blank nodes in the same resource. The used and generated resource have the same identifier.	
+	</div>
+	</div>
+</p><p>	
+As the first option is the most conservative with respect to the underlying semantics, it has been chosen as the guideline for the context-free mapping. 
+Blank nodes are used for the mapping, although any naming mechanism could be provided if necessary,
+leaving the conflating of nodes to the clean-up phase. 
+<!-- I comment this because it repeats what it has already been stated
+Here, we can deal with more specific questions like the following:
 </p><p>
     How do we reduce the number of specializations, e.g., by stating that the specialization that is generated by activity
 	1 is the same entity that is used by activity 2?
@@ -873,12 +881,13 @@
 	Depending on the underlying data, this can be the entity that is identified by the URI of the original document. However,
 	we have to be careful to avoid cycles in the provenance we produce. For now, this remains undecided.
 </p>
+-->
 </div>
 <div id="mappings" class="section"> 
 <h3><span class="secno">2.3 </span>Direct mappings</h3>
 <p>
-Direct mappings can particularly be provided for classes and the “shortcuts”, i.e. the direct relationships in PROV between
- an entity and an agent or an entity and a date.
+<!--Direct mappings can particularly be provided for classes and the “shortcuts”, i.e. the direct relationships in PROV between
+ an entity and an agent or an entity and a date.-->
 The direct mappings provide basic interoperability using the integration mechanisms of RDF. By means
  of RDFS-reasoning, any PROV application can at least make some sense from Dublin Core data. The direct mappings also
  contribute to the formal definition of the vocabularies by translating them to PROV.</p>
@@ -888,7 +897,7 @@
 </p>	
 <p>
 <a href="#list_of_direct_terms">Table 3</a> and <a href="#list_of_direct_mappings2">Table 4</a> provide the detailed mapping plus the rationale for each term.
- Those mappings for which the group could not find consensus have been dropped. For more information see the
+ For more information see the
  <a href="#list_of_excluded_terms">list of terms left out of the mapping</a>. 
 </p><p>
 <div id="list_of_direct_terms" ALIGN="center">
@@ -905,7 +914,7 @@
 		<td><b>dct:Agent</b></td>
 		<td>owl:equivalentClass</td>
 		<td> prov:Agent.</td>
-		<td>Both <code>dct:Agent</code> and <code>prov:Agent</code> refer to the same thing: a resource that has the power to act (which then has responsability for an activity)</td>
+		<td>Both <code>dct:Agent</code> and <code>prov:Agent</code> refer to the same concept: a resource that has the power to act (which then has responsibility for an activity)</td>
 	</tr>
 	<tr>
 		<td><b>dct:rightsHolder</b></td>
@@ -937,15 +946,9 @@
 		<td><b>dct:isVersionOf</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:wasDerivedFrom</td>
-		<td><code>dct:isVersionOf</code> refers to "a related resource to which the current resource is a version, edition or adaptation". Hence we can
-		conclude that the current resource has been derived from the original one.</td>
-	</tr>
-	<tr>
-		<td><b>dct:hasVersion</b></td>
-		<td>rdfs:subPropertyOf</td>
-		<td>prov:hadDerivation</td>
-		<td>Inverse property of the previous one.</td>
-	</tr>
+		<td><code>dct:isVersionOf</code> refers to "a related resource to which the current resource is a version, edition or adaptation". 
+		Hence the current resource has been derived from the original one.</td>
+	</tr>	
 	<tr>
 		<td><b>dct:isFormatOf</b></td>
 		<td>rdfs:subPropertyOf</td>
@@ -962,16 +965,9 @@
 		<td><b>dct:replaces</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:wasInfluencedBy</td>
-		<td>This mapping is not straightforward. There is a relation between 2 resources when the former replaces the latter, but it is not necessarily
-		derivation, revision, specification or alternate. Since we want to state some influence but we don't find any specific relation that matches
-		the dct term, we propose to map it to the abstract term <code>prov:wasInfluencedBy</code></td>
-	</tr>
-	<tr>
-		<td><b>dct:isReplacedBy</b></td>
-		<td>rdfs:subPropertyOf</td>
-		<td>prov:influenced</td>
-		<td>Inverse property of the previous one</td>
-	</tr>
+		<td>This mapping is not straightforward. There is a relation between two resources when the former replaces the latter, but it is not necessarily
+		derivation, revision, specification or alternate. Thus, the term is mapped to <code>prov:wasInfluencedBy</code>.</td>
+	</tr>	
 	<tr>
 		<td><b>dct:source </b></td>
 		<td>rdfs:subPropertyOf</td>
@@ -983,7 +979,7 @@
 		<td><b>dct:type</b></td>
 		<td>owl:equivalentProperty</td>
 		<td>prov:type</td>
-		<td>Both properties refer to the same thing: the nature of the resource (or genre).</td>
+		<td>Both properties relate a resource to the same kind of value: the nature (or genre) of the resource.</td>
 	</tr>
 	<tr>
 		<td><b>dct:created</b></td>
@@ -1011,19 +1007,19 @@
 		<td><b>dct:dateCopyRighted</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:generatedAtTime</td>
-		<td>See previous property</td>
+		<td>See <code>dct:dateAccepted</code></td>
 	</tr>
 	<tr>
 		<td><b>dct:dateSubmitted</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:generatedAtTime</td>
-		<td>See previous property</td>
+		<td>See <code>dct:dateAccepted</code></td>
 	</tr>
 	<tr>
 		<td><b>dct:modified</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:generatedAtTime</td>
-		<td>See previous property</td>
+		<td>See <code>dct:dateAccepted</code></td>
 	</tr>
 	</tbody>
 </table>
@@ -1031,11 +1027,11 @@
 With the direct mapping, a metadata record such as <a href="#example1">example 1</a> will infer that 
 the resource was <code>prov:generatedAtTime</code> at two different times. Although this may seem inconsistent, it is supported by PROV and it is due the difference 
 between Dublin Core and PROV resources: while the former conflates more than one version or "state" of the resource in a single entity, the latter
-proposes to separate all of them. Thus, the mapping would produce "scruffy" provenance (i.e., valid provenance which will not comply with all the PROV consraints [<a href="#bib-Constraints">PROV_CONSTRAINTS]</a>)
+proposes to separate all of them. Thus, the mapping would produce provenance that complies with the current definition of entity but
+does not comply with all the PROV constraints [<a href="#bib-Constraints">PROV-CONSTRAINTS</a>].
 </p>
 <p>
-We end the direct mapping with the properties that have been found to be superproperties of certain prov concepts. The summary can be seen below in 
-<a href="#list_of_direct_mappings2">Table 4</a>
+Some properties have been found to be superproperties of certain PROV concepts. The summary can be seen below in <a href="#list_of_direct_mappings2">Table 4</a>.
 <!-- SHOULD ADD THIS FOR EACH
 <pre rel="prov:wasQuotedFrom" resource="http://dvcs.w3.org/hg/prov/raw-file/tip/examples/eg-24-prov-o-html-examples/rdf/create/rdf/property_qualifiedAttribution.ttl"
 -->
@@ -1068,7 +1064,35 @@
 		</tbody>
 	</table>
 </div>
+<p>
+<a href="#list_of_direct_mappings_no_prov_core">Table 5</a> enumerates the mapping of the <code>dct</code> properties that map to inverse relationships in PROV. These
+have been separated in a different table because they don't belong to the core of PROV.
 </p>
+<div id="list_of_direct_mappings_no_prov_core" ALIGN="center">
+	<table>
+		<caption> <a href="#list_of_direct_mappings_no_prov_core"> Table 5:</a> Direct mappings to the PROV terms not included in the core </caption>
+		<tbody>
+		<tr>
+			<th>DC Term</th>
+			<th>Relation</th>
+			<th>PROV Term</th>
+			<th>Rationale</th>
+		</tr>
+		<tr>
+			<td><b>dct:hasVersion</b></td>
+			<td>rdfs:subPropertyOf</td>
+			<td>prov:hadDerivation</td>
+			<td>Inverse property of <code>dct:isVersionOf</code>.</td>
+		</tr>
+		<tr>
+			<td><b>dct:isReplacedBy</b></td>
+			<td>rdfs:subPropertyOf</td>
+			<td>prov:influenced</td>
+			<td>Inverse property of <code>dct:replaces</code></td>
+	</tr>
+		</tbody>
+	</table>
+</div>
 
 <!--<p>
 Under discussion (dropped out from the inicial draft):
@@ -1089,8 +1113,7 @@
 <div id="refinements" class="section"> 
 <h3><span class="secno">2.4 </span>PROV refinements</h3>
 <p>
-To properly reflect the meaning of the Dublin Core terms, we need refinements,
- i.e. more specific subclasses:
+To properly reflect the meaning of the Dublin Core terms, more specific subclasses are needed:
 </p><p>
 <pre class="code">
  prov:PublicationActivity      rdfs:subClassOf     prov:Activity .
@@ -1107,7 +1130,7 @@
  
 <p>
 Custom refinements of the properties should be omitted as they would be identical to the Dublin Core terms. If these more
- specific properties are wanted, the Dublin Core terms should be used directly, according to the direct mappings presented in section 2.3. 
+ specific properties are needed, the Dublin Core terms should be used directly, according to the direct mappings presented in section 2.3. 
 </p>
 </div>
 <div id="complex_mappings" class="section">
@@ -1121,7 +1144,7 @@
 <p>
 In this category, we have four terms: contributor, creator, publisher, and rightsHolder. The former three
  can be mapped with the same pattern, similar to the one presented in <a href="#figure_mapping_example">Figure 1</a>.
- The only main changes changes are the roles and activities involved in each term.
+ The only changes required are the roles and activities involved for each term.
  
  <p>
  In the text below, variables ?document and ?agent are set to different matching values depending
@@ -1129,56 +1152,63 @@
  where the variables are placeholders that are filled with the values found in the data.
  The mapping corresponds to the graph in <a href="#figure_mapping_example">Figure 1</a> (with small changes
 for creator and rightsHolder). With this mapping,
- the difference in the complexity becomes obvious. A lot of blank nodes are created, so a subsequent clean-up phase that relates them and provides stable
+ the difference in the complexity becomes obvious. Many blank nodes are created, so a subsequent clean-up phase that relates them and provides stable
  URIs for the entities is required. Depending on the implementation, URIs can also be coined here for every specialization.
  Sometimes, URIs for the specializations are also available and simply not exposed to the Dublin Core record.
- Our implementation is only an example that works conservatively, i.e., we assume that there is no further
- information about the identity of specializations available. 
+ The implementation proposed in this document is an example that works conservatively. The assumption is that no further
+ information about the identity of the specializations is available. 
 
 </p>
  
 </p><p>
 <h5 id="term_creator"><span class="secno">2.5.1.1 </span>dct:creator</h5>
-The creator is the agent associated with role CreatorRole in the CreationActivity that created a specialization of the entity (?document). 
-We avoid using the Dublin Core entity because it may have other statements referring to it (about publishing, licensing, modifying it, etc.). 
+A creator is the agent associated with role CreatorRole in the CreationActivity that created a specialization of the entity (?document).  
 <pre class="code">
- CONSTRUCT {
-    ?document a prov:Entity .
+  CONSTRUCT {
+    ?document a prov:Entity ;
 		prov:wasAttributedTo ?agent.
+		
     ?agent a prov:Agent .
-    _:activity a prov:Activity, prov:CreationActivity ;
+	
+    _:activity a prov:Activity, dcprov:CreationActivity ;
 		prov:wasAssociatedWith ?agent;
 		prov:qualifiedAssociation [
 			a prov:Association;
 			prov:agent ?agent;
-			prov:hadRole prov:CreatorRole .
+			prov:hadRole dcprov:CreatorRole
 		].
+		
     _:resulting_entity a prov:Entity ;
 		prov:specializationOf ?document ;
 		prov:wasGeneratedBy _:activity ;
 		prov:wasAttributedTo ?agent.		
+		
  } WHERE {
     ?document dct:creator ?agent.
  }
 </pre>
 <h5 id="term_contributor"><span class="secno">2.5.1.2 </span>dct:contributor</h5>
-In the same way, publisher and contributor can be mapped, only the roles and activities change:
+Contributor is mapped following the previous pattern. Only the roles and activities change:
 <pre class="code">
  CONSTRUCT {
-    ?document a prov:Entity .
+    ?document a prov:Entity ;
 		prov:wasAttributedTo ?agent .
+		
     ?agent a prov:Agent .
+		
     _:activity a prov:Activity, prov:ContributionActivity ;
 		prov:wasAssociatedWith ?agent ;
 		prov:qualifiedAssociation [ 
 			a prov:Association ;
 			prov:agent ?agent ;
 			prov:hadRole prov:ContributorRole .
-		]
-    _:resulting_entity a prov:Entity ;
+		].
+		
+    _:resulting_entity a prov:Entity ;		
 		prov:specializationOf ?document ;
 		prov:wasGeneratedBy _:activity ;
 		prov:wasAttributedTo ?agent .
+		
  } WHERE {
     ?document dct:contributor ?agent .
  }
@@ -1186,25 +1216,35 @@
 <h5 id="term_publisher"><span class="secno">2.5.1.3 </span>dct:publisher</h5>
 In case of publication, a second specialization representing the entity before the publication is necessary: 
 <pre class="code">
- CONSTRUCT {
-    ?document a prov:Entity .
+  CONSTRUCT {
+    ?document a prov:Entity ;
 		prov:wasAttributedTo ?agent .
+		
     ?agent a prov:Agent .
+	
+    _:used_entity a prov:Entity;
+		prov:specializationOf ?document.
+		
     _:activity a prov:Activity, prov:PublicationActivity ;
+		prov:used _:used_entity;
 		prov:wasAssociatedWith ?agent ;
 		prov:qualifiedAssociation [ 
 			a prov:Association ;
 			prov:agent ?agent ;
 			prov:hadRole prov:PublisherRole .
-		]
+		].
+		
     _:resulting_entity a prov:Entity ;
-		prov:specializationOf ?document ;
+		prov:specializationOf ?document ;		
+		prov:wasDerivedFrom _:used_entity ;
 		prov:wasGeneratedBy _:activity ;
 		prov:wasAttributedTo ?agent .
+		
  } WHERE {
     ?document dct:publisher ?agent .
  }
 </pre>
+<!--
 <h5 id="term_rights_holder"><span class="secno">2.5.1.4 </span>dct:rightsHolder</h5>
 The rightsHolder concept mapping is slightly different. Here we propose to omit the activity and just add the rights holder to the entity by means of
  <code>prov:wasAttributedTo</code>. This mapping could actually be omitted as the statements can be inferred from the direct mapping.
@@ -1217,21 +1257,19 @@
   ?document dct:rightsHolder      ?agent .
  }
 </pre>
+-->
 </p>
+
 </div>
 <div id="entity_date_mappings">
 <h4><span class="secno">2.5.2 </span>Entity-Date mappings (When)</h4>
 <p>
 The dates often correspond with a who-property, e.g., creator and created or publisher and issued.
  Therefore, they lead to similar statements, only providing a date instead of an agent associated with the activity.
- We use issued as an example here, because from issued, two specializations can be inferred: something must be available
- before it can be published.
 </p>
 <p>
 When using Dublin Core terms, it is usual to see that a resource is annotated with several dc assertions like creator, publisher,
- issued, date, etc. Therefore if we assume that each date corresponds to the generation date by an activity (creationActivity,
- publishingActivity, etc.) then we can't say that all those activities generated the resource. Instead, in order to generate "proper"
- provenance records, we say that all those activities generated an entity which for which the resource is a specialization.
+ issued, date, etc. In this phase of the mapping each term is treated independently.
 </p>
 <h5 id="term_created"><span class="secno">2.5.2.1 </span>dct:created</h5> 
 </p><p><pre class="code">
@@ -1284,7 +1322,7 @@
 </p>
 <h5 id="term_modified"><span class="secno">2.5.2.3 </span>dct:modified</h5> 
 <p>
-As seen with the following terms, most entity/date properties will have a similar structure.
+As seen with the previous terms, most entity/date properties will have a similar structure.
 </p><p><pre class="code"> 
  CONSTRUCT{
  ?document             a                         prov:Entity .
@@ -1460,7 +1498,7 @@
     OPTIONAL { ?document2 dct:references ?document1 .}
  }
  
-</pre></p><p>-->
+</pre></p><p>
 </p>
 <h5 id="term_replaces"><span class="secno">2.5.3.3 </span>dct:replaces / dct:isReplacedBy</h5>
 <p><pre class="code">
@@ -1488,19 +1526,18 @@
 -->
 </div>
 <div id="cleanup" class="section">
-<h2><span class="secno">2.6 </span>Cleanup</h2>
+<h3><span class="secno">2.5.3 </span>Cleanup</h3>
 <p>
-The clean-up phase depends on the intensions of the implementor and the answer to the question,
- <i>what is the described resource (<code>ex:document1</code>)?</i> in the resulting provenance data. The approach presented in this document 
+The clean-up phase depends on how implementors interpret the described resources. The approach presented in this document 
  is conservative and it leads to the proliferation of blank nodes. Blank nodes could be renamed to specific identifiers
  by the implementor, in order to avoid obtaining additional blank nodes when reapplying the construct queries presented
  in the previous section.</p>
  <p> Providing a set of rules to conflate the blank nodes is not in the scope of this document. However, the group has
- created a list of suggestions for implementors with ideas on how this could be achieved:</p>
- <p>1)<b>Conflate properties referring to the same state of the resource</b>: In Dublin Core certain properties complement each other (e.g.,
+ created a list of suggestions for implementors with proposals on how this could be achieved:</p>
+ <p>1)<b> Conflate properties referring to the same state of the resource</b>: In Dublin Core certain properties complement each other (e.g.,
  creator and created, publisher and issued, modified and contributor, etc.). By combining some of the queries, we could group
  some of the records and create more complete PROV assertions.</p>
- <p>Example: Combining created and creator:
+ <p>The example below shows how a modification to the previous pattern would conflate the blank nodes for <code>dct:creator</code> and <code>dct:created</code> properties: 
  <pre class="code">
  CONSTRUCT{
  ?document               a                         prov:Entity .
@@ -1508,17 +1545,16 @@
  _:activity              a                         prov:Activity, prov:CreationActivity.
                          prov:wasAssociatedWith    ?agent
                          prov:qualifiedAssociation [
-			                  a prov:Association;
-			                  prov:agent ?agent;
-			                  prov:hadRole prov:CreatorRole .
+			       a prov:Association;
+			       prov:agent ?agent;
+			       prov:hadRole prov:CreatorRole .
                          ]
 			  
  # The “output”
  _:created_entity      a                         prov:Entity ;
                        prov:specializationOf     ?document ;
                        prov:wasGeneratedBy       _:activity ;
-                       prov:wasGeneratedAtTime   ?date;
-                       prov:wasDerivedFrom       _:used_entity ;
+                       prov:wasGeneratedAtTime   ?date;                 
                        prov:qualifiedGeneration  [ 
                               a prov:Generation ;
                               prov:atTime ?date  ;			
@@ -1529,20 +1565,39 @@
             dct:created  ?date.
  }
  </pre>
+ <a href="#figure_cleanup1">Figure 3</a> shows a graphical representation of the pattern:
+ 
+<div id = "figure_cleanup1" class="figure" style="text-align: center;">
+	<img src="img/cleanup1.png"></img>
+	<div style="text-align: center;">
+	<a href="#figure_cleanup1">Figure 3</a>. Gathering complementing properties to conflate blank nodes.	
+	</div>
+</div>
+ 
  </p>
- <p>2) Another solution would be to <b>sort all the activities according to their date</b>, if known, and conflate the blank
-nodes result of one activity and the input of the subsequent activity, in case they are both specializations of the same entity. </p>
-<p>3) Finally, another simpler idea is to <b>ignore all the specializations of ex:document1 and use the resource itself</b>. This solution
+ <p>2) Another solution is to <b>sort all the activities according to their date</b>, if known, and conflate the blank
+node resulting from one activity with the input of the subsequent activity, provided that both are specializations of the same entity. 
+<a href="#figure_cleanup2">Figure 4</a> shows a graphical example with two different activities (creation and publication) that happened at different
+points in time. Instead of creating different blank nodes for the respective usage and generation, both activities share the same
+blank node (_:created_entity).
+<div id = "figure_cleanup2" class="figure" style="text-align: center;">
+	<img src="img/cleanup2.png"></img>
+	<div style="text-align: center;">
+	<a href="#figure_cleanup2">Figure 4</a>. Sorting the activities by date to conflate blank nodes.	
+	</div>
+</div>
+</p>
+<p>3) Finally, another solution is to <b>ignore all the specializations of ex:document1 and use the resource itself</b>. This solution
 would avoid the majority of the blank nodes, linking all the activities with the resource. However, the results would be confusing in
 case there are several dublin core statements describing the same resource (like publisher and creator), since most of the
 activities would use and generate the same resource at different times (all the provenance of the different versions of the resource
-would be conflated in the same entity).
+would be conflated in the same entity). A graphical representation of an example can be seen in <a href="#figure_mapping_example_conflating">Figure 2</a>.
 </p>
 <p>
 </p>
 </div>
 <div id="list_of_excluded_terms" class="section">
-<h2><span class="secno">2.7. </span>List of terms excluded from the mapping</h2>
+<h2><span class="secno">2.6. </span>List of terms excluded from the mapping</h2>
 <p>
 	<table>
 	<caption> <a href="#list_of_excluded_terms"> Table 6:</a> List of terms excluded from the mapping </caption>
@@ -1728,7 +1783,7 @@
 <!-- OddPage -->
 <h2><span class="secno">A. </span>Acknowledgements</h2>
    <p>
-    We would like to thank Antoine Isaac, Timothy Lebo, Simon Miles, and Satya Sahoo for their feedback. 
+    We would like to thank Antoine Isaac, Timothy Lebo, Simon Miles, Satya Sahoo and Ivan Herman for their feedback. 
    </p>
   </div>
author	dgarijo
	Mon, 08 Oct 2012 23:07:04 +0200
changeset 4513	37f9692fb7a0
parent 4512	f7844e3de04b
child 4514	530647e86e4a