prov: changeset 4502:a047bb6e7ec9

--- a/dc-note/Overview.html	Sat Sep 29 03:15:23 2012 +0200
+++ b/dc-note/Overview.html	Mon Oct 01 19:19:32 2012 +0200
@@ -599,13 +599,13 @@
 								<li class="tocline"><a href="#term_dateCopyRighted" class="tocxref"><span class="secno">2.5.2.5 </span>dct:dateCopyRighted</a></li>
 								<li class="tocline"><a href="#term_dateSubmitted" class="tocxref"><span class="secno">2.5.2.6 </span>dct:dateSubmitted</a></li>
 							</ul>
-						<li class="tocline"><a href="#entity_entity_mappings" class="tocxref"><span class="secno">2.5.3 </span>Entity-Entity (How) mappings</a></li>
+						<!--<li class="tocline"><a href="#entity_entity_mappings" class="tocxref"><span class="secno">2.5.3 </span>Entity-Entity (How) mappings</a></li>
 							<ul class="toc">
 								<li class="tocline"><a href="#term_has_Version" class="tocxref"><span class="secno">2.5.3.1 </span>dct:isVersionOf/hasVersion</a></li>
 								<li class="tocline"><a href="#term_has_Format" class="tocxref"><span class="secno">2.5.3.2 </span>dct:isFormatOf/hasFormat</a></li>								
 								<li class="tocline"><a href="#term_replaces" class="tocxref"><span class="secno">2.5.3.3 </span>dct:replaces/replacedBy</a></li>
 								<li class="tocline"><a href="#term_source" class="tocxref"><span class="secno">2.5.3.4 </span>dct:source</a></li>
-							</ul>
+							</ul>-->
 					</ul>
 				<li class="tocline"><a href="#cleanup" class="tocxref"><span class="secno">2.6 </span>Cleanup</a></li>
 				<li class="tocline"><a href="#list_of_excluded_terms" class="tocxref"><span class="secno">2.7 </span>List of the terms excluded from the mapping</a></li>
@@ -650,14 +650,14 @@
 </pre>
 <p>
 Clearly not all metadata statements deal with provenance. 
-For instance, dct:title, dct:subject and dct:format are descriptions of the resource ex:document1. 
+For instance, <code>dct:title</code>, <code>dct:subject</code> and <code>dct:format</code> are descriptions of the resource <code>ex:document1</code>. 
 They do not provide any information how the resource was created or modified in the past.
- On the other hand, some statements imply provenance-related information, e.g., dct:creator 
+ On the other hand, some statements imply provenance-related information, e.g., <code>dct:creator</code> 
  implies that the document has been created and refers to the author. Similarly, the existence 
- of the dct:issued date implies that the document has been published. This information is redundantly 
- implied by the dct:publisher statement as well. Finally, dct:replaces relates 
- our document to another document ex:doc2 and it can be inferred that this document had probably
- some kind of influence on our document ex:document1, which also gives us some provenance related information.
+ of the <code>dct:issued</code> date implies that the document has been published. This information is redundantly 
+ implied by the <code>dct:publisher</code> statement as well. Finally, <code>dct:replaces</code> relates 
+ our document to another document <code>ex:doc2</code>, from which it can be inferred that this document probably had
+ some kind of influence on our document <code>ex:document1</code>, which also gives us some provenance-related information.
 </p><p>
 This is a pattern that applies generally to metadata, i.e., we can distinguish 
 description metadata and provenance metadata. To be more precise, we define provenance 
@@ -686,7 +686,7 @@
 According to our classification, there are 25 terms out of 55 that can be considered as provenance related.
  As a next step, we consider sub-categories of the provenance related terms as follows:
 </p><p>
-<b>Who?</b> (contributor, creator, publisher, rightsHolder) Category that includes all properties that have the range dct:Agent,
+<b>Who?</b> (contributor, creator, publisher, rightsHolder) Category that includes all properties that have the range <code>dct:Agent</code>,
  i.e., a resource that acts or has the power to act. The contributor, creator, and publisher clearly influence
  the resource and therefore are important for its origin. This is not immediately clear for the rightsHolder,
  but as ownership is considered the important provenance information for artworks, we have decided to include it in this category.
@@ -826,28 +826,29 @@
  a different activity later in time, it can be assumed that both are the same entities, if the second activity directly follows the first
  activity. These conflations and other clean-up steps are performed separately, as there are several possibilities to perform them.
 </p><p>
-Clean-up. Based on the context-free mapping, reasoning patterns can be employed to clean-up the data, e.g. by conflating blank nodes
- that are actually the same or by identifying a final specialization of the original document that is identical to this document.
+Clean-up. The context-free mapping produces blank nodes for each <code>dct</code> statement. The number of blank nodes can be reduced
+by applying reasoning patterns to clean up the data, for instance by conflating nodes that are actually the same (e.g., an issued document could
+be the same as the created document).
 </p>
 <p>
 </div>
 <div id="entities_in_dc" class="section">
 <h3><span class="secno">2.2 </span>What is ex:document1? Entities in Dublin Core</h3>
 <p>
-Consider the example metadata record above (<a href="#example1">example 1)</a>. As a DC metadata record describes the resulting document as a whole,
+Consider the example metadata record above (<a href="#example1">example 1</a>). As a Dublin Core metadata record describes the resulting document as a whole,
  it is not clear how this document relates to the different states that the document had until it reached its final state.
- For example, a document can have assigned a dct:created date and a dct:issued date. The activity of issuing a document
- does not necessarily change the document, but regarding the PROV ontology, there are two different specializations of
- this document before and after the issuing activity, distinguishable by the property of the document that states if
- the document was issued. Generally, there are two possibilities to deal with this issue:</p>
+ For example, a document can be assigned both a <code>dct:created</code> date and a <code>dct:issued</code> date. According to
+ the PROV ontology, the activity of issuing a document involves two different states of the document: the document before it was issued
+ and the issued document. Each of these states corresponds to a different specialization of the document, even if the document
+ has not changed. Generally, there are two possibilities to deal with this issue:</p>
 </p><p>
-    1) We can always create new instances of entities, typically as blank nodes, that all are related to the original
-	document by means of prov:specializationOf. This leads to bloated and not very intuitive data models, e.g. think
-	about the translation of a single dct:creator statement, where you would expect to somehow find some activity and 
+    1) Create new instances of entities, typically as blank nodes, that are all related to the original
+	document by means of <code>prov:specializationOf</code>. This leads to bloated and not very intuitive data models, e.g. think
+	about the translation of a single <code>dct:creator</code> statement, where you would expect to somehow find some activity and 
 	agent that are directly related to the document (as in <a href="#figure_mapping_example">Figure 1</a>).
 </p><p>	
-    2) We can always use the original document as the instance that is used as prov:Entity. The implications regarding
-	the semantics of a prov:Activity are not yet totally clear, however, it contradicts the above mentioned definition
+    2) Use the original document itself as the <code>prov:Entity</code>. The implications regarding
+	the semantics of a <code>prov:Activity</code> are not yet entirely clear; however, this contradicts the above-mentioned definition
 	to have an activity that uses an entity and generates the same entity. If an activity actually generates an entity,
 	it is semantically incorrect to have several activities that all generate the same entity at different points in time.
 	<!--<b>This has to be investigated and discussed further. For references, see PROV-DM Generation, PROV-DM Derivation,
@@ -867,8 +868,8 @@
     How do we reduce the number of specializations, e.g., by stating that the specialization that is generated by activity
 	1 is the same entity that is used by activity 2?
 </p><p>	
-    How do we relate the specializations to ex:document1? We could create two entities based on the actual creation activity:
-	ex:document1 and a first specialization. We could further declare the last produced specialization as the same entity as ex:document1.
+    How do we relate the specializations to <code>ex:document1</code>? We could create two entities based on the actual creation activity:
+	<code>ex:document1</code> and a first specialization. We could further declare the last produced specialization as the same entity as <code>ex:document1</code>.
 	Depending on the underlying data, this can be the entity that is identified by the URI of the original document. However,
 	we have to be careful to avoid cycles in the provenance we produce. For now, this remains undecided.
 </p>
@@ -904,7 +905,7 @@
 		<td><b>dct:Agent</b></td>
 		<td>owl:equivalentClass</td>
 		<td> prov:Agent.</td>
-		<td>Both dct:Agent and prov:Agent refer to the same thing: a resource that has the power to act (which then has responsability for an activity)</td>
+		<td>Both <code>dct:Agent</code> and <code>prov:Agent</code> refer to the same thing: a resource that has the power to act (which then has responsibility for an activity)</td>
 	</tr>
 	<tr>
 		<td><b>dct:rightsHolder</b></td>
@@ -936,7 +937,7 @@
 		<td><b>dct:isVersionOf</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:wasDerivedFrom</td>
-		<td>dct:isVersionOf refers to "a related resource to which the current resource is a version, edition or adaptation". Hence we can
+		<td><code>dct:isVersionOf</code> refers to "a related resource to which the current resource is a version, edition or adaptation". Hence we can
 		conclude that the current resource has been derived from the original one.</td>
 	</tr>
 	<tr>
@@ -949,13 +950,13 @@
 		<td><b>dct:isFormatOf</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:alternateOf</td>
-		<td>dct:isFormatOf refers to another resource which is the same but in another format. Thus the mapping is straightforward to prov:alternateOf</td>
+		<td><code>dct:isFormatOf</code> refers to another resource which is the same but in another format. Thus it maps straightforwardly to <code>prov:alternateOf</code>.</td>
 	</tr>
 	<tr>
 		<td><b>dct:hasFormat</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:alternateOf</td>
-		<td> See rationale for dct:isFormatOf</td>
+		<td> See rationale for <code>dct:isFormatOf</code></td>
 	</tr>
 	<tr>
 		<td><b>dct:replaces</b></td>
@@ -963,7 +964,7 @@
 		<td>prov:wasInfluencedBy</td>
 		<td>This mapping is not straightforward. There is a relation between 2 resources when the former replaces the latter, but it is not necessarily
 		derivation, revision, specification or alternate. Since we want to state some influence but we don't find any specific relation that matches
-		the dct term, we propose to map it to the abstract term prov:wasInfluencedBy</td>
+		the dct term, we propose to map it to the abstract term <code>prov:wasInfluencedBy</code></td>
 	</tr>
 	<tr>
 		<td><b>dct:isReplacedBy</b></td>
@@ -975,28 +976,29 @@
 		<td><b>dct:source </b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:wasDerivedFrom</td>
-		<td>In Dublin Core, dct:source is defined as a "related resource from which the described resource is derived", which matches the notion of derivation
+		<td>In Dublin Core, <code>dct:source</code> is defined as a "related resource from which the described resource is derived", which matches the notion of derivation
 		in PROV-DM ("a transformation of an entity in another")</td>
 	</tr>
 	<tr>
 		<td><b>dct:type</b></td>
 		<td>owl:equivalentProperty</td>
 		<td>prov:type</td>
-		<td>Both properties refer to the same thing: the nature of the resource (or genre). It could be mapped to rdf:type if we map the document against PROV-O</td>
+		<td>Both properties refer to the same thing: the nature of the resource (or genre).</td>
 	</tr>
 	<tr>
 		<td><b>dct:created</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:generatedAtTime</td>
-		<td>dct:created is a property to describe the time of cretion of the entity, which is the time of its generation as well. We have decided
-		to map it as a subclass because the resources in Dublin Core have associated many dates, which could be associated to each of their versions.
-		In this case, we see the creation as the first one, but not necessarily the current version of the resource.</td>
+		<td><code>dct:created</code> is a property used to describe the time of creation of an entity, which corresponds to the time of its generation.
+		The rationale for mapping this property as a subproperty of <code>prov:generatedAtTime</code> is that resources in Dublin Core may have
+		many dates associated with them (creation, modification, issue, etc.), each of which could correspond to a different version of the document.
+		In this case, the creation date is the first date asserted for the document, but it does not necessarily correspond to the current version of the resource.</td>
 	</tr>
 	<tr>
 		<td><b>dct:issued</b></td>
 		<td>rdfs:subPropertyOf</td>
 		<td>prov:generatedAtTime</td>
-		<td>Date when the resource was issued. It is mapped as a subproperty of prov:generatedAtTime because the issued resource is an entity itself,
+		<td>Date when the resource was issued. It is mapped as a subproperty of <code>prov:generatedAtTime</code> because the issued resource is an entity itself,
 		which has been generated at a certain time.</td>
 	</tr>
 	<tr>
@@ -1026,10 +1028,10 @@
 	</tbody>
 </table>
 </div>
-Regarding the dates mappings, we realize that if we have a metadata record such as <a href="#example1">example 1</a>, the direct mapping will infer that 
-the resource was prov:generatedAtTime at two different times. Although this may seem inconsistent, it is supported by PROV and it is due the difference 
+Applying the direct mapping to a metadata record such as <a href="#example1">example 1</a> yields the inference that
+the resource was <code>prov:generatedAtTime</code> at two different times. Although this may seem inconsistent, it is supported by PROV and it is due to the difference 
 between Dublin Core and PROV resources: while the former conflates more than one version or "state" of the resource in a single entity, the latter
-proposes to separate all of them. It would produce "scruffy" provenance (i.e., valid provenance which will not comply with all the PROV consraints [<a href="#bib-Constraints">PROV_CONSTRAINTS]</a>)
+proposes to separate all of them. Thus, the mapping would produce "scruffy" provenance (i.e., well-formed provenance that does not satisfy all the PROV constraints [<a href="#bib-Constraints">PROV_CONSTRAINTS</a>]).
 </p>
 <p>
 We end the direct mapping with the properties that have been found to be superproperties of certain prov concepts. The summary can be seen below in 
@@ -1053,15 +1055,15 @@
 			<td>prov:hadPrimarySource</td>
 			<td>rdfs:subPropertyOf</td>
 			<td><b>dct:source</b></td>
-			<td>It is surprising to see that some terms of Dublin Core are more general than the ones defined in PROV. However the definition of prov:hadPrimarySource
-			("something produced by some agent with direct experience and knowledge about the topic") is more restrictive than dct:source ( "A related resource from which the described resource is derived").</td>
+			<td>It may seem surprising that some terms of Dublin Core are more general than the ones defined in PROV. However, the definition of <code>prov:hadPrimarySource</code>
+			("something produced by some agent with direct experience and knowledge about the topic") is more restrictive than <code>dct:source</code> ("A related resource from which the described resource is derived").</td>
 		</tr>
 		<tr>
 			<td>prov:wasRevisionOf</td>
 			<td>rdfs:subPropertyOf</td>
 			<td><b>dct:isVersionOf</b></td>
-			<td>Similar to the previous property, prov:wasRevisionOf is more restrictive in the sense that it refers to revised version of a resource, while
-			dct:isVersionOf involves versions, editions or adaptations of the original resource.</td>
+			<td>As with the previous property, <code>prov:wasRevisionOf</code> is more restrictive in the sense that it refers to a revised version of a resource, while
+			<code>dct:isVersionOf</code> covers versions, editions or adaptations of the original resource.</td>
 		</tr>
 		</tbody>
 	</table>
@@ -1205,7 +1207,7 @@
 </pre>
 <h5 id="term_rights_holder"><span class="secno">2.5.1.4 </span>dct:rightsHolder</h5>
 The rightsHolder concept mapping is slightly different. Here we propose to omit the activity and just add the rights holder to the entity by means of
- prov:wasAttributedTo. This mapping could actually be omitted as the statements can be inferred from the direct mapping.
+ <code>prov:wasAttributedTo</code>. This mapping could actually be omitted as the statements can be inferred from the direct mapping.
 <pre class="code">
  CONSTRUCT {
   ?document a                     prov:Entity .
@@ -1394,16 +1396,17 @@
 </pre>
 </p>
 </div>
+<!--
 <div id="entity_entity_mappings">
 <h4><span class="secno">2.5.3 </span>Entity-Entity mappings (How)</h4>
 <p>
-Most Dublin Core terms in this category are related to the prov:wasDerivedFrom property.
+Most Dublin Core terms in this category are related to the <code>prov:wasDerivedFrom</code> property.
  They can be mapped directly, but also a complex mapping can be provided. In these cases, a specialty of SPARQL 
  CONSTRUCT queries can be used to deal with the inverse properties in Dublin Core.
 </p>
 <h5 id="term_has_Version"><span class="secno">2.5.3.1 </span>dct:isVersionOf / dct:hasVersion</h5>
 <p>
-I would say that prov:wasDerivedFrom>dct:isVersionOf>prov:wasRevisionOf. Thus:
+I would say that <code>prov:wasDerivedFrom</code>><code>dct:isVersionOf</code>><code>prov:wasRevisionOf</code>. Thus:
 </p><p><pre class="code">
  CONSTRUCT {
     ?document1 a prov:Entity ;
@@ -1425,12 +1428,12 @@
  In essence, these examples sketch the first part of the mapping. As everything is provided as 
  RDF statements or SPARQL CONSTRUCT queries, this mapping can simply applied to arbitrary RDF data by
  adding the statements and the resulting graphs from the queries to the data. 
- </p>-->
+ </p> this should be commented
  <p>
 <h5 id="term_has_Format"><span class="secno">2.5.3.2 </span>dct:isFormatOf / dct:hasFormat</h5>
 </p><p>
 isFormatOf is defined as “A related resource that is substantially the same as the described resource, but in another format”. This
- would map to prov:alternateOf. We don’t know which entities are both of them specializing, but we know that one is an alternate of the other.
+ would map to <code>prov:alternateOf</code>. We don’t know which entity the two resources are specializations of, but we know that one is an alternate of the other.
 </p><p> <pre class="code">
  CONSTRUCT {
     ?document1 a prov:Entity ;
@@ -1482,12 +1485,13 @@
 </pre>
 </p>
 </div>
+-->
 </div>
 <div id="cleanup" class="section">
 <h2><span class="secno">2.6 </span>Cleanup</h2>
 <p>
 The clean-up phase depends on the intensions of the implementor and the answer to the question,
- <i>what is the described resource (ex:document1)?</i> in the resulting provenance data. The approach presented in this document 
+ <i>what is the described resource (<code>ex:document1</code>)?</i> in the resulting provenance data. The approach presented in this document 
  is conservative and it leads to the proliferation of blank nodes. Blank nodes could be renamed to specific identifiers
  by the implementor, in order to avoid obtaining additional blank nodes when reapplying the construct queries presented
  in the previous section.</p>
@@ -1599,7 +1603,7 @@
 	</tr><tr>
 		<td><b id="term_identifier">dct:identifier</b></td> 
 		<td>Descriptive metadata</td>
-		<td>An unambiguous reference on a given context. Note: it could be mapped to the PROV-DM' ID for entities. </td>
+		<td>An unambiguous reference to the resource within a given context.</td>
 	</tr><tr>
 		<td><b id="term_instructionalMethod">dct:instructionalMethod</b></td>
 		<td>Descriptive metadata</td>
@@ -1638,7 +1642,7 @@
 	</tr><tr>
 		<td><b id="term_spatial">dct:spatial</b></td> 
 		<td>Descriptive metadata</td>
-		<td>Spatial characteristics of the content of the resource resource (e.g., the book is about Spain). Thus it can't be mapped to prov:hadLocation.</td>
+		<td>Spatial characteristics of the content of the resource (e.g., the book is about Spain). Thus it cannot be mapped to <code>prov:hadLocation</code>.</td>
 	</tr><tr>
 		<td><b id="term_subject">dct:subject</b></td>
 		<td>Descriptive metadata</td>
@@ -1663,7 +1667,7 @@
 		<td><b id="term_bibliographicCitation">dct:bibliographicCitation</b></td>
 		<td>Descriptive metadata</td>
 		<td>Property that relates the Literal representing the bibliographic citation of the resource to the 
-	actual resource (e.g., :el_Quijote dct:bibliographicCitation "Miguel de Cervantes Saavedra: El Quijote, España").</td>
+	actual resource (e.g., <code>:el_Quijote dct:bibliographicCitation "Miguel de Cervantes Saavedra: El Quijote, España"</code>).</td>
 	</tr><tr>
 		<td id="term_references"><b>dct:references</b></td> 
 		<td> Provenance: How </td>
author	dgarijo
	Mon, 01 Oct 2012 19:19:32 +0200
changeset 4502	a047bb6e7ec9
parent 4498	a7bb257d62f8
child 4503	6992fdf536e7
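The direct mapping in this changeset declares both dct:created and dct:issued as rdfs:subPropertyOf prov:generatedAtTime. As a result, a record carrying both dates ends up with two generation times for the same resource. Below is a minimal sketch of that entailment, under these assumptions: triples are plain (subject, predicate, object) Python tuples, only the RDFS subproperty rule (rdfs7) is applied by hand rather than through an RDF library, and the dates and title are hypothetical placeholders, not the values of example 1.

```python
# Minimal sketch: close a toy graph under the RDFS subproperty rule (rdfs7).
# Triples are plain (subject, predicate, object) tuples. The dates and title
# below are hypothetical placeholders, not the values of example 1.

SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Direct-mapping axioms taken from the dct:created / dct:issued table rows.
MAPPING = {
    ("dct:created", SUBPROPERTY_OF, "prov:generatedAtTime"),
    ("dct:issued", SUBPROPERTY_OF, "prov:generatedAtTime"),
}

# Hypothetical Dublin Core record for ex:document1.
RECORD = {
    ("ex:document1", "dct:created", "2012-01-01"),
    ("ex:document1", "dct:issued", "2012-03-01"),
    ("ex:document1", "dct:title", "A document"),
}


def apply_subproperty_rule(graph):
    """Repeat (p subPropertyOf q) and (s p o) => (s q o) until nothing new is added."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        supers = [(p, q) for (p, rel, q) in closed if rel == SUBPROPERTY_OF]
        for s, p, o in list(closed):
            for sub, sup in supers:
                if p == sub and (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed


def generation_times(graph, entity):
    """Collect every prov:generatedAtTime value inferred for an entity."""
    return sorted(o for (s, p, o) in graph
                  if s == entity and p == "prov:generatedAtTime")


if __name__ == "__main__":
    inferred = apply_subproperty_rule(MAPPING | RECORD)
    print(generation_times(inferred, "ex:document1"))
    # prints ['2012-01-01', '2012-03-01']
```

Both dates survive as prov:generatedAtTime values of ex:document1, which is the "scruffy" outcome described in the changeset. Separating them into distinct entities would require the specialization-based handling discussed in section 2.2, not the direct mapping alone.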