--- a/dc-note/dc-note.html Tue Feb 05 10:44:50 2013 +0100
+++ b/dc-note/dc-note.html Tue Feb 05 15:42:56 2013 +0100
@@ -31,7 +31,10 @@
"DCTERMS":
"<a href=\"http://dublincore.org/documents/dcmi-terms/\"><cite>Dublin Core Terms Vocabulary</cite></a>. "+
"8 December 2010. "+
- "URL: <a href=\"http://dublincore.org/documents/dcmi-terms/\">http://dublincore.org/documents/dcmi-terms/</a>"
+ "URL: <a href=\"http://dublincore.org/documents/dcmi-terms/\">http://dublincore.org/documents/dcmi-terms/</a>",
+ "RDFS":
+ "Dan Brickley; Ramanathan V. Guha. <a href=\"http://www.w3.org/TR/2004/REC-rdf-schema-20040210/\"><cite>RDF Vocabulary Description Language 1.0: RDF Schema</cite></a>. 10 February 2004. W3C Recommendation. "+
+ "URL: <a href=\"http://www.w3.org/TR/2004/REC-rdf-schema-20040210/\">http://www.w3.org/TR/2004/REC-rdf-schema-20040210/</a>"
};
var respecConfig = {
@@ -579,7 +582,6 @@
<section>
<h2>Introduction</h2>
-
<p>
The Dublin Core Metadata Initiative (DCMI, commonly referred to as Dublin Core) [[DCMI]] provides a core metadata vocabulary for simple and generic resource descriptions.
The original element set was created in 1995 and contains 15 broadly-defined elements still in use.
@@ -593,21 +595,69 @@
<code>dc</code> prefix, while <code>dct</code> (or <code>dcterms</code>) is used as prefix for the newer DCMI element set.
</p>
<p>
+ This document defines a mapping between the <code>dct</code> terms and the PROV Ontology (PROV-O) [[PROV-O]], an OWL 2 ontology encoding
+ the PROV Data Model [[PROV-DM]]. In essence, the mapping consists of three parts:
+ </p><p>
+ 1) <b>Direct mappings</b> between terms that can be expressed in the form of subclass or subproperty relationships in RDFS
+ – or equivalent relationships in OWL.
+ </p><p>
+ 2) Definition of new <b>refinements</b> (subclasses or subproperties) of the target vocabulary to reflect the expressiveness of the source vocabulary.
+ </p><p>
+ 3) Provision of <b>complex mappings</b> that create statements in the target vocabulary based on statements in the source vocabulary. Since
+ the mapping produces blank nodes for each <code>dct</code> statement, a clean-up phase with strategies for reducing the blank nodes is also necessary.
+ </p>
+
+ <section>
+ <h3 id ="namespaces">Namespaces</h3>
+ <p>The namespaces used throughout the document are listed in <a href="#ns"> Table 2</a> below:</p>
+ <div id="ns" ALIGN="center">
+ <table>
+ <caption> <a href="#ns"> Table 2</a>: Namespaces used in the document </caption>
+ <tbody>
+ <tr><td><b>Prefix</b></td><td><b>Namespace IRI</b></td><td><b>Definition</b></td></tr>
+ <tr><td>owl</td><td><http://www.w3.org/2002/07/owl#></td><td>The OWL namespace [[OWL2-OVERVIEW]]</td></tr>
+ <tr><td>rdfs</td><td><http://www.w3.org/2000/01/rdf-schema#></td><td>The RDFS namespace [[RDFS]]</td></tr>
+ <tr><td>prov</td><td><http://www.w3.org/ns/prov#></td><td>The PROV namespace [[PROV-DM]]</td></tr>
+ <tr><td>dct</td><td><http://purl.org/dc/terms/></td><td>Dublin Core Terms namespace [[DCTERMS]]</td></tr>
+ <tr><td>ex</td><td><http://example.org></td><td>Application-dependent IRIs</td></tr>
+ </tbody>
+ </table>
+ </div>
+
+ </section>
+
+ <section>
+ <h3 id="how-to-use">How to use this document</h3>
+ <p>
+ ------------>TO DO, methodology here.<------------
+ </p>
+ </section>
+ </div>
+</section>
+
+<section>
+ <h2>Mapping from Dublin Core to PROV</h2>
+ <p>A mapping between Dublin Core Terms and PROV-O has many advantages. First, it can provide valuable insights
+ into the different characteristics of both data models (in particular it explains PROV from a Dublin Core point of view).
+ Second, such a mapping can be used to extract PROV data from the large amount of Dublin Core data available on
+ the Web today. Third, the mapping can translate PROV data to Dublin Core and make it accessible for applications that
+ understand Dublin Core. Finally, the mapping can lower the barrier to entry for PROV adoption. Simple Dublin Core
+ statements can be used as a starting point for PROV data generation.</p>
+ <section>
+ <h3>Provenance in Dublin Core</h3>
+ <p>
DCMI terms hold a lot of provenance information about a resource: <i>when</i> it was affected in the past,
<i>who</i> affected it and <i>how</i> it was affected. The rest of the DCMI terms (description metadata), tell us <i>what</i> was affected.
- There is no direct information in Dublin Core describing <i>where</i> a resource was affected. Such information is usually
- only available for the publication of a resource.
- </p>
- <p>
- <a href="#categories">Table 1</a> classifies the <code>dct</code> terms in these four categories: <i>what?</i>, <i>who?</i>, <i>when?</i> and <i>how?</i>.
+ <a href="#categories">Table 1</a> classifies the <code>dct</code> terms according to these four categories (<i>what?</i>, <i>who?</i>, <i>when?</i> and <i>how?</i>).
Each category corresponds to the question it answers regarding the description or provenance of a given resource.
The classification is by necessity somewhat conservative, as it can be argued that some elements placed in the description metadata terms contain
- provenance information as well, depending on their usage. The categories are further explained below:
+ provenance information as well, depending on their usage. Note that Dublin Core contains no direct information describing
+ <i>where</i> a resource was affected; such information is only available for the publication of a resource. The categories are further explained below:
</p>
<!-- A total of 25 out of 55 terms can be considered provenance related.-->
<p>
- <b> Descriptive Terms (What?):</b> This category contains all the terms describing a resource without refering to its provenance.
- Some examples are the <code> dct:title</code>, <code> dct:abstract</code> or <code>dct:description</code> of a resource, the <code>dct:accessRights</code> that other agents have to
+ <b> Descriptive Terms (What?):</b> This category contains all the terms describing a resource without referring to its provenance (30 out of the 55 terms in total).
+ Some examples are the <code>dct:title</code>, <code>dct:abstract</code> or <code>dct:description</code> of a resource, the <code>dct:accessRights</code> that other agents have to
access the resource, the <code>dct:format</code> in which the resource can be found, etc.
</p>
<p>
@@ -632,16 +682,20 @@
It can be questioned whether a resource changes by
being published or not. Depending on the application, however, the publication can be seen as an action that changes
the state of the resource.
+
+ – at most indirectly, as the validity state can change if a resource is replaced by a new
+ version
-->
<p>
<b>Derivation Terms (How?):</b> This category contains derivation related terms.
- Resources are often derived from other resources. In this case, the original resource becomes part of the provenance
- record of the derived resource. Derivations can be further classified as <code>dct:isVersionOf, dct:isFormatOf, dct:replaces and dct:source</code>.
- <code>dct:references</code> is a weaker relation, but it can be assumed that a referenced resource influenced the described resource
+ When a resource is derived from other resources, those source resources become part of the provenance
+ record of the derived resource. In Dublin Core, derivations can be further classified as versions (<code>dct:isVersionOf</code>),
+ format serializations (<code>dct:isFormatOf</code>), replacements (<code>dct:replaces</code>) and sources of information (<code>dct:source</code>).
+ <code>dct:references</code> is a weaker relation (having a reference to a resource does not always mean that the content is derived from it),
+ but it can be assumed that a referenced resource influenced the described resource
and therefore it is relevant for its provenance. The respective inverse properties do not necessarily contribute to
the provenance of the described resource, e.g., a resource is usually not directly affected by being referenced or
- by being used as a source – at most indirectly, as the validity state can change if a resource is replaced by a new
- version. However, inverse properties belong to the provenance related terms as they can be used to describe the relations
+ by being used as a source. However, inverse properties belong to the provenance-related terms as they can be used to describe the relations
between the resources involved. Finally, licensing and rights are considered part of the provenance of the resource as well,
since they restrict how the resource has been used by its owners.
</p>
@@ -683,7 +737,7 @@
<tr>
<td><b>Provenance</b></td>
<td>How</td>
- <td><a href="#term_isVersionOf">isVersionOf</a>, <a href="#term_hasVersion">"hasVersion</a>, <a href="#term_isFormatOf">isFormatOf</a>, <a href="#term_has_Format">hasFormat</a>, <a href="#term_license">license</a>,
+ <td><a href="#term_isVersionOf">isVersionOf</a>, <a href="#term_hasVersion">hasVersion</a>, <a href="#term_isFormatOf">isFormatOf</a>, <a href="#term_has_Format">hasFormat</a>, <a href="#term_license">license</a>,
<a href="#term_references">references</a>, <a href="#term_isReferencedBy">isReferencedBy</a>, <a href="#term_replaces">replaces</a>, <a href="#term_isReplacedBy">isReplacedBy</a>, <a href="#term_rights">rights</a>,
<a href="#term_source">source</a></td>
</tr>
@@ -691,12 +745,33 @@
</table>
</div>
<p>
- This leaves one very special term: <i>provenance</i>. This term is defined as a "statement of any changes in ownership and
- custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation" [[DCTERMS]],
- which corresponds to the traditional definition of provenance for artworks. Despite being relevant for provenance,
+ This leaves one very special term: <i>provenance</i>. This term is defined as a "statement of any changes in ownership and
+ custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation" [[DCTERMS]],
+ a definition that corresponds to the traditional notion of provenance for artworks. Since this term links the resource to
+ an arbitrary provenance statement about it, it does not fit into any of the aforementioned categories and is out of the scope of this mapping.
+ <!--Despite being relevant for provenance,
this definition may overlap partially with almost half of the DCMI terms, which
- specify concrete aspects of provenance of a resource.
- </p><p>
+ specify concrete aspects of provenance of a resource.-->
+ </p>
+ <!--<p>
+ The mapping is based on the categories presented in <a href="#categories">Table 1</a>, and has been divided
+ The term provenance
+ is kept out of the mapping.
+ </p>-->
+ </section>
+ <section>
+ <h3>What is ex:doc1? Entities in Dublin Core</h3>
+ <p>
+ Consider the example metadata record shown below in <a href="#example1">Example 1</a>. As a <code>dc</code>
+ metadata record describes the resulting document as a whole,
+ it is not clear how this document relates to the different states it went through before reaching its final state.
+ For example, a document may have a <code>dct:created</code> date and a <code>dct:issued</code> date. According to
+ the PROV ontology, the activity of issuing a document involves two different states of the document: the document before it was issued
+ and the issued document. Each of these states corresponds to a different specialization of the document, even if the document
+ has not changed. Generally, there are two approaches to deal with this issue:</p>
+
+ --------------->TO DEAL WITH THIS
+ <p>
An example of a simple metadata record annotated with <code>dct</code> terms can be seen below:
</p><p>
<a href="#example1">Example 1</a>: a simple metadata record:
@@ -717,60 +792,10 @@
implies that the document has been created and refers to an author. Similarly, the existence
of the <code>dct:issued</code> date implies that the document has been published. This information is redundantly
implied by the <code>dct:publisher</code> statement as well. Finally, <code>dct:replaces</code> relates
- the document to another document <code>ex:doc2</code> which had probably
- some kind of influence on <code>ex:doc1</code>.
- </p>
- <h3 id ="namespaces">1.1 Namespaces</h3>
- <p>The namespaces used through the document can be seen in <a href="#ns"> Table 2</a> below:
- <div id="ns" ALIGN="center">
- <table>
- <caption> <a href="#ns"> Table 2</a>: Namespaces used in the document </caption>
- <tbody>
- <tr><td><b>owl</b></td><td><http://www.w3.org/2002/07/owl#></td></tr>
- <tr><td><b>rdfs</b></td><td><http://www.w3.org/2000/01/rdf-schema#></td></tr>
- <tr><td><b>prov</b></td><td><http://www.w3.org/ns/prov#></td></tr>
- <tr><td><b>dct</b></td><td><http://purl.org/dc/terms/></td></tr>
- </tbody>
- </table>
- </div>
+ the document to another document <code>ex:doc2</code> which probably had some kind of influence on <code>ex:doc1</code>.
</p>
- </div>
-</section>
-
-<section>
- <h2>Mapping from Dublin Core to PROV</h2>
- <p>A mapping between Dublin Core Terms and PROV-O has many advantages. First, it can provide valuable insights
- into the different characteristics of both data models (in particular it explains PROV from a Dublin Core point of view).
- Second, such a mapping can be used to extract PROV data from the large amount of Dublin Core data available on
- the Web today. Third, the mapping can translate PROV data to Dublin Core and make it accessible for applications that
- understand Dublin Core. Finally, the mapping can lower the barrier to entry for PROV adoption. Simple Dublin Core
- statements can be used as starting point for PROV data generation. </p>
- <section>
- <h3>Basic considerations </h3>
+
<p>
- Substantially, a complete mapping from Dublin Core to PROV consists of three parts:
- </p><p>
- 1) <b>Direct mappings</b> between terms that can be expressed in form of subclass or subproperty relationships in RDFS
- – or equivalent relationships in OWL.
- </p><p>
- 2) Definition of new <b>refinements</b> (subclasses or subproperties) of the target vocabulary to reflect the expressiveness of the source vocabulary.
- </p><p>
- 3) Provision of <b>complex mappings</b> that create statements in the target vocabulary based on statements in the source vocabulary. Since
- the mapping produces blank nodes for each <code>dct</code> statement, a clean-up phase with strategies for reducing the blank nodes is also necessary.
- </p>
- <p>
- </section>
- <section>
- <h3>What is ex:doc1? Entities in Dublin Core</h3>
- <p>
- Consider the example metadata record shown at the beginning of this document (in <a href="#example1">example 1</a>). As a <code>dc</code>
- metadata record describes the resulting document as a whole,
- it is not clear how this document relates to the different states that the document had until it reached its final state.
- For example, a document may have a <code>dct:created</code> date and a <code>dct:issued</code> date. According to
- the PROV ontology, the activity of issuing a document involves two different states of the document: the document before it was issued
- and the issued document. Each of these states correspond to a different specialization of the document, even if the document
- has not changed. Generally, there are two approaches to deal with this issue:</p>
- </p><p>
1) To create new instances of entities, typically as blank nodes, that are all related to the original
document by means of <code>prov:specializationOf</code>. This leads to bloated and not very intuitive data models, e.g. think
about the translation of a single <code>dct:publisher</code> statement, where anyone would expect to somehow find some activity and