W3C

Dublin Core to PROV Mapping

W3C Working Draft 12 March 2013

This version:
http://www.w3.org/TR/2013/WD-prov-dc-20130312/
Latest published version:
http://www.w3.org/TR/prov-dc/
Latest editor's draft:
http://dvcs.w3.org/hg/prov/raw-file/default/dc-note/dc-note.html
Previous version:
http://www.w3.org/TR/2012/WD-prov-dc-20121211/
Editors:
Daniel Garijo, Universidad Politécnica de Madrid, Spain
Kai Eckert, University of Mannheim, Germany
Contributors:
Simon Miles, King's College London, UK
Craig M. Trim, IBM, USA
Michael Panzer, OCLC Online Computer Library Center, USA

Abstract

This document describes a partial mapping from Dublin Core Terms [DCTERMS] to the PROV-O OWL2 ontology [PROV-O]. A substantial number of terms in the Dublin Core vocabulary provide information about the provenance of the resource. Translating these terms to PROV makes the contained provenance information explicit within a provenance chain. The mapping is expressed partly by direct RDFS/OWL mappings between properties and classes, which can be found here.

Some of the direct mappings can be refined, translating single Dublin Core Terms into an extended representation of the provenance chain. Therefore, refinements of classes defined in PROV are needed to represent specific Dublin Core activities and roles. This set of PROV refinements can be accessed here.

The PROV Document Overview [PROV-OVERVIEW] describes the overall state of PROV, and should be read before other PROV documents.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

PROV Family of Documents

This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of interoperable interchange of provenance information in heterogeneous environments such as the Web. Please consult the [PROV-OVERVIEW] for a guide to reading these documents.

This document was published by the Provenance Working Group as a Working Draft. If you wish to make comments regarding this document, please send them to [email protected] (subscribe, archives). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


1. Introduction

The Dublin Core Metadata Initiative (DCMI) [DCMI] provides a core metadata vocabulary (commonly referred to as Dublin Core) for simple and generic resource descriptions. The original element set (DC elements) was created in 1995 and contains 15 broadly-defined elements still in use. The core elements have no range specification, and arbitrary values can be used as objects. The core elements have been expanded beyond the original fifteen. Existing elements have been refined and new elements have been added. This expanded vocabulary is referred to as "DCMI Terms" (DC terms) and currently consists of 55 properties [DCTERMS].

The use of DC terms is preferred, and the DC elements have been deprecated. The two sets use different namespaces: the original element set is typically referred to with the dc prefix, while dct (or dcterms) is used as the prefix for the DC Terms.

This document defines a mapping between the DC Terms and the PROV Ontology (PROV-O) [PROV-O], which defines an OWL2 Ontology encoding the PROV Data Model [PROV-DM]. The mapping has been designed for several purposes:

  1. Bridge the gap between the DC and PROV communities, in order to provide valuable insights into the different characteristics of both data models.
  2. Help developers to derive PROV data from the large amount of Dublin Core data available on the web, improving interoperability between DC and PROV applications.
  3. Facilitate PROV adoption. Simple Dublin Core statements can be used as a starting point for more complex PROV data generation.

1.1 Namespaces

The namespaces used throughout the document are listed in Table 2 below:

Table 2: Namespaces used in the document
Prefix | Namespace IRI | Definition
owl | <http://www.w3.org/2002/07/owl#> | The OWL namespace [OWL2-OVERVIEW].
rdfs | <http://www.w3.org/2000/01/rdf-schema#> | The RDFS namespace [RDFS].
prov | <http://www.w3.org/ns/prov#> | The PROV namespace [PROV-DM].
dct | <http://purl.org/dc/terms/> | The Dublin Core Terms namespace [DCTERMS].
ex | <http://example.org> | Application-dependent URIs used in the examples of this document.

1.2 Structure of this document

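Section 2 explains the main considerations to take into account in order to fully understand the mapping.

Section 3 describes the mapping between DC and PROV. The mapping is divided into different sections, depending on the level of complexity that users might be interested in when translating their DC data to PROV: direct mappings (Section 3.1), PROV refinements (Section 3.2), complex mappings (Section 3.3), clean-up (Section 3.4) and the list of terms excluded from the mapping (Section 3.5).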

2. Preliminaries

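This section explains two particular considerations that should be taken into account regarding the mapping: which Dublin Core terms carry provenance information (Section 2.1), and how a Dublin Core description relates to the different states of a resource (Section 2.2).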

2.1 Provenance in Dublin Core

DCMI terms carry a substantial amount of provenance information about a resource: when it was affected in the past, who affected it and how it was affected. The remaining DCMI terms (descriptive metadata) tell us what was affected. Table 1 classifies the DC Terms according to these four categories (what?, who?, when? and how?), each corresponding to the question it answers regarding the description or provenance of a given resource. The classification is necessarily somewhat minimalistic, as it can be argued that some of the descriptive terms also contain provenance information, depending on their usage. Note that Dublin Core provides no direct information describing where a resource was affected. The categories are further explained below:

Descriptive Terms (What?): This category contains all the terms describing a resource without referring to its provenance (30 of the 55 terms). Examples include the dct:title, dct:abstract or dct:description of a resource, the dct:format in which the resource can be found, etc.

Agency Terms (Who?): This category contains agent-related terms. All of these properties have dct:Agent as their range, i.e., a resource that acts or has the power to act. The dct:contributor, dct:creator and dct:publisher clearly influence the resource and are therefore important for its origin. This is less obvious for dct:rightsHolder, but since ownership is considered important provenance information for many resources, such as artworks, it is included in this category.

Date and Time Terms (When?): This category contains date- and time-related terms. Dates belong to the provenance record of a resource, as they track when something was created (dct:created), modified (dct:modified), published (dct:issued), etc. Two dates are special regarding their relevance for provenance: dct:available and dct:valid. They differ from the other dates because, by definition, they can represent a date range. Often, the range of availability or validity of a resource is inherent to the resource and known beforehand; consider the validity of a passport or the availability of a limited special offer published on the web. In these cases, no action is involved that makes the resource invalid or unavailable; this is simply determined by the validity range. On the other hand, if an action is involved, e.g., a resource is declared invalid because a mistake has been found, then that action is relevant for its provenance.

Derivation Terms (How?): This category contains derivation-related terms. When a resource is derived from other resources, the original resources become part of the provenance chain of the derived resource. In Dublin Core, derivations can be further classified as versions (dct:isVersionOf), format serializations (dct:isFormatOf), replacements (dct:replaces) and sources of information (dct:source). dct:references is a weaker relation (having a reference to a resource does not always mean that the content is derived from it), but it can be assumed that a referenced resource influenced the described resource, and it is therefore relevant for its provenance. The respective inverse properties do not necessarily contribute to the provenance of the described resource; e.g., a resource is usually not directly affected by being referenced or by being used as a source. However, inverse properties are included among the provenance-related terms because they can be used to describe the relations between the resources involved. Finally, license (dct:license), rights (dct:rights) and access rights (dct:accessRights) are considered part of the provenance of the resource as well, since they restrict and explain how the resource can be used for further derivation.

This leaves one very special term: dct:provenance. It is defined as a "statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation" [DCTERMS], a definition that corresponds to the notion of provenance for artworks. This term can be considered a link between the resource and any provenance statement about the resource, so it does not fit any of the aforementioned categories and is out of the scope of this mapping.

2.2 Entities in Dublin Core

Consider the example metadata record below (Example 1), where a document (ex:doc1) is described with several DC statements:

Example 1: a simple metadata record:

ex:doc1 dct:title "A mapping from Dublin Core..." ;
dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
dct:created "2012-02-28" ;
dct:publisher ex:w3c ;
dct:issued "2012-02-29" ;
dct:subject ex:dublincore ;
dct:replaces ex:doc2 ;
dct:format "HTML" .
In Example 1, dct:title, dct:subject and dct:format describe the resource ex:doc1. They do not provide any information on how the resource was created or modified in the past. On the other hand, some statements imply provenance-related information. For example, dct:creator implies that the document has been created and refers to an author. Similarly, the existence of the dct:issued date implies that the document has been published, which is also (redundantly) implied by the dct:publisher statement. Finally, dct:replaces relates ex:doc1 to the document ex:doc2, an earlier resource representing the mapping.
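The distinction drawn above can be sketched in Python. This is a minimal illustration, not part of the mapping: statements are plain (subject, predicate, object) tuples, the names PROVENANCE_TERMS, categorize and split_record are hypothetical, and only the terms explicitly named in Section 2.1 and Table 3 are listed, so any other term falls back to "what" (Table 1 remains the reference classification).

```python
# Hypothetical, partial encoding of the categories of Section 2.1.
# Any DC term not listed here is treated as descriptive ("what").
PROVENANCE_TERMS = {
    "who": {"dct:contributor", "dct:creator", "dct:publisher", "dct:rightsHolder"},
    "when": {
        "dct:created", "dct:modified", "dct:issued", "dct:available", "dct:valid",
        "dct:dateAccepted", "dct:dateCopyrighted", "dct:dateSubmitted",
    },
    "how": {
        "dct:isVersionOf", "dct:isFormatOf", "dct:replaces", "dct:source",
        "dct:references", "dct:license", "dct:rights", "dct:accessRights",
    },
}


def categorize(predicate):
    """Return 'who', 'when', 'how', 'what' or 'out-of-scope' for a DC term."""
    if predicate == "dct:provenance":
        return "out-of-scope"  # links to provenance statements; not mapped
    for category, terms in PROVENANCE_TERMS.items():
        if predicate in terms:
            return category
    return "what"


def split_record(triples):
    """Group (subject, predicate, object) statements by category."""
    groups = {}
    for triple in triples:
        groups.setdefault(categorize(triple[1]), []).append(triple)
    return groups


# Example 1, one tuple per statement.
EXAMPLE_1 = [
    ("ex:doc1", "dct:title", "A mapping from Dublin Core..."),
    ("ex:doc1", "dct:creator", "ex:kai"),
    ("ex:doc1", "dct:creator", "ex:daniel"),
    ("ex:doc1", "dct:creator", "ex:simon"),
    ("ex:doc1", "dct:creator", "ex:michael"),
    ("ex:doc1", "dct:created", "2012-02-28"),
    ("ex:doc1", "dct:publisher", "ex:w3c"),
    ("ex:doc1", "dct:issued", "2012-02-29"),
    ("ex:doc1", "dct:subject", "ex:dublincore"),
    ("ex:doc1", "dct:replaces", "ex:doc2"),
    ("ex:doc1", "dct:format", "HTML"),
]

for category, statements in split_record(EXAMPLE_1).items():
    print(category, sorted({p for _, p, _ in statements}))
```

Run on Example 1, this prints `what ['dct:format', 'dct:subject', 'dct:title']`, then `who ['dct:creator', 'dct:publisher']`, `when ['dct:created', 'dct:issued']` and `how ['dct:replaces']`, matching the reading given in the paragraph above.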

As a DC metadata record describes the document as a whole, it is not clear how this description relates to the different states the document went through before reaching its final state. For example, a document may have a dct:created date and a dct:issued date. According to the PROV ontology, the activity of issuing a document involves two different states of the document: the document before it was issued and the issued document. Each of these states corresponds to a different specialization (prov:specializationOf) of the document. Generally, there are two approaches to deal with this issue:

1) Create new entity instances that are all related to the original document by means of prov:specializationOf. For example, consider the translation of a single dct:publisher statement (shown at the top of Figure 1): having a publisher implies a "Publish" activity (represented with a blank node), which is associated with the ex:publisher agent. The activity must have used the document to be published (_:usedEntity, which is a prov:specializationOf the resource we are describing) and generated the published resource (_:resultingEntity). Since we cannot ensure that the published resource has not undergone any further modifications, _:resultingEntity is also a prov:specializationOf the resource ex:doc1.

Figure 1. A mapping example creating blank nodes for each state of the resource. PROV entities are represented with ellipses, activities with rectangles and agents with pentagons. The bold arrow indicates how the DC statement (at the top of the figure) would be converted to PROV (the graph at the bottom).

2) Adopt the original resource (ex:doc1) as the prov:Entity both used and generated by the Publish activity (_:activity). However, this representation leads to a misinterpretation of the DC statement, as shown in the example of Figure 2. The representation implies that ex:doc1 was generated by _:activity and then used by _:activity afterwards, instead of being used and then generated by _:activity (an entity must exist before it can be used).

Figure 2. A mapping example conflating blank nodes within the same resource. The used and generated resources have the same identifier. This example is an invalid translation of the dct:publisher statement (as it implies that ex:doc1 was generated by _:activity and then used by the same activity).

Since the first option provides a correct interpretation of the DC statements, it has been chosen as the guideline for the complex mappings. Blank nodes are used in the mapping, although any other naming mechanism could be used if necessary; conflating nodes is left to the clean-up phase.
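The difference between the two options can be checked mechanically. The Python sketch below is illustrative only: the helper name is hypothetical, statements are plain tuples, and the blank-node labels follow Figures 1 and 2. It flags any entity that one activity both used and generated, which is the reading the text rejects for Figure 2.

```python
def used_and_generated_by_same_activity(triples):
    """Return the entities that some activity both used and generated."""
    # (activity, entity) pairs for usage and for generation.
    used = {(s, o) for s, p, o in triples if p == "prov:used"}
    generated = {(o, s) for s, p, o in triples if p == "prov:wasGeneratedBy"}
    return sorted(entity for _, entity in used & generated)


# Option 1 (Figure 1): distinct specializations for the input and output states.
FIGURE_1 = [
    ("_:activity", "prov:wasAssociatedWith", "ex:publisher"),
    ("_:activity", "prov:used", "_:usedEntity"),
    ("_:resultingEntity", "prov:wasGeneratedBy", "_:activity"),
    ("_:usedEntity", "prov:specializationOf", "ex:doc1"),
    ("_:resultingEntity", "prov:specializationOf", "ex:doc1"),
]

# Option 2 (Figure 2): both states conflated into ex:doc1.
FIGURE_2 = [
    ("_:activity", "prov:wasAssociatedWith", "ex:publisher"),
    ("_:activity", "prov:used", "ex:doc1"),
    ("ex:doc1", "prov:wasGeneratedBy", "_:activity"),
]

print(used_and_generated_by_same_activity(FIGURE_1))  # []
print(used_and_generated_by_same_activity(FIGURE_2))  # ['ex:doc1']
```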

3. Mapping from Dublin Core to PROV

This section describes the mapping between Dublin Core and PROV. The mapping is divided into several subsections.

3.1 Direct mappings

The direct mappings relate the DC Terms to the PROV binary relationships using the integration mechanisms of RDFS and OWL (rdfs:subPropertyOf, owl:equivalentProperty and owl:equivalentClass). By applying OWL 2 RL reasoning, PROV applications will be able to interoperate with these DC statements (i.e., they will be able to understand DC statements).

Dublin Core, while less complex from a modeling perspective, is more specific about the type of the activity taking place. PROV provides general attribution, and the details about the kind of influence that an activity or an agent had are left to custom refinements of the PROV classes and properties.

Tables 3, 4 and 5 provide the detailed mapping and the rationale for each term. The remaining terms can be found in the list of terms left out of the mapping (Section 3.5).

Table 3: Direct mappings
DC Term | Relation | PROV Term | Rationale
dct:Agent | owl:equivalentClass | prov:Agent | Both dct:Agent and prov:Agent refer to the same concept: a resource that has the power to act (and can therefore bear responsibility for an activity, an entity or another agent).
dct:rightsHolder | rdfs:subPropertyOf | prov:wasAttributedTo | The license associated with a resource is attributed to the rights holder. Thus, the resource can be said to be attributed in part to the rights holder.
dct:creator | rdfs:subPropertyOf | prov:wasAttributedTo | A creator is one of the agents who participated in the creation of a resource, so the outcome of that activity is attributed to the creator.
dct:publisher | rdfs:subPropertyOf | prov:wasAttributedTo | A publisher participated in the publishing activity that generated the published resource, so that resource is attributed to the publisher.
dct:contributor | rdfs:subPropertyOf | prov:wasAttributedTo | A contributor is associated with either the creation or the updating of the resource, so the outcome of those activities is attributed to the contributor.
dct:isVersionOf | owl:equivalentProperty | prov:wasRevisionOf | dct:isVersionOf refers to "a related resource to which the current resource is a version, edition or adaptation". In PROV, a revision is "a derivation for which the resulting entity is a revised version of some original". No specific attributes about revision are provided, so editions and adaptations can be considered revisions as well.
dct:isFormatOf | rdfs:subPropertyOf | prov:alternateOf | dct:isFormatOf refers to another resource that is the same but in another format, which maps directly to prov:alternateOf.
dct:hasFormat | rdfs:subPropertyOf | prov:alternateOf | See the rationale for dct:isFormatOf.
dct:source | rdfs:subPropertyOf | prov:wasDerivedFrom | dct:source is defined as a "related resource from which the described resource is derived", which matches the notion of derivation in PROV-DM ("a transformation of an entity into another").
dct:created | rdfs:subPropertyOf | prov:generatedAtTime | Describes the time of creation of a resource (i.e., the time of its generation). It is mapped as a subproperty of prov:generatedAtTime because creation is only one of the many activities that generate an entity (generation also includes modification, issuing, acceptance, etc.).
dct:issued | rdfs:subPropertyOf | prov:generatedAtTime | Describes the date when the resource was issued. It is mapped as a subproperty of prov:generatedAtTime because the issued resource is an entity itself, generated at a certain time.
dct:dateAccepted | rdfs:subPropertyOf | prov:generatedAtTime | Describes the date when the resource was accepted. It is mapped as a subproperty of prov:generatedAtTime because the accepted resource was generated by an "Accept" activity, which may have changed it from its previous state.
dct:dateCopyrighted | rdfs:subPropertyOf | prov:generatedAtTime | Describes the date when the resource was copyrighted. It is mapped as a subproperty of prov:generatedAtTime because the copyrighted resource was generated by a "Copyright" activity, which may have changed it from its previous state.
dct:dateSubmitted | rdfs:subPropertyOf | prov:generatedAtTime | Describes the date when the resource was submitted. It is mapped as a subproperty of prov:generatedAtTime because the submitted resource was generated by a "Submit" activity, which may have changed it from its previous state.
dct:modified | rdfs:subPropertyOf | prov:generatedAtTime | Describes the date when the resource was modified. It is mapped as a subproperty of prov:generatedAtTime because the modified resource was generated by a "Modify" activity that changed it from its previous state.
Note that applying the direct mappings to a metadata record such as Example 1 infers that the resource (ex:doc1) was generated (prov:generatedAtTime) at two different times, because two generation dates are associated with the document: dct:created and dct:issued. This is valid, since from the PROV point of view the "creation" and "issue" activities generate new entities. Dublin Core, on the other hand, groups those two intermediate entities under the same resource (ex:doc1), producing the record shown in Example 1. This approach is supported by PROV, but it does not comply with all the PROV constraints [PROV-CONSTRAINTS].
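The effect described above can be reproduced with a small forward-chaining sketch. It is an assumption-laden illustration, not an OWL reasoner: it encodes a subset of Tables 3 and 4 as Python dictionaries (SUBPROPERTY_OF, EQUIVALENT_PROPERTY and infer are hypothetical names) and applies only the OWL 2 RL rules for rdfs:subPropertyOf (prp-spo1) and owl:equivalentProperty (prp-eqp1/prp-eqp2) until no new triple appears.

```python
# Subset of Tables 3 and 4, read as OWL 2 RL rules:
# rdfs:subPropertyOf (prp-spo1) and owl:equivalentProperty (prp-eqp1/2).
SUBPROPERTY_OF = {
    "dct:creator": "prov:wasAttributedTo",
    "dct:publisher": "prov:wasAttributedTo",
    "dct:contributor": "prov:wasAttributedTo",
    "dct:rightsHolder": "prov:wasAttributedTo",
    "dct:source": "prov:wasDerivedFrom",
    "dct:created": "prov:generatedAtTime",
    "dct:issued": "prov:generatedAtTime",
    "dct:modified": "prov:generatedAtTime",
    "prov:hadPrimarySource": "dct:source",  # Table 4
}
EQUIVALENT_PROPERTY = {"dct:isVersionOf": "prov:wasRevisionOf"}


def infer(triples):
    """Add inferred triples until a fixpoint is reached (handles chains)."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p in SUBPROPERTY_OF:
                new.add((s, SUBPROPERTY_OF[p], o))
            for a, b in EQUIVALENT_PROPERTY.items():
                if p == a:
                    new.add((s, b, o))
                elif p == b:
                    new.add((s, a, o))
        if new <= closure:
            return closure
        closure |= new


record = {
    ("ex:doc1", "dct:created", "2012-02-28"),
    ("ex:doc1", "dct:issued", "2012-02-29"),
    ("ex:doc1", "dct:publisher", "ex:w3c"),
}
inferred = infer(record)
print(sorted(o for s, p, o in inferred if p == "prov:generatedAtTime"))
# ['2012-02-28', '2012-02-29']
```

The fixpoint loop matters for chains such as prov:hadPrimarySource, which is a subproperty of dct:source, which is in turn a subproperty of prov:wasDerivedFrom.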

Regarding the rest of the direct mappings, one PROV property (prov:hadPrimarySource) has been found to be a subproperty of a DC term, as shown in Table 4:

Table 4: Direct mappings (2)
PROV Term | Relation | DC Term | Rationale
prov:hadPrimarySource | rdfs:subPropertyOf | dct:source | The definition of prov:hadPrimarySource ("something produced by some agent with direct experience and knowledge about the topic") is more restrictive than that of dct:source ("A related resource from which the described resource is derived").

Table 5 lists the DC terms that map to inverse relationships in PROV. They have been placed in a separate table because these inverse relationships do not belong to the core of PROV.

Table 5: Direct mappings to the PROV terms not included in the core
DC Term | Relation | PROV Term | Rationale
dct:hasVersion | owl:equivalentProperty | prov:hadRevision | Inverse property of dct:isVersionOf.

3.2 PROV refinements

To properly reflect the meaning of the Dublin Core terms, more specific subclasses are needed:

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

prov:Publish      rdfs:subClassOf  prov:Activity .
prov:Contribute   rdfs:subClassOf  prov:Activity .
prov:Create       rdfs:subClassOf  prov:Activity, prov:Contribute .
prov:Modify       rdfs:subClassOf  prov:Activity .
prov:Replace      rdfs:subClassOf  prov:Activity .
prov:Accept       rdfs:subClassOf  prov:Activity .
prov:Copyright    rdfs:subClassOf  prov:Activity .
prov:Submit       rdfs:subClassOf  prov:Activity .
prov:Publisher    rdfs:subClassOf  prov:Role .
prov:Contributor  rdfs:subClassOf  prov:Role .
prov:Creator      rdfs:subClassOf  prov:Role, prov:Contributor .
		

Custom refinements of the properties have been omitted as they would be identical to the DC terms. If these more specific properties are needed, the Dublin Core terms can be used directly, according to the direct mappings presented in Section 3.1.
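Because prov:Create and prov:Creator are declared as subclasses of prov:Contribute and prov:Contributor respectively, every creation is also a contribution. A minimal Python sketch of that subclass closure, assuming the axioms are held in a plain dictionary (SUBCLASS_OF and superclasses are hypothetical names; prov:Replace is included because the dct:replaces mapping in Section 3.3.3 types its activity with it):

```python
# rdfs:subClassOf axioms of Section 3.2, keyed by subclass.
SUBCLASS_OF = {
    "prov:Publish": {"prov:Activity"},
    "prov:Contribute": {"prov:Activity"},
    "prov:Create": {"prov:Activity", "prov:Contribute"},
    "prov:Modify": {"prov:Activity"},
    "prov:Accept": {"prov:Activity"},
    "prov:Copyright": {"prov:Activity"},
    "prov:Submit": {"prov:Activity"},
    "prov:Replace": {"prov:Activity"},  # used by the dct:replaces mapping (Section 3.3.3)
    "prov:Publisher": {"prov:Role"},
    "prov:Contributor": {"prov:Role"},
    "prov:Creator": {"prov:Role", "prov:Contributor"},
}


def superclasses(cls):
    """Return all direct and indirect superclasses (rdfs:subClassOf is transitive)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(superclasses("prov:Create")))   # ['prov:Activity', 'prov:Contribute']
print(sorted(superclasses("prov:Creator")))  # ['prov:Contributor', 'prov:Role']
```

A consumer asking for all prov:Contribute activities would therefore also find the activities produced by the dct:creator mapping.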

3.3 Complex Mappings

The complex mappings consist of a set of patterns defined to generate qualified PROV statements from Dublin Core statements. This type of qualification may not always be needed, and it is up to the implementer whether to use it, depending on the use case. Note also that not all the direct mappings have an associated complex mapping; only those that imply a specific activity (creation, publication, etc.) do. The complex mappings are provided in the form of SPARQL CONSTRUCT queries, i.e., queries that describe a resulting RDF graph based on a graph pattern matched in the original data. We divide the queries into three categories: Entity-Agent (who), Entity-Date (when) and Entity-Entity (how) mappings.

3.3.1 Entity-Agent mappings (Who)

This category contains three terms: dct:contributor, dct:creator and dct:publisher. All three can be mapped with the same pattern, similar to the one presented in Figure 1. The only changes required are the roles and activities involved for each term.

In the queries below, the variables ?document and ?agent are bound to different values depending on the available data. The graph in the CONSTRUCT part can be seen as a template whose variables are placeholders filled with the values found in the data; a fresh set of blank nodes is created for each match. The mapping corresponds to the graph in Figure 1 (with small changes for dct:creator and dct:contributor, whose patterns do not include a used entity). Compared to the direct mappings, the increase in complexity is obvious. Many blank nodes are created, so a subsequent clean-up phase that relates them and provides stable URIs for the entities is required. Depending on the implementation, URIs can also be coined here for every specialization. The implementation proposed in this document is a conservative example: it assumes that no further information about the identity of the specializations is available.
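To make the template reading concrete, the Python sketch below instantiates the dct:creator pattern of Section 3.3.1.1 once per matching statement. It is a hypothetical illustration, not a SPARQL engine: triples are tuples, the SPARQL keyword `a` is written as rdf:type, and fresh labels come from a simple counter (fresh and map_creator are made-up names).

```python
import itertools

_ids = itertools.count(1)


def fresh(label):
    """Mint a new blank-node label; SPARQL does this once per solution."""
    return f"_:{label}{next(_ids)}"


def map_creator(triples):
    """Instantiate the dct:creator pattern of Section 3.3.1.1 for each match."""
    graph = set()  # an RDF graph is a set, so repeated triples collapse
    for document, predicate, agent in triples:
        if predicate != "dct:creator":  # the WHERE clause
            continue
        activity = fresh("activity")
        association = fresh("association")
        result = fresh("resulting_entity")
        graph |= {
            (document, "rdf:type", "prov:Entity"),
            (document, "prov:wasAttributedTo", agent),
            (agent, "rdf:type", "prov:Agent"),
            (activity, "rdf:type", "prov:Activity"),
            (activity, "rdf:type", "prov:Create"),
            (activity, "prov:wasAssociatedWith", agent),
            (activity, "prov:qualifiedAssociation", association),
            (association, "rdf:type", "prov:Association"),
            (association, "prov:agent", agent),
            (association, "prov:hadRole", "prov:Creator"),
            (result, "rdf:type", "prov:Entity"),
            (result, "prov:specializationOf", document),
            (result, "prov:wasGeneratedBy", activity),
            (result, "prov:wasAttributedTo", agent),
        }
    return graph


graph = map_creator([("ex:doc1", "dct:creator", "ex:kai"),
                     ("ex:doc1", "dct:creator", "ex:daniel")])
creations = {s for s, p, o in graph if p == "rdf:type" and o == "prov:Create"}
print(len(creations), len(graph))  # 2 27
```

Two creators yield two separate Create activities and two separate specializations of ex:doc1; only the triple typing ex:doc1 as prov:Entity is shared, which is why 2 x 14 template triples collapse to 27. This proliferation is what the clean-up phase (Section 3.4) addresses.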

3.3.1.1 dct:creator
A creator is the agent in charge of the "Create" activity that generated a specialization of the entity ?document. The agent is assigned the role "creator".
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity ;
    prov:wasAttributedTo ?agent .

  ?agent a prov:Agent .

  _:activity a prov:Activity, prov:Create ;
    prov:wasAssociatedWith ?agent ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:agent ?agent ;
      prov:hadRole prov:Creator
    ] .

  _:resulting_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:wasAttributedTo ?agent .
} WHERE {
  ?document dct:creator ?agent .
}
				
3.3.1.2 dct:contributor
dct:contributor is mapped following the same pattern; only the activity and the role change:
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity ;
    prov:wasAttributedTo ?agent .

  ?agent a prov:Agent .

  _:activity a prov:Activity, prov:Contribute ;
    prov:wasAssociatedWith ?agent ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:agent ?agent ;
      prov:hadRole prov:Contributor
    ] .

  _:resulting_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:wasAttributedTo ?agent .
} WHERE {
  ?document dct:contributor ?agent .
}
				
3.3.1.3 dct:publisher
In case of publication, a second specialization representing the entity before the publication is necessary:
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity ;
    prov:wasAttributedTo ?agent .

  ?agent a prov:Agent .

  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  _:activity a prov:Activity, prov:Publish ;
    prov:used _:used_entity ;
    prov:wasAssociatedWith ?agent ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:agent ?agent ;
      prov:hadRole prov:Publisher
    ] .

  _:resulting_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasDerivedFrom _:used_entity ;
    prov:wasGeneratedBy _:activity ;
    prov:wasAttributedTo ?agent .
} WHERE {
  ?document dct:publisher ?agent .
}
				

3.3.2 Entity-Date mappings (When)

Dates often correspond to a who-property, e.g., dct:creator and dct:created, or dct:publisher and dct:issued. Therefore, they lead to similar complex patterns (associating a date with each activity instead of an agent). In Dublin Core data, a resource is commonly annotated with several statements such as creator, publisher, issued, date, etc.; in this section, each term is treated independently. Note that since the range of the Dublin Core date terms is rdfs:Literal, while the range of the prov:atTime property is xsd:dateTime, the mapping is only valid for literals of type xsd:dateTime.
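A pre-processing step could set aside date literals that cannot serve as xsd:dateTime values. The Python sketch below is a hypothetical helper that checks only the lexical shape of xsd:dateTime (optional fractional seconds and timezone), not value ranges such as month 13; a plain date like "2012-02-28" from Example 1 would not qualify.

```python
import re

# Lexical shape of xsd:dateTime: [-]YYYY-MM-DDThh:mm:ss[.s+][Z|(+|-)hh:mm].
# Value ranges (months, days, hours) are not checked.
XSD_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def is_xsd_datetime(lexical):
    """True if the literal has the lexical form of an xsd:dateTime."""
    return XSD_DATETIME.fullmatch(lexical) is not None


for value in ["2012-02-28", "2012-02-28T10:00:00Z", "2012-02-28T10:00:00+01:00"]:
    print(value, is_xsd_datetime(value))
```

Inside the queries themselves, a comparable restriction could be expressed with SPARQL's datatype() function, e.g. FILTER(datatype(?date) = xsd:dateTime), which checks the declared datatype rather than the lexical form.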

3.3.2.1 dct:created

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Create .

  # The "output"
  _:created_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:created ?date .
}
				
3.3.2.2 dct:issued

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Publish ;
    prov:used _:used_entity .

  # The "input"
  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  # The "output"
  _:iss_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:issued ?date .
}
				

3.3.2.3 dct:modified

 
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Modify ;
    prov:used _:used_entity .

  # The "input"
  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  # The "output"
  _:modified_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:modified ?date .
}
				

3.3.2.4 dct:dateAccepted

 
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Accept ;
    prov:used _:used_entity .

  # The "input"
  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  # The "output"
  _:accepted_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:dateAccepted ?date .
}
				

3.3.2.5 dct:dateCopyrighted

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Copyright ;
    prov:used _:used_entity .

  # The "input"
  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  # The "output"
  _:copyrighted_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:dateCopyrighted ?date .
}
				

3.3.2.6 dct:dateSubmitted

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Submit ;
    prov:used _:used_entity .

  # The "input"
  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  # The "output"
  _:submitted_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:dateSubmitted ?date .
}
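The dct:issued, dct:modified, dct:dateAccepted, dct:dateCopyrighted and dct:dateSubmitted queries share one shape and differ only in the DC property, the activity class and the name of the output node. One way to keep them consistent is to generate them from a single template. The Python sketch below assumes that approach (DATE_PATTERNS, TEMPLATE and date_query are hypothetical names) and uses the PROV-O property prov:generatedAtTime; dct:created is left out because its pattern has no input entity.

```python
# (DC date property, PROV activity class, output blank-node name).
DATE_PATTERNS = [
    ("dct:issued", "prov:Publish", "iss_entity"),
    ("dct:modified", "prov:Modify", "modified_entity"),
    ("dct:dateAccepted", "prov:Accept", "accepted_entity"),
    ("dct:dateCopyrighted", "prov:Copyright", "copyrighted_entity"),
    ("dct:dateSubmitted", "prov:Submit", "submitted_entity"),
]

# Literal braces are doubled because of str.format.
TEMPLATE = """\
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {{
  ?document a prov:Entity .

  _:activity a prov:Activity, {activity} ;
    prov:used _:used_entity .

  _:used_entity a prov:Entity ;
    prov:specializationOf ?document .

  _:{output} a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
}} WHERE {{
  ?document {prop} ?date .
}}
"""


def date_query(prop):
    """Return the CONSTRUCT query for one DC date property."""
    for p, activity, output in DATE_PATTERNS:
        if p == prop:
            return TEMPLATE.format(activity=activity, output=output, prop=prop)
    raise KeyError(prop)


print(date_query("dct:dateSubmitted"))
```

Adding a new date refinement then means adding one row to the table rather than copying and editing a whole query.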
				

3.3.3 Entity-Entity mappings (How)

In Dublin Core, most of the properties relating entities to other entities do not describe the involvement of a specific activity (e.g., dct:format, dct:source or dct:isVersionOf). The only exception is dct:replaces, explained below.

3.3.3.1 dct:replaces

dct:replaces relates two resources when the former supplants or displaces the latter. The replacement is the result of a replacement activity (prov:Replace), which used a specialization of the replaced entity (_:old_entity) and produced a specialization of the replacement (_:new_entity). Thus, _:new_entity was derived from _:old_entity, as it could not have existed without it. However, the derivation relationship cannot always be applied between the original entities, because both could have existed before the replacement took place (for example, if a book replaces another in a catalog, we cannot say that it was derived from it).

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .
  ?document2 a prov:Entity .

  _:activity a prov:Activity, prov:Replace ;
    prov:used _:old_entity .

  # The "input"
  _:old_entity a prov:Entity ;
    prov:specializationOf ?document2 .

  # The "output"
  _:new_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:wasDerivedFrom _:old_entity .
} WHERE {
  ?document dct:replaces ?document2 .
}
				

The term dct:isReplacedBy would produce a similar mapping, with the roles of ?document and ?document2 inverted.
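One way to handle dct:isReplacedBy without a second query is to first rewrite it as the corresponding dct:replaces statement and then apply the single pattern. The Python sketch below illustrates that idea under stated assumptions: triples are tuples, blank nodes come from a per-call counter, and normalize_replacement and map_replaces are hypothetical names. It also shows that the derivation links the two specializations, never the two catalogued resources themselves.

```python
import itertools


def normalize_replacement(triples):
    """Rewrite each dct:isReplacedBy statement as the equivalent dct:replaces one."""
    out = set()
    for s, p, o in triples:
        if p == "dct:isReplacedBy":
            # ?document2 dct:isReplacedBy ?document  ==>  ?document dct:replaces ?document2
            out.add((o, "dct:replaces", s))
        else:
            out.add((s, p, o))
    return out


def map_replaces(triples):
    """Apply the dct:replaces pattern of Section 3.3.3.1 to every match."""
    ids = itertools.count(1)

    def bnode(label):
        return f"_:{label}{next(ids)}"

    graph = set()
    for new_doc, p, old_doc in sorted(normalize_replacement(triples)):
        if p != "dct:replaces":
            continue
        activity, old, new = bnode("activity"), bnode("old_entity"), bnode("new_entity")
        graph |= {
            (new_doc, "rdf:type", "prov:Entity"),
            (old_doc, "rdf:type", "prov:Entity"),
            (activity, "rdf:type", "prov:Activity"),
            (activity, "rdf:type", "prov:Replace"),
            (activity, "prov:used", old),
            (old, "rdf:type", "prov:Entity"),
            (old, "prov:specializationOf", old_doc),
            (new, "rdf:type", "prov:Entity"),
            (new, "prov:specializationOf", new_doc),
            (new, "prov:wasGeneratedBy", activity),
            # Derivation links the specializations, not the original resources.
            (new, "prov:wasDerivedFrom", old),
        }
    return graph


forward = map_replaces({("ex:doc1", "dct:replaces", "ex:doc2")})
inverse = map_replaces({("ex:doc2", "dct:isReplacedBy", "ex:doc1")})
print(forward == inverse)  # True
```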

3.4 Cleanup

The clean-up phase depends on how implementers interpret the described resources. The approach presented in this document leads to the proliferation of blank nodes. Blank nodes could be renamed to specific identifiers by the implementer, in order to avoid obtaining additional blank nodes when reapplying the construct queries presented in the previous section.
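One possible renaming strategy, sketched here under assumptions rather than prescribed: derive each identifier deterministically from the values that identify the blank node (for instance the matched statement plus the node's role in the pattern), so reapplying a query yields the same IRIs instead of new blank nodes. RDF 1.1 Concepts calls replacing blank nodes with IRIs skolemization. GENID_BASE and stable_iri are hypothetical names, and the choice of key is an implementation decision.

```python
import hashlib

# Hypothetical namespace for minted identifiers.
GENID_BASE = "http://example.org/genid/"


def stable_iri(*key):
    """Derive a repeatable IRI from the values that identify a blank node."""
    # Join with a control character that does not occur in IRIs,
    # so ("a", "bc") and ("ab", "c") produce different keys.
    digest = hashlib.sha256("\x1f".join(key).encode("utf-8")).hexdigest()
    return GENID_BASE + digest[:16]


# The activity produced by the dct:creator query for one matched statement:
first = stable_iri("ex:doc1", "dct:creator", "ex:kai", "activity")
again = stable_iri("ex:doc1", "dct:creator", "ex:kai", "activity")
other = stable_iri("ex:doc1", "dct:creator", "ex:daniel", "activity")
print(first == again, first == other)  # True False
```

Truncating the digest keeps the IRIs short; an implementation expecting very large graphs may prefer the full digest.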

Providing a set of rules to conflate the blank nodes is not in the scope of this document. However, the group has created a list of suggestions for implementers with proposals on how this could be achieved:

1) Conflate properties referring to the same state of the resource: in Dublin Core, certain properties complement each other (e.g., creator and created, publisher and issued, contributor and modified, etc.). By combining the corresponding queries, the resulting statements can be grouped into more complete PROV assertions.

The example below shows how to conflate the blank nodes for dct:creator and dct:created properties:

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .

  _:activity a prov:Activity, prov:Create ;
    prov:wasAssociatedWith ?agent ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:agent ?agent ;
      prov:hadRole prov:Creator
    ] .

  # The "output"
  _:created_entity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime ?date ;
    prov:qualifiedGeneration [
      a prov:Generation ;
      prov:atTime ?date ;
      prov:activity _:activity
    ] .
} WHERE {
  ?document dct:creator ?agent ;
            dct:created ?date .
}
		 
Figure 3 shows a graphical representation of the pattern:
Figure 3. Using complementing properties to conflate blank nodes. Dates are represented in green and roles in purple.
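The same grouping can be sketched in Python over a hypothetical tuple representation of triples (not an API of any RDF library). The sketch assumes at most one dct:created value per document and, unlike the CONSTRUCT query above (which mints fresh blank nodes for every solution, that is, for every ?agent), it groups all creators of a document under a single activity. The qualified association and qualified generation are omitted for brevity.

```python
from collections import defaultdict


def conflate_creation(triples):
    """Merge dct:creator and dct:created of each document into one activity.

    Assumes at most one dct:created value per document; documents lacking
    either property are skipped, as in the WHERE clause above.
    """
    creators = defaultdict(list)
    created = {}
    for s, p, o in triples:
        if p == "dct:creator":
            creators[s].append(o)
        elif p == "dct:created":
            created[s] = o
    out = []
    for i, doc in enumerate(sorted(creators.keys() & created.keys())):
        activity, entity = f"_:activity{i}", f"_:created_entity{i}"
        out += [
            (doc, "rdf:type", "prov:Entity"),
            (activity, "rdf:type", "prov:Create"),
            (entity, "prov:specializationOf", doc),
            (entity, "prov:wasGeneratedBy", activity),
            (entity, "prov:generatedAtTime", created[doc]),
        ]
        # Every creator is associated with the single shared activity.
        out += [(activity, "prov:wasAssociatedWith", a) for a in creators[doc]]
    return out
```

Applied to a document with two dct:creator values and one dct:created value, the sketch yields a single prov:Create activity associated with both agents.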

2) Another solution is to sort the activities according to their logical order, if known, and to conflate the blank node generated by one activity with the blank node used by the subsequent activity. Figure 4 shows a graphical example with two activities (creation and publication) that happened at different points in time. Since creation precedes publication, the entity generated by the creation is the one used by the publication: instead of minting separate blank nodes for the generation and the usage, both activities share the same blank node (_:created_entity).

Figure 4. Ordering activities to conflate blank nodes. The creation activity occurs before the publishing activity.
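A minimal Python sketch of this ordering strategy follows. It assumes each activity is known only by a kind label (used here to build class names such as prov:Create or prov:Publish) and an ISO 8601 date string in a uniform format, so that lexicographic order matches chronological order; the function name, the tuple representation of triples, and the blank node labels are hypothetical.

```python
def chain_activities(doc, events):
    """Order a document's activities and link each one's output to the next.

    `events` is a list of (kind, date) pairs, e.g. ("Create", "2012-12-11"),
    with ISO 8601 dates in a uniform format so that string order is
    chronological order.
    """
    out = []
    previous_entity = None
    for i, (kind, date) in enumerate(sorted(events, key=lambda e: e[1])):
        activity, entity = f"_:activity{i}", f"_:entity{i}"
        out += [
            (activity, "rdf:type", f"prov:{kind}"),
            (entity, "prov:specializationOf", doc),
            (entity, "prov:wasGeneratedBy", activity),
            (entity, "prov:generatedAtTime", date),
        ]
        if previous_entity is not None:
            # Reuse the previous output as this activity's input instead of
            # minting a separate blank node for the usage.
            out.append((activity, "prov:used", previous_entity))
        previous_entity = entity
    return out
```

With a creation dated before a publication, the publication activity uses the entity generated by the creation, matching the shared _:created_entity in Figure 4.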

3.5 List of terms excluded from the mapping

Table 6 lists the terms excluded from the mapping, either because they are not suitable or because they do not represent provenance information.

Table 6: List of terms excluded from the mapping
Term | Category | Rationale
dct:abstract | Descriptive metadata | Summary of the resource; not part of its provenance.
dct:accrualMethod | Descriptive metadata | Method by which items are added to a collection. It does not describe the addition itself, so it is out of the scope of the mapping.
dct:accrualPeriodicity | Descriptive metadata | Frequency with which items are added to a collection.
dct:accrualPolicy | Descriptive metadata | Policy governing the addition of items to a collection. It could be used to enrich the qualified involvement, but there is no direct mapping for this relationship.
dct:alternative | Descriptive metadata | An alternative title of the resource. For example, "The Bible" might also be known as "The Holy Book". Titles are not identifiers, so this property cannot be mapped to prov:alternateOf.
dct:audience | Descriptive metadata | The audience for whom the resource is useful.
dct:conformsTo | Descriptive metadata | The standard to which the resource conforms (if any).
dct:coverage | Descriptive metadata | Topic of the resource.
dct:description | Descriptive metadata | An account of the resource.
dct:educationLevel | Descriptive metadata | Educational level of the audience for which the resource is intended.
dct:extent | Descriptive metadata | Size or duration of the resource.
dct:format | Descriptive metadata | Format of the resource.
dct:identifier | Descriptive metadata | An unambiguous reference to the resource within a given context.
dct:instructionalMethod | Descriptive metadata | Method used to create the knowledge that the resource is supposed to support.
dct:isPartOf | Descriptive metadata | Inverse of dct:hasPart (see dct:hasPart).
dct:isRequiredBy | Descriptive metadata | States that the described resource is required to support the function of another resource. This is not related to the provenance of the resource, since it may refer to something that has not happened yet (e.g., a library dependency of a script).
dct:language | Descriptive metadata | Language of the resource.
dct:mediator | Descriptive metadata | Entity that mediates access to the resource.
dct:medium | Descriptive metadata | Material or physical carrier of the resource.
dct:requires | Descriptive metadata | Inverse of dct:isRequiredBy (see dct:isRequiredBy).
dct:hasPart | Descriptive metadata | A related resource that is included in the described resource. Since entity composition is out of the scope of PROV, this property has been excluded from the mapping.
dct:spatial | Descriptive metadata | Spatial characteristics of the content of the resource (e.g., a book about Spain), not the location of the resource itself. Thus it cannot be mapped to prov:atLocation.
dct:subject | Descriptive metadata | Subject of the resource.
dct:tableOfContents | Descriptive metadata | List of subunits of the resource.
dct:temporal | Descriptive metadata | Temporal characteristics of the content of the resource (e.g., a book about the 15th century).
dct:title | Descriptive metadata | Title of the resource.
dct:type | Descriptive metadata | Type of the resource.
dct:bibliographicCitation | Descriptive metadata | Relates the resource to a literal containing its bibliographic citation (e.g., :el_Quijote dct:bibliographicCitation "Miguel de Cervantes Saavedra: El Quijote, España").
dct:references | Provenance: How | A related resource that the described resource references, cites, or otherwise points to. References usually point to resources from which the current resource was derived, or which it quotes. However, this is not always the case. For example, if a resource A included a reference to a resource B stating "Reference [B] has nothing to do with the work described here", the reference could not be considered a derivation or a quotation. For this reason, dct:references has been excluded from the mapping.
dct:isReferencedBy | Provenance: How | Inverse of dct:references.
dct:accessRights | Provenance: How | Information about who can access the resource, or an indication of its security status. Since access privileges are part of the description of the resource, the property has been excluded from the mapping.
dct:license | Provenance: How | License of the resource. It has been left out of the mapping because PROV-O has no term to represent this information.
dct:rights | Provenance: How | Information about the rights held in and over the resource.
dct:date | Provenance: When | A very general date property: it is the superproperty of all the other Dublin Core date properties, but there is no equivalent concept in PROV. It has been excluded from the mapping.
dct:available | Provenance: When | Date on which the resource became or will become available. There is no direct mapping between this property and the notion of invalidation in PROV.
dct:valid | Provenance: When | Date (often a range) of validity of the resource. The notion of invalidation is defined in PROV-DM, but not the notion of validation, so this property is left out of the mapping.
dct:relation | Provenance | A related resource. This relationship is very broad and may or may not relate resources in terms of provenance; it could be seen as a superproperty of prov:wasDerivedFrom, prov:wasInfluencedBy, prov:alternateOf, prov:specializationOf, etc. Therefore, it cannot be mapped to any single PROV property.
dct:provenance | Provenance | A link between the resource and a statement about its provenance. Since PROV-O does not specify any mechanism to link a bundle of provenance statements to an entity, this term is considered out of the scope of the mapping.

3.6 Mapping from PROV to DC

The mapping from PROV to Dublin Core is not part of this note. If the refinements proposed in this document are used, the complex mapping patterns can be applied in reverse. However, if the refinements are not used, only a few Dublin Core statements can be inferred from plain PROV statements. For example, when mapping dates, only unqualified properties (such as dct:date) can be extracted, because plain PROV does not indicate whether an activity with an associated date is a creation, a modification, or a publication. Likewise, the agents involved cannot be mapped to creators, contributors, or publishers. While Dublin Core includes provenance information, its focus lies on the broader description of resources; PROV models a provenance chain, but provides almost no information about the resources themselves.
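The asymmetry can be illustrated with a small Python sketch of the reverse direction. The lookup tables below are an assumed, partial correspondence between refinement classes and Dublin Core properties, chosen only for illustration; the tuple representation of triples and the function name are likewise hypothetical. Without a refinement class on the activity, the sketch can only emit dct:date and cannot recover any agent property.

```python
from collections import defaultdict

# Assumed, partial correspondence between refinement classes and DC terms.
DATE_PROPERTY = {
    "prov:Create": "dct:created",
    "prov:Publish": "dct:issued",
    "prov:Modify": "dct:modified",
}
AGENT_PROPERTY = {
    "prov:Create": "dct:creator",
    "prov:Publish": "dct:publisher",
    "prov:Modify": "dct:contributor",
}


def prov_to_dc(triples):
    """Recover DC statements from PROV generation patterns (sketch)."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)

    def values(subject, predicate):
        return [o for s, p, o in triples if s == subject and p == predicate]

    out = set()
    for entity, p, activity in triples:
        if p != "prov:wasGeneratedBy":
            continue
        refinement = next(
            (t for t in sorted(types[activity]) if t in DATE_PROPERTY), None)
        for doc in values(entity, "prov:specializationOf"):
            for date in values(entity, "prov:generatedAtTime"):
                # Without a refinement only the unqualified dct:date is known.
                out.add((doc, DATE_PROPERTY.get(refinement, "dct:date"), date))
            if refinement is not None:
                for agent in values(activity, "prov:wasAssociatedWith"):
                    out.add((doc, AGENT_PROPERTY[refinement], agent))
    return out
```

Given the same generation pattern, the sketch returns only a dct:date statement for plain PROV, but dct:created and dct:creator once the activity is typed as prov:Create.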

A. Acknowledgements

This document is the result of a collaboration between the Provenance Working Group and the Dublin Core Metadata Initiative. The editors extend special thanks to Antoine Isaac, Ivan Herman, Timothy Lebo, Luc Moreau, Paul Groth and Satya Sahoo for their feedback; and María Poveda and Idafen Santana for their help with the HTML generation.

B. References

B.1 Informative references

[DCMI]
Dublin Core Metadata Initiative. URL: http://dublincore.org/
[DCTERMS]
Dublin Core Terms Vocabulary. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/
[OWL2-OVERVIEW]
W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 27 October 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
[PROV-AQ]
Graham Klyne; Paul Groth; eds. Provenance Access and Query. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-aq-20130312/
[PROV-CONSTRAINTS]
James Cheney; Paolo Missier; Luc Moreau; eds. Constraints of the PROV Data Model. 12 March 2013, W3C Proposed Recommendation. URL: http://www.w3.org/TR/2013/PR-prov-constraints-20130312/
[PROV-DICTIONARY]
Tom De Nies; Sam Coppens; eds. PROV Dictionary. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-dictionary-20130312/
[PROV-DM]
Luc Moreau; Paolo Missier; eds. PROV-DM: The PROV Data Model. 12 March 2013, W3C Proposed Recommendation. URL: http://www.w3.org/TR/2013/PR-prov-dm-20130312/
Luc Moreau; Timothy Lebo; eds. Linking Across Provenance Bundles. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-links-20130312/
[PROV-N]
Luc Moreau; Paolo Missier; eds. PROV-N: The Provenance Notation. 12 March 2013, W3C Proposed Recommendation. URL: http://www.w3.org/TR/2013/PR-prov-n-20130312/
[PROV-O]
Timothy Lebo; Satya Sahoo; Deborah McGuinness; eds. PROV-O: The PROV Ontology. 12 March 2013, W3C Proposed Recommendation. URL: http://www.w3.org/TR/2013/PR-prov-o-20130312/
[PROV-OVERVIEW]
Paul Groth; Luc Moreau; eds. PROV-OVERVIEW: An Overview of the PROV Family of Documents. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-overview-20130312/
[PROV-SEM]
James Cheney; ed. Semantics of the PROV Data Model. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-sem-20130312.
[PROV-PRIMER]
Yolanda Gil; Simon Miles; eds. PROV Model Primer. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-primer-20130312/
[PROV-XML]
Hook Hua; Curt Tilmes; Stephan Zednik; eds. PROV-XML: The PROV XML Schema. 12 March 2013, Working Draft. URL: http://www.w3.org/TR/2013/WD-prov-xml-20130312/
[RDFS]
Dan Brickley; Ramanathan V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/