W3C

Dublin Core to PROV Mapping

W3C Working Draft 19 August 2012

This version:
https://dvcs.w3.org/hg/prov/raw-file/ff940ee82d3d/dc-note/Overview.html
Latest published version:
https://dvcs.w3.org/hg/prov/raw-file/ff940ee82d3d/dc-note/Overview.html
Latest editor's draft:
https://dvcs.w3.org/hg/prov/raw-file/ff940ee82d3d/dc-note/Overview.html
Editors:
Kai Eckert, Mannheim University Library, Mannheim, Germany
Daniel Garijo, Universidad Politécnica de Madrid
Authors:
Simon Miles, King's College London, UK
Michael Panzer, OCLC Online Computer Library Center, USA

Abstract

This document provides a mapping between the PROV-O OWL2 ontology [PROV-O] and the Dublin Core Terms Vocabulary [DCTERMS].

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is part of a set of specifications aiming to define the various aspects that are necessary to achieve the vision of interoperable interchange of provenance information in heterogeneous environments such as the Web. This document is a non-normative, intuitive introduction and guide to the [PROV-DM] data model for provenance. It includes simple worked examples applying the [PROV-O] OWL2 ontology. The document is expected to become a Note once it is stable.

This document was published by the Provenance Working Group as a First Public Working Draft. If you wish to make comments regarding this document, please send them to public-prov-wg@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

The Dublin Core Metadata Initiative (DCMI) [DCMI] provides a core metadata vocabulary, commonly referred to as Dublin Core. Originally, it consisted of 15 elements, which are still available and are called the element set. The elements are defined very broadly; in particular, they have no range specification, i.e., they can be used with arbitrary values as objects. The elements have since been refined and types have been introduced. This more specific vocabulary is called the terms and currently consists of 55 properties [DCTERMS].

The Dublin Core elements are considered legacy and the use of the DCMI terms is preferred. The two sets have different namespaces: the elements are usually used with the prefix dc, the terms with dct or dcterms. Consider the following example of a metadata record (example 1):
 ex:document1 dct:title "A mapping from Dublin Core..." ;
    dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
    dct:created "2012-02-28" ;
    dct:publisher ex:w3c ;
    dct:issued "2012-02-29" ;
    dct:subject ex:dublincore ;
    dct:replaces ex:doc2 ;
    dct:format "HTML" .

Clearly, not all metadata statements deal with provenance. For instance, dct:title, dct:subject and dct:format are descriptions of the resource ex:document1. They do not provide any information about how the resource was created or modified in the past. On the other hand, some statements imply provenance-related information, e.g., dct:creator implies that the document has been created and refers to the author. Similarly, the existence of the dct:issued date implies that the document has been published. This information is redundantly implied by the dct:publisher statement as well. Finally, dct:replaces relates our document to another document ex:doc2, and it can be inferred that ex:doc2 probably had some kind of influence on our document ex:document1, which also gives us some provenance-related information.

This is a pattern that applies generally to metadata, i.e., we can distinguish description metadata and provenance metadata. To be more precise, we define provenance metadata as metadata providing provenance information according to the definition of the Provenance Working Group [PROV-DEF] and description metadata as all other metadata.

Based on this definition, the DCMI terms can be classified as follows:

Description metadata: abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, bibliographicCitation, conformsTo, coverage, description, educationLevel, extent, hasPart, isPartOf, format, identifier, instructionalMethod, isRequiredBy, language, mediator, medium, relation, requires, spatial, subject, tableOfContents, temporal, title, type.

Provenance metadata: available, contributor, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, hasFormat, hasVersion, isFormatOf, isReferencedBy, isReplacedBy, issued, isVersionOf, license, modified, provenance, publisher, references, replaces, rightsHolder, rights, source, valid.

This classification can certainly be questioned and has already been the subject of many discussions. We use a very conservative strategy: if the group cannot reach consensus on whether a term should be mapped to PROV, we exclude it from the mapping list. In this way, we prefer to create less provenance data that is correct rather than more data that is possibly incorrect.

According to our classification, there are 25 terms out of 55 that can be considered as provenance related. Based on their different aspects of provenance, we discuss them below:

Who? (contributor, creator, publisher, rightsHolder) Category that includes all properties that have the range dct:Agent, i.e., a resource that acts or has the power to act. The contributor, creator, and publisher clearly influence the resource and therefore are important for its origin. This is not immediately clear for the rightsHolder, but as ownership is considered the important provenance information for artworks, we have decided to include it in this category.

When? (available, created, date, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified, valid) Dates typically belong to the provenance record of a resource. It can be questioned whether a resource changes by being published or not. However, we consider the publication as an action that affects the state of the resource and therefore it is relevant for the provenance. Two dates can be considered special regarding their relevance for provenance: available and valid. They differ from the other dates in that, by definition, they can represent a date range. Often, the range of availability or validity of a resource is inherent to the resource and known beforehand: consider the validity of a passport or a credit card, or the availability of a limited special offer. In these cases, no action is involved that makes the resource invalid or unavailable; it is simply determined by the validity range. On the other hand, if an action is involved, e.g., a resource is declared invalid because a mistake has been found, this is relevant for its provenance.

How? (isVersionOf, hasVersion, isFormatOf, hasFormat, references, isReferencedBy, replaces, isReplacedBy, source, rights, license) Resources are often derived from other resources. In this case, the original resource becomes part of the provenance record of the derived resource. Derivations can be further classified as isVersionOf, isFormatOf, replaces, source. references is a weaker relation, but it can be assumed that a referenced resource influenced the described resource and therefore it is relevant for its provenance. The respective inverse properties do not necessarily contribute to the provenance of the described resource, e.g., a resource is usually not directly affected by being referenced or by being used as a source – at most indirectly, as the validity state can change if a resource is replaced by a new version. However, inverse properties belong to the provenance related terms as they can be used to describe the relations between the resources involved. Finally, licensing and rights are considered part of the provenance of the resource as well, since they restrict how the resource has been used by its owners.

Table 1 summarizes the terms in their respective categories:

Table 1: Categorization of the Dublin Core Terms
Category | Sub-category | Terms
Description metadata | - | abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, bibliographicCitation, conformsTo, coverage, description, educationLevel, extent, hasPart, isPartOf, format, identifier, instructionalMethod, isRequiredBy, language, mediator, medium, relation, requires, spatial, subject, tableOfContents, temporal, title, type
Provenance | Who | contributor, creator, publisher, rightsHolder
Provenance | When | available, created, date, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified, valid
Provenance | How | isVersionOf, hasVersion, isFormatOf, hasFormat, license, references, isReferencedBy, replaces, isReplacedBy, rights, source

This leaves one very special term: provenance. It is defined as a "statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation" [DCTERMS], which refers to the traditional definition of provenance for artworks. Although it is relevant for provenance from the W3C Provenance Incubator Group's perspective, this definition may overlap partially with almost half of the DCMI terms, which specify concrete aspects of the provenance of a resource.

In summary, the DCMI terms – and therefore any Dublin Core metadata record – hold a lot of provenance information and tell us about a resource, when it was affected in the past, who affected it and how it was affected. The description metadata, i.e., the other DCMI terms, tells us what was affected. There is no direct information in Dublin Core describing where a resource was affected. Such information is usually only available for the publication of a resource, i.e., this action is located at the address of the publisher. Note that spatial is not related to this question, as this is a descriptive property that tells us for instance that a book is about Berlin, but not that it was created in Berlin – or even that it has ever been or is otherwise related to Berlin.
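To make the classification concrete, the following Turtle sketch repeats the metadata record of example 1 and annotates each statement with the category it falls into according to Table 1. The ex: namespace IRI is an assumption made for this illustration only, since the document does not define it.

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/> .   # assumed; not defined in this document

ex:document1
    dct:title     "A mapping from Dublin Core..." ;            # description (what)
    dct:creator   ex:kai, ex:daniel, ex:simon, ex:michael ;    # provenance: who
    dct:created   "2012-02-28" ;                               # provenance: when
    dct:publisher ex:w3c ;                                     # provenance: who
    dct:issued    "2012-02-29" ;                               # provenance: when
    dct:subject   ex:dublincore ;                              # description (what)
    dct:replaces  ex:doc2 ;                                    # provenance: how
    dct:format    "HTML" .                                     # description (what)
```

Five of the eight statements carry provenance information, and none of them says where the resource was affected.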

1.1 Namespaces

In this document we use namespaces from different vocabularies to create the mapping. The namespaces used throughout the document are listed in Table 2 below:

Table 2: Namespaces used in the document
Prefix | Namespace
owl | <http://www.w3.org/2002/07/owl#>
rdfs | <http://www.w3.org/2000/01/rdf-schema#>
prov | <http://www.w3.org/ns/prov#>
dct | <http://purl.org/dc/terms/>
dcprov | <TO_BE_DETERMINED>

2. Mapping from Dublin Core to PROV

Why are we concerned with a mapping between Dublin Core and PROV? First, such a mapping can provide valuable insights into the different characteristics of both data models; in particular, it "explains" PROV from a Dublin Core viewpoint. Second, such a mapping can be used to extract PROV data from the huge amount of Dublin Core data that is available on the Web today. Third, it can translate PROV data to Dublin Core and therefore make it accessible for applications that understand Dublin Core. Last but not least, it can lower the barrier to adopting PROV, as simple Dublin Core statements can be used as a starting point to generate PROV data.

2.1 Basic considerations

Substantially, a complete mapping from Dublin Core to PROV consists of three parts:

1) Direct mappings between terms that can be expressed in form of subclass or subproperty relationships in RDFS – or equivalent relationships in OWL.

2) Definition of new refinements (subclasses or subproperties) of the target vocabulary to reflect the expressiveness of the source vocabulary.

3) Provision of complex mappings that create statements in the target vocabulary based on statements in the source vocabulary.

For the third part (complex mappings), we provide context free mappings that do not depend on the existence of any other statements. We briefly describe strategies on how to refine and clean the complex mapping results taking the context into account.

For the context-free mapping, only single DC statements are first mapped to PROV. Relations between several statements affecting the resulting PROV statements are not yet taken into account. The input and output of all activities are identified as separate specializations of the original resource mentioned in the DC statement. A specialization in PROV identifies a state of a resource during its lifetime that is part of the provenance chain. However, if a specialization of a document is generated by one activity and a specialization is used by a different activity later in time, it can be assumed that both are the same entity if the second activity directly follows the first activity. These conflations and other clean-up steps are performed separately, as there are several possibilities to perform them.

Clean-up. Based on the context-free mapping, reasoning patterns can be employed to clean up the data, e.g., by conflating blank nodes that are actually the same or by identifying a final specialization of the original document that is identical to this document.
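As an illustration of the conflation step, the following Turtle sketch shows one possible result of cleaning up the context-free output for a creation and a publication of the same document. It assumes that the publication directly follows the creation, so the specialization generated by the creation activity is treated as the one used by the publication activity. The ex: namespace IRI and the blank node labels are placeholders chosen for this example; this is only one of several possible clean-up strategies.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # assumed example namespace

_:creation a prov:Activity .

# Generated by the creation activity and, after conflation,
# also the input of the publication activity.
_:created_entity a prov:Entity ;
    prov:specializationOf ex:document1 ;
    prov:wasGeneratedBy _:creation .

_:publication a prov:Activity ;
    prov:used _:created_entity .   # before clean-up: a separate blank node

_:issued_entity a prov:Entity ;
    prov:specializationOf ex:document1 ;
    prov:wasGeneratedBy _:publication ;
    prov:wasDerivedFrom _:created_entity .
```

Without the conflation, the graph would contain three specializations of ex:document1 instead of two.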

2.2 What is ex:document1? Entities in Dublin Core

Consider the example metadata record above (example 1). As a DC metadata record describes the resulting document as a whole, it is not clear how this document relates to the different states that the document had until it reached its final state. For example, a document can be assigned a dct:created date and a dct:issued date. The activity of issuing a document does not necessarily change the document, but with respect to the PROV ontology, there are two different specializations of this document, before and after the issuing activity, distinguishable by the property of the document that states whether the document was issued. Generally, there are two possibilities to deal with this issue:

1): We can always create new instances of entities, typically as blank nodes, that all are related to the original document by means of prov:specializationOf. This leads to bloated and not very intuitive data models, e.g. think about the translation of a single dct:creator statement, where you would expect to somehow find some activity and agent that are directly related to the document (as in Figure 1).

2): We can always use the original document as the instance that is used as prov:Entity. The implications regarding the semantics of a prov:Activity are not yet totally clear, however, it contradicts the above mentioned definition to have an activity that uses an entity and generates the same entity. If an activity actually generates an entity, it is semantically incorrect to have several activities that all generate the same entity at different points in time.

Figure 1. A mapping example

As the first option is the more conservative one with respect to the underlying semantics, our proposal is to use it for the context-free mapping. We will use blank nodes, although any naming mechanism could be provided if necessary, leaving the conflating of nodes to the clean-up phase. Here, we can deal with more specific questions like the following:

How do we reduce the number of specializations, e.g., by stating that the specialization that is generated by activity 1 is the same entity that is used by activity 2?

How do we relate the specializations to ex:document1? We could create two entities based on the actual creation activity: ex:document1 and a first specialization. We could further declare the last produced specialization as the same entity as ex:document1. Depending on the underlying data, this can be the entity that is identified by the URI of the original document. However, we have to be careful to avoid cycles in the provenance we produce. For now, this remains undecided.

2.3 Direct mappings

Direct mappings can be provided in particular for classes and for the "shortcuts", i.e., the direct relationships in PROV between an entity and an agent or between an entity and a date. The direct mappings provide basic interoperability using the integration mechanisms of RDF. By means of RDFS reasoning, any PROV application can at least make some sense of Dublin Core data. The direct mappings also contribute to the formal definition of the vocabularies by translating them to PROV.
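The effect of RDFS reasoning over a direct mapping can be sketched in Turtle as follows. The example takes the dct:creator axiom from the direct mappings of this section together with one statement from example 1 and shows the triple that an RDFS reasoner entails from them. The ex: namespace IRI is an assumption made for this illustration.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # assumed example namespace

# Direct mapping axiom
dct:creator rdfs:subPropertyOf prov:wasAttributedTo .

# Dublin Core statement taken from example 1
ex:document1 dct:creator ex:kai .

# Entailed by RDFS subproperty reasoning (entailment rule rdfs7)
ex:document1 prov:wasAttributedTo ex:kai .
```

A PROV application that knows nothing about Dublin Core can thus still learn that ex:document1 is attributed to ex:kai, although it loses the more specific information that ex:kai is the creator.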

Dublin Core, while less complex from a modeling perspective, is more specific about the type of the activity taking place. PROV provides general attribution, and the details about the kind of influence that an activity or an agent had are left to custom refinements of the PROV classes and properties.

Table 3 and Table 4 provide the detailed mapping plus the rationale for each term. Mappings on which the group could not reach consensus have been dropped. For more information, see the list of terms left out of the mapping.

Table 3: Direct mappings
DC Term | Relation | PROV Term | Rationale
dct:Agent | owl:equivalentClass | prov:Agent | Both dct:Agent and prov:Agent refer to the same thing: a resource that has the power to act (and that then bears responsibility for an activity).
dct:rightsHolder | rdfs:subPropertyOf | prov:wasAttributedTo | The rights holder has the attribution of the activity that created the licensed resource.
dct:creator | rdfs:subPropertyOf | prov:wasAttributedTo | A creator is the agent who created the resource, i.e., the one involved in the creation activity that led to the resource, and has the attribution for that activity.
dct:publisher | rdfs:subPropertyOf | prov:wasAttributedTo | A publisher has the attribution of the publishing activity that led to the published resource.
dct:contributor | rdfs:subPropertyOf | prov:wasAttributedTo | A contributor is involved either in the creation activity or in the updating of the resource, and is therefore attributed with taking part in those activities.
dct:isVersionOf | rdfs:subPropertyOf | prov:wasDerivedFrom | dct:isVersionOf refers to "a related resource to which the current resource is a version, edition or adaptation". Hence we can conclude that the current resource has been derived from the original one.
dct:hasVersion | rdfs:subPropertyOf | prov:hadDerivation | Inverse property of the previous one.
dct:isFormatOf | rdfs:subPropertyOf | prov:alternateOf | dct:isFormatOf refers to another resource which is the same but in another format. Thus the mapping to prov:alternateOf is straightforward.
dct:hasFormat | rdfs:subPropertyOf | prov:alternateOf | See the rationale for dct:isFormatOf.
dct:replaces | rdfs:subPropertyOf | prov:wasInfluencedBy | This mapping is not straightforward. There is a relation between two resources when the former replaces the latter, but it is not necessarily a derivation, revision, specification or alternate. Since we want to state some influence but do not find any specific relation that matches the dct term, we propose to map it to the abstract term prov:wasInfluencedBy.
dct:isReplacedBy | rdfs:subPropertyOf | prov:influenced | Inverse property of the previous one.
dct:source | rdfs:subPropertyOf | prov:wasDerivedFrom | In Dublin Core, dct:source is defined as a "related resource from which the described resource is derived", which matches the notion of derivation in PROV-DM ("a transformation of an entity in another").
dct:type | owl:equivalentProperty | prov:type | Both properties refer to the same thing: the nature of the resource (or genre). It could be mapped to rdf:type if we map the document against PROV-O.
dct:created | rdfs:subPropertyOf | prov:generatedAtTime | dct:created describes the time of creation of the entity, which is the time of its generation as well. We have decided to map it as a subproperty because resources in Dublin Core have many associated dates, each of which could be associated with one of their versions. In this case, we see the creation as the first version, but not necessarily the current version of the resource.
dct:issued | rdfs:subPropertyOf | prov:generatedAtTime | Date when the resource was issued. It is mapped as a subproperty of prov:generatedAtTime because the issued resource is an entity itself, which has been generated at a certain time.
dct:dateAccepted | rdfs:subPropertyOf | prov:generatedAtTime | The rationale is similar to the previous two properties: the version of the resource which was accepted could be different from the created or issued one.
dct:dateCopyrighted | rdfs:subPropertyOf | prov:generatedAtTime | See previous property.
dct:dateSubmitted | rdfs:subPropertyOf | prov:generatedAtTime | See previous property.
dct:modified | rdfs:subPropertyOf | prov:generatedAtTime | See previous property.

Regarding the date mappings, note that for a metadata record such as example 1, the direct mapping infers that the resource was generated (prov:generatedAtTime) at two different times. Although this may seem inconsistent, it is supported by PROV and is due to the difference between Dublin Core and PROV resources: while the former conflates more than one version or "state" of the resource in a single entity, the latter proposes to separate all of them. The direct mapping therefore produces "scruffy" provenance (i.e., valid provenance that does not comply with all the PROV constraints [PROV_CONSTRAINTS]).
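The following Turtle sketch spells this out for the two dates of example 1. It lists the Dublin Core statements and the triples entailed through the subproperty axioms for dct:created and dct:issued. The ex: namespace IRI is an assumption made for this illustration, and the date literals are copied from example 1 as they are, without a datatype.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # assumed example namespace

# Statements from example 1
ex:document1 dct:created "2012-02-28" ;
             dct:issued  "2012-02-29" .

# Entailed through the subproperty axioms of the direct mapping
ex:document1 prov:generatedAtTime "2012-02-28", "2012-02-29" .
```

The single resource ex:document1 ends up with two generation times, which is exactly the conflation that the complex mappings avoid by introducing separate specializations.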

We end the direct mappings with the Dublin Core properties that have been found to be superproperties of certain PROV properties. They are summarized in Table 4 below.

Table 4: Direct mappings (2)
PROV Term | Relation | DC Term | Rationale
prov:hadPrimarySource | rdfs:subPropertyOf | dct:source | It is surprising to see that some terms of Dublin Core are more general than the ones defined in PROV. However, the definition of prov:hadPrimarySource ("something produced by some agent with direct experience and knowledge about the topic") is more restrictive than that of dct:source ("A related resource from which the described resource is derived").
prov:wasRevisionOf | rdfs:subPropertyOf | dct:isVersionOf | Similar to the previous property, prov:wasRevisionOf is more restrictive in the sense that it refers to a revised version of a resource, while dct:isVersionOf involves versions, editions or adaptations of the original resource.

2.4 PROV refinements

To properly reflect the meaning of the Dublin Core terms, we need refinements, i.e. more specific subclasses:

 dcprov:PublicationActivity      rdfs:subClassOf     prov:Activity .
 dcprov:ContributionActivity     rdfs:subClassOf     prov:Activity .
 dcprov:CreationActivity         rdfs:subClassOf     prov:Activity, dcprov:ContributionActivity .
 dcprov:ModificationActivity     rdfs:subClassOf     prov:Activity .
 dcprov:AcceptanceActivity       rdfs:subClassOf     prov:Activity .
 dcprov:CopyrightingActivity     rdfs:subClassOf     prov:Activity .
 dcprov:SubmissionActivity       rdfs:subClassOf     prov:Activity .
 dcprov:PublisherRole            rdfs:subClassOf     prov:Role .
 dcprov:ContributorRole          rdfs:subClassOf     prov:Role . 
 dcprov:CreatorRole              rdfs:subClassOf     prov:Role, dcprov:ContributorRole .

Custom refinements of the properties should be omitted as they would be identical to the Dublin Core terms. If these more specific properties are wanted, the Dublin Core terms should be used directly, according to the direct mappings presented in section 2.3.
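One consequence of the class refinements listed in this section is that every creation activity is also a contribution activity. The Turtle sketch below illustrates this with RDFS subclass reasoning. Since the dcprov namespace is still to be determined, the namespace IRI used here is only a placeholder, and ex:writing is a hypothetical activity introduced for this illustration.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix dcprov: <http://example.org/dcprov#> .   # placeholder; real namespace to be determined
@prefix ex:     <http://example.org/> .          # assumed example namespace

# Refinement axiom from this section
dcprov:CreationActivity rdfs:subClassOf prov:Activity, dcprov:ContributionActivity .

# A hypothetical activity typed with the most specific class
ex:writing a dcprov:CreationActivity .

# Entailed by RDFS subclass reasoning (entailment rule rdfs9)
ex:writing a dcprov:ContributionActivity, prov:Activity .
```

An application that only asks for contribution activities will therefore also find the creation activities.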

2.5 Complex Mappings

The complex mappings are provided in form of SPARQL CONSTRUCT queries, i.e., queries that describe a resulting RDF graph based on another RDF graph found in the original data. We divide the queries in different categories:

2.5.1 Entity-Agent mappings (Who)

In this category, we have four terms: contributor, creator, publisher, and rightsHolder. The first three can be mapped with the same pattern, similar to the one presented in Figure 1. The only significant differences are the roles and activities involved in each term.

In the text below, variables ?document and ?agent are set to different matching values depending on the data found in the triple store. The graph in the CONSTRUCT part can be seen as a template where the variables are placeholders that are filled with the values found in the data. The mapping corresponds to the graph in Figure 1 (with small changes for creator and rightsHolder). With this mapping, the difference in the complexity becomes obvious. A lot of blank nodes are created, so a subsequent clean-up phase that relates them and provides stable URIs for the entities is required. Depending on the implementation, URIs can also be coined here for every specialization. Sometimes, URIs for the specializations are also available and simply not exposed to the Dublin Core record. Our implementation is only an example that works conservatively, i.e., we assume that there is no further information about the identity of specializations available.

2.5.1.1 dct:creator
The creator is the agent associated with role CreatorRole in the CreationActivity that created a specialization of the entity (?document). We avoid using the Dublin Core entity because it may have other statements referring to it (about publishing, licensing, modifying it, etc.).
 CONSTRUCT {
    ?document a prov:Entity ;
        prov:wasAttributedTo ?agent .
    ?agent a prov:Agent .
    _:activity a prov:Activity, dcprov:CreationActivity ;
        prov:wasAssociatedWith ?agent ;
        prov:qualifiedAssociation [
            a prov:Association ;
            prov:agent ?agent ;
            prov:hadRole dcprov:CreatorRole
        ] .
    # The "output": the specialization produced by the creation activity
    _:resulting_entity a prov:Entity ;
        prov:specializationOf ?document ;
        prov:wasGeneratedBy _:activity ;
        prov:wasAttributedTo ?agent .
 } WHERE {
    ?document dct:creator ?agent .
 }
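As a worked illustration, the Turtle sketch below shows the graph that this mapping is intended to produce for the single input statement ex:document1 dct:creator ex:kai. The ex: namespace IRI is assumed, the dcprov namespace IRI is a placeholder because the real one is still to be determined, and the blank node labels merely mirror those of the query.

```turtle
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix dcprov: <http://example.org/dcprov#> .   # placeholder; real namespace to be determined
@prefix ex:     <http://example.org/> .          # assumed example namespace

# Input statement: ex:document1 dct:creator ex:kai .

ex:document1 a prov:Entity ;
    prov:wasAttributedTo ex:kai .
ex:kai a prov:Agent .

_:activity a prov:Activity, dcprov:CreationActivity ;
    prov:wasAssociatedWith ex:kai ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent ex:kai ;
        prov:hadRole dcprov:CreatorRole
    ] .

_:resulting_entity a prov:Entity ;
    prov:specializationOf ex:document1 ;
    prov:wasGeneratedBy _:activity ;
    prov:wasAttributedTo ex:kai .
```

Because a CONSTRUCT template creates fresh blank nodes for every solution, the four creators of example 1 would yield four separate activities and four separate specializations, which is why a clean-up phase is needed afterwards.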
2.5.1.2 dct:contributor
In the same way, publisher and contributor can be mapped, only the roles and activities change:
 CONSTRUCT {
    ?document a prov:Entity ;
        prov:wasAttributedTo ?agent .
    ?agent a prov:Agent .
    _:activity a prov:Activity, dcprov:ContributionActivity ;
        prov:wasAssociatedWith ?agent ;
        prov:qualifiedAssociation [
            a prov:Association ;
            prov:agent ?agent ;
            prov:hadRole dcprov:ContributorRole
        ] .
    # The "output": the specialization produced by the contribution activity
    _:resulting_entity a prov:Entity ;
        prov:specializationOf ?document ;
        prov:wasGeneratedBy _:activity ;
        prov:wasAttributedTo ?agent .
 } WHERE {
    ?document dct:contributor ?agent .
 }
2.5.1.3 dct:publisher
In case of publication, a second specialization representing the entity before the publication is necessary:
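 CONSTRUCT {
    ?document a prov:Entity ;
        prov:wasAttributedTo ?agent .
    ?agent a prov:Agent .
    # The "input": the specialization that exists before the publication
    _:used_entity a prov:Entity ;
        prov:specializationOf ?document .
    _:activity a prov:Activity, dcprov:PublicationActivity ;
        prov:used _:used_entity ;
        prov:wasAssociatedWith ?agent ;
        prov:qualifiedAssociation [
            a prov:Association ;
            prov:agent ?agent ;
            prov:hadRole dcprov:PublisherRole
        ] .
    # The "output": the published specialization
    _:resulting_entity a prov:Entity ;
        prov:specializationOf ?document ;
        prov:wasDerivedFrom _:used_entity ;
        prov:wasGeneratedBy _:activity ;
        prov:wasAttributedTo ?agent .
 } WHERE {
    ?document dct:publisher ?agent .
 }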
2.5.1.4 dct:rightsHolder
The rightsHolder concept mapping is slightly different. Here we propose to omit the activity and just add the rights holder to the entity by means of prov:wasAttributedTo. This mapping could actually be omitted as the statements can be inferred from the direct mapping.
 CONSTRUCT {
  ?document a                     prov:Entity .
  ?agent    a                     prov:Agent .
  ?document prov:wasAttributedTo  ?agent .
 } WHERE { 
  ?document dct:rightsHolder      ?agent .
 }

2.5.2 Entity-Date mappings (When)

The dates often correspond with a who-property, e.g., creator and created or publisher and issued. Therefore, they lead to similar statements, only providing a date instead of an agent associated with the activity. We use issued as an example here, because from issued, two specializations can be inferred: something must be available before it can be published.

When using Dublin Core terms, it is common to see a resource annotated with several DC assertions such as creator, publisher, issued, date, etc. If we assume that each date corresponds to the generation date by an activity (a creation activity, a publication activity, etc.), we cannot say that all those activities generated the resource itself. Instead, in order to generate "proper" provenance records, we say that each of those activities generated an entity that is a specialization of the resource.

2.5.2.1 dct:created

 CONSTRUCT {
 ?document           a                         prov:Entity .

 _:activity          a                         prov:Activity, dcprov:CreationActivity .

 # The "output"
 _:created_entity    a                         prov:Entity ;
                     prov:specializationOf     ?document ;
                     prov:wasGeneratedBy       _:activity ;
                     prov:generatedAtTime      ?date ;
                     prov:qualifiedGeneration  [
                         a prov:Generation ;
                         prov:atTime ?date ;
                         prov:activity _:activity
                     ] .
 } WHERE {
  ?document dct:created ?date .
 }
 
2.5.2.2 dct:issued

 CONSTRUCT {
 ?document        a                         prov:Entity .

 _:activity       a                         prov:Activity, dcprov:PublicationActivity ;
                  prov:used                 _:used_entity .

 # The "input"
 _:used_entity    a                         prov:Entity ;
                  prov:specializationOf     ?document .

 # The "output"
 _:iss_entity     a                         prov:Entity ;
                  prov:specializationOf     ?document ;
                  prov:wasGeneratedBy       _:activity ;
                  prov:generatedAtTime      ?date ;
                  prov:wasDerivedFrom       _:used_entity ;
                  prov:qualifiedGeneration  [
                         a prov:Generation ;
                         prov:atTime ?date ;
                         prov:activity _:activity
                  ] .
 } WHERE {
  ?document dct:issued ?date .
 }
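As a worked illustration, the Turtle sketch below shows the graph that this mapping is intended to produce for the statement ex:document1 dct:issued "2012-02-29" from example 1. The ex: namespace IRI is assumed, the dcprov namespace IRI is a placeholder because the real one is still to be determined, and the generation time is expressed with prov:generatedAtTime, the property name used in Table 3.

```turtle
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix dcprov: <http://example.org/dcprov#> .   # placeholder; real namespace to be determined
@prefix ex:     <http://example.org/> .          # assumed example namespace

# Input statement: ex:document1 dct:issued "2012-02-29" .

ex:document1 a prov:Entity .

_:activity a prov:Activity, dcprov:PublicationActivity ;
    prov:used _:used_entity .

# The "input": the document before publication
_:used_entity a prov:Entity ;
    prov:specializationOf ex:document1 .

# The "output": the issued document
_:iss_entity a prov:Entity ;
    prov:specializationOf ex:document1 ;
    prov:wasGeneratedBy _:activity ;
    prov:generatedAtTime "2012-02-29" ;
    prov:wasDerivedFrom _:used_entity ;
    prov:qualifiedGeneration [
        a prov:Generation ;
        prov:atTime "2012-02-29" ;
        prov:activity _:activity
    ] .
```

The generation time is attached to the specialization _:iss_entity and not to ex:document1 itself, so the record stays consistent even when other dates of the same document are mapped.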

2.5.2.3 dct:modified

As seen with the following terms, most entity/date properties will have a similar structure.

 
 CONSTRUCT {
 ?document             a                         prov:Entity .

 _:activity            a                         prov:Activity, dcprov:ModificationActivity ;
                       prov:used                 _:used_entity .

 # The "input"
 _:used_entity         a                         prov:Entity ;
                       prov:specializationOf     ?document .

 # The "output"
 _:modified_entity     a                         prov:Entity ;
                       prov:specializationOf     ?document ;
                       prov:wasGeneratedBy       _:activity ;
                       prov:generatedAtTime      ?date ;
                       prov:wasDerivedFrom       _:used_entity ;
                       prov:qualifiedGeneration  [
                              a prov:Generation ;
                              prov:atTime ?date ;
                              prov:activity _:activity
                       ] .
 } WHERE {
  ?document dct:modified ?date .
 }

2.5.2.4 dct:dateAccepted

 
 CONSTRUCT{
 ?document             a                         prov:Entity .
 
 _:activity            a                         prov:Activity, dcprov:AcceptanceActivity ;
                       prov:used                 _:used_entity .
				  
# The “input”
 _:used_entity         a                         prov:Entity .
                       prov:specializationOf     ?document .
				  
 # The “output”
 _:accepted_entity     a                         prov:Entity ;
                       prov:specializationOf     ?document ;
                       prov:wasGeneratedBy       _:activity ;
                       prov:wasGeneratedAtTime   ?date;
                       prov:wasDerivedFrom       _:used_entity ;
                       prov:qualifiedGeneration  [ 
                              a prov:Generation ;
                              prov:atTime ?date  ;
                              prov:activity _:activity
                       ] .   
 } WHERE { 
  ?document dct:dateAccepted ?date.
 }

2.5.2.5 dct:dateCopyrighted

 CONSTRUCT{
 ?document                a                         prov:Entity .
 
 _:activity               a                         prov:Activity, dcprov:CopyrightingActivity ;
                          prov:used                 _:used_entity .
				  
# The “input”
 _:used_entity            a                         prov:Entity ;
                          prov:specializationOf     ?document .
				  
 # The “output”
 _:copyrighted_entity     a                         prov:Entity ;
                          prov:specializationOf     ?document ;
                          prov:wasGeneratedBy       _:activity ;
                          prov:wasGeneratedAtTime   ?date;
                          prov:wasDerivedFrom       _:used_entity ;
                          prov:qualifiedGeneration  [ 
                                 a prov:Generation ;
                                 prov:atTime ?date  ;
                                 prov:activity _:activity
                          ] .   
 } WHERE { 
  ?document dct:dateCopyrighted ?date.
 }

2.5.2.6 dct:dateSubmitted

 CONSTRUCT{
 ?document               a                         prov:Entity .
 
 _:activity              a                         prov:Activity, dcprov:SubmissionActivity ;
                         prov:used                 _:used_entity .
				  
# The “input”
 _:used_entity           a                         prov:Entity ;
                         prov:specializationOf     ?document .
			  
 # The “output”
 _:submitted_entity      a                         prov:Entity ;
                         prov:specializationOf     ?document ;
                         prov:wasGeneratedBy       _:activity ;
                         prov:wasGeneratedAtTime   ?date;
                         prov:wasDerivedFrom       _:used_entity ;
                         prov:qualifiedGeneration  [ 
                                a prov:Generation ;
                                prov:atTime ?date  ;
                                prov:activity _:activity
                         ] .   
 } WHERE { 
  ?document dct:dateSubmitted ?date.
 }
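
Since the queries for dct:modified, dct:dateAccepted, dct:dateCopyrighted and dct:dateSubmitted differ only in the Dublin Core property and the activity type, they could be folded into one parameterized query. The following is a minimal sketch, not part of the mapping itself. It assumes a SPARQL 1.1 processor (for VALUES), uses a placeholder IRI for the dcprov: namespace, and introduces the hypothetical label _:output_entity in place of the per-term labels used above.

```sparql
PREFIX prov:   <http://www.w3.org/ns/prov#>
PREFIX dct:    <http://purl.org/dc/terms/>
# Placeholder namespace (assumption): substitute the actual dcprov IRI.
PREFIX dcprov: <http://example.org/dcprov#>

CONSTRUCT {
  ?document        a                        prov:Entity .

  _:activity       a                        prov:Activity, ?activityType ;
                   prov:used                _:used_entity .

  # The "input"
  _:used_entity    a                        prov:Entity ;
                   prov:specializationOf    ?document .

  # The "output" (hypothetical generic label)
  _:output_entity  a                        prov:Entity ;
                   prov:specializationOf    ?document ;
                   prov:wasGeneratedBy      _:activity ;
                   prov:wasGeneratedAtTime  ?date ;
                   prov:wasDerivedFrom      _:used_entity ;
                   prov:qualifiedGeneration [
                       a prov:Generation ;
                       prov:atTime ?date ;
                       prov:activity _:activity
                   ] .
} WHERE {
  # One row per date property and the activity type it implies.
  VALUES (?dateProperty ?activityType) {
    (dct:modified        dcprov:ModificationActivity)
    (dct:dateAccepted    dcprov:AcceptanceActivity)
    (dct:dateCopyrighted dcprov:CopyrightingActivity)
    (dct:dateSubmitted   dcprov:SubmissionActivity)
  }
  ?document ?dateProperty ?date .
}
```

Blank nodes in a CONSTRUCT template are instantiated afresh for every solution, so each matching date statement still yields its own activity, input and output, as it does with the separate queries.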

2.5.3 Entity-Entity mappings (How)

Most Dublin Core terms in this category are related to the prov:wasDerivedFrom property. They can be mapped directly, although a complex mapping can also be provided. For these terms, OPTIONAL patterns in SPARQL CONSTRUCT queries can be used to deal with the inverse properties in Dublin Core.

2.5.3.1 dct:isVersionOf / dct:hasVersion

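In terms of specificity, we consider that dct:isVersionOf lies between prov:wasDerivedFrom and prov:wasRevisionOf (prov:wasDerivedFrom > dct:isVersionOf > prov:wasRevisionOf). It is therefore mapped to the more general prov:wasDerivedFrom: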

 CONSTRUCT {
    ?document1 a prov:Entity ;
       prov:wasDerivedFrom ?document2.
    ?document2 a prov:Entity .
 } WHERE {
    OPTIONAL { ?document1 dct:isVersionOf ?document2 . }
    OPTIONAL { ?document2 dct:hasVersion ?document1 .}
 }

The OPTIONAL keyword means that the enclosed statement does not need to exist. Triples in the resulting graph that contain unbound variables are simply omitted. In this case, this leads to the correct PROV statement if either or both source statements are present. From the entity/entity relations, an activity can also be inferred (e.g., the activity that led to the creation of the new version). We omit it here for brevity.
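
One caveat applies when the query runs over a dataset describing several documents. With two consecutive OPTIONAL blocks, the second block only extends the solutions produced by the first. If the data contains at least one dct:isVersionOf statement, pairs described only with dct:hasVersion would therefore be dropped from the result. A possible alternative, sketched here with explicit prefix declarations and not as the mapping endorsed by this note, is to combine both directions with UNION:

```sparql
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct:  <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document1 a prov:Entity ;
             prov:wasDerivedFrom ?document2 .
  ?document2 a prov:Entity .
} WHERE {
  # Either direction is enough to bind both variables.
  { ?document1 dct:isVersionOf ?document2 . }
  UNION
  { ?document2 dct:hasVersion ?document1 . }
}
```

Because the result of a CONSTRUCT query is a set of triples, a pair stated in both directions still produces a single prov:wasDerivedFrom statement.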

2.5.3.2 dct:isFormatOf / dct:hasFormat

dct:isFormatOf is defined as “A related resource that is substantially the same as the described resource, but in another format”. This maps to prov:alternateOf: we do not know which entity both resources specialize, but we do know that one is an alternate of the other.

 CONSTRUCT {
    ?document1 a prov:Entity ;
       prov:alternateOf ?document2.
    ?document2 a prov:Entity .
 } WHERE {
    OPTIONAL { ?document1 dct:isFormatOf ?document2 . }
    OPTIONAL { ?document2 dct:hasFormat ?document1 .}
 }

2.5.3.3 dct:replaces / dct:isReplacedBy

 CONSTRUCT {
    ?document1 a prov:Entity ;
       prov:wasInfluencedBy ?document2.
    ?document2 a prov:Entity .
 } WHERE {
    OPTIONAL { ?document1 dct:replaces ?document2 . }
    OPTIONAL { ?document2 dct:isReplacedBy ?document1 .}
 }

2.5.3.4 dct:source

 CONSTRUCT{
   ?document1     a   prov:Entity ;
                  prov:wasDerivedFrom    ?document2 .
   ?document2     a   prov:Entity .
  } WHERE { 
   ?document1 dct:source ?document2.
  }

2.6 Cleanup

The clean-up phase depends on the intentions of the implementor and on the answer to the question of what the described resource (ex:document1) is in the resulting provenance data. The approach presented in this document is conservative, and it leads to a proliferation of blank nodes. The implementor could rename blank nodes to specific identifiers in order to avoid obtaining additional blank nodes when reapplying the CONSTRUCT queries presented in the previous section.
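
As an illustration of renaming blank nodes, the dct:modified mapping could mint IRIs instead. This is only a sketch under stated assumptions: it requires SPARQL 1.1 (BIND, IRI, ENCODE_FOR_URI), uses a placeholder IRI for the dcprov: namespace, and the naming scheme (document IRI plus a "#modified-" fragment) is hypothetical and presumes document IRIs without a fragment of their own. The qualified generation is left out for brevity.

```sparql
PREFIX prov:   <http://www.w3.org/ns/prov#>
PREFIX dct:    <http://purl.org/dc/terms/>
# Placeholder namespace (assumption): substitute the actual dcprov IRI.
PREFIX dcprov: <http://example.org/dcprov#>

CONSTRUCT {
  ?document  a prov:Entity .

  ?activity  a prov:Activity, dcprov:ModificationActivity ;
             prov:used ?input .

  ?input     a prov:Entity ;
             prov:specializationOf ?document .

  ?output    a prov:Entity ;
             prov:specializationOf ?document ;
             prov:wasGeneratedBy ?activity ;
             prov:wasGeneratedAtTime ?date ;
             prov:wasDerivedFrom ?input .
} WHERE {
  ?document dct:modified ?date .
  FILTER (isIRI(?document))
  # Hypothetical naming scheme: the same document and date always
  # produce the same three identifiers.
  BIND (CONCAT(STR(?document), "#modified-", ENCODE_FOR_URI(STR(?date))) AS ?base)
  BIND (IRI(CONCAT(?base, "-activity")) AS ?activity)
  BIND (IRI(CONCAT(?base, "-input"))    AS ?input)
  BIND (IRI(CONCAT(?base, "-output"))   AS ?output)
}
```

Since the identifiers are derived from the source statement, running the query a second time reproduces the same triples instead of adding new nodes.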

Providing a set of rules to conflate the blank nodes is not in the scope of this document. However, the group has created a list of suggestions for implementors with ideas on how this could be achieved:

1) Conflate properties referring to the same state of the resource: In Dublin Core, certain properties complement each other (e.g., creator and created, publisher and issued, modified and contributor, etc.). By combining some of the queries, we could group some of the records and create more complete PROV assertions.

Example: Combining created and creator:

 CONSTRUCT{
 ?document               a                         prov:Entity .
 
 _:activity              a                         prov:Activity, dcprov:CreationActivity ;
                         prov:wasAssociatedWith    ?agent ;
                         prov:qualifiedAssociation [
                              a prov:Association ;
                              prov:agent ?agent ;
                              prov:hadRole dcprov:CreatorRole
                         ] .
			  
 # The “output”
 _:created_entity      a                         prov:Entity ;
                       prov:specializationOf     ?document ;
                       prov:wasGeneratedBy       _:activity ;
                       prov:wasGeneratedAtTime   ?date;
                       prov:wasDerivedFrom       _:used_entity ;
                       prov:qualifiedGeneration  [ 
                              a prov:Generation ;
                              prov:atTime ?date  ;			
                              prov:activity _:activity
                       ] .   
 } WHERE { 
  ?document dct:creator  ?agent;
            dct:created  ?date.
 }
 

2) Another solution would be to sort all the activities by date, if known, and to conflate the blank node that is the result of one activity with the blank node that is the input of the subsequent activity, provided that both are specializations of the same entity.

3) Finally, a simpler idea is to ignore all the specializations of ex:document1 and use the resource itself. This solution would avoid the majority of the blank nodes, linking all the activities with the resource. However, the results would be confusing if there are several Dublin Core statements describing the same resource (like publisher and creator), since most of the activities would use and generate the same resource at different times (all the provenance of the different versions of the resource would be conflated in the same entity).
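
The second suggestion above (conflating the output of one activity with the input of the next) could start from a query that lists the candidate pairs. The sketch below is illustrative only. It is meant to run over the graph produced by the CONSTRUCT queries, requires SPARQL 1.1 (FILTER NOT EXISTS), and assumes that the dates are typed literals that the processor can compare. Activities sharing the same date are not paired.

```sparql
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?document ?output ?nextInput
WHERE {
  # Output of an earlier activity ...
  ?output      prov:specializationOf   ?document ;
               prov:wasGeneratedBy     ?earlier ;
               prov:wasGeneratedAtTime ?t1 .

  # ... and input of a later activity on the same document.
  ?later       prov:used               ?nextInput .
  ?nextInput   prov:specializationOf   ?document .
  ?laterOutput prov:wasGeneratedBy     ?later ;
               prov:wasGeneratedAtTime ?t2 .

  FILTER (?t1 < ?t2)

  # Keep only directly consecutive activities:
  # nothing was generated for this document in between.
  FILTER NOT EXISTS {
    ?between prov:specializationOf   ?document ;
             prov:wasGeneratedAtTime ?t3 .
    FILTER (?t1 < ?t3 && ?t3 < ?t2)
  }
}
ORDER BY ?document ?t1
```

Each result row names two nodes that an implementor might merge. How the merge is carried out (for example, by rewriting one identifier to the other) is left open here.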

2.7 List of terms excluded from the mapping

Table 6: List of terms excluded from the mapping
Term | Category | Rationale
dct:abstract | Descriptive metadata | Summary of the resource. Thus, not part of its provenance.
dct:accrualMethod | Descriptive metadata | Method by which items are added to a collection. It does not describe the action itself, so we decided to leave this term out of the mapping.
dct:accrualPeriodicity | Descriptive metadata | Frequency with which items are added to a collection.
dct:accrualPolicy | Descriptive metadata | Policy associated with the insertion of items into a collection. We could use it to enrich the qualified involvement, but there is no direct mapping of this relationship.
dct:alternative | Descriptive metadata | Refers to an alternative name of the resource.
dct:audience | Descriptive metadata | The audience for whom the resource is useful.
dct:conformsTo | Descriptive metadata | Indicates the standard to which the resource conforms (if any).
dct:coverage | Descriptive metadata | Topic of the resource.
dct:description | Descriptive metadata | An account of the resource.
dct:educationLevel | Descriptive metadata | The educational level of the audience for which the resource is intended.
dct:extent | Descriptive metadata | Size or duration of the resource.
dct:format | Descriptive metadata | Format of the resource.
dct:identifier | Descriptive metadata | An unambiguous reference within a given context. Note: it could be mapped to the PROV-DM identifier for entities.
dct:instructionalMethod | Descriptive metadata | Method used to create the knowledge that the resource is supposed to support.
dct:isPartOf | Descriptive metadata | Inverse of hasPart.
dct:isRequiredBy | Descriptive metadata | The current resource is required for supporting the function of another resource. This is not related to provenance, since it refers to something that may not have happened yet (e.g., a library dependency, but the program that needs it hasn’t been executed yet).
dct:language | Descriptive metadata | Language of the resource.
dct:mediator | Descriptive metadata | Entity that mediates access to the resource.
dct:medium | Descriptive metadata | Material of the resource.
dct:requires | Descriptive metadata | Inverse property of isRequiredBy (see isRequiredBy).
dct:hasPart | Descriptive metadata | A resource that is included in the current resource. Entity composition is out of the scope of PROV-DM, so we leave it out of the mapping list as well.
dct:spatial | Descriptive metadata | Spatial characteristics of the content of the resource (e.g., the book is about Spain). Thus it can't be mapped to prov:hadLocation.
dct:subject | Descriptive metadata | Subject of the resource.
dct:tableOfContents | Descriptive metadata | List of subunits of the resource.
dct:temporal | Descriptive metadata | Temporal characteristics to which the resource refers (e.g., a book about the 15th century).
dct:title | Descriptive metadata | Title of the resource.
dct:type | Descriptive metadata | Type of the resource.
dct:bibliographicCitation | Descriptive metadata | Property that relates the Literal representing the bibliographic citation of the resource to the actual resource (e.g., :el_Quijote dct:bibliographicCitation "Miguel de Cervantes Saavedra: El Quijote, España").
dct:references | Provenance: How | This term could be used to refer to sources that have been used to create the document, but it could also be used to cite sources that are not relevant for the current work. Since we could not reach consensus on how to map it to PROV, we have left it out of the mapping.
dct:isReferencedBy | Provenance: How | Inverse of dct:references.
dct:accessRights | Provenance: How | Who can access the resource (security status). Since the privileges of the resource are part of the description of the resource, it’s not included in the list.
dct:license | Provenance: How | License of the resource. It has been left out of the list because there is no term in PROV-O to map to.
dct:rights | Provenance: How | Metadata about the rights of the resource.
dct:date | Provenance: When | Date is a very general property. It is the superproperty that all the other date properties specialize. We have decided to leave it with no mapping.
dct:available | Provenance: When | Property that states when a resource is available. We couldn't find consensus on how to map this property, so it was dropped.
dct:valid | Provenance: When | Property that states when a resource is valid. We have the notion of invalidation in PROV-O, but not the notion of validation. Thus we leave this property out of the mapping.
dct:relation | Provenance | A related resource. This relationship is very broad and could relate either provenance resources or not. It could be seen as a superproperty of wasDerivedFrom, wasInfluencedBy, alternateOf, specializationOf, etc. Thus there is no direct mapping.

3. Mapping from PROV to DC

The mapping from PROV to DC is not part of this note. It is questionable whether a mapping without additional information would provide meaningful data. If refinements are used, the mapping would be straightforward and more or less the inverse of the mapping patterns that we used. However, without such refinements, almost no DC statements can be inferred, apart from some unqualified dates. Dublin Core includes provenance information, but its focus lies on the description of the resources. Pure PROV data models a provenance chain, but it contains almost no information about the resulting resource itself.

A. Acknowledgements

We would like to thank Antoine Isaac, Timothy Lebo, Simon Miles, and Satya Sahoo for their feedback.

B. References

B.1 Normative references

No normative references.

B.2 Informative references

[DCMI]
Dublin Core Metadata Initiative. URL: http://dublincore.org/
[DCTERMS]
Dublin Core Terms Vocabulary. URL: http://dublincore.org/documents/dcmi-terms/
[PROV-CONSTRAINTS]
James Cheney, Paolo Missier, and Luc Moreau (eds.). Constraints of the PROV Data Model. 2011. W3C Working Draft. URL: http://www.w3.org/TR/prov-constraints/
[PROV-DM]
Luc Moreau, Paolo Missier. The PROV Data Model and Abstract Syntax Notation. 15 December 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-prov-dm-20111215/
[PROV-DEF]
W3C Provenance Working Group's Definition of Provenance. URL: http://www.w3.org/TR/2012/WD-prov-dm-20120724/#dfn-provenance
[PROV-O]
Timothy Lebo, Satya Sahoo, Deborah McGuinness. The PROV Ontology: Model and Formal Semantics. 13 December 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-prov-o-20111213