This document describes the use of existing mechanisms for accessing and querying provenance data about resources on the web.



Accessing provenance data

A general expectation is that web applications may access provenance information in the same way as any web resource, by dereferencing its URI. Typically, this will be by performing an HTTP GET operation. Thus, any provenance information may be associated with a URI, and may be accessed by dereferencing that URI using normal web mechanisms.
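
As a minimal sketch of this retrieval step, the following Python function dereferences a provenance URI with an HTTP GET, using only the standard library. The function name `fetch_provenance`, its return shape, and the timeout default are assumptions made for illustration; this document does not define a client API or a provenance media type.

```python
from typing import Tuple
from urllib.request import urlopen


def fetch_provenance(provenance_uri: str, timeout: float = 10.0) -> Tuple[str, bytes]:
    """Dereference a provenance URI and return (content type, body).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # urlopen issues a GET request when no request body is supplied.
    with urlopen(provenance_uri, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return content_type, response.read()
```

The caller is left to interpret the returned bytes according to the reported content type.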

The problem of accessing some required provenance information then reduces to the problem of finding its URI, which is dealt with separately in the section "Locating provenance data".

This specification thus RECOMMENDS that if a publisher wishes to make provenance information available, it is published as a normal web resource, and provision is made for the URI of the provenance to be discoverable.

This presumption of using web retrieval to access provenance does not preclude use of other mechanisms. In particular, alternative mechanisms may be needed if there is no URI associated with some particular provenance data. One such mechanism is suggested in the section "Querying provenance data".

Locating provenance data

On the presumption that provenance data is a resource that can be accessed using normal web retrieval, one needs to know a URI to dereference. The provenance URI may be known in advance, in which case there is nothing more to specify. If the provenance URI is not known, then a mechanism to discover a provenance URI must be based on some information that is available to the would-be accessor. We also wish to allow that provenance information could be provided by parties other than the provider of the original resource. Indeed, provenance data for a resource may be provided by several different parties, each with different concerns.

We start by considering mechanisms for the resource provider to also indicate a provenance URI. Because the resource provider controls the response when the resource is accessed, it can indicate a provenance URI directly. Three mechanisms are described here, covering a resource accessed by HTTP, a resource presented as HTML, and a resource presented as RDF.

These particular cases are selected as corresponding to the primary current web protocol and data formats.

Resource accessed by HTTP

For a document accessible using HTTP, [[POWDER-DR]] describes a mechanism for associating metadata with a resource by adding an HTTP Link header field to the HTTP response to a GET or HEAD operation (other HTTP operations are not excluded, but are not considered here). Since the POWDER specification was published, the HTTP linking draft has been approved by the IETF as RFC 5988 [[LINK-REL]].

The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in the section "IANA considerations":

                Link: <provenance-URI>; rel="provenance"
When used in conjunction with an HTTP success response code (2xx), this HTTP header indicates that provenance-URI is the URI of a provenance resource for the resource about which information is returned. At this time, the meaning of provenance links returned with other HTTP response codes is not defined: future revisions of this specification may define interpretations for these.

An HTTP response MAY include multiple provenance link headers, indicating a number of different resources that are known to the responding server, each providing provenance about the accessed resource.

The presence of a provenance link in an HTTP response does not preclude the possibility that other publishers may offer provenance information about the same resource. In such cases, discovery of the additional provenance information must use other means (e.g. see the section "Third party services").
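
The following Python sketch shows one way a client might collect provenance URIs from the Link header fields of a response. It is a simplified illustration, not a complete RFC 5988 parser: the function name `provenance_uris` and the example.org URIs used to exercise it are hypothetical, and the `anchor` parameter (which can change a link's context) is ignored.

```python
import re
from typing import Iterable, List
from urllib.parse import urljoin

# One link-value: "<target>" followed by zero or more ";name=value" parameters.
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^\s=;,]+(?:\s*=\s*(?:"[^"]*"|[^\s;,"]*))?)*)'
)
# One parameter, with either a quoted or an unquoted value.
_PARAM = re.compile(r';\s*([^\s=;,]+)\s*=\s*(?:"([^"]*)"|([^\s;,"]*))')


def provenance_uris(link_headers: Iterable[str], request_uri: str = "") -> List[str]:
    """Return the targets of all links whose rel includes "provenance".

    link_headers holds the raw values of every Link header field in the
    response; request_uri is used to resolve relative link targets.
    """
    found: List[str] = []
    for header in link_headers:
        for target, params in _LINK_VALUE.findall(header):
            for name, quoted, unquoted in _PARAM.findall(params):
                # A rel value may list several space-separated relation types.
                relation_types = (quoted or unquoted).lower().split()
                if name.lower() == "rel" and "provenance" in relation_types:
                    found.append(urljoin(request_uri, target))
                    break
    return found
```

Because a response may carry several provenance links, the function returns a list and leaves it to the caller to decide which of the indicated resources to retrieve.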

Open issues

Are the provenance resources indicated in this way to be considered authoritative? I.e. if the client trusts information returned by the server (e.g. is prepared to act on inferences based on the returned data), should it also trust the provenance data, or should trust in the linked provenance data be determined separately? If the linked data is to be trusted, then the data from multiple linked provenance resources MUST be consistent if it is to be meaningful. I favour an approach whereby trust in the provenance resources is established independently, which is similar to the situation for any other resource; e.g. based on the domain that serves it, or an associated digital signature.

Resource presented as HTML

For a document presented as HTML or XHTML, without regard for how it has been obtained, [[POWDER-DR]] describes a mechanism for associating metadata with a resource by adding a <link> element to the HTML <head> section.

The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in the section "IANA considerations":

                <html xmlns="">
                  <head>
                    <meta name="wdr.issuedby" content=""/>
                    <link rel="provenance" href="provenance-URI"/>
                    <title>Welcome to </title>
                  </head>
                  ...
                </html>
This element indicates that provenance-URI is the URI of a provenance resource for the containing document.

An HTML document header MAY include multiple provenance link elements, indicating a number of different resources that are known to the creator of the document, each providing provenance about the document resource.

For further notes about using link relation types in HTML, see in particular Appendix A, "Notes on Using the Link Header with the HTML4 Format", of RFC 5988 [[LINK-REL]].
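
As an illustrative sketch, a client could extract provenance links from an HTML document with Python's standard `html.parser` module. The class and function names below are hypothetical, and for simplicity the sketch accepts `link` elements anywhere in the document rather than only within the `head` section.

```python
from html.parser import HTMLParser
from typing import List


class ProvenanceLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel includes "provenance"."""

    def __init__(self) -> None:
        super().__init__()
        self.provenance_uris: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "link":
            return
        attributes = dict(attrs)
        relation_types = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "provenance" in relation_types and href:
            self.provenance_uris.append(href)


def find_provenance_links(html_text: str) -> List[str]:
    """Return all provenance link targets found in the given HTML text."""
    finder = ProvenanceLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.provenance_uris
```

The returned values are the raw `href` attributes; relative references would still need to be resolved against the document's base URI.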

Open issues

@@TODO - The POWDER specification also adds: Documents MAY also include any of the attribution data from the POWDER document in meta tags. In particular, the issuedby field is likely to be useful to user agents deciding whether or not to fetch the full POWDER document. Any attribution data encoded in meta tags within an HTML document should be the same as that in the POWDER document. In case of discrepancy, the POWDER document should be taken as more authoritative. Is there a parallel we should add here for provenance?

Resource presented as RDF

If a resource is presented as RDF (in any of its recognized syntaxes, including RDFa), it may contain references to its own provenance using additional RDF statements.

For this purpose a new RDF property, prov:hasProvenance, is defined as a relation between two resources, where the object of the property is a resource that provides provenance data about the subject resource. Multiple prov:hasProvenance assertions may be made about a subject resource.
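
To make the shape of these statements concrete, the following Python sketch serializes prov:hasProvenance assertions as N-Triples lines. The namespace URI is a placeholder assumption, since this document has not yet assigned one, and the function name is hypothetical. The inputs are assumed to be valid absolute URIs; no escaping is performed.

```python
from typing import Iterable, List

# ASSUMPTION: placeholder only; the real namespace for prov: is still undecided.
ASSUMED_PROV_NS = "http://example.org/prov-ns#"


def has_provenance_triples(
    resource_uri: str,
    provenance_uris: Iterable[str],
    prov_ns: str = ASSUMED_PROV_NS,
) -> List[str]:
    """Return one N-Triples statement per provenance resource.

    The subject is the described resource and each object is a resource
    providing provenance data about it.
    """
    predicate = f"<{prov_ns}hasProvenance>"
    return [f"<{resource_uri}> {predicate} <{uri}> ." for uri in provenance_uris]
```

Passing two provenance URIs yields two statements with the same subject, reflecting that multiple assertions may be made about one resource.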

@@TODO: example

@@TODO: document namespace. Check naming style. Use provenance model namespace? Define as part of model?

Third party services

The mechanisms for provenance discovery described above have all assumed the provenance URI is being supplied by the provider of the original resource. Where provenance information is provided by a third party without any collaboration from the original resource provider, the provenance link cannot be provided directly and a different approach must be considered.

We assume that the application or person requesting provenance information has the URI or other unique identification of the resource for which provenance is required, and also has a URI for a third-party service that provides a provenance information service. Specifically, the third party service URI is the URI of a SPARQL endpoint which is queried for the desired provenance information.

If the requester has a URI for the original resource, they simply issue a SPARQL query for the URI(s) of any associated provenance data; e.g., writing resource-URI for the URI of the original resource:

                PREFIX prov: <@@TBD>
                SELECT ?provenance_uri WHERE {
                  <resource-URI> prov:hasProvenance ?provenance_uri
                }

If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate the target resource and obtain its provenance URI(s). The nature of identifying information that can be used in this way will depend upon the third party service used, further definition of which is out of scope for this specification. For example, a query for a document identified by a DOI, say 1234.5678, might look like this:

                PREFIX prov: <@@TBD>
                PREFIX idservice: <@@TBD>
                SELECT ?provenance_uri WHERE {
                  [ idservice:hasDOI "1234.5678" ] prov:hasProvenance ?provenance_uri
                }
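
A client might assemble such a query and the corresponding SPARQL protocol GET request as sketched below in Python. The namespace URI, the endpoint URI used in the usage note, and both function names are assumptions for illustration; only the `query` request parameter comes from the SPARQL protocol itself.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder only; the real namespace for prov: is still undecided.
ASSUMED_PROV_NS = "http://example.org/prov-ns#"


def build_provenance_query(resource_uri: str, prov_ns: str = ASSUMED_PROV_NS) -> str:
    """Build a SPARQL SELECT query for the provenance URIs of a resource."""
    return (
        f"PREFIX prov: <{prov_ns}>\n"
        "SELECT ?provenance_uri WHERE {\n"
        f"  <{resource_uri}> prov:hasProvenance ?provenance_uri\n"
        "}"
    )


def build_query_url(endpoint_uri: str, query: str) -> str:
    """Encode the query as the "query" parameter of a GET request URL."""
    separator = "&" if "?" in endpoint_uri else "?"
    return endpoint_uri + separator + urlencode({"query": query})
```

For example, `build_query_url("http://example.org/sparql", build_provenance_query("http://example.org/doc1"))` gives a URL that can be dereferenced like any other web resource, with the result format negotiated through the Accept header.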

The mechanisms described here focus on finding the URI(s) for provenance information. Below, we consider access to provenance information for which there is no separate URI.

Querying provenance data

(This section will describe the use of a SPARQL endpoint service to obtain provenance information directly from a service provider. No new protocol or vocabulary elements are defined: the mechanisms used are those described above, coupled with possible use of provenance vocabulary terms in a SPARQL query.)


Provenance service discovery

(How to discover provenance services. There is nothing particular about provenance in this respect, and this section will discuss some of the available options without adding any new normative specification.)


IANA considerations

This document requests registration of the "provenance" link relation type, per section 6.2.1 of RFC 5988 [[LINK-REL]]. @@TODO At an appropriate time (??), the following template should be submitted to

Relation Name: provenance
Description: the resource identified by the target URI of the link provides provenance information about the resource identified by the context URI of the link
Reference: @@this spec, @@provenance-model-spec
Application Data:

Security considerations

Provenance is central to establishing trust in data. If provenance information is corrupted, it may lead agents (human or software) to draw inappropriate and possibly harmful conclusions. Therefore, care is needed to ensure that the integrity of provenance data is maintained.

When using HTTP to access provenance information, or to determine a provenance URI, secure HTTP (https) SHOULD be used.

When retrieving a provenance URI from a document, steps SHOULD be taken to ensure the document itself is an accurate copy of the original whose author is being trusted (e.g. signature checking, or verifying its checksum against an author-provided secure web service).
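
The checksum comparison mentioned above could be sketched in Python as follows. The choice of SHA-256 and the function name are assumptions for illustration; this document does not mandate a digest algorithm, and obtaining the expected value from a trusted source is outside the sketch.

```python
import hashlib
import hmac


def matches_checksum(document: bytes, expected_sha256_hex: str) -> bool:
    """Check a retrieved document against an author-provided SHA-256 digest.

    ASSUMPTION: the author publishes the digest as a hexadecimal string.
    """
    actual = hashlib.sha256(document).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())
```

A provenance URI extracted from a document that fails this check should not be relied upon.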

@@TODO ... privacy, access control to provenance (from Edinburgh meeting). In particular, note that the fact that a resource is openly accessible does not mean that its provenance information should also be.

@@TODO ... more, probably


Many thanks to Robin Berjon for making our lives so much easier with his cool ReSpec tool.

Motivating scenario

This scenario was selected by the provenance working group as a touchstone for evaluating any provenance access proposal. This appendix evaluates the foregoing proposals against the requirements implied by that scenario.

Gap analysis

There are clearly a number of capabilities needed for a provenance-aware application that are not covered by the mechanisms described above. But most of these amount to implementation details and decisions for a particular application, and as such are beyond the scope of this document to specify.

One feature not covered above that might be a candidate for specification is a common format for a data package that combines original content along with provenance-related metadata or data. At this stage, it is not clear what format that might take, but some possible candidates are:

Given the extent of work already performed in this field, it seems to me that rather than recommending a particular approach at this stage, we would do better to catalogue some available solutions and see which ones (if any) implementers choose to run with. In any case, it seems that a specification that is specific to provenance, to the exclusion of other metadata, is unlikely to gain traction, as provenance is just part of a wider landscape of information quality, trust, preservation and more.