Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the third public working draft. This revision introduces a new definition of a provenance pingback service, and clarifies the definition of service descriptions and how they are retrieved.
This document was published by the Provenance Working Group as a Working Draft. If you wish to make comments regarding this document, please send them to public-prov-comments@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The Provenance Data Model [PROV-DM], Provenance Ontology [PROV-O] and related specifications define how to represent provenance in the World Wide Web (see the [PROV-OVERVIEW]).
This note describes how standard web protocols may be used to locate, retrieve and query provenance records:
Most mechanisms described in this note are independent of the provenance format used, and may be used to access provenance in any available format. For interoperable provenance publication, use of PROV-O represented in a standardized RDF format is recommended. Where alternative formats are available, selection may be made by content negotiation.
For ease of reference, the main body of this document contains some links to external web pages. Such links are distinguished from internal references thus: W3C Provenance Working Group.
In defining the specification below, we make use of the following concepts.
The pingback definition is new. Review is encouraged.
This document uses the term URI for web resource identifiers, as this is the term used in many of the currently ratified specifications that this document builds upon. In many situations, a URI may also be an IRI [RFC3987], which is a generalisation of a URI allowing a wider range of Unicode characters. Every absolute URI is an IRI, but not every IRI is a URI. When IRIs are used in situations that require a URI, they must first be converted according to the mapping defined in section 3.1 of [RFC3987]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
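The mapping steps named above can be illustrated with a minimal Python sketch. It is a simplification of the mapping in [RFC3987] section 3.1, not a complete implementation: the function name `iri_to_uri` is hypothetical, any userinfo component is dropped, the host is lower-cased, and Python's built-in `idna` codec (which follows the older IDNA 2003 rules) is assumed to be acceptable for the domain name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters, plus "%" so that
# percent-escapes already present in the IRI are not encoded a second time.
_SAFE = "/:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Convert an absolute IRI to a URI (simplified sketch, see lead-in)."""
    parts = urlsplit(iri)
    # Domain name: Punycode (IDNA) encoding of non-ASCII labels.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path, query and fragment: UTF-8 encode, then %-encode the resulting
    # octets that are not allowed in a URI (quote() uses UTF-8 by default).
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, host, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/café?q=naïve"))
```

A URI that is already pure ASCII passes through this function unchanged, so it can be applied to any target-URI before an HTTP request is made.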
Fundamentally, a provenance record is about resources. In general, resources may vary over time and context. E.g., a resource describing the weather in London changes from day-to-day, or a listing of restaurants near you will vary depending on your location.
Provenance records a history of the entities, activities, and people involved in producing an artifact, and may be collected from several sources at different times. In order to create a meaningful history, the individual provenance records used must remain valid and correct when interpreted in a context other than that in which they were collected. Yet we may still want to make provenance assertions about dynamic or context-dependent resources (e.g. a weather forecast for London on a particular day may have been derived from a particular set of Meteorological Office data).
Provenance records for dynamic and context-dependent resources are possible through a notion of constrained resources. A constrained resource is simply a resource (in the sense defined by [WEBARCH], section 2.2) that is a specialization or instance of some other resource. For example, a W3C specification typically undergoes several public revisions before it is finalized. A URI that refers to the "current" revision might be thought of as denoting the specification throughout its lifetime. Each individual revision would also have its own target-URI denoting the specification at that particular stage in its development. Using these, we can make provenance assertions that a particular revision was published on a particular date, and was last modified by a particular editor. Target-URIs may use any URI scheme, and are not required to be dereferencable.
Requests for provenance about a resource may return provenance records that use one or more target-URIs to refer to versions of that resource, such as when there are assertions referring to the same underlying resource in different contexts. For example, a provenance record for a W3C document might include information about all revisions of the document using statements that use the different target-URIs of the various revisions.
These ideas are represented in the provenance data model [PROV-DM] by the concepts entity and specialization. In particular, an entity may be a specialization of some resource whose "fixed aspects" provide sufficient constraint for expressed provenance about the resource to be invariant with respect to that entity. This entity is itself just another resource (e.g. the weather forecast for a given date as opposed to the current weather forecast), with its own URI for referring to it within a provenance record.
Review second para below.
The mechanisms described in this document are intended to allow a provider to supply information that allows a consumer to access provenance records, which themselves explicitly identify the entities they describe. A provenance record may contain information about several entities, referring to them using their various target-URIs. Thus a consumer should be selective in its use of the information provided when interpreting a provenance record.
A provenance record consumer will need to isolate information about the specific entity or entities of interest. These may be constrained resources identified by target-URIs different from that of the original resource, in which case the consumer will need to know which target-URIs are used. The mechanisms defined later allow a provider to expose such URIs.
While a provider should avoid giving spurious information, there are no fixed semantics, particularly when multiple resources are indicated, and a client should not assume that a specific given provenance-URI will yield information about a specific target-URI. In the general case, a client presented with multiple provenance-URIs and multiple target-URIs should look at all of the provenance-URIs for information about any or all of the target-URIs.
A provenance record is not of itself guaranteed to be authoritative or correct. Trust in provenance records must be determined separately from trust in the original resource. Just as in the web at large, it is a user's responsibility to determine an appropriate level of trust in any other resource; e.g. based on the domain that serves it, or an associated digital signature. (See also section 6. Security considerations.)
A number of resource types are described above in section 1.1 Concepts. The table below summarizes what these various URIs are intended to denote, and the kind of information that should be returned if they are dereferenced:
URI | Denotes | Dereferences to |
---|---|---|
Target-URI | Any resource that is described by some provenance - typically an entity (in the sense of [PROV-DM]), but may be an activity. | If the URI is dereferencable, it should return a representation or description of the resource for which provenance is provided. |
Provenance-URI | A provenance record, or provenance description, in the sense described by [PROV-DM] (PROV Overview). | A provenance record in any defined format, selectable via content negotiation. |
Service-URI | A provenance query service. The service-URI is the initial URI used when accessing a provenance query service; following REST API style [REST-APIs], URIs for accessing provenance are determined via the service description. | A provenance query service description per section 4.1 Provenance query service description. Alternative formats may be offered via HTTP content negotiation. |
Pingback-URI | A provenance pingback service. This is a service to which provenance pingback information can be submitted using an HTTP POST operation per section 5. Forward provenance. No other operations are specified. | None specified (the owner of a provenance pingback URI may choose to return useful information, but is not required to do so.) |
This specification describes two ways to access provenance records:
Web applications may access a provenance record in the same way as any resource on the Web, by dereferencing its URI (commonly using an HTTP GET operation). Thus, any provenance record may be associated with a provenance-URI, and may be accessed by dereferencing that URI using web mechanisms. How much or how little provenance is returned in a provenance record is a matter for the provider, taking into account that a provenance trace may extend as linked data across multiple provenance records.
When there is no easy way to associate a provenance-URI with a resource (e.g. for resources not directly web-accessible, or whose publication mechanism is controlled by someone else), a provenance description may be obtained using a provenance query service at an indicated service-URI. A REST protocol for provenance queries is defined in section 4. Provenance query services; also described there is a mechanism for locating a SPARQL query service [SPARQL-SD].
When publishing provenance, corresponding provenance-URIs or service-URIs should be discoverable using one or more of the mechanisms described in section 3. Locating provenance records.
Provenance may be presented as a bundle, which is "a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed" [PROV-DM]. A provenance description at a dereferencable provenance-URI may be treated as a bundle, and this is a good way to make provenance easily accessible. But there are other possible implementations of a bundle, such as a named graph in an RDF dataset [RDF-CONCEPTS11], for which the bundle URI may not be directly dereferencable.
When a bundle is published as part of an RDF Dataset, to access it would require accessing the RDF Dataset and then extracting the identified graph component; this in turn would require knowing a URI or some other way to retrieve the RDF dataset. This specification does not describe a specific mechanism for extracting components from a document containing multiple graphs.
The W3C Linked Data Platform group (www.w3.org/2012/ldp/) is chartered to produce a W3C Recommendation for HTTP-based (RESTful) application integration patterns using read/write Linked Data; we anticipate that they may address access to RDF Datasets in due course.
A provenance record can be accessed using direct web retrieval, given its provenance-URI. If this is known in advance, there is nothing more to specify. If a provenance-URI is not known then a mechanism to discover one must be based on information that is available to the would-be accessor. Likewise, provenance may be exposed by a query service, in which case, the corresponding service-URI must be discovered.
Three mechanisms are defined for a provenance consumer to find information about a provenance-URI or service-URI, along with a target-URI:
These particular cases are selected as corresponding to current primary web protocol and data formats. Similar approaches may be defined for other protocols or resource formats.
Provenance records may be offered by several providers other than that of the original resource publisher, each with different concerns, and presenting provenance at different locations. It is possible that these different providers may present contradictory provenance.
For a resource accessible using HTTP, a provenance record may be indicated using an HTTP Link header field, as defined by Web Linking (RFC 5988) [LINK-REL]. The Link header field is included in the HTTP response to a GET or HEAD operation (other HTTP operations are not excluded, but are not considered here).
A has_provenance link relation type for referencing a provenance record may be used thus:
Link: <provenance-URI>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="target-URI"
When used in conjunction with an HTTP success response code (2xx), this HTTP header field indicates that provenance-URI is the URI of a provenance record about the originally requested resource, and that the requested resource is identified within the provenance record as target-URI. (See also section 1.3 Interpreting provenance records.)
If no anchor parameter is provided then the target-URI is assumed to be the URI of the requested resource used in the corresponding HTTP request.
This specification does not define the meaning of these links returned with other HTTP response codes: future revisions may define interpretations for these.
An HTTP response MAY include multiple has_provenance link header fields, indicating a number of different provenance resources (and anchors) that are known to the responding server, each referencing a provenance record about the accessed resource.
The presence of a has_provenance link in an HTTP response does not preclude the possibility that other providers may offer provenance records about the same resource. In such cases, discovery of the additional provenance records must use other means (e.g. see section 4. Provenance query services).
An example request including provenance headers in its response might look like this (where C: and S: prefixes indicate client and server emitted data respectively):
    C: GET http://example.com/resource/ HTTP/1.1
    C: Accept: text/html
    S: HTTP/1.1 200 OK
    S: Content-type: text/html
    S: Link: <http://example.com/resource/provenance/>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="http://example.com/resource/"
    S:
    S: <html ...>
    S:  :
    S: </html>
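A client-side reading of such a header might be sketched in Python as follows. This is a simplified illustration rather than a full Web Linking parser: the function name `provenance_links` is hypothetical, the regular expressions assume well-formed header values without escaped quotes, and relative link targets are not resolved. When no anchor parameter is present, the request URI is used as the target-URI, as described above.

```python
import re

PROV_NS = "http://www.w3.org/ns/prov#"

# One link-value: <URI-reference> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def provenance_links(link_header, request_uri, relation="has_provenance"):
    """Return (linked-URI, target-URI) pairs for one PROV link relation."""
    wanted = PROV_NS + relation
    results = []
    for match in LINK_RE.finditer(link_header):
        uri, raw_params = match.group(1), match.group(2)
        params = {
            name.lower(): (quoted if quoted else bare)
            for name, quoted, bare in PARAM_RE.findall(raw_params)
        }
        # rel may hold several space-separated relation types.
        if wanted in params.get("rel", "").split():
            # Without an anchor, the target-URI defaults to the request URI.
            results.append((uri, params.get("anchor", request_uri)))
    return results


if __name__ == "__main__":
    header = ('<http://example.com/resource/provenance/>; '
              'rel="http://www.w3.org/ns/prov#has_provenance"; '
              'anchor="http://example.com/resource/"')
    print(provenance_links(header, "http://example.com/resource/"))
```

Passing `relation="has_query_service"` to the same function picks out service-URIs instead of provenance-URIs.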
Tim comment (14): Should a reference to the forward provenance section be included, too?
[GK] I don't see the need. Forward provenance is not primarily *about* the same resource, IMO, and I think mentioning it here could be more confusing than helpful.
The resource provider may indicate that provenance records about the resource are provided by a provenance query service. This is done through the use of a has_query_service link relation type following the same pattern as above:
Link: <service-URI>; rel="http://www.w3.org/ns/prov#has_query_service"; anchor="target-URI"
The has_query_service link identifies the service-URI. Dereferencing this URI yields a service description that provides further information to enable a client to submit a query to retrieve a provenance record for a resource; see section 4. Provenance query services for more details.
There may be multiple has_query_service link header fields, and these MAY appear in an HTTP response together with has_provenance link header fields.
    C: GET http://example.com/resource/ HTTP/1.1
    C: Accept: text/html
    S: HTTP/1.1 200 OK
    S: Content-type: text/html
    S: Link: <http://example.com/resource/provenance/>; rel="http://www.w3.org/ns/prov#has_query_service"; anchor="http://example.com/resource/"
    S:
    S: <html ...>
    S:  :
    S: </html>
When performing content negotiation for a resource, it is common for HTTP 302 or 303 redirect response codes to be used to direct a client to an appropriately-formatted resource. When accessing a resource for which provenance is available, link headers SHOULD be included with the response to the final redirected request, and not on the intermediate redirect responses. (When accessing a resource from a browser using JavaScript, the intermediate redirect responses are usually handled transparently by the browser and are not visible to the HTTP client code.)
Following content negotiation, any provenance link returned refers to the resource whose URI is used in the corresponding HTTP request, or the given anchor parameter if that is different.
An example transaction using content negotiation and redirection might look like this (where C: and S: prefixes indicate client and server emitted data respectively):
    C: GET http://example.com/resource/ HTTP/1.1
    C: Accept: text/html
    S: HTTP/1.1 302 Found
    S: Location: /resource/content.html
    S: Vary: Accept
    S:
    S: HTML content for http://example.com/resource/
    S: is available at http://example.com/resource/content.html
    C: GET http://example.com/resource/content.html HTTP/1.1
    C: Accept: text/html
    S: HTTP/1.1 200 OK
    S: Content-type: text/html
    S: Link: <http://example.com/resource/provenance/>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="http://example.com/resource/20130226/content.html"
    S:
    S: <html>
    S: <!-- HTML content here... -->
    S: </html>
For a resource presented as HTML, a provenance record may be indicated by adding a <link> element to the HTML <head> section.
Two link relation types for referencing provenance may be used:
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <link rel="http://www.w3.org/ns/prov#has_provenance" href="provenance-URI">
        <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI">
        <title>Welcome to example.com</title>
      </head>
      <body>
        <!-- HTML content here... -->
      </body>
    </html>
The provenance-URI given by the first link element (#has_provenance) identifies the provenance-URI for the document.
The target-URI given by the second link element (#has_anchor) specifies an identifier for the document that may be used within the provenance record when referring to the document.
If no target-URI is provided (via a #has_anchor link element) then it is assumed to be the URI of the document. It is RECOMMENDED that this convention be used only when the document has a URI that is reasonably expected to be known or easily discoverable by a consumer of the document (e.g. when delivered from a web server, or as part of a MIME structure containing content identifiers [RFC2392]).
An HTML document header MAY present multiple provenance-URIs over several #has_provenance link elements, indicating a number of different provenance records that are known to the publisher of the document, each of which may provide provenance about the document (see section 1.3 Interpreting provenance records).
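As an illustration of how a consumer might collect these link elements, the following Python sketch uses the standard library `html.parser` module. The class and function names are hypothetical, the first #has_anchor value found is taken as the target-URI (an assumption for this sketch; this document does not define behaviour for multiple anchors), and relative href values are returned as written.

```python
from html.parser import HTMLParser

PROV_NS = "http://www.w3.org/ns/prov#"


class ProvLinkParser(HTMLParser):
    """Collect href values of <link> elements that use PROV link relations."""

    def __init__(self):
        super().__init__()
        self.links = {"has_provenance": [], "has_anchor": [], "has_query_service": []}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        for rel in (attributes.get("rel") or "").split():
            name = rel[len(PROV_NS):] if rel.startswith(PROV_NS) else None
            if name in self.links:
                self.links[name].append(href)


def find_provenance_links(html_text, document_uri):
    """Return the target-URI plus all provenance-URIs and service-URIs found."""
    parser = ProvLinkParser()
    parser.feed(html_text)
    parser.close()
    anchors = parser.links["has_anchor"]
    return {
        # With no #has_anchor link, the document's own URI is the target-URI.
        "target": anchors[0] if anchors else document_uri,
        "provenance": parser.links["has_provenance"],
        "query_service": parser.links["has_query_service"],
    }
```

Because several provenance-URIs may be listed, the result holds lists, and a consumer would typically consult each of them.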
Check with Dong: I think the cross reference should make the assumptions explicit. I, too, feel this material is not strictly needed, but was previously asked to add some clarification about multiple links.
The document creator may specify that the provenance about the document is provided by a provenance query service. This is done through the use of a third link relation type following the same pattern as above:
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <link rel="http://www.w3.org/ns/prov#has_query_service" href="service-URI">
        <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI">
        <title>Welcome to example.com</title>
      </head>
      <body>
        <!-- HTML content here... -->
      </body>
    </html>
The has_query_service link element identifies the service-URI. Dereferencing this URI yields a service description that provides further information to enable a client to query for provenance about a resource; see section 4. Provenance query services for more details.
There MAY be multiple #has_query_service link elements, and these MAY appear in the same document as #has_provenance link elements (though we do not anticipate that #has_provenance and #has_query_service link relations will commonly be used together).
Check with Dong: This text was already revised in response to earlier comment. I, too, feel this material is not strictly needed, but was previously asked to add some clarification.
If a resource is represented as RDF (in any of its recognized syntaxes, including RDFa), it may contain references to its own provenance using additional RDF statements. For this purpose the link relations introduced above (section 3. Locating provenance records) may be used as RDF properties: prov:has_provenance, prov:has_anchor, and prov:has_query_service, where the prov: prefix here indicates the PROV namespace URI http://www.w3.org/ns/prov#.
The RDF property prov:has_provenance is a relation between two resources, where the object of the property is a provenance-URI that denotes a provenance record about the subject resource. Multiple prov:has_provenance assertions may be made about a subject resource.
Property prov:has_anchor specifies a target-URI used in the indicated provenance to refer to the containing RDF document.
Property prov:has_query_service specifies a service-URI for provenance queries.
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .

    <> dcterms:title "Welcome to example.com" ;
       prov:has_anchor <http://example.com/data/resource.rdf> ;
       prov:has_provenance <http://example.com/provenance/resource.rdf> ;
       prov:has_query_service <http://example.com/provenance-query-service/> .

    # (More RDF data ...)
(The above example uses Turtle RDF syntax [TURTLE].)
These terms (prov:has_provenance, prov:has_anchor, and prov:has_query_service) may also be used in RDF statements with other subjects to indicate provenance of other resources, but discussion of such use is beyond the scope of this document.
This section describes a simple HTTP query protocol for accessing provenance records, and also a mechanism for locating a SPARQL service endpoint [SPARQL-SD]. The HTTP protocol specifies HTTP operations for retrieving provenance records from a provenance query service, following the approach of the SPARQL Graph Store HTTP Protocol [SPARQL-HTTP].
The introduction of query services is motivated by the following possible considerations:
The patterns for using provenance query services are designed around REST principles [REST], which aim to minimize coupling between client and server implementation details.
The query mechanisms provided by a provenance query service are described by a service description, which is obtained by dereferencing a service-URI. A service description may contain information about additional mechanisms that are not described here. In keeping with REST practice for web applications, alternative service descriptions using different formats may be offered and accessed using HTTP content negotiation. We describe below a service description format that uses RDF to describe two query mechanisms.
The general procedure for using a provenance query service is:
The remainder of this section covers the following topics:
Review. Stian suggests recommending use of JSON-LD. I am resisting this because it is clearly allowed by "RDF (in any of its common serializations as determined by HTTP content negotiation)", focusing on a particular format as part of the underlying mechanism seems to go against REST principles, and at this stage it seems that promoting any particular format will draw objections from proponents of other formats. I've taken a different tack, making the text more open about possible service description formats, while specifically presenting a description based on the RDF model.
Dereferencing a service-URI yields a service description. The service description presented here may be supplied as RDF (in any of its common serializations as determined by HTTP content negotiation), and it may contain descriptions of one or more available query mechanisms. Each query mechanism is associated with an RDF type, as explained below. (The presentation here of RDF service descriptions does not preclude use of non-RDF formats selectable by HTTP content negotiation.)
The overall structure of a service description is as follows:
    <service-URI> a prov:ServiceDescription ;
        prov:describesService <direct-query-description>, <sparql-query-description> .

    <direct-query-description> a prov:DirectQueryService ;
        prov:provenanceUriTemplate "direct-query-template" .

    <sparql-query-description> a sd:Service ;
        sd:endpoint <sparql-query> ;
        # other details...
        .
We see here that the service-URI identifies a resource of type prov:ServiceDescription, which collects descriptions of one or more provenance query mechanisms. Each associated mechanism is indicated by a prov:describesService statement.
We expect the presentation of service descriptions to be considered by the W3C Linked Data Platform group (www.w3.org/2012/ldp/); at the time of writing, there is no consensus (cf. message at lists.w3.org/Archives/Public/public-ldp/2012Nov/0036.html and responses). As and when such consensus emerges, we recommend that provenance query service implementers consider adopting it, or at least consider making their implementations compatible with it.
A direct HTTP query service is described by an RDF resource of type prov:DirectQueryService. It allows for accessing provenance about a specified target-URI. The query URI to use is described by a URI Template [URI-template] (level 2 or above) in which the variable uri stands for the target-URI; e.g.
    @prefix prov: <http://www.w3.org/ns/prov#> .

    <direct-query-description> a prov:DirectQueryService ;
        prov:provenanceUriTemplate "query-URI?target={+uri}" .
where query-URI is the base URI of the direct query service, and direct-query-description is any distinct RDF subject node (i.e. a blank node or a URI).
The URI template indicated by prov:provenanceUriTemplate may expand to an absolute or relative URI reference. A URI for the desired provenance record is obtained by expanding the URI template with the variable uri set to the target-URI for which provenance is requested. In this example, if the target-URI contains '#' or '&' these must be %-escaped as %23 or %26 respectively before template expansion [RFC3986]. If the result is a relative reference, it is interpreted per [RFC3986] (section 5.2) using the URI of the service description as its base URI (which is generally the same as the query service-URI, unless HTTP redirection has been invoked).
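The final step, resolving a relative expansion against the service description URI, might be sketched in Python with the standard library `urljoin` function, which implements the reference resolution of [RFC3986]. The function name and the example URIs are illustrative assumptions, and the template is assumed to have been expanded already.

```python
from urllib.parse import urljoin


def resolve_query_uri(description_uri: str, expanded_template: str) -> str:
    """Resolve an expanded URI template against the service description URI.

    description_uri should be the URI from which the service description was
    actually retrieved (i.e. after any HTTP redirection).
    """
    return urljoin(description_uri, expanded_template)


if __name__ == "__main__":
    # Hypothetical service description URI and expanded template.
    print(resolve_query_uri(
        "http://example.com/provenance-query-service/",
        "/direct?target=http%3A%2F%2Fexample.com%2Fdata%2Fresource.rdf",
    ))
```

An expansion that is already an absolute URI is returned unchanged by this step.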
A provenance query service MAY recognize additional parameters encoded as part of a URI for the provenance record. If it does, it SHOULD include these in the provenance URI template in the service description, so that clients may discover how a URI is formed using this additional information.
For example, a query service might offer to include just the immediate provenance of a target, or to also supply provenance of other resources from which the target is derived. If a service accepts an additional parameter steps that defines the number of previous steps to include in a provenance trace, it might publish its service description thus:
<direct-query-description> a prov:DirectQueryService ; prov:provenanceUriTemplate "http://www.example.com/provenance/service?target={+uri}{&steps}" .
which might result in an HTTP query for provenance information that looks like this:
GET http://www.example.com/provenance/service?target=http://www.example.com/entity&steps=2 HTTP/1.1
(Note that in this case, a "level 3" URI template feature is used [URI-template].)
A SPARQL query service is described by an RDF resource of type sd:Service [SPARQL-SD].
It allows for accessing provenance information using a SPARQL query, which may be constructed to retrieve provenance for a particular resource, or for multiple resources. The query may be formulated using the PROV-O vocabulary terms [PROV-O], and others supported by the SPARQL endpoint as appropriate.
The SPARQL query service description is constructed as defined by SPARQL 1.1 Service Description [SPARQL-SD]; e.g.
sparql-query-description a sd:Service ; sd:endpoint <query-URI/sparql/> ; sd:supportedLanguage sd:SPARQL11Query .
where query-URI is the base URI of the provenance query service, and sparql-query-description is any distinct RDF subject node (i.e. a blank node or a URI).
The SPARQL service description may be detailed or sparse, provided that it includes at a minimum the following:
sparql-query-description a sd:Service ; sd:endpoint <(SPARQL service endpoint URI reference)> .
The endpoint may be given as an absolute or relative URI reference. If a relative reference is given, it is interpreted in the normal way for the RDF format used, which will commonly be relative to the URI of the service document itself.
The following service description example uses Turtle [TURTLE] syntax to describe both direct HTTP and SPARQL query services:
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

    <> a prov:ServiceDescription ;
        prov:describesService <#direct>, <#sparql> ;
        dcterms:publisher <#us> .

    <#us> a foaf:Organization ;
        foaf:name "and not a service!" .

    <#direct> a prov:DirectQueryService ;
        prov:provenanceUriTemplate "/direct?target={+uri}" .

    <#sparql> a sd:Service ;
        sd:endpoint </sparql/> ;
        sd:supportedLanguage sd:SPARQL11Query ;
        sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML> ,
                        <http://www.w3.org/ns/formats/Turtle> ,
                        <http://www.w3.org/ns/formats/SPARQL_Results_XML> ,
                        <http://www.w3.org/ns/formats/SPARQL_Results_JSON> ,
                        <http://www.w3.org/ns/formats/SPARQL_Results_CSV> ,
                        <http://www.w3.org/ns/formats/SPARQL_Results_TSV> .
This protocol combines the target-URI with a supplied URI template to formulate an HTTP GET request.
Thus, if the URI template extracted from the service description is http://example.com/provenance/service?target={uri} and the supplied target-URI is http://www.example.com/entity123, the resulting HTTP request would be:
    GET /provenance/service?target=http%3A%2F%2Fwww.example.com%2Fentity123 HTTP/1.1
    Host: example.com
Any server that implements this protocol and receives a request URI in this form SHOULD return a provenance record for the target-URI embedded in the query component, where that URI is the result of percent-decoding [RFC3986] the part of the request URI corresponding to {uri} in the URI template. E.g., in the above example, the decoded target-URI is http://www.example.com/entity123. The target-URI MUST be an absolute URI, and the server SHOULD respond with 400 Bad Request if it is not.
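On the server side, the decoding and validation described above might look like the following Python sketch. It assumes the query parameter is named target, as in the example template; the function name and the (status, value) return convention are hypothetical, and looking up the actual provenance record is left out.

```python
from urllib.parse import unquote, urlsplit


def extract_target(request_uri: str, variable: str = "target"):
    """Return (200, target-URI) or (400, None) for a direct query request URI."""
    query = urlsplit(request_uri).query
    for field in query.split("&"):
        name, _, raw_value = field.partition("=")
        if name == variable:
            # Percent-decode the part of the URI matched by the template variable.
            target = unquote(raw_value)
            # An absolute URI has a scheme; anything else is a bad request.
            if urlsplit(target).scheme:
                return 200, target
            return 400, None
    # No target parameter at all: also treated as a bad request in this sketch.
    return 400, None


if __name__ == "__main__":
    print(extract_target(
        "/provenance/service?target=http%3A%2F%2Fwww.example.com%2Fentity123"))
```

Plain percent-decoding is used here rather than HTML form decoding, so a literal '+' in a target-URI is not turned into a space.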
A server SHOULD NOT offer a template containing {+uri} or other non-simple variable expansion options [URI-template] unless none of the valid target-URIs for which it can provide provenance contain problematic characters such as '#' or '&'.
The defined URI template expansion process [URI-template] generally takes care of %-escaping characters that are not permitted in URIs. However, when expanding a template with {+uri}, some permitted characters such as '#' and '&' are not escaped. If the supplied target-URI contains these characters, then they may disrupt interpretation of the resulting query URI. To prevent this, '#' and '&' characters in the target-URI may be replaced with %23 and %26 respectively, before performing the URI template expansion. An alternative, simpler and more reliable approach is to use {uri} in the URI template string, which will cause all URI-reserved characters to be %-escaped as part of the URI-template expansion, as in the example above.
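The difference between the two expansion styles can be seen in a small Python sketch. This is not a full URI Template processor: the hypothetical `expand` function handles only the three expression forms used in this document ({uri}, {+uri} and {&name}), and for {+uri} it passes every '%' through unchanged, which is a simplification of the pct-encoded rule in [URI-template]. The helper `pre_escape` is likewise a hypothetical name for the replacement step described above.

```python
import re
from urllib.parse import quote

# Characters that reserved expansion ({+var}) leaves as they are.
RESERVED = ":/?#[]@!$&'()*+,;="


def expand(template: str, variables: dict) -> str:
    """Expand {var}, {+var} and {&var} expressions (minimal sketch)."""

    def replace(match):
        operator, name = match.group(1), match.group(2)
        value = variables.get(name)
        if value is None:
            return ""  # undefined variables expand to nothing
        value = str(value)
        if operator == "+":
            # Reserved expansion: reserved characters are NOT escaped.
            return quote(value, safe=RESERVED + "%")
        if operator == "&":
            # Form-style continuation: "&name=value" with value fully escaped.
            return "&" + name + "=" + quote(value, safe="")
        # Simple expansion: everything except unreserved characters is escaped.
        return quote(value, safe="")

    return re.sub(r"\{([+&]?)(\w+)\}", replace, template)


def pre_escape(target_uri: str) -> str:
    """Escape '#' and '&' before a {+uri} expansion."""
    return target_uri.replace("#", "%23").replace("&", "%26")


if __name__ == "__main__":
    target = "http://example.org/doc#section"
    print(expand("/direct?target={uri}", {"uri": target}))
    print(expand("/direct?target={+uri}", {"uri": target}))
    print(expand("/direct?target={+uri}", {"uri": pre_escape(target)}))
```

In the second call the '#' survives into the query URI and would be read as the start of a fragment, which is the disruption the paragraph above warns about; the first and third calls avoid it.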
If the provenance described by the request is unknown to the server, a suitable error response code SHOULD be returned. In the absence of any security or privacy concerns about the resource, that might be 404 Not Found. But if the existence or non-existence of a resource is considered private or sensitive, an authorization failure or other error response may be returned.
The direct HTTP query service may return provenance in any available format.
For interoperable provenance publication, use of the PROV-O vocabulary [PROV-O] represented in a standardized RDF format is recommended. Where alternative formats are available, selection may be made by content negotiation, using Accept:
header fields in the HTTP request.
Services MUST identify the Content-Type
of the provenance returned.
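As a hedged illustration of content negotiation, a client might prepare a request as sketched below. The preference for Turtle over RDF/XML, the quality value and the helper names are assumptions made for this example; no network request is actually sent.

```python
import urllib.request

# Illustrative preference: Turtle first, RDF/XML as a fallback.
ACCEPT = "text/turtle, application/rdf+xml;q=0.8"


def build_provenance_request(provenance_uri: str) -> urllib.request.Request:
    """Prepare a GET request that states which RDF formats the client prefers."""
    return urllib.request.Request(provenance_uri, headers={"Accept": ACCEPT})


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset from a Content-Type header value."""
    return content_type_header.split(";", 1)[0].strip().lower()


request = build_provenance_request(
    "http://example.com/provenance/service?target=http%3A%2F%2Fwww.example.com%2Fentity123"
)
print(request.get_method(), request.get_header("Accept"))
print(media_type("text/turtle; charset=utf-8"))
```

The client would then use the media type from the response's Content-Type header to select a parser.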
Additional URI query parameters may be used as indicated by the service description in section 4.1.1 Direct HTTP query service description.
Section 3. Locating provenance records describes the use of HTTP Link:
header fields, HTML <link>
elements and RDF statements to indicate provenance query services. Beyond that, this specification does not define any specific mechanism for discovering query services. Applications may use any appropriate mechanism, including but not limited to: prior configuration, search engines, service registries, etc.
To facilitate service discovery, we recommend that RDF publication of dataset and service descriptions use the property prov:has_query_service
and the provenance service type prov:ServiceDescription
as appropriate (see the appendix section B. ).
For example, a VoID description [VoID] of a dataset might indicate a provenance query service providing information about that dataset:
<http://example.org/dataset/> a void:Dataset ;
    prov:has_query_service <http://example.org/provenance/> .
The RDF service description example in section 4.1.3 Service description example shows use of the prov:ServiceDescription
type.
REVIEW. This section describes an "at-risk" feature whose final inclusion in this document is undecided. Does the use of a "ping-back" for discovering forward provenance fall under the remit of "provenance access and query"? Is it a useful feature to define?
This section describes a discovery mechanism for forward provenance; i.e. provenance describing how a resource is used after it has been created.
The mechanisms discussed in previous sections are primarily concerned with access to historical provenance, dealing with questions such as:
These questions can be turned around to consider a publisher's forward-looking use of a resource, like:
The ability to answer forward-looking questions requires some cooperation among the parties who use a resource; for example, a consumer could report use directly to the publisher, or a search engine could discover and report such downstream resource usage. To facilitate such cooperation, a publisher of a resource may implement a "ping-back" capability.
A resource may have an associated "ping-back" URI which can be presented with references to provenance about the resource. The ping-back URI is associated with a resource using mechanisms similar to those used for presenting a provenance-URI, but using a pingback
link relation instead of has_provenance
. A consumer of the resource, or some other system, may perform an HTTP POST operation to the pingback URI, with a request body containing a list of provenance-URIs for provenance records describing uses of the resource.
For example, consider a resource that is published by acme.example.org
, and is subsequently used by wile-e.example.org
in the construction of some new entity; we might see an exchange along the following lines. We start with wile-e.example.org
retrieving a copy of acme.example.org
's resource:
C: GET http://acme.example.org/super-widget HTTP/1.1

S: 200 OK
S: Link: <http://acme.example.org/super-widget/provenance>; rel=http://www.w3.org/ns/prov#has_provenance
S: Link: <http://acme.example.org/super-widget/pingback>; rel=http://www.w3.org/ns/prov#pingback
 :
(super-widget resource data)
The first of the links in the response is a has_provenance
link with a provenance-URI that has been described previously (section 3.1 Resource accessed by HTTP). The second is a distinct resource that exists to receive provenance pingbacks. Later, when a new resource has been created or action performed based upon the acme.example.org/super-widget
, a client MAY post a pingback request to any supplied pingback
URI:
C: POST http://acme.example.org/super-widget/pingback HTTP/1.1
C: Content-Type: text/uri-list
C:
C: http://wile-e.example.org/contraption/provenance
C: http://wile-e.example.org/another/provenance

S: 204 No Content
S: Link: <http://acme.example.org/super-widget/provenance>; rel=http://www.w3.org/ns/prov#has_provenance; anchor="http://acme.example.org/super-widget"
The pingback request supplies a list of provenance-URIs from which forward provenance may be retrieved. The pingback service may do as it chooses with these URIs; e.g., it may choose to save them for later use, to retrieve the associated provenance and save that, to publish the URIs along with other provenance information about the original entity to which they relate, or even to ignore them.
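A client-side sketch of such a pingback request is given below. It only constructs the request; sending it (for example with `urllib.request.urlopen`) is left to the caller. The helper name `build_pingback_request` is hypothetical, and the CRLF line endings follow the usual text/uri-list convention.

```python
import urllib.request


def build_pingback_request(pingback_uri: str, provenance_uris) -> urllib.request.Request:
    """Prepare a POST whose body lists one provenance-URI per line."""
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("ascii")
    return urllib.request.Request(
        pingback_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "text/uri-list"},
    )


request = build_pingback_request(
    "http://acme.example.org/super-widget/pingback",
    [
        "http://wile-e.example.org/contraption/provenance",
        "http://wile-e.example.org/another/provenance",
    ],
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```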
The client MAY further supply has_query_service
links indicating provenance query services that can describe the target-URI. The anchor MUST be included, and SHOULD be the target-URI of the resource to which this pingback service belongs, or some related resource with relevant provenance.
C: POST http://acme.example.org/super-widget/pingback HTTP/1.1
C: Link: <http://wile-e.example.org/sparql>; rel="http://www.w3.org/ns/prov#has_query_service"; anchor="http://acme.example.org/super-widget"
C: Content-Type: text/uri-list
C: Content-Length: 0
C:

S: 204 No Content
S: Link: <http://acme.example.org/super-widget/provenance>; rel=http://www.w3.org/ns/prov#has_provenance; anchor="http://acme.example.org/super-widget"
In the above example, the client did not submit any provenance-URIs and the URI list is therefore empty.
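A pingback request that carries only a query service link might be assembled as in the following sketch. The helper names are hypothetical, and the link formatting assumes that the URIs contain no double-quote characters.

```python
import urllib.request

HAS_QUERY_SERVICE = "http://www.w3.org/ns/prov#has_query_service"


def format_link(target: str, rel: str, anchor: str) -> str:
    """Format one Link header value with an explicit anchor parameter."""
    return f'<{target}>; rel="{rel}"; anchor="{anchor}"'


def build_service_pingback(pingback_uri: str, service_uri: str, anchor: str) -> urllib.request.Request:
    """Prepare a POST with an empty URI list and a has_query_service link."""
    return urllib.request.Request(
        pingback_uri,
        data=b"",  # no provenance-URIs are submitted
        method="POST",
        headers={
            "Link": format_link(service_uri, HAS_QUERY_SERVICE, anchor),
            "Content-Type": "text/uri-list",
        },
    )


request = build_service_pingback(
    "http://acme.example.org/super-widget/pingback",
    "http://wile-e.example.org/sparql",
    "http://acme.example.org/super-widget",
)
print(request.get_header("Link"))
```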
The client MAY similarly include has_provenance
links to specify provenance records with a different anchor. The provenance-URIs in those headers SHOULD also be included in the content if the POSTed Content-Type is text/uri-list
.
Does this SHOULD requirement serve any useful purpose?
There is no required information in the server response to a pingback POST request.
In the examples above, the pingback service responds with an empty response body, and links to provenance for the original resource.
(Note that the Link:
header returned contains an explicit anchor
parameter with the URI of the original resource; without this, the link would relate the indicated URI to the pingback URI http://acme.example.org/super-widget/pingback
rather than the original resource.)
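The effect of the anchor parameter can be illustrated with a small sketch that works out which resource a link describes. The parser is deliberately simplified: it assumes a single link per header value and no ';' characters inside parameter values. The function names are hypothetical.

```python
from urllib.parse import urljoin


def parse_link(value: str):
    """Parse one link value such as '<uri>; rel=...; anchor="..."'."""
    end = value.index(">")
    target = value[value.index("<") + 1:end]
    params = {}
    for part in value[end + 1:].split(";"):
        name, sep, raw = part.strip().partition("=")
        if sep:
            params[name.strip().lower()] = raw.strip().strip('"')
    return target, params


def link_context(value: str, request_uri: str):
    """Return (context, rel, target); the context defaults to the request URI."""
    target, params = parse_link(value)
    context = urljoin(request_uri, params["anchor"]) if "anchor" in params else request_uri
    return context, params.get("rel"), target


PINGBACK = "http://acme.example.org/super-widget/pingback"
with_anchor = (
    "<http://acme.example.org/super-widget/provenance>; "
    "rel=http://www.w3.org/ns/prov#has_provenance; "
    'anchor="http://acme.example.org/super-widget"'
)
without_anchor = with_anchor.rsplit(";", 1)[0]

print(link_context(with_anchor, PINGBACK)[0])     # the original resource
print(link_context(without_anchor, PINGBACK)[0])  # the pingback URI itself
```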
The only defined operation on a pingback-URI is POST, which supplies links to provenance information or services as described above. A pingback-URI MAY respond to other requests, but no requirements are imposed on how it responds. In particular, it is not specified here how a pingback resource should respond to an HTTP GET request. This leaves open a possibility that the pingback resource MAY have the same URI as the original resource, provided that the original does not respond to POST in some different way.
Provenance is central to establishing trust in data. If provenance is corrupted, it may lead agents (human or software) to draw inappropriate and possibly harmful conclusions. Therefore, care is needed to ensure that the integrity of provenance is maintained. Just as provenance can help determine a level of trust in some information, a provenance record related to the provenance itself ("provenance of provenance") can help determine trust in the provenance.
Secure HTTP (https) SHOULD be used across unsecured networks when accessing provenance that may be used as a basis for trust decisions, or to obtain a provenance URI for same.
When retrieving a provenance URI from a document, steps SHOULD be taken to ensure the document itself is an accurate copy of the original whose author is being trusted (e.g. signature checking, or use of a trusted secure web service). (See also section 1.3 Interpreting provenance records.)
Provenance may present a route for leakage of privacy-related information, combining as it does a diversity of information types with possible personally-identifying information; e.g. editing timestamps may provide clues to the working patterns of document editors, or derivation traces might indicate access to sensitive materials. In particular, note that the fact that a resource is openly accessible does not mean that its provenance should also be. When publishing provenance, its sensitivity SHOULD be considered and appropriate access controls applied where necessary. When a provenance-aware publishing service accepts some resource for publication, the contributors SHOULD have some opportunity to review and correct or conceal any provenance that they don't wish to be exposed. Provenance management systems SHOULD embody mechanisms for enforcement and auditing of privacy policies as they apply to provenance.
Provenance may be used by audits to establish accountability for information use [INFO-ACC] and to verify use of proper processes in information processing activities. Thus, provenance management systems can provide mechanisms to support auditing and enforcement of information handling policies. In such cases, provenance itself may be a valuable target for attack by malicious agents, and care must be taken to ensure it is stored securely and in a fashion that resists attempts to tamper with it.
The pingback service described in section 5. Forward provenance might be abused for "link spamming" (similar to the way that weblog ping-backs have been used to direct viewers to spam sites). As with many such services, an application needs to find a balance between maintaining ease of submission for useful information and blocking unwanted information. We have no easy solutions for this problem, and the caveats noted above about establishing integrity of provenance records apply similarly to information provided by ping-back calls.
When clients and servers are retrieving submitted URIs such as provenance descriptions, and following or registering links, reasonable care should be taken to prevent malicious use such as distributed denial of service attacks (DDoS), cross-site request forgery (CSRF), spamming and hosting of inappropriate materials. Reasonable precautions might include same-origin policy, HTTP authorization, SSL, rate-limiting, spam filters, moderation queues, user acknowledgements and validation. It is out of scope for this document to specify how such mechanisms work and should be applied.
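Purely as an illustration of the kind of validation a pingback service might apply to a submitted text/uri-list body, consider the sketch below. The limit of 20 URIs, the restriction to http and https schemes, and the function name are all assumptions made for this example, not requirements of this specification.

```python
from urllib.parse import urlsplit

MAX_URIS = 20  # hypothetical cap on the number of URIs accepted per request


def accept_pingback_body(body: bytes, max_uris: int = MAX_URIS):
    """Return the list of submitted URIs, or None if the body is rejected."""
    try:
        text = body.decode("ascii")
    except UnicodeDecodeError:
        return None
    uris = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines and uri-list comments
            continue
        try:
            parts = urlsplit(line)
        except ValueError:
            return None
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return None
        uris.append(line)
    if len(uris) > max_uris:
        return None
    return uris


print(accept_pingback_body(b"http://wile-e.example.org/contraption/provenance\r\n"))
print(accept_pingback_body(b"javascript:alert(1)\r\n"))
```

Checks of this kind address only the syntactic form of a submission; they do not establish that the linked provenance is trustworthy.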
Is CSRF a real threat here? How?
Accessing provenance services might reveal to the service and to third parties information that is considered private, including which resources a client has taken an interest in. For instance, a browser extension that collects all provenance data for a resource being saved to the local disk could reveal user interest in a sensitive resource to a third-party site listed by a prov:has_provenance
or prov:has_query_service
relation. A detailed query submitted to a third-party provenance query service might be revealing personal information such as social security numbers. Accordingly, user agents in particular SHOULD NOT follow provenance and provenance service links without first obtaining the user's explicit permission to do so.
The editors acknowledge the contributions and reviews from members of the W3C Provenance Working Group, and thank them for their feedback throughout the development of this specification.
The provenance query service description and forward provenance specifications are substantially based on proposals by Stian Soiland-Reyes (University of Manchester).
Thanks to Robin Berjon for making our lives easier with his ReSpec tool.
Possible renaming of service description relations to lowercase-only forms?
This specification defines the following additional names in the provenance namespace with URI http://www.w3.org/ns/prov#.
Name | Description | Definition ref |
---|---|---|
ServiceDescription | Type for a generic provenance query service. Mainly for use in RDF provenance query service descriptions, to facilitate discovery in linked data environments. | section 4.3 Provenance query service discovery |
DirectQueryService | Type for a direct HTTP query service description. Mainly for use in RDF provenance query service descriptions, to distinguish direct HTTP query service descriptions from other query service descriptions. | section 4.1.1 Direct HTTP query service description |
has_anchor | Indicates a target-URI for a resource, used by an associated provenance record. | section 3.2 Resource represented as HTML, section 3.3 Resource represented as RDF |
has_provenance | Indicates a provenance-URI for a resource; the resource identified by this property presents a provenance record about its subject or anchor resource. | section 3.1 Resource accessed by HTTP, section 3.2 Resource represented as HTML |
has_query_service | Indicates a provenance query service that can access provenance related to its subject or anchor resource. | section 3.1.1 Specifying Provenance Query Services |
describesService | Relates a generic provenance query service resource (type prov:ServiceDescription) to a specific query service description (e.g. a prov:DirectQueryService or a sd:Service). | section 4.1 Provenance query service description |
provenanceUriTemplate | Indicates a URI template string for constructing provenance-URIs. | section 4.1.1 Direct HTTP query service description |
pingback | Relates a resource to a provenance pingback service that may receive forward provenance links about the resource. | section 5. Forward provenance |
The ontology describing these terms is at paq/prov-aq.ttl or paq/prov-aq.owl.
Update when location and copy finalized.
Always update copy of mercurial change log. Below are changes since 19 June.