--- a/paq/provenance-access.html Thu Aug 04 13:38:07 2011 +0100
+++ b/paq/provenance-access.html Thu Aug 04 13:38:27 2011 +0100
@@ -2,6 +2,7 @@
<html>
<head>
<title>Provenance Access and Query</title>
+ <link rel="stylesheet" type="text/css" href="css/paq.css" />
<meta http-equiv='Content-Type' content='text/html;charset=utf-8'/>
<!-- Use common W3C-hosted version of ReSpec.js:
-->
@@ -118,151 +119,152 @@
</section>
<section>
- <h2>Locating provenance data</h2>
+ <h2>Locating provenance</h2>
<p>
- On the presumption that provenance data is a resource that can be accessed using normal web retrieval, one needs to know a URI to dereference. The provenance URI may be known in advance, in which case there is nothing more to specify. If the provenance URI is not known, then a mechanism to discover a provenance URI must be based on some information that is available to the would-be accessor. We also wish to allow that provenance information could be provided by parties other than the provider of the original resource. Indeed, provenance data for a resource may be provided by several different parties, each with different concerns.
+ On the presumption that provenance data is a resource that can be accessed using normal web retrieval, one needs to know a URI to dereference. The provenance URI may be known in advance, in which case there is nothing more to specify. If the provenance URI is not known, then a mechanism to discover a provenance URI must be based on some information that is available to the would-be accessor. We also wish to allow that provenance information could be provided by parties other than the provider of the original resource. Indeed, provenance data for a resource may be provided by several different parties, at different URIs, each with different concerns. (It is quite possible that contradictory provenance may be provided by different parties.)
</p>
<p>
- We start by considering mechanisms for the resource provider to also indicate a provenance URI. Because the resource provider controls the response when the resource is accessed, this allows for direct indication of a provenance URI. Three mechanisms are described here:
+ We start by considering mechanisms for the resource provider to also indicate a provenance URI referring to the provenance of a provided or indicated resource. Because the resource provider controls the response when the resource is accessed, this allows for direct indication of a provenance URI. Three mechanisms are described here:
<ul>
<li>The requester knows the resource URI <em>and</em> the resource is accessed using HTTP</li>
- <li>The requester has a copy of a resource representation as HTML or XHTML</li>
- <li>The requester has a copy of a resource representation as RDF (including the range of possible RDF syntaxes, such as HTML with embedded RDFa)</li>
+ <li>The requester has a copy of a resource presented as HTML or XHTML</li>
+ <li>The requester has a copy of a resource presented as RDF (including the range of possible RDF syntaxes, such as HTML with embedded RDFa)</li>
</ul>
These particular cases are selected as corresponding to primary current web protocol and data formats.
</p>
- <section>
- <h2>Resource accessed by HTTP</h2>
- <p>
- For a document accessible using HTTP, [[POWDER-DR]] describes <a href="http://www.w3.org/TR/2009/REC-powder-dr-20090901/#httplink">a mechanism</a> for associating metadata with a resource by adding an HTTP <code>Link</code> header field to the HTTP response to a GET or HEAD operation (other HTTP operations are not excluded, but are not considered here). Since the POWDER specification was published, the HTTP linking draft has been approved by the IETF as <a href="http://tools.ietf.org/html/rfc5988">RFC 5988</a> [[LINK-REL]].
- </p>
- <p>
- The same basic mechanism can be used for referencing provence data, for which a new link relation type is registered according to the template in <a href="#iana-considerations" class="sectionRef"></a>:
- <code>
- <pre class="example">
- Link: <cite>provenance-URI</cite>; rel="provenance"
- </pre>
- </code>
- When used in conjunction with an HTTP success response code (<code>2xx</code>), this HTTP header indicates that <code><cite>provenance-URI</cite></code> is the URI of a provenance resource for which information is returned. At this time, the meaning of provenance links returned with other HTTP response codes is not defined: future revisions of this specification may define interpretations for these.
- </p>
- <p>
- An HTTP response MAY include multiple provenance link headers, indicating a number of different resources that are known to the responding server, each providing provenance about the accessed resource.
- </p>
- <p>
- The presence of a provenance link in an HTTP response does not preclude the possibility that other publishers may offer provenance information about the same resource. In such cases, discovery of the additional provenance information must use other means (e.g. see <a href="#third-party-services" class="sectionRef"></a>).
- </p>
-
- <section>
- <h2>Open issues</h2>
- <p>
- Are the provenance resources indicated in this way to be considered authoritative? I.e. if the client trusts information returned by the server (e.g. is prepared to act on inferences based on the returned data), should it also trust the provenance data, or should trust in the linked provenance data be determined separately? If the linked data <em>is</em> to be trusted, then the data from multiple linked provenance resources MUST be consistent if it is to be meaningful. I favour an approach whereby trust in the provenance resources is established independently, which is similar to the situation for any other resource; e.g. based on the domain that serves it, or an associated digital signature.
- </p>
- </section>
-
- </section>
+ <section>
+ <h2>Resource accessed by HTTP</h2>
+ <p>
+ For a document accessible using HTTP, POWDER [[POWDER-DR]] describes <a href="http://www.w3.org/TR/2009/REC-powder-dr-20090901/#httplink">a mechanism</a> for associating metadata with a resource using an HTTP <code>Link</code> header field. The <code>Link</code> header field is included in the HTTP response to a GET or HEAD operation (other HTTP operations are not excluded, but are not considered here). Since the POWDER specification was published, the HTTP linking draft has been approved by the IETF as <a href="http://tools.ietf.org/html/rfc5988">RFC 5988</a> [[LINK-REL]].
+ </p>
+ <p>
+ The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in <a href="#iana-considerations" class="sectionRef"></a>:
+ <code>
+ <pre class="pattern">
+ Link: <cite>provenance-URI</cite>; rel="provenance"
+ </pre>
+ </code>
+ When used in conjunction with an HTTP success response code (<code>2xx</code>), this HTTP header indicates that <code><cite>provenance-URI</cite></code> is the URI of some provenance for the requested resource. At this time, the meaning of provenance links returned with other HTTP response codes is not defined: future revisions of this specification may define interpretations for these.
+ </p>
+ <p>
+ An HTTP response MAY include multiple provenance link headers, indicating a number of different resources that are known to the responding server, each providing provenance about the accessed resource.
+ </p>
+ <p>
+ The presence of a provenance link in an HTTP response does not preclude the possibility that other publishers may offer provenance information about the same resource. In such cases, discovery of the additional provenance information must use other means (e.g. see <a href="#third-party-services" class="sectionRef"></a>).
+ </p>
<section>
- <h2>Resource presented as HTML</h2>
- <p>
- For a document presented as HTML or XHTML, without regard for how it has been obtained, [[POWDER-DR]] describes <a href="http://www.w3.org/TR/2009/REC-powder-dr-20090901/#assoc-markup">a mechanism</a> for associating metadata with a resource by adding a <code><Link></code> element to the HTML <code><head></code> section.
- </p>
+ <h2>Open issues</h2>
<p>
- The same basic mechanism can be used for referencing provence data, for which a new link relation type is registered according to the template in <a href="#iana-considerations" class="sectionRef"></a>:
- <code>
- <pre class="example">
- <html xmlns="http://www.w3.org/1999/xhtml">
- <head>
- <meta name="wdr.issuedby" content="http://authority.example.org/company.rdf#me"/>
- <link rel="provenance" href="<cite>provenance-URI</cite>">
- <title>Welcome to example.com </title>
- </head>
- <body>
- ...
- </body>
- </html>
- </pre>
- </code>
- This element indicates that <code><cite>provenance-URI</cite></code> is the URI of a provenance resource for the containing document.
- </p>
- <p>
- An HTML document header MAY include multiple provenance link elements, indicating a number of different resources that are known to the creator of the document, each providing provenance about the document resource.
- </p>
- <p>
- See in particular <a href="http://tools.ietf.org/html/rfc5988#appendix-A">Appendix A. Notes on Using the Link Header with the HTML4 Format</a> of RFC5988 for further notes about using link relation types in HTML.
- </p>
- <p class="note">
- An alternative option would be to use an HTML <code><meta></code> element to present provenance links. The <code><Link></code> is preferred as it reflects more closely the intended goal, and has been defined with somewhat consistent applicability across HTTP, HTML and potentially RDF data. A specification to use <code><meta></code> for this would miss this opportunity to build on the existing specification and registry.
- </p>
- <section>
- <h2>Open issues</h2>
- <p>
- @@TODO -
- The POWDER specification also adds: Documents MAY also include any of the attribution data from the POWDER document in meta tags. In particular, the issuedby field is likely to be useful to user agents deciding whether or not to fetch the full POWDER document. Any attribution data encoded in meta tags within an HTML document should be the same as that in the POWDER document. In case of discrepancy, the POWDER document should be taken as more authoritative. Is there a parallel we should add here for provenance?
- </p>
- </section>
-
+ Are the provenance resources indicated in this way to be considered authoritative? I.e. if the client trusts information returned by the server (e.g. is prepared to act on inferences based on the returned data), should it also trust the provenance data, or should trust in the linked provenance data be determined separately? If the linked data <em>is</em> to be trusted, then the data from multiple linked provenance resources MUST be consistent if it is to be meaningful. I favour an approach whereby trust in the provenance resources is established independently, which is similar to the situation for any other resource; e.g. based on the domain that serves it, or an associated digital signature.
</p>
</section>
+ </section>
+
+ <section>
+ <h2>Resource presented as HTML</h2>
+ <p>
+ For a document presented as HTML or XHTML, without regard for how it has been obtained, POWDER [[POWDER-DR]] describes <a href="http://www.w3.org/TR/2009/REC-powder-dr-20090901/#assoc-markup">a mechanism</a> for associating metadata with a resource by adding a <code><Link></code> element to the HTML <code><head></code> section.
+ </p>
+ <p>
+ The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in <a href="#iana-considerations" class="sectionRef"></a>:
+ <code>
+ <pre class="pattern">
+ <html xmlns="http://www.w3.org/1999/xhtml">
+ <head>
+ <meta name="wdr.issuedby" content="http://authority.example.org/company.rdf#me"/>
+ <link rel="provenance" href="<cite>provenance-URI</cite>"/>
+ <title>Welcome to example.com </title>
+ </head>
+ <body>
+ ...
+ </body>
+ </html>
+ </pre>
+ </code>
+ This element indicates that <code><cite>provenance-URI</cite></code> is the URI of a provenance resource for the containing document.
+ </p>
+ <p>
+ An HTML document header MAY include multiple provenance link elements, indicating a number of different resources that are known to the creator of the document, each providing provenance about the document resource.
+ </p>
+ <p>
+ See in particular <a href="http://tools.ietf.org/html/rfc5988#appendix-A">Appendix A. Notes on Using the Link Header with the HTML4 Format</a> of RFC5988 for further notes about using link relation types in HTML.
+ </p>
+ <p class="note">
+ An alternative option would be to use an HTML <code><meta></code> element to present provenance links. The <code><Link></code> element is preferred as it more closely reflects the intended goal, and has been defined with somewhat consistent applicability across HTTP, HTML and potentially RDF data. A specification to use <code><meta></code> for this would miss the opportunity to build on the existing specification and registry.
+ </p>
<section>
- <h2>Resource presented as RDF</h2>
- <p>
- If a resource is presented as RDF (in any of its recognized syntaxes, including RDFa), it may contain references to its own provenance using additional RDF statements.
- </p>
+ <h2>Open issues</h2>
<p>
- For this purpose a new RDF property, <code>prov:hasProvenance</code>, is defined as a relation between two resources, where the object of the property is a resource that provides provenance data about the subject resource. Multiple <code>prov:hasProvenance</code> assertions may be made about a subject resource.
- </p>
- <p>
- @@TODO: example
- </p>
- <p>
- @@TODO: document namespace. Check naming style. Use provenance model namespace? Define as part of model?
+ @@TODO -
+ The POWDER specification also adds: Documents MAY also include any of the attribution data from the POWDER document in meta tags. In particular, the issuedby field is likely to be useful to user agents deciding whether or not to fetch the full POWDER document. Any attribution data encoded in meta tags within an HTML document should be the same as that in the POWDER document. In case of discrepancy, the POWDER document should be taken as more authoritative. Is there a parallel we should add here for provenance?
</p>
</section>
+ </section>
- <section>
- <h2>Third party services</h2>
- <p>
- The mechanisms for provenance discovery described above have all assumed the provenance URI is being supplied by the provider of the original resource. Where provenance information is provided by a third party without any cooperation from the original resource provider, the provenance link cannot be provided through the same channels as the original resource, and a different approach must be considered.
- </p>
- <p>
- We assume that the application or person requesting provenance information has the URI or other unique identification of the resource for which provenance is required, and also has a URI for a third-party service that provides a provenance information service. The nature of this third party service is an implementation choice, to be agreed between provider and users of the service, but for ease of interoperation we recommend use of SPARQL [[RDF-SPARQL-PROTOCOL]] [[RDF-SPARQL-QUERY]]. The third party service URI would then be the URI of a SPARQL endpoint (or, to use the SPARQL specification language, a <a href="http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service">SPARQL protocol service</a>) which is queried for the desired provenance information.
- </p>
- <p>
- If the requester has a URI for the original resource, they simple issue a simple SPARQL query for the URI(s) of any associated provenance data; e.g., if the original resource has URI <code>http://example.org/resource</code>,
- <code>
- <pre class="example">
- @prefix prov: <@@TBD>
- SELECT ?provenance_uri WHERE
- {
- <http://example.org/resource> prov:hasProvenance ?provenance_uri
- }
- </pre>
- </code>
- </p>
- <p class="issue">
- @@TODO: specific provenance property to be determined by the model specification?
- </p>
- <p>
- If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate the target resource and obtain its provenance URI(s). The nature of identifying information that can be used in this way will depend upon the third party service used, further definition of which is out of scope for this specification. For example, a query for a document identified by a DOI, say <code>1234.5678</code>, might look like this:
- <code>
- <pre class="example">
- @prefix prov: <@@TBD>
- @prefix idservice: <@@TBD>
- SELECT ?provenance_uri WHERE
- {
- [ idservice:hasDOI "1234.5678" ] prov:hasProvenance ?provenance_uri
- }
- </pre>
- </code>
- </p>
- <p>
- The mechanisms described here focus on finding the URI(s) for provenance information. Below, <a href="#querying-provenance-data" class="sectionRef"></a> will consider access to provenance information for which there is no separate URI.
- </p>
- </section>
+ <section>
+ <h2>Resource presented as RDF</h2>
+ <p>
+ If a resource is presented as RDF (in any of its recognized syntaxes, including RDFa), it may contain references to its own provenance using additional RDF statements.
+ </p>
+ <p>
+ For this purpose a new RDF property, <code>prov:hasProvenance</code>, is defined as a relation between two resources, where the object of the property is a resource that provides provenance data about the subject resource. Multiple <code>prov:hasProvenance</code> assertions may be made about a subject resource.
+ </p>
+ <p>
+ @@TODO: example
+ </p>
+ <p>
+ @@TODO: document namespace. Check naming style. Use provenance model namespace? Define as part of model?
+ </p>
+ </section>
</section>
+
+ <section>
+ <h2>Third party services</h2>
+ <p>
+ The mechanisms for provenance discovery described above have all assumed the provenance URI is being supplied by the provider of the original resource. Where provenance information is provided by a third party without any cooperation from the original resource provider, the provenance link cannot be provided through the same channels as the original resource, and alternative approaches must be considered.
+ </p>
+ <p>
+ We assume that the application or person requesting provenance information has the URI or other unique identification of the resource for which provenance is required, and also has a URI for a third-party provenance service.
+ </p>
+ <p>
+ The nature of a third party provenance service is an implementation choice, to be agreed between provider and users of the service, but for ease of interoperability between different providers and users we recommend use of SPARQL [[RDF-SPARQL-PROTOCOL]] [[RDF-SPARQL-QUERY]]. The third party service URI would then be the URI of a SPARQL endpoint (or, to use the SPARQL specification language, a <a href="http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service">SPARQL protocol service</a>) which is queried for the desired provenance information.
+ </p>
+ <p>
+ If the requester has a URI for the original resource, they can issue a simple SPARQL query for the URI(s) of any associated provenance data; e.g., if the original resource has URI <code>http://example.org/resource</code>,
+ <code>
+ <pre class="example">
+ PREFIX prov: <@@TBD>
+ SELECT ?provenance_uri WHERE
+ {
+ <http://example.org/resource> prov:hasProvenance ?provenance_uri
+ }
+ </pre>
+ </code>
+ </p>
+ <p class="issue">
+ @@TODO: specific provenance property to be determined by the model specification?
+ </p>
+ <p>
+ If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate the target resource and obtain its provenance URI(s). The nature of identifying information that can be used in this way will depend upon the third party service used, further definition of which is out of scope for this specification. For example, a query for a document identified by a DOI, say <code>1234.5678</code>, might look like this:
+ <code>
+ <pre class="example">
+ PREFIX prov: <@@TBD>
+ PREFIX idservice: <@@TBD>
+ SELECT ?provenance_uri WHERE
+ {
+ [ idservice:hasDOI "1234.5678" ] prov:hasProvenance ?provenance_uri
+ }
+ </pre>
+ </code>
+ </p>
+ <p>
+ The mechanisms described here focus on finding the URI(s) for provenance information. Below, <a href="#querying-provenance-data" class="sectionRef"></a> will consider access to provenance information for which there is no separate URI.
+ </p>
+ </section>
<section>
<h2>Querying provenance data</h2>
@@ -309,8 +311,8 @@
<dt>Application Data:</dt>
<dd>
...
- </td>
- </table>
+ </dd>
+ </dl>
</p>
</section>
@@ -386,7 +388,7 @@
Packaging formats along the lines of those used for shipping Java web applications or (basically, a ZIP file with a manifest and some imposed structure)
</li>
<li>
- Ongoing work in the research community (e.g. ...) to encapsulate data, code, annotations and metadata into a common exchangeable format.
+ Ongoing work in the research community (e.g. <a href="http://eprints.ecs.soton.ac.uk/21587/">Why Linked Data is Not Enough for Scientists</a>, ePub, etc.) to encapsulate data, code, annotations and metadata into a common exchangeable format.
</li>
</ul>
</p>