Re-worked section on provenance querying
author Graham Klyne
Fri, 05 Aug 2011 12:49:53 +0100
changeset 131 88ad941062fb
parent 130 893a8d76d3da
child 132 2b6cb01bb25f
paq/provenance-access.html
--- a/paq/provenance-access.html	Fri Aug 05 12:45:40 2011 +0100
+++ b/paq/provenance-access.html	Fri Aug 05 12:49:53 2011 +0100
@@ -19,6 +19,8 @@
         "LINK-REL" : "M. Nottingham, <a href='http://www.ietf.org/rfc/rfc5988.txt'><cite>Web Linking</cite></a>, October 2010, Internet RFC 5988. URL: <a href='http://www.ietf.org/rfc/rfc5988.txt'>http://www.ietf.org/rfc/rfc5988.txt</a>",
         "RFC2387" : "E. Levinson. <a href=\"http://www.ietf.org/rfc/rfc2387.txt\"><cite>The MIME Multipart/Related Content-type.</cite></a> August 1998. Internet RFC 2387. URL: <a href=\"http://www.ietf.org/rfc/rfc2387.txt\">http://www.ietf.org/rfc/rfc2387.txt</a>",
         "RFC3297" : "G. Klyne; R. Iwazaki; D. Crocker. <a href=\"http://www.ietf.org/rfc/rfc3297.txt\"><cite>Content Negotiation for Messaging Services based on Email.</cite></a> July 2002. Internet RFC 3297. URL: <a href=\"http://www.ietf.org/rfc/rfc3297.txt\">http://www.ietf.org/rfc/rfc3297.txt</a>",
+        "PRISM" : "International Digital Enterprise Alliance, Inc. <a href=\"http://www.prismstandard.org/specifications/2.0/PRISM_prism_namespace_2.0.pdf\"><cite>PRISM: Publishing Requirements for Industry Standard Metadata</cite></a>. February 2008. URL: <a href=\"http://www.prismstandard.org/specifications/2.0/PRISM_prism_namespace_2.0.pdf\">http://www.prismstandard.org/specifications/2.0/PRISM_prism_namespace_2.0.pdf</a>",
+        "FABIO" : "D. Shotton; S. Peroni. <a href=\"http://speroni.web.cs.unibo.it/cgi-bin/lode/req.py?req=http:/purl.org/spar/fabio#namespacedeclarations\"><cite>FaBiO, the FRBR-aligned Bibliographic Ontology.</cite></a> June 2011. URL: <a href=\"http://speroni.web.cs.unibo.it/cgi-bin/lode/req.py?req=http:/purl.org/spar/fabio#namespacedeclarations\">http://speroni.web.cs.unibo.it/cgi-bin/lode/req.py?req=http:/purl.org/spar/fabio#namespacedeclarations</a>",
       };
       var respecConfig = {
           // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
@@ -108,7 +110,7 @@
         A general expectation is that web applications may access provenance information in the same way as any web resource, by dereferencing its URI. Typically, this will be by performing an HTTP GET operation. Thus, any provenance information may be associated with a URI, and may be accessed by dereferencing that URI using normal web mechanisms.
       </p>
       <p>
-        The problem of accessing some required provenance information then reduces to the problem of finding its URI, which is dealt with separately in section <a href="#locating-provenance-data" class="sectionRef"></a>.
+        The problem of accessing some required provenance information then reduces to the problem of finding its URI, which is dealt with separately in section <a href="#locating-provenance" class="sectionRef"></a>.
       </p>
       <p>
         This specification thus RECOMMENDS that if a publisher wishes to make provenance information available, it is published as a normal web resource, and provision is made for the URI of the provenance to be discoverable.
@@ -281,7 +283,7 @@
         <pre class="example">
 <code>http://example.net/provenance-discovery?uri=http://example.info/qdata/&amp;type=application/json</code>
         </pre>
-        <p class="issue">
+        <p class="pending">
           SameAs.org also provides URIs for directly accessing the different result formats without content negotiation, by appending an extra segment to the SameAs.org service URI.  I'm reluctant to suggest this mechanism for a service with a separately specified base URI, hence the use of an additional query parameter.
         </p>
       </section>
@@ -364,10 +366,12 @@
           <dl>
             <dt><code>200 OK</code></dt>
             <dd>Provenance URI(s) returned; see above.</dd>
-            <dt><code>204 No data</code></dt>
-            <dd>The request was valid, but the server is returning no provenance URIs *(either because it does not know of any provenance URIs, or possibly because it declines to provide the information for policy reasons.</dd>
+            <dt><code>204 No Content</code></dt>
+            <dd>The request was valid, but the server is returning no provenance URIs (either because no provenance URIs are known, or possibly because provision of provenance is restricted for policy reasons).  The entity body of the HTTP response message MUST be empty [[HTTP11]].</dd>
+            <!--
             <dt><code>...</code></dt>
             <dd>...</dd>
+            -->
           </dl>
         </p>
       </section>
@@ -376,8 +380,8 @@
     
     <section>
       <h2>Querying provenance</h2>
-      <p class="issue">
-        This section is work in progress, incomplete, not ready for review.
+      <p class="pending">
+        This section proposes the use of SPARQL queries to address requirements that are not covered by the simple retrieval and discovery services proposed above.
       </p>
       <p>
         There are circumstances where simply identifying and retrieving provenance as a web resource may not best fit the requirements of a particular application or service, e.g.:
@@ -390,47 +394,87 @@
         </ul>
       </p>
       <p>
-        For such circumstances, a provenance query service provides an alternative way 
-      </p>
-      <p>
-        The nature of a provenance query ... service is an implementation choice, to be agreed between provider and users of the service, but for ease of interoperability between different providers and users we recommend use of SPARQL [[RDF-SPARQL-PROTOCOL]] [[RDF-SPARQL-QUERY]].  The third party service URI would then be the URI of a SPARQL endpoint  (or, to use the SPARQL specification language, a <a href="http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service">SPARQL protocol service</a>) which is queried for the desired provenance information.
+        For such circumstances, a provenance query service provides an alternative way to access provenance and/or provenance URIs.
       </p>
       <p>
-        If the requester has a URI for the original resource, they simple issue a simple SPARQL query for the URI(s) of any associated provenance data; e.g., if the original resource has URI <code>http://example.org/resource</code>, 
-        <code>
-          <pre class="example">
-            @prefix prov: &lt;@@TBD&gt;
-            SELECT ?provenance_uri WHERE
-            {
-              &lt;http://example.org/resource&gt; prov:hasProvenance ?provenance_uri
-            }
-          </pre>
-        </code>
-      </p>
-      <p class="issue">
-        @@TODO: specific provenance property to be determined by the model specification?
+        We assume that the requesting application has the URI of a provenance query service, and some information about the resource for which provenance is required that can be used as the basis for a query.  A query service is potentially a very general capability that can, in principle, subsume the provenance discovery service described in <a href="#independent-provenance-discovery-services" class="sectionRef"></a>.
       </p>
       <p>
-        If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate the target resource and obtain its provenance URI(s).  The nature of identifying information that can be used in this way will depend upon the third party service used,  further definition of which is out of scope for this specification.  For example, a query for a document identified by a DOI, say <code>1234.5678</code>, might look like this:
-        <code>
-          <pre class="example">
-            @prefix prov: &lt;@@TBD&gt;
-            @prefix idservice: &lt;@@TBD&gt;
-            SELECT ?provenance_uri WHERE
-            {
-              [ idservice:hasDOI "1234.5678" ] prov:hasProvenance ?provenance_uri
-            }
-          </pre>
-        </code>
+        The details of a provenance query service are an implementation choice, to be agreed between the provider and users of the service, but for ease of interoperability between different providers and users we recommend the use of SPARQL [[RDF-SPARQL-PROTOCOL]] [[RDF-SPARQL-QUERY]].  The query service URI would then be the URI of a SPARQL endpoint (or, to use the SPARQL specification language, a <a href="http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service">SPARQL protocol service</a>).  A query service can potentially be used in many different ways, limited only by the available information and the capabilities of the SPARQL query language; the following subsections give examples for some plausible common scenarios.
       </p>
-      <p>
-        @@TODO
-      </p>
+
+      <section>
+        <h2>Find provenance URI given URI of resource</h2>
+        <p>
+          If the requester has a URI for the original resource, they might issue a simple SPARQL query for the URI(s) of any associated provenance data; e.g., if the original resource has URI <code>http://example.org/resource</code>, the query might look like this:
+          <code>
+            <pre class="example">
+              PREFIX prov: &lt;@@TBD&gt;
+              SELECT ?provenance_uri WHERE
+              {
+                &lt;http://example.org/resource&gt; prov:hasProvenance ?provenance_uri
+              }
+            </pre>
+          </code>
+        </p>
+        <p class="issue">
+          @@TODO: specific provenance namespace and property to be determined by the model specification?
+        </p>
+      </section>
+
+      <section>
+        <h2>Find provenance URI given identifying information about a resource</h2>
+        <p>
+          If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate the target resource and obtain its provenance URI(s).  The nature of identifying information that can be used in this way will depend upon the query service used; further definition of this is out of scope for this specification.  For example, a query for a document identified by a DOI, say <code>1234.5678</code>, using the PRISM vocabulary [[PRISM]] recommended by FaBiO [[FABIO]], might look like this:
+          <code>
+            <pre class="example">
+              PREFIX prov: &lt;@@TBD&gt;
+              PREFIX prism: &lt;http://prismstandard.org/namespaces/basic/2.0/&gt;
+              SELECT ?provenance_uri WHERE
+              {
+                [ prism:doi "1234.5678" ] prov:hasProvenance ?provenance_uri
+              }
+            </pre>
+          </code>
+        </p>
+        <p class="issue">
+          @@TODO: specific provenance namespace and property to be determined by the model specification?
+        </p>
+      </section>
+
+      <section>
+        <h2>Obtain provenance directly given URI of a resource</h2>
+        <p>
+          This scenario retrieves provenance directly given the URI of a resource, and may be useful where the provenance has not been assigned a specific URI, or when the calling application is interested only in specific elements of provenance.
+        </p>
+        <p>
+          If the original resource has URI <code>http://example.org/resource</code>, a SPARQL query for provenance might look like this: 
+          <code>
+            <pre class="example">
+              PREFIX prov: &lt;@@TBD&gt;
+              CONSTRUCT
+              {
+                &lt;http://example.org/resource&gt; ?p ?v
+              }
+              WHERE
+              {
+                &lt;http://example.org/resource&gt; ?p ?v
+              }
+            </pre>
+          </code>
+          This query extracts all properties and values available from the query service that are directly about the specified resource, and returns them as an RDF graph.  This may be sufficient if the service holds <em>only</em> provenance about the resource, or if the non-provenance information is also of interest.  A more complex query using specific provenance vocabulary terms may be needed to retrieve just the provenance information when other kinds of information are also available.
+        </p>
+        <p class="issue">
+          @@TODO: specific provenance namespace and property to be determined by the model specification?  The above query pattern assumes provenance is included in direct properties of the target resource.  When an RDF vocabulary for provenance is formulated, this may well turn out not to be the case.  A better example would probably be one that retrieves specific provenance once the vocabulary terms have been defined.
+        </p>
+      </section>
+
     </section>
 
-    <section class="informative">
+    <!-- <section class="informative"> -->
+    <section>
       <h2>Provenance service discovery</h2>
-      <p>
+      <p class="issue">
         (How to discover provenance services.  There is nothing particular about provenance in this respect, and this section will discuss some of the available options without adding any new normative specification.)
       </p>
       <p>
@@ -450,7 +494,7 @@
           </dd>
           <dt>Description:</dt>
           <dd>
-            the resource identified by target URI of the link provides provenance information about the resource identified by the context link
+            the resource identified by the target URI of the link provides provenance about the resource identified by the context URI of the link
           </dd>
           <dt>Reference:</dt>
           <dd>