Initial draft of simple HTTP interface for provenance discovery
author Graham Klyne
Thu, 04 Aug 2011 15:57:43 +0100
changeset 109 70d2851b9179
parent 104 dcaef2c00405
child 110 a055a7987aa7
paq/provenance-access.html
--- a/paq/provenance-access.html	Thu Aug 04 13:18:18 2011 +0100
+++ b/paq/provenance-access.html	Thu Aug 04 15:57:43 2011 +0100
@@ -140,11 +140,9 @@
         </p>
         <p>
           The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in <a href="#iana-considerations" class="sectionRef"></a>:
-          <code>
-            <pre class="pattern">
-              Link: <cite>provenance-URI</cite>; rel="provenance"
-            </pre>
-          </code>
+          <p class="pattern">
+            <code>Link: <cite>provenance-URI</cite>; rel="provenance"</code>
+          </p>
           When used in conjunction with an HTTP success response code (<code>2xx</code>), this HTTP header indicates that <code><cite>provenance-URI</cite></code> is the URI of some provenance for the requested resource. At this time, the meaning of provenance links returned with other HTTP response codes is not defined: future revisions of this specification may define interpretations for these.
         </p>
         <p>
@@ -172,16 +170,16 @@
           The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in <a href="#iana-considerations" class="sectionRef"></a>:
           <code>
             <pre class="pattern">
-              &lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
-                 &lt;head&gt;
-                    &lt;meta name="wdr.issuedby" content="http://authority.example.org/company.rdf#me"/&gt;
-                    &lt;link rel="provenance" href="<cite>provenance-URI</cite>"&gt;
-                    &lt;title&gt;Welcome to example.com &lt;/title&gt;
-                 &lt;/head&gt;
-                 &lt;body&gt;
-                    ...
-                 &lt;/body&gt;
-              &lt;/html&gt;
+  &lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
+     &lt;head&gt;
+        &lt;meta name="wdr.issuedby" content="http://authority.example.org/company.rdf#me"/&gt;
+        &lt;link rel="provenance" href="<cite>provenance-URI</cite>"&gt;
+        &lt;title&gt;Welcome to example.com &lt;/title&gt;
+     &lt;/head&gt;
+     &lt;body&gt;
+        ...
+     &lt;/body&gt;
+  &lt;/html&gt;
             </pre>
           </code>
           This element indicates that <code><cite>provenance-URI</cite></code> is the URI of a provenance resource for the containing document.
@@ -223,15 +221,160 @@
     </section>
 
     <section>
-      <h2>Third party services</h2>
+      <h2>Independent provenance discovery services</h2>
       <p>
-        The mechanisms for provenance discovery described above have all assumed the provenance URI is being supplied by the provider of the original resource.  Where provenance information is provided by a third party without any cooperation from the original resource provider, the provenance link cannot be provided through the same channels as the original resource, and a alternative approaches must be considered.
+        The mechanisms for provenance discovery described above have all assumed the provenance URI is being supplied by the provider of the original resource.  Where provenance information is provided independently without coordination with the original resource delivery channels (e.g. by a third party), alternative approaches must be considered.
       </p>
       <p>
-        We assume that the application or person requesting provenance information has the URI or other unique identification of the resource for which provenance is required, and also has a URI for a third-party provenance service.
+        The mechanism described here focuses on finding the URI(s) for provenance information.  Below, <a href="#querying-provenance" class="sectionRef"></a> will consider access to provenance for which there is no separate URI.
       </p>
       <p>
-        The nature of a third party provenance service is an implementation choice, to be agreed between provider and users of the service, but for ease of interoperability between different providers and users we recommend use of SPARQL [[RDF-SPARQL-PROTOCOL]] [[RDF-SPARQL-QUERY]].  The third party service URI would then be the URI of a SPARQL endpoint  (or, to use the SPARQL specification language, a <a href="http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service">SPARQL protocol service</a>) which is queried for the desired provenance information.
+        We assume that the requesting application has the URI of a resource for which provenance is required, and also has a URI for an independent provenance discovery service.
+      </p>
+      <p>
+        A service based on a simple HTTP GET operation is used to retrieve the provenance URI(s) for a resource.  In designing such a service, there are three main factors to consider:
+        <ul>
+          <li>The construction of the HTTP request URI</li>
+          <li>The content and format(s) of the expected response</li>
+          <li>Possible outcomes and corresponding HTTP response status codes</li>
+        </ul>
+      </p>
+      <p class="note">
+        (The interface described here is based on that presented by <a href="http://sameas.org/about.php">SameAs.org</a>, which is also a service that essentially accepts a single URI as an input parameter, and returns a list of URIs.)
+      </p>
+      <section>
+        <h2>Request URI</h2>
+        <p>Given:
+          <dl>
+            <dt><code><cite>Service-URI</cite></code></dt>
+            <dd>is the URI of the provenance discovery service.</dd>
+            <dt><code><cite>Target-URI</cite></code></dt>
+            <dd>is the URI of the resource for which provenance is required.</dd>
+          </dl>
+        </p>
+        <p>
+          Then the request URI for provenance discovery is constructed as:
+        </p>
+        <pre class="pattern">
+<code><strong><cite>Service-URI</cite></strong>?uri=<strong><cite>Target-URI</cite></strong></code></pre>
+        <p>
+          For example, if the discovery service URI is <code>http://example.net/provenance-discovery</code> and the resource for which provenance is required is identified as <code>http://example.info/qdata/</code>, then the request URI to use for provenance discovery would be:
+        </p>
+        <pre class="example">
+          <code>http://example.net/provenance-discovery?uri=http://example.info/qdata/</code>
+        </pre>
+      </section>
+      <section>
+        <h2>Response content and formats</h2>
+        <p>If there is at least one provenance URI that the service is returning, a response with HTTP 200 status code is generated, along with one of the following response formats, determined by HTTP content negotiation (<code>Accept:</code> header).</p>
+        <dl>
+          <dt>application/json</dt>
+          <dd>Returns one or more provenance URIs as part of a JSON structure [[RFC4627]], possibly including additional metadata:
+            <code>
+              <pre class="example">
+                [
+                  {
+                    "uri": "http://example.info/qdata/",
+                    "numProvenance": "3",
+                    "provenance": [
+                      "http://source1.example.info/provenance/qdata/",
+                      "http://source2.example.info/prov/qdata/",
+                      "http://source3.example.com/prov?id=qdata"
+                    ]
+                  }
+                ]
+              </pre>
+            </code>
+          </dd>
+          <dt>application/rdf+xml</dt>
+          <dd>Returns an RDF graph with one or more provenance URIs associated with the original resource,
+            presented as an RDF/XML document [[RDF-SYNTAX-GRAMMAR]].
+            The vocabulary used is the same as that used when a resource presented as RDF contains references
+            to its own provenance, per <a href="#resource-presented-as-rdf" class="sectionRef"></a>.
+            <code>
+              <pre class="example">
+                &lt;rdf:RDF
+                  xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+                  xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
+                  xmlns:prov = "http://www.w3.org/@@TBD@@#"
+                &gt;
+                  &lt;rdf:Description rdf:about="http://example.info/qdata/"&gt;
+                    &lt;prov:hasProvenance rdf:resource="http://source1.example.info/provenance/qdata/" /&gt;
+                    &lt;prov:hasProvenance rdf:resource="http://source2.example.info/prov/qdata/" /&gt;
+                    &lt;prov:hasProvenance rdf:resource="http://source3.example.com/prov?id=qdata" /&gt;
+                  &lt;/rdf:Description&gt;
+                &lt;/rdf:RDF&gt;
+              </pre>
+            </code>
+          </dd>
+          <dd>
+            Returns an RDF graph with one or more provenance URIs associated with the original resource,
+            presented as a Turtle or N3 document [[TURTLE]]:
+            <code>
+              <pre class="example">
+                @prefix prov: &lt;http://www.w3.org/@@TBD@@#&gt; .
+
+                &lt;http://example.info/qdata/&gt;
+                  prov:hasProvenance
+                    &lt;http://source1.example.info/provenance/qdata/&gt; ,
+                    &lt;http://source2.example.info/prov/qdata/&gt; ,
+                    &lt;http://source3.example.com/prov?id=qdata&gt;
+                    .
+              </pre>
+            </code>
+          </dd>
+          <dt>text/plain</dt>
+          <dd>
+            Returns a simple text file containing just a list of provenance URIs, one per line.  (The original resource URI is not included in the result data.)
+            <code>
+              <pre class="example">
+                http://source1.example.info/provenance/qdata/
+                http://source2.example.info/prov/qdata/
+                http://source3.example.com/prov?id=qdata
+              </pre>
+            </code>
+          </dd>
+        </dl>
+        <p class="issue">
+          SameAs.org also provides URIs for directly accessing the different result formats without content negotiation, by appending an extra segment to the SameAs.org service URI.  I'm reluctant to suggest this mechanism for a service with separately specified base URI.  Maybe allow the use of an additional query parameter, e.g. <code>&amp;format=rdf</code>, etc.
+        </p>
+      </section>
+      <section>
+        <h2>Response codes</h2>
+        <p>
+          <dl>
+            <dt><code>200 OK</code></dt>
+            <dd>Provenance URI(s) returned; see above.</dd>
+            <dt><code>204 No Content</code></dt>
+            <dd>The request was valid, but the server is returning no provenance URIs (either because it does not know of any provenance URIs, or possibly because it declines to provide the information for policy reasons).</dd>
+            <dt><code>...</code></dt>
+            <dd>...</dd>
+          </dl>
+        </p>
+      </section>
+
+    </section>
+    
+    <section>
+      <h2>Querying provenance</h2>
+      <p class="issue">
+        This section is work in progress, incomplete, not ready for review.
+      </p>
+      <p>
+        There are circumstances where simply identifying and retrieving provenance as a web resource may not best fit the requirements of a particular application or service, e.g.:
+        <ul>
+          <li>the entity for which provenance is required is not identified by a known URI</li>
+          <li>the provenance for an entity is not directly identified by a known URI</li>
+          <li>provenance for an entity is sufficiently large or complex that it is not desirable or useful to retrieve it all in a single operation</li>
+          <li>provenance for a number of distinct but related entities is required to be accessed in a single atomic operation</li>
+          <li><i>etc.</i></li>
+        </ul>
+      </p>
+      <p>
+        For such circumstances, a provenance query service provides an alternative way 
+      </p>
+      <p>
+        The nature of a provenance query ... service is an implementation choice, to be agreed between provider and users of the service, but for ease of interoperability between different providers and users we recommend use of SPARQL [[RDF-SPARQL-PROTOCOL]] [[RDF-SPARQL-QUERY]].  The third party service URI would then be the URI of a SPARQL endpoint  (or, to use the SPARQL specification language, a <a href="http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service">SPARQL protocol service</a>) which is queried for the desired provenance information.
       </p>
       <p>
         If the requester has a URI for the original resource, they simply issue a simple SPARQL query for the URI(s) of any associated provenance data; e.g., if the original resource has URI <code>http://example.org/resource</code>, 
@@ -262,16 +405,6 @@
         </code>
       </p>
       <p>
-        The mechanisms described here focus on finding the URI(s) for provenance information.  Below, <a href="#querying-provenance-data" class="sectionRef"></a> will consider access to provenance information for which there is no separate URI.
-      </p>
-    </section>
-    
-    <section>
-      <h2>Querying provenance data</h2>
-      <p class="issue">
-        This section will build upon the previous section, describing the use of a SPARQL endpoint service to obtain provenance information directly from a service provider.  No new protocol or vocabulary elements are defined: the mechanisms are used are those described above, coupled with possible use of provenance vocabulary terms in a SPARQL query.
-      </p>
-      <p>
         @@TODO
       </p>
     </section>
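
Illustrative note (not part of the changeset): the request URI pattern and the JSON and text/plain response formats introduced in the "Independent provenance discovery services" section of this diff could be handled by a client roughly as sketched below. This is a minimal sketch under stated assumptions, not the draft's reference implementation. The function names are hypothetical, and percent-encoding the target URI in the query string is an assumption of this sketch: the draft's own example shows the target URI unencoded.

```python
import json
from typing import List
from urllib.parse import urlencode


def discovery_request_uri(service_uri: str, target_uri: str) -> str:
    """Build a request URI of the form Service-URI?uri=Target-URI.

    Assumption: the target URI is percent-encoded so that any '?', '&'
    or '#' it contains cannot be confused with the service's own query.
    """
    return service_uri + "?" + urlencode({"uri": target_uri})


def provenance_uris_from_json(body: str, target_uri: str) -> List[str]:
    """Collect provenance URIs for target_uri from an application/json body.

    The body is expected to be a list of objects with "uri" and
    "provenance" members, as in the draft's JSON example.
    """
    uris: List[str] = []
    for record in json.loads(body):
        if record.get("uri") == target_uri:
            uris.extend(record.get("provenance", []))
    return uris


def provenance_uris_from_text(body: str) -> List[str]:
    """Collect provenance URIs from a text/plain body, one URI per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]
```

A client would typically select the response format with an Accept: header and treat a 204 response as an empty list of provenance URIs.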