Added scenario analysis appendix to PAQ document
author Graham Klyne
Thu, 28 Jul 2011 11:12:30 +0100
changeset 68 be3b7e1f2518
parent 67 8070d1083112
child 69 bc0bbf26efab
paq/provenance-access.html
--- a/paq/provenance-access.html	Wed Jul 27 15:30:03 2011 +0100
+++ b/paq/provenance-access.html	Thu Jul 28 11:12:30 2011 +0100
@@ -15,7 +15,9 @@
               berjon.biblio[k] = extraReferences[k];
       };
       var extraReferences = {
-         "LINK-REL" : "M. Nottingham, <a href='http://www.ietf.org/rfc/rfc5988.txt'><cite>Web Linking</cite></a>, October 2010, Internet RFC 5988. URL: <a href='http://www.ietf.org/rfc/rfc5988.txt'>http://www.ietf.org/rfc/rfc5988.txt</a>",
+        "LINK-REL" : "M. Nottingham, <a href='http://www.ietf.org/rfc/rfc5988.txt'><cite>Web Linking</cite></a>, October 2010, Internet RFC 5988. URL: <a href='http://www.ietf.org/rfc/rfc5988.txt'>http://www.ietf.org/rfc/rfc5988.txt</a>",
+        "RFC2387" : "E. Levinson. <a href=\"http://www.ietf.org/rfc/rfc2387.txt\"><cite>The MIME Multipart/Related Content-type.</cite></a> August 1998. Internet RFC 2387. URL: <a href=\"http://www.ietf.org/rfc/rfc2387.txt\">http://www.ietf.org/rfc/rfc2387.txt</a>",
+        "RFC3297" : "G. Klyne; R. Iwazaki; D. Crocker. <a href=\"http://www.ietf.org/rfc/rfc3297.txt\"><cite>Content Negotiation for Messaging Services based on Email.</cite></a> July 2002. Internet RFC 3297. URL: <a href=\"http://www.ietf.org/rfc/rfc3297.txt\">http://www.ietf.org/rfc/rfc3297.txt</a>",
       };
       var respecConfig = {
           // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
@@ -331,5 +333,61 @@
         Many thanks to Robin Berjon for making our lives so much easier with his cool <a href="http://dev.w3.org/2009/dap/ReSpec.js/documentation.html">ReSpec</a> tool.
       </p>
     </section>
+    <section class="appendix">
+      <h2>Motivating scenario</h2>
+      <p><a href="http://www.w3.org/2011/prov/wiki/ProvenanceAccessScenario">This scenario</a> was selected by the provenance working group as a touchstone for evaluating any provenance access proposal.  This appendix evaluates the foregoing proposals against the requirements implied by that scenario.</p>
+      <p>
+        <ul>
+          <li>Obtaining the document D: for the purpose of this analysis, it is assumed either that the document is accessed from a known Web URI, or that it is available as HTML or RDF (the primary web standards for documents and data).  The mechanisms here are in principle applicable to other document forms on a per-format basis.
+            <ul>
+              <li>D1, D2: use the HTTP <code>Link:</code> header.  Any server providing the document may provide this information. Different servers might offer links to different provenance sources.</li>
+              <li>D3: information provided as an image with a known URI, but from a non-provenance-aware source.  The image URI can be used as a key to access a third party provenance discovery service.</li>
+              <li>D4, D6, D7, D8: information provided as an image, without a known web location.  At the very least, some mechanism, not specified here, is needed to identify the image provided.  In the case of an email attachment, it is possible (but not guaranteed) that the email message MIME wrapper specifies a URI for the image, which can be used as a key.  Some image formats support embedded metadata which might be used for this purpose.  <em>(Arbitrary data files could be wrapped in a package, say MIME multipart/related [[RFC2387]], that could include additional metadata.  Image files could be wrapped in a minimal HTML document.  It is not clear to me at this stage that a single mechanism is appropriate for all situations)</em>.</li>
+              <li>D5: HTML email.  Depending on how the HTML is constructed, the HTML header could include a <code>&lt;link&gt;</code> element.  The <code>Link:</code> header might be extended for use with email messages [[EMAIL]], but it's not clear that would be a worthwhile effort.</li>
+            </ul>
+          </li>
+          <li>Lacking identification or in-band metadata, the thing represented must be identified independently by some available mechanism.  <em>I think this is unavoidable.</em></li>
+          <li>Enacting the "Oh yeah?" feature
+            <ul>
+              <li>W: once a URI for provenance information has been determined, accessing it using a web browser or other web client software should be straightforward.  If the provenance is accessible via a third party query service, that may be less straightforward.</li>
+              <li>E: this scenario seems to envisage a wholesale overhaul of email client software, which seems unlikely.  If a URI for provenance can be provided, the natural way to access it would be via a web client of some kind, which might be a browser or other software.</li>
+              <li>S: this scenario effectively calls for this:  given an arbitrary data resource, implement a general purpose application to discover, retrieve and analyze provenance about that resource.  At the present time, this is a matter for experimental development, which could be based substantially on the mechanisms described for provenance discovery and access via third party services.</li>
+            </ul>
+          </li>
+          <li>I: Accessing the provenance
+            <ul>
+              <li>W: a web client needs one or more URIs for provenance information, and/or URI(s) for a provenance query service and sufficient additional information about the resource to formulate an effective query.  It may also need access to information that can be used to assess (or help a user assess) the trustworthiness of the provenance information obtained (which could itself be more provenance information).</li>
+              <li>E: an email client is a passive receiver of information, so asking one to retrieve provenance information is a perverse expectation.  There have been some attempts to standardize email protocols that interact with the email sender (e.g. [[RFC3297]]) but such mechanisms have not been significantly deployed in practice. This case can be viewed as a variation on the shell-client case (S) below.  If all provenance information is sent <em>with</em> the original content using standard email mechanisms (MIME multipart, etc.) then the email client may use that (or hand it off to a helper application) as the basis for provenance-based analysis or presentation.</li>
+              <li>S: command shell or other local application.  This is the general case for provenance access.  Given some arbitrary information, what does a provenance-aware application need to access the required provenance information?  It may employ any of the mechanisms described above.</li>
+            </ul>
+          </li>
+        </ul>
+      </p>
+      <section>
+        <h2>Gap analysis</h2>
+        <p>
+          There are clearly a number of capabilities needed for a provenance-aware application that are not covered by the mechanisms described above.  But most of these amount to implementation details and decisions for a particular application, and as such are beyond the scope of this document to specify.
+        </p>
+        <p>
+          One feature not covered above that might be a candidate for specification is a common format for a data package that combines original content along with provenance-related metadata or data.  At this stage, it is not clear what format that might take, but some possible candidates are:
+          <ul>
+            <li>MIME multipart/related [[RFC2387]]: both email and HTTP are based on MIME or MIME-derivatives, so this has the advantage of working well with the network transfer mechanisms discussed in the scenario.
+            </li>
+            <li>
+              Composite object-packaging work from the digital library community, of which there are several examples (ORE, MPEG-21, BagIt @@refs), to name a handful.  Practical implementations of these commonly seem to be based on the ZIP file format.
+            </li>
+            <li>
+              Packaging formats along the lines of those used for shipping Java web applications (basically, a ZIP file with a manifest and some imposed structure).
+            </li>
+            <li>
+              Ongoing work in the research community (e.g. ...) to encapsulate data, code, annotations and metadata into a common exchangeable format.
+            </li>
+          </ul>
+        </p>
+        <p>
+          Given the extent of work already performed in this field, it seems to me that rather than recommending a particular approach at this stage, we would do better to catalogue some available solutions and see which ones (if any) implementers choose to run with.  In any case, it seems that a specification that is specific to provenance, to the exclusion of other metadata, is unlikely to gain traction, as provenance is just part of a wider landscape of information quality, trust, preservation and more.
+        </p>
+      </section>
+    </section>
   </body>
 </html>
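
Note on the D1/D2 case in the appendix added above: the text says a client can use the HTTP Link: header to find provenance for a document. A minimal sketch of the client side is given below, assuming the server sends a Link: header in RFC 5988 syntax. The function name findLinkTargets, the example.org URIs and the relation name "provenance" are illustrative assumptions and are not taken from this changeset. It is not a complete RFC 5988 parser; for example, a quoted parameter value that itself contains a comma followed by "<" would be split incorrectly.

```javascript
// Sketch only: collect the target URIs of links with a given relation type
// from the value of an HTTP Link: header (RFC 5988 syntax).
// findLinkTargets is a hypothetical helper name, not part of any library.
function findLinkTargets(headerValue, rel) {
  const targets = [];
  const wanted = rel.toLowerCase();
  // Split into individual link-values at commas followed by the "<"
  // that opens the next URI reference.
  for (const part of headerValue.split(/,\s*(?=<)/)) {
    const match = /^\s*<([^>]*)>(.*)$/.exec(part);
    if (!match) continue; // skip anything that is not "<uri>; params"
    const uri = match[1];
    const params = match[2];
    // The rel parameter may be quoted and may list several
    // space-separated relation types.
    const relMatch = /;\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))/i.exec(params);
    if (!relMatch) continue;
    const value = relMatch[1] !== undefined ? relMatch[1] : relMatch[2];
    const rels = value.toLowerCase().split(/\s+/).filter(Boolean);
    if (rels.includes(wanted)) targets.push(uri);
  }
  return targets;
}

// Usage with an assumed header value and an assumed relation name:
const exampleHeader =
  '<http://example.org/prov/d1>; rel="provenance", ' +
  '<http://example.org/style.css>; rel="stylesheet"';
console.log(findLinkTargets(exampleHeader, "provenance"));
// prints [ 'http://example.org/prov/d1' ]
```

Because different servers might offer links to different provenance sources, the sketch returns every matching target rather than only the first.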