Added section to discuss provenance discovery for arbitrary, isolated data objects
author Graham Klyne
Tue, 09 Aug 2011 17:42:57 +0100
changeset 146 9eaae995267c
parent 145 5e4942fb8ae2
child 147 96c3df47e25e
paq/provenance-access.html
--- a/paq/provenance-access.html	Tue Aug 09 17:28:35 2011 +0100
+++ b/paq/provenance-access.html	Tue Aug 09 17:42:57 2011 +0100
@@ -295,12 +295,41 @@
         </p>
       </section>
 
+      <section>
+        <h2>Arbitrary target</h2>
+        <p class="pending">
+          We have so far decided not to try and define a common mechanism for arbitrary data, because it's not clear to us what the correct choice would be.  Is this a reasonable position, or is there a real need for a generic solution for provenance discovery for arbitrary, non-web-accessible data objects?
+        </p>
+        <p>
+          If a resource is presented using a data format other than HTML or RDF, and no URI for the resource is known, provenance discovery becomes trickier to achieve.  This specification does not define a specific mechanism for such arbitrary resources, but this section discusses some of the options that might be considered.
+        </p>
+        <p>
+          For formats that provide for metadata within the file (e.g. JPEG images, PDF documents), the format-specific metadata could be used to carry a Target-URI and/or Provenance-URI.
+        </p>
+        <p>
+          Use a generic packaging format that can combine an arbitrary data file with a separate metadata file in a known format, such as RDF.  At this time, it is not clear what format that should be, but some possible candidates are:
+          <ul>
+            <li>MIME multipart/related [[RFC2387]]: both email and HTTP are based on MIME or MIME derivatives, so this has the advantage of working well with the network transfer mechanisms considered in the motivating scenarios.
+            </li>
+            <li>
+              Composite object-packaging work from the digital library community, of which there are several examples (ORE, MPEG-21 and BagIt, to name a handful; @@refs).  Practical implementations of these commonly seem to be based on the ZIP file format.
+            </li>
+            <li>
+              Packaging formats along the lines of those used for shipping Java web applications (basically, a ZIP file with a manifest and some imposed structure).
+            </li>
+            <li>
+              Ongoing work in the research community (e.g. <a href="http://eprints.ecs.soton.ac.uk/21587/">Why Linked Data is Not Enough for Scientists</a>, ePub, etc.) to encapsulate data, code, annotations and metadata into a common exchangeable format.
+            </li>
+          </ul>
+        </p>
+      </section>
+
     </section>
 
     <section>
       <h2>Provenance discovery service</h2>
       <p class="issue">
-        @@TODO: re-cast as a service description that defines how to construct a Provenance-URI.  This should be properly RESTful, per http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
+        @@TODO: re-cast as a service description that defines how to construct a Provenance-URI.  This should be properly RESTful, per <a href="http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven">http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven</a>.
       </p>
       <p class="pending">
         Propose simple HTTP interface for discovery.  cf <a href="http://www.w3.org/2011/prov/track/issues/53">ISSUE 53</a>
@@ -691,7 +720,6 @@
         <p>
           There are clearly a number of capabilities needed for a provenance-aware application that are not covered by the mechanisms described above.  But most of these amount to implementation details and decisions for a particular application, and as such are beyond the scope of this document to specify.
         </p>
-        <p>@@TODO: move this to new section 3.4</p>
         <p>
           One feature not covered above that might be a candidate for specification is a common format for a data package that combines original content along with provenance-related metadata or data.  At this stage, it is not clear what format that might take, but some possible candidates are:
           <ul>