Added section to discuss provenance discovery for arbitrary, isolated data objects
--- a/paq/provenance-access.html Tue Aug 09 17:28:35 2011 +0100
+++ b/paq/provenance-access.html Tue Aug 09 17:42:57 2011 +0100
@@ -295,12 +295,41 @@
</p>
</section>
+ <section>
+ <h2>Arbitrary target</h2>
+ <p class="pending">
+ We have so far decided not to try to define a common mechanism for arbitrary data, because it is not clear to us what the correct choice would be. Is this a reasonable position, or is there a real need for a generic solution for provenance discovery for arbitrary, non-web-accessible data objects?
+ </p>
+ <p>
+ If a resource is presented using a data format other than HTML or RDF, and no URI for the resource is known, provenance discovery becomes trickier to achieve. This specification does not define a specific mechanism for such arbitrary resources, but this section discusses some of the options that might be considered.
+ </p>
+ <p>
+ For formats that make provision for including metadata within the file (e.g. JPEG images, PDF documents), use the format-specific metadata to include a Target-URI and/or Provenance-URI.
+ </p>
+ <p>
+ Use a generic packaging format that can combine an arbitrary data file with a separate metadata file in a known format, such as RDF. At this time, it is not clear what format that should be, but some possible candidates are:
+ <ul>
+ <li>MIME multipart/related [[RFC2387]]: both email and HTTP are based on MIME or MIME-derivatives, so this has the advantage of working well with the network transfer mechanisms discussed in the motivating scenarios.
+ </li>
+ <li>
+ Composite object-packaging work from the digital library community, of which there are several examples (ORE, MPEG-21 and BagIt, to name a handful @@refs). Practical implementations of these commonly seem to be based on the ZIP file format.
+ </li>
+ <li>
+ Packaging formats along the lines of those used for shipping Java web applications (basically, a ZIP file with a manifest and some imposed structure).
+ </li>
+ <li>
+ Ongoing work in the research community (e.g. <a href="http://eprints.ecs.soton.ac.uk/21587/">Why Linked Data is Not Enough for Scientists</a>, ePub, etc.) to encapsulate data, code, annotations and metadata into a common exchangeable format.
+ </li>
+ </ul>
+ </p>
+ </section>
+
</section>
<section>
<h2>Provenance discovery service</h2>
<p class="issue">
- @@TODO: re-cast as a service description that defines how to construct a Provenance-URI. This should be properly RESTful, per http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
+ @@TODO: re-cast as a service description that defines how to construct a Provenance-URI. This should be properly RESTful, per <a href="http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven">http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven</a>.
</p>
<p class="pending">
Propose simple HTTP interface for discovery. cf <a href="http://www.w3.org/2011/prov/track/issues/53">ISSUE 53</a>
@@ -691,7 +720,6 @@
<p>
There are clearly a number of capabilities needed for a provenance-aware application that are not covered by the mechanisms described above. But most of these amount to implementation details and decisions for a particular application, and as such are beyond the scope of this document to specify.
</p>
- <p>@@TODO: move this to new section 3.4</p>
<p>
One feature not covered above that might be a candidate for specification is a common format for a data package that combines original content along with provenance-related metadata or data. At this stage, it is not clear what format that might take, but some possible candidates are:
<ul>