Added a few mentions of out-of-scope technologies
author JeniT <jeni@jenitennison.com>
Sat, 14 Jan 2012 21:35:08 +0000
changeset 94 036fb509c01b
parent 93 8c04035f703e
child 95 81e80ce1337c

Examples are XMP, the Link: header, and how to merge data retrieved from the web; these were raised by Larry Masinter and Eric Wilde.
html-data-guide/index.html
--- a/html-data-guide/index.html	Sat Jan 14 20:32:19 2012 +0000
+++ b/html-data-guide/index.html	Sat Jan 14 21:35:08 2012 +0000
@@ -91,7 +91,10 @@
     <section>
       <h2>Introduction</h2>
       <p>
-        The first formal methods of embedding data within HTML pages were those pioneered by the microformats community. These sought to regularise the existing use of semantic classes and link relations within HTML markup for common subject areas such as people, organisations and events.
+        HTML pages naturally contain a lot of semantic information: the title of the page in the <code>&lt;title&gt;</code> element, addresses in <code>&lt;address&gt;</code> elements, the source of a quotation in the <code>@cite</code> attribute, arbitrary metadata about the page in <code>&lt;meta&gt;</code> elements and so on. These mechanisms primarily provide metadata about the HTML page itself, but it is also useful to embed data about <em>other things</em> within HTML pages.
+      </p>
+      <p>
+        The first formal methods for embedding data about things other than the page itself were those pioneered by the microformats community. These sought to regularise the existing use of semantic classes and link relations within HTML markup for common subject areas such as people, organisations and events.
       </p>
       <p>
         Since then, the practice of embedding HTML data within web pages has gradually grown, particularly bolstered by search engines using embedded data to supplement the appearance of entries within their result pages and by the open linked data community seeking to bridge the gap between documents and data on the web. HTML data is used in a variety of ways, as evinced by the <a href="http://lists.w3.org/Archives/Public/public-html/2009May/0207.html">use cases collected during the design of microdata</a>. Consumers of HTML data include:
@@ -126,14 +129,17 @@
       <section>
         <h3>Scope</h3>
         <p>
-          There are many ways of publishing data on the web such that it is can be discovered from HTML pages or used by scripts and stylesheets that operate over your page.
+          There are many ways of publishing data on the web that do not necessarily involve HTML at all. This document does not cover how to provide data using other data formats, such as JSON or Turtle. It does not talk about HTTP-level mechanisms for providing information about the relationships between resources on the web, such as the <a href="http://tools.ietf.org/html/rfc5988"><code>Link:</code> header</a>. It does not discuss techniques for embedding data in non-HTML files, such as metadata embedded within PDFs or JPEGs through <a href="http://www.adobe.com/devnet/xmp.html">XMP</a>.
+        </p>
+        <p>
+          Even with a focus on methods that can be used in HTML, there are many techniques for publishing data such that it can be discovered from HTML pages or used by scripts and stylesheets that operate over your page.
         </p>
         <p>
           First, publishers may link to alternative versions of a document, using different syntax, through a <code>link</code> element. The <code>@rel</code> attribute should take the value <code>alternate</code>, the <code>@href</code> attribute should point to the alternative representation and the <code>@type</code> attribute should provide its MIME type. For example:
         </p>
         <pre>&lt;link rel="alternate" type="text/calendar" href="calendar.ics" /&gt;</pre>
         <p>
-          Second, publishers may embed data within the <code>head</code> of an HTML document, nested inside a <code>script</code> element with an appropriate <code>@type</code> attribute. This can be used for text-based formats, such as JSON or Turtle, as well as XML-based formats. For example:
+          Second, publishers may embed data within the <code>head</code> of an HTML document, nested inside a <code>script</code> element with an appropriate <code>@type</code> attribute. This method can be used for text-based formats, such as JSON or Turtle, as well as XML-based formats. For example:
         </p>
         <pre><strong>&lt;script type="text/turtle"&gt;</strong>
   @prefix foaf: &lt;http://xmlns.com/foaf/0.1/&gt; .
@@ -167,7 +173,7 @@
  &lt;/button&gt;
 &lt;/div&gt;</pre>
         <p>
-          This document focuses on methods of data markup that reuse visible data within the page. Embedding data within an HTML page has the advantage of avoiding repetition, enables access through scripts and stylesheets, and is more easily discoverable by browsers and search engines which regularly consume HTML documents.
+          This document focuses on methods of data markup within HTML that reuse data already visible within the page. Embedding data within an HTML page has the advantages of avoiding repetition, enabling access through scripts and stylesheets, and being more easily discoverable by browsers and search engines, which regularly consume HTML documents.
         </p>
       </section>
       
@@ -875,6 +881,9 @@
         <p>
           Even when the use of data is unrestricted, it is good practice for consumers to record the source of the information that they use and, when republishing that data, provide metadata about the rights holder, source and license under which the information is available, using the same <a title="vocabulary">vocabularies</a> as those listed above.
         </p>
+        <p>
+          Working out how much to believe data gathered from the web can be complex. Consumers may use a variety of metrics, based on the reliability of the publisher, the quality of the data itself and so on, to determine the extent to which the published data can be trusted. This is particularly important when combining data about the same <a>entity</a> from multiple publishers, in which case data published from the same origin as the entity's identifier may be given more weight. These methods are outside the scope of this document.
+        </p>
       </section>
     </section>