HTML Data Guide
author JeniT
Sat, 10 Dec 2011 21:15:33 +0000
changeset 45 95e49b1811d5
parent 44 62761f79513a
child 46 5e403ad6e1ac
HTML Data Guide
html-data-guide/index.html
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/html-data-guide/index.html	Sat Dec 10 21:15:33 2011 +0000
@@ -0,0 +1,871 @@
+<!DOCTYPE html>
+<html>
+  <head>
+    <title>HTML Data Guide</title>
+    <meta http-equiv='Content-Type' content='text/html;charset=utf-8'/>
+    <!-- 
+      === NOTA BENE ===
+      For the three scripts below, if your spec resides on dev.w3 you can check them
+      out in the same tree and use relative links so that they'll work offline,
+     -->
+    <script src='http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js' class='remove'></script>
+    <script class='remove'>
+      var respecConfig = {
+          // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
+          specStatus:           "ED",
+          
+          // the specification's short name, as in http://www.w3.org/TR/short-name/
+          shortName:            "html-data-guide",
+
+          // if your specification has a subtitle that goes below the main
+          // formal title, define it here
+          // subtitle   :  "an excellent document",
+
+          // if you wish the publication date to be other than today, set this
+          // publishDate:  "2009-08-06",
+
+          // if the specification's copyright date is a range of years, specify
+          // the start date here:
+          // copyrightStart: "2005"
+
+          // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
+          // and its maturity status
+          // previousPublishDate:  "1977-03-15",
+          // previousMaturity:  "WD",
+
+          // if there is a publicly available Editor's Draft, this is the link
+          edDraftURI:           "https://dvcs.w3.org/hg/htmldata/raw-file/default/html-data-guide/index.html",
+
+          // if this is a LCWD, uncomment and set the end of its review period
+          // lcEnd: "2009-08-05",
+
+          // if you want to have extra CSS, append them to this list
+          // it is recommended that the respec.css stylesheet be kept
+          extraCSS:             ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css"],
+
+          // editors, add as many as you like
+          // only "name" is required
+          editors:  [
+              { name: "Jeni Tennison", url: "http://www.jenitennison.com/blog/",
+                company: "Independent" },
+          ],
+
+          // authors, add as many as you like. 
+          // This is optional, uncomment if you have authors as well as editors.
+          // only "name" is required. Same format as editors.
+
+          //authors:  [
+          //    { name: "Your Name", url: "http://example.org/",
+          //      company: "Your Company", companyURL: "http://example.com/" },
+          //],
+          
+          // name of the WG
+          wg:           "HTML Data Task Force",
+          
+          // URI of the public WG page
+          wgURI:        "http://www.w3.org/wiki/Html-data-tf",
+          
+          // name (with the @w3c.org) of the public mailing to which comments are due
+          wgPublicList: "public-html-data-tf",
+          
+          // URI of the patent status for this WG, for Rec-track documents
+          // !!!! IMPORTANT !!!!
+          // This is important for Rec-track documents, do not copy a patent URI from a random
+          // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
+          // Team Contact.
+          wgPatentURI:  "",
+      };
+    </script>
+  </head>
+  <body>
+    <section id='abstract'>
+      <p>
+        Microformats, RDFa and microdata all enable consumers to extract data from HTML pages. This data may be embedded within enhanced search engine results, exposed to users through browser plug-ins, aggregated across websites or used by scripts running within those HTML pages.
+      </p>
+      <p>
+        This guide aims to help publishers and consumers of HTML data use it well. With several <a title="syntax">syntaxes</a> and <a title="vocabulary">vocabularies</a> to choose from, it provides guidance about how to decide which meets the publisher's or consumer's needs. It discusses when it is necessary to mix syntaxes and vocabularies and how to publish and consume data that uses multiple formats. It describes how to create vocabularies that can be used in multiple syntaxes and general best practices about the publication and consumption of HTML data.
+      </p>
+    </section>
+    
+    <section>
+      <h2>Introduction</h2>
+      <p>
+        The first formal methods of embedding data within HTML pages were those pioneered by the microformats community. These sought to regularise the existing use of semantic classes and link relations within HTML markup for common subject areas such as people, organisations and events.
+      </p>
+      <p>
+        Since then, the practice of embedding HTML data within web pages has gradually grown, particularly bolstered by search engines using embedded data to supplement the appearance of entries within their result pages. HTML data is used in a variety of ways, as evinced by the <a href="http://lists.w3.org/Archives/Public/public-html/2009May/0207.html">use cases collected during the design of microdata</a>. Consumers of HTML data include:
+      </p>
+      <ul>
+        <li>scripting libraries</li>
+        <li>browsers and browser plug-ins</li>
+        <li>general-purpose search engines</li>
+        <li>vertical or domain-specific search engines</li>
+        <li>data reusers known and unknown to the publisher of the data</li>
+      </ul>
+      <p>
+        There are currently three main <a title="syntax">syntaxes</a> for embedding data within HTML pages:
+      </p>
+      <dl>
+        <dt><a href="http://microformats.org">microformats</a></dt>
+        <dd>microformats use <code>@class</code>, <code>@rel</code> and other attributes to encode data using standard HTML markup. Traditionally, different microformat <a title="vocabulary">vocabularies</a> have followed different parsing rules, but <a href="http://microformats.org/wiki/microformats-2">microformats-2</a> provides a standard parsing algorithm.</dd>
+        <dt><a href="http://www.w3.org/TR/rdfa-in-html/">RDFa</a></dt>
+        <dd>RDFa reuses existing HTML attributes such as <code>@href</code> and <code>@rel</code> and adds a few of its own to enable data to be extracted from HTML pages as RDF. RDFa can also be used with other markup languages and was originally designed for XHTML 1.1.</dd>
+        <dt><a href="http://www.w3.org/TR/microdata/">microdata</a></dt>
+        <dd>Microdata adds attributes to HTML to provide machine-readable descriptions of items within the page in terms of <a title="property">properties</a> and <a title="value">values</a> for those properties. It is designed to be used alongside detailed specifications of how these descriptions should be processed by consumers.</dd>
+      </dl>
+      <p>
+        The three <a title="syntax">syntaxes</a> are similar in goals but differ in approach. This document provides guidance about how to choose between them and use them together.
+      </p>
+      
+      <section>
+        <h3>Terminology</h3>
+        <p>
+          Within this document, a <dfn>format</dfn> is a combination of a <a>syntax</a> and <a title="type">types</a> and <a title="property">properties</a> from one or more <a title="vocabulary">vocabularies</a>. Traditional microformats do not make the distinction between syntax and vocabulary, but RDFa, microdata and microformats-2 do make this distinction.
+        </p>
+        <p>
+          In this document, a <dfn>syntax</dfn> is a set of conventions for parsing data from an HTML page into a data structure. The three syntaxes discussed in this document are RDFa, microdata and microformats-2. Each of these can be used with different <a title="vocabulary">vocabularies</a>.
+        </p>
+        <p>
+          A <dfn>vocabulary</dfn> is a set of terms for describing <a title="entity">entities</a> within a particular domain. Different mechanisms are used for describing vocabularies. A microformat vocabulary is described within a wiki page. An RDFa vocabulary might be described through an RDFS schema or OWL ontology provided at the vocabulary's URI. A microdata vocabulary must be described within a specification that describes how it is processed.
+        </p>
+        <p>
+          All three <a title="syntax">syntaxes</a> follow the same general data model. Each is used to describe <dfn title="entity">entities</dfn> &mdash; things such as people or events (RDFa calls these resources, microdata calls these items). These entities each have one or more <dfn title="type">types</dfn> which indicate what kind of thing they are and a number of <dfn title="property">properties</dfn> that have <dfn title="value">values</dfn>, which provide the data about the entity.
+        </p>
+      </section>
+      
+    </section>
+    
+    <section>
+      <h2>Publishers</h2>
+      <p>
+        If you are publishing HTML data, you are likely to find that the markup within your pages is simpler and easier to maintain if you only use one <a>format</a> (<a>syntax</a> and <a>vocabulary</a>) within each page. To decide which to use, your first consideration has to be which consumers will read the data within your web pages, and which formats they support. These may include:
+      </p>
+      <ul>
+        <li>scripting libraries</li>
+        <li>browsers and browser plug-ins</li>
+        <li>general-purpose search engines</li>
+        <li>vertical or domain-specific search engines</li>
+        <li>data reusers with whom you have agreements</li>
+      </ul>
+      <p>
+        Your second consideration may be the current state of the tooling to support a particular format. For example:
+      </p>
+      <dl>
+        <dt>Are you able to publish using HTML5?</dt>
+        <dd>
+          If you are using a content-management system that doesn't support adding new attributes such as <code>@itemprop</code> or <code>@typeof</code> or if your publishing guidelines require validity against an older version of HTML, then you will be constrained to using microformats. If your publishing guidelines require validity against XHTML, then you might be able to use XHTML+RDFa, depending on how exacting your publishing guidelines are.
+        </dd>
+        <dt>Are there development tools available?</dt>
+        <dd>
+          Because it is not visible within a web page, it can be hard to tell whether HTML data has been written correctly. Consumers should provide validators that enable you to check that your data has been correctly detected and interpreted, but you may also want to consider tool support for generating the HTML data.
+        </dd>
+      </dl>
+      <p>
+        Once you have considered both your target consumers and the tooling support that is available, you will be in one of four situations:
+      </p>
+      <ol>
+        <li><strong>with a single choice of format</strong> in which case there are no further choices to be made</li>
+        <li><strong>unable to publish HTML data that your target consumers understand</strong> in which case you either have to lobby those consumers to add support for the format(s) you can publish in, or consider changing your toolset so that you can publish in something they understand</li>
+        <li><strong>still with a choice between a number of formats</strong> in which case you will want to pick one to use; this is covered in <a href="#choosing-a-publishing-format" class="sectionRef"></a></li>
+        <li><strong>having to use multiple <a title="format">formats</a> at the same time to provide data to all your target consumers</strong> in which case you will need to mix formats within your pages; this is covered in <a href="#publishing-in-multiple-formats" class="sectionRef"></a></li>
+      </ol>
+      
+      <section id="choosing-a-publishing-format">
+        <h3>Choosing a Publishing Format</h3>
+        <p>
+          This section addresses a situation where all your target consumers recognise a set of <a title="format">formats</a> (each with a particular <a>syntax</a> and vocabulary), your toolset supports publishing in all of them, and you need to make a choice about which of these formats to use. It's assumed that you will want to choose a single format rather than mixing multiple formats as described in <a href="#publishing-in-multiple-formats" class="sectionRef"></a>, as this will mean less markup in your page and make your publishing task easier.
+        </p>
+        
+        <section>
+          <h4>Syntax Considerations</h4>
+          <p>
+            The different <a title="syntax">syntaxes</a> &mdash; microformats, microdata and RDFa &mdash; have different capabilities which may inform your choice.
+          </p>
+          <dl>
+            <dt>Structured HTML values</dt>
+            <dd>
+              Under appropriate conditions, RDFa and microformats will use markup within the content of an element to provide a <a>property</a> <a>value</a>; in microdata values never retain markup. If property values within your page contain markup (for example a <code>description</code> property containing emphasised text, multiple paragraphs, tables and so on), you may want to use RDFa or microformats to ensure that structure is available to consumers of your pages. In RDFa, this is done through adding <code>datatype="rdf:XMLLiteral"</code> to the relevant element. In traditional microformats, the handling of the content of an element is determined by the property; in microformats-2, those that retain the HTML structure are named with an <code>e-*</code> prefix, such as <code>e-content</code>.
+            </dd>
+            <dt>Language support</dt>
+            <dd>
+              Microformats and RDFa use the language of the HTML elements in the page (from the <code>@lang</code> attribute) to indicate the language of relevant <a title="value">values</a>. In microdata, the <a>vocabulary</a> has to provide a separate mechanism to indicate a language. If you have multi-lingual information in your pages, you may find it easier to use microformats or RDFa than microdata.
+            </dd>
+            <dt>CSS support</dt>
+            <dd>
+              Because microformats generally use classes to mark up data within an HTML page, it is easy to use CSS to style those elements based on their type. For example <code>.hcard .n { font-weight: bold; }</code> will embolden any person's name. This is a little harder with microdata where the selector might be something like 
+              <pre>[itemtype~="http://microformats.org/profile/hcard"] [itemprop~="n"]</pre> 
+              or RDFa where it might be 
+              <pre>[typeof~="foaf:Person"] [property~="foaf:name"]</pre>
+              If you are planning to style your page based on the data embedded within it, you may find it easier to use microformats than either microdata or RDFa; if you do style RDFa, you should plan for dependencies between your CSS documents and any prefixes used within your pages.
+            </dd>
+          </dl>
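+          <p>
+            As a minimal sketch of the first two points, the markup below shows how a <code>description</code> value with inline markup and a language might be written in RDFa and in microformats-2. The item, its text and the choice of the <code>dc:description</code> property are assumptions made for illustration, and the RDFa version assumes that the <code>dc</code> and <code>rdf</code> prefixes are available from the RDFa 1.1 initial context.
+          </p>
+          <pre>&lt;!-- RDFa: the XMLLiteral datatype asks consumers to keep the nested markup --&gt;
+&lt;div about=&quot;#widget&quot; lang=&quot;en&quot;&gt;
+  &lt;div property=&quot;dc:description&quot; datatype=&quot;rdf:XMLLiteral&quot;&gt;
+    &lt;p&gt;A &lt;em&gt;very&lt;/em&gt; small widget.&lt;/p&gt;
+  &lt;/div&gt;
+&lt;/div&gt;
+
+&lt;!-- microformats-2: the e-* prefix marks a property whose HTML structure is retained --&gt;
+&lt;div class=&quot;h-entry&quot; lang=&quot;en&quot;&gt;
+  &lt;div class=&quot;e-content&quot;&gt;
+    &lt;p&gt;A &lt;em&gt;very&lt;/em&gt; small widget.&lt;/p&gt;
+  &lt;/div&gt;
+&lt;/div&gt;</pre>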
+          
+          <p class="issue">
+            The handling of language by microdata <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470">may change in the future</a>.
+          </p>
+        </section>
+        
+        <section>
+          <h4>Vocabulary Considerations</h4>
+          <p>
+            <a title="vocabulary">Vocabularies</a> and <a title="syntax">syntaxes</a> are closely tied together, especially in the case of microformats. Aspects of a vocabulary to bear in mind are:
+          </p>
+          <ul>
+            <li>How closely does it match with the information that you have?</li>
+            <li>How much support does it have? Are there tools for validating and viewing it? Is there good documentation?</li>
+            <li>How stable is it? Who has control to make changes to it? How frequently might those changes be made?</li>
+            <li>Are other consumers likely to adopt it in the future?</li>
+          </ul>
+        </section>
+        
+        <section>
+          <h4>Usability Considerations</h4>
+          <p>
+            The usability of a particular <a>format</a> is likely to depend on your existing expertise and the match between the structure and content of your web pages and the required structure and content of the format. The best thing to do is to try using the format to mark up an example page from your site.
+          </p>
+        </section>
+      </section>
+      
+      <section id="publishing-in-multiple-formats">
+        <h3>Publishing in Multiple Formats</h3>
+        <p>
+          Publishing in multiple <a title="format">formats</a> can be easy. For example, it may be that different consumers expect HTML data to appear in different places within the page, such as Facebook requiring Open Graph Protocol data to appear within the <code>head</code> of an HTML page, while schema.org markup appears in the <code>body</code> of the page. Or it may be that the items that you need to mark up on the page appear in different places &mdash; events listed in a sidebar while company details are provided in a footer, for example.
+        </p>
+        <p>
+          Different <a title="format">formats</a> and <a title="vocabulary">vocabularies</a> can be used independently in these circumstances. Consumers of the data within your pages might read additional data if it is in a <a>syntax</a> that they recognise &mdash; for example, a processor that recognises both RDFa and microdata will interpret all such markup in the page &mdash; but it should ignore information that is in a vocabulary that it doesn't understand rather than giving an error.
+        </p>
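+        <p>
+          A minimal sketch of such a page, assuming a film as the subject: Open Graph Protocol data is given in the <code>head</code> and schema.org microdata in the <code>body</code>. The title, URLs and choice of properties are illustrative only.
+        </p>
+        <pre>&lt;html prefix=&quot;og: http://ogp.me/ns#&quot;&gt;
+  &lt;head&gt;
+    &lt;title&gt;Example Film&lt;/title&gt;
+    &lt;!-- Open Graph Protocol data, read from the head --&gt;
+    &lt;meta property=&quot;og:title&quot; content=&quot;Example Film&quot;&gt;
+    &lt;meta property=&quot;og:type&quot; content=&quot;video.movie&quot;&gt;
+    &lt;meta property=&quot;og:url&quot; content=&quot;http://example.org/films/example-film&quot;&gt;
+  &lt;/head&gt;
+  &lt;body&gt;
+    &lt;!-- schema.org microdata, read from the body --&gt;
+    &lt;div itemscope itemtype=&quot;http://schema.org/Movie&quot;&gt;
+      &lt;h1 itemprop=&quot;name&quot;&gt;Example Film&lt;/h1&gt;
+      &lt;a itemprop=&quot;url&quot; href=&quot;http://example.org/films/example-film&quot;&gt;Permalink&lt;/a&gt;
+    &lt;/div&gt;
+  &lt;/body&gt;
+&lt;/html&gt;</pre>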
+        <p>
+          Publishing can be harder when there are multiple consumers of information that require different <a title="format">formats</a>. If your target consumers will all accept the same <a>syntax</a>, it is usually easiest to use that single syntax in your pages. However, microdata does not support assigning <a title="type">types</a> from more than one <a>vocabulary</a> to a single <a>entity</a>, so if your target consumers expect different <a title="vocabulary">vocabularies</a> to be used for the same entities you may find it easier to mix syntaxes or to use RDFa or microformats, which do support multiple vocabularies.
+        </p>
+        <section>
+          <h4>Mixing Vocabularies</h4>
+          <p>
+            Methods for marking up the same data in a page using different <a title="vocabulary">vocabularies</a> in the same <a>syntax</a> vary by syntax.
+          </p>
+          <section>
+            <h5>Mixing Vocabularies in Microformats</h5>
+            <p>
+              As microformats are simply indicated through classes, it's possible to mix several within the same set of content. An example is the <a href="http://www.bbc.co.uk/worldservice/bangladeshboat/" rel="nofollow">BBC Bangladesh River Journey</a> page which includes hAtom, hCalendar and geo microformats:
+            </p>
+            <pre>&lt;li class=&quot;<strong>hentry</strong> <strong>vevent</strong> xfolkentry postid-f2068841910&quot;&gt;
+  &lt;h3 class=&quot;<strong>entry-title</strong> <strong>summary</strong>&quot;&gt;
+    &lt;a href=&quot;http://www.flickr.com/photos/bangladeshboat/2068841910&quot; title=&quot;The final picture (on Flickr)&quot;&gt;The final picture&lt;/a&gt;
+  &lt;/h3&gt;
+  &lt;div class=&quot;<strong>entry-content</strong>&quot;&gt;
+    &lt;p class=&quot;photo&quot;&gt;
+      &lt;a rel=&quot;<strong>bookmark</strong>&quot; class=&quot;taggedlink <strong>url</strong>&quot; href=&quot;http://www.flickr.com/photos/bangladeshboat/2068841910&quot; title=&quot;The final picture (on Flickr)&quot;&gt;
+        &lt;img src=&quot;http://farm3.static.flickr.com/2175/2068841910_1162a8086b_s.jpg&quot; 
+             alt=&quot;The final picture (on Flickr)&quot; title=&quot;The final picture (on Flickr)&quot; width=&quot;64&quot; height=&quot;64&quot; /&gt;
+      &lt;/a&gt;
+    &lt;/p&gt;
+    &lt;p class=&quot;<strong>description</strong>&quot;&gt;As the BBC team prepare to disembark the boat, the sun sets overhead, and indeed on the trip itself.&lt;/p&gt;
+  &lt;/div&gt;
+  &lt;ul class=&quot;meta&quot;&gt;
+    &lt;li class=&quot;date&quot;&gt;&lt;abbr class=&quot;published <strong>dtstart</strong>&quot; <strong>title=&quot;2007-11-26T02:11:51+06:00&quot;</strong>&gt;2 days ago&lt;/abbr&gt;&lt;/li&gt;
+    &lt;li class=&quot;location&quot;&gt;&lt;abbr class=&quot;<strong>geo</strong> point-22&quot; <strong>title=&quot;+22.47157;+89.59534&quot;</strong>&gt;Mongla, Bangladesh&lt;/abbr&gt;&lt;/li&gt;
+  &lt;/ul&gt;
+&lt;/li&gt;</pre>
+          </section>
+          <section>
+            <h5>Mixing Vocabularies in RDFa</h5>
+            <p>
+              RDFa is designed to be used with multiple <a title="vocabulary">vocabularies</a>:
+            </p>
+            <ul>
+              <li><a title="type">types</a> and <a title="property">properties</a> are given IRIs as names, so do not have to be disambiguated; IRIs do not have to be written out in full (see below)</li>
+              <li> an <a>entity</a> can be assigned multiple <a title="type">types</a> from different <a title="vocabulary">vocabularies</a> by listing them within the <code>@typeof</code> attribute</li>
+              <li> attributes that indicate <a title="property">properties</a> (<code>@property</code>, <code>@rel</code> and <code>@rev</code>) can take multiple space-separated properties which may be from different vocabularies</li>
+            </ul>
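+            <p>
+              As a sketch of these three points, the markup below describes one person using both schema.org and FOAF, with every IRI written out in full. The person and the pairing of properties are assumptions made for illustration.
+            </p>
+            <pre>&lt;div about=&quot;#alice&quot;
+     typeof=&quot;http://schema.org/Person http://xmlns.com/foaf/0.1/Person&quot;&gt;
+  &lt;span property=&quot;http://schema.org/name http://xmlns.com/foaf/0.1/name&quot;&gt;Alice Example&lt;/span&gt;
+  &lt;a rel=&quot;http://schema.org/url http://xmlns.com/foaf/0.1/homepage&quot;
+     href=&quot;http://example.org/alice&quot;&gt;home page&lt;/a&gt;
+&lt;/div&gt;</pre>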
+            <p>
+              Writing out IRIs in full can clutter HTML so RDFa provides four mechanisms to shorten IRIs:
+            </p>
+            <ul>
+              <li>There are several built-in prefixes which can be used for popular vocabularies. These are listed as part of the <a href="http://www.w3.org/2011/rdfa-context/rdfa-1.1.html" class="external text" title="http://www.w3.org/2011/rdfa-context/rdfa-1.1.html" rel="nofollow">RDFa 1.1 Core initial context</a>. Any IRI within one of these <a title="vocabulary">vocabularies</a> can be abbreviated using the <code>prefix:name</code> notation.</li>
+              <li>The <code>@prefix</code> attribute can be used to define additional prefixes for other vocabularies.</li>
+              <li>The <code>@vocab</code> attribute defines a default <a>vocabulary</a> within its scope; any IRIs that begin with this vocabulary can be abbreviated to a short name (the remainder of the IRI after the vocabulary IRI).</li>
+              <li>Namespace declarations (<code>@xmlns:prefix</code> attributes) can also be used to define prefixes. <strong>This mechanism is deprecated and should not be used.</strong></li>
+            </ul>
+            <p>
+              Note that if you use any of the last three mechanisms, the shortened IRIs can only be understood when they are within the scope of the relevant attributes. These can be easy to mislay when people copy and paste HTML from one place to another, or as the result of template changes in a content-management system. We therefore recommend that these attributes are avoided where possible &mdash; use the built-in prefixes or full IRIs in preference &mdash; and, where they are used, placed on elements that represent <a title="entity">entities</a> (those with <code>@about</code> or <code>@typeof</code> attributes) and repeated on each entity element rather than being inherited from an ancestor element.
+            </p>
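+            <p>
+              The sketch below follows this recommendation for a hypothetical person description: <code>schema</code> and <code>foaf</code> are assumed to be available as built-in prefixes from the initial context, while <code>ex</code> is an invented prefix that is declared on the entity element itself so that it travels with the markup when it is copied.
+            </p>
+            <pre>&lt;div about=&quot;#alice&quot; typeof=&quot;schema:Person foaf:Person&quot;
+     prefix=&quot;ex: http://example.org/vocab#&quot;&gt;
+  &lt;span property=&quot;schema:name foaf:name&quot;&gt;Alice Example&lt;/span&gt;
+  &lt;span property=&quot;ex:staffNumber&quot;&gt;1234&lt;/span&gt;
+&lt;/div&gt;</pre>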
+          </section>
+          <section id="mixing-vocabularies-microdata">
+            <h5>Mixing Vocabularies in microdata</h5>
+            <p>
+              Microdata is designed such that each piece of information in a page is assigned <a title="type">types</a> from a single <a>vocabulary</a>, though each <a>entity</a> may have multiple types and have <a title="property">properties</a> from other vocabularies.
+            </p>
+            <p>
+              <a title="property">Properties</a> in microdata are either short names (in which case they are scoped to the <a>vocabulary</a> of the <a title="type">types</a> of the entity) or URLs. A URL property has no relationship to a given short name property unless that relationship is specified within the vocabulary that defines the properties.
+            </p>
+            <p>
+              You might find that you need to target two consumers who each recognise items using <a title="type">types</a> from different <a title="vocabulary">vocabularies</a>. For example, you might want to both target schema.org and use the vEvent vocabulary with the original HTML:
+            </p>
+            <pre>&lt;a href=&quot;nba-miami-philidelphia-game3.html&quot;&gt;
+NBA Eastern Conference First Round Playoff Tickets:
+ Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)
+&lt;/a&gt;
+
+Thu, 04/21/16
+8:00 p.m.
+
+&lt;a href=&quot;wells-fargo-center.html&quot;&gt;
+Wells Fargo Center
+&lt;/a&gt;
+Philadelphia, PA
+
+Priced from: $35
+1938 tickets left</pre>
+            <p>
+              In this case there are three options available to you. The first, if consumers support it, is to use a different <a>syntax</a> for one of the <a title="vocabulary">vocabularies</a>. For example, the vEvent vocabulary is only supported in microdata but schema.org can be consumed from either microdata or RDFa, so it would be possible to mark up the data using the vEvent vocabulary in microdata and the schema.org vocabulary in RDFa. This approach is described in more detail in <a href="#mixing-syntaxes" class="sectionRef"></a>. Mixing syntaxes within a single page is rarely a good option but in some circumstances it may be preferable to the other workarounds described here.
+            </p>
+            <p>
+              The other options for marking up using multiple vocabularies in microdata are described below.
+            </p>
+            <section>
+              <h6>Mixing Vocabularies using a Type Property</h6>
+              <p>
+                Some <a title="vocabulary">vocabularies</a> may define a <a>property</a> through which <a title="type">types</a> from that vocabulary can be assigned to items that are in a different vocabulary. For example, schema.org could define a <code><a href="http://schema.org/type" class="external free" title="http://schema.org/type" rel="nofollow">http://schema.org/type</a></code> property whose <a>value</a> is a URL, and state that any microdata item that has a schema.org <a>type</a> as a value for that property is recognised as being an item of that type. In this case, the types specified in the <code>@itemtype</code> attribute are the <b>primary types</b> of the <a>entity</a> and those specified through the property are the <b>secondary types</b>.
+              </p>
+              <p>
+                Alongside the assertion that <a>property</a> URLs that begin with <code><a href="http://schema.org/" rel="nofollow">http://schema.org/</a></code> have the same semantics as short name properties on items with a schema.org type, this enables the schema.org <a>vocabulary</a> to be mixed in with an item marked up using vEvent:
+              </p>
+              <p class="note">
+                At time of writing schema.org does not specify a <code><a href="http://schema.org/type" rel="nofollow">http://schema.org/type</a></code> property and this example will not work.
+              </p>
+              <pre>&lt;div itemscope itemtype=&quot;http://microformats.org/profile/hcalendar#vevent&quot;&gt;
+  <strong>&lt;link itemprop=&quot;http://schema.org/type&quot; href=&quot;http://schema.org/Event&quot;&gt;</strong>
+  &lt;a itemprop=&quot;url <strong>http://schema.org/url</strong>&quot; href=&quot;nba-miami-philidelphia-game3.html&quot;&gt;
+  NBA Eastern Conference First Round Playoff Tickets:
+  &lt;span itemprop=&quot;summary <strong>http://schema.org/name</strong>&quot;&gt; Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1) &lt;/span&gt;
+  &lt;/a&gt;
+
+  &lt;meta itemprop=&quot;dtstart <strong>http://schema.org/startDate</strong>&quot; content=&quot;2016-04-21T20:00&quot;&gt;
+    Thu, 04/21/16
+    8:00 p.m.
+
+  &lt;div itemprop=&quot;location&quot;&gt;
+    <strong>&lt;div itemprop=&quot;http://schema.org/location&quot; itemscope itemtype=&quot;http://schema.org/Place&quot;&gt;
+      &lt;a itemprop=&quot;url&quot; href=&quot;wells-fargo-center.html&quot;&gt;
+      Wells Fargo Center
+      &lt;/a&gt;
+      &lt;div itemprop=&quot;address&quot; itemscope itemtype=&quot;http://schema.org/PostalAddress&quot;&gt;
+        &lt;span itemprop=&quot;addressLocality&quot;&gt;Philadelphia&lt;/span&gt;,
+        &lt;span itemprop=&quot;addressRegion&quot;&gt;PA&lt;/span&gt;
+      &lt;/div&gt;
+    &lt;/div&gt;</strong>
+  &lt;/div&gt;
+
+  <strong>&lt;div itemprop=&quot;http://schema.org/offers&quot; itemscope itemtype=&quot;http://schema.org/AggregateOffer&quot;&gt;
+    Priced from: &lt;span itemprop=&quot;lowPrice&quot;&gt;$35&lt;/span&gt;
+    &lt;span itemprop=&quot;offerCount&quot;&gt;1938&lt;/span&gt; tickets left
+  &lt;/div&gt;</strong>
+&lt;/div&gt;</pre>
+              <p>
+                Note in particular that the vEvent <code>location</code> <a>property</a> takes text while the schema.org <code>location</code> property takes structured information about the location. These are combined by having an element for the property which requires structured information nested within the property that requires text.
+              </p>
+              <p>
+                Also note that in this example the <code><a href="http://schema.org/type" class="external free" title="http://schema.org/type" rel="nofollow">http://schema.org/type</a></code> <a>property</a> is only used where necessary, on the <a>entity</a> which needs to be marked as an event in both <a title="vocabulary">vocabularies</a>. Where possible, the schema.org <a>type</a> for an entity is provided explicitly through the <code>@itemtype</code> attribute.
+              </p>
+              <p>
+                This method of mixing <a title="vocabulary">vocabularies</a> requires vocabularies to specify how consumers should recognise items of a particular <a>type</a>. It is recommended that vocabulary authors define an <code>@itemtype</code>-equivalent <a>property</a>, and that, for better integration with RDF tools, this property is <code><a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">http://www.w3.org/1999/02/22-rdf-syntax-ns#type</a></code> (TODO: Issue about what to recommend here.)
+              </p>
+              <p>
+                The other disadvantage of this approach is that there is no support within the microdata API for retrieving items based on the <a>value</a> of a <a>property</a>. In the example above, it would be possible to retrieve the event using:
+              </p>
+              <pre>document.getItems('http://microformats.org/profile/hcalendar#vevent')</pre>
+              <p>
+                but not through:
+              </p>
+              <pre>document.getItems('http://schema.org/Event')</pre>
+              <p>
+                Scripts that extract microdata information using the DOM will be faster if they can use the primary <a title="type">types</a> for an item, specified within the <code>@itemtype</code> attribute, so you should specify types accessed through scripts within <code>@itemtype</code> rather than through a <a>property</a>  wherever possible.
+              </p>
+            </section>
+            <section>
+              <h6>Mixing Vocabularies using Repeated Content</h6>
+              <p>
+                The second method of supporting multiple <a title="vocabulary">vocabularies</a> is to have the <a>entity</a> represented by two (or more) microdata items on the page. To enable dragging and dropping the data from these items, they should be nested inside each other. Properties can be set on the outer element using <code>link</code> and <code>meta</code> elements which are hidden from users, while the visible content of the page is marked up by the inner element.
+              </p>
+              <pre>&lt;div itemscope itemtype=&quot;<strong>http://microformats.org/profile/hcalendar#vevent</strong>&quot;&gt;
+  <strong>&lt;link itemprop=&quot;url&quot; href=&quot;nba-miami-philidelphia-game3.html&quot;&gt;
+  &lt;meta itemprop=&quot;summary&quot; content=&quot;Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)&quot;&gt;
+  &lt;meta itemprop=&quot;dtstart&quot; content=&quot;2016-04-21T20:00&quot;&gt;
+  &lt;meta itemprop=&quot;location&quot; content=&quot;Wells Fargo Center, Philadelphia, PA&quot;&gt;</strong>
+  &lt;div itemscope itemtype=&quot;<strong>http://schema.org/Event</strong>&quot;&gt;
+    &lt;a itemprop=&quot;url&quot; href=&quot;nba-miami-philidelphia-game3.html&quot;&gt;
+    NBA Eastern Conference First Round Playoff Tickets:
+    &lt;span itemprop=&quot;name&quot;&gt; Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1) &lt;/span&gt;
+    &lt;/a&gt;
+
+    &lt;meta itemprop=&quot;startDate&quot; content=&quot;2016-04-21T20:00&quot;&gt;
+      Thu, 04/21/16
+      8:00 p.m.
+
+    &lt;div itemprop=&quot;location&quot; itemscope itemtype=&quot;http://schema.org/Place&quot;&gt;
+      &lt;a itemprop=&quot;url&quot; href=&quot;wells-fargo-center.html&quot;&gt;
+      Wells Fargo Center
+      &lt;/a&gt;
+      &lt;div itemprop=&quot;address&quot; itemscope itemtype=&quot;http://schema.org/PostalAddress&quot;&gt;
+        &lt;span itemprop=&quot;addressLocality&quot;&gt;Philadelphia&lt;/span&gt;,
+        &lt;span itemprop=&quot;addressRegion&quot;&gt;PA&lt;/span&gt;
+      &lt;/div&gt;
+    &lt;/div&gt;
+
+    &lt;div itemprop=&quot;offers&quot; itemscope itemtype=&quot;http://schema.org/AggregateOffer&quot;&gt;
+      Priced from: &lt;span itemprop=&quot;lowPrice&quot;&gt;$35&lt;/span&gt;
+      &lt;span itemprop=&quot;offerCount&quot;&gt;1938&lt;/span&gt; tickets left
+    &lt;/div&gt;
+  &lt;/div&gt;
+&lt;/div&gt;</pre>
+              <p>
+                This method does not require any special <a title="property">properties</a> to be defined in the <a title="vocabulary">vocabularies</a> used to mark up the page, and the two items are directly assigned the relevant <a>type</a> and are thus accessible to scripts through the <code>document.getItems()</code> method.
+              </p>
+              <p>
+                The disadvantages of this method are that the page contains more items than there are <a title="entity">entities</a> (in the above example, two items representing the same event), and it requires repetition of data within the page.
+              </p>
+            </section>
+          </section>
+        </section>
+        <section id="mixing-syntaxes">
+          <h3>Mixing Syntaxes</h3>
+          <p>
+            A requirement to support a large range of consumers can mean that it becomes necessary to publish using not only multiple <a title="vocabulary">vocabularies</a> but multiple syntaxes.
+          </p>
+          <p>
+            RDFa, microformats and microdata all share the same basic <a>entity</a>/<a>property</a>/<a>value</a> model, so in many cases it is possible to mirror attributes across the syntaxes. The following example shows the same content marked up with:
+          </p>
+          <ul>
+            <li>hCalendar (microformat)</li>
+            <li>schema.org (RDFa)</li>
+            <li>vEvent (microdata)</li>
+          </ul>
+          <pre>&lt;div <strong>class=&quot;vevent&quot;</strong>
+  <strong>itemscope itemtype=&quot;http://microformats.org/profile/hcalendar#vevent&quot;</strong>
+  <strong>vocab=&quot;http://schema.org/&quot; typeof=&quot;Event&quot;</strong>&gt;
+  &lt;a <strong>class=&quot;url&quot; itemprop=&quot;url&quot; property=&quot;url&quot;</strong> href=&quot;nba-miami-philidelphia-game3.html&quot;&gt;
+    NBA Eastern Conference First Round Playoff Tickets:
+    &lt;span <strong>class=&quot;summary&quot; itemprop=&quot;summary&quot; property=&quot;name&quot;</strong>&gt; Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1) &lt;/span&gt;
+  &lt;/a&gt;
+
+  <strong>&lt;meta itemprop=&quot;dtstart&quot; property=&quot;startDate&quot; content=&quot;2016-04-21T20:00:00&quot;&gt;</strong>
+  &lt;abbr <strong>class=&quot;dtstart&quot; title=&quot;2016-04-21T20:00:00&quot;</strong>&gt;
+    Thu, 04/21/16
+    8:00 p.m.
+  &lt;/abbr&gt;
+
+  &lt;div <strong>class=&quot;location&quot; itemprop=&quot;location&quot; 
+       vocab=&quot;http://schema.org/&quot; property=&quot;location&quot; typeof=&quot;Place&quot;</strong>&gt;
+    &lt;a <strong>property=&quot;url&quot;</strong> href=&quot;wells-fargo-center.html&quot;&gt;
+      Wells Fargo Center
+    &lt;/a&gt;
+    &lt;div <strong>property=&quot;address&quot; vocab=&quot;http://schema.org/&quot; typeof=&quot;PostalAddress&quot;</strong>&gt;
+      &lt;span <strong>property=&quot;addressLocality&quot;</strong>&gt;Philadelphia&lt;/span&gt;,
+      &lt;span <strong>property=&quot;addressRegion&quot;</strong>&gt;PA&lt;/span&gt;
+    &lt;/div&gt;
+  &lt;/div&gt;
+
+  &lt;div <strong>vocab=&quot;http://schema.org/&quot; property=&quot;offers&quot; typeof=&quot;AggregateOffer&quot;</strong>&gt;
+    Priced from: &lt;span <strong>property=&quot;lowPrice&quot;</strong>&gt;$35&lt;/span&gt;
+    &lt;span <strong>property=&quot;offerCount&quot;</strong>&gt;1938&lt;/span&gt; tickets left
+  &lt;/div&gt;
+&lt;/div&gt;</pre>
+          <p>
+            It is particularly important to check pages in which <a title="syntax">syntaxes</a> are mixed together using an appropriate validator for each format.
+          </p>
+          <p>
+            The following guidelines may help when creating pages in which different <a title="syntax">syntaxes</a> are mixed together.
+          </p>
+          <ul>
+            <li>
+              microformats do not use <code>link</code> or <code>meta</code> elements within the content of the page and in some cases require particular elements to be used to encode information, such as using <code>abbr</code> to support the <a href="http://microformats.org/wiki/datetime-design-pattern">datetime-design-pattern</a> as illustrated by the <code>dtstart</code> <a>property</a> in the example above
+            </li>
+            <li>
+              link relations required in certain microformats, particularly XFN, clash with the use of RDFa's <code>@vocab</code> attribute; avoid using <code>@vocab</code> on any ancestor of an element that contains a <code>@rel</code>
+            </li>
+            <li>
+              the following equivalencies between RDFa and microdata attributes generally hold true:
+              <ul>
+                <li><code>@itemid</code> = <code>@resource</code></li>
+                <li><code>@itemtype</code> = <code>@typeof</code> (+ <code>@vocab</code> to enable the use of short names for properties)</li>
+                <li><code>@itemprop</code> + <code>@itemscope</code> = <code>@property</code> + an empty <code>@typeof</code> if there's no <code>@itemtype</code></li>
+                <li><code>@itemprop</code> otherwise = <code>@property</code></li>
+              </ul>
+            </li>
+            <li>
+              when using RDFa, any <a>property</a> elements within an element with a <code>@href</code> will be taken as being properties of the <a>entity</a> identified by the URL in that <code>@href</code>; as long as the link doesn't have a <code>@rel</code>, this can be avoided by adding an empty <code>@property</code> to the link. If the link does have a <code>@rel</code>, you can either move the property elements outside the link or add a <code>@resource</code> attribute whose value is the same as the <code>@resource</code> on the entity element (this can be a local "blank node" identifier in the form <code>_:<var>localName</var></code>)
+            </li>
+            <li>
+              RDFa <a title="vocabulary">vocabularies</a> are typically stricter in the range of <a title="value">values</a> that they accept for <a title="property">properties</a> that take dates and times; it is best to use the syntax <code>YYYY-MM-DD</code> for dates, <code>hh:mm:ss</code> for times and <code>YYYY-MM-DDThh:mm:ss</code> for dateTimes to be compliant with the <a href="http://www.w3.org/TR/xmlschema-2/#dateTime">XML Schema dates and times</a> which RDFa-based vocabularies will typically use
+            </li>
+            <li>
+              the <code>@datatype</code> attribute might be required for some RDFa <a title="vocabulary">vocabularies</a>/consumers; others will coerce <a title="value">values</a> into the appropriate datatype based on the <a>property</a> itself. However, if a property takes a structured value, the property element must have <code>datatype="rdf:XMLLiteral"</code> for that structure to be preserved
+            </li>
+          </ul>
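+          <p>
+            As a rough illustration of the attribute equivalences listed above, the following sketch marks up one item with both microdata and RDFa. The <code>http://example.org/events/game3</code> identifier is a made-up placeholder, and whether <code>@itemid</code> is permitted depends on the <a>vocabulary</a> in use, so treat this as a sketch under those assumptions rather than a recommended pattern.
+          </p>
+          <pre>&lt;div itemscope itemtype=&quot;http://schema.org/Event&quot;
+     itemid=&quot;http://example.org/events/game3&quot;
+     vocab=&quot;http://schema.org/&quot; typeof=&quot;Event&quot;
+     resource=&quot;http://example.org/events/game3&quot;&gt;
+  &lt;!-- @itemprop on its own mirrors @property --&gt;
+  &lt;span itemprop=&quot;name&quot; property=&quot;name&quot;&gt;Miami Heat at Philadelphia 76ers&lt;/span&gt;
+  &lt;!-- @itemprop + @itemscope without @itemtype mirrors @property + an empty @typeof --&gt;
+  &lt;div itemprop=&quot;location&quot; itemscope property=&quot;location&quot; typeof=&quot;&quot;&gt;
+    &lt;span itemprop=&quot;name&quot; property=&quot;name&quot;&gt;Wells Fargo Center&lt;/span&gt;
+  &lt;/div&gt;
+&lt;/div&gt;</pre>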
+          <p class="issue">
+            The guidance above does not adhere to the RDFa 1.1 Lite set of attributes, because of the use of the <code>@resource</code> attribute rather than the <code>@about</code> attribute. However, using <code>@resource</code> gives a more natural mapping when mixing RDFa and microdata within a page. See <a href="http://www.w3.org/2010/02/rdfa/track/issues/119">ISSUE-119</a>.
+          </p>
+          <p class="issue">
+            It is likely that the HTML5 <code>time</code> element will accept types of values that do not have an equivalent XML Schema datatype. These should be avoided when using RDFa. See <a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=14881">bug 14881</a>.
+          </p>
+          <p class="issue">
+            In (X)HTML5 markup, unprefixed values in the <code>@rel</code> attribute will usually be ignored unless there is a <code>@vocab</code> attribute in scope. In RDFa in XHTML 1.1, some unprefixed values will be recognised as known terms and used to create triples. See <a href="http://www.w3.org/2010/02/rdfa/track/issues/108">ISSUE-108</a>.
+          </p>
+        </section>
+      </section>
+        
+      <section>
+        <h3>Good Publishing Practice</h3>
+        <p>
+          Valid HTML is particularly important in pages that contain embedded markup. All methods of embedding data within HTML use the structure of the HTML to determine the meaning of the additional markup. For example, in microdata the item to which an element with an <code>@itemprop</code> attribute assigns a <a>property</a> is usually the closest ancestor element with a <code>@itemscope</code> attribute.
+        </p>
+        <p>
+          In some cases, elements can be moved when HTML is parsed into a DOM. This can lead to <a title="property">properties</a> unexpectedly referring to the wrong entity, and, if you are serving your documents as XHTML (with an <code>application/xhtml+xml</code> MIME type), it can cause discrepancies between the data gleaned by XML-based consumers and HTML-aware consumers. There are two causes for this:
+        </p>
+        <ul>
+          <li>
+            Error correction in HTML parsing can restructure invalid HTML to make it valid; for example, non-table markup within a table is moved to before the table. This includes <code>link</code> and <code>meta</code> elements that are directly within the <code>table</code> element. You can avoid this restructuring by making sure that your HTML is valid so that it is not needed.
+          </li>
+          <li>
+            Some older browsers may move <code>meta</code> and/or <code>link</code> elements in the <code>body</code> of an HTML document to within the <code>head</code> element, because they could not validly appear within the body in older versions of HTML. If you are targeting consumers which run within older browsers, such as scripts or plug-ins, you can avoid this restructuring by using empty <code>span</code> or other elements instead of <code>link</code> or <code>meta</code>; other consumers should be using an up-to-date HTML5 parser which will not do this.
+          </li>
+        </ul>
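+        <p>
+          The two cases above can be sketched as follows. In the first fragment the <code>meta</code> element sits inside a table cell, where it is valid, rather than directly inside the <code>table</code> or <code>tr</code> element, so an HTML5 parser has no reason to move it. The second fragment is an RDFa-only assumption: it carries a hidden value on an empty <code>span</code> through the <code>@content</code> attribute, so that older browsers do not relocate it to the <code>head</code>. The event details are reused from the earlier examples purely for illustration.
+        </p>
+        <pre>&lt;table&gt;
+  &lt;tr itemscope itemtype=&quot;http://schema.org/Event&quot;&gt;
+    &lt;td itemprop=&quot;name&quot;&gt;Miami Heat at Philadelphia 76ers&lt;/td&gt;
+    &lt;td&gt;
+      &lt;!-- meta is inside a cell, not directly inside table or tr --&gt;
+      &lt;meta itemprop=&quot;startDate&quot; content=&quot;2016-04-21T20:00&quot;&gt;
+      Thu, 04/21/16
+    &lt;/td&gt;
+  &lt;/tr&gt;
+&lt;/table&gt;
+
+&lt;div vocab=&quot;http://schema.org/&quot; typeof=&quot;Event&quot;&gt;
+  &lt;!-- empty span with @content in place of a meta element --&gt;
+  &lt;span property=&quot;startDate&quot; content=&quot;2016-04-21T20:00:00&quot;&gt;&lt;/span&gt;
+  Thu, 04/21/16
+&lt;/div&gt;</pre>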
+        <p>
+          It is good practice to test the data that you expose within your page against a parser that will show you the data your page contains. It is also good practice to test the data that you expose using a tool that understands the <a>vocabulary</a> you are using. Consumers may provide testing tools and validators for this purpose, or you may need to check the way that vocabulary-specific tools behave with your data.
+        </p>
+        <p>
+          The goal of publishing HTML data is to enable consumers to reuse it. To make it clear how the HTML data you publish can be reused, you should include information about the rights holder and the license under which the information is made available. There are a number of <a title="vocabulary">vocabularies</a> that enable you to do this, such as schema.org, rel-license, Creative Commons and Dublin Core. Your target consumers should indicate which <a title="format">formats</a> they understand when it comes to expressing licensing information and which licenses they know about, and you should choose a relevant <a>format</a> in the same way as you do for the core data that you are publishing.
+        </p>
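+        <p>
+          A minimal sketch of licensing markup, assuming your consumers recognise the rel-license microformat and Dublin Core terms in RDFa; the Creative Commons license URL and the rights holder name are placeholders chosen for illustration only.
+        </p>
+        <pre>&lt;p prefix=&quot;dcterms: http://purl.org/dc/terms/&quot;&gt;
+  Event listings published by
+  &lt;span property=&quot;dcterms:rightsHolder&quot;&gt;Example Tickets Ltd&lt;/span&gt;
+  under a
+  &lt;a rel=&quot;license&quot; href=&quot;http://creativecommons.org/licenses/by/3.0/&quot;&gt;Creative Commons Attribution 3.0&lt;/a&gt;
+  license.
+&lt;/p&gt;</pre>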
+      </section>
+    </section>
+    
+    <section>
+      <h2>Consumers</h2>
+      <p>
+        You will find it easier to consume and combine data published using a single <a>format</a> (syntax and vocabulary). To decide which to consume, you should first look at what formats your target publishers are currently using. It may be that these contain sufficient information for your application.
+      </p>
+      <p>
+        If the publishers whom you are targeting are already publishing using multiple formats, you may want to consume from all those formats (see <a href="#consuming-multiple-formats" class="sectionRef"></a>) in order to maximise the data that you can collect while minimising the impact on the publishers who are providing that information. If you are consuming microdata and storing the results as RDF, you should <a href="/wiki/Mapping_Microdata_to_RDF" title="Mapping Microdata to RDF">follow a standard mapping</a>.
+      </p>
+      <p>
+        If current <a title="format">formats</a> do not encode the information you need at the level of detail your application requires, publishers will be more likely to publish extra data for you to consume if you:
+      </p>
+      <ul>
+        <li><a href="/wiki/HTML_Data_Vocabularies" title="HTML Data Vocabularies">extend existing common vocabularies</a> they are already using</li>
+        <li>consume data from a <a>syntax</a> they already use</li>
+      </ul>
+      <p>
+        If you cannot simply extend an existing <a>vocabulary</a>, you will need to create your own vocabulary and choose which <a title="syntax">syntaxes</a> to support with that vocabulary.
+      </p>
+      
+      <section>
+        <h3>Choosing a Syntax to Consume</h3>
+        <p>
+          As you choose a syntax, you should take into account the following considerations.
+        </p>
+        
+        <section>
+          <h4>Application Considerations</h4>
+          <p>
+            Microdata, RDFa and microformats-2 all use a generic <a>syntax</a>, which means that it's possible to have generic parsers operate over them to extract data. In the case of microdata and microformats-2, the data has a JSON structure; data extracted from RDFa has an RDF structure (microdata can also be converted into RDF).
+          </p>
+          <p>
+            Generic applications can work in the browser to do things such as highlighting markup that follows a particular syntax or enabling users to download the data embedded within a page into a separate file. These can also use the context in which the HTML data is found to provide additional features. For example, generic consumers may detect that each row in a table is associated with a distinct <a>entity</a>, and each cell with a particular <a>property</a>, and enable users to sort that table based on property <a title="value">values</a>. In this case, a consumer could ensure that when values are marked up as dates, times or durations using the <code>time</code> element, the items are sorted by date/time/duration rather than alphabetically.
+          </p>
+          <p>
+            Both microformats-2 and RDFa provide additional facilities that enable publishers to indicate the datatypes of <a title="value">values</a> to support generic consumers. Microformats-2 properties have a prefix that can indicate when a value is a URL (<code>u-*</code>), a date/time (<code>dt-*</code>), extended HTML (<code>e-*</code>) or a string (<code>p-*</code>). RDFa supports a <code>@datatype</code> attribute that publishers can use to indicate the datatype of a value, usually an XML Schema datatype such as <code>xsd:integer</code> or <code>xsd:language</code>. Note that once microformats-2 data is extracted from a page into JSON, these prefixes are no longer available, so a consumer of the JSON has to know the <a>vocabulary</a> to tell whether a given value should be interpreted as a string or as HTML markup, for example. In contrast, the datatypes used to annotate RDFa values are carried within the RDF data.
+          </p>
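+          <p>
+            The two mechanisms might look like this in a page. This is only a sketch: microformats-2 is still under development, so the <code>h-event</code> root and the specific property class names (in particular <code>e-description</code>) are assumptions used to show the prefixes, and the <code>xsd</code> prefix is declared explicitly rather than relying on any defaults.
+          </p>
+          <pre>&lt;!-- microformats-2: the class prefix hints at the kind of value --&gt;
+&lt;div class=&quot;h-event&quot;&gt;
+  &lt;a class=&quot;u-url p-name&quot; href=&quot;nba-miami-philidelphia-game3.html&quot;&gt;Miami Heat at Philadelphia 76ers&lt;/a&gt;
+  &lt;time class=&quot;dt-start&quot; datetime=&quot;2016-04-21T20:00:00&quot;&gt;Thu, 04/21/16 8:00 p.m.&lt;/time&gt;
+  &lt;div class=&quot;e-description&quot;&gt;&lt;p&gt;Home Game &lt;em&gt;1&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;
+&lt;/div&gt;
+
+&lt;!-- RDFa: @datatype states the datatype, which is carried into the RDF --&gt;
+&lt;div vocab=&quot;http://schema.org/&quot; typeof=&quot;AggregateOffer&quot;
+     prefix=&quot;xsd: http://www.w3.org/2001/XMLSchema#&quot;&gt;
+  &lt;span property=&quot;offerCount&quot; datatype=&quot;xsd:integer&quot;&gt;1938&lt;/span&gt; tickets left
+&lt;/div&gt;</pre>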
+          <p>
+            RDFa also adheres to a follow-your-nose principle, whereby <a>vocabulary</a> authors are encouraged to provide a machine-readable description of <a title="type">types</a> and <a title="property">properties</a> at the URL used for the type or property. This can enable generic processors to automatically pick up additional information about the type or property such as labels, help text, supertypes, property cardinality and ranges and so on. While microdata also uses URLs for types and properties, microdata consumers are not permitted to dereference URLs that they do not already recognise.
+          </p>
+        </section>
+        
+        <section>
+          <h4>Tooling Considerations</h4>
+          <p>
+            Applications vary widely in terms of the tooling that they need. A script that runs in a publisher's page needs easy access to data through a DOM API. A crawler that creates a store of data from a set of distributed pages requires a server-side parser and good storage and querying support.
+          </p>
+          <p>
+            As a consumer, you will be led by the requirements you have for your application and the experience that you have with different technology sets. It's important, however, to also consider the experience and capabilities of the publishers that are providing you with data, and which <a title="format">formats</a> they will find easy to publish given their tooling. You should also consider the ease with which you can provide support tools for the format, such as validators or previewers that make it easy for publishers to tell whether they have published data correctly within their pages.
+          </p>
+          <p>
+            There are several specifications that can be used to provide standard mechanisms for accessing, manipulating, querying and validating data gleaned from HTML pages. However, you should check what has been implemented in your environment: it may be that there isn't an implementation that follows a standard, but there is one that provides its own API which enables you to do what you need to do.
+          </p>
+          
+          <section>
+            <h5>Microdata/Microformats-2 Data Model</h5>
+            <p>
+              Microdata and microformats-2 can be mapped to the same <a href="http://dev.w3.org/html5/md/Overview.html#json" class="external text" title="http://dev.w3.org/html5/md/Overview.html#json" rel="nofollow">basic (JSON) data model</a>. Processing JSON into native programming structures, in JavaScript and other languages, is usually very easy. Vocabularies are usually described in specification prose rather than a formal language.
+            </p>
+            <ul>
+              <li><a href="http://dev.w3.org/html5/md/Overview.html#microdata-dom-api" class="external text" title="http://dev.w3.org/html5/md/Overview.html#microdata-dom-api" rel="nofollow">microdata DOM API</a> &mdash; part of microdata specification (W3C Last Call Working Draft)</li>
+              <li><a href="http://tools.ietf.org/html/draft-zyp-json-schema-03" class="external text" title="http://tools.ietf.org/html/draft-zyp-json-schema-03" rel="nofollow">JSON Schema</a> &mdash; schema language for JSON (IETF Internet Draft)</li>
+            </ul>
+          </section>
+          
+          <section>
+            <h5>RDF Data Model</h5>
+            <p>
+              RDFa processors extract an RDF data model and processors can also generate <a href="https://dvcs.w3.org/hg/htmldata/raw-file/default/microdata-rdf/index.html">RDF from microdata</a>. There are a number of standards for formally expressing RDF <a title="vocabulary">vocabularies</a> and querying RDF, and drafts in progress for DOM-based manipulation of RDFa content.
+            </p>
+            <ul>
+              <li><a href="http://www.w3.org/TR/rdfa-api/" class="external text" title="http://www.w3.org/TR/rdfa-api/" rel="nofollow">RDFa API</a> &mdash; W3C Working Draft</li>
+              <li> <a href="http://json-ld.org/spec/latest/" class="external text" title="http://json-ld.org/spec/latest/" rel="nofollow">JSON-LD</a> &mdash; JSON representation of RDF (Unofficial Draft)</li>
+              <li><a href="http://www.w3.org/TR/rdf-sparql-query/" class="external text" title="http://www.w3.org/TR/rdf-sparql-query/" rel="nofollow">SPARQL</a> &mdash; query language for RDF (W3C Recommendation)</li>
+              <li><a href="http://www.w3.org/TR/sparql11-overview/" class="external text" title="http://www.w3.org/TR/sparql11-overview/" rel="nofollow">SPARQL 1.1</a> &mdash; W3C Working Draft</li>
+              <li><a href="http://www.w3.org/TR/rdf-schema/" class="external text" title="http://www.w3.org/TR/rdf-schema/" rel="nofollow">RDFS</a> &mdash; vocabulary description language for RDF (W3C Recommendation)</li>
+              <li><a href="http://www.w3.org/TR/owl-primer/" class="external text" title="http://www.w3.org/TR/owl-primer/" rel="nofollow">OWL</a> &mdash; ontology language for RDF (W3C Recommendation)</li>
+            </ul>
+          </section>
+        </section>
+        
+        <section>
+          <h4>Data Model Considerations</h4>
+          <p>
+            Microdata uses a JSON-based data model of a tree of objects which may be identified through a URI, with <a title="property">properties</a> whose <a title="value">values</a> are strings. Microformats-2 uses a similar JSON-based data model of a tree of objects, but they do not have identifiers and their property values may be strings, URLs, date/times or structured HTML values. RDFa uses RDF as its data model, which is a graph of objects identified by URLs with properties whose values may be other objects, lists or literal values which can be tagged with a language or any datatype. These different models have different capabilities.
+          </p>
+          <dl>
+            <dt>Structured HTML values</dt>
+            <dd>
+              Under appropriate conditions, RDFa and microformats will use markup within the content of an element to provide a <a>property</a> value; in microdata <a title="value">values</a> never retain markup. If you wish to consume data that may contain markup (be it structures such as multiple paragraphs, list items and tables, or inline markup such as emphases, links or ruby markup), you will need publishers to use RDFa or microformats to mark up that data. In RDFa, this is done by publishers adding <code>datatype="rdf:XMLLiteral"</code> to elements whose markup should be preserved. In microformats, the handling of the content of an element is determined by the property; in microformats-2, those that retain the HTML structure are named with an <code>e-*</code> prefix, such as <code>e-content</code>.
+            </dd>
+            <dt>Language support</dt>
+            <dd>
+              Microformats and RDFa use the language of the HTML elements in the page (from the <code>@lang</code> attribute) to indicate the language of relevant values. In microdata, the <a>vocabulary</a> has to provide a separate mechanism to indicate a language. If you are consuming information about the same things from pages that use different languages, or anticipate publishers using multiple languages in their pages to describe a particular entity, you can automatically pick up the language of the content of the page if publishers use microformats or RDFa. If you consume microdata, you need to provide specific <a title="property">properties</a> in your vocabulary that publishers can use to indicate the language of the content.
+            </dd>
+          </dl>
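+          <p>
+            The following RDFa sketch illustrates both points. It assumes the schema.org <code>name</code> and <code>description</code> properties and uses invented text: the Spanish name is tagged through <code>@lang</code>, and the description keeps its markup because of <code>datatype="rdf:XMLLiteral"</code>.
+          </p>
+          <pre>&lt;div vocab=&quot;http://schema.org/&quot; typeof=&quot;Event&quot; lang=&quot;en&quot;&gt;
+  &lt;!-- inherits lang=&quot;en&quot; from the enclosing element --&gt;
+  &lt;span property=&quot;name&quot;&gt;Playoff Game 3&lt;/span&gt;
+  &lt;!-- overrides the language for this value only --&gt;
+  &lt;span property=&quot;name&quot; lang=&quot;es&quot;&gt;Partido 3 de los playoffs&lt;/span&gt;
+  &lt;!-- the paragraph and emphasis markup is preserved in the value --&gt;
+  &lt;div property=&quot;description&quot; datatype=&quot;rdf:XMLLiteral&quot;&gt;
+    &lt;p&gt;Doors open at &lt;em&gt;6:30 p.m.&lt;/em&gt;&lt;/p&gt;
+  &lt;/div&gt;
+&lt;/div&gt;</pre>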
+          <p class="issue">
+            The handling of language by microdata <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470">may change in the future</a>.
+          </p>
+        </section>
+        
+        <section>
+          <h4>Usability Considerations</h4>
+          <p>
+            Publishing data within HTML can be a challenge for publishers, simply because the structure of the data that they publish is not immediately visible within their pages. The publishers you are targeting will have different levels of skill and experience, which may influence your choice of <a>syntax</a> and the way in which you design your vocabulary. If you can, you should try to work closely with a few target publishers to better understand their requirements and constraints. Experimenting with marking up a few of their existing pages will often highlight issues with both syntax and vocabulary.
+          </p>
+          <p>
+            Some usability issues may be addressed by restricting the set of attributes that you instruct publishers how to use, or by restricting their location to provide more consistency. For example:
+          </p>
+          <ul>
+            <li><a href="http://www.w3.org/2010/02/rdfa/sources/rdfa-lite/Overview-src.html" rel="nofollow">RDFa 1.1 Lite</a> is an authoring profile of RDFa 1.1 that is sufficient for most data publishing</li>
+            <li> most microdata markup does not require <code>@itemid</code> or <code>@itemref</code></li>
+            <li>constraining data markup to the <code>head</code> of an HTML document can make it easier to author and protect it from templating changes, although it also runs the risk of getting out of sync with the content of the page, increases repetition, and is hard to use for anything but flat data structures</li>
+          </ul>
+          <p>
+            Profiling microdata and RDFa is useful for documentation, but consumers should still recognise and understand the full set of syntactic constructs described by the standards. This ensures that those publishers who find that they need the more advanced constructs to mark up their pages can do so, and means that publishers can use general-purpose tools and documentation rather than just those that you provide.
+          </p>
+        </section>
+      </section>
+      
+      <section id="consuming-multiple-formats">
+        <h3>Consuming Pages with Multiple Formats</h3>
+        <p>
+          In attempting to provide information to multiple consumers, publishers may use several <a title="format">formats</a> within a single page. Consumers should ignore data in <a title="vocabulary">vocabularies</a> that they do not recognise and raise errors only for unexpected <a title="property">properties</a> in vocabularies that they do recognise.
+        </p>
+        <p>
+          Consumers of HTML data may recognise several <a title="format">formats</a> embedded within a given page, and even within the same part of a page. In these cases, consumers should merge data from the different formats; in the example above, a consumer should recognise that the data in vEvent, hCalendar and schema.org is about a single event rather than interpreting it as three events, and should merge <a>property</a> <a title="value">values</a> so that the event ends up having a single URL rather than several. Different formats may provide information about different aspects of an <a>entity</a> to different levels of fidelity (in the example above, the schema.org RDFa provides extra details about the location of the event compared to the vEvent or hCalendar markup), and consumers should seek to use whatever gives them the most detailed information.
+        </p>
+      </section>
+      
+      <section>
+        <h3>Good Consumption Practice</h3>
+        <p>
+          It is good practice for a consumer to provide tools that help publishers to see how the data within their pages is interpreted by the consumer and that highlight any errors in the markup, such as invalid <a title="value">values</a> or missing required <a title="property">properties</a>.
+        </p>
+        <p>
+          It is good practice for consumers to ignore markup that uses <a>syntax</a> or <a title="vocabulary">vocabularies</a> that they do not understand. Properties and <a title="type">types</a> in unrecognised vocabularies should be ignored by consumers.
+        </p>
+        <p>
+          The presence of HTML data within a website does not imply that the data can be used without restriction. Publishers may license the information provided through HTML data, for example to restrict it to non-commercial use or to use only with attribution. It is good practice for a consumer to honour licenses and to indicate to publishers which <a title="format">formats</a> they recognise for expressing licensing information within HTML pages, and which licenses they recognise as indicating that the data within the page is consumable. Typical <a title="vocabulary">vocabularies</a> for expressing this information are schema.org, rel-license, Creative Commons or Dublin Core.
+        </p>
+        <p>
+          Even when the use of data is unrestricted, it is good practice for consumers to record the source of the information that they use and, when republishing that data, provide metadata about the rights holder, source and license under which the information is available, using the same <a title="vocabulary">vocabularies</a> as those listed above.
+        </p>
+      </section>
+    </section>
+
+    <section>
+      <h2>Vocabulary Authors</h2>
+      <p>
+        Designing <a title="vocabulary">vocabularies</a> is a complex craft, and this document does not cover all aspects of how to go about it. There are several existing more general resources for vocabulary creators, such as:
+      </p>
+      <ul>
+        <li><a href="http://microformats.org/wiki/process" rel="nofollow">the microformats process</a></li>
+        <li><a href="http://www.w3.org/2001/sw/interest/webschema.html" rel="nofollow">SWIG Web Schemas Task Force</a></li>
+      </ul>
+
+      <section>
+        <h3>Extending Vocabularies</h3>
+        <p>
+          There are already many <a title="vocabulary">vocabularies</a> in existence, particularly for common domains such as people, organisations, events, products, reviews, recipes and so on. Reusing these vocabularies benefits consumers because it saves design time and means they do not have to create supporting tools and materials such as validators, previewers or documentation. It also benefits publishers because it increases the likelihood that the data within their pages can be consumed by other useful tools. It is therefore good practice to extend existing vocabularies rather than creating new ones, where possible.
+        </p>
+        <p>
+          This section describes some of the issues that <a>vocabulary</a> authors who extend existing vocabularies need to be aware of.
+        </p>
+        <section>
+          <h4>Extending Microformats</h4>
+          <p>
+            Microformats are developed using an iterative process whereby proposals for extensions are <a href="http://microformats.org/wiki/process#Brainstorm_Proposals">brainstormed</a> and eventually either accepted or rejected by the microformats community. It is not appropriate to create unilateral extensions to microformats. Publishers should nevertheless use semantic classes within their HTML, whether or not those classes are used within current microformats: evidence of use of semantic classes within HTML pages is one input to the microformats standardisation process.
+          </p>
+        </section>
+        <section>
+          <h4>Extending RDF Vocabularies</h4>
+          <p>
+            RDF <a title="vocabulary">vocabularies</a>, which are used within RDFa, use IRIs for <a title="type">types</a> and <a title="property">properties</a>. Any resource in RDFa can be extended by adding new types to the <code>@typeof</code> attribute and/or adding new properties from different vocabularies. However, it is not general practice to allow RDF vocabularies themselves to be extended with new types or properties by third parties.
+          </p>
+          <p>
+            One pattern that is quite common is for one <a>vocabulary</a> to accept a string for a <a>property</a>, such as an address, and for an extension to provide more structure for that property. In this case, a useful pattern is to nest the more structured property inside the textual property within the HTML. For example:
+          </p>
+          <pre>&lt;div <strong>property=&quot;location&quot;</strong>&gt;
+  &lt;address <strong>property=&quot;http://example.org/address&quot; vocab=&quot;http://example.org/&quot; typeof=&quot;Address&quot;</strong>&gt;
+    &lt;span property=&quot;name&quot;&gt;The White House&lt;/span&gt;&lt;br&gt;
+    &lt;span property=&quot;street&quot;&gt;1600 Pennsylvania Avenue NW&lt;/span&gt;&lt;br&gt;
+    &lt;span property=&quot;city&quot;&gt;Washington&lt;/span&gt;, &lt;span property=&quot;state&quot;&gt;DC&lt;/span&gt; &lt;span property=&quot;zip&quot;&gt;20500&lt;/span&gt;
+  &lt;/address&gt;
+&lt;/div&gt;</pre>
+          <p>
+            This pattern also works for <a title="property">properties</a> whose <a title="value">values</a> are XML literals; in this case, the XML literal will include the RDFa markup.
+          </p>
+        </section>
+        <section>
+          <h4>Extending Microdata Vocabularies</h4>
+          <p>
+            Microdata items can have both <a title="property">properties</a> that are scoped to the <a>type</a> of the item and <a title="property">properties</a> that have absolute URLs. The acceptability of non-URL properties is determined by the <a>vocabulary</a> author of the type of the item; some vocabularies may define a set of acceptable properties, others say that any properties are acceptable. In all cases, however, it's possible to add properties to items if they are named with an absolute URL. Third parties who wish to extend an existing type with new properties should check the constraints of the type being extended to work out whether it's possible to use a non-URL property or not. Note that there is always a possibility, if you do use a non-URL property name, that your extension will conflict with an extension made by someone else; properties whose names are absolute URLs do not have this issue but are more verbose when used in markup.
+          </p>
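+          <p>
+            As a minimal sketch, the following adds a <a>property</a> named with an absolute URL to a schema.org item. The <code>http://example.org/vocab/nickname</code> property is hypothetical and stands in for whatever property an extension defines.
+          </p>
+          <pre>&lt;p itemscope itemtype=&quot;http://schema.org/Person&quot;&gt;
+  &lt;span itemprop=&quot;name&quot;&gt;Jane Doe&lt;/span&gt; is known to her friends as &lt;span itemprop=&quot;<strong>http://example.org/vocab/nickname</strong>&quot;&gt;JD&lt;/span&gt;.
+&lt;/p&gt;</pre>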
+          <p>
+            Microdata does not allow items to have multiple <a title="type">types</a> from different vocabularies. Some vocabularies, such as schema.org, may permit third parties to freely extend existing types within that vocabulary. In this case, items should be assigned both the supertype and the extension type within the <code>@itemtype</code> attribute. For example, schema.org describes a <a href="http://schema.org/docs/extension.html">method of extending its vocabulary</a> that involves identifying an appropriate supertype or superproperty and appending a <code>/</code> and then the name of a subtype or subproperty. Schema.org also permits anyone to create additional non-URL properties on these new types. To extend schema.org's types with a type for a member of parliament, a <a>vocabulary</a> author might use the URI <code>http://schema.org/Person/MP</code>, and mark up their page with:
+          </p>
+          <pre>&lt;p itemscope itemtype=&quot;<strong>http://schema.org/Person http://schema.org/Person/MP</strong>&quot;&gt;
+  &lt;span itemprop=&quot;<strong>name</strong>&quot;&gt;David Cameron&lt;/span&gt; is the member of parliament for &lt;span itemprop=&quot;<strong>constituency</strong>&quot;&gt;Witney&lt;/span&gt;.
+&lt;/p&gt;</pre>
+          <p>
+            Here, both <code>http://schema.org/Person</code> and <code>http://schema.org/Person/MP</code> are given as <a title="type">types</a>, and the non-URL <code>constituency</code> <a>property</a> is used despite it not being defined within the schema.org vocabulary.
+          </p>
+          <p>
+            Other microdata <a title="vocabulary">vocabularies</a> do not enable third parties to extend the vocabulary. In these cases, third parties should use a URL <a>property</a> to specify the additional <a>type</a> for the item. For compatibility with RDF, we recommend using <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</code> for this property, and using a full URL for the type. An alternative to the example above that does not use the schema.org extension mechanism would be:
+          </p>
+          <pre>&lt;p itemscope itemtype=&quot;http://schema.org/Person&quot;&gt;
+  <strong>&lt;link itemprop=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&quot; href=&quot;http://gov.example.org/uk/MP&quot;&gt;</strong>
+  &lt;span itemprop=&quot;<strong>name</strong>&quot;&gt;David Cameron&lt;/span&gt; is the member of parliament for &lt;span itemprop=&quot;<strong>http://gov.example.org/uk/constituency</strong>&quot;&gt;Witney&lt;/span&gt;.
+&lt;/p&gt;</pre>
+          <p>
+            More details about the use and limitations of this technique can be found in <a href="#mixing-vocabularies-microdata" class="sectionRef"></a>.
+          </p>
+          <p>
+            The technique described for RDFa above, of nesting a <a>property</a> that contains more structure within a property that has less, can also be used with microdata content.
+          </p>
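+          <p>
+            A sketch of that nesting in microdata might look like the following. The <code>http://example.org/</code> types and properties are hypothetical, and the sketch assumes that the vocabulary defining <code>Event</code> accepts a string for its <code>location</code> property.
+          </p>
+          <pre>&lt;div itemscope itemtype=&quot;http://example.org/Event&quot;&gt;
+  &lt;div <strong>itemprop=&quot;location&quot;</strong>&gt;
+    &lt;address <strong>itemprop=&quot;http://example.org/address&quot; itemscope itemtype=&quot;http://example.org/Address&quot;</strong>&gt;
+      &lt;span itemprop=&quot;name&quot;&gt;The White House&lt;/span&gt;&lt;br&gt;
+      &lt;span itemprop=&quot;street&quot;&gt;1600 Pennsylvania Avenue NW&lt;/span&gt;&lt;br&gt;
+      &lt;span itemprop=&quot;city&quot;&gt;Washington&lt;/span&gt;, &lt;span itemprop=&quot;state&quot;&gt;DC&lt;/span&gt; &lt;span itemprop=&quot;zip&quot;&gt;20500&lt;/span&gt;
+    &lt;/address&gt;
+  &lt;/div&gt;
+&lt;/div&gt;</pre>
+          <p>
+            Here the event item has a textual <code>location</code> property, taken from the text content of the outer <code>div</code>, as well as a structured <code>http://example.org/address</code> property whose value is the nested item.
+          </p>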
+        </section>
+      </section>
+      <section>
+        <h3>Designing Vocabularies</h3>
+        <p>
+          This section looks at the particular requirements of different HTML data <a title="syntax">syntaxes</a> on <a title="vocabulary">vocabularies</a>, and how to create vocabularies that can be used across HTML data syntaxes.
+        </p>
+        <section>
+          <h4>Syntax-Specific Requirements</h4>
+          <p>
+            Each HTML data <a>syntax</a> brings with it a set of constraints on both how <a title="vocabulary">vocabularies</a> are designed and their documentation.
+          </p>
+          <section>
+            <h5>Microformat Vocabularies</h5>
+            <p>
+              The <a href="http://microformats.org/wiki/microformats-2">microformats 2</a> page describes the constraints on the design of microformat vocabularies, and the <a href="http://microformats.org/wiki/process">microformats process</a> describes additional procedural guidelines on how to create a new microformat.
+            </p>
+          </section>
+          <section>
+            <h5>Microdata Vocabularies</h5>
+            <p>
+              Microdata <a title="vocabulary">vocabularies</a> must define, within a specification for that vocabulary, processing rules to be followed by consumers of that vocabulary, using the terms given by the <a href="http://dev.w3.org/html5/md/">microdata specification</a>. These include:
+            </p>
+            <ul>
+              <li>what <a title="type">types</a> the <a>vocabulary</a> includes</li>
+              <li>which <a title="type">types</a> support <code>@itemid</code> to provide global identifiers for items</li>
+              <li>whether and how two items described using microdata should be considered a single item by a consumer (such as when they have the same <code>@itemid</code>) and if so, how two items within an HTML page should be merged</li>
+              <li>whether URL <a title="value">values</a> that have the same value as an <code>@itemid</code> should be treated the same as if the item had been nested within the page</li>
+              <li>which non-URL <a title="property">properties</a> (<b>defined property names</b>) are permitted on each of those types, whether there are equivalent URL properties for them, and how properties will be merged if both are used</li>
+              <li>how many and what kinds of <a title="value">values</a> are allowed for each <a>property</a>, and what consumers should do if there are more or fewer values than required, how the values are parsed, and what happens when the values are of the wrong type</li>
+              <li>whether items that are the <a>value</a> of a <a>property</a> must explicitly have a <a>type</a> or if this can be inferred by consumers</li>
+              <li>what to do when an item has a <a>property</a> that it should not have</li>
+              <li>whether <a>type</a> and <a>property</a> URLs can be dereferenced</li>
+              <li>how consumers should recognise items belonging to the <a>vocabulary</a> (whether purely by <code>@itemtype</code> or through some other mechanism)</li>
+            </ul>
+            <p>
+              An example of a microdata <a>vocabulary</a> description is available for <a href="http://www.heppnetz.de/ontologies/goodrelations/v1.html#microdata" rel="nofollow">GoodRelations</a>. There are also example microdata vocabularies within the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#mdvocabs" rel="nofollow">WHATWG version of the microdata specification</a>.
+            </p>
+            <p>
+              Microdata does not support the use of the HTML <code>@lang</code> attribute to provide language information for textual values; if this is important, a microdata <a>vocabulary</a> must provide a mechanism for supplying a language separately. This can be done by:
+            </p>
+            <ul>
+              <li>having a <a>property</a> that indicates the language used in the data for the item; this only works if all the data uses the same language</li>
+              <li>defining a <code>LanguageString</code> <a>type</a> that has properties for both content and language and specifying the use of items of that type as a <a>value</a> for any appropriate property</li>
+            </ul>
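+            <p>
+              The second approach might be sketched as follows. The <code>http://example.org/</code> types and the <code>summary</code>, <code>content</code> and <code>language</code> property names are hypothetical; a real <a>vocabulary</a> would define its own.
+            </p>
+            <pre>&lt;div itemscope itemtype=&quot;http://example.org/Review&quot;&gt;
+  &lt;div itemprop=&quot;summary&quot; <strong>itemscope itemtype=&quot;http://example.org/LanguageString&quot;</strong>&gt;
+    <strong>&lt;meta itemprop=&quot;language&quot; content=&quot;fr&quot;&gt;</strong>
+    &lt;span itemprop=&quot;<strong>content</strong>&quot;&gt;Un roman remarquable&lt;/span&gt;
+  &lt;/div&gt;
+&lt;/div&gt;</pre>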
+            <p>
+              Microdata does not support structured HTML values. Where these need to be captured, <a title="vocabulary">vocabularies</a> can instead use URLs that reference fragments of HTML in the page. For example:
+            </p>
+            <pre><strong>&lt;link itemprop=&quot;breadcrumb&quot; href=&quot;#breadcrumb&quot;&gt;</strong>
+&lt;div <strong>id=&quot;breadcrumb&quot;</strong>&gt;
+  &lt;a href=&quot;category/books.html&quot;&gt;Books&lt;/a&gt; &gt;
+  &lt;a href=&quot;category/books-literature.html&quot;&gt;Literature &amp; Fiction&lt;/a&gt; &gt;
+  &lt;a href=&quot;category/books-classics&quot;&gt;Classics&lt;/a&gt;
+&lt;/div&gt;</pre>
+          </section>
+          <section>
+            <h5>RDFa Vocabularies</h5>
+            <p>
+              RDFa is used to create RDF graphs, so <a title="vocabulary">vocabularies</a> used within RDFa should bear in mind the constraints and conventions that commonly apply to RDF vocabularies. These include:
+            </p>
+            <ul>
+              <li><a title="type">types</a> should be named using CapitalCamelCase, and <a title="property">properties</a> using lowerCamelCase</li>
+              <li><a title="type">types</a> and <a title="property">properties</a> in the same <a>vocabulary</a> should share an IRI prefix (the vocabulary IRI), which should end in a <code>#</code> or a <code>/</code>; the local part of a <a>type</a> or property IRI, after this prefix, should be a valid <a href="http://www.w3.org/TR/REC-xml-names/#NT-NCName">NCName</a> so that it can be used within RDF/XML serialisations</li>
+              <li>the IRIs used for <a title="type">types</a> and <a title="property">properties</a> should resolve into documentation and/or (through content negotiation) an <a href="http://www.w3.org/TR/rdf-schema/">RDFS schema</a> or <a href="http://www.w3.org/TR/owl-overview/">OWL ontology</a> that describes the types and properties</li>
+            </ul>
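+            <p>
+              As a sketch of these conventions in use, the markup below assumes a hypothetical <a>vocabulary</a> whose IRI is <code>http://example.org/vocab#</code>. The <code>BlogPost</code> type and the <code>headline</code> and <code>datePublished</code> properties are invented for illustration.
+            </p>
+            <pre>&lt;div <strong>vocab=&quot;http://example.org/vocab#&quot; typeof=&quot;BlogPost&quot;</strong>&gt;
+  &lt;h1 property=&quot;<strong>headline</strong>&quot;&gt;HTML Data Guide&lt;/h1&gt;
+  Published on &lt;span property=&quot;<strong>datePublished</strong>&quot; content=&quot;2011-12-10&quot;&gt;10 December 2011&lt;/span&gt;.
+&lt;/div&gt;</pre>
+            <p>
+              With this markup, the type resolves to <code>http://example.org/vocab#BlogPost</code> and the properties to <code>http://example.org/vocab#headline</code> and <code>http://example.org/vocab#datePublished</code>, each of which has a local part that is a valid NCName.
+            </p>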
+            <p>
+              More guidelines and patterns for modelling using RDF are available within <a href="http://patterns.dataincubator.org/book/modelling-patterns.html" rel="nofollow">Linked Data Patterns</a>.
+            </p>
+          </section>
+        </section>
+        <section>
+          <h4>Syntax-Neutral Vocabularies</h4>
+          <p>
+            Syntax-neutral <a title="vocabulary">vocabularies</a> must have variants for each <a>syntax</a> that meet the requirements for the syntax as described above, but the capabilities of each variant do not have to be identical.
+          </p>
+          <p>
+            For example, a syntax-neutral review <a>vocabulary</a> could specify a required <code>reviewLanguage</code> <a>property</a> to give the language of a review in microdata, but say that if microformats or RDFa were used, and this were left unspecified, the language would be assumed. Publishers who had content that included multiple languages in the review itself (which couldn't be represented using a property providing a language for the entire review) would be able to use microformats or RDFa to mark up the review.
+          </p>
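+          <p>
+            The two variants of such a review <a>vocabulary</a> might be used as sketched below. The <code>http://example.org/</code> vocabulary and the <code>reviewBody</code> property are hypothetical. In microdata, the language is supplied through the <code>reviewLanguage</code> property:
+          </p>
+          <pre>&lt;div itemscope itemtype=&quot;http://example.org/Review&quot;&gt;
+  <strong>&lt;meta itemprop=&quot;reviewLanguage&quot; content=&quot;en&quot;&gt;</strong>
+  &lt;p itemprop=&quot;reviewBody&quot;&gt;A gripping read.&lt;/p&gt;
+&lt;/div&gt;</pre>
+          <p>
+            In RDFa, the same review could instead rely on the HTML <code>@lang</code> attribute:
+          </p>
+          <pre>&lt;div vocab=&quot;http://example.org/&quot; typeof=&quot;Review&quot; <strong>lang=&quot;en&quot;</strong>&gt;
+  &lt;p property=&quot;reviewBody&quot;&gt;A gripping read.&lt;/p&gt;
+&lt;/div&gt;</pre>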
+          <p>
+            There are a number of measures that make it easier for <a title="vocabulary">vocabularies</a> to be used across <a title="syntax">syntaxes</a> in ways that make it easier for consumers to combine data whichever <a>syntax</a> is used.
+          </p>
+          <dl>
+            <dt>Naming Conventions</dt>
+            <dd>
+              Adopt consistent names across <a title="syntax">syntaxes</a>, even if the naming conventions of the syntaxes differ. For example, microformats use lowercase-hyphenated-names whereas RDF uses lowerCamelCase; all that is needed is a clear mapping between them. Although microdata allows defined <a>property</a> names to contain any character except <code>:</code> and <code>.</code>, non-URL properties should have names that are <a href="http://www.w3.org/TR/REC-xml-names/#NT-NCName">NCNames</a> so that they can be used in microformats and RDFa. Because microdata does not allow <code>.</code> in defined property names, <code>.</code>s should be avoided in these names even though NCNames permit them.
+            </dd>
+            <dt>Entity Identity</dt>
+            <dd>
+              Microformats and microdata have a limited notion of <a>entity</a> identity: entities may have identifiers (in microdata, from the <code>@itemid</code> attribute) but these are not used within the data model to combine entities or link them together into graphs. Syntax-neutral <a title="vocabulary">vocabularies</a> use the RDF concept of identity whereby entities with the same identifier are the same entity, and references to that entity's identifier serve to create a graph of entities. This should be reflected in the definition of the microdata variant of the vocabulary, which should allow <code>@itemid</code> on all items, and specify that consumers should combine and link items to create a graph.
+            </dd>
+          </dl>
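+          <p>
+            As a sketch of how this might appear in the microdata variant, the markup below assumes a hypothetical <code>http://example.org/</code> <a>vocabulary</a> that allows <code>@itemid</code> on all items and tells consumers to treat a URL <a>value</a> that matches an <code>@itemid</code> as a reference to that item.
+          </p>
+          <pre>&lt;div itemscope itemtype=&quot;http://example.org/Book&quot; <strong>itemid=&quot;http://example.org/books/1&quot;</strong>&gt;
+  &lt;span itemprop=&quot;title&quot;&gt;An Example Book&lt;/span&gt;
+  <strong>&lt;link itemprop=&quot;author&quot; href=&quot;http://example.org/people/jane&quot;&gt;</strong>
+&lt;/div&gt;
+
+&lt;div itemscope itemtype=&quot;http://example.org/Person&quot; <strong>itemid=&quot;http://example.org/people/jane&quot;</strong>&gt;
+  &lt;span itemprop=&quot;name&quot;&gt;Jane Doe&lt;/span&gt;
+&lt;/div&gt;</pre>
+          <p>
+            Under those assumptions, a consumer would link the book's <code>author</code> property to the person item, giving the same graph that equivalent RDFa markup would produce.
+          </p>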
+        </section>
+        <section>
+          <h4>Good Vocabulary Design Practices</h4>
+          <p>
+            It is good practice for <a>vocabulary</a> creators to collaborate with others who are consuming or publishing information in the relevant domains in order to create a vocabulary that can be used widely across an industry.
+          </p>
+          <p>
+            It is good practice for <a>vocabulary</a> creators to make available a validation tool that enables publishers who use a vocabulary to check that their HTML pages contain data that is valid against that vocabulary.
+          </p>
+          <p>
+            It is good practice for <a>vocabulary</a> creators to make available test suites that enable implementers to check the behaviour of their implementations. These test suites should cover error handling as well as the correct interpretation of valid data.
+          </p>
+        </section>
+      </section>
+    </section>
+    
+    <section class='appendix'>
+      <h2>Acknowledgements</h2>
+      <p>
+        Many thanks to the members of the HTML Data Working Group for their contributions to this document.
+      </p>
+    </section>
+  </body>
+</html>