Merge with Layers
authorGavin Carothers <gavin@carothers.name>
Wed, 02 May 2012 06:46:59 -0700
changeset 305 30aac7b208b6
parent 304 5d5514de1998 (current diff)
parent 303 7ff029f6d89d (diff)
child 307 03865859e9d5
child 375 526e8f594e99
Merge with Layers
--- a/README.txt	Wed May 02 06:46:29 2012 -0700
+++ b/README.txt	Wed May 02 06:46:59 2012 -0700
@@ -35,6 +35,9 @@
 cd rdf
 hg push
 
+# (If you get the error "did you forget to merge? use push -f to force"
+# then you'll need to do a merge.  DO NOT use push -f.)
+
 # To receive changes made by others into your local repository:
 cd rdf
 hg pull
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/rdf-layers/index.html	Wed May 02 06:46:59 2012 -0700
@@ -0,0 +1,789 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>RDF Data Layers and Datasets</title>
+    <style type="text/css">
+.figure { font-weight: bold; text-align: center; }
+table.xsd-types td, table.xsd-types th { border: 1px solid #ddd; padding: 0.1em 0.5em; }
+    </style>
+    <script src='../ReSpec.js/js/respec.js' class='remove'></script>
+    <script class='remove'>
+      var respecConfig = {
+          // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
+          specStatus:           "ED",
+          
+          // the specification's short name, as in http://www.w3.org/TR/short-name/
+          shortName:            "rdf-layers",
+
+          // if your specification has a subtitle that goes below the main
+          // formal title, define it here
+          // subtitle   :  "an excellent document",
+
+          // if you wish the publication date to be other than today, set this
+          // publishDate:  "2009-08-06",
+
+          // if the specification's copyright date is a range of years, specify
+          // the start date here:
+          //          copyrightStart: "2012",
+
+          // if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
+          // and its maturity status
+//          previousPublishDate:  "2004-02-10",
+//          previousMaturity:  "REC",
+
+          // if there a publicly available Editor's Draft, this is the link
+//@@@
+          edDraftURI:           "http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-layers/index.html",
+
+          // if this is a LCWD, uncomment and set the end of its review period
+          // lcEnd: "2009-08-05",
+
+          // if there is an earler version of this specification at the Recommendation level,
+          // set this to the shortname of that version. This is optional and not usually
+          // necessary.
+          //          prevRecShortname: "rdf-concepts",
+
+          // if you want to have extra CSS, append them to this list
+          // it is recommended that the respec.css stylesheet be kept
+          extraCSS:             ["http://dvcs.w3.org/hg/rdf/raw-file/default/ReSpec.js/css/respec.css"],
+
+          // editors, add as many as you like
+          // only "name" is required
+          editors:  [
+              { name: "TBD", url: "",
+                company: "", companyURL: "",
+              },
+          ],
+          otherContributors: {
+              "Contributor": [
+	      { name: "Sandro Hawke", url:"http://www.w3.org/People/Sandro",
+	      company:"W3C", companyURL: "http://www.w3.org", note:"Initial text"},
+	      { name: "Dan Brickley", note:"Initial diagram" }
+	      ]
+          },
+
+          // authors, add as many as you like. 
+          // This is optional, uncomment if you have authors as well as editors.
+          // only "name" is required. Same format as editors.
+
+          //authors:  [
+          //    { name: "Your Name", url: "http://example.org/",
+          //      company: "Your Company", companyURL: "http://example.com/" },
+          //],
+          
+          // name of the WG
+          wg:           "RDF Working Group",
+          
+          // URI of the public WG page
+          wgURI:        "http://www.w3.org/2011/rdf-wg/",
+          
+          // name (with the @w3c.org) of the public mailing to which comments are due
+          wgPublicList: "public-rdf-comments",
+          
+          // URI of the patent status for this WG, for Rec-track documents
+          // !!!! IMPORTANT !!!!
+          // This is important for Rec-track documents, do not copy a patent URI from a random
+          // document unless you know what you're doing. If in doubt ask your friendly neighbourhood
+          // Team Contact.
+          wgPatentURI:  "http://www.w3.org/2004/01/pp-impl/46168/status",
+
+          // if this parameter is set to true, ReSpec.js will embed various RDFa attributes
+          // throughout the generated specification. The triples generated use vocabulary items
+          // from the dcterms, foaf, and bibo. The parameter defaults to false.
+          doRDFa: true,
+      };
+
+// @@@ A number of references have been patched into the local berjon.biblio and need to be added to the global biblio in CVS:
+    </script>
+  </head>
+
+  <body>
+
+<section id="abstract">
+  <p>This specification introduces the notion of RDF <i>data
+  layers</i>&mdash;places to store RDF triples&mdash;and defines a set
+  of languages for expressing information about them.  Examples of
+  data layers include: an HTML page with embedded RDFa or microdata, a
+  file containing RDF/XML or Turtle data, and a SQL database viewable
+  as RDF using R2RML.  The term "layer" is a metaphor: we imagine a
+  layer as transparent surfaces on which RDF graphs can be drawn and
+  then easily combined, still retaining their own identity and
+  properties.  RDF data layers are a generalization of SPARQL's named
+  graphs, providing a standard model with formal semantics for systems
+  which manage multiple collections of RDF data. </p>
+</section>
+
+
+<section class="informative">
+    <h2>Introduction</h2>
+
+    <p>@@@ background, context, and general motivation ... </p>
+
+    <p>The name "layer" is chosen to evoke the image of a transparent
+    surface on which an RDF Graph can be temporarily drawn and then
+    easily aligned with other layers to make a larger, combined
+    graph. Sometimes we work with the combined graph, but when we want
+    to, we can still separate the layers. This is particularly
+    important if the data come from different sources which might
+    potentially update their contribution; we need to be able to erase
+    and redraw just their layer.</p>
+
+    <img src="layers.jpg" alt="RDF graphs drawn on stacked transparent layers" />
+
+    <p class="note">Eric suggests that only the arcs should be drawn
+    on the layers; the nodes are fundamental, underneath, projecting
+    up through all the layers.</p>
+
+
+</section>
+
+<section>
+  <h2>Concepts</h2>
+
+  <section>
+    <h2>Layer</h2>
+
+    <p class="issue">The term "Layer" is a placeholder.  The final
+    term has not yet been selected by the Working Group.  Other
+    candidates include "G-Box", "(Data) Surface", "(Data) Space".  The
+    Editors do not consider <a
+    href="http://www.w3.org/2011/rdf-wg/meeting/2011-10-12#resolution_1">F2F2
+    Resolution 1</a> still binding, given the degree to which the concept
+    and its role have shifted since that meeting.</p>
+
+    <p>An RDF <dfn>data layer</dfn> or just <dfn>layer</dfn> is a
+    conceptual place where RDF triples can be stored.  Examples include:
+    </p>
+    
+    <ul>
+
+      <li>a human-readable Web page, such as an HTML page containing
+      RDFa markup, microdata markup, or embedded Turtle.</li>
+
+      <li>a file, in a computer's filesystem, containing RDF data
+      expressed in RDF/XML, N-Triples, Turtle, etc.</li>
+
+      <li>a machine-readable Web page containing RDF data expressed in
+      RDF/XML, N-Triples, Turtle, etc.</li>
+
+      <li>a SQL database which provides an RDF view of its data,
+      perhaps using R2RML</li>
+
+      <li>the default graph or any of the named graphs available via a
+      SPARQL endpoint</li>
+    </ul>
+
+    <p>Formally, a layer is a function from points in time to RDF
+    Graphs.  This matches the notion of a container or a place or a
+    writing surface: at any given point in time (for which the
+    function is defined), there is a set of RDF triples in that spot.
+    Since an RDF Graph is defined as a set of RDF triples, the triples
+    there at time t form the graph which is the output of the function
+    for input t.</p>
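The function-of-time formalism above can be sketched in a few lines of illustrative Python (an assumption for exposition, not part of the specification): a layer maps each point in time to the set of triples it held then.

```python
from datetime import datetime

# A minimal sketch: a layer as a function from points in time to RDF
# graphs, where a graph is a frozenset of (subject, predicate, object)
# triples.  The timestamps and triples are illustrative.

class Layer:
    def __init__(self):
        self._history = []  # sorted list of (time, frozenset-of-triples)

    def set_graph(self, at, triples):
        self._history.append((at, frozenset(triples)))
        self._history.sort(key=lambda pair: pair[0])

    def graph_at(self, at):
        """The function's output: the graph held at time `at` (or None)."""
        current = None
        for t, graph in self._history:
            if t <= at:
                current = graph
        return current

layer = Layer()
layer.set_graph(datetime(2012, 1, 1), {("ex:a", "ex:p", "ex:b")})
layer.set_graph(datetime(2012, 6, 1), {("ex:a", "ex:p", "ex:c")})
```

Between the two writes the function returns the first graph; before the first write it is undefined, matching "for which the function is defined" above.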
+
+    <p>Web resources which do not behave as a function of time &mdash;
+    such as pages which provide different data to different users at
+    the exact same time &mdash; are not layers.  See also <a>static
+    layer</a> and <a>layer switch</a>.</p>
+
+  </section>
+
+  <section>
+    <h2>Dataset</h2>
+
+    <p>A <dfn>dataset</dfn> is defined by <a
+    href="http://www.w3.org/TR/sparql11-query/#rdfDataset">SPARQL
+    1.1</a> as a structure consisting of:</p>
+    
+    <ol>
+      
+      <li>A distinguished RDF Graph called the <dfn>default graph</dfn></li>
+      
+      <li>A set of (<i>name</i>, <i>graph</i>) pairs, where
+      <i>name</i> is an IRI and the <i>graph</i> is an RDF Graph.  No
+      two pairs in a dataset may have the same <i>name</i>.</li>
+      
+    </ol>
+    
+    <p>This definition forms the basis of the SPARQL Query semantics;
+    each query is performed against the information in a specific
+    dataset.</p>
+    
+    <p>A dataset is a pure mathematical structure, like an RDF Graph
+    or a set of integers, with no identity apart from its contents.
+    Two datasets with the same contents are in fact the same dataset,
+    and one dataset cannot change over time.  Logically, a dataset
+    cannot have different contents tomorrow; if it is changed, it is a
+    different dataset.  By analogy, one cannot change the set of prime
+    numbers or the set {1,3,5}; datasets, like these sets of integers,
+    are immutable, eternal concepts.</p>
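The immutability point can be made concrete with a small sketch (an illustrative encoding, not anything defined by SPARQL): if a dataset is built only from immutable values, two datasets with equal contents are indistinguishable.

```python
# A hedged sketch: a dataset as a pure value -- one default graph plus
# (name, graph) pairs with unique names.  The tuple/frozenset encoding
# is an assumption made for illustration.

def make_dataset(default_graph, named):
    names = [name for name, _ in named]
    assert len(names) == len(set(names)), "no two pairs may share a name"
    return (frozenset(default_graph),
            tuple(sorted((name, frozenset(graph)) for name, graph in named)))

d1 = make_dataset({("ex:s", "ex:p", "ex:o")},
                  [("http://example.org/g1", {("ex:a", "ex:b", "ex:c")})])
d2 = make_dataset({("ex:s", "ex:p", "ex:o")},
                  [("http://example.org/g1", {("ex:a", "ex:b", "ex:c")})])
assert d1 == d2  # same contents, same dataset
```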
+
+    <p>The word <strong>"default"</strong> in the term "default graph"
+    refers to the fact that in SPARQL, this is the graph a server uses
+    to perform a query when the client does not specify the name of a
+    graph to use.  The term is not related to the idea of a graph
+    containing default (overridable) information.  Logically, the role
+    of the default graph is such that it could reasonably be thought
+    of as containing the dataset's metadata.</p>
+
+  </section>
+
+  <section>
+    <h2>Graph Store</h2>
+    
+    <p>SPARQL 1.1 Update defines a mutable (time-dependent) structure
+    corresponding to a <a>dataset</a>, called a <dfn>graph store</dfn>.
+    It is defined as:</p>
+
+    <ol>
+
+      <li>A distinguished slot for an RDF Graph</li>
+      
+      <li>A set of (<i>name</i>, <i>slot</i>) pairs, where the slot holds an RDF Graph
+      and the name is an IRI.  No two pairs in a graph store may have the same <i>name</i>.</li>
+      
+    </ol>
+    
+    <p>The term "slot" is synonymous with "layer".</p>
+
+    <p>Formally, a graph store can be seen as a function from points
+    in time to <a>dataset</a>s.</p>
+
+  </section>
+
+  <section>
+    <h2>Named Graph</h2>
+
+    <p>SPARQL formally defines the term <dfn>named graph</dfn>,
+    following [Carroll], as one of the (name, graph) pairs in a
+    <a>dataset</a>.</p>
+
+    <p>In practice, the term is used more loosely to refer to the
+    graph part of those pairs or to the slot part of the pairs in a
+    <a>graph store</a>.  The text of <a
+    href="http://www.w3.org/TR/2012/WD-sparql11-update-20120105/">SPARQL
+    1.1 Update</a> does this, for example, as seen in phrases like,
+    "This example copies triples from one named graph to another named
+    graph".</p>
+
+    <p>We continue that practice, using "named graph" to refer
+    to either the graph part of the (name, graph) pairs of a dataset,
+    or the corresponding slot part of a graph store.  Note that even in
+    this loose usage, the default graph of a dataset and its
+    corresponding slot in a graph store are never called named
+    graphs.</p>
+
+  </section>
+
+  <section>
+    <h2>Quad</h2>
+
+    <p>We define an RDF <dfn>quad</dfn> as the 4-tuple
+    (<i>subject</i>, <i>predicate</i>, <i>object</i>,
+    <i>layer</i>).</p>
+
+    <p>Informally, a quad should be understood as a statement that the
+    RDF triple (<i>subject</i>, <i>predicate</i>, <i>object</i>) is in
+    the <a>layer</a> <i>layer</i>.</p>
+
+  </section>
+
+  <section>
+    <h2>Quadset</h2>
+
+    <p>We define an RDF <dfn>quadset</dfn> as a set containing (zero or more) RDF Quads and (zero or more) RDF Triples.</p>
+
+    <p>Quadsets and <a>dataset</a>s are isomorphic and semantically
+    equivalent:</p>
+
+    <ul>
+
+      <li>the triples in the quadset correspond to the triples in the
+     default graph of the dataset;</li>
+
+     <li>each quad corresponds to a triple in a named graph: the quad (S P
+     O L) corresponds to the triple (S P O) in the graph with the name
+     L.</li>
+
+    </ul>
+
+    <p>Datasets and quadsets can thus be used interchangeably, with
+    the more appropriate one being used in any particular context.</p>
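The correspondence above can be sketched as a pair of conversions (illustrative Python; the tuple encoding of triples and quads is an assumption):

```python
# Quadsets mix 3-tuples (triples) and 4-tuples (quads); a dataset is
# (default_graph, {name: graph}).  The two round-trip losslessly.

def quadset_to_dataset(quadset):
    default, named = set(), {}
    for item in quadset:
        if len(item) == 3:
            default.add(item)                       # triple -> default graph
        else:
            s, p, o, layer = item                   # quad -> named graph
            named.setdefault(layer, set()).add((s, p, o))
    return default, named

def dataset_to_quadset(default, named):
    quadset = set(default)
    for name, graph in named.items():
        quadset |= {(s, p, o, name) for (s, p, o) in graph}
    return quadset

qs = {("ex:a", "ex:p", "ex:b"),
      ("ex:c", "ex:q", "ex:d", "http://example.org/layer1")}
default, named = quadset_to_dataset(qs)
```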
+
+  </section>
+
+  <section>
+    <h2>Static Layer</h2>
+
+    <p>A <dfn>static layer</dfn> is a <a>layer</a> which, by definition,
+    contains the same triples at all points in time.</p>
+    
+  </section>
+
+
+  <section>
+    <h2>Layer Switch</h2>
+
+    <p>A <dfn>layer switch</dfn> is a Web Resource which acts as a
+    different <a>layer</a> when responding to different client
+    information, such as the value of a cookie or the client IP
+    address.</p>
+
+    <p>Systems which use layer switches SHOULD also define working
+    URLs for each of the layers the switch can act as, and they SHOULD
+    provide that URL to the client via the Content-Location
+    header.</p>
+
+    <p>For example, the URL
+    <tt>http://example.org/my-account-balance</tt> might return the
+    account balance of the logged-in user.  If the user is logged in
+    as "alice", the system might return a Content-Location of
+    <tt>http://example.org/my-account-balance?user=alice</tt>.  The
+    system could reasonably be designed to allow Alice to use that URL as
+    well; for other users it would give <tt>403 Forbidden</tt>.</p>
+
+  </section>
+</section>
+
+<section>
+  <h2>Use Cases</h2>
+
+  <p>TBD.  See <a href="#detailed-example" class="sectionRef"></a> for now.</p>
+
+</section>
+
+
+<section id="syntax">
+  <h2>Dataset Languages</h2>
+
+  <section>
+    <h3>N-Quads</h3>
+
+  </section>
+
+
+  <section>
+    <h3>TriG</h3>
+
+  </section>
+
+
+  <section>
+    <h3>Turtle in HTML</h3>
+
+    <div class="note">
+      <p>This is highly speculative.  An interesting idea.</p>
+
+      <p>The key challenge is: can we get everyone to agree that when
+      the script tag has an id attribute, those contents are
+      <em>not</em> asserted?  Or maybe we need something like
+      class="unasserted"...  but do we have the right to say either of
+      those things?  Maybe in the Turtle spec...</p>
+    </div>
+    
+    <p>When text/turtle is used inside HTML script tags, if the script
+    tag has an id attribute, then that content goes into a named graph
+    (whose name is the URL of that script element, formed from the page
+    URI and the id as a fragment).  The rest goes into the
+    default graph.</p>
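Since the idea is speculative, here is only a rough extraction sketch using Python's standard <tt>html.parser</tt> (the class name, base URI, and fragment scheme are assumptions for illustration):

```python
from html.parser import HTMLParser

# Collect text/turtle script contents: scripts with an id go to a named
# graph (page URI + "#" + id); scripts without an id go to the default
# graph, keyed here by the empty string.

class TurtleScriptCollector(HTMLParser):
    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.graphs = {}        # graph name -> accumulated turtle text
        self._current = None    # active graph name while inside a script

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "text/turtle":
            frag = attrs.get("id")
            self._current = self.base_uri + "#" + frag if frag else ""

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.graphs[self._current] = self.graphs.get(self._current, "") + data

page = """<script type="text/turtle"><ex:a> <ex:p> <ex:b> .</script>
<script type="text/turtle" id="g1"><ex:c> <ex:p> <ex:d> .</script>"""
collector = TurtleScriptCollector("http://example.org/page")
collector.feed(page)
```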
+
+  </section>
+
+
+
+</section>
+
+<section>
+  <h2>Semantics</h2>
+
+  <p>formalize what's been said above.</p>
+
+  <p>We now give <a>dataset</a>s declarative semantics, allowing
+  datasets to be used as logical statements (like RDF Graphs).  We
+  define a dataset as being true if and only if (1) its default graph
+  is true, and (2) for every (<i>name</i>, <i>graph</i>) pair in the
+  dataset, the layer denoted by <i>name</i> contains every triple in
+  <i>graph</i>.</p>
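The truth condition can be restated as a small check (a hedged sketch: "the world" is modelled here as a set of true triples plus the actual contents of each layer, which is an assumption for illustration only):

```python
# A dataset is true iff (1) its default graph's triples are all true,
# and (2) each named graph is a subset of the layer its name denotes.

def dataset_is_true(default_graph, named_graphs, true_triples, layers):
    if not default_graph <= true_triples:               # condition (1)
        return False
    return all(graph <= layers.get(name, set())         # condition (2)
               for name, graph in named_graphs.items())

layers = {"http://example.org/g1": {("ex:a", "ex:p", "ex:b"),
                                    ("ex:a", "ex:p", "ex:c")}}
true_triples = {("ex:x", "ex:q", "ex:y")}
```

Note that condition (2) is containment, not equality: a named graph holding only some of a layer's triples does not make the dataset false.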
+
+</section>
+
+<section>   <!-- I don't like what respec does with id=conformance -->
+  <h2>Conformance</h2>
+
+  <p>Something like: If a layer has a URL, and you get a 200 response
+  dereferencing that URL, then the response MUST be a serialization of
+  the triples on that layer, give or take synchronization (caching)
+  issues. </p>
+
+  <p>What else?</p>
+
+</section>
+
+<section class="informative">
+  <h2>Detailed Example</h2>
+
+  <p>This section presents a design for using layers in constructing a
+  federated information system.  It is intended to help explain and
+  motivate RDF <a>data layer</a>s.</p>
+
+  <section>
+    <h3>A Federated Phonebook</h3>
+
+    <p>As a first example of how to use layers, consider an
+    organization which has 25 different divisions around the world,
+    each with its own system for managing the list of its employees
+    and their phone numbers.  The parent organization wants to create
+    a unified "HQ" directory with all this information.  With their HQ
+    directory, they will be able to look up an employee's phone number
+    without first knowing the employee's division. </p>
+
+    <p>They decide to use RDF layers.  Each division is asked to
+    publish its phonebook on an internal website, in a W3C-Recommended
+    RDF syntax, using <a href="http://www.w3.org/TR/vcard-rdf/">the
+    vcard-rdf vocabulary</a>.  Each division submits the URL at which
+    this file will appear.  For example, the uswest division might
+    publish the RDF version of its phonebook at
+    <tt>http://uswest.internal.example.com/employees.rdf</tt> and the
+    Japan division might publish theirs at
+    <tt>http://ja.example.com/hr/data/export371</tt>.  The URL itself
+    doesn't matter, but the division must be able to maintain the
+    content served there and HQ must be able to easily fetch the
+    content. </p>
+
+    <p>The HQ staff assembles this list of 25 feed URLs and puts them
+    into the default graph of a SPARQL database, so the database looks
+    like this: </p>
+
+<pre>   @prefix hq: &lt;<a href="http://example.com/hq-vocab#">http://example.com/hq-vocab#</a>&gt;.
+   # default graph
+   {
+      hq:parentCo hq:division hq:div1, hq:div2, hq:div3, ...
+      &lt;<a href="http://uswest.internal.example.com/employees.rdf">http://uswest.internal.example.com/employees.rdf</a>&gt; 
+         hq:feedFrom hq:div1.
+      &lt;<a href="http://ja.example.com/hr/data/export371">http://ja.example.com/hr/data/export371</a>&gt;
+         hq:feedFrom hq:div2.
+      ...
+   }
+</pre>
+<p>Then they write a simple Web client which looks in the database for
+those feed URLs, dereferences them, and parses the RDF.  It then puts
+the parse-result into the database in a layer whose name is the same as the
+name of the feed.  This makes sense, because in this deployment each
+feed is considered to be a layer; the name of the feed is the same as
+the name of the layer.  The HQ client is copying data about the layer
+from the division databases to the HQ database, but it's still the
+same information about the same layers.
+</p>
+
+<p>For performance reasons, the client is designed to use HTTP
+caching information.  This will allow it to efficiently re-fetch the
+information only when it has changed.  To make this work, the client will need
+to store the value of the "Last-Modified" HTTP header and also store
+(or compute, in some configurations) the value of the "Expires" header.
+</p>
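The refresh policy just described can be sketched as follows (a hypothetical helper, not part of any specification: skip a feed until its stored Expires time passes, then send If-Modified-Since built from the stored Last-Modified so an unchanged feed costs only a 304 response):

```python
from email.utils import parsedate_to_datetime

def refresh_headers(stored, now):
    """Return request headers for a refresh, or None while still fresh."""
    expires = parsedate_to_datetime(stored["Expires"])
    if now < expires:
        return None                                  # cached copy is fresh
    return {"If-Modified-Since": stored["Last-Modified"]}

stored = {"Last-Modified": "Wed, 14 Mar 2012 02:22:10 GMT",
          "Expires": "Sun, 29 Apr 2012 00:15:00 GMT"}
early = parsedate_to_datetime("Sat, 28 Apr 2012 00:00:00 GMT")
late = parsedate_to_datetime("Mon, 30 Apr 2012 00:00:00 GMT")
```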
+
+<p>In the end, the database looks something like this:
+</p>
+<pre> @prefix hq: &lt;<a href="http://example.com/hq-vocab#">http://example.com/hq-vocab#</a>&gt;.
+ @prefix v:  &lt;<a href="http://www.w3.org/2006/vcard/ns#">http://www.w3.org/2006/vcard/ns#</a>&gt;.
+ @prefix ht: &lt;<a href="http://example.org/http-vocab#">http://example.org/http-vocab#</a>&gt;.
+ @prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;.
+ &lt;<a href="http://uswest.internal.example.com/employees.rdf">http://uswest.internal.example.com/employees.rdf</a>&gt; {
+    # an employee
+    [ a v:VCard&nbsp;;
+      v:fn "John Wayne"&nbsp;;
+      v:email "[email protected]"&nbsp;;
+      v:tel [ a v:Work, v:Pref&nbsp;;
+              rdf:value "+213 555 5555" ]
+    ]
+    # another employee
+    ...
+ }
+ &lt;<a href="http://ja.example.com/hr/data/export371">http://ja.example.com/hr/data/export371</a>&gt; {
+    # an employee
+    [ a v:VCard&nbsp;;
+      v:fn "Toshiro Mifune"&nbsp;;
+      v:email "[email protected]"&nbsp;;
+      v:tel [ a v:Work, v:Pref&nbsp;;
+              rdf:value "+81 75 555 5555" ]
+    ]
+    # another employee
+    ...
+ }
+ ...    other divisions
+ # default graph, with all our metadata
+ {
+   hq:parentCo hq:division hq:div1, hq:div2, hq:div3, ...
+   # stuff we need to know to efficiently keep our copy in sync
+   &lt;<a href="http://uswest.internal.example.com/employees.rdf">http://uswest.internal.example.com/employees.rdf</a>&gt; 
+     hq:feedFrom hq:div1;
+     ht:last-modified "2012-03-14T02:22:10"^^xsd:dateTime;
+     ht:expires "2012-04-29T00:15:00"^^xsd:dateTime.
+   &lt;http://ja.example.com/hr/data/export371&gt; 
+     hq:feedFrom hq:div2;
+     ht:last-modified "2012-04-01T22:00:00"^^xsd:dateTime;
+     ht:expires "2012-04-29T00:35:00"^^xsd:dateTime.
+ }
+</pre>
+<p>The URL of each layer appears in four different roles in this example:
+</p>
+
+<p>1.  It is used as a label for a graph.  Here, it says which layer the
+triples in that graph are in.  That is, the triples about employee
+"John Wayne" are in the layer named
+"http://uswest.internal.example.com/employees.rdf".  Information about
+what triples are in that layer originates in the master database for
+each division, then is copied to the slave database at HQ.
+</p>
+
+<p>2.  It is used as the subject of an hq:feedFrom triple.  This
+information is manually maintained (or maintained through a corporate
+WebApp) and used to help guide the HQ fetching client.  Because in
+this deployment we are equating layers and feeds, the name of the
+layer is also the URL of the feed.
+</p>
+
+<p>3.  It is used as the subject of an ht:last-modified triple.  The
+information in these triples comes from the HTTP Last-Modified header.
+The meaning of this header in HTTP lines up well with its intuitive
+meaning here: this is the most recent time the set of triples in this
+layer changed.  (This header can be used during a refresh, with the
+If-Modified-Since header, to make a client refresh operation very
+simple and fast if the data has not changed.)
+</p>
+
+<p>4.  It is used as the subject of an ht:expires triple.  This
+information also comes from HTTP headers, although some computation
+may be needed to turn it into the absolute datetime form used here.
+Strictly speaking, what is expiring at that time is this copy of the
+information about the layer, not the layer itself.  This slight
+conflation seems like a useful and unproblematic simplification.
+</p>
+
+<p>Given this design, it is straightforward to write SPARQL queries to
+find the phone number of an employee, no matter what their division.
+It is also easy to find out which layer is about to
+expire or has expired and should be refreshed soon.  
+</p>
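To show that these queries really are straightforward, here is the phone-number lookup written as a plain scan over quads (a deployment would use SPARQL; the quad encoding, predicate names, and blank-node labels are illustrative assumptions):

```python
# Find an employee's phone number(s) by formatted name, regardless of
# which division's layer the triples live in: follow v:fn back to the
# vcard node, then v:tel to the tel node, then rdf:value.

def phone_for(quads, name):
    people = {s for s, p, o, _ in quads if p == "v:fn" and o == name}
    tels = {o for s, p, o, _ in quads if p == "v:tel" and s in people}
    return sorted(o for s, p, o, _ in quads if p == "rdf:value" and s in tels)

FEED = "http://uswest.internal.example.com/employees.rdf"
quads = {("_:e1", "v:fn", "John Wayne", FEED),
         ("_:e1", "v:tel", "_:t1", FEED),
         ("_:t1", "rdf:value", "+213 555 5555", FEED)}
```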
+
+<p>Some alternative designs:
+</p>
+<ul>
+<li>Divisions could push their data, instead of waiting to be polled.  That is, the divisions could be given write access to the HQ database and do SPARQL UPDATE operations on their own layers.  This is simpler in some ways but may require more expertise from people in each division.  It also requires trusting people in each division, or having a SPARQL server that can be configured to grant certain users write access to only certain layers.  This also turns HQ into more of a bottleneck and single point of failure.  With the polling approach, other systems could be given the list of feed URLs and then offer an alternative combined directory, or use the same data for other purposes, without any involvement from the divisions.</li>
+<li>The HQ client could fetch or query all the divisions at query time, rather than gathering the data in advance.  This might use the <a href="http://www.w3.org/TR/sparql11-federated-query/">SPARQL 1.1 Federated Query</a> features.  Which approach is superior will depend on the particulars of the situation, including how large the data is, how often it changes, and the frequency of queries which need data from different divisions.  Federated Query would probably not be ideal for the situation described here, but should be considered by people building systems like this.</li>
+</ul>
+
+  </section>
+  <section>
+    <h3>Cache Management Using HTTP</h3>
+
+    <p class="note">Factor out the Last-Modified and Expires stuff,
+    from the previous section, and put it here.</p>
+
+    <p class="note">Show that instead of expires like this, it could
+    be done with [an Expiration; inDataset &lt;>; atTime ...;
+    ofLayer]</p>
+
+
+  </section>
+
+
+  <section>
+    <h3>Keeping Derived Information Separate</h3>
+
+    <p>The Federated Phonebook example shows several features of
+    layers, but leaves out a few.  In this example we will show the
+    use of privately-named layers and of sharing blank nodes between
+    layers.</p>
+
+    <p>The scenario is this: some divisions use only vcard:n to
+    provide structured name information (keeping given-name and
+    family-name separate), while others use only vcard:fn to provide a
+    formatted-name (with both parts combined).  The politics of the
+    organization make it impractical to tell the divisions to all use
+    vcard:n or all use vcard:fn, or both.  Meanwhile, several
+    different tools are being written to use this employee directory,
+    including a WebApp, a command-line app, and apps for several
+    different mobile platforms.  Does each app have to be written to
+    understand both vcard:n and vcard:fn? </p>
+
+    <p>HQ decides the solution is for them to run a single process
+    (which they call "namefill") to normalize the data, making
+    sure that every entry has both vcard:n and vcard:fn data, no
+    matter what the division provided.  The process is fairly simple;
+    after any layer is reloaded, a program runs which looks at that
+    layer and fills in the missing name data. </p>
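The core of such a namefill pass might look like this (a hedged sketch; the dictionary encoding of vcard entries is an assumption, and the naive last-token split is precisely the weakness the next paragraph discusses):

```python
# Normalise a vcard entry so it has both fn (formatted name) and n
# (structured name) data, whichever one the division provided.

def namefill(entry):
    filled = dict(entry)
    if "fn" in filled and "n" not in filled:
        # naive: treat the last space-separated token as the family name
        given, _, family = filled["fn"].rpartition(" ")
        filled["n"] = {"given-name": given, "family-name": family}
    elif "n" in filled and "fn" not in filled:
        n = filled["n"]
        filled["fn"] = n["given-name"] + " " + n["family-name"]
    return filled
```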
+
+    <p>Because of the tricky politics of the situation, however, HQ
+    decides it would be best to keep this "filled in" data separate.
+    In some cases their program might not fill in the data properly.
+    For example, how can a program tell from the formatted name
+    "Hillary Rodham Clinton" that "Rodham Clinton" is the family-name?
+    The solution is to keep the output of the program in a separate
+    layer, so clients (and people trying to debug the system) can tell
+    this filled-in data did not come from the division itself. </p>
+
+    <p>The result is a dataset like this:
+    </p>
+
+    <pre class="example">@prefix hq: &lt;http://example.com/hq-vocab#>.
+@prefix v:  &lt;http://www.w3.org/2006/vcard/ns#>.
+@prefix ht: &lt;http://example.org/http-vocab#>.
+ &lt;http://uswest.internal.example.com/employees.rdf> {
+    # an employee
+    _:u331 a v:VCard&nbsp;;
+           v:fn "John Wayne"&nbsp;;
+           v:email "[email protected]"&nbsp;;
+           v:tel [ a v:Work, v:Pref&nbsp;;
+                   rdf:value "+213 555 5555" ].
+    ...
+ }
+ hq:namefill602 {
+    _:u331 v:n [
+           v:family-name "Wayne";
+           v:given-name "John"
+    ]
+ }
+ ...
+ # default graph has metadata
+ {
+   hq:parentCo hq:division hq:div1, hq:div2, hq:div3, ...
+   &lt;http://uswest.internal.example.com/employees.rdf> 
+     hq:feedFrom hq:div1;
+     hq:namefillLayer hq:namefill602.
+  ...
+ } 
+</pre>
+
+<p>In serializing this, we needed to introduce a blank node label
+("_:u331"), because that blank node (representing the employee) occurs
+in two different layers.
+</p>
+
+<p>The example also shows the creation of a new layer name
+(hq:namefill602) for the layer filled in by our namefill program.  We
+use one new layer for each feed, instead of one layer for all the
+output of the namefill program, so we have less work to do when a
+single feed layer is reloaded.
+</p>
+
+<p>The techniques in this example apply equally well to information that
+is derived as part of logical inference, such as done by an RDFS, OWL,
+or RIF reasoner.  In these more general cases, it may be that one
+layer can be used for all derived information, or, at the other end of
+the granularity spectrum, that a new layer is used for the triples
+derived in each step of the process.
+</p>
+
+  </section>
+  <section>
+    <h3>Archiving Data With Static Layers</h3>
+
+    <p>One more variation on the federated phonebook scenario: what if
+    HQ wants to be able to look up old information?  For instance,
+    what happens when an employee leaves and is no longer listed in a
+    division phonebook?  It could be nice if the search client could
+    report that the employee is gone, rather than leaving people
+    wondering if they've made a mistake with the name. </p>
+
+    <p>To address this, HQ's data-loading client will not simply
+    delete a layer before reloading it.  Instead, it will first copy
+    the data to a new, archival layer.  After three reloads, the
+    database might look something like this: </p>
+
+<pre class="example">@prefix hq: &lt;http://example.com/hq-vocab#&gt;.
+@prefix hqa: &lt;http://example.com/hq/archive/&gt;.
+@prefix v:  &lt;http://www.w3.org/2006/vcard/ns#&gt;.
+@prefix ht: &lt;http://example.org/http-vocab#&gt;.
+hqa:0001 {
+    ... oldest version
+}
+hqa:0002 {
+   ... middle version
+}
+&lt;http://uswest.internal.example.com/employees.rdf&gt; {
+    ... current version
+}
+# default graph
+{
+   hqa:0001 hq:startValidTime ...time... &nbsp;;
+            hq:endValidTime  ...time... .
+   hqa:0002 hq:startValidTime ...time... &nbsp;;
+            hq:endValidTime  ...time... .
+   &lt;http://uswest.internal.example.com/employees.rdf&gt; 
+       hq:snapshot hqa:0001, hqa:0002.
+   ....
+}
+</pre>
+
+<p>This model uses static layers, whose contents are never supposed to
+change.  (They are still different from RDF Graphs in that they retain
+their own identity; two static layers containing the same triples can
+have different metadata.)  For each static layer, we record the time
+interval for which it was current (its
+<a href="http://en.wikipedia.org/wiki/Valid_time">valid time</a>) and what it is a
+snapshot of.
+</p>
+
+<p>The URL for each static layer is generated by incrementing a sequence
+counter.  To follow Linked Data principles, HQ should provide RDF
+serializations of the contents of each layer in response to
+dereferences of these URLs.  (When the state of layers is
+obtained like this, with separate HTTP responses for each one, a blank
+node appearing on multiple layers will appear as multiple blank nodes.
+For blank node sharing to work, the dataset which serializes the
+contents of all the relevant layers must be transmitted or queried as
+a unit.)
+</p>
+
+<p>There is nothing about this architecture that prevents the archival
+data from being modified. The people maintaining the system simply agree
+not to change it.  If this is not sufficient, other approaches could
+be designed, such as generating the URL using a cryptographic hash of
+the layer contents.
+</p>
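The hash-based alternative just mentioned can be sketched with a content-addressed naming function (illustrative only: the URL scheme, the sorted N-Triples-style canonicalisation, and the choice of SHA-256 are all assumptions):

```python
import hashlib

# Derive an archival layer's URL from a hash of its contents, so the
# same contents always get the same name and any later modification
# is detectable by re-hashing.

def content_addressed_name(triples, base="http://example.com/hq/archive/"):
    # canonicalise by sorting the serialized lines before hashing
    canonical = "\n".join(sorted(" ".join(t) + " ." for t in triples))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return base + digest

t1 = ("<ex:a>", "<ex:p>", "<ex:b>")
t2 = ("<ex:a>", "<ex:p>", "<ex:c>")
name = content_addressed_name([t1, t2])
```

(Blank nodes would complicate real canonicalisation; this sketch assumes none are present.)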
+
+<p>Another variant on this design is to put the feed data directly into
+an archival layer, instead of having the current data in the same
+layer as the feed.  If the data is likely to grow stale (not be kept
+in sync with the feed master data), this may be a better approach,
+reducing the possibility of people unknowingly using outdated
+information.
+</p>
+
+</section>
+</section>
+
+<section>
+  <h2>Issues</h2>
+
+  <p>Do the named graphs in a dataset include all the triples in the
+  layers with those names, or only some of them?  Aka partial-graph or
+  complete-graph semantics.</p>
+
+  <p>Can blank nodes be shared between layers?</p>
+
+  <p>What do we do about RDF reification?  Should we try to fix it to
+  be talking about layers?</p>
+
+  <p>How should we talk about change-over-time?  The archiving example
+  gets into it right now.</p>
+  
+
+</section>
+
+
+<section class="appendix informative" id="changes">
+  <h2>Changes</h2>
+  <ul>
+    <li>2012-05-02: Removed obsolete text from the introduction, removed the section on datasets borrowed from RDF Concepts, and added many entries to Concepts (and renamed it from Terminology).</li>
+    <li>2012-05-01: Starting with a little text from RDF Concepts, a few ideas, and the text from <a href="http://www.w3.org/2011/rdf-wg/wiki/Layers">Layers</a></li>
+  </ul>
+</section>
+
+
+<section id="references">
+</section>
+
+  </body>
+</html>
+
Binary file rdf-layers/layers.jpg has changed