rdf: changeset 366:34be8e51efd8

--- a/rdf-spaces/index.html	Sun May 13 12:50:47 2012 -0400
+++ b/rdf-spaces/index.html	Mon May 14 09:14:02 2012 -0400
@@ -189,9 +189,9 @@
   <h2>Use Cases</h2>
 
   <p>Each of these use cases is initally described in terms of the
-  following scenario.  Details of how each use case is handled by the
-  <em>RDF spaces</em> design are in <a href="#detailed-example"
-  class="sectionRef"></a>.</p>
+  following scenario.  Details of how each use case might be addressed
+  using the technologies specified in this document are in <a
+  href="#detailed-example" class="sectionRef"></a>.</p>
 
   <blockquote style="font-style: italic">
 
@@ -202,7 +202,7 @@
     controlled by the parent organization (called "headquarters" or
     "HQ") in Geneva.</p>
 
-    <p>HQ wants to help the divisions work together better, and
+    <p>HQ wants to help the divisions work together better.  It
     decides a first step is to provide a simple but complete directory
     of all the Example personnel.  Until now, each division has
     maintained its own directory, using its own technology.  HQ would
@@ -211,76 +211,40 @@
     they hope to extend the system to allow finding people based on
     their areas of interest and expertise.</p>
 
-    <p>HQ decides to use RDF with the <a
-    href="http://www.w3.org/TR/vcard-rdf/">the vcard-rdf
-    vocabulary</a>.  They ask each division to put an up-to-date
-    directory somewhere on the Web, and mail kelly@hq.example.org the
-    URL.  They say: "Just tell Kelly the username/password if there is
-    one, or make it only available to the IP address of
-    dir.hq.example.org."  Kelly maintains a file which lists the URLs
-    and any username/password combinations she is given.</p>
+    <p>HQ understands that people will want access to the phonebook in
+    many different computing environments and with different
+    languages, social norms, and application styles.  Users are going
+    to want at least one Web-based user interface (UI), but they will
+    also want mobile UIs for different platforms, desktop UIs for
+    different platforms, and even to look up information via text
+    messaging.  HQ does not have the resources to build all of these,
+    so they intend to provide direct access to the data so that the
+    divisions can do it themselves as needed.</p>
 
   </blockquote>
-
-  <p>For the first iteration of the design of their directory, HQ
-  builds a "harvester" which uses Kelly's file for input and fetches
-  the content from each of the provided URLs.  It operates behind a
-  caching Web proxy, so that if a division sets the right HTTP headers
-  (eg Expires and Last-Modified) the load on its servers will be
-  minimal, even if HQ runs the harvester every few minutes.</p>
-
-  <p>The harvester parses the RDF from each data source and loads it
-  into an in-memory triplestore, merging each new graph.  Once it's
-  done with all the harvesting, the harvester writes out the merged
-  graph into a <a href="http://www.w3.org/TR/turtle">turtle</a> file.
-  The file is published (with access control) where it can be used by
-  several different clients providing directory search services.</p>
-
-  <p>Although HQ provides a Web-based client, they makes this raw
-  merged data available.  They know people will want many different
-  kinds of clients, include mobile clients, SMS-based clients,
-  command-line clients on different operating systems, and possibly
-  even clients that do something more sophisticated than just looking
-  up a phone numer.  By making the raw data available, they empower
-  the divisions to build all these other applications.</p>
-
-  <p>This "version 1" system is functional, but it has several
-  shortcomings stemming from its use of simple graph merging.  The
-  following sections each discuss a shortcoming which can potentially
-  be addressed by the proper modeling of RDF <a>space</a>s.  Some
-  sections include more scenarios (not involving the Example
-  Foundation's federated phonebook) which illustrate the use case.
-  Each section also links to an appendix where a detailed solution is
-  provided. </p>
+  
+  <p>Each of the sections below, after the first, contains a new
+  requirement, something additional that users in this scenario want
+  the system to do.  Each of these will motivate the features of the
+  technologies specified in this document.</p>
 
   <section id="uc-start">
-    <h2>Baseline</h2>
-    
-    <p>@@@Old text -- not a use case.  Merge this with above into the
-    "baseline" system to which we want to add stuff.</p>
-
-    <p>An obvious drawback of version 1 is that for any data change in
-    a division database to show through to the users, the harvester
-    must be re-run, to again fetch and merge all the data.  HTTP
-    caching can reduce the load on the division servers, but HQ still
-    needs to parse 25 data feeds, and all the clients need to reload
-    the merged data feed.</p>
+    <h2>Baseline Solution (Just Triples)</h2>
 
-    <p>At first, HQ runs the harvester once a day and explains to
-    users that it takes a day for changes to propagate.  Users,
-    however, are still confused and unhappy.  A user corrects her
-    phone number in the division database, then sees it still wrong
-    in the HQ database.  She's not interested in hearing about
-    "propagation delay"; she wants her phone number to be correct.</p>
+    <blockquote style="font-style: italic">
 
-    <p>Several different technologies are needed to fully provide this
-    feature, but for a start, it would help if the harvester could
-    maintain its state between runs and only replace those parts of
-    the output that had changed.  Just storing the merged set of
-    triples is not enough; it needs to store them in such a way that
-    it can replace just the ones coming from a given source.</p>
+      <p>As a starting point, HQ needs to gather data from each
+      division and re-publish it, in one place, for use by the
+      different UIs.</p>
 
-    <p>For a discussion of how this use case could be addressed, see
+    </blockquote>
+    
+    <p>This is a general use case for RDF, with no specific need for
+    using <a>space</a>s or <a>dataset</a>s.  It simply involves
+    divisions publishing RDF data, then HQ merging it and putting it on
+    their website (with some access control).</p>
+
+    <p>For an example of how this baseline could be implemented, see
     <a href="#example-start" class="sectionRef"></a></p>
 
   </section>
@@ -288,8 +252,17 @@
   <section id="uc-web">
     <h2>Showing Provenance</h2>
 
-    <p>@@@ the released db shows which division supplies each part of the information </p>
+    <blockquote style="font-style: italic">
 
+      <p>A user says: I'm looking at an incorrect phonebook entry.  It
+      has the name of the person I'm looking for, but it's missing
+      most of the record.  I can't even tell which division the person
+      works for.  I need to know who is responsible for this
+      information, so I can get it corrected.
+      </p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-web" class="sectionRef"></a></p>
 
@@ -298,7 +271,18 @@
   <section id="uc-process">
     <h2>Maintaining Derived Data</h2>
 
-    <p>@@@ namefill is needed, and its results need their own provenance</p>
+    <blockquote style="font-style: italic">
+
+      <p>It turns out different divisions are using somewhat different
+      vocabularies for publishing their data.  HQ writes a program to
+      translate, but they need the output of that program to be
+      correctly attributed, in case it turns out to be wrong.
+      </p>
+
+    </blockquote>
+    
+    <p>This use case motivates sharing of blank nodes between named
+    graphs, as seen in the example.</p>
 
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-process" class="sectionRef"></a></p>
@@ -309,9 +293,18 @@
   <section id="uc-reported">
     <h2>Distributed Harvesting</h2>
 
-    <p>@@@ divisions gather from departments who might gather from
-    individuals; we want end-users to see that provenance. </p>
+    <blockquote style="font-style: italic">
 
+      <p>It turns out some divisions do not have centralized
+      phonebooks.  Division 3 has twelve different departments, each
+      with its own phonebook.  Division 3 can do the harvesting from
+      its departments, but it does not want to be in the loop for
+      corrections; it wants those to go straight back to the relevant
+      department.
+      </p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-reported" class="sectionRef"></a></p>
 
@@ -321,11 +314,15 @@
   <section id="uc-untrusted">
     <h2>Loading Untrusted Datasets</h2>
 
-    <p>@@@ what if one of the divisions gives you bad quads?  It
-    better not mess up provenance.  Maybe suggest GSP-style name
-    mangling...?  Put "renaming datasets" in Concepts somewhere as a
-    standard thing?</p>
+    <blockquote style="font-style: italic">
 
+      <p>A user reports: There's information here that says it's from
+      our department, but it's not.  Somehow your provenance
+      information is wrong.  We need to see the provenance of your
+      provenance!</p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-untrusted" class="sectionRef"></a></p>
 
@@ -335,10 +332,17 @@
   <section id="uc-transtime">
     <h2>Showing Revision History</h2>
 
-    <p>@@@ we want to be able to see all the changes, for auditing, to
-    see what the DB said about anyone at any point in time.
-    (transaction time)</p>
+    <blockquote style="font-style: italic">
 
+      <p>Division 14's legal department says: "We're doing an
+      investigation and we need to be able to connect people and phone
+      numbers as they used to be.  Can you include archival data in
+      the data feed, so we can search the phonebook as it was on
+      each day of September, last year?"
+      </p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-transtime" class="sectionRef"></a></p>
 
@@ -348,8 +352,31 @@
   <section id="uc-validtime">
     <h2>Expressing Past or Future States</h2>
 
-    <p>@@@ we want to be able to express when someone started and stopped having a particular role various ways, which might not be the time we put this into the db.
- </p>
+    <blockquote style="font-style: italic">
+
+      <p>Division 5 says: "We're planning a major move in three
+      months, to a neighboring city.  Everybody's office and phone
+      number will have to change.  Can we start putting that
+      information in the phonebook now, but mark it as not effective
+      until 20 July?  After the move, we'll also need to see the old
+      (no-longer-in-effect) data for a while, until we get everything
+      straightened out."</p>
+
+    </blockquote>
+    
+    <p>This use case, contrasted with the previous one, shows the
+    difference between <em>Transaction Time</em> and <em>Valid
+    Time</em> in bitemporal databases.  After Division 5's move, the
+    "old" phone numbers are not just the old state of the database;
+    they are the old state of the world.  It is possible that some time
+    after the move an error in some of the pre-move data might
+    need to be corrected, giving it a new transaction time, even
+    though its valid time range has already ended.</p>
+
+    <p>Use case sighting: <a
+    href="http://www.jenitennison.com/blog/node/101">Temporal Scope
+    for RDF Triples</a>, Jeni Tennison's report of attempting to solve
+    this problem in UK Government data.</p>
 
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-validtime" class="sectionRef"></a></p>
@@ -360,6 +387,15 @@
   <section>
     <h2>Vendor-Neutral SPARQL Backup</h2>
 
+    <blockquote style="font-style: italic">
+
+      <p>
+      </p>
+
+    </blockquote>
+    
+
+
     <p>@@@ we want to be able to dump the database and load it in a different system</p>
   </section>
 
@@ -371,6 +407,7 @@
 <section>
   <h2>Concepts</h2>
 
+
   <section>
     <h2>Space</h2>
 
@@ -553,6 +590,15 @@
     to be used interchangably &mdash; systems which handle datasets
     SHOULD NOT give significance to empty named graphs.</p>
 
+    <p class="issue">
+      Can we take a stronger stand against non-quad-equivalent
+      datasets?  Maybe we can use the terms "proper" and "improper",
+      or something like that.  Improper datasets might also include
+      ones which use the same name in more than one pair.  Combining
+      these, like removing empty named graphs, is how you convert an
+      improper dataset to a proper one.
+    </p>
+
   </section>
 
   <section>
@@ -1017,6 +1063,12 @@
     <h2>Showing Untrusted Quads(v5)</h2>
 
     <p>@@@ Show how to address <a href="#uc-untrusted" class="sectionRef"></a></p>
+    <p>@@@ what if one of the divisions gives you bad quads?  It
+    better not mess up provenance.  Maybe suggest GSP-style name
+    mangling...?  Put "renaming datasets" in Concepts somewhere as a
+    standard thing?</p>
+
+
 
   </section>
 
@@ -1202,6 +1254,7 @@
 <section class="appendix informative" id="changes">
   <h2>Changes</h2>
   <ul>
+    <li>2012-05-14: Fill in the use cases, removing some of the text that was there and which can go into the example.</li>
     <li>2012-05-13: Fill in the example's skeleton, add a few issues/ideas on trig</li>
     <li>2012-05-11: Rewriting and reorganizing Concepts; some more work on Usecases and Example; removed the Detailed Example since it needs to be so re-written; renamed 'reflection' to 'folding'; reworked the Semanics</li>
     <li>2012-05-10: Wrote a short intro.  Started writing the Use Cases section for real.   Added grammar for N-Quads and Trig.  Did a first draft of the semantics.</li>
author	Sandro Hawke <sandro@hawke.org>
	Mon, 14 May 2012 09:14:02 -0400
changeset 366	34be8e51efd8
parent 365	a1a417793a94
child 367	360f854fcfa4