Fill in the use cases, removing some of the text that was there and which can go into the example.
author Sandro Hawke <sandro@hawke.org>
Mon, 14 May 2012 09:14:02 -0400
changeset 366 34be8e51efd8
parent 365 a1a417793a94
child 367 360f854fcfa4
rdf-spaces/index.html
--- a/rdf-spaces/index.html	Sun May 13 12:50:47 2012 -0400
+++ b/rdf-spaces/index.html	Mon May 14 09:14:02 2012 -0400
@@ -189,9 +189,9 @@
   <h2>Use Cases</h2>
 
   <p>Each of these use cases is initially described in terms of the
-  following scenario.  Details of how each use case is handled by the
-  <em>RDF spaces</em> design are in <a href="#detailed-example"
-  class="sectionRef"></a>.</p>
+  following scenario.  Details of how each use case might be addressed
+  using the technologies specified in this document are in <a
+  href="#detailed-example" class="sectionRef"></a>.</p>
 
   <blockquote style="font-style: italic">
 
@@ -202,7 +202,7 @@
     controlled by the parent organization (called "headquarters" or
     "HQ") in Geneva.</p>
 
-    <p>HQ wants to help the divisions work together better, and
+    <p>HQ wants to help the divisions work together better.  It
     decides a first step is to provide a simple but complete directory
     of all the Example personnel.  Until now, each division has
     maintained its own directory, using its own technology.  HQ would
@@ -211,76 +211,40 @@
     they hope to extend the system to allow finding people based on
     their areas of interest and expertise.</p>
 
-    <p>HQ decides to use RDF with the <a
-    href="http://www.w3.org/TR/vcard-rdf/">the vcard-rdf
-    vocabulary</a>.  They ask each division to put an up-to-date
-    directory somewhere on the Web, and mail kelly@hq.example.org the
-    URL.  They say: "Just tell Kelly the username/password if there is
-    one, or make it only available to the IP address of
-    dir.hq.example.org."  Kelly maintains a file which lists the URLs
-    and any username/password combinations she is given.</p>
+    <p>HQ understands that people will want access to the phonebook in
+    many different computing environments and with different
+    languages, social norms, and application styles.  Users are going
+    to want at least one Web-based user interface (UI), but they will
+    also want mobile UIs for different platforms, desktop UIs for
+    different platforms, and even to look up information via text
+    messaging.  HQ does not have the resources to build all of these,
+    so they intend to provide direct access to the data so that the
+    divisions can do it themselves as needed.</p>
 
   </blockquote>
-
-  <p>For the first iteration of the design of their directory, HQ
-  builds a "harvester" which uses Kelly's file for input and fetches
-  the content from each of the provided URLs.  It operates behind a
-  caching Web proxy, so that if a division sets the right HTTP headers
-  (eg Expires and Last-Modified) the load on its servers will be
-  minimal, even if HQ runs the harvester every few minutes.</p>
-
-  <p>The harvester parses the RDF from each data source and loads it
-  into an in-memory triplestore, merging each new graph.  Once it's
-  done with all the harvesting, the harvester writes out the merged
-  graph into a <a href="http://www.w3.org/TR/turtle">turtle</a> file.
-  The file is published (with access control) where it can be used by
-  several different clients providing directory search services.</p>
-
-  <p>Although HQ provides a Web-based client, they makes this raw
-  merged data available.  They know people will want many different
-  kinds of clients, include mobile clients, SMS-based clients,
-  command-line clients on different operating systems, and possibly
-  even clients that do something more sophisticated than just looking
-  up a phone numer.  By making the raw data available, they empower
-  the divisions to build all these other applications.</p>
-
-  <p>This "version 1" system is functional, but it has several
-  shortcomings stemming from its use of simple graph merging.  The
-  following sections each discuss a shortcoming which can potentially
-  be addressed by the proper modeling of RDF <a>space</a>s.  Some
-  sections include more scenarios (not involving the Example
-  Foundation's federated phonebook) which illustrate the use case.
-  Each section also links to an appendix where a detailed solution is
-  provided. </p>
+  
+  <p>Each of the sections below, after the first, contains a new
+  requirement, something additional that users in this scenario want
+  the system to do.  Each of these will motivate the features of the
+  technologies specified in this document.</p>
 
   <section id="uc-start">
-    <h2>Baseline</h2>
-    
-    <p>@@@Old text -- not a use case.  Merge this with above into the
-    "baseline" system to which we want to add stuff.</p>
-
-    <p>An obvious drawback of version 1 is that for any data change in
-    a division database to show through to the users, the harvester
-    must be re-run, to again fetch and merge all the data.  HTTP
-    caching can reduce the load on the division servers, but HQ still
-    needs to parse 25 data feeds, and all the clients need to reload
-    the merged data feed.</p>
+    <h2>Baseline Solution (Just Triples)</h2>
 
-    <p>At first, HQ runs the harvester once a day and explains to
-    users that it takes a day for changes to propagate.  Users,
-    however, are still confused and unhappy.  A user corrects her
-    phone number in the division database, then sees it still wrong
-    in the HQ database.  She's not interested in hearing about
-    "propagation delay"; she wants her phone number to be correct.</p>
+    <blockquote style="font-style: italic">
 
-    <p>Several different technologies are needed to fully provide this
-    feature, but for a start, it would help if the harvester could
-    maintain its state between runs and only replace those parts of
-    the output that had changed.  Just storing the merged set of
-    triples is not enough; it needs to store them in such a way that
-    it can replace just the ones coming from a given source.</p>
+      <p>As a starting point, HQ needs to gather data from each
+      division and re-publish it, in one place, for use by the
+      different UIs.</p>
 
-    <p>For a discussion of how this use case could be addressed, see
+    </blockquote>
+    
+    <p>This is a general use case for RDF, with no specific need for
+    using <a>space</a>s or <a>dataset</a>s.  It simply involves
+    divisions publishing RDF data, then HQ merging it and putting it on
+    their website (with some access control).</p>
+
+    <p>For an example of how this baseline could be implemented, see
     <a href="#example-start" class="sectionRef"></a></p>
 
   </section>
@@ -288,8 +252,17 @@
   <section id="uc-web">
     <h2>Showing Provenance</h2>
 
-    <p>@@@ the released db shows which division supplies each part of the information </p>
+    <blockquote style="font-style: italic">
 
+      <p>A user says: I'm looking at an incorrect phonebook entry.  It
+      has the name of the person I'm looking for, but it's missing
+      most of the record.  I can't even tell which division the person
+      works for.  I need to know who is responsible for this
+      information, so I can get it corrected.
+      </p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-web" class="sectionRef"></a></p>
 
@@ -298,7 +271,18 @@
   <section id="uc-process">
     <h2>Maintaining Derived Data</h2>
 
-    <p>@@@ namefill is needed, and its results need their own provenance</p>
+    <blockquote style="font-style: italic">
+
+      <p>It turns out different divisions are using somewhat different
+      vocabularies for publishing their data.  HQ writes a program to
+      translate, but they need the output of that program to be
+      correctly attributed, in case it turns out to be wrong.
+      </p>
+
+    </blockquote>
+    
+    <p>This use case motivates sharing of blank nodes between named
+    graphs, as seen in the example.</p>
 
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-process" class="sectionRef"></a></p>
@@ -309,9 +293,18 @@
   <section id="uc-reported">
     <h2>Distributed Harvesting</h2>
 
-    <p>@@@ divisions gather from departments who might gather from
-    individuals; we want end-users to see that provenance. </p>
+    <blockquote style="font-style: italic">
 
+      <p>It turns out some divisions do not have centralized
+      phonebooks.  Division 3 has twelve different departments, each
+      with its own phonebook.  Division 3 can do the harvesting from
+      its departments, but it does not want to be in the loop for
+      corrections; it wants those to go straight back to the relevant
+      department.
+      </p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-reported" class="sectionRef"></a></p>
 
@@ -321,11 +314,15 @@
   <section id="uc-untrusted">
     <h2>Loading Untrusted Datasets</h2>
 
-    <p>@@@ what if one of the divisions gives you bad quads?  It
-    better not mess up provenance.  Maybe suggest GSP-style name
-    mangling...?  Put "renaming datasets" in Concepts somewhere as a
-    standard thing?</p>
+    <blockquote style="font-style: italic">
 
+      <p>A user reports: There's information here that says it's from
+      our department, but it's not.  Somehow your provenance
+      information is wrong.  We need to see the provenance of your
+      provenance!</p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-untrusted" class="sectionRef"></a></p>
 
@@ -335,10 +332,17 @@
   <section id="uc-transtime">
     <h2>Showing Revision History</h2>
 
-    <p>@@@ we want to be able to see all the changes, for auditing, to
-    see what the DB said about anyone at any point in time.
-    (transaction time)</p>
+    <blockquote style="font-style: italic">
 
+      <p>Division 14's legal department says: "We're doing an
+      investigation and we need to be able to connect people and phone
+      numbers as they used to be.  Can you include archival data in
+      the data feed, so we can search the phonebook as it was on
+      each day of September, last year?"
+      </p>
+
+    </blockquote>
+    
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-transtime" class="sectionRef"></a></p>
 
@@ -348,8 +352,31 @@
   <section id="uc-validtime">
     <h2>Expressing Past or Future States</h2>
 
-    <p>@@@ we want to be able to express when someone started and stopped having a particular role various ways, which might not be the time we put this into the db.
- </p>
+    <blockquote style="font-style: italic">
+
+      <p>Division 5 says: "We're planning a major move in three
+      months, to a neighboring city.  Everybody's office and phone
+      number will have to change.  Can we start putting that
+      information in the phonebook now, but mark it as not effective
+      until 20 July?  After the move, we'll also need to see the old
+      (no-longer-in-effect) data for a while, until we get everything
+      straightened out."</p>
+
+    </blockquote>
+    
+    <p>This use case, contrasted with the previous one, shows the
+    difference between <em>Transaction Time</em> and <em>Valid
+    Time</em> in bitemporal databases.  After Division 5's move, the
+    "old" phone numbers are not just the old state of the database;
+    they are the old state of the world.  It is possible that some time
+    after the move an error in some of the pre-move data might
+    need to be corrected, giving it a new transaction time, even
+    though its valid time range has already ended.</p>
+
+    <p>Use case sighting: <a
+    href="http://www.jenitennison.com/blog/node/101">Temporal Scope
+    for RDF Triples</a>, Jeni Tennison's report of attempting to solve
+    this problem in UK Government data.</p>
 
     <p>For a discussion of how this use case could be addressed, see
     <a href="#example-validtime" class="sectionRef"></a></p>
@@ -360,6 +387,15 @@
   <section>
     <h2>Vendor-Neutral SPARQL Backup</h2>
 
+    <blockquote style="font-style: italic">
+
+      <p>
+      </p>
+
+    </blockquote>
+    
+
+
     <p>@@@ we want to be able to dump the database and load it in a different system</p>
   </section>
 
@@ -371,6 +407,7 @@
 <section>
   <h2>Concepts</h2>
 
+
   <section>
     <h2>Space</h2>
 
@@ -553,6 +590,15 @@
     to be used interchangably &mdash; systems which handle datasets
     SHOULD NOT give significance to empty named graphs.</p>
 
+    <p class="issue">
+      Can we take a stronger stand against non-quad-equivalent
+      datasets?  Maybe we can use the terms "proper" and "improper",
+      or something like that.  Improper datasets might also include
+      ones which use the same name in more than one pair.  Combining
+      these, like removing empty named graphs, is how you convert an
+      improper dataset to a proper one.
+    </p>
+
   </section>
 
   <section>
@@ -1017,6 +1063,12 @@
     <h2>Showing Untrusted Quads(v5)</h2>
 
     <p>@@@ Show how to address <a href="#uc-untrusted" class="sectionRef"></a></p>
+    <p>@@@ what if one of the divisions gives you bad quads?  It
+    better not mess up provenance.  Maybe suggest GSP-style name
+    mangling...?  Put "renaming datasets" in Concepts somewhere as a
+    standard thing?</p>
+
+
 
   </section>
 
@@ -1202,6 +1254,7 @@
 <section class="appendix informative" id="changes">
   <h2>Changes</h2>
   <ul>
+    <li>2012-05-14: Fill in the use cases, removing some of the text that was there and which can go into the example.</li>
     <li>2012-05-13: Fill in the example's skeleton, add a few issues/ideas on trig</li>
     <li>2012-05-11: Rewriting and reorganizing Concepts; some more work on Usecases and Example; removed the Detailed Example since it needs to be so re-written; renamed 'reflection' to 'folding'; reworked the Semanics</li>
     <li>2012-05-10: Wrote a short intro.  Started writing the Use Cases section for real.   Added grammar for N-Quads and Trig.  Did a first draft of the semantics.</li>