--- a/rdf-spaces/index.html Sun May 13 12:50:47 2012 -0400
+++ b/rdf-spaces/index.html Mon May 14 09:14:02 2012 -0400
@@ -189,9 +189,9 @@
<h2>Use Cases</h2>
<p>Each of these use cases is initally described in terms of the
- following scenario. Details of how each use case is handled by the
- <em>RDF spaces</em> design are in <a href="#detailed-example"
- class="sectionRef"></a>.</p>
+ following scenario. Details of how each use case might be addressed
+ using the technologies specified in this document are in <a
+ href="#detailed-example" class="sectionRef"></a>.</p>
<blockquote style="font-style: italic">
@@ -202,7 +202,7 @@
controlled by the parent organization (called "headquarters" or
"HQ") in Geneva.</p>
- <p>HQ wants to help the divisions work together better, and
+ <p>HQ wants to help the divisions work together better. It
decides a first step is to provide a simple but complete directory
of all the Example personnel. Until now, each division has
maintained its own directory, using its own technology. HQ would
@@ -211,76 +211,40 @@
they hope to extend the system to allow finding people based on
their areas of interest and expertise.</p>
- <p>HQ decides to use RDF with the <a
- href="http://www.w3.org/TR/vcard-rdf/">the vcard-rdf
- vocabulary</a>. They ask each division to put an up-to-date
- directory somewhere on the Web, and mail kelly@hq.example.org the
- URL. They say: "Just tell Kelly the username/password if there is
- one, or make it only available to the IP address of
- dir.hq.example.org." Kelly maintains a file which lists the URLs
- and any username/password combinations she is given.</p>
+ <p>HQ understands that people will want access to the phonebook in
+ many different computing environments and with different
+ languages, social norms, and application styles. Users are going
+ to want at least one Web-based user interface (UI), but they will
+ also want mobile UIs for different platforms, desktop UIs for
+ different platforms, and even to look up information via text
+ messaging. HQ does not have the resources to build all of these,
+ so they intend to provide direct access to the data so that the
+ divisions can do it themselves as needed.</p>
</blockquote>
-
- <p>For the first iteration of the design of their directory, HQ
- builds a "harvester" which uses Kelly's file for input and fetches
- the content from each of the provided URLs. It operates behind a
- caching Web proxy, so that if a division sets the right HTTP headers
- (eg Expires and Last-Modified) the load on its servers will be
- minimal, even if HQ runs the harvester every few minutes.</p>
-
- <p>The harvester parses the RDF from each data source and loads it
- into an in-memory triplestore, merging each new graph. Once it's
- done with all the harvesting, the harvester writes out the merged
- graph into a <a href="http://www.w3.org/TR/turtle">turtle</a> file.
- The file is published (with access control) where it can be used by
- several different clients providing directory search services.</p>
-
- <p>Although HQ provides a Web-based client, they makes this raw
- merged data available. They know people will want many different
- kinds of clients, include mobile clients, SMS-based clients,
- command-line clients on different operating systems, and possibly
- even clients that do something more sophisticated than just looking
- up a phone numer. By making the raw data available, they empower
- the divisions to build all these other applications.</p>
-
- <p>This "version 1" system is functional, but it has several
- shortcomings stemming from its use of simple graph merging. The
- following sections each discuss a shortcoming which can potentially
- be addressed by the proper modeling of RDF <a>space</a>s. Some
- sections include more scenarios (not involving the Example
- Foundation's federated phonebook) which illustrate the use case.
- Each section also links to an appendix where a detailed solution is
- provided. </p>
+
+ <p>Each of the sections below, after the first, contains a new
+ requirement, something additional that users in this scenario want
+ the system to do. Each of these will motivate the features of the
+ technologies specified in this document.</p>
<section id="uc-start">
- <h2>Baseline</h2>
-
- <p>@@@Old text -- not a use case. Merge this with above into the
- "baseline" system to which we want to add stuff.</p>
-
- <p>An obvious drawback of version 1 is that for any data change in
- a division database to show through to the users, the harvester
- must be re-run, to again fetch and merge all the data. HTTP
- caching can reduce the load on the division servers, but HQ still
- needs to parse 25 data feeds, and all the clients need to reload
- the merged data feed.</p>
+ <h2>Baseline Solution (Just Triples)</h2>
- <p>At first, HQ runs the harvester once a day and explains to
- users that it takes a day for changes to propagate. Users,
- however, are still confused and unhappy. A user corrects her
- phone number in the division database, then sees it still wrong
- in the HQ database. She's not interested in hearing about
- "propagation delay"; she wants her phone number to be correct.</p>
+ <blockquote style="font-style: italic">
- <p>Several different technologies are needed to fully provide this
- feature, but for a start, it would help if the harvester could
- maintain its state between runs and only replace those parts of
- the output that had changed. Just storing the merged set of
- triples is not enough; it needs to store them in such a way that
- it can replace just the ones coming from a given source.</p>
+ <p>As a starting point, HQ needs to gather data from each
+ division and re-publish it, in one place, for use by the
+ different UIs.</p>
- <p>For a discussion of how this use case could be addressed, see
+ </blockquote>
+
+ <p>This is a general use case for RDF, with no specific need for
+ using <a>space</a>s or <a>dataset</a>s. It simply involves
+ divisions publishing RDF data, then HQ merging it and putting it on
+ their website (with some access control).</p>
+
+ <p>For an example of how this baseline could be implemented, see
<a href="#example-start" class="sectionRef"></a></p>
</section>
@@ -288,8 +252,17 @@
<section id="uc-web">
<h2>Showing Provenance</h2>
- <p>@@@ the released db shows which division supplies each part of the information </p>
+ <blockquote style="font-style: italic">
+ <p>A user says: I'm looking at an incorrect phonebook entry. It
+ has the name of the person I'm looking for, but it's missing
+ most of the record. I can't even tell which division the person
+ works for. I need to know who is responsible for this
+ information, so I can get it corrected.
+ </p>
+
+ </blockquote>
+
<p>For a discussion of how this use case could be addressed, see
<a href="#example-web" class="sectionRef"></a></p>
@@ -298,7 +271,18 @@
<section id="uc-process">
<h2>Maintaining Derived Data</h2>
- <p>@@@ namefill is needed, and its results need their own provenance</p>
+ <blockquote style="font-style: italic">
+
+ <p>It turns out different divisions are using somewhat different
+ vocabularies for publishing their data. HQ writes a program to
+ translate, but they need the output of that program to be
+ correctly attributed, in case it turns out to be wrong.
+ </p>
+
+ </blockquote>
+
+ <p>This use case motivates sharing of blank nodes between named
+ graphs, as seen in the example.</p>
<p>For a discussion of how this use case could be addressed, see
<a href="#example-process" class="sectionRef"></a></p>
@@ -309,9 +293,18 @@
<section id="uc-reported">
<h2>Distributed Harvesting</h2>
- <p>@@@ divisions gather from departments who might gather from
- individuals; we want end-users to see that provenance. </p>
+ <blockquote style="font-style: italic">
+ <p>It turns out some divisions do not have centralized
+ phonebooks. Division 3 has twelve different departments, each
+ with its own phonebook. Division 3 can do the harvesting from
+ its departments, but it does not want to be in the loop for
+ corrections; it wants those to go straight back to the relevant
+ department.
+ </p>
+
+ </blockquote>
+
<p>For a discussion of how this use case could be addressed, see
<a href="#example-reported" class="sectionRef"></a></p>
@@ -321,11 +314,15 @@
<section id="uc-untrusted">
<h2>Loading Untrusted Datasets</h2>
- <p>@@@ what if one of the divisions gives you bad quads? It
- better not mess up provenance. Maybe suggest GSP-style name
- mangling...? Put "renaming datasets" in Concepts somewhere as a
- standard thing?</p>
+ <blockquote style="font-style: italic">
+ <p>A user reports: There's information here that says it's from
+ our department, but it's not. Somehow your provenance
+ information is wrong. We need to see the provenance of your
+ provenance!</p>
+
+ </blockquote>
+
<p>For a discussion of how this use case could be addressed, see
<a href="#example-untrusted" class="sectionRef"></a></p>
@@ -335,10 +332,17 @@
<section id="uc-transtime">
<h2>Showing Revision History</h2>
- <p>@@@ we want to be able to see all the changes, for auditing, to
- see what the DB said about anyone at any point in time.
- (transaction time)</p>
+ <blockquote style="font-style: italic">
+ <p>Division 14's legal department says: "We're doing an
+ investigation and we need to be able to connect people and phone
+ numbers as they used to be. Can you include archival data in
+ the data feed, so we can search the phonebook as it was on
+ each day of September, last year?"
+ </p>
+
+ </blockquote>
+
<p>For a discussion of how this use case could be addressed, see
<a href="#example-transtime" class="sectionRef"></a></p>
@@ -348,8 +352,31 @@
<section id="uc-validtime">
<h2>Expressing Past or Future States</h2>
- <p>@@@ we want to be able to express when someone started and stopped having a particular role various ways, which might not be the time we put this into the db.
- </p>
+ <blockquote style="font-style: italic">
+
+ <p>Division 5 says: "We're planning a major move in three
+ months, to a neighboring city. Everybody's office and phone
+ number will have to change. Can we start putting that
+ information in the phonebook now, but mark it as not effective
+ until 20 July? After the move, we'll also need to see the old
+ (no-longer-in-effect) data for a while, until we get everything
+ straightened out."</p>
+
+ </blockquote>
+
+ <p>This use case, contrasted with the previous one, shows the
+ difference between <em>Transaction Time</em> and <em>Valid
+ Time</em> in bitemporal databases. After Division 5's move, the
+ "old" phone numbers are not just the old state of the database;
+ they are the old state of the world. It is possible that some time
+ after the move an error in some of the pre-move data might
+ need to be corrected, giving it a new transaction time, even
+ though its valid time range has already ended.</p>
+
+ <p>Use case sighting: <a
+ href="http://www.jenitennison.com/blog/node/101">Temporal Scope
+ for RDF Triples</a>, Jeni Tennison's report of attempting to solve
+ this problem in UK Government data.</p>
<p>For a discussion of how this use case could be addressed, see
<a href="#example-validtime" class="sectionRef"></a></p>
@@ -360,6 +387,15 @@
<section>
<h2>Vendor-Neutral SPARQL Backup</h2>
+ <blockquote style="font-style: italic">
+
+ <p>
+ </p>
+
+ </blockquote>
+
+
+
<p>@@@ we want to be able to dump the database and load it in a different system</p>
</section>
@@ -371,6 +407,7 @@
<section>
<h2>Concepts</h2>
+
<section>
<h2>Space</h2>
@@ -553,6 +590,15 @@
to be used interchangably — systems which handle datasets
SHOULD NOT give significance to empty named graphs.</p>
+ <p class="issue">
+ Can we take a stronger stand against non-quad-equivalent
+ datasets? Maybe we can use the terms "proper" and "improper",
+ or something like that. Improper datasets might also include
+ ones which use the same name in more than one pair. Combining
+ these, like removing empty named graphs, is how you convert an
+ improper dataset to a proper one.
+ </p>
+
</section>
<section>
@@ -1017,6 +1063,12 @@
<h2>Showing Untrusted Quads(v5)</h2>
<p>@@@ Show how to address <a href="#uc-untrusted" class="sectionRef"></a></p>
+ <p>@@@ what if one of the divisions gives you bad quads? It
+ better not mess up provenance. Maybe suggest GSP-style name
+ mangling...? Put "renaming datasets" in Concepts somewhere as a
+ standard thing?</p>
+
+
</section>
@@ -1202,6 +1254,7 @@
<section class="appendix informative" id="changes">
<h2>Changes</h2>
<ul>
+ <li>2012-05-14: Fill in the use cases, removing some of the earlier text, which can go into the example.</li>
<li>2012-05-13: Fill in the example's skeleton, add a few issues/ideas on trig</li>
<li>2012-05-11: Rewriting and reorganizing Concepts; some more work on Usecases and Example; removed the Detailed Example since it needs to be so re-written; renamed 'reflection' to 'folding'; reworked the Semanics</li>
<li>2012-05-10: Wrote a short intro. Started writing the Use Cases section for real. Added grammar for N-Quads and Trig. Did a first draft of the semantics.</li>