[ldpatch] added section on pathological graphs ldpatch
author Alexandre Bertails <alexandre@bertails.org>
Sun, 27 Jul 2014 17:19:03 -0400
branch ldpatch
changeset 727 8b261730c022
parent 726 8a2907faa8e6
child 728 b59441421b0f
[ldpatch] added section on pathological graphs
ldpatch.html
--- a/ldpatch.html	Sun Jul 27 10:58:36 2014 -0400
+++ b/ldpatch.html	Sun Jul 27 17:19:03 2014 -0400
@@ -190,7 +190,7 @@
       </p>
 
       <p id="relation-sparql-update">
-The LD Patch format described in this document should be seen as an assembly language for updating RDF Graphs. It is the intention to confine its expressive power to an RDF diff with minimal support for Blank Nodes. For more powerful operations on RDF Graphs and Quad Stores, the LDP WG recommends the reader to consider <a href="http://www.w3.org/TR/sparql11-update/">SPARQL Update</a>.
+        The LD Patch format described in this document should be seen as an "assembly language" for updating RDF Graphs. The intention is to confine its expressive power to an RDF diff with minimal support for blank nodes. For more powerful operations on RDF Graphs and Quad Stores, the LDP WG recommends that the reader consider <a href="http://www.w3.org/TR/sparql11-update/">SPARQL Update</a>.
       </p>
 
       <!--p>
@@ -207,6 +207,14 @@
 summary, the only advantage would have been to have a familiar syntax
 for those coming from SPARQL, but with a different runtime semantics
 even, which could have been confusing, even if compatible.
+
+Furthermore, SPARQL has no built-in support for RDF collections, which
+makes them cumbersome to handle with that language (or any subset of
+it), all the more so because RDF collections make intensive use of
+blank nodes. We chose to provide operations dedicated to RDF
+collections, both to ease their manipulation and to open the
+way to specific optimizations in RDF stores that treat them specially.
+
       </p-->
 
       <h2 id="intro-example">Example</h2>
@@ -301,23 +309,6 @@
 
     </section>
 
-<!--      <p id='expressive-power'>
-        The scope of LD Patch is intentionally limited when compared to SPARQL Update capabilities. Its expressive power is limited to (relative) RDF Graphs. Matching nodes is achieved using property paths. Nodes can be bound to variables. The language includes support for blank nodes and RDF Lists.
-      </p>
-      <p id='design-choices'>
-        The authors of the document made conscious choices when designing the LD Patch language. Here are a few:
-        <ul>
-          <li>LD Patch was thought to be used in Linked Data applications. It is <strong>not</strong> designed for general purpose RDF applications.</li>
-          <li>if more expressive power is desired, the authors recommends the use of SPARQL Update.</li>
-          <li>LD Patch is not design to describe diffs between RDF Graphs: it is meants to be used in applications as a real patching language. That being said, it is strongly inspired from <a href="http://afs.github.io/rdf-patch/">RDF Patch</a> and is (mostly?) a superset.</li>
-          <li>compatibity with SPARQL Update was not sought as there is no intention to make LD Patch's semantics a subset of SPARQL Update's. As a result, an LDP Patch instance is not a valid SPARQL update query.</li>
-          <li>being able to match Blank Nodes is a desired and non-negociable property</li>
-          <li>good performance characteristics of the operational semantics requires some trade-offs in the features of the language.</li>
-          <li></li>
-        </ul>
-      </p>
-
--->
 
        
     <section class='informative' id='language-features'>
@@ -328,7 +319,7 @@
       <p>
 
       </p>
-      <section>
+      <section id="prefixes">
         <h2><tdef>Prefixes</tdef></h2>
         <p>
 LD Patch offers the possibility to abbreviate URIs by using Turtle's <i>@prefix</i> directive that allows declaring a short prefix name for a long prefix of repeated URIs. This is useful for many RDF vocabularies that are all defined in nearby namespace URIs, possibly using XML's namespace mechanism that works in a similar fashion.
@@ -338,17 +329,62 @@
         </p>
       </section>
 
-      <section>
+      <section id="node-matching-semantics">
         <h2><tdef>Node matching semantics</tdef></h2>
         <p>
-LD Patch borrows much of its syntax to Turtle (@@@ref) and SPARQL (@@@ref) for describing nodes. IRIs (either abbreviated or not) and literals represent the corresponding node in the graph being patched. Blank nodes, on the other hand, pose a problem, as they have no global identifier. Indeed, blank node identifiers have their scope limited to the document in which they appear. As a consequence, whenever a blank node identifiers appears in an LD Patch document, it is understood to denote a <em>fresh</em> blank node, that needs to be created in the patched RDF graph. (@@@ remark that this is different from RDF-Patch)
+LD Patch borrows much of its syntax from Turtle (@@@ref) and SPARQL (@@@ref) for describing nodes. IRIs (either abbreviated or not) and literals represent the corresponding node in the graph being patched. Blank nodes, on the other hand, pose a problem, as they have no global identifier. Indeed, blank node identifiers have their scope limited to the document in which they appear. As a consequence, whenever a blank node identifier appears in an LD Patch document, it is understood to denote a <em>fresh</em> blank node that needs to be created in the patched RDF graph. Such identifiers therefore cannot interfere with blank nodes already present in the graph.
         </p>
         <p>
-In order to be able to address blank nodes already present in the graph, LD Patch has two mechanisms: <tref>Bind</tref>ing a variable to a blank node reachable with a <tref>path expression</tref>, and <tref>UpdateList</tref> to deal with those blank nodes that constitute RDF collections. There are cases where those mechanisms will not be able to unambiuously adress a given blank node, but those cases are deemed pathological, and out of the scope of this specification. (@@@give here an example of pathological graph)
+          In order to be able to address blank nodes already present in the graph, LD Patch has two mechanisms: <tref>Bind</tref>ing a variable to a blank node reachable with a <tref>path expression</tref>, and <tref>UpdateList</tref> to deal with those blank nodes that constitute RDF collections. There are cases where those mechanisms will not be able to unambiguously address a given blank node, but those cases are deemed <a href="#pathological-graph">pathological</a>, and out of the scope of this specification.
         </p>
       </section>
 
-      <section>
+
+      <section id="pathological-graph">
+        <h2><tdef>Pathological graph</tdef></h2>
+        <p>
+          Given an RDF graph <em>G</em>, a blank node <em>b</em> is said to be unambiguous in <em>G</em> if there exists a pair <em>(n, p)</em> where
+        <ul>
+          <li><em>n</em> is a URI or a literal</li>
+          <li><em>p</em> is an LD Patch path expression</li>
+        </ul>
+        such that applying <em>p</em> to <em>n</em> results in exactly the singleton set <em>{b}</em>.
+        </p>
+
+        <p>
+It is easy to see that only the unambiguous blank nodes of a graph can be handled in LD Patch.
+        </p>
+
+        <p>
+Consider for example the following graph:
+        </p>
+
+      <pre class='example'>
+&lt;#&gt; foaf:name "Alice" ; foaf:knows _:b1, _:b2 .
+_:b1 a foaf:Person .
+_:b2 a foaf:Person ; schema:workLocation _:b3 .
+_:b3 schema:name "W3C/MIT" .
+      </pre>
+
+      <p>
+        The blank nodes <code>_:b2</code> and <code>_:b3</code> are unambiguous (they can, for example, be reached unambiguously from the literal <code>"W3C/MIT"</code>). The blank node <code>_:b1</code>, on the other hand, is ambiguous, as every path expression that can match it would also match <code>_:b2</code>.
+      </p>
+
+      <!-- p>
+        This kind of node is not particularly interesting in RDF, as its presence does not change the semantics of the graph. Indeed, if we remove <code>_:b1</code> and all its triples from the graph above, the resulting graph would be semantically equivalent to the original graph.
+      </p-->
+
+      <p>
+        Another example is a graph containing only blank nodes. All its nodes are ambiguous, as none of them can be reached from a URI or a literal. Such a graph is not interesting in the context of linked data, as it contains no URI to link to or from.
+      </p>
+
+      <p>
+        Therefore, ambiguous blank nodes are considered a pathological case in the context of LDP, and the fact that LD Patch cannot handle them is deemed acceptable. Furthermore, their presence in a graph does not prevent the other nodes of that graph from being handled by LD Patch.
+      </p>
+
+      </section>
+
+      <section id="path-expression">
         <h2><tdef>Path expression</tdef></h2>
         <p>
 LD Patch uses path expressions to describe possible routes through a graph between two graph nodes. The main goal is to allow addressing a blank node by “walking” the arcs of the graph from an already identified node. A path is composed of a number of steps, which can be of three kinds:
@@ -377,7 +413,7 @@
         </p>
       </section>
 
-      <section>
+      <section id="bind-statement">
         <h2><tdef>Bind</tdef></h2>
         <p>
 The Bind operation is used to create a new variable by binding or assigning an RDF Term to the variable. The bound variable has a global scope. Use of a given variable name anywhere in an LD Patch document identifies the same variable, although variables can be overridden in subsequent Bind statements. Following the example above, the Bind operation creates a new variable called <code>event</code>, starting from the RDF Term <code>&lt;#&gt;</code> and following the path expression <code>/schema:attendee[/schema:url = &lt;http://conferences.ted.com/TED2009/&gt;]</code> in order to identify the RDF Term to which this variable will be bound -- i.e. <code>_:b2</code>.
@@ -401,7 +437,7 @@
         </p>
       </section>
 
-      <section>
+      <section id="add-statement">
         <h2><tdef>Add</tdef></h2>
         <p>
 The Add operation is used to add or append new RDF triples to the existing graph. To add a new RDF triple, the operation requires a <tref>Subject</tref>, a <tref>Predicate</tref> and either an <tref>Object</tref> or a <tref>List</tref>.
@@ -416,7 +452,7 @@
         </p>
       </section>
 
-      <section>
+      <section id="delete-statement">
         <h2><tdef>Delete</tdef></h2>
         <p>
 The Delete operation is used to remove a single RDF triple from the existing graph. The syntax for the Delete operation requires a <tref>Subject</tref>, a <tref>Predicate</tref> and an <tref>Object</tref>.
@@ -430,7 +466,7 @@
         </p>
       </section>
 
-      <section>
+      <section id="update-list-statement">
         <h2><tdef>UpdateList</tdef></h2>
         <p>
 The UpdateList operation is used to update the members of an RDF collection (@@@ref). That collection is expected to be the object of a triple, specified by its <tref>Subject</tref> and <tref>Predicate</tref>. A <tref>Slice</tref> specification then describes which members (if any) of the collection are affected by the operation, and then a <tref>List</tref> of new members is provided. In the example below, UpdateList is used to replace the second member of a collection by the literal "fr-CH".