Linked Data Platform Best Practices and Guidelines

About this Document

Impetus

While writing the Linked Data Platform Specification, the authors and contributors felt compelled to share common conventions and valuable lessons learned. Yet, at the same time, they did not wish to impose or imply unnecessary restrictions, or to make the formal specification unnecessarily verbose. This document, along with the LDP Primer [[LDP-PRIMER]], was therefore developed to provide additional context. Drawing upon the professional experiences of its authors and contributors, research into the rich history of related technologies, and continuous feedback from the community at large, it aims to help system implementers avoid common pitfalls, improve quality, and achieve greater interoperability with other Linked Data systems.

Terminology

For the purposes of this document, it is useful to make a minor, yet important distinction between the term 'best practice' and the term 'guideline'. We define and differentiate the terms as follows:

best practice: An implementation practice (method or technique) that has consistently shown results superior to those achieved with other means and that is used as a benchmark. Best practices within this document apply specifically to the ways that LDP servers and clients are implemented as well as how certain resources are prepared and used with them. In this document, the best practices might be used as a kind of check-list against which an implementer can directly evaluate a system's design and code. Lack of adherence to any given best practice, however, does not necessarily imply a lack of quality; they are recommendations that are said to be 'best' in most cases and in most contexts, but not all. A best practice is always subject to improvement as we learn and evolve the Web together.
guideline: A tip, a trick, a note, a suggestion, or answer to a frequently asked question. Guidelines within this document provide useful information that can advance an implementer's knowledge and understanding, but that may not be directly applicable to an implementation or recognized by consensus as a 'best practice'.

Please see the Terminology section in Linked Data Platform 1.0 [[LDP]] as well as the Linked Data Glossary [[LD-GLOSSARY]] for definitions of a variety of terms used in this document and related to the Linked Data sphere of knowledge.

Prerequisites and Assumptions

Implementers should have at least a general familiarity with the informative references cited in this document - especially the following:

RDF Vocabulary Description Language 1.0: RDF Schema [[RDF-SCHEMA]] - The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web and it is the de facto language for expressing Linked Data. This specification describes how to use RDF to describe RDF vocabularies.
RDF Primer 1.1 [[RDF-PRIMER11]] - This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.
Turtle - Terse RDF Triple Language [[TURTLE]] - defines a textual syntax for RDF called Turtle that allows RDF graphs to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. RDF examples used in this document are expressed in Turtle.
Linked Data Glossary [[LD-GLOSSARY]] - a useful glossary containing terms defined and used to describe Linked Data, and its associated vocabularies and best practices for publishing structured data on the Web.
Linked Data Platform 1.0 [[LDP]] - the formal specification for the LDP read-write Linked Data architecture, based on HTTP access to web resources that describe their state using the RDF data model.
Linked Data Platform 1.0 Test Cases [[LDP-TESTS]] - a standard set of tests provided by the W3C, which can be used to verify an implementation's conformance to the LDP specification.
Linked Data Platform Primer [[LDP-PRIMER]] - an introduction to LDP, which describes the basic concepts of LDP such as Linked Data Platform Resources (LDPRs), Linked Data Platform Containers (LDPCs), and their affordances. The Primer provides a running example illustrating how an LDP client can interact with an LDP server in the context of a read-write Linked Data application (i.e. how to use HTTP for accessing, updating, creating and deleting resources from servers that expose their resources as Linked Data).
Linked Data Platform Use Cases and Requirements [[LDP-UCR]] - a set of user stories, use cases, scenarios and requirements that motivate a simple read-write Linked Data architecture, based on HTTP access to web resources that describe their state using RDF.

Best Practices

Predicate URIs should be HTTP URLs

URIs are used to uniquely identify resources and URLs are used to locate resources on the Web. That is to say that a URL is expected to resolve to an actual resource, which can be retrieved from the host. A URI, on the other hand, may also be a URL, but it does not have to be; it may refer to something that has no retrievable representation.

One of the fundamental ideas behind Linked Data is that the things referred to by HTTP URIs can actually be looked up ("dereferenced"). This important principle was originally outlined by Tim Berners-Lee as rule #2 of "the four rules" for linking data [[LD-DI]]. It is therefore ideal that predicate URIs identify resources with representations that are retrievable. LDP servers should at least provide [[RDF-SCHEMA]] representations of these predicates where possible.

Of course, it is also a common practice to reuse properties from open vocabularies that are publicly available. In this case, implementers have no control over the result when attempting to dereference the URI. For this reason, publishers who wish to make their vocabularies useful for linking data should strive to provide a retrievable representation of the properties their vocabularies define. Consequently, implementers are also expected to use this practice as a benchmark by which to judge the efficacy of a vocabulary's use for linking data.

Use and include the predicate rdf:type to represent the concept of type in LDPRs

It is often very useful to know the type (class) of an LDPR, though it is not essential to work with the interaction capabilities that LDP offers. Still, to make data more useful in the broadest context, type should be explicitly defined using the rdf:type predicate defined by [[RDF-SCHEMA]].

This provides a way for clients to easily determine the type(s) of a resource without having to perform additional processing or make additional HTTP requests. For example, clients that cannot infer the type because they do not support inferencing can benefit from this explicit declaration.

The token 'a' in the predicate position of a Turtle triple represents the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type. In a Turtle description of a person, therefore, a contact:Person is the same as rdf:type contact:Person or the fully-qualified form, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> contact:Person.
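The benefit of an explicit rdf:type can be sketched in a few lines of Python. This is only an illustration, assuming the triples of an LDPR have already been parsed into (subject, predicate, object) tuples of strings; the helper name types_of, the resource URI, and the namespace IRI bound to the contact: prefix are assumptions made for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed namespace for the contact: prefix used in the text.
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"


def types_of(triples, subject):
    """Return the explicitly declared types of `subject` (no inferencing)."""
    return {o for s, p, o in triples if s == subject and p == RDF_TYPE}


me = "http://example.org/people/1#me"  # hypothetical resource URI
triples = [
    (me, RDF_TYPE, CONTACT + "Person"),
    (me, CONTACT + "fullName", "Jane Doe"),
]

print(types_of(triples, me))
```

Because the type is stated in the representation itself, the client needs neither a reasoner nor a second HTTP request to discover it.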

Use relative URIs

Relative URIs are useful to the Linked Data Platform in much the same ways that relative URLs [[RFC3986]] have been useful to traditional web systems. Since the things referred to by Linked Data URIs should provide a retrievable representation [[LD-DI]], Linked Data URIs are usually also URLs; they locate rather than just identify. As such, the utilitarian value of relative URLs still applies; especially since the LDP Container model promotes hierarchical representations.

Implementers should therefore align the function of relative URIs in LDP with those of traditional relative URLs where possible and appropriate. Aside from giving developers the comfort and convenience of familiarity, they provide many of the same advantages.

Relative URIs are shorter than absolute URIs.

In many cases, this can aid development by making code and RDF easier for humans to read. It can also reduce the size of payloads, which, in turn, can reduce network traffic and stress on servers while improving response times for end-users.

Relative URIs can make resources more portable.

When information that is already known from the context of the base resource's retrieval is omitted, there can be less information to modify when its location changes. This can make copying resources to new servers or to a new position in a containment hierarchy easier; for example, a process that clones just a container whose members are listed by relative URI need not adjust any of its member URIs.

Relative URIs are convenient during development.

During development, the scheme and network location information in a URI may either be unknown or likely to change. The commonly used 'localhost', for example, is better expressed by the server name or a domain name once the system is deployed. Developers often experience less hassle by omitting this information. Additionally, the hierarchy implied by a relative URI may be mimicked in a server file system, which can help developers find and work with information, even when the server isn't running.

Relative URIs support arbitrary, machine-generated URIs.

RESTful URLs are often defined by a pattern of hierarchical 'collections', which clients interact with in very logical ways. For example, when creating a new resource, the client does not typically know the name of the resource until after a successful POST to a collection's URL. A POST to /people/, for example, may create the resource /people/1. LDP Containers are such collections, and their URIs can benefit from the same model, which, in some implementations, may actually be crucial.
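The portability advantage described above can be sketched with Python's standard urllib.parse.urljoin, which resolves a relative reference against a base URI. The host names, the /people/ container, and the helper resolve_members are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve_members(container_uri, relative_members):
    """Resolve each relative member reference against the container URI."""
    return [urljoin(container_uri, member) for member in relative_members]


# The stored representation only mentions "1" and "2".
members = ["1", "2"]

# The same representation served from a development host ...
dev = resolve_members("http://localhost:8080/people/", members)
# ... and from a hypothetical production host, without editing any member.
prod = resolve_members("http://example.org/people/", members)

print(dev)
print(prod)
```

Only the base URI changes between the two calls; the member references themselves are untouched.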

Avoid dot-segments in URIs of POSTed content or use with caution

The semantics of dot-segments (e.g., ../) within relative URIs may be implied by other specifications and by common historical use, but in the case of LDP, additional consideration is required.

The LDP specification states that...

LDP servers MUST assign the default base-URI for [[RFC3987]] relative-URI resolution to be the HTTP Request-URI when the resource already exists, and to the URI of the created resource when the request results in the creation of a new resource.

It follows from this definition that the use of ../ and other non-null relative URI constructs during POST can cause the posted content to refer to resources in a manner the client might not be able to predict, because the base URI is that of the newly created resource, which the client typically does not know in advance. Dot-segments should therefore be avoided unless the client knows specifically what can be expected of the given implementation and/or deployment.
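The unpredictability can be sketched with Python's urllib.parse.urljoin. The two URIs below are hypothetical choices that two different servers might make for a resource created by the same POST to http://example.org/container/; they are assumptions for illustration, not behavior required by LDP.

```python
from urllib.parse import urljoin

posted_reference = "../other"  # dot-segment used inside the POSTed content

# Two hypothetical URIs a server might assign to the created resource.
flat_uri = "http://example.org/container/res1"
dated_uri = "http://example.org/container/2024/res1"

# The base URI is the URI of the created resource, so the same reference
# ends up pointing at different resources.
print(urljoin(flat_uri, posted_reference))
print(urljoin(dated_uri, posted_reference))

# The null relative URI always resolves to the created resource itself.
print(urljoin(flat_uri, ""))
```

Under these assumptions the first call yields http://example.org/other and the second yields http://example.org/container/other, while the null relative URI stays predictable.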

Represent container membership with hierarchical URIs

Hierarchical URIs are good for containers because they enable the use of relative URIs. They also promote easy interaction with resources that are modeled to represent parent-child relationships where the child logically belongs to the parent.

One example of such a model can be found in the case of the oslc_cm:attachment container from the vocabularies defined by the Open Services for Lifecycle Collaboration (OSLC) community. The OSLC defines specifications and vocabularies that are well-aligned to LDP. A resource in an OSLC compliant change management system such as an issue or bug tracker may have attachments represented by the oslc_cm:attachment container. The URI for such a container might be represented as follows:

http://example.org/bugs/2314/attachments/

From this URI, the URI of the parent resource which holds the attachments is easily discerned. The base container for other sibling resources can be discerned by moving up the hierarchy, which is implied by the URI. Meta-data or binary content might be fetched further down the hierarchy by using a URI such as the following:

http://example.org/bugs/2314/attachments/1

In addition to making the use of relative URIs possible, hierarchical URIs make interacting with resources easier for users because they represent the actual structure of the underlying graph. Software agents (code acting on behalf of users [[WEBARCH]]) must be careful before exploiting the structure of URIs, considering historical problems when doing so ([[WEBARCH]], [[metaDataInURI]]).
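Moving up the hierarchy can be sketched with Python's urllib.parse.urljoin, using the attachment URIs from the text. This is a sketch under the assumption that the server really does use hierarchical URIs; as noted above, agents should not rely on URI structure without knowing that.

```python
from urllib.parse import urljoin

attachment = "http://example.org/bugs/2314/attachments/1"

# "." resolves to the container that holds the attachment.
container = urljoin(attachment, ".")
# ".." from the container resolves to the parent resource (the bug).
parent = urljoin(container, "..")

print(container)
print(parent)
```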

Include a trailing slash in container URIs

When representing container membership with hierarchical URLs, including the trailing slash in a container's URI makes it easier to use relative URIs. Take the following container URI for example:

http://example.org/container1

It is more advantageous to use the following instead:

http://example.org/container1/

To illustrate the advantage, consider a container at http://example.org/container1/ whose representation lists a member (here called member1 for illustration) by the absolute URI <http://example.org/container1/member1>.

Suppose now that we wish to reflect the same resource using relative URIs. If the URI of the container includes the trailing slash, we end up with a very elegant representation: the container is written as <> and the member simply as <member1>.

But suppose that we omit the trailing slash, issue an HTTP GET against http://example.org/container1, and the container returns that same relative representation. Resolved against the slash-less base URI, <member1> becomes <http://example.org/member1>.

That is not what was intended; the member URLs lack the container path segment. The returned document would have to be more verbose in order to be correct, writing each member as <container1/member1>.

So, clearly, the better solution is to ensure that container URIs end with a trailing slash.
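The effect of the trailing slash on relative URI resolution can be checked with Python's urllib.parse.urljoin. The member name member1 is an assumption made for illustration.

```python
from urllib.parse import urljoin

with_slash = "http://example.org/container1/"
without_slash = "http://example.org/container1"

# With the trailing slash, a bare member name resolves inside the container.
print(urljoin(with_slash, "member1"))

# Without it, the last path segment is replaced and the container is lost.
print(urljoin(without_slash, "member1"))

# The slash-less container needs the more verbose form to be correct.
print(urljoin(without_slash, "container1/member1"))
```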

Use fragments as relative identifiers

Resource URIs are permitted to end with a fragment; the fragment component is delimited from the rest of the URI because it is introduced by a hash mark (#). For this reason, URIs with non-empty fragments are often called hash URIs; a hash URI identifies a subordinate or related resource [[RFC3986]].

Take the URI, http://www.example.org/products#item10245 , for example. The non-fragment portion of the URI is the part preceding the hash mark, http://www.example.org/products , and the fragment identifier is the part that follows, item10245 .

When expressing Linked Data Platform Resources in RDF, fragments are useful because they can be expressed as relative URIs on the document describing them. This is particularly handy for describing multiple LDPRs whose representations are contained within a single document.

First, it provides the convenience and efficiency of brevity. Suppose, for example, the resources foo, bar, and baz are contained in the same document. Since serving all of the descriptions in a single document is acceptable, we can mint relative URIs within the document using the fragment identifier ( <#foo> , <#bar> and <#baz> ). [[LDP]] ensures that the default base URI is the document URI ( http://www.example.org/products ), so the absolute URI for each is the base URI, plus the hash mark, plus the fragment identifier.

Second, it can help avoid certain complexities inherent with other approaches. Achieving the same result using three independent dereferenceable URIs could be more involved because multiple documents would have to be published, perhaps also including the setup of 303 redirects.
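How fragment identifiers combine with the document URI can be sketched with Python's urllib.parse module. The document URI and the fragments foo, bar, and baz come from the text above; the variable names are illustrative.

```python
from urllib.parse import urldefrag, urljoin

document = "http://www.example.org/products"

# Relative hash URIs resolve to the base URI, plus '#', plus the fragment.
absolute = [urljoin(document, ref) for ref in ("#foo", "#bar", "#baz")]
print(absolute)

# Going the other way: split a hash URI into its two parts.
base, fragment = urldefrag("http://www.example.org/products#item10245")
print(base, fragment)
```

A single GET on the document URI therefore retrieves the descriptions of all three resources.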

See also:

Cool URIs for the Semantic Web

Axioms of Web Architecture, URI References: Fragment Identifiers on URIs
http://www.w3.org/DesignIssues/Fragment.html

Dereferencing HTTP URIs
http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14

Prefer standard datatypes

LDPR representations should use only the following standard datatypes. RDF does not by itself define datatypes to be used for literal property values; therefore, a set of standard datatypes based on [[XMLSCHEMA11-2]] and [[RDF-PRIMER11]] should be used:

URI	Description
http://www.w3.org/2001/XMLSchema#boolean	Boolean type as specified by XSD Boolean
http://www.w3.org/2001/XMLSchema#date	Date type as specified by XSD date
http://www.w3.org/2001/XMLSchema#dateTime	Date and Time type as specified by XSD dateTime
http://www.w3.org/2001/XMLSchema#decimal	Decimal number type as specified by XSD Decimal
http://www.w3.org/2001/XMLSchema#double	Double floating-point number type as specified by XSD Double
http://www.w3.org/2001/XMLSchema#float	Floating-point number type as specified by XSD Float
http://www.w3.org/2001/XMLSchema#hexBinary	Arbitrary hex-encoded binary data as specified by XSD hexBinary
http://www.w3.org/2001/XMLSchema#integer	Integer number type as specified by XSD Integer
http://www.w3.org/2001/XMLSchema#string	String type as specified by XSD String
http://www.w3.org/2001/XMLSchema#base64Binary	Binary type as specified by XSD Base64Binary
http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral	Literal XML value as specified by RDF
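As an illustration of staying within this set, the following Python sketch maps native values to datatype URIs from the table. The mapping itself (for example, float to xsd:double and bytes to xsd:base64Binary) and the function name xsd_datatype are assumptions of this example, not something the LDP specification prescribes.

```python
from datetime import date, datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def xsd_datatype(value):
    """Return a standard datatype URI for a native Python value."""
    # bool must be tested before int, and datetime before date,
    # because each is a subclass of the type that follows it.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    if isinstance(value, Decimal):
        return XSD + "decimal"
    if isinstance(value, datetime):
        return XSD + "dateTime"
    if isinstance(value, date):
        return XSD + "date"
    if isinstance(value, bytes):
        return XSD + "base64Binary"
    if isinstance(value, str):
        return XSD + "string"
    raise TypeError(f"no standard datatype chosen for {type(value).__name__}")
```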

Re-use established linked data vocabularies instead of (re-)inventing duplicates

This section summarizes some well-known RDF vocabularies that should be used in Linked Data Platform Resources wherever a resource needs to use a predicate whose meaning matches one of these. For example, if a resource has a description, and the application semantics of that description is compatible with dcterms:description , then dcterms:description should be used. If needed, additional application-specific predicates may be used. A specification for a domain may require one or more of these properties for a particular resource type. The Range column in the tables below identifies the defined rdfs:range for the properties.

Common Properties

From Dublin Core URI: http://purl.org/dc/terms/

Property	Range/DataType	Comment
dcterms:contributor	dcterms:Agent
dcterms:creator	dcterms:Agent
dcterms:created	xsd:dateTime
dcterms:description	rdf:XMLLiteral	Descriptive text about the resource represented as rich text in XHTML format. It should include only content that is valid and suitable inside an XHTML element.
dcterms:identifier	rdfs:Literal
dcterms:modified	xsd:dateTime
dcterms:relation	rdfs:Resource	The HTTP URI of a related resource. This is the predicate to use when you don't know what else to use. If you know more specifically what sort of relationship it is, use a more specific predicate.
dcterms:subject	rdfs:Resource
dcterms:title	rdf:XMLLiteral	A name given to the resource. Represented as rich text in XHTML format. It should include only content that is valid inside an XHTML element.

The predicate dcterms:type should not be used; use rdf:type instead [[DC-RDF]].

From RDF URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#

Property	Range/DataType	Comment
rdf:type	rdfs:Class	The type or types of the resource

From RDF Schema URI: http://www.w3.org/2000/01/rdf-schema#

Property	Range/DataType	Comment
rdfs:member	rdfs:Resource
rdfs:label	rdfs:Literal	Only use this in vocabulary documents, to define the name of the vocabulary term.

Use qvalues properly

Quality factors allow the user or user agent to indicate the relative degree of preference for a media-range, using the qvalue scale from 0 to 1. The default value is q=1. Take the following for example:

Accept: text/turtle; q=0.5, application/json

This should be interpreted as "I prefer application/json, but send me text/turtle if it is the best available after a 50% mark-down in quality."

Improper handling of qvalues is a common problem in implementations of content negotiation.

Refer to Section 14, Header Field Definitions, in the [[HTTP11]] specification to understand the proper use and evaluation of qvalues for both client and server implementations.
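Server-side evaluation of qvalues can be sketched as follows. This is a simplified illustration in Python, not a complete implementation of the HTTP rules: it ignores media type parameters other than q, does not handle malformed qvalues, and the function names parse_accept, quality, and choose are assumptions of this example.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value.strip())
        ranges.append((parts[0].lower(), q))
    return ranges


def quality(media_type, ranges):
    """Return the q of the most specific range matching media_type, else 0.0."""
    main_type = media_type.split("/")[0]
    best = (-1, 0.0)  # (specificity, q)
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best[0]:
            best = (specificity, q)
    return best[1]


def choose(header, available):
    """Pick the available media type with the highest q; None if unacceptable."""
    ranges = parse_accept(header)
    best_type, best_q = None, 0.0
    for media_type in available:  # earlier entries win ties
        q = quality(media_type, ranges)
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type


header = "text/turtle; q=0.5, application/json"
print(choose(header, ["text/turtle", "application/json"]))
print(choose(header, ["text/turtle"]))
```

With the header from the text, application/json is chosen when both types are available, and text/turtle is still served when it is the only option. A media type whose qvalue is 0 is never selected.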

Respond with primary URLs and use them for identity comparison

Clients can access an LDPR using multiple URLs. An LDP server should respond to each of those requests using a single consistent URL, a primary URL, for the LDPR. This primary URL may be found in the response's Location and/or Content-Location headers, and potentially also in the representation of the LDPR. A common case is URLs that vary by protocol, one HTTP and one HTTPS, but are otherwise identical. In most cases those two URLs refer to the same resource, and the server should respond to requests on either URL with a single (primary) URL.

Clients should use the primary URL as an LDPR's identity; for example, when determining if two URLs refer to the same resource clients should compare the primary URLs, not the URLs used to access the resources. Note that this usage does imply that the client has sufficient reason to trust the headers and/or content by which the primary URL is communicated to the client, which is beyond what HTTP alone can guarantee [[RFC7231]].
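A client-side identity comparison based on primary URLs might look like the sketch below. It assumes the response headers are available as a plain dictionary keyed by canonical header names, and it checks Content-Location before Location; that order, the function names, and the example URLs are assumptions of this example.

```python
from urllib.parse import urljoin


def primary_url(request_url, headers):
    """Return the primary URL advertised by the server, else the request URL."""
    for name in ("Content-Location", "Location"):
        value = headers.get(name)
        if value:
            # Header values may be relative; resolve them against the request URL.
            return urljoin(request_url, value)
    return request_url


def same_resource(response_a, response_b):
    """Compare two (request_url, headers) pairs by their primary URLs."""
    return primary_url(*response_a) == primary_url(*response_b)


# Hypothetical responses: the server names the HTTPS URL as primary.
via_http = ("http://example.org/people/1",
            {"Content-Location": "https://example.org/people/1"})
via_https = ("https://example.org/people/1", {})

print(same_resource(via_http, via_https))
```

Comparing the request URLs directly would treat these as two resources; comparing primary URLs does not, provided the client has reason to trust the header.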

Representing relationships between resources

LDPRs can use one RDF triple to represent a link (relationship) to another resource. Having the source resource’s URI as the subject and the target resource’s URI as the object of the triple is enough. Contrary to what readers with certain backgrounds may assume, the creation of an "intermediate link" resource is not required to express the relationship.
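The single-triple form can be sketched by serializing one link as one N-Triples statement. The helper link_triple and the two bug URIs are hypothetical; dcterms:relation is used only because it is the generic linking predicate recommended earlier in this document.

```python
DCTERMS_RELATION = "http://purl.org/dc/terms/relation"


def link_triple(source, predicate, target):
    """Serialize a link between two resources as a single N-Triples statement."""
    return f"<{source}> <{predicate}> <{target}> ."


statement = link_triple(
    "http://example.org/bugs/2314",  # source resource (subject)
    DCTERMS_RELATION,                # relationship (predicate)
    "http://example.org/bugs/2315",  # target resource (object)
)
print(statement)
```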

Minimize server-specific constraints

LDP servers should enable simple creation and modification of LDPRs.

It may be common for LDP servers to put restrictions on representations – for example, the range of rdf:type predicates, datatypes of the objects of predicates, and the number of occurrences of predicates in an LDPR, but servers should minimize such restrictions.

For some server applications, strict constraints on modification of resources may be required. However, enforcement of more complex constraints will greatly restrict the set of clients that can modify resources. For interoperability with a wider range of clients, implementers are therefore encouraged to minimize server-specific constraints.