Update stability section
author Boris Villazon-Terrazas <bvillazon@fi.upm.es>
Thu, 16 Feb 2012 02:16:54 +0100
changeset 89 7916a6ff936c
parent 88 5a181159ccf6
child 90 06fee5a244ce
Update stability section
--- a/bp/index.html	Sat Feb 11 17:51:18 2012 +0000
+++ b/bp/index.html	Thu Feb 16 02:16:54 2012 +0100
@@ -224,7 +224,7 @@
-<h3>Vocabulary Selection -  Boris</h3>
+<h3>Vocabulary Selection -  	Boris</h3>
 <p class='responsible'>Michael Hausenblas (DERI), Ghislain Atemezing (INSTITUT TELECOM), Boris Villazon-Terrazas (UPM),  Daniel Vila-Suero (UPM), George Thomas (Health & Human Services, US), John Erickson (RPI), Biplav Srivastava (IBM)</p>
 The group will provide advice on how governments should select RDF vocabulary terms (URIs), including advice as to when they should mint their own. This advice will take into account issues of stability, security, and long-term maintenance commitment, as well as other factors that may arise during the group's work.
@@ -444,17 +444,164 @@
-<!--    STABILITY   -->
+<!--  << STABILITY   -->
 <h3>Stability - Boris</h3>
 <p class='responsible'>Anne Washington (GMU), Ron Reck</p>
+<section> <!-- << STABILITY.overview -->
+<p>This section will focus on how to publish data so that others can rely on it being available in perpetuity, persistently archived if necessary.</p>
+<p><i>The scope, limits and explanation of stability.</i></p>
+<p>The following definition describes stability of LOD.</p>
-This section specifies how to publish data so that others can rely on it being available in perpetuity, persistently archived if necessary.
+<b>Stability.</b> <u>Stable</u> LOD is persistent, predictable and machine accessible from externally visible locations. 
+	<li>Persistent = Information accessible for an unbounded period of time.</li>
+	<li>Predictable = Names and information follow a logical format.</li>
+	<li>Stable location = Externally visible locations are consistent in name and availability.</li>
+	<li>Other things that impact stability
+		<ul>
+			<li>legacy = earlier naming schemes, formats, data storage devices</li>
+			<li>steward = people who are committed to consistently maintain specific datasets, either individuals or roles in organizations</li>
+			<li>provenance = the sources that establish a context for the production and/or use of an artifact. See <a href="http://www.w3.org/2011/prov/wiki/Main_Page" target="_blank">W3C Provenance working group</a></li>
+		</ul>
+	</li>
-<p class="todo">Integrate Wiki <a href="http://www.w3.org/2011/gld/wiki/225_Best_Practices_for_Stability">content</a>.</p>
+<p><i>The purpose of having a best practice for stability</i></p>
+<p>The length of time information is available is inherently connected to the value placed upon it. 
+If information is deemed valuable, it is likely to persist for a longer period of time. 
+Value, which can change over time, is always determined by a cost-benefit relationship: 
+any benefit derived from information is reduced by the cost(s) associated with using it. 
+Increasing stability requires the adoption of a strategy to allocate limited resources for achieving a goal. 
+Goals drive data providers' criteria for selecting what is best preserved.</p>
+<p>We believe that <b>preservation of content</b> is the main goal for stability. Possible goals include:</p>
+<ol type="1">
+	<li><b>Preservation of content.</b> It might be important to have raw data available for analysis ad infinitum. This means the overall objective is to preserve only the scientific content.</li>
+	<li><b>Preservation of access.</b> It might be important to have information available immediately at all times.</li>
+	<li><b>Conservation.</b> From a historical perspective one could seek to preserve all information in the format and modality in which it was originally conveyed. The most demanding is conservation of the full look and feel of the publication.</li>
+<h5>Success Factors</h5>
+<p>ORGANIZATIONAL CONSIDERATIONS Without internal stability from the data stewards, any external technology stability is a challenge. The following are some organizational characteristics for stable data.</p>
+	<li>Consistent human skills</li>
+	<li>Consistent infrastructure</li>
+	<li>Data related to organizational values or business needs</li>
+	<li>Internal champion or consistent business process</li>
+	<li>Internal politics over naming variations do not affect external locations</li>
+<p>Mark metadata based on its intended audience</p>
+	<li>Internal-audience : management of the process</li>
+	<li>External-audience : final state, or no-update needed.</li>
+</section> <!-- STABILITY.overview >> -->
+<section> <!-- << STABILITY.examples -->
+<p><i>These are a few representative samples to generate discussion and comment. Additional suggestions are encouraged.</i></p>
+<p>These examples were discussed on the <a href="http://lists.w3.org/Archives/Public/public-gld-wg/" target="_blank">public-gld email listserv</a></p>
+<p><b>Technical examples</b>  What existing examples can we point to? (Need international ones...)</p>
+<ol type="1">
+	<li>Internet Archive - <a href="http://www.archive.org">http://www.archive.org</a></li>
+<p><b>Institutional examples</b> Who has the incentive to provide stable persistent data? Some real possibilities and some metaphors for discussion.</p>
+<ol type="1">
+	<li>Archives
+		<ol type="i">
+			<li>Third party entities that document provenance and provide access</li>
+		</ol>
+	</li>
+	<li>Estate Lawyer
+		<ol type="i">
+			<li>Someone responsible for tracking down heirs for an inheritance</li>
+		</ol>
+	</li>
+	<li>Private Foundation
+		<ol type="i">
+			<li>A philanthropic entity who is interested in the value proposition of stability and acts as archive</li>
+		</ol>
+	</li>
+	<li>Government
+		<ol type="i">
+			<li>A government organization which has the funds to steward others' data</li>
+		</ol>
+	</li>
+	<li>Internet organization
+		<ol type="i">
+			<li>A global open organization like W3C or ICANN</li>
+		</ol>
+	</li>
+</section> <!-- STABILITY.examples >> -->
+<section> <!-- << STABILITY.Properties -->
+<p><i>These are characteristics that influence the stability or longevity. Many of these properties are not unique to LOD, yet they influence data cost and therefore data value.</i></p>
+	<li><b><u>Integrity.</u></b> Provide checksums of downloads so that consumers can be assured that they have received the entire dataset. Data that is unreliable should not be used for critical decisions and is therefore of less value than data that is deemed complete. Possible checksum types include MD5 and SHA.</li>
+	<li><b><u>Consistency.</u></b> Any design of a data format should recognize that change is necessary and will happen. Recognizing that change is inevitable, while providing a mechanism for embracing modification, increases continuity and longevity.
+	The following types of changes can be anticipated. Therefore, data design should be made to accommodate them:
+	<ol type="1">
+		<li>The person who published the data changes jobs - For <u>Contact Consistency</u> - Any support contact information should be published using a data steward so that the inherent transition of responsibility does not introduce inconsistency to consumers.</li>
+		<li>Departments, Agencies and Governments are reorganized - For <u>File Naming and Data Consistency</u> - Discourage the use of the originating source as a component in the name of the data file, or the URIs it contains. The information can appropriately be contained within the file as metadata.</li>
+		<li>IT infrastructure overhaul - <u>For File Naming Consistency</u> - Discourage the use of the server or system as a component in the name of the data file.</li>
+		<li>Merger/acquisition - For <u>Data Consistency</u> - Discourage the use of branding as it inherently and needlessly increases cost for new owners while providing no value at all to consumers.</li>
+		<li>Primary stakeholder loses interest in the data - As above, for <u>Data Consistency</u> - Discourage the use of branding as it inherently increases cost for new owners.</li>
+	</ol>	
+	</li>
+<li><b><u>Data Repository Consistency.</u></b> As new data is produced, old data becomes legacy data. Consumers of data will write programs to automate processing of legacy data, and the number of changes in format directly affects the cost incurred by data consumers. Data providers should carefully consider whether the benefit of the change exceeds the incurred cost of modifying ingestion procedures. Even changing formats between different serializations has a cost to consumers as they need to anticipate and provide for the change. Data providers should consider lifecycle workflow, and when at all possible they should modify legacy data themselves so that all provided data is consistent and each consumer will not be required to perform exactly the same data conversion task to create a homogeneous data repository.</li>
+	<ul>
+		<li><b><u>Discrete</u></b> It is best to have a greater number of small files rather than fewer larger files. Smaller files reduce the cost on consumers. Files should be comprised of meaningful discrete units based on a time period, locality or other logical unit.</li>
+		<li><b><u>File names</u></b> Files should be meaningfully named without using non-printable characters.</li>
+		<li><b><u>Archive structure</u></b> Data archives should be nested in at least a single directory. The directory name should be unique so that multiple archives can be uncompressed without introducing collisions.</li>
+	</ul>
+<li><b><u>Organization.</u></b>The minimum metadata accompanying each data offering should include:
+	<ol type="1">
+		<li>Serialization type (such as NTriples or RDF/XML)</li>
+		<li>Publisher</li>
+		<li>Creation Date</li>
+		<li>Modification Date</li>
+		<li>Version</li>
+		<li>Email address for data steward</li>
+	</ol>
+<li><b><u>Complexity.</u></b>All serializations are equal to a back-end system, therefore providers should serialize RDF in either
+	<ul>
+		<li>Turtle - The Turtle serialization minimizes the disk space expenditure while also increasing human readability.</li>
+		<li>NTriples - The NTriples serialization increases integrity in that re-ordering will have no effect on semantics, and damaged lines only affect the assertions on those lines. NTriples also increases flexibility because files can be split into smaller files as long as the division happens at the end of a line.</li>
+	</ul>
+<li><b><u>Diskspace Resource.</u></b> Different serializations represent the same semantics but require varying numbers of characters (diskspace). While Turtle provides the most concise serialization and is arguably the easiest for humans to read, it does not provide the integrity that NTriples does, because NTriples can be reordered or split up based on size or line count without affecting the integrity of the dataset. In general NTriples will provide the greatest overall stability for LOD. Compression of data should be done using either GZIP or ZIP; do not adopt other compression approaches just because they are "free". The maximum data compression should be chosen.</li>
+</section> <!--  STABILITY.Properties >> -->
+<section> <!-- << STABILITY.Interconnections -->
+<p><i>Ways that this best practice is connected to others.</i></p>
+<p><b>STABILITY, URL, and URIs.</b> The identifiers used in LOD are a possible point of failure; therefore, use URIs that dereference under DNS that you control or that have the greatest likelihood of persisting. Using URIs according to the best practices stated <b>elsewhere in this document</b> increases value. Other strategies for maximizing the longevity of URIs include:</p>
+<ol type="1">
+	<li>PURLs (Persistent Uniform Resource Locators) <a href="http://purl.oclc.org/docs/index.html" target="_blank">purl.oclc.org</a></li>
+	<li>Handle System <a href="http://www.handle.net/" target="_blank">http://www.handle.net/</a> and its commercial cousin <a href="http://www.doi.org/" target="_blank">Digital Object Identifier</a></li>
+<p><b>Vocabulary Choices Affect Value</b> When LOD uses or references vocabularies or vocabulary items, it introduces a point of frailty, which can therefore affect cost. Vocabulary use according to the best practices stated <b>elsewhere in this document</b> increases value.</p>
+</section> <!-- STABILITY.Interconnections >> -->
+<!-- <p class="todo">Integrate Wiki <a href="http://www.w3.org/2011/gld/wiki/225_Best_Practices_for_Stability">content</a>.</p> -->
+@@ TODO: Include references
+</section> <!-- STABILITY >> -->
 <!--   SOURCE DATA   -->
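The Integrity property in the Stability section above asks publishers to provide checksums so that consumers can confirm they received the entire dataset. A minimal sketch of the consumer-side check follows, assuming the publisher lists a SHA-256 hex digest next to each download. The function names `sha256_of_file` and `verify_download` are hypothetical and not part of the changeset; the text itself only says "MD5 and SHA".

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so large dataset dumps do not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, published_digest):
    """Return True if the local file matches the published digest.

    `published_digest` is assumed to be a hex string copied from the
    publisher's page; surrounding whitespace and letter case are ignored.
    """
    return sha256_of_file(path) == published_digest.strip().lower()
```

A consumer would call `verify_download("dataset.nt.gz", digest_from_publisher)` after fetching a file and discard the download if it returns `False`. The file name here is only an illustration.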
--- a/bp/respec-config.js	Sat Feb 11 17:51:18 2012 +0000
+++ b/bp/respec-config.js	Thu Feb 16 02:16:54 2012 +0100
@@ -1,7 +1,7 @@
 var respecConfig = {
     // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
     specStatus:           "ED",
-    publishDate:          "2012-02-06",
+    publishDate:          "2012-02-16",
     //copyrightStart:       "2010",
     // the specification's short name, as in http://www.w3.org/TR/short-name/
@@ -34,7 +34,7 @@
     editors:  [
         { name: "Michael Hausenblas", url: "http://sw-app.org/mic.xhtml#i", company: "DERI", companyURL: "http://www.deri.ie" },
 		{ name: "Bernadette Hyland", url: "https://twitter.com/bernhyland",  company: "3 Round Stones", companyURL: "http://3roundstones.com/"},
-		{ name: "Boris Villaz&oacute;n-Terrazas", url: "http://boris.villazon.terrazas.name",  company: "OEG-UPM", companyURL: "http://www.oeg-upm.net"}
+		{ name: "Boris Villaz&oacute;n-Terrazas", url: "http://boris.villazon.terrazas.name",  company: "OEG, UPM", companyURL: "http://www.oeg-upm.net"}
     // authors, add as many as you like.