Update the first 3 sections from the wiki
authorBoris Villazon-Terrazas <bvillazon@fi.upm.es>
Fri, 20 Jan 2012 15:36:24 +0100
changeset 5 e51fe9848e2b
parent 4 bcb72f87b5cc
child 6 fa2d46f63a2b
Update the first 3 sections from the wiki
bp/img/vocabularycreation.PNG
bp/index.html
bp/respec-config.js
Binary file bp/img/vocabularycreation.PNG has changed
--- a/bp/index.html	Thu Sep 01 00:09:04 2011 +0100
+++ b/bp/index.html	Fri Jan 20 15:36:24 2012 +0100
@@ -52,17 +52,122 @@
 	<li>Guidance in explaining the value proposition for LOD to stakeholders, managers and executives.</li>
 	<li>Assist the Working Group in later stages of the Standards Process, in order to solicit feedback, use cases, etc.</li>
 </ul>
-<p class='todo'>QUESTION: would it make sense to base the above mentioned GLD life cycle on the general <a href="http://linked-data-life-cycles.info/">Linked Data life cycles</a>?</p>
+<p class='todo'>QUESTION: would it make sense to base the above-mentioned GLD life cycle on the general <a href="http://linked-data-life-cycles.info/">Linked Data life cycles</a>? We have started collecting the available <a href="http://www.w3.org/2011/gld/wiki/GLD_Life_cycle">GLD Life cycles</a>.</p>
 </section>
 
 
+
 <!--    PROCUREMENT   -->
 <section>
 <h3>Procurement</h3>
-<p class='responsible'>TBD</p>
+<p class='responsible'>George Thomas (Health & Human Services, US), Mike Pendleton (Environmental Protection Agency, US), John Sheridan (OPSI, UK)</p>
 <p>
 Specific products and services involved in governments publishing linked data will be defined, suitable for use during government procurement. Just as the <a href="http://www.w3.org/WAI/intro/wcag" title="WCAG Overview">Web Content Accessibility Guidelines</a> allow governments to easily specify what they mean when they contract for an accessible Website, these definitions will simplify contracting for data sites and applications.
 </p>
+
+<p>
+Linked Open Data (LOD) offers novel approaches for publishing and consuming data on the Web. This procurement overview and its companion glossary are intended to help contract officers and their technical representatives understand LOD activities and their associated products and services. It is hoped that this will aid government officials in procuring LOD-related products and services.
+</p>
+
+<h4>Overview</h4>
+<p>
+Recent Open Government initiatives call for more and better access to government data. To meet expanding consumer needs, many governments are now looking to go beyond traditional provisioning formats (e.g. CSV, XML), and are beginning to provision data using Linked Open Data (LOD) approaches.
+</p>
+
+<p>
+In contrast to provisioning data on the Web, LOD provisions data into the Web so it can be interlinked with other linked data, making it easier to discover, and more useful and reusable. LOD leverages World Wide Web standards such as Hypertext Transfer Protocol (HTTP), Resource Description Framework (RDF), and Uniform Resource Identifiers (URIs), which make data self-describing so that it is both human and machine readable. Self-describing data is important because most government data comes from relational data systems that do not fully describe the source data schema needed for application development by third parties.
+</p>
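+<p>
+The triple model behind this self-description can be sketched in a few lines of plain Python. This is an illustration only: the example.gov URIs are hypothetical, and real deployments would use an RDF toolkit rather than bare tuples.
+</p>

```python
# Every RDF statement is a (subject, predicate, object) triple, and each
# identifier is a URI, so the data carries its own schema with it.
# The http://example.gov/... URIs are hypothetical.
triples = {
    ("http://example.gov/id/agency/epa",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.gov/def/Agency"),
    ("http://example.gov/id/agency/epa",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Environmental Protection Agency"),
}

# A third party can discover what kind of thing this resource is without
# any out-of-band schema: the rdf:type predicate is part of the data.
types = {o for s, p, o in triples
         if p == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
print(types)  # -> {'http://example.gov/def/Agency'}
```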
+
+<p>
+While LOD is a relatively new approach to data provisioning, growth has been exponential. LOD has been adopted by other national governments including the UK, Sweden, Germany, France, Spain, New Zealand and Australia.
+</p>
+
+
+<p>
+Development and maintenance of linked data is supported by the Semantic Web/Semantic Technologies industry. Useful information about industry vendors/contractors, and their associated products and services, is available from the World Wide Web Consortium’s Government Linked Data (W3C/GLD) Working Group Community Directory.
+</p>
+
+<p>
+The following categorizes activities associated with LOD development and maintenance, and identifies the products and services associated with these activities:
+</p>
+
+<ol type="1">
+	<li>LOD Preparation</li>
+	<p>Products:</p>
+	<p>Services: Services that support modeling relational or other data sources using URIs, and developing scripts used to generate linked open data. Overlap exists between LOD preparation and publishing.</p>
+	<li>LOD Publishing</li>
+	<p>Products: RDF database (a.k.a. triple store) enables hosting of linked data</p>
+	<p>Services: These are services that support creation, interlinking and deployment of linked data (see also linked data preparation). Hosting data via a triple store is a key aspect of publishing. LD publishing may include implementing a PURL strategy. During preparation for publishing linked data, data and publishing infrastructure may be tested and debugged to ensure it adheres to linked data principles and best practices. (Source: Linked Data: Evolving the Web into a Global Data Space, Heath and Bizer, Morgan and Claypool, 2011, Section 5.4, p. 53)</p>
+	<li>LOD Discovery and Consumption</li>
+	<p>Products: Linked Data Browsers allow users to navigate between data sources by following RDF links; Linked Data Search Engines crawl linked data by following RDF links, and provide query capabilities over aggregated data.</p>
+	<p>Services: These are services that support describing, finding and using linked data on the Web. Publication of linked data contributes to a global data space often referred to as the Linked Open Data Cloud or ‘Web of Data,’ and these services support the development of applications that use (i.e. consume) this ‘Web of Data.’</p>
+	<li>Management Consulting and Strategic Planning</li>
+	<p>Products: Not applicable</p>
+	<p>Services: There is a broad range of management-related services; examples include briefings intended to give decision makers a general understanding of the technology, business case, and ROI; and strategic planning support (e.g. enterprise linked data, implementation of PURLs, etc.)</p>
+	<li>Formal Education and Training</li>
+	<p>Products:</p>
+	<p>Services: Various private companies and universities offer training related to linked open data. These offerings vary widely, from high-level informational sessions intended to give managers/decision makers a general understanding, to in-depth, hands-on instruction for the tech savvy on how to prepare, publish and consume linked data.</p>
+</ol>
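+<p>
+To make the publishing activity above concrete, the following is a toy sketch of what an RDF database (triple store) does. Real products expose the same idea through SPARQL; here a basic graph-pattern match is simulated in plain Python, and the prefixed names (ex:, dct:, rdfs:) stand in for full URIs.
+</p>

```python
# A toy in-memory "triple store": a list of (subject, predicate, object)
# triples plus a pattern matcher, where None acts as a wildcard --
# analogous to a variable in a SPARQL query.
def match(store, s=None, p=None, o=None):
    """Return all triples matching the given pattern."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

store = [
    ("ex:dataset1", "dct:publisher", "ex:epa"),   # hypothetical identifiers
    ("ex:dataset2", "dct:publisher", "ex:hhs"),
    ("ex:dataset1", "rdfs:label", "Air quality data"),
]

# "Which datasets does ex:epa publish?" -- roughly
# SELECT ?d WHERE { ?d dct:publisher ex:epa }
print(match(store, p="dct:publisher", o="ex:epa"))
```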
+
+<h4>Procurement Checklist</h4>
+<p>
+Note: This portion of Procurement Best Practices was moved here from the LOD Cookbook.
+
+The following is an outline of questions a department/agency should consider reviewing as part of their decision to choose a service provider:
+<ul>
+<li>Is the infrastructure accessible and usable from developers’ environment?</li>
+
+<li>Is the documentation aimed at developers comprehensive and usable?</li>
+
+<li>Is the software supported and under active development?</li>
+
+<li>Is there an interface to load data and “follow your nose” through a Web interface?</li>
+
+<li>Can the data be queried programmatically via a SPARQL endpoint?</li>
+
+<li>Does the vendor have reference sites? Are they similar to what you are considering in production?</li>
+
+<li>What is the vendor’s past performance with government agencies or authorities?</li>
+
+<li>Does the vendor provide training for the products or services?</li>
+
+<li>What is the vendor’s Service Level Agreement?</li>
+
+<li>Is there a government approved contract vehicle to obtain this service or product?</li>
+
+<li>Is the vendor or provider an active contributor to Open Source Software, standards groups, and activities associated with data.gov and Linked Open Data projects at the enterprise and/or government level?</li>
+
+<li>Does the vendor or provider comply with the department/agency’s published Open Source Policy?</li>
+</ul>
+</p>
+
+<h4>Glossary</h4>
+<ul>
+<li>
+Linked Open Data: A pattern for hyper-linking machine-readable data sets to each other using Semantic Web techniques, especially via the use of RDF and URIs. Enables distributed SPARQL queries of the data sets and a “browsing” or “discovery” approach to finding information (as compared to a search strategy). (Source: Linking Enterprise Data, David Wood, Springer, 2010, p. 286)
+</li>
+<li>
+Linked Open Data Cloud: Linked Open Data that has been published is depicted in a LOD cloud diagram. The diagram shows connections between linked data sets and color codes them based on data type (e.g., government, media, life sciences, etc.). The diagram can be viewed at: http://richard.cyganiak.de/2007/10/lod/
+</li>
+<li>
+RDF (Resource Description Framework): A language for representing information about resources in the World Wide Web. RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers, or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values. (http://www.w3.org/TR/rdf-primer/)
+</li>
+<li>
+Semantic Technologies: The broad set of technologies related to the extraction, representation, storage, retrieval and analysis of machine-readable information. The Semantic Web standards are a subset of semantic technologies and techniques. (Source: Linking Enterprise Data, David Wood, Springer, 2010, p. 286)
+</li>
+<li>
+Semantic Web: An evolution or part of the World Wide Web that consists of machine-readable data in RDF and an ability to query that information in standard ways (e.g. via SPARQL).
+</li>
+<li>
+Semantic Web Standards: Standards of the World Wide Web Consortium (W3C) relating to the Semantic Web, including RDF, RDFa, SKOS and OWL. (Source: Linking Enterprise Data, David Wood, Springer, 2010, p. 287)
+</li>
+<li>
+SPARQL: Simple Protocol and RDF Query Language (SPARQL) defines a standard query language and data access protocol for use with the Resource Description Framework (RDF) data model. (http://msdn.microsoft.com/en-us/library/aa303673.aspx) Just as SQL is used to query relational data, SPARQL is used to query graph, or linked, data.
+</li>
+<li>
+Uniform Resource Identifiers (URIs): URIs play a key role in enabling linked data. To publish data on the Web, the items in a domain of interest must first be identified. These are the things whose properties and relationships will be described in the data, and may include Web documents as well as real-world entities and abstract concepts. As Linked Data builds directly on Web architecture, the Web architecture term resource is used to refer to these things of interest, which are, in turn, identified by HTTP URIs.
+</li>
+<li>
+World Wide Web Consortium’s Government Linked Data (W3C/GLD) workgroup: http://www.w3.org/2011/gld/charter
+</li>
+</ul>
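+<p>
+As the glossary notes, SPARQL also defines a data access protocol: a query is sent to an endpoint over HTTP. The sketch below only constructs such a request using the Python standard library; the endpoint URL is hypothetical and no request is actually sent.
+</p>

```python
# Build a SPARQL protocol GET request: the query text is passed as the
# "query" parameter of the endpoint URL. The endpoint is hypothetical.
from urllib.parse import urlencode

query = """
SELECT ?dataset ?label WHERE {
  ?dataset a <http://www.w3.org/ns/dcat#Dataset> ;
           <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
LIMIT 10
"""

endpoint = "http://data.example.gov/sparql"   # hypothetical endpoint
request_url = endpoint + "?" + urlencode({"query": query})
print(request_url)
```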
 </section>
 
 
@@ -73,6 +178,55 @@
 <p>
 The group will provide advice on how governments should select RDF vocabulary terms (URIs), including advice as to when they should mint their own. This advice will take into account issues of stability, security, and long-term maintenance commitment, as well as other factors that may arise during the group's work.
 </p>
+
+<b>Ghislain</b>
+<p>One of the most challenging tasks when publishing a data set is to provide metadata describing the model used to capture it. The model, or ontology, gives the semantics of each term used within the data set, or in the LOD cloud once published. The importance of selecting the appropriate vocabulary is threefold:</p>
+<ul>
+	<li>It eases interoperability with existing vocabularies.</li>
+	<li>It facilitates integration with data sources from other publishers.</li>
+	<li>It speeds up the creation of new vocabularies, since they are not created from scratch but based on existing ones.</li>
+</ul>
+<p>
+Publishers should take time to identify the application domain of their data set: finance, statistics, geography, weather, administrative divisions, organisations, etc. Based on the relevant concepts present in the data set, one of the following options could be followed:
+</p>
+<ul>
+	<li>Searching vocabularies using Semantic Web search engines (SWSEs): The five most used SWSEs are Swoogle and Watson (ontology-oriented engines); SWSE and Sindice (triple-oriented engines); and Falcons (a hybrid-oriented engine). A difficult task in the ontology reuse process is deciding which search engine will yield efficient results for a particular domain; the literature offers no guidelines to help ontology developers choose between SWSEs, and the guidelines proposed here could potentially help designers take such a decision. In summary, SWSEs can be divided into three groups:
+		<ul>
+			<li>"Ontology-oriented" Web engines, such as Swoogle and Watson.</li>
+			<li>"Triple-oriented" (or RDF-oriented) Web engines, such as SWSE and Sindice.</li>
+			<li>"Hybrid-oriented" Web engines, such as Falcons.</li>
+		</ul>
+	A rapid observation from experimenting with the above-mentioned engines is that there is no clear separation between ontologies and RDF data coming from blogs and other sources such as DBpedia.
+	In practice, using these search engines consists of querying them with the set of relevant concepts of the domain (e.g., tourism, point of interest, organization). The output of this exercise is a list of candidate ontologies to be assessed for reuse.
+	</li>
+	</li>
+	<li>Searching vocabularies using the Data Hub: The Data Hub (previously CKAN) maintains the list of shared data sets, which can be accessed via an API or a full JSON dump. The approach here is to look for data sets in a similar domain of interest, and analyze the metadata describing them to find out which vocabularies they reuse. Another "data market" place worth mentioning is Kasabi.</li>
+	<li>Searching vocabularies using LOV: The Linked Open Vocabularies (a.k.a. LOV) is a set of data expressed in RDF that inventories vocabularies for describing data sets, as well as the semantic relations between those vocabularies. Although in a preliminary state, it already contains more than 100 identified vocabularies. It turns out that some vocabularies are "commonly" used, such as SKOS, FOAF, Dublin Core, Geo and Event.</li>
+	<li>Combination of the three methods above: combining the search process across the existing search engines and the data set catalogues.</li>
+</ul>
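+<p>
+The Data Hub lookup described above can be sketched programmatically. This assumes the CKAN "action" API (package_search); the function and base URL are illustrative, the request is only constructed rather than sent, and the metadata-inspection step is described in a comment.
+</p>

```python
# Build a dataset-search request for a CKAN instance such as the Data Hub.
# Only the URL is constructed here; no network call is made.
from urllib.parse import urlencode

def ckan_search_url(base, keyword):
    """Return a package_search URL for the given keyword (CKAN action API)."""
    return base.rstrip("/") + "/api/3/action/package_search?" + urlencode({"q": keyword})

url = ckan_search_url("http://datahub.io", "tourism")
print(url)
# From the JSON response one would inspect each matching dataset's
# metadata (e.g. resource formats, extras) for the vocabularies it reuses.
```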
+
+<p class='todo'>TODO: Assessment and criteria for vocabulary selection</p>
+<br>
+<br>
+<b>Boris</b>
+<p>
+We need to determine the vocabulary to be used for modelling the domain of the government data sources. The most important recommendation in this context is to reuse available vocabularies as much as possible. This reuse-based approach speeds up vocabulary development, and therefore governments will save time, effort and resources. This activity consists of the following tasks:
+</p>
+<ul>
+	<li>Search for suitable vocabularies to reuse. Currently there are some useful repositories for finding available vocabularies, such as SchemaCache, Watson, Swoogle, Sindice, and LOV (Linked Open Vocabularies).</li>
+	<li>If we do not find any vocabulary suitable for our purposes, we should create one, reusing as much as possible existing resources, e.g., government catalogues, vocabularies available at sites like [1], etc.</li>
+	<li>Finally, if we find neither available vocabularies nor resources for building the vocabulary, we have to create the vocabulary from scratch.</li>
+</ul>
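+<p>
+The three tasks above form a simple decision cascade, sketched below in plain Python. The function and input names are hypothetical; the inputs stand for the results of the search steps described above.
+</p>

```python
# Decide how to obtain a vocabulary, always preferring reuse over
# building anew. Inputs are the (hypothetical) results of the searches.
def choose_vocabulary_strategy(found_vocabularies, reusable_resources):
    if found_vocabularies:                 # task 1: reuse an existing vocabulary
        return ("reuse", found_vocabularies[0])
    if reusable_resources:                 # task 2: build from existing resources
        return ("build-from-resources", reusable_resources[0])
    return ("build-from-scratch", None)    # task 3: last resort

print(choose_vocabulary_strategy(["foaf"], []))       # -> ('reuse', 'foaf')
print(choose_vocabulary_strategy([], ["catalogue"]))  # -> ('build-from-resources', 'catalogue')
print(choose_vocabulary_strategy([], []))             # -> ('build-from-scratch', None)
```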
+<p>The following figure shows the proposed workflow for creating the vocabulary.</p>
+<img src="img/vocabularycreation.PNG" border="0" alt="Vocabulary creation workflow">
+<p>The open questions are:</p>
+<ul>
+	<li>What is the best repository for vocabularies?</li>
+	<li>What are the criteria for using a given vocabulary? The number of LD data sets using it?</li>
+	<li>Should we reuse a vocabulary by directly reusing its terms, or by importing the whole vocabulary?</li>
+	<li>...</li>
+</ul>
+
 </section>
 
 
--- a/bp/respec-config.js	Thu Sep 01 00:09:04 2011 +0100
+++ b/bp/respec-config.js	Fri Jan 20 15:36:24 2012 +0100
@@ -34,7 +34,7 @@
     editors:  [
         { name: "Michael Hausenblas", url: "http://sw-app.org/mic.xhtml#i", company: "DERI", companyURL: "http://www.deri.ie" },
 		{ name: "Bernadette Hyland", url: "https://twitter.com/bernhyland",  company: "3 Round Stones", companyURL: "http://3roundstones.com/"},
-		{ name: "Boris Villaz&oacute;n-Terrazas", url: "",  company: "OEG-UPM", companyURL: "http://www.oeg-upm.net"}
+		{ name: "Boris Villaz&oacute;n-Terrazas", url: "http://boris.villazon.terrazas.name",  company: "OEG-UPM", companyURL: "http://www.oeg-upm.net"}
     ],
 
     // authors, add as many as you like.