Include first version of Vocab selection section
author Boris Villazon-Terrazas <bvillazon@fi.upm.es>
Thu, 23 Feb 2012 15:51:38 +0100
changeset 107 ec86f8011e36
parent 106 9e0e2df8f5f7
child 108 11e7dc631232
bp/index.html
bp/respec-config.js
--- a/bp/index.html	Thu Feb 23 00:53:48 2012 +0100
+++ b/bp/index.html	Thu Feb 23 15:51:38 2012 +0100
@@ -222,103 +222,79 @@
 </section>
 
 
-<!--    VOCABULARY SELECTION   -->
+<!--    << VOCABULARY SELECTION   -->
 <section>
 <h3>Vocabulary Selection -  	Boris</h3>
 <p class='responsible'>Michael Hausenblas (DERI), Ghislain Atemezing (INSTITUT TELECOM), Boris Villazon-Terrazas (UPM),  Daniel Vila-Suero (UPM), George Thomas (Health & Human Services, US), John Erickson (RPI), Biplav Srivastava (IBM)</p>
 <p>
-The group will provide advice on how governments should select RDF vocabulary terms (URIs), including advice as to when they should mint their own. This advice will take into account issues of stability, security, and long-term maintenance commitment, as well as other factors that may arise during the group's work.
+Modeling is an important phase in any Government Linked Data life cycle. In this phase, governments need to build a vocabulary that models the data sources they want to publish as Linked Data. The most important recommendation in this context is to reuse available vocabularies as much as possible. This reuse-based approach speeds up vocabulary development and therefore saves governments time, effort and resources. However, the reuse-based approach raises two main questions: (1) where and how do I find available vocabularies, and (2) how do I select the vocabulary that best fits my needs? Moreover, there may be cases in which governments need to mint their own vocabulary terms, which raises a third question: (3) how do I mint my own vocabulary terms? This section answers these questions by means of a checklist for each one.
 </p>
 
-<h4>Discovery</h4>
-<p>First phase is discovery, that is, the process of finding a vocabulary that can represent entities of a domain, for example, organisations or people.</p>
-<p>Next we present the discovery checklist that provides some steps to take into account when trying to find out existing vocabularies that could best fit the needs of a Government or a specialized agency. The reason is to avoid as much as possible building from scratch a new vocabulary and to reuse as much as possible existing *good* vocabularies of the domain.</p>
-<ul>
-	<li>
-		<b>What is your domain of interest?</b> By this answer, you define and restrict the scope of your domain to quickly find out related works in LOD in your domain. 
-Example: Geography, Environment, Administrations, State Services, Statistics, etc.
-	</li>
-	<li>
-		<b>What are the keywords in your dataset?</b>By identifying the relevant keywords or categories of your dataset, it helps for the searching process using Semantic Web Search Engine.
-If you have raw data in csv, the columns of the tables can be used for the searching process.
-Example: commune, county, point, feature, address, etc.
-	</li>
-	<li>
-		<b>Are you looking for a vocabulary in one specific language?</b>Many of the vocabularies are available in english. You may be aware of having a vocabulary in your own language. Consider this issue
-as it may restrict your search. Sometimes it might be useful to translate some of the keywords to english.
-	</li>
-	<li>
-		<b>How could you find vocabularies?</b>There are some specific search tools (Falcons, Watson, Sindice, Semantic Web Search Engine, Swoogle) that collect, analyse and indexe
- vocabularies and semantic data available online for efficient access.
-	</li>
-	<li>
-		<b>Where could you find related vocabularies in datasets catalogues?</b>Another way around is to perform search using the previously identified key terms in datasets catalogues. The latter provide some samples
- of how the underlying data was modelled and how it was used for.
- Some existing catalogues are: Data Hub (previously CKAN), LOV directory, Kasabi, etc.
-	</li>
-</ul>
+<section> <!-- Discovery checklist -->
+<h4>Discovery checklist</h4>
+<p>As already stated, under the reuse-based approach governments should look for available vocabularies to reuse instead of building new vocabularies from scratch. This checklist provides some considerations for finding existing vocabularies that best fit the needs of a government or a specialized agency.
+</p>
 
-<h4>Selection</h4>
-<p>This checklist aims at giving some advices to better assess and select the best vocabulary, according to the output of the vocabularies discovered in the *Discovery* section. The final result should be one or two vocabularies that could be reused for your own purpose (mappings, extension, etc..)</p>
-<ul>
-	<li>
-		<b>Are they good use of rdfs:label and rdfs:comment?</b>The vocabulary should be self-descriptive. Each Class and Property should have a label and comments associated.
-	</li>
-	<li>
-		<b>Is the vocabulary available in more than one language?</b>Multilingualism should be supported by the vocabulary at least for different lanquage different to English. That is also very important 
-as the documentation should be clear enough with appropriate tag for the language used for the comments or the labels.
-	</li>
-	<li>
-		<b>Who is using the vocabulary?</b>It is always better to check how the vocabulary is used by others initiatives around and  its popularity. 
-	</li>
-	<li>
-		<b>How is the vocabulary maintained?</b>The vocabulary selected should have a guarantee of maintenance in a long term, or at least the editors should be aware of that issue.
-It also include here checking the permanence of the URIs, and how is the policy of vocabulary versioning.
-	</li>
-	<li>
-		<b>Who is the publisher of the vocabulary?</b>Although anyone can create a vocabulary, it is always better to check if it is one person or a group or organization which is
-responsible for publishing and maintaining the vocabulary.
-It is recommended to better trust a well-known organization than a single person.
-	</li>
-	<li>
-		<b>How permanent are the URIs?</b>It refers here to not have a 404 http error when trying to access at any *thing* of the vocabulary.
-Also it refers to the permanent access to the server hosting the vocabulary.
-	</li>
-	<li>
-		<b>What policies are applied to control the changes?</b>It refers to the mechanism put in place by the publisher to always take care of backward compatibilities of the versions, the ways those changes affected the previous versions.
-	</li>
-	<li>
-		<b>Is the documentation available?</b>A vocabulary should be well-documented for machine readable (use of labels and comments; tags to language used), and 
-also for human-readable, that is an extra documentation should be provided by the publisher to better understands 
-the classes and properties, and if possible with some valuable use cases.
-	</li>
-</ul>
-<!--
-<h5>Key terms</h5>
-<p>Identify key domain terms</p>
+<section>
+	<h5>Define the scope of the domain</h5>
+<p>
+<i>What it means:</i> Developing a common understanding of what is included in, and excluded from, the domain. Defining the scope restricts the search and helps to quickly find related work in Linked Open Data initiatives, which in turn helps in reusing existing vocabularies of the same domain. Most of the time, the dataset itself gives some hints about the domain.
+</p>
+<p>
+Examples of domains: Geography, Environment, Administrations, State Services, Statistics, People, Organisations, etc.
+</p>
+</section>
 
-<h5>Using Repositories</h5>
-<p>Search for key terms in vocabularies repositories</p>
-
-<h5>Using Catalogs</h5>
-<p>Search for datasets in data catalogues (CKAN, etc.) within the same domain. How do they model their data?</p>
-
-
-<h4>Vocabulary selection criteria checklist</h4>
-
-<p class='issue'>What other sections from the BP and/or other GLD documents can we re-use? See <a href="https://www.w3.org/2011/gld/track/issues/16">ISSUE-16</a></p>
+<section>
+	<h5>Identify relevant keywords in the dataset</h5>
+<p>
+<i>What it means:</i> Identifying words that describe the main ideas or concepts. The relevant keywords or categories of your dataset can be used as search terms in Semantic Web search engines. If you have raw data in CSV, the column names of the tables can be used as search terms.
+</p>
+<p>
+Examples: commune, county, point, feature, address, etc.
+</p>
+</section>
 
-<ul>
-<li>Labels and comments</li>
-<li>Multilingualism</li>
-<li>Usage</li>
-<li>Maintenance</li>
-<li>Permanent URIs</li>
-<li>Change control</li>
-<li>Publisher</li>
-<li>Documentation</li>
-</ul>
--->
+<section>
+	<h5>Searching for a vocabulary in one specific language</h5>
+<p>
+<i>What it means:</i> Many of the available vocabularies are in English. You may need a vocabulary in your own language.
+Consider this issue, as it may restrict your search. Sometimes it might be useful to translate some of the keywords to English.
+</p>
+</section>
+
+<section>
+	<h5>How to find vocabularies</h5>
+<p>
+<i>What it means:</i> There are some specific search tools (<a href="http://ws.nju.edu.cn/falcons/" target="_blank">Falcons</a>, <a href="http://watson.kmi.open.ac.uk/WatsonWUI/" target="_blank">Watson</a>, <a href="http://sindice.com/" target="_blank">Sindice</a>, <a href="http://swse.deri.org/" target="_blank">Semantic Web Search Engine</a>, <a href="http://swoogle.umbc.edu/" target="_blank">Swoogle</a>) that collect, analyse and index vocabularies and semantic data available online for efficient access.
+</p>
+<p>
+	Examples: It is possible to perform a search on a relevant term or category present in your data.
+</p>
+</section>
+
+<section>
+	<h5>Where to find existing vocabularies in datasets catalogues</h5>
+<p>
+<i>What it means:</i> Another option is to search dataset catalogues using the previously identified key terms. Some of these catalogues provide samples of how the underlying data was modelled and what it was used for.
+</p>
+<p>
+	Some existing catalogues are: <a href="http://thedatahub.org/" target="_blank">Data Hub</a> (formerly CKAN), <a href="http://labs.mondeca.com/dataset/lov/" target="_blank">LOV</a> directory, etc.
+</p>
+</section>
+</section> <!-- Discovery checklist >> -->
+
+<section> <!-- << Vocabulary Selection Criteria checklist -->
+<h4>Vocabulary Selection Criteria checklist</h4>
+<p>This checklist gives some advice on how to assess the vocabularies found with the Discovery checklist and select the one that best fits your needs. The final result should be one or two vocabularies that can be reused for your own purpose (mappings, extension, etc.).
+</p>
+</section> <!--  Vocabulary Selection Criteria checklist >> -->
+
+<section> <!-- << Vocabulary management/creation -->
+<h4>Vocabulary management/creation</h4>
+<p>As already mentioned, there may be cases in which governments need to mint their own vocabulary terms. This section provides a set of considerations aimed at helping government stakeholders mint their own vocabulary terms. It includes some items from the previous section, because some recommendations for vocabulary selection also apply to vocabulary creation.
+</p>
+</section> <!-- Vocabulary management/creation >> -->
 
 <!-- Editorial notes for creators/maintainers:
 
@@ -333,56 +309,8 @@
 Cross-cutting issues: "Hit-by-bus" -->
 
 
-
-<!-- <b>Ghislain</b>
-<p>One of the most challenging task when publishing data set is to have metadata describing the model used to capture it. The model or the ontology gives the semantic of each term used within the data set or in the LOD cloud when published. The importance of selecting the appropriate vocabulary is threefold:</p>
-<ul>
-	<li>Ease the interoperability with existing vocabularies</li>
-	<li>Facilitate integration with other data source from others publishers</li>
-	<li>Speed-up the time of creating new vocabularies, since it is not created from scratch, but based on existing ones.</li>
-</ul>
-<p>
-Publishers should take time to see what is the domain application of their data set: finance, statistics, geograpraphy, weather, administration divisions, organisation, etc. Based on the relevant concepts presented in the Data set, one of these two options could be performed:
-</p>
-<ul>
-	<li>Searching vocabularies using Semantic Web Engines: The five most used SWEs are Swoogle, Watson (Ontology-oriented web engines); SWSE, Sindice (Triples-oriented Web engines); and Falcons (an hybrid-oriented Web engine). One of the difficult task sometimes in the reuse ontology process is to decide which Semantic Search engine to use for obtaining an efficient results in the search of ontologies. There are five well-known and typically used SWSEs in the literature. What are the criteria to choose one Semantic search engine in a particular domain. In the literature, there are no guidelines helping ontology developers to decide between one SWSEs. Guidelines proposed here could potentially help ontology designers in taking such a decision. However , we can divide SW search engines in 3 groups:
-		<ul>
-			<li>Those that are "Ontology-oriented" Web engines such as Swoogle and Watson.</li>
-			<li>The ones "Triple-oriented" Web engines or RDF-oriented like SWSE and Sindice.</li>
-			<li>and finally those which are "Hybrid-oriented" Web engine as the case of Falcons.</li>
-		</ul>
-	Also, a rapid observation while experimenting the use of the abovementioned engines is that there is not a clear separation between ontologies and RDF data coming from blogs and other sources like DBPedia.
-	Using the search engines consist in practice querying them using the set of relevant concepts of the domain (e.g., tourism, point of interest, organization, etc). The output of this exercise is a list of candidate ontologies to be assessed for reusing purpose.
-	</li>
-	<li>Searching vocabularies using the datahub the Data Hub. The datahub (previous CKAN) maintains the list of data sets shared and can be accessed by an API or a full JSON dump. The approach here could be to look for data sets or the similar domain of interest, and analyzed the metadata describing that data to find out the vocabularies reused. Another "data market" place worth mentioning could be Kasabi</li>
-	<li>Searching vocabularies using LOV LOV. The Linked Open Vocabularies (a.k.a LOV) is a set of data expressed in RDF, that inventories vocabularies for describing data sets but also the semantic relations between the vocabularies. Although it is in its preliminary state, it contains more than 100 vocabularies already identified. It came out that there are some vocabularies "commonly" used like SKOS, FOAF, Dublin Core, Geo and Event.</li>
-	<li>Composition of the three methods above-mentioned: It consists of combining the search process making use of the existing searching engines and some data sets catalogue.</li>
-</ul>
-
-@@TODO Assessment and criteria for vocabularies selection
-<br>
-<br>
-<b>Boris</b>
-<p>
-We need to determine the vocabulary to be used for modelling the domain of the government data sources. The most important recommendation in this context is to reuse as much as possible available vocabularies. This reuse-based approach speeds up the vocabulary development, and therefore, governments will save time, effort and resources. This activity consists of the following tasks:
-</p>
-<ul>
-	<li>Search for suitable vocabularies to reuse. Currently there are some useful repositories to find available vocabularies, such as, SchemaCache, Watson, Swoogle, Sindice, and LOV Linked Open Vocabularies.</li>
-	<li>In case that we did not find any vocabulary that is suitable for our purposes, we should create them, trying to reuse as much as possible existing resources, e.g., government catalogues, vocabularies available at sites like [1], etc.</li>
-	<li>Finally, if we did not find available vocabularies nor resources for building the vocabulary, we have to create the vocabulary from scratch.</li>
-</ul>
-<p>The following Figure shows the proposed workflow for creating the vocabulary</p>
-<img src="img/vocabularycreation.PNG" border="0">
-<p>The open questions are:</p>
-<ul>
-	<li>What is the best repository for vocabulary?</li>
-	<li>What is the criteria for using a given vocabulary?</li>
-	<li>Number of LD datasets using it?</li>
-	<li>We reuse the vocabulary by reusing directly its terms? or importing the whole vocabulary?</li>
-	<li>...</li>
-</ul> -->
-
 </section>
+<!--  VOCABULARY SELECTION >>  -->
 
 
 <!-- << URI CONSTRUCTION   -->
@@ -400,8 +328,8 @@
 </p>
 
 <section> 
-<h4>Definitions and General Principles</h4>
-<p>The Web makes use of the URI (Uniform Resource Identifiers) as a single global identification system. The global scope of URIs promotes large-scale "network effects". There are many benefits to participating in the existing network of URIs, including linking, caching, and indexing by search engines. This section aims at providing recommendations on how to create good URIs for use in government linked data:</p>
+<h4>Design principles</h4>
+<p>The Web makes use of URIs (Uniform Resource Identifiers) as a single global identification system. The global scope of URIs promotes large-scale "network effects". To benefit from the value of Linked Data, governments and governmental agencies need to identify their resources using URIs. This section provides a set of general principles aimed at helping government stakeholders define and manage URIs for their resources.</p>
 <p class="highlight"><b>Identify with URIs</b><br>
 To benefit from and increase the value of the World Wide Web, data publishers SHOULD provide URIs as identifiers for their resources.
 </p>
@@ -412,11 +340,11 @@
 </section>
 
 <section> 
-<h4>Representation of Resources<h4>
+<h4>URI Persistence</h4>
 </section>
 
 <section> 
-<h4>Working Notes<h4>
+<h4>Internationalized Resource Identifiers: Using non-ASCII characters in URIs</h4>
 </section>
 
 @@ TODO: include references
--- a/bp/respec-config.js	Thu Feb 23 00:53:48 2012 +0100
+++ b/bp/respec-config.js	Thu Feb 23 15:51:38 2012 +0100
@@ -1,7 +1,7 @@
 var respecConfig = {
     // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED.
     specStatus:           "ED",
-    publishDate:          "2012-02-16",
+    publishDate:          "2012-02-23",
     //copyrightStart:       "2010",
 
     // the specification's short name, as in http://www.w3.org/TR/short-name/