gld: changeset 115:28b2ee4a6f27

Binary file bp/img/GLF_Hausenblas.PNG has changed

Binary file bp/img/GLF_Hyland.PNG has changed

Binary file bp/img/GLF_Villazon-terrazas.PNG has changed

--- a/bp/index.html	Fri Mar 09 09:39:11 2012 +0000
+++ b/bp/index.html	Fri Mar 09 09:40:22 2012 +0000
@@ -85,6 +85,21 @@
 <p class='issue'>Does it make sense to base the GLD life cycle on one of the general LD life cycles? See <a href="https://www.w3.org/2011/gld/track/issues/15">ISSUE-15</a></p>
 </section>
 
+<p class='issue'>Michael suggests including the three GLD life cycles we have available</p>
+
+<p>We have currently identified the following Government Linked Data life cycles:
+</p>
+<p class="todo"> Include a brief description for each one of them.
+</p>
+<p>Hyland et al.</p>
+<div id="centerImg">
+</div>
+<img src="img/GLF_Hyland.PNG"  width="550"/>
+<p>Hausenblas et al.</p>
+<img src="img/GLF_Hausenblas.PNG" width="600"/>
+<p>Villazón-Terrazas</p>
+<img src="img/GLF_Villazon-terrazas.PNG" width="600" />
+
 <section>
 <h3>Brief History of Open Government Linked Data - Bernadette</h3>
 
@@ -231,62 +246,87 @@
 </p>
 
 <section> <!-- Discovery checklist -->
-<h4>Discovery checklist</h4>
-<p>As we already stated, following the reuse-based approach, governments have to look for available vocabularies to reuse, instead of building new vocabularies from scratch. This checklist provides some considerations when trying to find out existing vocabularies that could best fit the needs of a Government or a specialized agency.
+	<h4>Discovery checklist</h4>
+	<p>As we already stated, following the reuse-based approach, governments have to look for available vocabularies to reuse, instead of building new vocabularies from scratch. This checklist provides some considerations for finding existing vocabularies that could best fit the needs of a Government or a specialized agency.
+	</p>
+	
+<p class="highlight"><b>Define the scope of the domain</b><br/>
+<i>What it means:</i> Developing a common understanding of what is included in, or excluded from, the domain. Defining the scope narrows the search and helps to quickly find related work in Linked Open Data initiatives, which in turn makes it easier to reuse existing vocabularies from the same domain. Most of the time, the dataset itself gives you some hints about the domain. <br/><br/>
+Examples of domains: Geography, Environment, Administrations, State Services, Statistics, People, Organisation, etc.	
+	</p>
+
+<p class="highlight"><b>Identify relevant keywords in the dataset</b><br/>
+	<i>What it means:</i> Identifying words that describe the main ideas or concepts. Identifying the relevant keywords or categories of your dataset helps when searching with a Semantic Web search engine. If you have raw data in CSV, the column names of the tables can be used as search terms. <br/><br/>
+	Examples: commune, county, point, feature, address, etc.	
 </p>
 
-<section>
-	<h5>Define the scope of the domain</h5>
-<p>
-<i>What it means:</i> Developing a common understanding as to what is included in, or excluded from, in the domain. By defining the scope of the domain, it restricts and helps to quickly find out related works in Linked Open Data initiatives. Hence, it could help in reusing some existing vocabularies of the same domain. Most of the time, the dataset gives you some hints about the domain.
-</p>
-<p>
-Examples of domain: Geography, Environment, Administrations, State Services, Statistics, People, Organisation, etc.	
-</p>
-</section>
-
-<section>
-	<h5>Identify relevant keywords in the dataset</h5>
-<p>
-<i>What it means:</i> Identifying words that describe the main ideas or concepts. By identifying the relevant keywords or categories of your dataset, it helps for the searching process using Semantic Web Search Engine. If you have raw data in csv, the columns of the tables can be used for the searching process.
+<p class="highlight"><b>Searching for a vocabulary in one specific language</b><br/>
+	<i>What it means:</i> Many of the available vocabularies are in English. You may want a vocabulary in your own language.
+	Consider this requirement, as it may restrict your search. Sometimes it might be useful to translate some of the keywords into English. 
 </p>
-<p>
-Examples: commune, county, point, feature, address, etc.
-</p>
-</section>
 
-<section>
-	<h5>Searching for a vocabulary in one specific language</h5>
-<p>
-<i>What it means:</i>Many of the available vocabularies are in English. You may be aware of having a vocabulary in your own language.
-Consider this issue as it may restrict your search. Sometimes it might be useful to translate some of the keywords to English.
-</p>
-</section>
-
-<section>
-	<h5>How to find vocabularies</h5>
-<p>
-<i>What it means:</i>There are some specific search tools (<a href="http://ws.nju.edu.cn/falcons/" target="_blank">Falcons</a>, <a href="http://watson.kmi.open.ac.uk/WatsonWUI/" target="_blank">Watson</a>, <a href="http://sindice.com/" target="_blank">Sindice</a>, <a href="http://swse.deri.org/" target="_blank">Semantic Web Search Engine</a>, <a href="http://swoogle.umbc.edu/" target="_blank">Swoogle</a>) that collect, analyse and index vocabularies and semantic data available online for efficient access.
-</p>
-<p>
+<p class="highlight"><b>How to find vocabularies</b><br/>
+	<i>What it means:</i> There are some specific search tools (<a href="http://ws.nju.edu.cn/falcons/" target="_blank">Falcons</a>, <a href="http://watson.kmi.open.ac.uk/WatsonWUI/" target="_blank">Watson</a>, <a href="http://sindice.com/" target="_blank">Sindice</a>, <a href="http://swse.deri.org/" target="_blank">Semantic Web Search Engine</a>, <a href="http://swoogle.umbc.edu/" target="_blank">Swoogle</a>, <a href="http://schemapedia.com/" target="_blank">Schemapedia</a>) that collect, analyse and index vocabularies and semantic data available online for efficient access.<br/><br/>
 	Examples: It is possible to perform a search on a relevant term or category present in your data.
 </p>
-</section>
 
-<section>
-	<h5>Where to find existing vocabularies in datasets catalogues</h5>
-<p>
-<i>What it means:</i>Another way around is to perform search using the previously identified key terms in datasets catalogues. Some of these catalogues provide samples of how the underlying data was modelled and how it was used for.
-</p>
-<p>
+<p class="highlight"><b>Where to find existing vocabularies in dataset catalogues</b><br/>
+	<i>What it means:</i> Another approach is to search dataset catalogues using the previously identified key terms. Some of these catalogues provide samples of how the underlying data was modelled and what it was used for.<br/><br/>
 	Some existing catalogues are: <a href="http://thedatahub.org/" target="_blank">Data Hub</a> (former CKAN), <a href="http://labs.mondeca.com/dataset/lov/" target="_blank">LOV</a> directory, etc.
 </p>
-</section>
+
 </section> <!-- Discovery checklist >> -->
 
 <section> <!-- << Vocabulary Selection Criteria checklist -->
 <h4>Vocabulary Selection Criteria checklist</h4>
-<p>This checklist aims at giving some advices to better assess and select the vocabulary that best fits your needs, according to the output of the vocabularies discovered in the Discovery section. The final result should be one or two vocabularies that could be reused for your own purpose (mappings, extension, etc..)
+<p>This checklist aims to give some advice on how to assess and select the vocabulary that best fits your needs from among the vocabularies found in the Discovery section. The final result should be one or two vocabularies that could be reused for your own purpose (mappings, extension, etc.)
+</p>
+
+<p class="highlight"><b>Vocabularies should be self-descriptive</b><br/>
+	<i>What it means:</i> Each property or term in a vocabulary should have a Label, Definition and Comment defined.
+	Self-describing data suggests that information about the encodings used for each representation is provided explicitly within the representation. The ability for Linked Data to describe itself, to place itself in context, contributes to the usefulness of the underlying data.<br/><br/>
+For example, the popular DCMI Metadata Terms vocabulary has a term named <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor" target="_blank">Contributor</a>, which has a:<br/>
+	  Label: Contributor<br/>
+	  Definition: An entity responsible for making contributions to the resource<br/>
+	  Comment: Examples of a Contributor include a person, an organization, or a service.<br/>
+</p>
+
+<p class="highlight"><b>Vocabularies should be described in more than one language</b><br/>
+	<i>What it means:</i> Multilingualism should be supported by the vocabulary, i.e., all the elements of the vocabulary should have labels, definitions and comments available in the government's official language, e.g., Spanish, and at least in English.
+	This matters because the documentation should clearly indicate, with an appropriate language tag, the language used for each comment or label.<br/><br/>
+For example, for the same term <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor" target="_blank">Contributor</a><br/>
+	  rdfs:label "Contributor"@en, "Colaborador"@es<br/>
+	  rdfs:comment "Examples of a Contributor include a person, an organization, or a service"@en , "Ejemplos de colaborador incluyen persona, organización o servicio"@es<br/>
+</p>
+
+<p class="highlight"><b>Vocabulary reusability</b><br/>
+	<i>What it means:</i> It is always better to check how widely the vocabulary is used by other initiatives and how popular it is.<br/><br/>
+For example: The recent <a href="http://stats.lod2.eu/vocabularies" target="_blank">statistics</a> on the use of vocabularies in the cloud reveal that <a href="http://xmlns.com/foaf/0.1" target="_blank">foaf</a> is reused by more than 55 other vocabularies.
+</p>
+
+<p class="highlight"><b>Vocabularies should be accessible for a long period</b><br/>
+	<i>What it means:</i> The selected vocabulary should come with a guarantee of long-term maintenance, or at least its editors should be aware of the issue.
+	This also includes checking the permanence of the URIs and the vocabulary's versioning policy. This is strongly related to the best practices described in the Stability section.
+</p>
+
+<p class="highlight"><b>Vocabularies should be published by a trusted group or organization</b><br/>
+	<i>What it means:</i> Although anyone can create a vocabulary, it is always better to check whether a single person, a group or an organization is responsible for publishing and maintaining it.
+	A well-known organization is generally a safer choice than a single person.
+</p>
+
+<p class="highlight"><b>Vocabularies should have permanent URIs</b><br/>
+	<i>What it means:</i> Requests for any term of the vocabulary should never return an HTTP 404 error. This also means permanent access to the server hosting the vocabulary, which facilitates reuse and consumption of the data built upon it.<br/><br/>
+	Example: The <a href="http://www.w3.org/2003/01/geo/wgs84_pos#/" target="_blank">Geo W3C vocabulary</a> is one of the most used vocabularies for basic representation of geometry points (latitude/longitude) and has been around since 2003, always available at the same namespace. This is strongly related to the best practices described in the Stability section.	
+</p>
+
+<p class="highlight"><b>Vocabularies should provide a versioning policy</b><br/>
+	<i>What it means:</i> This refers to the mechanism the publisher puts in place to preserve backward compatibility between versions and to record how changes affect previous versions.
+	Major changes to the vocabulary should be reflected in the documentation, in both machine- and human-readable formats. This is strongly related to the best practices described in the Versioning section.	
+</p>
+
+<p class="highlight"><b>Vocabularies should provide documentation</b><br/>
+	<i>What it means:</i> A vocabulary should be well documented for machines (use of labels and comments, with language tags).
+	For human readers, the publisher should provide additional documentation that explains the classes and properties, if possible with some valuable use cases.	
 </p>
 </section> <!--  Vocabulary Selection Criteria checklist >> -->
 
@@ -294,8 +334,61 @@
 <h4>Vocabulary management/creation</h4>
 <p>As we already mentioned, we have to take into account that there may be cases in which Governments will need to mint their own vocabulary terms. This section provides a set of considerations aimed at helping to government stakeholders to mint their own vocabulary terms. This section includes some items of the previous section because some recommendations for vocabulary selection also apply to vocabulary creation.
 </p>
+
+<p class="highlight"><b>Define the URI of the vocabulary.</b><br/>
+	<i>What it means:</i> The URI that identifies your vocabulary must be defined. This is strongly related to the best practices described in the URI Construction section.<br/><br/>
+	For example: If we are minting new vocabulary terms for a particular government, we should define the URI of that particular vocabulary.	
+</p>
+
+<p class="highlight"><b>Vocabularies should be self-descriptive</b><br/>
+	<i>What it means:</i> Each property or term in a vocabulary should have a Label, Definition and Comment defined.
+	Self-describing data suggests that information about the encodings used for each representation is provided explicitly within the representation. The ability for Linked Data to describe itself, to place itself in context, contributes to the usefulness of the underlying data.<br/><br/>
+For example, the popular DCMI Metadata Terms vocabulary has a term named <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor" target="_blank">Contributor</a>, which has a:<br/>
+	  Label: Contributor<br/>
+	  Definition: An entity responsible for making contributions to the resource<br/>
+	  Comment: Examples of a Contributor include a person, an organization, or a service.<br/>
+</p>
+
+<p class="highlight"><b>Vocabularies should be described in more than one language</b><br/>
+	<i>What it means:</i> Multilingualism should be supported by the vocabulary, i.e., all the elements of the vocabulary should have labels, definitions and comments available in the government's official language, e.g., Spanish, and at least in English.
+	This matters because the documentation should clearly indicate, with an appropriate language tag, the language used for each comment or label.<br/><br/>
+For example, for the same term <a href="http://dublincore.org/documents/dcmi-terms/#terms-contributor" target="_blank">Contributor</a><br/>
+	  rdfs:label "Contributor"@en, "Colaborador"@es<br/>
+	  rdfs:comment "Examples of a Contributor include a person, an organization, or a service"@en , "Ejemplos de colaborador incluyen persona, organización o servicio"@es<br/>
+</p>
+
+<p class="highlight"><b>Vocabularies should provide a versioning policy</b><br/>
+	<i>What it means:</i> This refers to the mechanism the publisher puts in place to preserve backward compatibility between versions and to record how changes affect previous versions.
+	Major changes to the vocabulary should be reflected in the documentation, in both machine- and human-readable formats. This is strongly related to the best practices described in the Versioning section.	
+</p>
+
+<p class="highlight"><b>Vocabularies should provide documentation</b><br/>
+	<i>What it means:</i> A vocabulary should be well documented for machines (use of labels and comments, with language tags).
+	For human readers, the publisher should provide additional documentation that explains the classes and properties, if possible with some valuable use cases.	
+</p>
+
+<p class="highlight"><b>Vocabulary should be published following available best practices</b><br/>
+	<i>What it means:</i> One of the goals is to contribute to the community by sharing the new vocabulary. To this end, it is recommended to follow available recipes for publishing RDF vocabularies, e.g., <a href="http://www.w3.org/TR/swbp-vocab-pub/" target="_blank">Best Practice Recipes for Publishing RDF Vocabularies</a>.	
+</p>
 </section> <!-- Vocabulary management/creation >> -->
 
+<section> <!-- << Multilingualism in vocabs -->
+	<h4>Multilingualism in vocabs</h4>
+<p>
+This section provides some considerations when we are dealing with multilingualism in vocabularies. We have identified that multilingualism in vocabularies can be found nowadays in the following formats:
+</p>
+<ul>
+	<li>As a set of rdfs:label values in which the language has been restricted (@en, @fr...). Currently, this is the most commonly used approach. It is also a best practice to always include an rdfs:label for which the language tag is not indicated. This label corresponds to the "default" language of the vocabulary.</li>
+	<li>As skos:prefLabel (or skosxl:Label), in which the language has also been restricted.</li>
+	<li>As a set of monolingual ontologies (ontologies in which labels are expressed in one natural language) in the same domain mapped or aligned to each other (see the example of EuroWordNet, in which wordnets in different natural languages are mapped to each other through the so-called ILI - inter-lingual-index-, which consists of a set of concepts common to all categorizations).</li>
+	<li>As a set of ontology + lexicon. This represents the latest trend in the representation of linguistic (multilingual) information associated with ontologies. The idea is that the ontology is associated with an external ontology of linguistic descriptions. One of the best exponents in this case is the lemon model <a href="http://tia2011.crim.fr/Workshop-Proceedings/pdf/TIAW15.pdf" target="_blank">REF1</a>, <a href="http://lexinfo.net/" target="_blank">REF2</a>, an ontology of linguistic descriptions that is to be related to the concepts and properties in an ontology to provide lexical, terminological, morphosyntactic, etc., information. One of the main advantages of this approach is that semantic and linguistic information are kept separate. One can link several lemon models in different natural languages to the same ontology.</li>
+</ul>
+The current trend is to follow the first approach, i.e., to use rdfs:label and rdfs:comment for each term in the vocabulary.
+	
+</section> <!-- Multilingualism in vocabs >> -->
+
+
+
 <!-- Editorial notes for creators/maintainers:
 
 Creation Namespace management
@@ -308,7 +401,7 @@
 Partial or full deprecation
 Cross-cutting issues: "Hit-by-bus" -->
 
-
+@@TO DO@@ Add references
 </section>
 <!--  VOCABULARY SELECTION >>  -->
 
@@ -696,12 +789,99 @@
 <!--    Pragmatic Provenance  -->
 <!-- Note to Editors: This section is not part of our charter and probably will be folded into another section.  Yet to be determined. -->
 
-<section>
-<h3>TBD - Pragmatic Provenance - Boris</h3>
+<section><!-- << Pragmatic Provenance -->
+<h3>Pragmatic Provenance - Boris</h3>
 <p class='responsible'>John Erickson (RPI)</p>
-<p class="todo">Integrate Wiki <a href="http://www.w3.org/2011/gld/wiki/228_Best_Practices_Pragmatic_Provenance">content</a>. 
+
+<p>Provide best practice recommendations for stakeholders on documenting the provenance of their linked government data and how to interpret that data so that consumers know what they are looking at.</p>
+
+<section><h4>Background</h4>
+<p>In 1997 Tim Berners-Lee called for pervasive provenance on the Web:</p>
+<p class="highlight">
+<i>At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.</i>
+</p>
+<p>W3C GLD therefore seeks to recommend practices that enable government providers to create the metadata necessary to answer their users' "oh yeah?" questions about the linked data they publish. Our recommendations may include processes as well as the application of specific vocabularies/ontologies.
+</p>
+
+<section><h3>What do we mean by "Provenance?"</h3>
+<p>The W3C's Provenance Incubator Group (2010) <a href="http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#What_is_provenance" target="_blank">provides</a> this simple definition of provenance:</p>
+<p class="highlight">Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.</p>
+<p>More recently the W3C Provenance WG (PROV-WG) defines "provenance" for their work:</p>
+<p class="highlight"><i>The <b>provenance</b> of digital objects represents their origins. The PROV Data Model (<a href="http://www.w3.org/TR/2012/WD-prov-primer-20120110/#bib-PROV-DM" target="_blank">PROV-DM</a>) is a proposed standard to represent provenance records, which contain assertions about the entities and activities involved in producing and delivering or otherwise influencing a given object. By knowing the provenance of an object, we can make determinations about how to use it. Provenance records can be used for many purposes, such as understanding how data was collected so it can be meaningfully used, determining ownership and rights over an object, making judgments about information to determine whether to trust it, verifying that the process and steps used to obtain a result complies with given requirements, and reproducing how something it was generated...As a standard for provenance, PROV-DM accommodates all those different uses of provenance. Different people may have different perspectives on provenance, and as a result different types of information might be captured in provenance records.</i></p>
+</section>	
+
+
+<section><h3>What do we mean by "Pragmatic Provenance?"</h3>
+<p>The W3C Government Linked Data WG accepts PROV WG's definition of provenance but recognizes that PROV-DM is a powerful tool. W3C GLD WG seeks to provide best practice recommendations that will be useful to government data stakeholders, that make sense for GLD use cases and are easily adopted by practitioners.</p>
+<p>W3C GLD could recommend a simple <b>provenance scoring system</b> for GLD analogous to TBL's 5 stars for linked data. Such a system might include:</p>
+<ul>
+	<li><b>One star:</b> Using the basic <a href="http://bit.ly/imvRX1" target="_blank">W3C DCAT</a> for Linked Data at the catalog and dataset level</li>
+	<li><b>Two stars:</b> DCAT enhanced with more complete Dublin Core and other metadata</li>
+	<li><b>Three stars:</b> Above, but with basic provenance metadata "within" the datasets</li>
+	<li><b>More stars:</b> More rigorous use of PROV-DM</li>
+</ul>
 </section>
 
+<section><h3>Use cases for provenance in GLD</h3>
+<p>Provide use cases here...</p>
+<ul>
+	<li>Specifying catalog- and dataset-level provenance</li>
+	<li>Specifying provenance within datasets</li>
+	<ul>	
+		<li>Preserving and encoding pre-existing provenance data</li>
+		<li>Generating provenance when processing data (e.g. during the Linked Data creation process)</li>
+	</ul>
+</ul>
+<p>Possible organization of use cases (Adapted from <a href="http://bit.ly/wlKOEF" target="_blank">Trust and Linked Data</a>):</p>
+
+<ul>
+	<li>Simple "Oh Yeah?" scenario</li>
+	<ul>
+		<li>User retrieves a dataset, then clicks on “oh yeah” button, then site returns a provenance record</li>
+	</ul>
+</ul>
+
+<ul>
+	<li>Licensing scenario</li>
+	<ul>
+		<li>User retrieves dataset, then wants to check permission to use</li>
+	</ul>
+</ul>
+
+<ul>
+	<li>Referral scenario</li>
+	<ul>
+		<li>Site answers queries about provenance with pointers to another site’s provenance facilities</li>
+	</ul>
+</ul>
+
+<ul>
+	<li>Repeated queries scenario</li>
+	<ul>
+		<li>Service repeatedly queries a site, wants provenance for all the answers</li>
+		<li>This is similar to the PROV WG example, where the user follows a provenance record, asking follow-up questions based on previous answers</li>
+	</ul>
+</ul>
+
+<ul>
+	<li>Versioning scenario</li>
+	<ul>
+		<li>User retrieves a dataset, then wants to see its provenance, but the dataset has been updated on the original site (and its provenance as well)</li>
+	</ul>
+</ul>
+
+<ul>
+	<li>Dynamic scenario</li>
+	<ul>
+		<li>User retrieves a resource that is dynamically created</li>
+	</ul>
+</ul>
+</section>
+
+
+
+</section> <!-- Pragmatic Provenance >> -->
+
 
 <!--    Epilogue: The Social Contract of a Linked Open Data Publisher   -->
 <!-- Note to Editors: This section is not part of our charter and probably will be folded into another section.  Yet to be determined. -->

--- a/bp/local-style.css	Fri Mar 09 09:39:11 2012 +0000
+++ b/bp/local-style.css	Fri Mar 09 09:40:22 2012 +0000
@@ -61,7 +61,7 @@
 
 .highlight {
 border: 3px solid #005a9c;
-margin: 5 5 5 20px;
+margin: 5px 25px 0 25px;
 padding: 10px;
 }

author	Dave Reynolds <dave@epimorphics.com>
	Fri, 09 Mar 2012 09:40:22 +0000
changeset 115	28b2ee4a6f27
parent 114	b45e8860bb1d (current diff)
parent 113	f563a941aea6 (diff)
child 116	9ebc8387a499