clarified issues with identifier in entity record
author Luc Moreau <l.moreau@ecs.soton.ac.uk>
Wed, 21 Dec 2011 12:09:58 +0000
changeset 1301 1d696f076f51
parent 1300 e82bc2d0bc3c
child 1302 9a94c9c2fbaf
model/ProvenanceModel.html
--- a/model/ProvenanceModel.html	Wed Dec 21 10:36:26 2011 +0000
+++ b/model/ProvenanceModel.html	Wed Dec 21 12:09:58 2011 +0000
@@ -309,7 +309,7 @@
 
 <p>Hence, to accommodate different perspectives on things and their situation in the world as perceived by us, we introduce the idea of a characterized thing, which refers to a thing and its situation in the world, as characterized by someone. We then define an <dfn id="concept-entity">entity</dfn> as an identifiable characterized thing. An entity <em>fixes some aspects</em> of a thing and its situation in the world, so that it becomes possible to express its provenance, and what causes these specific aspects to be as such. An alternative entity may fix other aspects, and its provenance may be different.</p>
 
-<div class="anexample">
+<div class="anexample" id="a-report-example">
 Different users may take different perspectives on a resource with
 a URL. These perspectives in this conceptualization of the world are
 referred to as entities. Three such entities may be
@@ -947,7 +947,7 @@
 
 <p>An entity record, noted <span class="name">entity(id, [ attr1=val1, ...])</span> in PROV-ASN, contains:</p>
 <ul>
-<li><em>id</em>: an identifier <span class="name">id</span> identifying an entity; the identifier of the entity record is defined to be the same as the identifier of the entity; </li>
+<li><em>id</em>: an identifier <span class="name">id</span> identifying an entity; </li>
 <li><em>attributes</em>: an OPTIONAL set of attribute-value pairs <span class="name">[ attr1=val1, ...]</span>, representing this entity's situation in the world.</li>
 </ul>
 
@@ -996,6 +996,9 @@
 
 Further considerations:
 <ul>
+<li>
+The entity identifier <span class="name">id</span> contained in an entity record is expected to be unique among all the identifiers contained in  the current account's records. 
+This constraint is elaborated upon in <a href="#identifiable-record-in-account">identifiable-record-in-account</a>. It means that the current account does not contain any other record for this identifier. Effectively, <span class="name">id</span>  acts as a <em>local</em> identifier for this record.  In this specification, whenever we write "an entity record identified by ... ",  this identification is to be understood in the context of the account that defines it. </li>
 <li>If an asserter wishes to characterize an entity  with the same attribute-value pairs over several intervals, then they are required to create multiple entity records (either by direct assertion or by inference), each with its own identifier (so as to allow potential dependencies between the various entity records to be expressed).  </li>
 
 <li>There is no assumption that the set of attributes is complete and that the attributes are independent/orthogonal of each other.</li>
@@ -1045,7 +1048,7 @@
 
 <p> An activity record, written <span class="name">activity(id, st, et, [ attr1=val1, ...])</span> in PROV-ASN, contains:</p>
 <ul>
-<li><em>id</em>: an identifier <span class="name">id</span> identifying an activity; the identifier of the activity record is defined to be the same as the identifier of the activity;</li>
+<li><em>id</em>: an identifier <span class="name">id</span> identifying an activity;</li>
 <!--<li><em>recipeLink</em>: an OPTIONAL <a href="#record-RecipeLink">recipe link</a> <span class="name">rl</span>, which consists of a domain specific specification of the activity;</li>-->
 <li><em>startTime</em>: an OPTIONAL time <span class="name">st</span> indicating the start of the activity;</li>
 <li><em>endTime</em>: an OPTIONAL time <span class="name">et</span> indicating the end of the activity;</li>
@@ -1089,7 +1092,12 @@
 <a title="activity start event">start event</a> <a>precedes</a> the <a title="activity end event">end event</a>.</div> 
 -->
 
-<p>An activity record is not an entity record.
+<p>Further considerations:</p>
+<ul>
+<li>The activity identifier <span class="name">id</span> contained in an activity record is also expected to be unique among all the identifiers contained in  the current account's records. 
+This constraint is elaborated upon in <a href="#identifiable-record-in-account">identifiable-record-in-account</a>. It means that the current account does not contain any other record for this identifier, and that effectively <span class="name">id</span>  acts as a <em>local</em> identifier for this record in the current account.</li>
+
+<li>An activity record is not an entity record.
 Indeed, an entity record represents an entity that exists in full at
 any point in its characterization interval, persists during this
 interval, and preserves the characteristics that makes it
@@ -1097,8 +1105,8 @@
 unfolds or develops through time, but is typically not identifiable by
 the characteristics it exhibits at any point during its duration. 
 This distinction is similar to the distinction between 
-'continuant' and 'occurrent' in logic [[Logic]].</p>
-
+'continuant' and 'occurrent' in logic [[Logic]].</li>
+</ul>
 
 
 </section> 
@@ -1128,7 +1136,7 @@
 
 <p>An agent record, noted <span class="name">agent(id, [ attr1=val1, ...])</span> in PROV-ASN, contains:</p>
 <ul>
-<li><em>id</em>: an identifier <span class="name">id</span> identifying an agent; the identifier of the agent record is defined to be the same as the identifier of the agent;</li>
+<li><em>id</em>: an identifier <span class="name">id</span> identifying an agent;</li>
 <li><em>attributes</em>: contains a set of attribute-value pairs <span class="name">[ attr1=val1, ...]</span>, representing this agent's situation in the world.</li>
 </ul>
 
@@ -2399,10 +2407,10 @@
  a file, there could be a provenance record kept by the mail client,
  and another by the mail server. Such provenance records may provide different explanations about something happening in the world, because they are created by different parties or observed by different witnesses. A given party could also create multiple provenance records about an execution, to capture different levels of details, targeted at different end-users: the programmer of an experiment may be interested in a detailed log of execution, while the scientists may focus more on the scientific-level description.   Given that multiple provenance records can co-exist, it is important to know who asserted these records. </p>
 
-<p>In PROV-DM, an <dfn id="dfn-Account">account record</dfn> is a wrapper of records with a dual purpose:  </p> 
+<p>In PROV-DM, an <dfn id="dfn-Account">account record</dfn> is a wrapper of records with the following purposes:  </p> 
 <ul>
 <li> It is the mechanism by which attribution of provenance can be assserted; it allows asserters to bundle up their assertions, and assert suitable attribution;</li>
-<li> It provides a scoping mechanism for record identifiers;</li>
+<li> It provides a scoping mechanism for identifier unicity since a record can be uniquely identified in an account by means of the identifier it contains;</li>
 <li> It provides a scoping mechanism for structural contraints (such as
    <a href="#generation-unicity">generation-unicity</a> and <a href="#derivation-use">derivation-use</a> discussed in Section <a href="#structural-constraints">structural-constraints</a>).</li>
 </ul>
@@ -2456,14 +2464,16 @@
 </p>
 </div>
 
-<p>Account records constitue a scope for record identifiers. A record identifier within the scope of an account is intended to denote a single record. However, nothing prevents an asserter from asserting an account containing, for example,  multiple entity records with a same identifier but different attribute-values. In that case, they should be understood as a single entity record with this identifier and the union of all attributes values, as formalized in <a href="#identified-entity-in-account">identified-entity-in-account</a>.</p>
-
-<div class='constraint' id='identified-entity-in-account'>
-Given an entity record identifier <span class="name">e</span>,  two sets of attribute-values denoted by <span class="name">av1</span> and <span class="name">av2</span>, 
- two entity records  <span class="name">entity(e,av1)</span> and <span class="name">entity(e,av2)</span> occurring in an account  are equivalent to the entity record <span class="name">entity(e,av)</span> where <span class="name">av</span> is the set of attribute-value pairs formed by the union of <span class="name">av1</span> and <span class="name">av2</span>.
+<p> An identifier in a record within the scope of an account is intended to denote a single record. However, nothing prevents an asserter from asserting an account containing, for example,  multiple entity records with a same identifier but different attribute-values. In that case, they should be understood as a single entity record with this identifier and the union of all attributes values, as formalized in <a href="#identifiable-record-in-account">identifiable-record-in-account</a>.</p>
+
+<div class='constraint' id='identifiable-record-in-account'>
+<p>Given an entity record identifier <span class="name">e</span>,  two sets of attribute-values denoted by <span class="name">av1</span> and <span class="name">av2</span>, 
+ two entity records  <span class="name">entity(e,av1)</span> and <span class="name">entity(e,av2)</span> occurring in an account  are equivalent to the entity record <span class="name">entity(e,av)</span> where <span class="name">av</span> is the set of attribute-value pairs formed by the union of <span class="name">av1</span> and <span class="name">av2</span>.</p>
+
+<p>This constraint similarly applies to all other types of records. As a result, the identifier that occurs in a record is unique and acts as a local identifier for that record in that account.</p>
 </div>
 
-<p>Whilst constraint <a href="#identified-entity-in-account">identified-entity-in-account</a> specifies how to understand multiple entity records with a same identifier within a given account, it does not guarantee that the entity record formed with the union of all attribute-value pairs is consistent. Indeed, a given attribute may be assigned multiple values, resulting in an inconsistent entity record, as illustrated by the following example.</p>
+<p>Whilst constraint <a href="#identifiable-record-in-account">identifiable-record-in-account</a> specifies how to understand multiple entity records with a same identifier within a given account, it does not guarantee that the entity record formed with the union of all attribute-value pairs is consistent. Indeed, a given attribute may be assigned multiple values, resulting in an inconsistent entity record, as illustrated by the following example.</p>
 
 <div class="anexample">
 <p>
@@ -2475,7 +2485,7 @@
           entity(e,[prov:type="person", ex:age=30])
           ...)
 </pre>
-<p>Application of <a href="#identified-entity-in-account">identified-entity-in-account</a> results in an entity record containing the attribute-value pairs <span class="name">age=20</span> and <span class="name">age=30</span>. This results in an inconsistent characterization of a person. We note that deciding whether a set of attribute-values is consistent or not is application specific and outside the scope of this specification.
+<p>Application of <a href="#identifiable-record-in-account">identifiable-record-in-account</a> results in an entity record containing the attribute-value pairs <span class="name">age=20</span> and <span class="name">age=30</span>. This results in an inconsistent characterization of a person. We note that deciding whether a set of attribute-values is consistent or not is application specific and outside the scope of this specification.
 </p></div>
 
 <p>Account records can be nested since  an account record can occur among the records being wrapped by another account. </p>
@@ -2489,7 +2499,7 @@
 
 <p> The union of two accounts is another account, 
 containing the unions of their respective records, where
- records with a same identifier should be understood according to constraint <a href="#identified-entity-in-account">identified-entity-in-account</a>. Well-formed
+ records with a same identifier should be understood according to constraint <a href="#identifiable-record-in-account">identifiable-record-in-account</a>. Well-formed
 accounts are not
 closed under union because the
 constraint <a href="#generation-unicity">generation-unicity</a> may no
@@ -2513,11 +2523,11 @@
 
 -->
 
-<p>Account records constitute a scope for record identifiers. Since accounts can be nested,  scopes can also be nested; thus, the scope of record identifiers should be understood in the context of such nested scopes.  When a record with an identifier occurs directly within an account, then its identifier denotes this record in the scope of this account, except in sub-accounts where records with the same identifier occur. </p>
+<p>Account records constitute a scope for identifier unicity. Since accounts can be nested,  scopes can also be nested; thus, the requirement on unicity of identifiers should be understood in the context of such nested scopes.  When a record with an identifier occurs directly within an account, then its identifier denotes this record in the scope of this account, except in sub-accounts where records with the same identifier occur. </p>
 
 <div class="anexample">
 <p>
-The following account record is inspired from section <a href="#example-prov-asn-encoding">example-prov-asn-encoding</a>. This account, identified by <span class="name">ex:acc3</span>, declares entity record with identifier <span class="name">e0</span>, which is being referred to in the nested account <span class="name">ex:acc4</span>. The scope of identifier <span class="name">e0</span> is account <span class="name">ex:acc3</span>, including subaccount <span class="name">ex:acc4</span>.</p>
+The following account record is inspired from section <a href="#example-prov-asn-encoding">example-prov-asn-encoding</a>. This account, identified by <span class="name">ex:acc3</span>, declares entity record with identifier <span class="name">e0</span>, which is being referred to in the nested account <span class="name">ex:acc4</span>. Identifier <span class="name">e0</span> uniquely identifies a record in account <span class="name">ex:acc3</span>, including subaccount <span class="name">ex:acc4</span>.</p>
 <pre class="codeexample">
 account(ex:acc3,
         http://example.org/asserter1, 
@@ -2535,6 +2545,9 @@
 </p>
 </div>
 
+<p>The identifier of an account record is expected to be globally unique, whereas identifiers for other records are expected to be unique within the scope of the account in which their record occurs. </p>
+
+
 <p>The account record is the hook by which further meta information can be expressed about provenance, such as asserter, time of creation, signatures. The annotation mechanism can be used for this purpose, but how general meta-information is expressed is beyond the scope of this specification, except for asserters.</p>
 
 <div class="structural-forward">
@@ -3676,7 +3689,7 @@
 
 <p> The union of two accounts is another account, 
 containing the unions of their respective records, where
- records with a same identifier should be understood according to constraint <a href="#identified-entity-in-account">identified-entity-in-account</a>. Structurally well-formed
+ records with a same identifier should be understood according to constraint <a href="#identifiable-record-in-account">identifiable-record-in-account</a>. Structurally well-formed
 accounts are not
 closed under union because the
 constraint <a href="#generation-unicity">generation-unicity</a> may no
@@ -3771,12 +3784,12 @@
 
 <p>In the context of PROV-DM, a resource is just a thing in the world. One may take multiple perspectives on such a thing and its situation in the world, fixing some its aspects.</p>
 
-<p> We refer to the example of section <a href="#conceptualization">2.1</a> for a resource (at some URL) and three different perspectives, referred to as entities.  Three different entity records can be expressed for this report, which in the PROV-ASN sample below, are expressed within a same account.
+<p> We refer to the <a href="#a-report-example">example</a> of section <a href="#conceptualization">2.1</a> for a resource (at some URL) and three different perspectives, referred to as entities.  Three different entity records can be expressed for this report, which in the PROV-ASN sample below, are expressed within a same account.
 </p>
 
 <pre>
 container
-prefix app urn:example:
+prefix app http://example.org/app/
 prefix cr  http://example.org/crime/
 
    account(acc1,
@@ -3789,10 +3802,12 @@
 endContainer
 </pre>
 
-<p>Each entity record contains an idenfier that identifies the entity it represents.
-In this example, three identifiers were minted, and their prefix uses the URN syntax with "example" namespace.</p>
-
-<p>Given that the report is a resource denoted by the URI <span class="name">http://example.org/crime.txt</span>, we could simply use this URI as the identifier of an entity. This would avoid us minting new URIs.  Hence, the report URI would play a double role: as a URI it denotes a resource accessible at that URI, and as a PROV-DM identifier, it identifies a specific characterization of this report. A given identifier  identifies a single entity record within the scope of an account. Hence, below, all entities records have been given the same identifier but appear in the scope of different accounts. </p>
+<p>Each entity record contains an identifier that is unique in
+account <span class="name">acc1</span>, and therefore locally
+identifies the entity record it is contained in.  In this example,
+three identifiers were minted.</p>
+
+<p>Given that the report is a resource denoted by the URI <span class="name">http://example.org/crime.txt</span>, we could simply use this URI as the identifier of an entity. This would avoid us minting new URIs.  Hence, the report URI would play a double role: as a URI it denotes a resource accessible at that URI, and as an identifier in a PROV-DM record, it helps identify a specific characterization of this report. A given identifier occurring in an entity record must be unique within the scope of an account. Hence, below, all entity records have been given the same identifier but appear in the scope of different accounts, so as to satisfy  <a href="#identifiable-record-in-account">identifiable-record-in-account</a>.</p>
 
 <pre>
 container 
@@ -3818,14 +3833,14 @@
 endContainer
 </pre>
 
-<p>In this case, the qualified name  <span class="name">app:crime.txt</span> maps to URI <span class="name">http://example.org/crime.txt</span> still denotes the same resource; however, the perspective we take about that resource is expressed as a different entity record, happening to have the same identifier in different accounts. </p>
+<p>In this case, the qualified name  <span class="name">app:crime.txt</span>, which maps to URI <span class="name">http://example.org/crime.txt</span>, still denotes the same resource; however, the perspectives we take about that resource are expressed by multiple entity records, which all happen to contain the same identifier but in different accounts. </p>
 
 <p> Alternatively, if we need to assert the existence of two different perspectives on the report within the same account, then alternate identifiers MUST be used, one of them being allowed to be the resource URI.</p>
 
 <pre>
 container 
  prefix app  http://example.org/
- prefix app2 urn:example:
+ prefix app2 http://example.org/app/
  prefix cr   http://example.org/crime/
 
    account(acc5,
@@ -3851,6 +3866,8 @@
 <section class="appendix"> 
       <h2>Changes Since Second Public Working Draft</h2> 
 <ul>
+<li>12/21/11: Clarified the issues with identifier in entity record. </li>
+<li>12/21/11: Explained overloading of wasStartedBy. </li>
 <li>12/21/11: Moved Collections from 6.1 to 6.8. </li>
 <li>12/20/11: Created a section on structural constraints. </li>
 <li>12/19/11: Made plan entity and added extra parameter to wasAssociatedWith.  </li>