prov: changeset 389:1373fac866b8

--- a/ontology/ProvenanceFormalModel.html	Tue Sep 27 17:59:46 2011 +0100
+++ b/ontology/ProvenanceFormalModel.html	Tue Sep 27 17:59:59 2011 +0100
@@ -80,6 +80,9 @@
           // only "name" is required. Same format as editors.
 
           authors:  [
+              { name: "Stian Soiland-Reyes",
+                url:"http://soiland-reyes.com/stian/",
+                company: "University of Manchester, UK" },
               { name: "TBD" },
           ],
           
@@ -464,7 +467,482 @@
 	</section>
 	<section> 
 		<h3>Modeling an Example Scientific Workflow Scenario</h3>
-		<p>Specialization of the PROV ontology to create a provenance ontology for scientific workflows.</p>
+        <p>This section describes an example of extending the PROV
+        ontology to create a provenance ontology for scientific
+        workflows.</p>
+
+        <p>Scientific workflow systems allow the specification of a
+        pipeline of processes which are linked from outputs to inputs. 
+        Such workflow definitions are typically created in a graphical
+        user interface or interactive web application, and can then be
+        <em>enacted</em> using particular inputs or parameters.
+        Scientists in fields like bioinformatics, chemistry and 
+        physics use such workflows to perform repeated analysis by
+        connecting together a disparate set of domain-specific tools and
+        services.  
+        </p>
+
+        <p>
+          Capturing the provenance of executions in such a workflow
+          system will typically include details of each of the process
+          executions, such as its inputs and outputs, start and stop
+          time, and should ultimately be able to describe the complete
+          data lineage through the workflow for any returned output data.
+          </p>
+        <p>
+        This example is not attempting to be a complete or general
+        ontology for asserting workflow provenance, but highlights how
+        a particular application like a workflow system can express its 
+        domain-specific attributes based on the PROV ontology.
+        </p>  
+        <p>
+        <img
+          src="examples/ontology-extensions/workflow/workflowOntology.png"
+          style="width: 60%; min-width: 20em; max-width: 40em" /><br>
+          <em>Example extension of PROV ontology in order to describe
+          workflow provenance</em>
+        </p>
+        </section>
+        <section>
+            <h4>Workflow extensions to PROV classes</h4>
+            <p>
+                In order to describe workflow executions following the
+                model above, the PROV ontology is extended with
+                workflow-specific subclasses described below:
+            </p>
+            <dl>
+                <dt>wf:Process</dt>
+                <dd>
+                    A subclass of <i>prov:ProcessExecution</i> to
+                    signify an execution of a process which
+                    <i>wf:wasDefinedBy</i>
+                     a <i>wf:ProcessDefinition</i>, e.g. a workflow or a
+                     process in a workflow. A workflow process can also
+                     act as a <i>prov:Agent</i> when controlling nested
+                     process executions.
+                </dd>
+                <dt>wf:WorkflowEngine</dt>
+                <dd>
+                    A subclass of <i>prov:Agent</i> to indicate that a
+                    workflow process was controlled by a workflow
+                    engine. 
+                </dd>
+                <dt>wf:Value</dt>
+                <dd>
+                    A subclass of <i>prov:Entity</i>, representing a
+                    value appearing in the workflow execution. It will
+                    typically be <i>used</i> or <i>generated</i> by
+                    <i>wf:Process</i> executions. The actual value can
+                    be provided with a <i>wf:value</i> property.
+                </dd>
+                <dt>wf:ValueAtPort</dt>
+                <dd>
+                    A subclass of <i>wf:Value</i> and <i>prov:Role</i>,
+                    indicating a value while in the role of being used
+                    or generated by a <i>wf:Process</i> at a particular
+                    <i>wf:Port</i>.
+                </dd>
+                <dt>wf:FileValue</dt>
+                <dd>
+                    A <i>wf:Value</i> which has been read from a file.
+                    As a <i>prov:Entity</i> this represents
+                    an entity with both attributes <i>wf:value</i> and
+                    <i>wf:filename</i> fixed; that is, the entity describes
+                    the point when the given file contained the 
+                    content. As the file might be read a while before
+                    the <i>wf:Value</i> is used by a <i>wf:Process</i>, 
+                    at which point the file content might have changed,
+                    those values are declared as being derived from 
+                    this file value using the <i>wf:wasReadFrom</i>
+                    property.
+                </dd>
+            </dl>
+        </section>
+        <section>
+            <h4>Workflow extensions to PROV properties</h4>
+            <p>
+                While for most cases subclassing will provide the
+                additional expressiveness the application needs, this
+                example ontology also expands on the PROV ontology
+                with more specific subproperties.
+            </p>
+            <dl>
+                <dt>wf:wasDefinedBy</dt>
+                <dd>
+                   This subproperty of <i>prov:recipe</i> (not yet
+                   defined in PROV ontology) links a
+                   <i>wf:Process</i> to the defining
+                   <i>wf:ProcessDefinition</i>. Thus, if there are
+                   multiple executions of the same workflow definition,
+                   each of the separate <i>wf:Process</i>es will link to
+                   the same definition.  
+                </dd>
+                <dt>wf:ranInWorkflowEngine</dt>
+                <dd>
+                 This subproperty of <i>prov:wasControlledBy</i> links a
+                 <i>wf:Process</i> to the <i>wf:WorkflowEngine</i> it
+                 was executed in. The engine instance might contain
+                 additional details such as which version of the
+                 workflow system was used. 
+                </dd>
+                <dt>wf:wasLaunchedBy</dt>
+                <dd>
+                 This second subproperty of <i>prov:wasControlledBy</i> links a
+                 <i>wf:Process</i> to a <i>prov:Agent</i>, indicating 
+                 which person asked to execute the given
+                 <i>wf:ProcessDefinition</i> in the specified
+                 <i>wf:WorkflowEngine</i>.
+                </dd>
+                <dt>wf:wasSubProcessExecutionOf</dt>
+                <dd>
+                 This subproperty of <i>prov:wasControlledBy</i> links a
+                 <i>wf:Process</i> to another <i>wf:Process</i>, indicating
+                 that this is a child execution of that parent process.
+                </dd>
+                <dt>wf:wasReadFrom</dt>
+                <dd>
+                    <p>
+                     This subproperty of <i>prov:wasDerivedFrom</i> links a
+                     <i>wf:Value</i> to the <i>wf:FileValue</i> it was read
+                     from, typically when used as a workflow input. 
+                     As described for <i>wf:FileValue</i>, this distinction
+                     is made because at the time the workflow input is used
+                     in the workflow, the file content might be different and
+                     thus should not be described as an attribute of that
+                     <i>wf:Value</i>. 
+                     </p>
+                     <p>
+                      This property hints at a "Read file"
+                      process execution which is not described. This is
+                      therefore an example of how the provenance asserter 
+                      is limiting the scope of its provenance. The engine
+                      knows that the file was read, but is not able or
+                      willing to provide any deeper assertions, because its
+                      primary scope is at the level of executing workflow
+                      definitions.
+                     </p>
+                 </dd>
+                 <dt>wf:sawValue</dt>
+                 <dd>
+                    A subproperty of <i>prov:wasComplementOf</i> which
+                    indicates that a <i>wf:Value</i> was
+                    <i>wf:seenAtPort</i> within a
+                    <i>wf:ValueAtPort</i>. This ValueAtPort is a complement of the
+                    pointed-at Value because one can consider this
+                    entity to have the same attributes, but in
+                    addition the <i>wf:seenAtPort</i> property is fixed.
+                 </dd>
+
+                 <dt>wf:seenAtPort</dt>
+                 <dd>
+                    A subproperty of <i>prov:assumedRole</i> (not yet defined in
+                    PROV ontology) indicating which <i>wf:Port</i> a
+                    <i>wf:ValueAtPort</i> was seen at. Thus one can see
+                    at which output port a value was generated, or at
+                    which input port(s) it was used. 
+                    
+                    As a functional property this requires a different
+                    <i>wf:ValueAtPort</i> for each <i>use</i> and
+                    <i>generation</i> of a value. The
+                    <i>wf:ValueAtPort</i> is linked to the 
+                    <i>wf:Value</i> using <i>prov:wasComplementOf</i>.
+                </dd>
+            </dl>
+        </section>
+        <section>
+            <h4>Workflow structure</h4>
+            <p>
+             This ontology includes a simple definition language for
+             describing the overall workflow structure. This is not
+             meant as a general workflow definition language, but allows
+             us to describe <i>process executions</i>, <i>use</i> and 
+             <i>generation</i> with relation to particular sections of
+             the workflow definition. 
+            </p>
+            <dl>
+              <dt>wf:ProcessDefinition</dt>
+              <dd>A definition of how to execute a process. It will
+              typically refer to a command or service which will be
+              called. Each process definition also 
+              <i>wf:definesInput</i>s and <i>wf:definesOutput</i>s.
+              </dd>
+              <dt>wf:Port</dt>
+              <dd>
+              A port can be considered as a parameter or return value
+              for a process. These are typically given names which are
+              unique within a process definition. A value is either
+              provided to an input port before execution, or produced
+              from an output port after execution. 
+              </dd>
+              <dt>wf:linksTo</dt>
+              <dd>
+              Ports are connected using links. A link from an output
+              port to an input port means that the value received on
+              that output will be forwarded to the input of the next
+              process.  Note that in this simplified ontology links can
+              also go from Input to Input and Output to Output; these
+              are used to connect workflow ports to processor ports.
+              </dd>
+              <dt>wf:Input</dt>
+              <dd>
+              An input port for a process will receive a value which
+              will be <i>used</i> by the execution. In a dataflow-driven
+              workflow model, a process will execute as soon as all its
+              defined input ports have been provided with values. 
+              </dd>
+              <dt>wf:Output</dt>
+              <dd>
+              A process execution might return multiple outputs, for
+              instance a table and a diagram. Each of these is declared
+              as an output port for that process definition.
+              </dd>
+              <dt>wf:definesSubProcess</dt>
+              <dd>
+              <p>
+              Scientific workflows can be composed of nested workflows
+              which can be shared and reused as components. Some
+              workflow systems also allow various execution settings
+              on the nested workflow, like looping or parallelisation. 
+              </p>
+              <p>
+              In this case a process definition will use
+              <i>wf:definesSubProcess</i> to indicate its constituent
+              parts, and there will be additional <i>wf:linksTo</i> from
+              the input ports of this process definition to the input
+              ports of some of its nested subprocesses, and vice versa
+              for the outputs. The top-level workflow is always such a
+              process definition. 
+              </p>
+              </dd>
+            </dl>
+        </section>
+        <section>
+            <h4>Example workflow</h4>
+            <img src="http://www.w3.org/2011/prov/wiki/images/5/56/Concatsha1.png" />
+            <p>This is an example workflow which defines a workflow
+            input <i>input</i>, three processes <i>String_constant</i>,
+            <i>Concatenate_two_strings</i> and <i>sha1</i>, and finally
+            two workflow outputs <i>combined</i> and <i>sha1</i>. When
+            executed, it will execute from top to bottom, first
+            concatenating the provided input with the string constant,
+            which is returned on the <i>combined</i> output, but also
+            provided to the <i>sha1</i> process, whose output is given
+            to the other workflow port.
+            </p>
+            <p>
+            Using the definition ontology above this workflow can be
+            expressed in RDF/XML as:
+            </p>
+		<div class="exampleOuter">
+				<pre class="example">
+&lt;rdf:RDF xml:base="http://www.example.com/workflow1#"
+    xmlns:impl="http://company.example.org/engine-implementation#"
+    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+    xmlns:wf="http://www.example.com/scientific-workflow#"&gt;
+
+    &lt;wf:ProcessDefinition rdf:about="#workflow"&gt;
+        &lt;rdf:type rdf:resource="http://company.example.org/engine-implementation#Workflow"/&gt;
+        &lt;wf:definesInput&gt;
+            &lt;wf:Input rdf:about="#inName"&gt;
+                &lt;wf:linksTo rdf:resource="#catIn2" /&gt;
+            &lt;/wf:Input&gt;
+        &lt;/wf:definesInput&gt;
+        &lt;wf:definesOutput rdf:resource="#combined" /&gt;
+        &lt;wf:definesOutput rdf:resource="#sha1" /&gt;
+        &lt;wf:definesSubProcess&gt;
+            &lt;impl:Constant rdf:about="#String_constant"&gt;
+                &lt;impl:constant&gt;Hello, &lt;/impl:constant&gt;
+                &lt;wf:definesOutput&gt;
+                    &lt;wf:Output rdf:about="#constantValue"&gt;
+                        &lt;wf:linksTo rdf:resource="#catIn1"/&gt;
+                    &lt;/wf:Output&gt;
+                &lt;/wf:definesOutput&gt;
+            &lt;/impl:Constant&gt;
+        &lt;/wf:definesSubProcess&gt;
+        &lt;wf:definesSubProcess&gt;
+            &lt;impl:Command rdf:about="#cat"&gt;
+                &lt;impl:command&gt;cat&lt;/impl:command&gt;
+                &lt;wf:definesInput rdf:resource="#catIn1" /&gt;
+                &lt;wf:definesInput rdf:resource="#catIn2" /&gt;
+                &lt;wf:definesOutput&gt;
+                    &lt;wf:Output rdf:about="#catOut"&gt;
+                        &lt;wf:linksTo rdf:resource="#shaIn"/&gt;
+                    &lt;/wf:Output&gt;
+                &lt;/wf:definesOutput&gt;
+            &lt;/impl:Command&gt;
+        &lt;/wf:definesSubProcess&gt;
+        &lt;wf:definesSubProcess&gt;
+            &lt;impl:Command rdf:about="#shasum"&gt;
+                &lt;impl:command&gt;shasum&lt;/impl:command&gt;
+                &lt;wf:definesInput rdf:resource="#shaIn" /&gt;
+                &lt;wf:definesOutput&gt;
+                    &lt;wf:Output rdf:about="#shaOut"&gt;
+                        &lt;wf:linksTo rdf:resource="#sha1"/&gt;
+                    &lt;/wf:Output&gt;
+                &lt;/wf:definesOutput&gt;
+            &lt;/impl:Command&gt;
+        &lt;/wf:definesSubProcess&gt;
+    &lt;/wf:ProcessDefinition&gt;
+&lt;/rdf:RDF&gt;            
+        </pre></div>
+        </section>
+        <section>
+            <h4>Example workflow run</h4>
+            <p>
+              This example shows how using the workflow extensions
+              together with PROV can provide the provenance of executing
+              the workflow defined above.
+            </p>
+            <div class="exampleOuter"><pre class="example">
+
+&lt;rdf:RDF xmlns="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#"
+    xmlns:cnt="http://www.w3.org/2011/content#"
+    xmlns:foaf="http://xmlns.com/foaf/0.1/"
+    xmlns:prov="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#"
+    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+    xmlns:wf="http://www.example.com/scientific-workflow#"
+    xml:base="http://www.example.com/run1#" &gt;
+
+    &lt;Agent rdf:about="#aUser"&gt;
+        &lt;rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/&gt;
+        &lt;foaf:name&gt;Stian Soiland-Reyes&lt;/foaf:name&gt;
+    &lt;/Agent&gt;
+
+    &lt;wf:WorkflowEngine rdf:about="#workflowEngine" /&gt;
+
+    &lt;wf:FileValue rdf:about="#inputFile"&gt;
+        &lt;wf:filename&gt;/tmp/myinput.txt&lt;/wf:filename&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:characterEncoding&gt;UTF-8&lt;/cnt:characterEncoding&gt;
+                &lt;cnt:chars&gt;Steve&lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+    &lt;/wf:FileValue&gt;
+
+    &lt;wf:Value rdf:about="#input"&gt;
+        &lt;wf:wasReadFrom rdf:resource="#inputFile"/&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:characterEncoding&gt;UTF-8&lt;/cnt:characterEncoding&gt;
+                &lt;cnt:chars&gt;Steve&lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+    &lt;/wf:Value&gt;
+
+    &lt;wf:Process rdf:about="#workflowRun"&gt;
+        &lt;used&gt;
+            &lt;wf:ValueAtPort&gt;
+                &lt;wf:sawValue rdf:resource="#input"/&gt;
+                &lt;wf:seenAtPort rdf:resource="http://www.example.com/workflow1#inName"/&gt;
+            &lt;/wf:ValueAtPort&gt;
+        &lt;/used&gt;
+        &lt;wf:ranInWorkflowEngine rdf:resource="#workflowEngine"/&gt;
+        &lt;wf:wasLaunchedBy rdf:resource="#aUser"/&gt;
+        &lt;wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#workflow"/&gt;
+    &lt;/wf:Process&gt;
+
+    &lt;wf:Process rdf:about="#constant"&gt;
+        &lt;wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/&gt;
+        &lt;wf:wasDefinedBy
+        rdf:resource="http://www.example.com/workflow1#String_constant"/&gt;
+    &lt;/wf:Process&gt;
+
+    &lt;wf:Value rdf:about="#hello"&gt;
+        &lt;wasGeneratedBy rdf:resource="#constant"/&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:chars&gt;Hello, &lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+    &lt;/wf:Value&gt;
+
+    &lt;wf:ValueAtPort rdf:about="#helloValue"&gt;
+        &lt;wasGeneratedBy rdf:resource="#constant"/&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:chars&gt;Hello, &lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+        &lt;wf:sawValue rdf:resource="#hello"/&gt;
+    &lt;/wf:ValueAtPort&gt;
+
+    &lt;wf:Process rdf:about="#combine"&gt;
+        &lt;used&gt;
+          &lt;wf:ValueAtPort&gt;
+            &lt;wf:sawValue rdf:resource="#hello"/&gt;
+            &lt;wf:seenAtPort rdf:resource="http://www.example.com/workflow1#catIn1"/&gt;
+          &lt;/wf:ValueAtPort&gt;
+        &lt;/used&gt;
+        &lt;used&gt;
+          &lt;wf:ValueAtPort&gt;
+            &lt;wf:sawValue rdf:resource="#input"/&gt;
+            &lt;wf:seenAtPort rdf:resource="http://www.example.com/workflow1#catIn2"/&gt;
+          &lt;/wf:ValueAtPort&gt;
+        &lt;/used&gt;
+        &lt;wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/&gt;
+        &lt;wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#cat"/&gt;
+    &lt;/wf:Process&gt;
+
+    &lt;wf:Value rdf:about="#combined"&gt;
+        &lt;wasGeneratedBy rdf:resource="#combine"/&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:chars&gt;Hello, Steve&lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+    &lt;/wf:Value&gt;
+
+    &lt;wf:Process rdf:about="#shasum"&gt;
+        &lt;used rdf:resource="#combined"/&gt;
+        &lt;wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/&gt;
+        &lt;wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#shasum"/&gt;
+    &lt;/wf:Process&gt;
+
+    &lt;wf:Value rdf:about="#sha1"&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:characterEncoding&gt;UTF-8&lt;/cnt:characterEncoding&gt;
+                &lt;cnt:chars&gt;a33d1fb1658d4fbf017de59ab67437a3eb5ff50d&lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+    &lt;/wf:Value&gt;
+
+    &lt;wf:ValueAtPort rdf:about="#sha1OutputFromShasum"&gt;
+        &lt;wasGeneratedBy rdf:resource="#shasum"/&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:characterEncoding&gt;UTF-8&lt;/cnt:characterEncoding&gt;
+                &lt;cnt:chars&gt;a33d1fb1658d4fbf017de59ab67437a3eb5ff50d&lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+        &lt;wf:sawValue rdf:resource="#sha1"/&gt;
+        &lt;wf:seenAtPort rdf:resource="http://www.example.com/workflow1#shaOut"/&gt;
+    &lt;/wf:ValueAtPort&gt;
+
+    &lt;wf:ValueAtPort rdf:about="#sha1OutputFromWorkflow"&gt;
+        &lt;wasGeneratedBy rdf:resource="#workflowRun"/&gt;
+        &lt;wf:value&gt;
+            &lt;cnt:ContentAsText&gt;
+                &lt;cnt:characterEncoding&gt;UTF-8&lt;/cnt:characterEncoding&gt;
+                &lt;cnt:chars&gt;a33d1fb1658d4fbf017de59ab67437a3eb5ff50d&lt;/cnt:chars&gt;
+            &lt;/cnt:ContentAsText&gt;
+        &lt;/wf:value&gt;
+        &lt;wf:sawValue rdf:resource="#sha1"/&gt;
+        &lt;wf:seenAtPort rdf:resource="http://www.example.com/workflow1#sha1"/&gt;
+    &lt;/wf:ValueAtPort&gt;
+
+&lt;/rdf:RDF&gt;            
+            </pre></div>
+            <p> Note that the example above does not show the inferred classes
+    and properties from the PROV ontology. For interoperability, applications
+    should also express such inferred statements, so that the provenance can be
+    read without using OWL2 inferencing and the customized ontologies.
+            </p>
+
+        </section>
+
+
 	  </section>	    			
 	</section>
 	<section>
author	Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
	Tue, 27 Sep 2011 17:59:59 +0100
changeset 389	1373fac866b8
parent 388	0c6c0edc6650
child 390	483e9715401a