--- a/ontology/ProvenanceFormalModel.html Tue Sep 27 17:59:46 2011 +0100
+++ b/ontology/ProvenanceFormalModel.html Tue Sep 27 17:59:59 2011 +0100
@@ -80,6 +80,9 @@
// only "name" is required. Same format as editors.
authors: [
+ { name: "Stian Soiland-Reyes",
+ url:"http://soiland-reyes.com/stian/",
+ company: "University of Manchester, UK" },
{ name: "TBD" },
],
@@ -464,7 +467,482 @@
</section>
<section>
<h3>Modeling an Example Scientific Workflow Scenario</h3>
- <p>Specialization of the PROV ontology to create a provenance ontology for scientific workflows.</p>
+ <p>This section describes an example of extending the PROV
+ ontology to create a provenance ontology for scientific
+ workflows.</p>
+
+ <p>Scientific workflow systems allow the specification of a
+ pipeline of processes which are linked from outputs to inputs.
+ Such workflow definitions are typically created in a graphical
+ user interface or interactive web application, and can then be
+ <em>enacted</em> using particular inputs or parameters.
+ Scientists in fields like bioinformatics, chemistry and
+ physics use such workflows to perform repeated analysis by
+ connecting together disparate sets of domain-specific tools and
+ services.
+ </p>
+
+ <p>
+ Provenance captured from executions in such a workflow
+ system will typically include details of each process
+ execution, such as its inputs, outputs, and start and stop
+ times. It should ultimately be able to describe the complete
+ data lineage through the workflow for any returned output data.
+ </p>
+ <p>
+ This example is not attempting to be a complete or general
+ ontology for asserting workflow provenance, but highlights how
+ a particular application like a workflow system can express its
+ domain-specific attributes based on the PROV ontology.
+ </p>
+ <p>
+ <img
+ src="examples/ontology-extensions/workflow/workflowOntology.png"
+ style="width: 60%; min-width: 20em; max-width: 40em" /><br>
+ <em>Example extension of PROV ontology in order to describe
+ workflow provenance</em>
+ </p>
+ </section>
+ <section>
+ <h4>Workflow extensions to PROV classes</h4>
+ <p>
+ In order to describe workflow executions following the
+ model above, the PROV ontology is extended with
+ workflow-specific subclasses described below:
+ </p>
+ <dl>
+ <dt>wf:Process</dt>
+ <dd>
+ A subclass of <i>prov:ProcessExecution</i> to
+ signify an execution of a process which
+ <i>wf:wasDefinedBy</i> a
+ <i>wf:ProcessDefinition</i>, e.g. a workflow or a
+ process in a workflow. A workflow process can also
+ act as a <i>prov:Agent</i> when controlling nested
+ process executions.
+ </dd>
+ <dt>wf:WorkflowEngine</dt>
+ <dd>
+ A subclass of <i>prov:Agent</i> to indicate that a
+ workflow process was controlled by a workflow
+ engine.
+ </dd>
+ <dt>wf:Value</dt>
+ <dd>
+ A subclass of <i>prov:Entity</i>, representing a
+ value appearing in the workflow execution. It will
+ typically be <i>used</i> or <i>generated</i> by
+ <i>wf:Process</i> executions. The actual value can
+ be provided with a <i>wf:value</i> property.
+ </dd>
+ <dt>wf:ValueAtPort</dt>
+ <dd>
+ A subclass of <i>wf:Value</i> and <i>prov:Role</i>,
+ indicating a value while in the role of being used
+ or generated by a <i>wf:Process</i> at a particular
+ <i>wf:Port</i>.
+ </dd>
+ <dt>wf:FileValue</dt>
+ <dd>
+ A <i>wf:Value</i> which has been read from a file.
+ As a <i>prov:Entity</i> this represents
+ an entity with both attributes <i>wf:value</i> and
+ <i>wf:filename</i> fixed; that is, the entity describes
+ the point in time when the given file contained the
+ given content. The file might be read a while before
+ the <i>wf:Value</i> is used by a <i>wf:Process</i>,
+ at which point the file content might have changed.
+ Values used in the workflow are therefore declared as
+ being derived from this file value using the
+ <i>wf:wasReadFrom</i> property.
+ </dd>
+ </dl>
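+ <p>
+ As a rough illustration, the subclass relations listed
+ above could be declared in RDF/XML along the following
+ lines. This is a sketch rather than a normative
+ definition: it assumes the <i>wf</i> and <i>prov</i>
+ namespace URIs used in the examples elsewhere in this
+ document, and it states only the
+ <i>rdfs:subClassOf</i> relations, leaving out any
+ further restrictions a complete ontology might carry.
+ </p>
+ <div class="exampleOuter">
+ <pre class="example">
+<rdf:RDF xml:base="http://www.example.com/scientific-workflow"
+    xmlns:owl="http://www.w3.org/2002/07/owl#"
+    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
+
+  <!-- Assumption: PROV classes live under the draft ontology URI
+       used in the workflow run example of this document. -->
+
+  <owl:Class rdf:about="#Process">
+    <rdfs:subClassOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#ProcessExecution"/>
+  </owl:Class>
+
+  <owl:Class rdf:about="#WorkflowEngine">
+    <rdfs:subClassOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#Agent"/>
+  </owl:Class>
+
+  <owl:Class rdf:about="#Value">
+    <rdfs:subClassOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#Entity"/>
+  </owl:Class>
+
+  <!-- A value at a port is both a wf:Value and a prov:Role -->
+  <owl:Class rdf:about="#ValueAtPort">
+    <rdfs:subClassOf rdf:resource="#Value"/>
+    <rdfs:subClassOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#Role"/>
+  </owl:Class>
+
+  <owl:Class rdf:about="#FileValue">
+    <rdfs:subClassOf rdf:resource="#Value"/>
+  </owl:Class>
+</rdf:RDF>
+ </pre></div>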
+ </section>
+ <section>
+ <h4>Workflow extensions to PROV properties</h4>
+ <p>
+ While in most cases subclassing will provide the
+ additional expressiveness the application needs, this
+ example ontology also expands on the PROV ontology
+ with more specific subproperties.
+ </p>
+ <dl>
+ <dt>wf:wasDefinedBy</dt>
+ <dd>
+ This subproperty of <i>prov:recipe</i> (not yet
+ defined in the PROV ontology) links a
+ <i>wf:Process</i> to the defining
+ <i>wf:ProcessDefinition</i>. Thus, if there are
+ multiple executions of the same workflow definition,
+ each of the separate <i>wf:Process</i>es will link to
+ the same definition.
+ </dd>
+ <dt>wf:ranInWorkflowEngine</dt>
+ <dd>
+ This subproperty of <i>prov:wasControlledBy</i> links a
+ <i>wf:Process</i> to the <i>wf:WorkflowEngine</i> it
+ was executed in. The engine instance might contain
+ additional details such as which version of the
+ workflow system was used.
+ </dd>
+ <dt>wf:wasLaunchedBy</dt>
+ <dd>
+ This second subproperty of <i>prov:wasControlledBy</i> links a
+ <i>wf:Process</i> to a <i>prov:Agent</i>, indicating
+ which person asked to execute the given
+ <i>wf:ProcessDefinition</i> in the specified
+ <i>wf:WorkflowEngine</i>.
+ </dd>
+ <dt>wf:wasSubProcessExecutionOf</dt>
+ <dd>
+ This subproperty of <i>prov:wasControlledBy</i> links a
+ <i>wf:Process</i> to another <i>wf:Process</i>, indicating
+ that this is a child execution of that process.
+ </dd>
+ <dt>wf:wasReadFrom</dt>
+ <dd>
+ <p>
+ This subproperty of <i>prov:wasDerivedFrom</i> links a
+ <i>wf:Value</i> to the <i>wf:FileValue</i> it was read
+ from, typically when used as a workflow input.
+ As described for <i>wf:FileValue</i>, this distinction
+ is made because at the time the workflow input is used
+ in the workflow, the file content might be different and
+ thus should not be described as an attribute of that
+ <i>wf:Value</i>.
+ </p>
+ <p>
+ This property hints at a "Read file" process
+ execution which is not itself described. This is
+ therefore an example of how the provenance asserter
+ is limiting the scope of its provenance. The engine
+ knows that the file was read, but is not able or
+ willing to provide any deeper assertions, because its
+ primary scope is at the level of executing workflow
+ definitions.
+ </p>
+ </dd>
+ <dt>wf:sawValue</dt>
+ <dd>
+ A subproperty of <i>prov:wasComplementOf</i> which
+ indicates that a <i>wf:Value</i> was seen at a port
+ (<i>wf:seenAtPort</i>) as a
+ <i>wf:ValueAtPort</i>. The ValueAtPort is a complement of the
+ Value it points at, because the two entities can be
+ considered to have the same attributes, except that the
+ ValueAtPort additionally has the <i>wf:seenAtPort</i> property fixed.
+ </dd>
+
+ <dt>wf:seenAtPort</dt>
+ <dd>
+ A subproperty of <i>prov:assumedRole</i> (not yet defined in
+ PROV ontology) indicating which <i>wf:Port</i> a
+ <i>wf:ValueAtPort</i> was seen at. Thus one can see
+ at which output port a value was generated, or at
+ which input port(s) it was used.
+
+ As a functional property, this requires a different
+ <i>wf:ValueAtPort</i> for each <i>use</i> and
+ <i>generation</i> of a value. The
+ <i>wf:ValueAtPort</i> is linked to the
+ <i>wf:Value</i> using <i>wf:sawValue</i>, a subproperty of <i>prov:wasComplementOf</i>.
+ </dd>
+ </dl>
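+ <p>
+ The subproperty relations above might be declared as in
+ the following sketch. It covers only those properties
+ whose parent property is already part of the PROV
+ ontology, and it assumes the same <i>wf</i> and
+ <i>prov</i> namespace URIs as the examples elsewhere in
+ this document. The <i>rdfs:domain</i> and
+ <i>rdfs:range</i> statements are read off the prose
+ descriptions and are illustrative only.
+ </p>
+ <div class="exampleOuter">
+ <pre class="example">
+<rdf:RDF xml:base="http://www.example.com/scientific-workflow"
+    xmlns:owl="http://www.w3.org/2002/07/owl#"
+    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
+
+  <!-- Assumption: PROV properties live under the draft ontology URI
+       used in the workflow run example of this document. -->
+
+  <owl:ObjectProperty rdf:about="#ranInWorkflowEngine">
+    <rdfs:subPropertyOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#wasControlledBy"/>
+    <rdfs:domain rdf:resource="#Process"/>
+    <rdfs:range rdf:resource="#WorkflowEngine"/>
+  </owl:ObjectProperty>
+
+  <owl:ObjectProperty rdf:about="#wasLaunchedBy">
+    <rdfs:subPropertyOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#wasControlledBy"/>
+    <rdfs:domain rdf:resource="#Process"/>
+  </owl:ObjectProperty>
+
+  <owl:ObjectProperty rdf:about="#wasSubProcessExecutionOf">
+    <rdfs:subPropertyOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#wasControlledBy"/>
+    <rdfs:domain rdf:resource="#Process"/>
+  </owl:ObjectProperty>
+
+  <owl:ObjectProperty rdf:about="#wasReadFrom">
+    <rdfs:subPropertyOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#wasDerivedFrom"/>
+    <rdfs:domain rdf:resource="#Value"/>
+    <rdfs:range rdf:resource="#FileValue"/>
+  </owl:ObjectProperty>
+
+  <owl:ObjectProperty rdf:about="#sawValue">
+    <rdfs:subPropertyOf rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#wasComplementOf"/>
+    <rdfs:domain rdf:resource="#ValueAtPort"/>
+    <rdfs:range rdf:resource="#Value"/>
+  </owl:ObjectProperty>
+</rdf:RDF>
+ </pre></div>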
+ </section>
+ <section>
+ <h4>Workflow structure</h4>
+ <p>
+ This ontology includes a simple definition language for
+ describing the overall workflow structure. This is not
+ meant as a general workflow definition language, but allows
+ us to describe <i>process executions</i>, <i>use</i> and
+ <i>generation</i> with relation to particular sections of
+ the workflow definition.
+ </p>
+ <dl>
+ <dt>wf:ProcessDefinition</dt>
+ <dd>A definition of how to execute a process. It will
+ typically refer to a command or service which will be
+ called. Each process definition also
+ <i>wf:definesInput</i>s and <i>wf:definesOutput</i>s.
+ </dd>
+ <dt>wf:Port</dt>
+ <dd>
+ A port can be considered as a parameter or return value
+ for a process. These are typically given names which are
+ unique within a process definition. A value is either
+ provided to an input port before execution, or produced
+ from an output port after execution.
+ </dd>
+ <dt>wf:linksTo</dt>
+ <dd>
+ Ports are connected using links. A link from an output
+ port to an input port means that the value received on
+ that output will be forwarded to the input of the next
+ process. Note that in this simplified ontology links can
+ also go from Input to Input and from Output to Output; these
+ are used to connect workflow ports to processor ports.
+ </dd>
+ <dt>wf:Input</dt>
+ <dd>
+ An input port for a process will receive a value which
+ will be <i>used</i> by the execution. In a dataflow-driven
+ workflow model, a process will execute as soon as all its
+ defined input ports have been provided with values.
+ </dd>
+ <dt>wf:Output</dt>
+ <dd>
+ A process execution might return multiple outputs, for
+ instance a table and a diagram. Each of these is declared
+ as an output port for that process definition.
+ </dd>
+ <dt>wf:definesSubProcess</dt>
+ <dd>
+ <p>
+ Scientific workflows can be composed of nested workflows
+ which can be shared and reused as components. Some
+ workflow systems also allow various execution settings
+ on the nested workflow, like looping or parallelisation.
+ </p>
+ <p>
+ In this case a process definition will use
+ <i>wf:definesSubProcess</i> to indicate its constituent
+ parts, and there will be additional <i>wf:linksTo</i> from
+ the input ports of this process definition to the input
+ ports of some of its nested sub-processes, and vice versa
+ for the outputs. The top-level workflow is always such a
+ process definition.
+ </p>
+ </dd>
+ </dl>
+ </section>
+ <section>
+ <h4>Example workflow</h4>
+ <img src="http://www.w3.org/2011/prov/wiki/images/5/56/Concatsha1.png" />
+ <p>This is an example workflow which defines a workflow
+ input <i>input</i>, three processes <i>String_constant</i>,
+ <i>Concatenate_two_strings</i> and <i>sha1</i>, and finally
+ two workflow outputs <i>combined</i> and <i>sha1</i>. When
+ enacted, it runs from top to bottom, first
+ concatenating the provided input with the string constant.
+ The result is returned on the <i>combined</i> output and is also
+ provided to the <i>sha1</i> process, whose output is given
+ to the other workflow output port.
+ </p>
+ <p>
+ Using the definition ontology above this workflow can be
+ expressed in RDF/XML as:
+ </p>
+ <div class="exampleOuter">
+ <pre class="example">
+<rdf:RDF xml:base="http://www.example.com/workflow1#"
+ xmlns:impl="http://company.example.org/engine-implementation#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:wf="http://www.example.com/scientific-workflow#">
+
+ <wf:ProcessDefinition rdf:about="#workflow">
+ <rdf:type rdf:resource="http://company.example.org/engine-implementation#Workflow"/>
+ <wf:definesInput>
+ <wf:Input rdf:about="#inName">
+ <wf:linksTo rdf:resource="#catIn2" />
+ </wf:Input>
+ </wf:definesInput>
+ <wf:definesOutput rdf:resource="#combined" />
+ <wf:definesOutput rdf:resource="#sha1" />
+ <wf:definesSubProcess>
+ <impl:Constant rdf:about="#String_constant">
+ <impl:constant>Hello, </impl:constant>
+ <wf:definesOutput>
+ <wf:Output rdf:about="#constantValue">
+ <wf:linksTo rdf:resource="#catIn1"/>
+ </wf:Output>
+ </wf:definesOutput>
+ </impl:Constant>
+ </wf:definesSubProcess>
+ <wf:definesSubProcess>
+ <impl:Command rdf:about="#cat">
+ <impl:command>cat</impl:command>
+ <wf:definesInput rdf:resource="#catIn1" />
+ <wf:definesInput rdf:resource="#catIn2" />
+ <wf:definesOutput>
+ <wf:Output rdf:about="#catOut">
+ <wf:linksTo rdf:resource="#combined"/>
+ <wf:linksTo rdf:resource="#shaIn"/>
+ </wf:Output>
+ </wf:definesOutput>
+ </impl:Command>
+ </wf:definesSubProcess>
+ <wf:definesSubProcess>
+ <impl:Command rdf:about="#shasum">
+ <impl:command>shasum</impl:command>
+ <wf:definesInput rdf:resource="#shaIn" />
+ <wf:definesOutput>
+ <wf:Output rdf:about="#shaOut">
+ <wf:linksTo rdf:resource="#sha1"/>
+ </wf:Output>
+ </wf:definesOutput>
+ </impl:Command>
+ </wf:definesSubProcess>
+ </wf:ProcessDefinition>
+</rdf:RDF>
+ </pre></div>
+ </section>
+ <section>
+ <h4>Example workflow run</h4>
+ <p>
+ This example shows how using the workflow extensions
+ together with PROV can provide the provenance of executing
+ the workflow defined above.
+ </p>
+ <div class="exampleOuter"><pre class="example">
+
+<rdf:RDF xmlns="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#"
+ xmlns:cnt="http://www.w3.org/2011/content#"
+ xmlns:foaf="http://xmlns.com/foaf/0.1/"
+ xmlns:prov="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:wf="http://www.example.com/scientific-workflow#"
+ xml:base="http://www.example.com/run1#" >
+
+ <Agent rdf:about="#aUser">
+ <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
+ <foaf:name>Stian Soiland-Reyes</foaf:name>
+ </Agent>
+
+ <wf:WorkflowEngine rdf:about="#workflowEngine" />
+
+ <wf:FileValue rdf:about="#inputFile">
+ <wf:filename>/tmp/myinput.txt</wf:filename>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
+ <cnt:chars>Steve</cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ </wf:FileValue>
+
+ <wf:Value rdf:about="#input">
+ <wf:wasReadFrom rdf:resource="#inputFile"/>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
+ <cnt:chars>Steve</cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ </wf:Value>
+
+ <wf:Process rdf:about="#workflowRun">
+ <used>
+ <wf:ValueAtPort>
+ <wf:sawValue rdf:resource="#input"/>
+ <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#inName"/>
+ </wf:ValueAtPort>
+ </used>
+ <wf:ranInWorkflowEngine rdf:resource="#workflowEngine"/>
+ <wf:wasLaunchedBy rdf:resource="#aUser"/>
+ <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#workflow"/>
+ </wf:Process>
+
+ <wf:Process rdf:about="#constant">
+ <wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/>
+ <wf:wasDefinedBy
+ rdf:resource="http://www.example.com/workflow1#String_constant"/>
+ </wf:Process>
+
+ <wf:Value rdf:about="#hello">
+ <wasGeneratedBy rdf:resource="#constant"/>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:chars>Hello, </cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ </wf:Value>
+
+ <wf:ValueAtPort rdf:about="#helloValue">
+ <wasGeneratedBy rdf:resource="#constant"/>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:chars>Hello, </cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ <wf:sawValue rdf:resource="#hello"/>
+ </wf:ValueAtPort>
+
+ <wf:Process rdf:about="#combine">
+ <used>
+ <wf:ValueAtPort>
+ <wf:sawValue rdf:resource="#hello"/>
+ <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#catIn1"/>
+ </wf:ValueAtPort>
+ </used>
+ <used>
+ <wf:ValueAtPort>
+ <wf:sawValue rdf:resource="#input"/>
+ <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#catIn2"/>
+ </wf:ValueAtPort>
+ </used>
+ <wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/>
+ <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#cat"/>
+ </wf:Process>
+
+ <wf:Value rdf:about="#combined">
+ <wasGeneratedBy rdf:resource="#combine"/>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:chars>Hello, Steve</cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ </wf:Value>
+
+ <wf:Process rdf:about="#shasum">
+ <used rdf:resource="#combined"/>
+ <wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/>
+ <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#shasum"/>
+ </wf:Process>
+
+ <wf:Value rdf:about="#sha1">
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
+ <cnt:chars>a33d1fb1658d4fbf017de59ab67437a3eb5ff50d</cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ </wf:Value>
+
+ <wf:ValueAtPort rdf:about="#sha1OutputFromShasum">
+ <wasGeneratedBy rdf:resource="#shasum"/>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
+ <cnt:chars>a33d1fb1658d4fbf017de59ab67437a3eb5ff50d</cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ <wf:sawValue rdf:resource="#sha1"/>
+ <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#shaOut"/>
+ </wf:ValueAtPort>
+
+ <wf:ValueAtPort rdf:about="#sha1OutputFromWorkflow">
+ <wasGeneratedBy rdf:resource="#workflowRun"/>
+ <wf:value>
+ <cnt:ContentAsText>
+ <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
+ <cnt:chars>a33d1fb1658d4fbf017de59ab67437a3eb5ff50d</cnt:chars>
+ </cnt:ContentAsText>
+ </wf:value>
+ <wf:sawValue rdf:resource="#sha1"/>
+ <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#sha1"/>
+ </wf:ValueAtPort>
+
+</rdf:RDF>
+ </pre></div>
+ <p> Note that the example above does not show the inferred classes
+ and properties from the PROV ontology. For interoperability, applications
+ should also express such inferred statements, so that the provenance can be
+ read without using OWL 2 inferencing and the customized ontologies.
+ </p>
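+ <p>
+ As a sketch of what expressing such inferred statements
+ could look like, the fragment below repeats a few resources
+ from the run above and adds the PROV classes and properties
+ that would follow from the subclass and subproperty relations
+ described in this section. It assumes the same namespace URIs
+ as the example above and is not exhaustive.
+ </p>
+ <div class="exampleOuter"><pre class="example">
+<rdf:RDF xml:base="http://www.example.com/run1#"
+    xmlns:prov="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#"
+    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+    xmlns:wf="http://www.example.com/scientific-workflow#">
+
+  <!-- wf:WorkflowEngine is a subclass of prov:Agent -->
+  <wf:WorkflowEngine rdf:about="#workflowEngine">
+    <rdf:type rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#Agent"/>
+  </wf:WorkflowEngine>
+
+  <!-- wf:Process is a subclass of prov:ProcessExecution; both
+       wf:ranInWorkflowEngine and wf:wasLaunchedBy are
+       subproperties of prov:wasControlledBy -->
+  <wf:Process rdf:about="#workflowRun">
+    <rdf:type rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#ProcessExecution"/>
+    <wf:ranInWorkflowEngine rdf:resource="#workflowEngine"/>
+    <prov:wasControlledBy rdf:resource="#workflowEngine"/>
+    <wf:wasLaunchedBy rdf:resource="#aUser"/>
+    <prov:wasControlledBy rdf:resource="#aUser"/>
+  </wf:Process>
+
+  <!-- wf:Value is a subclass of prov:Entity; wf:wasReadFrom is a
+       subproperty of prov:wasDerivedFrom -->
+  <wf:Value rdf:about="#input">
+    <rdf:type rdf:resource="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#Entity"/>
+    <wf:wasReadFrom rdf:resource="#inputFile"/>
+    <prov:wasDerivedFrom rdf:resource="#inputFile"/>
+  </wf:Value>
+</rdf:RDF>
+ </pre></div>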
+
+ </section>
+
+
</section>
</section>
<section>