The PROV Ontology Model (also PROV ontology) encodes the PROV Data Model [[PROV-DM]] in the OWL2 Web Ontology Language (OWL2). The PROV ontology consists of a set of classes, properties, and restrictions that can be used to represent provenance information. The PROV ontology can be specialized to create domain-specific provenance ontologies that model the provenance information specific to different applications. The PROV ontology supports a set of entailments based on OWL2 formal semantics and provenance-specific inference rules. The PROV ontology is available for download as a separate OWL2 document.
The PROV Ontology Model (also PROV ontology) defines the normative modeling of the PROV Data Model [[PROV-DM]] using the W3C OWL2 Web Ontology Language. This specification describes the set of classes, properties, and restrictions that constitute the PROV ontology, which have been introduced in the PROV Data Model [[PROV-DM]]. It provides the foundation for implementing provenance applications in different domains using the PROV ontology for representing, exchanging, and integrating provenance information. Together with the PROV Access and Query [[PROV-PAQ]] and PROV Data Model [[PROV-DM]], this document forms a framework for provenance information management in domain-specific Web-based applications.
The PROV ontology classes and properties are defined such that they can be specialized for modeling application-specific provenance information in a variety of domains. Thus, the PROV ontology is expected to serve as a reference model for domain-specific provenance ontologies and thereby facilitate consistent provenance interchange. This document uses an example provenance scenario introduced in the PROV Data Model [[PROV-DM]] to demonstrate the specialization of the PROV ontology.
Finally, this document describes the formal semantics of the PROV ontology using the OWL2 semantics, [[!OWL2-DIRECT-SEMANTICS]], [[!OWL2-RDF-BASED-SEMANTICS]], and a set of provenance-specific inference rules. This is expected to enable provenance implementations to automatically check the consistency of provenance information represented using the PROV ontology and to explicitly assert implicit provenance knowledge.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [[!RFC2119]].
This document is intended to provide an understanding of the PROV ontology and how it can be used by various applications to represent their provenance information. The intended audience of this document includes users who are new to provenance modeling as well as experienced users who would like their provenance model to be compatible with the PROV ontology to facilitate standardization. This document assumes a basic understanding of the W3C OWL2 specification, including modeling of classes, properties, and restrictions in an OWL2 ontology. Readers are referred to the OWL2 documentation, starting with the [[!OWL2-PRIMER]], for the OWL2 specification.
Section 2 describes the mapping of the PROV Data Model [[PROV-DM]] to the PROV ontology. Section 3 introduces the classes and properties of the PROV ontology. Section 4 describes the approach used to specialize the PROV ontology to create a domain-specific ontology for an example provenance scenario introduced in the PROV Data Model [[PROV-DM]]. The PROV ontology supports a set of provenance entailments and these are described in Section 5.
The PROV Data Model [[PROV-DM]] introduces a minimal set of concepts to represent provenance information in a variety of application domains. This document maps the PROV Data Model to PROV Ontology using the OWL2 ontology language, which facilitates a fixed interpretation and use of the PROV Data Model concepts based on the formal semantics of OWL2 [[!OWL2-DIRECT-SEMANTICS]] [[!OWL2-RDF-BASED-SEMANTICS]].
The PROV Ontology is not designed to be used directly in a domain application; its classes and properties represent "higher-level" or abstract concepts that can be specialized further for representing domain-specific provenance information. We briefly introduce some of the OWL2 modeling terms that will be used to describe the PROV ontology. An OWL2 instance is an individual object in a domain of discourse, for example a person named Alice or a car, and a set of individuals sharing a set of common characteristics is called a class. Person and Car are examples of classes representing the set of individual persons and cars respectively. OWL2 object properties are used to link individuals or classes, or to create a property hierarchy. For example, the object property "hasOwner" can be used to link a car with a person. OWL2 datatype properties are used to link individuals or classes to data values, including XML Schema datatypes [[!XMLSCHEMA-2]].
The PROV Data Model document [[PROV-DM]] introduces an example provenance scenario describing the creation of a crime statistics file stored on a shared file system and edited by journalists Alice, Bob, Charles, David, and Edith. This scenario is used as a running example in this document to describe the PROV ontology classes and properties, the specialization mechanism, and the entailments supported by the PROV ontology.
We use the RDF/XML syntax, which is the only syntax that all OWL2 tools are required to support [[!OWL2-PRIMER]], to represent the PROV ontology. The OWL2 document for the PROV ontology is available at [[PROV-Ontology-Namespace]], which is also the namespace for the PROV ontology and is denoted by "PROV" and the prefix "prov".
We now introduce the classes and properties that constitute the PROV ontology. We first give a textual description of each ontology term, followed by OWL2 syntax representing the ontology term and an example use of the term in the provenance scenario.
The PROV ontology consists of classes that can be organized in a taxonomic structure.
Note: CamelBack notation is used for class names
prov:Role has been renamed to prov:EntityInRole. (A new prov:Role might appear in the rdfs:range of prov:assuming
Entity is defined to be "An Entity represents an identifiable characterized thing." [[PROV-DM]]
PROV:Entity rdfs:subClassOf owl:Thing.
Examples of instances of class Entity from the provenance scenario are the files with identifiers e1 and e2. The RDF/XML syntax for asserting that e1 is an instance of Entity is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#e1"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/> </rdf:Description>
Attributes that characterise the entity (as defined in PROV-DM) are stated using RDF properties of the asserted entity. Such properties SHOULD be in a declared namespace, and MAY be described by an application-specific vocabulary. Specialisation by subclassing or rdf:type is equivalent to specifying the reserved attribute type in PROV-DM.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#e2"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#CrimeFile"/> <cf:hasLocation rdf:resource="http://www.w3.org/PROV/CrimeFile#sharedDirectoryLocation1"/> <cf:hasFileContent rdf:datatype="http://www.w3.org/2001/XMLSchema#string">There was a lot of crime in London last month.</cf:hasFileContent> </rdf:Description>
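As an illustration of how the types and attributes of the e2 description above could be read programmatically, the following is a minimal Python sketch using only the standard library. It is not a full RDF/XML parser: it assumes the flat rdf:Description form used in this document, and the enclosing rdf:RDF element and the function name describe are additions made for this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV_NS = "http://www.w3.org/PROV/ProvenanceOntology.owl#"
CF_NS = "http://www.w3.org/PROV/CrimeFileOntology.owl#"

# The e2 description from the text, wrapped in an rdf:RDF root so that the
# rdf: and cf: prefixes are declared (the wrapper is an addition of this sketch).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cf="{CF_NS}">
  <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#e2">
    <rdf:type rdf:resource="{PROV_NS}Entity"/>
    <rdf:type rdf:resource="{CF_NS}CrimeFile"/>
    <cf:hasLocation rdf:resource="http://www.w3.org/PROV/CrimeFile#sharedDirectoryLocation1"/>
    <cf:hasFileContent rdf:datatype="http://www.w3.org/2001/XMLSchema#string">There was a lot of crime in London last month.</cf:hasFileContent>
  </rdf:Description>
</rdf:RDF>"""


def describe(xml_text):
    """Return (subject, types, attributes) for the first rdf:Description."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{RDF_NS}}}Description")
    subject = node.get(f"{{{RDF_NS}}}about")
    types, attributes = [], {}
    for child in node:
        if child.tag == f"{{{RDF_NS}}}type":
            # rdf:type carries the class, i.e. the PROV-DM "type" attribute.
            types.append(child.get(f"{{{RDF_NS}}}resource"))
        else:
            # Any other property is treated as a characterising attribute,
            # holding either a resource reference or a literal value.
            attributes[child.tag] = child.get(f"{{{RDF_NS}}}resource") or child.text
    return subject, types, attributes


subject, types, attributes = describe(DOC)
print(subject)
print(types)
```

A production application would more likely rely on a dedicated RDF library, since RDF/XML allows many equivalent serialisations that this sketch does not handle.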
ProcessExecution is defined to be "an identifiable activity, which performs a piece of work." [[PROV-DM]]
PROV:ProcessExecution rdfs:subClassOf owl:Thing.
Examples of instances of class ProcessExecution from the provenance scenario are file creation (pe0) and file editing (pe2). The RDF/XML syntax for asserting that pe2 is an instance of ProcessExecution is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#pe2"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#ProcessExecution"/> </rdf:Description>
Agent is defined to be a "characterized entity capable of activity" [[PROV-DM]]
PROV:Agent rdfs:subClassOf PROV:Entity.
Examples of instances of class Agent from the provenance scenario are Alice and Edith. The RDF/XML syntax for asserting that Alice is an instance of Agent is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#Alice"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Agent"/> </rdf:Description>
Recipe represents a process specification. The definition of process specifications is outside the scope of PROV-DM. Therefore, this class acts as a placeholder in the ontology that can be extended and specialized by users.
PROV:Recipe rdfs:subClassOf owl:Thing.
Recipe examples include baking recipes, programs and workflows.
Time represents temporal information about entities in the Provenance model.
PROV:Time rdfs:subClassOf owl:Thing.
Examples of instances of class Time from the provenance scenario are t and t+1. The RDF/XML syntax for asserting that t+1 is an instance of Time is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#t+1"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Time"/> </rdf:Description>
Revision is defined as a modified version of an Entity.
PROV:Revision rdfs:subClassOf owl:Thing.
ProvenanceContainer is defined to be an aggregation of provenance assertions. A provenance container should have a URI associated with it. The ProvenanceContainer class can also be used to model the PROV-DM concept of Account.
PROV:ProvenanceContainer rdfs:subClassOf owl:Thing.
An example of an instance of class ProvenanceContainer is a file describing the manufacturing details of a car, such as its batch number, manufacturer, date of manufacture, and place of manufacture.
Location is defined to be "an identifiable geographic place (ISO 19112)." [[PROV-DM]]
PROV:Location rdfs:subClassOf owl:Thing.
An example of an instance of class Location from the provenance scenario is the location of the crime file in the shared directory /share, with file path /share/crime.txt. The RDF/XML syntax for asserting that the location of the crime file is the shared directory is given below.
<cf:hasLocation> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#sharedDirectoryLocation1"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Location"/> <cf:hasFilePath rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/share/crime.txt</cf:hasFilePath> </rdf:Description> </cf:hasLocation>
EntityInRole is defined to be a "realizable entity" (cite?) "assumed by a Entity or an agent." [[PROV-DM]]
PROV:EntityInRole rdfs:subClassOf PROV:Entity.
Examples of instances of class EntityInRole from the provenance scenario are the author role assumed by Bob and the file creator role assumed by Alice. The RDF/XML syntax for asserting that Alice assumes the role of an author is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#AliceAsAuthor"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#EntityInRole"/> <prov:assumedBy rdf:resource="http://www.w3.org/PROV/CrimeFile#Alice"/> <prov:assumedRole rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#author"/> </rdf:Description>
The Provenance Model consists of the following object properties.
Note: Names of properties start with a verb in lower case, followed by word(s) starting with upper case
wasGeneratedBy links Entity with ProcessExecution representing that Entity was generated as a result of ProcessExecution
Note: No arity constraints are assumed between Entity and ProcessExecution
Example of wasGeneratedBy property from the provenance scenario is e1 wasGeneratedBy pe0. The RDF/XML syntax for asserting this is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#e1"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#CrimeFile"/> <prov:wasGeneratedBy> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#pe0"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#ProcessExecution"/> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#FileCreation"/> </rdf:Description> </prov:wasGeneratedBy> </rdf:Description>
wasDerivedFrom links two distinct characterized entities, where "some characterized entity is transformed from, created from, or affected by another characterized entity."
Example of wasDerivedFrom property from the provenance scenario is e3 wasDerivedFrom e2. The RDF/XML syntax for asserting this is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#e3"> <prov:wasDerivedFrom rdf:resource="http://www.w3.org/PROV/CrimeFile#e2"/> </rdf:Description>
used links ProcessExecution to Entity, where Entity is consumed by ProcessExecution.
Note: No arity constraints are assumed between Entity and ProcessExecution
An example of the used property from the provenance scenario is pe2 used e2. The RDF/XML syntax for asserting this is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#pe2"> <prov:used rdf:resource="http://www.w3.org/PROV/CrimeFile#e2"/> </rdf:Description>
hadParticipant links Entity to ProcessExecution, where the Entity was used by, or was generated by, the ProcessExecution.
Note: No arity constraints are assumed between Entity and ProcessExecution
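The relationship between hadParticipant, used, and wasGeneratedBy described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name participants and the representation of assertions as pairs of identifiers are hypothetical, and the result is grouped per process execution rather than committing to a direction for the property.

```python
from collections import defaultdict


def participants(used, was_generated_by):
    """Group the entities that took part in each process execution.

    used: set of (process_execution, entity) pairs
    was_generated_by: set of (entity, process_execution) pairs
    """
    result = defaultdict(set)
    for process_execution, entity in used:
        result[process_execution].add(entity)
    for entity, process_execution in was_generated_by:
        result[process_execution].add(entity)
    return dict(result)


# Assertions taken from the provenance scenario in this document.
used = {("pe2", "e2")}
was_generated_by = {("e1", "pe0"), ("e2", "pe1")}

for process_execution, entities in sorted(participants(used, was_generated_by).items()):
    print(process_execution, sorted(entities))
```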
wasComplementOf links two instances of Entity, where "it is relationship between two characterized entities asserted to have compatible characterization over some continuous time interval." (from the Provenance Conceptual Model)
wasControlledBy links ProcessExecution to Agent, where "Control represents the involvement of an agent or a Entity in a process execution" (from the Provenance Conceptual Model)
Example of wasControlledBy property from the provenance scenario is FileAppending (ProcessExecution) wasControlledBy Bob. The RDF/XML syntax for asserting this is given below.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#pe1"> <prov:wasControlledBy> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#Bob"> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#Journalist"/> </rdf:Description> </prov:wasControlledBy> </rdf:Description>
The ProcessExecution activity performed can be described as having the given recipe resource. It is out of scope for PROV to define the structure or meaning of the recipe. The recipe might or might not have been followed exactly by the ProcessExecution.
The wasInformedBy object property links two process executions. It is used to express the fact that a given process execution used an entity that was generated by another process execution.
The wasScheduledAfter object property links two instances of ProcessExecution to specify the order in which they took place. Specifically, it is used to specify that a given process execution starts after the end of another process execution.
The table below summarizes the characteristics of the object properties that are defined in the OWL schema.
Property | Functional | Inverse functional | Transitive | Symmetric | Asymmetric | Reflexive | Irreflexive |
---|---|---|---|---|---|---|---|
wasControlledBy | No | No | ? | No | Yes | No | Yes |
wasDerivedFrom | No | No | Yes | No | Yes | No | Yes |
hadParticipant | No | No | ? | No | Yes | No | Yes |
wasGeneratedBy | Yes | No | ? | No | Yes | No | Yes |
used | No | No | ? | No | Yes | No | Yes |
wasInformedBy | No | No | No | No | No | No | No |
wasScheduledAfter | No | No | Yes | No | Yes | No | Yes |
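To make the characteristics in the table concrete, the following Python sketch checks three of them (functional, irreflexive, asymmetric) against a set of asserted pairs. It is a closed-world check written for illustration only; the function names are hypothetical. Under OWL2 semantics a functional property with two asserted values leads to the inference that the two values denote the same individual, and becomes an inconsistency only if they are known to be different.

```python
def is_functional(pairs):
    """True if every subject is related to at most one object."""
    seen = {}
    for subject, obj in pairs:
        # setdefault records the first object seen for a subject.
        if seen.setdefault(subject, obj) != obj:
            return False
    return True


def is_irreflexive(pairs):
    """True if no individual is related to itself."""
    return all(subject != obj for subject, obj in pairs)


def is_asymmetric(pairs):
    """True if (a, b) being asserted excludes (b, a)."""
    return all((obj, subject) not in pairs for subject, obj in pairs)


# wasGeneratedBy assertions from the provenance scenario.
was_generated_by = {("e1", "pe0"), ("e2", "pe1")}
print(is_functional(was_generated_by))
print(is_irreflexive(was_generated_by))
print(is_asymmetric(was_generated_by))
```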
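The PROV ontology uses the OWL2 annotation properties to describe additional information about the PROV ontology classes, properties, individuals, and axioms. OWL2 defines nine annotation properties that are part of the OWL2 structural specification (see OWL2 Syntax document for additional details [[!OWL2-SYNTAX]]): rdfs:label, rdfs:comment, rdfs:seeAlso, rdfs:isDefinedBy, owl:deprecated, owl:versionInfo, owl:priorVersion, owl:backwardCompatibleWith, and owl:incompatibleWith.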
Additional annotation properties can be defined by provenance ontologies, but unlike the OWL2 annotation properties, these custom annotation properties may not be interpreted in a standard manner across different provenance applications.
The following diagram illustrates the complete PROV ontology schema along with the cardinality restrictions imposed on the properties.
The PROV Ontology is conceived as a reference ontology that can be extended by various domain-specific applications to model the required set of provenance terms. The PROV Ontology classes and properties can be specialized using the following two RDFS properties: rdfs:subClassOf, for specializing classes, and rdfs:subPropertyOf, for specializing properties.
To illustrate the specialization mechanism, the PROV Ontology is extended to create an ontology schema for the provenance scenario describing the creation of the crime statistics file.
The example scenario can be encoded as a Resource Description Framework (RDF) graph in Figure X:
Figure X represents the ontology schema that extends the PROV ontology to model the provenance details of the crime file scenario. For example,
The example given below describes the provenance of Entity e2 using RDF/XML syntax.
<?xml version="1.0"?> <rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:prov="http://www.w3.org/PROV/ProvenanceOntology.owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cf="http://www.w3.org/PROV/CrimeFileOntology.owl#"> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#e2"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#CrimeFile"/> <prov:wasGeneratedBy> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#pe1"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#ProcessExecution"/> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#FileAppending"/> <prov:wasControlledBy> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#Bob"> <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#Journalist"/> </rdf:Description> </prov:wasControlledBy> </rdf:Description> </prov:wasGeneratedBy> <prov:wasDerivedFrom rdf:resource="http://www.w3.org/PROV/CrimeFile#e1"/> <cf:hasLocation> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#sharedDirectoryLocation1"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Location"/> <cf:hasFilePath rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/share/crime.txt</cf:hasFilePath> </rdf:Description> </cf:hasLocation> <cf:hasFileContent rdf:datatype="http://www.w3.org/2001/XMLSchema#string">There was a lot of crime in London last month.</cf:hasFileContent> </rdf:Description> <rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#pe2"> <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#ProcessExecution"/> <prov:used rdf:resource="http://www.w3.org/PROV/CrimeFile#e2"/> </rdf:Description> </rdf:RDF>
The following new classes were created in the CrimeFile Ontology by extending the PROV ontology classes:
The cf:Journalist class is a specialization of the PROV ontology Agent class and models all individuals that participate in creating, editing, and sharing the crime file. The following RDF/XML code illustrates how cf:Journalist is asserted to be a specialization of PROV:Agent.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFileOntology.owl#Journalist"> <rdfs:subClassOf rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Agent"/> </rdf:Description>
The cf:CrimeFile class is a specialization of the PROV ontology Entity class and models the file describing the crime statistics in the provenance scenario, including the multiple versions of the file. The following RDF/XML code illustrates how cf:CrimeFile is asserted to be a specialization of PROV:Entity.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFileOntology.owl#CrimeFile"> <rdfs:subClassOf rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/> </rdf:Description>
The classes cf:FileCreation, cf:FileEditing, cf:FileAppending, cf:EmailProcessExecution, and cf:SpellChecking are specializations of the PROV ontology ProcessExecution class and model the different activities in the provenance scenario. The following RDF/XML code illustrates the specialization of PROV:ProcessExecution to define the class cf:FileCreation (other classes can be similarly defined by using the subClassOf property).
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFileOntology.owl#FileCreation"> <rdfs:subClassOf rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#ProcessExecution"/> </rdf:Description>
The following diagram illustrates the above specialization:
The following new object property was created in the CrimeFile Ontology by extending the PROV ontology object property:
The property cf:hadFilePath is a specialization of the PROV ontology hadLocation object property and links the class CrimeFile to the FileDirectory class. The following RDF/XML code illustrates the use of rdfs:subPropertyOf to create hadFilePath property.
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFileOntology.owl#hadFilePath"> <rdfs:subPropertyOf rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#hadLocation"/> </rdf:Description>
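The practical effect of declaring a sub-property can be sketched in Python as follows: any statement made with the specialized property also holds for the property it specializes. This is a minimal sketch of the standard RDFS sub-property entailment over triples held as tuples; the function name and the individuals e2 and dir1 are hypothetical and chosen only for illustration.

```python
PROV = "http://www.w3.org/PROV/ProvenanceOntology.owl#"
CF = "http://www.w3.org/PROV/CrimeFileOntology.owl#"
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def apply_sub_property(triples):
    """Return the triples plus those entailed via rdfs:subPropertyOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        parents = {(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF}
        for s, p, o in list(inferred):
            for child, parent in parents:
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred


triples = {
    (CF + "hadFilePath", SUB_PROPERTY_OF, PROV + "hadLocation"),
    ("e2", CF + "hadFilePath", "dir1"),  # hypothetical individuals
}
for triple in sorted(apply_sub_property(triples) - triples):
    print(triple)
```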
The following diagram illustrates the above specialization:
This section describes an example of extending the PROV ontology to create a provenance ontology for scientific workflows.
Scientific workflow systems allow the specification of a pipeline of processes which are linked from outputs to inputs. Such workflow definitions are typically created in a graphical user interface or interactive web application, and can then be enacted using particular inputs or parameters. Scientists in fields like bioinformatics, chemistry and physics use such workflows to perform repeated analyses by connecting together a disparate set of domain-specific tools and services.
Capturing the provenance of executions in such a workflow system will typically include details of each of the process executions, such as its inputs and outputs, start and stop time, and should ultimately be able to describe the complete data lineage through the workflow for any returned output data.
This example is not attempting to be a complete or general ontology for asserting workflow provenance, but highlights how a particular application like a workflow system can express its domain specific attributes based on the PROV ontology.
Example extension of PROV ontology in order to describe workflow provenance
In order to describe workflow executions following the model above, the PROV ontology is extended with workflow-specific subclasses described below:
While in most cases subclassing will provide the additional expressivity the application needs, this example ontology also expands on the PROV ontology with more specific subproperties.
This subproperty of prov:wasDerivedFrom links a wf:Value to the wf:FileValue it was read from, typically when used as a workflow input. As described for wf:FileValue this distinction is done because at the time the workflow input is used in the workflow, the file input might be different and thus should not be described as an attribute of that wf:Value.
This property hints at a "Read file" process execution which is not described. This is therefore an example of how the provenance asserter is limiting the scope of its provenance. The engine knows that the file was read, but is not able or willing to provide any deeper assertions, because its primary scope is at the level of executing workflow definitions.
This ontology includes a simple definition language for describing the overall workflow structure. This is not meant as a general workflow definition language, but allows us to describe process executions, use and generation with relation to particular sections of the workflow definition.
Scientific workflows can be composed of nested workflows which can be shared and reused as components. Some workflow systems also allow various execution settings on the nested workflow, like looping or parallelisation.
In this case a process definition will use wf:definesSubProcess to indicate its constituent parts, and there will be additional wf:linksTo from the input ports of this process definition to the input ports of some of its nested sub processes, and vice versa for the outputs. The top-level workflow is always such a process definition.
This is an example workflow which defines a workflow input input, three processes String_constant, Concatenate_two_strings and sha1, and finally two workflow outputs combined and sha1. When executed, it runs from top to bottom, first concatenating the provided input with the string constant; the result is returned on the combined output and is also provided to the sha1 process, whose output is given to the other workflow port.
Using the definition ontology above this workflow can be expressed in RDF/XML as:
<rdf:RDF xml:base="http://www.example.com/workflow1#" xmlns:impl="http://company.example.org/engine-implementation#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wf="http://www.example.com/scientific-workflow#"> <wf:ProcessDefinition rdf:about="#workflow"> <rdf:type rdf:resource="http://company.example.org/engine-implementation#Workflow"/> <wf:definesInput> <wf:Input rdf:about="#inName"> <wf:linksTo rdf:resource="#catIn2" /> </wf:Input> </wf:definesInput> <wf:definesOutput rdf:resource="#combined" /> <wf:definesOutput rdf:resource="#sha1" /> <wf:definesSubProcess> <impl:Constant rdf:about="#String_constant"> <impl:constant>Hello, </impl:constant> <wf:definesOutput> <wf:Output rdf:about="#constantValue"> <wf:linksTo rdf:resource="#catIn1"/> </wf:Output> </wf:definesOutput> </impl:Constant> </wf:definesSubProcess> <wf:definesSubProcess> <impl:Command rdf:about="#cat"> <impl:command>cat</impl:command> <wf:definesInput rdf:resource="#catIn1" /> <wf:definesInput rdf:resource="#catIn2" /> <wf:definesOutput> <wf:Output rdf:about="#catOut"> <wf:linksTo rdf:resource="#shaIn"/> </wf:Output> </wf:definesOutput> </impl:Command> </wf:definesSubProcess> <wf:definesSubProcess> <impl:Command rdf:about="#shasum"> <impl:command>shasum</impl:command> <wf:definesInput rdf:resource="#shaIn" /> <wf:definesOutput> <wf:Output rdf:about="#shaOut"> <wf:linksTo rdf:resource="#sha1"/> </wf:Output> </wf:definesOutput> </impl:Command> </wf:definesSubProcess> </wf:ProcessDefinition> </rdf:RDF>
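One way to see how the wf:linksTo statements determine the order of execution is to derive process dependencies from the port links and sort them topologically. The Python sketch below does this for the definition above; the dictionaries are a hand-transcribed, hypothetical in-memory form of the RDF/XML (not an API of any workflow system), and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Port-to-port links, transcribed from the wf:linksTo statements above.
links_to = {
    "inName": "catIn2",
    "constantValue": "catIn1",
    "catOut": "shaIn",
    "shaOut": "sha1",
}

# Input and output ports defined by each sub process.
processes = {
    "String_constant": {"inputs": [], "outputs": ["constantValue"]},
    "cat": {"inputs": ["catIn1", "catIn2"], "outputs": ["catOut"]},
    "shasum": {"inputs": ["shaIn"], "outputs": ["shaOut"]},
}


def execution_order(processes, links_to):
    """Order processes so that each runs after the processes feeding it."""
    producer = {port: name for name, p in processes.items() for port in p["outputs"]}
    consumer = {port: name for name, p in processes.items() for port in p["inputs"]}
    depends_on = {name: set() for name in processes}
    for source, target in links_to.items():
        # Links from workflow inputs or to workflow outputs involve no
        # second sub process and therefore add no dependency.
        if source in producer and target in consumer:
            depends_on[consumer[target]].add(producer[source])
    return list(TopologicalSorter(depends_on).static_order())


print(execution_order(processes, links_to))
```

Representing linksTo as a dictionary assumes each port links to a single target, which holds for this definition but not for workflows in general.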
This example shows how using the workflow extensions together with PROV can provide the provenance of executing the workflow defined above.
<rdf:RDF xmlns="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#" xmlns:cnt="http://www.w3.org/2011/content#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:prov="http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wf="http://www.example.com/scientific-workflow#" xmlns:base="http://www.example.com/run1#" > <Agent rdf:about="#aUser"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/> <foaf:name>Stian Soiland-Reyes</foaf:name> </Agent> <wf:WorkflowEngine rdf:about="#workflowEngine" /> <wf:FileValue rdf:about="#inputFile"> <wf:file>/tmp/myinput.txt</wf:file> <wf:value> <cnt:ContentAsText> <cnt:characterEncoding>UTF-8</cnt:characterEncoding> <cnt:chars>Steve</cnt:chars> </cnt:ContentAsText> </wf:value> </wf:FileValue> <wf:Value rdf:about="#input"> <wf:wasReadFrom rdf:resource="#inputFile"/> <wf:value> <cnt:ContentAsText> <cnt:characterEncoding>UTF-8</cnt:characterEncoding> <cnt:chars>Steve</cnt:chars> </cnt:ContentAsText> </wf:value> </wf:Value> <wf:Process rdf:about="#workflowRun"> <used> <wf:ValueAtPort> <wf:sawValue rdf:resource="#input"/> <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#inName"/> </wf:ValueAtPort> </used> <wf:ranInWorkflowEngine rdf:resource="#workflowEngine"/> <wf:wasLaunchedBy rdf:resource="#aUser"/> <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#workflow"/> </wf:Process> <wf:Process rdf:about="#constant"> <wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/> <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#String_Constant"/> </wf:Process> <wf:Value rdf:about="#hello"> <wasGeneratedBy rdf:resource="#constant"/> <wf:value> <cnt:ContentAsText> <cnt:chars>Hello, </cnt:chars> </cnt:ContentAsText> </wf:value> </wf:Value> <wf:ValueAtPort rdf:about="#helloValue"> <wasGeneratedBy rdf:resource="#constant"/> <wf:value> <cnt:ContentAsText> <cnt:chars>Hello, </cnt:chars> </cnt:ContentAsText> 
</wf:value> <wf:sawEntity rdf:resource="#hello"/> </wf:ValueAtPort> <wf:Process rdf:about="#combine"> <used> <wf:ValueAtPort> <wf:sawValue rdf:resource="#hello"/> <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#catIn1"/> </wf:ValueAtPort> </used> <used> <wf:ValueAtPort> <wf:sawValue rdf:resource="#input"/> <wf:seenAtPort rdf:resource="http://www.example.com/workflow1#catIn2"/> </wf:ValueAtPort> </used> <wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/> <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#cat"/> </wf:Process> <wf:Value rdf:about="#combined"> <wasGeneratedBy rdf:resource="#combine"/> <wf:value> <cnt:ContentAsText> <cnt:chars>Hello, Steve</cnt:chars> </cnt:ContentAsText> </wf:value> </wf:Value> <wf:Process rdf:about="#shasum"> <used rdf:resource="#combined"/> <wf:wasSubProcessExecutionOf rdf:resource="#workflowRun"/> <wf:wasDefinedBy rdf:resource="http://www.example.com/workflow1#shasum"/> </wf:Process> <wf:Value rdf:about="#sha1"> <wf:value> <cnt:ContentAsText> <cnt:characterEncoding>UTF-8</cnt:characterEncoding> <cnt:chars>a33d1fb1658d4fbf017de59ab67437a3eb5ff50d</cnt:chars> </cnt:ContentAsText> </wf:value> </wf:Value> <wf:ValueAtPort rdf:about="#sha1OutputFromShasum"> <wasGeneratedBy rdf:resource="#shasum"/> <wf:value> <cnt:ContentAsText> <cnt:characterEncoding>UTF-8</cnt:characterEncoding> <cnt:chars>a33d1fb1658d4fbf017de59ab67437a3eb5ff50d</cnt:chars> </cnt:ContentAsText> </wf:value> <wf:sawValue rdf:resource="#sha1"/> <wf:wasSeenAt rdf:resource="http://www.example.com/workflow1#shaOut"/> </wf:ValueAtPort> <wf:ValueAtPort rdf:about="#sha1OutputFromWorkflow"> <wasGeneratedBy rdf:resource="#workflowRun"/> <wf:value> <cnt:ContentAsText> <cnt:characterEncoding>UTF-8</cnt:characterEncoding> <cnt:chars>a33d1fb1658d4fbf017de59ab67437a3eb5ff50d</cnt:chars> </cnt:ContentAsText> </wf:value> <wf:sawValue rdf:resource="#sha1"/> <wf:wasSeenAt rdf:resource="http://www.example.com/workflow1#sha1"/> </wf:ValueAtPort> 
</rdf:RDF>
Example available as RDF/XML and Turtle
Note that for brevity, the example above does not show the inferred classes and properties from the PROV ontology. For interoperability, applications should also express such inferred statements in their serialisations, so that the provenance can be read without using OWL2 inferencing and the customized ontologies. See the workflow-inferred.rdf for the complete example showing both domain-specific and PROV ontology terms used side by side.
The PROV ontology uses OWL2 as the ontology language, hence it supports a set of entailments based on the standard RDF semantics [[!RDF-MT]] and OWL2 semantics ([[!OWL2-DIRECT-SEMANTICS]], [[!OWL2-RDF-BASED-SEMANTICS]]). In this section, we describe this set of semantics as applied to the PROV ontology along with a set of constraints introduced in the PROV-DM [[PROV-DM]] that are provenance-specific. It is intended that provenance applications can leverage this normative description of the formal semantics of PROV ontology to support:
We briefly summarize the essential features of the RDF Semantics and refer to the RDF semantics [[!RDF-MT]] for the normative specification. The RDF Semantics uses model theory, with a notion of interpretation I defined over RDF (rdf-interpretation) or RDFS (rdfs-interpretation) vocabulary, for specifying the formal semantics of an RDF or RDFS graph [[!RDF-MT]]. The rdf-interpretation is an interpretation that satisfies a set of constraints called "RDF semantic conditions" and a set of "RDF axiomatic triples" (see Section 3.1 of RDF Semantics [[!RDF-MT]]). The rdfs-interpretation is defined over the additional terms in the RDFS vocabulary, including rdfs:domain, rdfs:range, rdfs:Class, rdfs:subClassOf, and rdfs:subPropertyOf. An rdfs-interpretation satisfies a set of constraints called "RDFS semantic conditions" and "RDFS axiomatic triples" (see Section 4.1 of RDF Semantics [[!RDF-MT]]).
The rdfs-interpretation supports the following entailment rules that are applicable to the PROV ontology (we do not discuss the simple RDF entailments):
If a PROV ontology class X is defined to be the domain of a PROV property, then an individual asserted as the "subject" of that property in an RDF triple is an instance of the class X. (from the rdfs2 rule defined in RDF Semantics)
Similar to the domain rule above, if a PROV ontology class Y is defined to be the range of a PROV object property, then an individual asserted as the "object" of that property in an RDF triple is an instance of the class Y. (from the rdfs3 rule defined in RDF Semantics)
Both rdfs:subClassOf and rdfs:subPropertyOf are transitive properties; hence provenance assertions, in the form of RDF triples, that use a specialized subclass or subproperty can be inferred to hold for the parent class or parent property as well. (from the rdfs5, rdfs7, rdfs9, and rdfs11 rules defined in RDF Semantics) For example, in the provenance scenario, though alice and bob are asserted to be individuals of the class Journalist, we can infer that they are also individuals of the PROV ontology classes Agent and Entity. Given,
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#alice">
  <rdf:type rdf:resource="http://www.w3.org/PROV/CrimeFileOntology.owl#Journalist"/>
</rdf:Description>
and
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFileOntology.owl#Journalist">
  <rdfs:subClassOf rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Agent"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.w3.org/PROV/ProvenanceOntology.owl#Agent">
  <rdfs:subClassOf rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/>
</rdf:Description>
we can infer that
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#alice">
  <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Agent"/>
</rdf:Description>
and
<rdf:Description rdf:about="http://www.w3.org/PROV/CrimeFile#alice">
  <rdf:type rdf:resource="http://www.w3.org/PROV/ProvenanceOntology.owl#Entity"/>
</rdf:Description>
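The entailment rules above can be illustrated with a small forward-chaining sketch. It is not part of the PROV ontology and is not a complete RDFS reasoner: triples are plain Python tuples, literals and blank nodes are ignored, and only the rules rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, and rdfs11 are applied. The function name rdfs_closure, the individuals file1 and pe1, and the domain and range declarations for wasGeneratedBy are assumptions made for illustration only.

```python
# Minimal RDFS forward-chaining sketch over triples stored as (s, p, o) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply rdfs2, rdfs3, rdfs5, rdfs7, rdfs9 and rdfs11 until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p2 == DOMAIN and s2 == p:                      # rdfs2: domain
                    new.add((s, RDF_TYPE, o2))
                if p2 == RANGE and s2 == p:                       # rdfs3: range
                    new.add((o, RDF_TYPE, o2))
                if p == SUBPROP and p2 == SUBPROP and o == s2:    # rdfs5: subPropertyOf is transitive
                    new.add((s, SUBPROP, o2))
                if p2 == SUBPROP and s2 == p:                     # rdfs7: triple holds for parent property
                    new.add((s, o2, o))
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:  # rdfs9: instance of parent class
                    new.add((s, RDF_TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:  # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
        if new <= closure:  # nothing new was derived
            return closure
        closure |= new


CRIME = "http://www.w3.org/PROV/CrimeFile#"
CFO = "http://www.w3.org/PROV/CrimeFileOntology.owl#"
PROV = "http://www.w3.org/PROV/ProvenanceOntology.owl#"

asserted = {
    # Statements taken from the example above.
    (CRIME + "alice", RDF_TYPE, CFO + "Journalist"),
    (CFO + "Journalist", SUBCLASS, PROV + "Agent"),
    (PROV + "Agent", SUBCLASS, PROV + "Entity"),
    # Assumed for illustration: domain/range declarations and two hypothetical individuals.
    (PROV + "wasGeneratedBy", DOMAIN, PROV + "Entity"),
    (PROV + "wasGeneratedBy", RANGE, PROV + "ProcessExecution"),
    (CRIME + "file1", PROV + "wasGeneratedBy", CRIME + "pe1"),
}

inferred = rdfs_closure(asserted)
alice_types = sorted(o for (s, p, o) in inferred
                     if s == CRIME + "alice" and p == RDF_TYPE)
print(alice_types)
```

Under these assumptions, the closure contains the asserted Journalist type for alice together with the inferred Agent and Entity types. An application that serialises the closure rather than only the asserted triples would let consumers read the provenance without running an inference engine.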
In addition to the RDF Semantics, the OWL2 semantics described in [[!OWL2-DIRECT-SEMANTICS]] and [[!OWL2-RDF-BASED-SEMANTICS]] are also applicable to the PROV ontology. We consider the OWL2 RDF-Based Semantics (since it is a semantic superset of the OWL2 Direct Semantics), and specifically its extension of the D-interpretation, which satisfies the constraints for the rdf-interpretation and rdfs-interpretation (as defined in the previous section), for graphs with blank nodes, and for the interpretation defined for RDF datatypes (see Section 5.1 in RDF Semantics [[!RDF-MT]]). The OWL2 RDF-Based Semantics introduces the notion of "facets" to constrain datatypes, covering both rdf:XMLLiteral, defined in the RDF Semantics [[!RDF-MT]], and the datatypes defined in the OWL2 Structural Specification [[!OWL2-SYNTAX]]. The OWL2 RDF-Based interpretation, also called a D-interpretation with facets, is a D-interpretation that also satisfies the additional "semantic conditions" of the OWL2 RDF-Based Semantics (see Section 5 in OWL2 RDF-Based Semantics [[!OWL2-RDF-BASED-SEMANTICS]]).
The PROV-DM [[PROV-DM]] introduces a set of provenance-specific constraints applicable to the PROV ontology. The following constraints will be supported by the PROV ontology and by any provenance application that uses it.
The PROV-DM describes a constraint on the ordering of the times (or events) associated with a ProcessExecution (PE).
The PROV-DM describes a constraint on wasGeneratedBy that associates the values of the attributes of an Entity with the PE that generated the Entity.
A second constraint on wasGeneratedBy imposes an ordering between the generation of an Entity instance and the start and end time (or event) of the PE instance.
The PROV-DM describes a constraint on wasGeneratedBy that asserts that, within a given account, only one PE instance can be associated with an Entity instance by the property wasGeneratedBy.
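As an informal illustration of how an application might check two of the constraints above, consider the following sketch. It is not a normative algorithm. The record layout (account, entity, PE), the account names acc1 and acc2, both function names, and the second record for #combined are hypothetical. In particular, the ordering check assumes that generation falls between the start and the end of the PE; the text above only states that an ordering exists, so PROV-DM remains the reference for the exact condition.

```python
from collections import defaultdict


def generation_uniqueness_violations(generations):
    """Return {(account, entity): set_of_PEs} for every entity that is linked to
    more than one PE by wasGeneratedBy within the same account."""
    seen = defaultdict(set)
    for account, entity, pe in generations:
        seen[(account, entity)].add(pe)
    return {key: pes for key, pes in seen.items() if len(pes) > 1}


def generation_within_execution(generated_at, started_at, ended_at):
    """Assumed ordering: PE start <= generation <= PE end (see the caveat above)."""
    return started_at <= generated_at <= ended_at


# Hypothetical records; entity and PE identifiers reuse the workflow example.
records = [
    ("acc1", "#combined", "#combine"),
    ("acc1", "#combined", "#shasum"),   # hypothetical second generation: a violation
    ("acc2", "#combined", "#shasum"),   # a different account, so no conflict
    ("acc1", "#sha1OutputFromShasum", "#shasum"),
]

violations = generation_uniqueness_violations(records)
print(violations)

# Times are plain integers here; any comparable time or event order would do.
print(generation_within_execution(generated_at=5, started_at=1, ended_at=9))
print(generation_within_execution(generated_at=12, started_at=1, ended_at=9))
```

Keying the uniqueness check on the pair (account, entity) reflects that the constraint is scoped to an account: the same Entity may be linked to different PE instances in different accounts without conflict.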
The Provenance Working Group Members.