Expanded Terms

The terms introduced in this section provide additional ways to describe the provenance among Entities, Activities, and Agents. The additional terms are illustrated in the following figure and can be separated into five different categories.

Figure 3. The expanded terms build upon those in the Starting Points section.
The diagrams in this document depict Entities as yellow ovals, Activities as blue rectangles, and Agents as orange pentagons.
The domain of prov:atLocation (prov:Activity or prov:Entity or prov:Agent or prov:InstantaneousEvent) is not illustrated.

The first category extends the Starting Point terms with subclasses, subproperties, and a superproperty.

Three subclasses of Agent are provided (prov:Person, prov:Organization, and prov:SoftwareAgent), along with three subclasses of Entity (prov:Collection, prov:Bundle, and prov:Plan).

A prov:Collection is an Entity that provides a structure (e.g., a set or a list) to some constituents, which are themselves Entities. The prov:Collection class can be used to express the provenance of the collection itself: e.g., who maintained the collection, which members it contained as it evolved, and how it was assembled. The prov:hadMember property is used to assert membership in a collection.
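
As a minimal sketch of how membership might be asserted, the following Turtle describes a collection with two members. The `:` namespace and the names :crimeCharts, :chartByRegion, :chartByYear, and :curator are hypothetical and are not part of the crime chart example.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org#> .      # hypothetical namespace

# A collection is itself an Entity, so it can carry its own provenance.
:crimeCharts
    a prov:Collection ;
    prov:hadMember :chartByRegion, :chartByYear ;   # members are Entities
    prov:wasAttributedTo :curator .                 # who maintains the collection

:chartByRegion a prov:Entity .
:chartByYear   a prov:Entity .
:curator       a prov:Person .
```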

A prov:Bundle is a named set of provenance descriptions, which may itself have provenance. The named set of provenance descriptions may be expressed as PROV-O or any other form. The subclass of Bundle that names a set of PROV-O assertions is not provided by PROV-O, since it is more appropriate to do so using other recommendations, standards, or technologies. In any case, a Bundle of PROV-O assertions is an abstract set of RDF triples, and adding or removing a triple creates a new distinct Bundle of PROV-O assertions.

A prov:Plan is an entity that represents a set of actions or steps intended by one or more agents to achieve some goals.

More general and more specific properties are also provided by the expanded terms. More generally, the property prov:wasInfluencedBy is a superproperty that relates any influenced Entity, Activity, or Agent to any other influencing Entity, Activity, or Agent that had an effect on its characteristics. More specifically, three subproperties of prov:wasDerivedFrom are provided for certain kinds of derivation among Entities: prov:wasQuotedFrom cites a potentially larger Entity (such as a book, blog, or image) from which a new Entity was created by repeating some or all of the original; prov:wasRevisionOf indicates that the derived Entity contains substantial content from the original Entity (e.g., two editions of a book); and prov:hadPrimarySource cites a preceding Entity produced by some agent with direct experience and knowledge about the topic (such as a reading from a sensor, or a journal written during a historical event).
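
The three derivation subproperties could be used as in the sketch below. All resource names and the `:` namespace are hypothetical, chosen only to mirror the kinds of cases mentioned above.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org#> .      # hypothetical namespace

# A quote repeats part of a potentially larger Entity.
:openingLineQuote
    a prov:Entity ;
    prov:wasQuotedFrom :novel .

# A second edition contains substantial content from the first.
:novelSecondEdition
    a prov:Entity ;
    prov:wasRevisionOf :novelFirstEdition .

# An essay that relies on a first-hand account of the topic.
:historyEssay
    a prov:Entity ;
    prov:hadPrimarySource :wartimeDiary .
```

Because each of these is a subproperty of prov:wasDerivedFrom, a consumer that applies RDFS subproperty reasoning could also treat every statement above as a prov:wasDerivedFrom relation and, more generally still, as a prov:wasInfluencedBy relation.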

The second category of expanded terms relates Entities according to their levels of abstraction, where some Entities may present more specific aspects than their more general counterparts. While prov:specializationOf links a more specific Entity to a more general one (e.g., today's BBC news home page versus BBC's news home page on any day), prov:alternateOf links Entities that present aspects of the same thing, but not necessarily the same aspects or at the same time (e.g., the serialization of a document in different formats or a backup copy of a computer file).
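
A small sketch of the two properties, loosely following the cases mentioned above; the `:` namespace and all resource names are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org#> .      # hypothetical namespace

# The page as it appears today is a more specific aspect of the page in general.
:bbcNewsHomePageToday
    a prov:Entity ;
    prov:specializationOf :bbcNewsHomePage .

:bbcNewsHomePage a prov:Entity .

# Two serializations of one document present aspects of the same thing.
:reportPdf
    a prov:Entity ;
    prov:alternateOf :reportHtml .

:reportHtml a prov:Entity .
```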

The third category of expanded terms allows further description of Entities. The property prov:value provides a literal value that is a direct representation of an entity. For example, the prov:value of a quote could be a string of the sentences stated, or the prov:value of an Entity involved in a numeric calculation could be the xsd:integer four. The property prov:atLocation can be used to describe the prov:Location of any Entity, Activity, Agent, or prov:InstantaneousEvent (i.e., the starting or ending of an activity or the generation, usage, or invalidation of an entity). The properties used to describe instances of prov:Location are outside the scope of PROV-O; reuse of other existing vocabulary is encouraged.
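
The following sketch shows prov:value with a string and with an xsd:integer, and prov:atLocation pointing at a prov:Location. The `:` namespace, the resource names, and the quoted text are hypothetical; how :cityGallery itself is described is left to other vocabularies.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org#> .      # hypothetical namespace

# The literal text of a quote is a direct representation of the Entity.
:pressQuote
    a prov:Entity ;
    prov:value "Crime rates differ widely between regions." .

# An operand in a numeric calculation.
:operand
    a prov:Entity ;
    prov:value "4"^^xsd:integer .

# Any Entity, Activity, Agent, or InstantaneousEvent may have a location.
:painting
    a prov:Entity ;
    prov:atLocation :cityGallery .

:cityGallery a prov:Location .
```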

The fourth category of expanded terms describes the lifetime of an Entity beyond being generated by an Activity and used by other Activities. For example, a painting could not have been displayed before it was painted, and it could not be sold after it was destroyed by fire. Similar to how Activities have start and end times, an Entity may be bounded by the points in time at which it was generated and at which it became no longer usable. The properties prov:generatedAtTime and prov:invalidatedAtTime can be used to bound the starting and ending moments of an Entity's existence. The Activities that led to the generation or invalidation of an Entity can be provided using prov:wasGeneratedBy and prov:wasInvalidatedBy, respectively. prov:generated and prov:invalidated are the inverses of prov:wasGeneratedBy and prov:wasInvalidatedBy, respectively, and are defined to facilitate Activity-as-subject as well as Entity-as-subject descriptions. For more about inverses, see Appendix B.
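
The painting scenario could be sketched as follows, first with the Entity as subject and then with the Activities as subject using the inverse properties. The `:` namespace, the resource names, and the timestamps are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org#> .      # hypothetical namespace

# Entity-as-subject: the painting's existence is bounded in time.
:painting
    a prov:Entity ;
    prov:wasGeneratedBy    :paintingSession ;
    prov:generatedAtTime   "2010-05-01T17:00:00Z"^^xsd:dateTime ;
    prov:wasInvalidatedBy  :galleryFire ;
    prov:invalidatedAtTime "2012-03-09T02:30:00Z"^^xsd:dateTime .

# Activity-as-subject: the same facts stated with the inverse properties.
:paintingSession
    a prov:Activity ;
    prov:generated :painting .

:galleryFire
    a prov:Activity ;
    prov:invalidated :painting .
```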

The fifth category of expanded terms describes the lifetime of an Activity beyond its start and end times and predecessor Activities. Activities may also be started or ended by Entities, which is described using the properties prov:wasStartedBy and prov:wasEndedBy, respectively. Because Entities may start or end Activities, and Agents may be Entities, Agents may also start or end Activities.
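
As a sketch, an Activity might be started by one Entity and ended by an Agent that is also described as an Entity. The `:` namespace and the names :incidentResponse, :alertEmail, and :alice are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org#> .      # hypothetical namespace

:incidentResponse
    a prov:Activity ;
    prov:wasStartedBy :alertEmail ;   # an Entity triggered the Activity
    prov:wasEndedBy   :alice .        # an Agent (also an Entity) ended it

:alertEmail a prov:Entity .

# An Agent that is also typed as an Entity.
:alice a prov:Person, prov:Entity .
```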

The following examples illustrate the expanded terms by elaborating the crime chart example from the previous section. After aggregating the dataset and creating the chart, Derek published a post to exhibit his work.

Example 2:

{% escape %}{% include "includes/prov/examples/eg-25-extended-crime-file-example/rdf/extended-crime-file-pt1.ttl" %}{% endescape %}

Agent :derek, acting again on behalf of the :national_newspaper_inc organization, used the :postEditor tool to publish a post about his recent data analysis :aggregatedByRegions. The blog editing tool tracked Derek's actions as PROV-O assertions and published them as a Bundle (the current file <>). The tool recorded that :derek started and ended the publishing activity (:publicationActivity1123) that generated the post :post9821v1. The post included a permanent link where the content of the latest version is available (:more-crime-happens-in-cities) in addition to a textual snapshot of the current version (using prov:value). Derek also included additional domain-specific descriptions of the post, such as its title.

Shortly after publishing the post, Derek noticed a typographical error in his narrative. Because the fix would be minimal, he did not record the activity that led to the new version. Instead, he related the new version (:post9821v2) as a revision of the previous one (:post9821v1). Since both versions of the blog are forms of the long-standing blog permalink :more-crime-happens-in-cities, the revisions are alternates of one another and each is a prov:specializationOf :more-crime-happens-in-cities.

Figure 4. An illustration of the PROV-O assertions in Example 2, where Derek published two versions of a blog for the National Newspaper, Inc.
The responsibility properties are shown in pink.

Shortly after Derek published his blog post, Monica adapted the text for a wider audience in a new post (:post9822). This rewrite is an alternate, abbreviated view of the same topic that Derek wrote about and was created from his original text. Since the provenance produced by the activities of Derek and Monica corresponded to different user views, the system automatically published it in a different prov:Bundle. The tool also asserted provenance about the bundle that it produced (e.g., the date of creation, its creator, and the fact that Derek's bundle was used). Because a bundle is a kind of entity, all provenance assertions that can be made about entities can also be made about bundles. The use of bundles enables the creation of provenance of provenance.

Example 3:

{% escape %}{% include "includes/prov/examples/eg-25-extended-crime-file-example/rdf/extended-crime-file-pt1_a.ttl" %}{% endescape %}

After some time, John wrote his own conclusions in his own post (:post19201) quoting the previous two posts. Each quote that John makes (:quote_from_monica and :quote_from_derek) is a new entity derived from the previous blogs and is annotated with the time that the quote was taken. The provenance of John's blog notes that his post is the result of the quotes that he took from Derek and Monica. The blog post is also derived from Derek's :aggregatedByRegions dataset because John inspected it and found a concern that he discusses in his blog. All the provenance statements related to John's post are grouped in a new prov:Bundle.

Example 4:

{% escape %}{% include "includes/prov/examples/eg-25-extended-crime-file-example/rdf/extended-crime-file-pt2.ttl" %}{% endescape %}

Unfortunately, there was a problem in the servers where :post19201 was being stored, and all the data related to the post was lost permanently. Thus, the system invalidated the entity automatically and notified John about the error.

Example 5:

{% escape %}{% include "includes/prov/examples/eg-25-extended-crime-file-example/rdf/extended-crime-file-pt4.ttl" %}{% endescape %}