PROV-DM is a data model for provenance that describes the entities, people and activities involved in producing a piece of data or thing in the world. PROV-DM is domain-agnostic, but is equipped with extensibility points allowing further domain-specific and application-specific extensions to be defined. PROV-DM is accompanied by PROV-N, a technology-independent notation, which allows serializations of PROV-DM instances to be created for human consumption, which facilitates the mapping of PROV-DM to concrete syntax, and which is used as the basis for a formal semantics of PROV-DM. The purpose of this document is to define the PROV-N notation.

This document is released internally by the Provenance Working Group.
This document is part of the PROV family of specifications, a set of specifications aiming to define the various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. This document defines PROV-N, a notation to express instances of the PROV-DM data model for human consumption. Other documents are:


Provenance is defined as a record that describes the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing in the world. Two companion specifications respectively define PROV-DM, a data model for provenance, allowing such descriptions to be expressed [[PROV-DM]] and a set of constraints that provenance descriptions are expected to satisfy [[PROV-DM-CONSTRAINTS]].

In this context, PROV-N was introduced as a notation to write instances of the data model, as close to its abstract syntax as possible. PROV-N is primarily aimed at human consumption. PROV-N allows serializations of PROV-DM instances to be written in a technology-independent manner. So far, PROV-N has been used in the following ways:

PROV-N was designed to be as close as possible to PROV-DM without the syntactic bias and modelling constraints that concrete technologies bring with them, e.g., XML's choice between attribute and element, RDF's reliance on triples, or JSON's usage of dictionaries.

The purpose of this document is solely to define the syntax of PROV-N. For each construct of PROV-DM, a corresponding PROV-N expression is introduced, by way of a production in the PROV-N grammar presented in this document.

This specification is one of several specifications, referred to as the PROV family of specifications, defining the various aspects that are necessary to achieve the vision of inter-operable exchange of provenance:

Structure of this Document


PROV-DM Namespace

The PROV-DM namespace is

All the elements, relations, reserved names and attributes introduced in this specification belong to the PROV-DM namespace.

There is a desire to use a single namespace that all specifications of the PROV family can share to refer to common provenance terms. This is ISSUE-224.


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [[!RFC2119]].

Design Rationale for PROV-N

A key goal of PROV-DM is the specification of a machine-processable data model for provenance, so that applications that have obtained the provenance of the resources they manipulate can reason about such provenance. As such, representations of PROV-DM are available in RDF and XML.

However, communicating provenance between humans is also important when teaching, illustrating, formalizing, and discussing provenance-related issues. To this end, PROV-N is a notation that is designed to write instances of the PROV-DM data model in a compact textual form, without the syntactic baggage and constraints that come with a markup language such as XML or a description framework such as RDF.

Grammar Notation

This specification includes a grammar for PROV-N expressed using the Extended Backus-Naur Form (EBNF) notation.

Each production rule (or production, for short) in the grammar defines one non-terminal symbol, in the form:

E ::= expression

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

PROV-N Productions per Component

A PROV-N document allows writing down instances of the PROV-DM data model in a compact textual form. It consists of a sequence of expressions.

Instances of the PROV-DM data model are expressed in PROV-N by a text conformant with the top-level production expression of the grammar.

expression  ::=

  entityExpression | activityExpression | generationExpression | usageExpression
| startExpression | endExpression | communicationExpression | startByActivityExpression
| agentExpression | attributionExpression | associationExpression | responsibilityExpression
| derivationExpression | revisionExpression | quotationExpression | originalSourceExpression | traceabilityExpression
| alternateExpression | specializationExpression
| noteExpression | annotationExpression
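
As an informal illustration, the alternatives of the top-level production can be told apart by their leading keyword. The following Python sketch maps each keyword used in this document to the name of its production. The function name production_for and the table name are hypothetical, and the sketch only inspects the keyword; it does not check that the arguments conform to the production.

```python
import re

# Keyword -> production name, copied from the productions in this document.
PRODUCTION_BY_KEYWORD = {
    "entity": "entityExpression",
    "activity": "activityExpression",
    "wasGeneratedBy": "generationExpression",
    "used": "usageExpression",
    "wasStartedBy": "startExpression",
    "wasEndedBy": "endExpression",
    "wasInformedBy": "communicationExpression",
    "wasStartedByActivity": "startByActivityExpression",
    "agent": "agentExpression",
    "wasAttributedTo": "attributionExpression",
    "wasAssociatedWith": "associationExpression",
    "actedOnBehalfOf": "responsibilityExpression",
    "wasDerivedFrom": "derivationExpression",
    "wasRevisionOf": "revisionExpression",
    "wasQuotedFrom": "quotationExpression",
    # Name taken from the left-hand side of the production that defines it.
    "hadOriginalSource": "originalSourceExpression",
    "tracedTo": "traceabilityExpression",
    "alternateOf": "alternateExpression",
    "specializationOf": "specializationExpression",
    "note": "noteExpression",
    "hasAnnotation": "annotationExpression",
}


def production_for(text):
    """Return the production name suggested by the leading keyword, or None."""
    match = re.match(r"\s*([A-Za-z]+)\s*\(", text)
    if match is None:
        return None
    return PRODUCTION_BY_KEYWORD.get(match.group(1))
```

For instance, production_for applied to the text wasInformedBy(ex:a1, ex:a2) yields communicationExpression, while an unknown keyword yields None.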

Component 1: Entities and Activities


An entity's text matches the entityExpression production.

entityExpression ::= entity ( identifier optional-attribute-values )

optional-attribute-values ::= , [ attribute-values ]
attribute-values ::= attribute-value | attribute-value , attribute-values
attribute-value ::= attribute = Literal
entity(tr:WD-prov-dm-20111215, [ prov:type="document" ])
entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version=2 ])
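
The following Python sketch shows one way an application might write out text matching entityExpression from an identifier and a dictionary of attributes. The helper names format_literal and format_entity are hypothetical. The sketch assumes that the attribute list may be left out when there are no attributes, and it only handles string and integer values, written in the convenience notation for literals.

```python
def format_literal(value):
    """Render a Python str or int in the convenience notation for literals."""
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        raise TypeError("only str and int are handled in this sketch")
    if isinstance(value, int):
        return str(value)
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def format_entity(identifier, attributes=None):
    """Build an entity expression; attributes maps attribute names to values."""
    if not attributes:
        # Assumption: the attribute list is omitted when it would be empty.
        return "entity(" + identifier + ")"
    pairs = ", ".join(
        name + "=" + format_literal(value) for name, value in attributes.items()
    )
    return "entity(" + identifier + ", [" + pairs + "])"
```

Apart from the spacing inside the square brackets, the output for the attributes prov:type and ex:version has the same shape as the second example above.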


An activity's text matches the activityExpression production.

activityExpression ::= activity ( identifier , (time | - ) , (time | - ) optional-attribute-values )
activity(ex:a10, -, -)
activity(ex:a10, -, -, [prov:type="edit"])
activity(ex:a10, -, 2011-11-16T16:00:00)
activity(ex:a10, 2011-11-16T16:00:00, -)
activity(ex:a10, 2011-11-16T16:00:00, -, [prov:type="createFile"])
activity(ex:a10, 2011-11-16T16:00:00, 2011-11-16T16:00:01, [prov:type="createFile"])
activity(ex:a10, [prov:type="edit"])
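
The role of the - marker in the two time positions can be sketched as follows in Python. The function name format_activity is hypothetical; the sketch writes both time positions in every case and leaves attributes out for brevity.

```python
from datetime import datetime


def format_activity(identifier, start=None, end=None):
    """Build an activity expression, writing '-' for an absent time."""

    def slot(value):
        # isoformat() yields e.g. 2011-11-16T16:00:00 for whole seconds.
        return "-" if value is None else value.isoformat()

    return "activity({}, {}, {})".format(identifier, slot(start), slot(end))
```

Calling format_activity with only a start time of 16:00:00 on 16 November 2011 gives the same text as the fourth example above.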


A generation's text matches the generationExpression production.

generationExpression ::= wasGeneratedBy ( ( identifier | - ) , eIdentifier , ( aIdentifier | - ) , ( time | - ) optional-attribute-values )
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, -)
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00)
wasGeneratedBy(e2, a1, -, [ex:fct="save"])     
wasGeneratedBy(e2, -, -, [ex:fct="save"])     
wasGeneratedBy(ex:g1, tr:WD-prov-dm-20111215, ex:edit1, -)
wasGeneratedBy(ex:g1, tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00)
wasGeneratedBy(-, tr:WD-prov-dm-20111215, ex:edit1, -)

Even though the production generationExpression allows for expressions wasGeneratedBy(e2, -, -) and wasGeneratedBy(-, e2, -, -), these expressions are not valid in PROV-N.
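
Because this restriction is not captured by the grammar, a processor needs a separate check after parsing. The Python sketch below is one possible reading: it assumes that a generation is valid only if at least one of the identifier, the activity, the time, and the attributes is present. That rule is an assumption inferred from the two invalid expressions above and from the valid examples; the function name generation_is_valid is hypothetical.

```python
def generation_is_valid(identifier=None, activity=None, time=None, attributes=None):
    """Check the assumed rule: some optional part of the generation is present.

    Absent parts (written '-' in PROV-N) are passed as None. The entity is
    mandatory in the production and is therefore not part of this check.
    """
    return (
        identifier is not None
        or activity is not None
        or time is not None
        or bool(attributes)  # an empty attribute list counts as absent
    )
```

Under this reading, wasGeneratedBy(e2, -, -) is rejected, whereas wasGeneratedBy(e2, -, -, [ex:fct="save"]) is accepted because it carries an attribute.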


A usage's text matches the usageExpression production.

usageExpression ::= used ( ( identifier | - ) , aIdentifier , eIdentifier , ( time | - ) optional-attribute-values )
used(ex:act2, ar3:0111, -)
used(ex:act2, ar3:0111, 2011-11-16T16:00:00)
used(a1,e1, -, [ex:fct="load"])
used(ex:u1, ex:act2, ar3:0111, -)
used(-, ex:act2, ar3:0111, -)

Even though the production usageExpression allows for expressions used(e2, -, -) and used(-, e2, -, -), these expressions are not valid in PROV-N.


An activity start's text matches the startExpression production of the grammar.

startExpression ::= wasStartedBy ( ( identifier | - ) , aIdentifier , ( eIdentifier | - ) , ( time | - ) optional-attribute-values )
wasStartedBy(ex:act2, ar3:0111, -)
wasStartedBy(ex:act2, ar3:0111, 2011-11-16T16:00:00)
wasStartedBy(ex:act2, -, 2011-11-16T16:00:00)
wasStartedBy(ex:act2, -, -)
wasStartedBy(ex:act2, -, -, [ex:param="a"])
wasStartedBy(s, ex:act2, ar3:0111, 2011-11-16T16:00:00)
wasStartedBy(-, ex:act2, ar3:0111, 2011-11-16T16:00:00)

Even though the production startExpression allows for expressions wasStartedBy(e2, -, -) and wasStartedBy(-, e2, -, -), these expressions are not valid in PROV-N.


An activity end's text matches the endExpression production of the grammar.

endExpression ::= wasEndedBy ( ( identifier | - ) , aIdentifier , ( eIdentifier | - ) , ( time | - ) optional-attribute-values )
wasEndedBy(ex:act2, ex:trigger, -)
wasEndedBy(ex:act2, ex:trigger, 2011-11-16T16:00:00)
wasEndedBy(ex:act2, -, 2011-11-16T16:00:00)
wasEndedBy(ex:act2, -, 2011-11-16T16:00:00, [ex:param="a"])
wasEndedBy(e,ex:act2, -, -)
wasEndedBy(e, ex:act2, ex:trigger, 2011-11-16T16:00:00)
wasEndedBy(-, ex:act2, ex:trigger, 2011-11-16T16:00:00)

Even though the production endExpression allows for expressions wasEndedBy(e2, -, -) and wasEndedBy(-, e2, -, -), these expressions are not valid in PROV-N.


communicationExpression  ::= wasInformedBy ( ( identifier | - ) , aIdentifier , aIdentifier optional-attribute-values )
wasInformedBy(ex:a1, ex:a2)
wasInformedBy(ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasInformedBy(i, ex:a1, ex:a2)
wasInformedBy(i, ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasInformedBy(-, ex:a1, ex:a2)
wasInformedBy(-, ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])

Start by Activity

startByActivityExpression  ::= wasStartedByActivity ( ( identifier | - ) , aIdentifier , aIdentifier optional-attribute-values )
wasStartedByActivity(ex:a1, ex:a2)
wasStartedByActivity(ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasStartedByActivity(s,ex:a1, ex:a2)
wasStartedByActivity(s,ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasStartedByActivity(-,ex:a1, ex:a2)
wasStartedByActivity(-,ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])

Component 2: Agents and Responsibility


An agent's text matches the agentExpression production.

agentExpression ::= agent ( identifier optional-attribute-values )
agent(ag4, [ prov:type="prov:Person" %% xsd:QName, ex:name="David" ])


attributionExpression ::= wasAttributedTo ( ( identifier | - ) , eIdentifier , agIdentifier optional-attribute-values )
wasAttributedTo(e,ag,[ex:license="cc:attributionURL" %% xsd:QName])
wasAttributedTo(id,e,ag,[ex:license="cc:attributionURL" %% xsd:QName])
wasAttributedTo(-,e,ag,[ex:license="cc:attributionURL" %% xsd:QName])


An association's text matches the associationExpression production.

associationExpression ::= wasAssociatedWith ( ( identifier | - ) , aIdentifier , ( agIdentifier | - ) , ( eIdentifier | - ) optional-attribute-values )
wasAssociatedWith(ex:a1, -, ex:e1)
wasAssociatedWith(ex:a1, ex:ag1, -)
wasAssociatedWith(ex:a1, ex:ag1, ex:e1)
wasAssociatedWith(ex:a1, ex:ag1, ex:e1, [ex:param1="a", ex:param2="b"])
wasAssociatedWith(a, ex:a1, -, ex:e1)
wasAssociatedWith(-, ex:a1, -, ex:e1)
wasAssociatedWith(-, ex:a1, ex:ag1, -)

Even though the production associationExpression allows for expressions wasAssociatedWith(a, -, -) and wasAssociatedWith(-, a, -, -), these expressions are not valid in PROV-N.


responsibilityExpression ::= actedOnBehalfOf ( ( identifier | - ) , agIdentifier , agIdentifier , ( aIdentifier | - ) optional-attribute-values )
actedOnBehalfOf(ag1, ag2, -)
actedOnBehalfOf(ag1, ag2, a)
actedOnBehalfOf(ag1, ag2, -, [prov:type="delegation"])
actedOnBehalfOf(ag2, ag3, a, [prov:type="contract"])
actedOnBehalfOf(r, ag2, ag3, a, [prov:type="contract"])
actedOnBehalfOf(-, ag1, ag2, -)

Component 3: Derivations


A derivation expression's text matches the derivationExpression production.

derivationExpression ::= wasDerivedFrom ( ( identifier | - ) , eIdentifier , eIdentifier , ( aIdentifier | - ) , ( gIdentifier | - ) , ( uIdentifier | - ) optional-attribute-values )
wasDerivedFrom(e2, e1)
wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1, -, g2, u1)
wasDerivedFrom(e2, e1, a, -, u1)
wasDerivedFrom(e2, e1, a, g2, -)
wasDerivedFrom(e2, e1, a, -, -)
wasDerivedFrom(e2, e1, -, -, u1)
wasDerivedFrom(e2, e1, -, -, -)
wasDerivedFrom(d, e2, e1, a, g2, u1)
wasDerivedFrom(-, e2, e1, a, g2, u1)


revisionExpression ::= wasRevisionOf ( ( identifier | - ) , eIdentifier , eIdentifier , ( agIdentifier | - ) optional-attribute-values )
wasRevisionOf(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, -)
wasRevisionOf(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, w3:Consortium)
wasRevisionOf(id,tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, w3:Consortium)
wasRevisionOf(id,tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, -)
wasRevisionOf(-,tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, -)


A quotation expression's text matches the quotationExpression production of the grammar.

quotationExpression ::= wasQuotedFrom ( ( identifier | - ) , eIdentifier , eIdentifier , ( agIdentifier | - ) , ( agIdentifier | - ) optional-attribute-values )
wasQuotedFrom(ex:blockQuote,ex:blog, -, -)

Original Source

An original source record's text matches the originalSourceExpression production of the grammar.

originalSourceExpression ::= hadOriginalSource ( ( identifier | - ) , eIdentifier , eIdentifier optional-attribute-values )
hadOriginalSource(ex:e1, ex:e2)
hadOriginalSource(ex:e1, ex:e2,[ex:param="a"])
hadOriginalSource(-,ex:e1, ex:e2,[ex:param="a"])
hadOriginalSource(-,ex:e1, ex:e2)


A traceability expression's text matches the traceabilityExpression production of the grammar.

traceabilityExpression ::= tracedTo ( ( identifier | - ) , eIdentifier , eIdentifier optional-attribute-values )

Component 4: Alternate Entities


An alternate relation's text matches the alternateExpression production.

alternateExpression ::= alternateOf ( eIdentifier , eIdentifier )


A specialization relation's text matches the specializationExpression production.

specializationExpression ::= specializationOf ( eIdentifier , eIdentifier )

Component 5: Collections

To be checked: Grammar has not been implemented yet.


A Derivation-by-Insertion relation's text matches the derivationByInsertionFromExpression production.

derivationByInsertionFromExpression ::= derivedByInsertionFrom ( identifier , afterIdentifier , beforeIdentifier , keyidentifier , validentifier , optional-attribute-values )
derivedByInsertionFrom(id, c1, c, "k1", v1)  
derivedByInsertionFrom(id, c1, c, "k1", v1, [])  


A Derivation-by-Removal relation's text matches the derivationByRemovalFromExpression production.

derivationByRemovalFromExpression ::= derivedByRemovalFrom ( identifier , afterIdentifier , beforeIdentifier , keyidentifier , optional-attribute-values )
derivedByRemovalFrom(id, c1, c, "k1")  
derivedByRemovalFrom(id, c1, c, "k1", [])


A Containment relation's text matches the containedExpression production.

containedExpression ::= contained ( identifier , afterIdentifier , keyidentifier , validentifier , optional-attribute-values )
contained(id, c, "k", v)
contained(id, c, "k", v,[])  

Convenience relations

A Derivation-by-Bulk-Insertion relation's text matches the derivationByBulkInsertionFromExpression production.

derivationByBulkInsertionFromExpression ::= derivedByBulkInsertionFrom ( identifier , afterIdentifier , beforeIdentifier , { keyValuePairs } , optional-attribute-values )
derivedByBulkInsertionFrom(c1, c, {("k1", v1), ("k2", v2)})
derivedByBulkInsertionFrom(c1, c, {("k1", v1), ("k2", v2)}, [])

A Derivation-by-Bulk-Removal relation's text matches the derivationByBulkRemovalFromExpression production.

derivationByBulkRemovalFromExpression ::= derivedByBulkRemovalFrom ( identifier , afterIdentifier , beforeIdentifier , { keySet } , optional-attribute-values )
derivedByBulkRemovalFrom(c3, c1, {"k1", "k3"})
derivedByBulkRemovalFrom(c3, c1, {"k1", "k3"}, [])

A Bulk-Containment relation's text matches the containedBulkExpression production.

containedBulkExpression ::= containedBulk ( identifier , afterIdentifier , { keyValuePairs } , optional-attribute-values )
containedBulk(c3, {("k4", v4), ("k5", v5)})
containedBulk(c3, {("k4", v4), ("k5", v5)}, [])
In the productions above, nonterminals keyValuePairs and keySet are defined as follows.
keyValuePairs  ::= ( keyidentifier , validentifier ) | ( keyidentifier , validentifier ) , keyValuePairs
keySet  ::= keyidentifier | keyidentifier , keySet

Component 6: Annotations


A note's text matches the noteExpression production.

noteExpression ::= note ( identifier optional-attribute-values )
note(ann1,[ex:color="blue", ex:screenX=20, ex:screenY=30])


An annotation's text matches the annotationExpression production.

annotationExpression ::= hasAnnotation ( identifier , nIdentifier )

Further Expressions

This section defines further expressions of PROV-N.

Namespace Declaration

namespaceDeclarations ::= | defaultNamespaceDeclaration | namespaceDeclaration namespaceDeclaration
namespaceDeclaration ::= prefix prefix IRI
defaultNamespaceDeclaration ::= default IRI

In PROV-N, the prefix prov is reserved and denotes the PROV namespace.


identifier ::= qualifiedName
eIdentifier ::= identifier (intended to denote an entity)
aIdentifier ::= identifier (intended to denote an activity)
agIdentifier ::= identifier (intended to denote an agent)
gIdentifier ::= identifier (intended to denote a generation)
uIdentifier ::= identifier (intended to denote a usage)
nIdentifier ::= identifier (intended to denote a note)
accIdentifier ::= identifier (intended to denote an account)

qualifiedName  ::= prefixedName | unprefixedName
prefixedName  ::= prefix : localPart
unprefixedName  ::= localPart
prefix  ::= a name without colon compatible with the NC_NAME production [[!XML-NAMES]]
localPart  ::= a name without colon compatible with the NC_NAME production [[!XML-NAMES]]
Note that the XML NC_NAME production does not allow local identifiers to start with a digit. Should the productions used in SPARQL or Turtle be used instead?
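
The following Python sketch illustrates how a qualified name might be split into its prefix and local part. The function name split_qualified_name is hypothetical, and the regular expression is an ASCII-only approximation of the NC_NAME production (a letter or underscore followed by letters, digits, full stops, hyphens, or underscores); the actual production admits many more characters.

```python
import re

# ASCII-only approximation of NC_NAME; an assumption made for this sketch.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_qualified_name(text):
    """Return (prefix, local_part); prefix is None for an unprefixed name."""
    head, colon, tail = text.partition(":")
    if colon:
        prefix, local_part = head, tail
        if _NCNAME.fullmatch(prefix) is None:
            raise ValueError("invalid prefix: " + repr(prefix))
    else:
        prefix, local_part = None, head
    if _NCNAME.fullmatch(local_part) is None:
        raise ValueError("invalid local part: " + repr(local_part))
    return prefix, local_part
```

With this approximation, ex:a10 and the unprefixed e2 are accepted, whereas a name such as ex:10 is rejected because its local part starts with a digit.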


An attribute's text matches the attribute production.

attribute ::= qualifiedName

The reserved attributes in the PROV namespace are the following.

  1. prov:label
  2. prov:location
  3. prov:role
  4. prov:steps
  5. prov:type


A Literal's text matches the Literal production.

Literal  ::= typedLiteral | convenienceNotation
typedLiteral ::= quotedString %% datatype
datatype ::= qualifiedName
convenienceNotation  ::= stringLiteral | intLiteral
stringLiteral ::= quotedString
quotedString ::= a finite sequence of characters in which " (U+22) and \ (U+5C) occur only in pairs of the form \" (U+5C, U+22) and \\ (U+5C, U+5C), enclosed in a pair of " (U+22) characters
intLiteral ::= a finite-length sequence of decimal digits (#x30-#x39) with an optional leading negative sign (-)

The nonterminals stringLiteral and intLiteral are syntactic sugar for quoted strings with datatype xsd:string and xsd:int, respectively.
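
The desugaring can be sketched in Python as follows. The function name parse_literal is hypothetical; it returns a pair made of the string value and the datatype name, and it treats any run of non-space characters after %% as the datatype rather than checking it against the qualifiedName production.

```python
import re

# A quoted string in which " and \ occur only as the pairs \" and \\.
_QUOTED = r'"((?:[^"\\]|\\["\\])*)"'
_TYPED = re.compile(_QUOTED + r"\s*%%\s*(\S+)")
_STRING = re.compile(_QUOTED)
_INT = re.compile(r"-?[0-9]+")


def _unescape(body):
    """Turn the pairs \\" and \\\\ back into single characters."""
    return re.sub(r'\\(["\\])', r"\1", body)


def parse_literal(text):
    """Return (value, datatype) for a typed literal or a convenience notation."""
    text = text.strip()
    match = _TYPED.fullmatch(text)
    if match:
        return _unescape(match.group(1)), match.group(2)
    match = _STRING.fullmatch(text)
    if match:
        return _unescape(match.group(1)), "xsd:string"
    if _INT.fullmatch(text):
        return text, "xsd:int"
    raise ValueError("not a literal: " + repr(text))
```

The two convenience notations therefore produce the same kind of pair as an explicitly typed literal.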

In particular, a PROV-DM Literal may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no specific interpretation in the context of PROV-DM.

Reserved Type Values

The reserved type values in the PROV namespace are the following.

  1. prov:AccountEntity
  2. prov:SoftwareAgent
  3. prov:Person
  4. prov:Organization
  5. prov:Plan
  6. prov:Collection
  7. prov:EmptyCollection

Time Values

Time instants are defined according to xsd:dateTime [[!XMLSCHEMA-2]].
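
As an illustration, a time position of an expression might be read in Python as sketched below. The function name parse_time is hypothetical. The sketch relies on datetime.fromisoformat, whose accepted formats do not coincide exactly with the lexical forms of xsd:dateTime (for example, a trailing Z is only accepted from Python 3.11), so it should be read as an approximation.

```python
from datetime import datetime


def parse_time(token):
    """Return a datetime for a time token, or None for the '-' marker."""
    token = token.strip()
    if token == "-":
        return None
    # Approximation: fromisoformat is not a full xsd:dateTime parser.
    return datetime.fromisoformat(token)
```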

Expression Container

An expression container is a house-keeping construct of PROV-N capable of packaging up PROV-N expressions and namespace declarations. An expression container forms a self-contained package of provenance descriptions for the purpose of exchanging them. An expression container may be used to package up PROV-N expressions in response to a request for the provenance of something ([[PROV-AQ]]).

Given its status as a house-keeping construct for the purpose of exchanging provenance expressions, an expression container is not defined as a PROV-N expression (production expression).

An expression container, written container decls exprs endContainer in PROV-N, contains:

An expression container's text matches the expressionContainer production.

expressionContainer ::= container namespaceDeclarations expression endContainer

The following container contains expressions related to the provenance of entity e2.


  prefix ex:,

  entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", 
             ex:content="There was a lot of crime in London last month."])
  activity(a1, 2011-11-16T16:05:00, -, [prov:type="edit"])
  wasGeneratedBy(e2, a1, [ex:fct="save"])     
  wasAssociatedWith(a1, ag2, [prov:role="author"])
  agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])


This container could for instance be returned as the result of a query to a provenance store for the provenance of entity e2 [[PROV-AQ]].

Clarify what records are. This is ISSUE-208.


PROV-DM has introduced a notion of account by which a set of provenance descriptions can be bundled up and named. PROV-DM assumes the existence of mechanisms to implement accounts, but such mechanisms remain outside its scope. It is suggested that specific serializations may offer solutions to name bundles of descriptions.

Given that the primary motivation for PROV-N is to provide a notation aimed at human consumption, it is therefore appropriate to introduce a notation for accounts, which would include an account name and a bundle of expressions.

An account, written account(id, exprs) in PROV-N, contains:

In PROV-N, an account's text matches the accountExpression production of the grammar.

accountExpression ::= account ( identifier , expression )

It is also useful to package up one or more account expressions in an expression container, for interchange purpose. Hence, expressionContainer is revised as follows.

expressionContainer ::= container namespaceDeclarations expression endContainer
| container namespaceDeclarations accountExpression endContainer

The following container

  prefix ex:,


illustrates how two accounts with identifiers ex:acc1 and ex:acc2 can be returned in a PROV-N serialization of the provenance of something.

The following container

  prefix ex:,

      entity(tr:WD-prov-dm-20111018, [ prov:type="pr:RecsWD" %% xsd:QName ])
      entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
      wasAssociatedWith(ex:pub2, w3:Consortium, pr:rec-advance)

      entity(ex:acc1, [prov:type="prov:AccountEntity" %% xsd:QName ])


illustrates a first account, with identifier ex:acc1, containing expressions describing the provenance of the technical report tr:WD-prov-dm-20111215, and a second account ex:acc2, describing the provenance of the first. In account ex:acc2, ex:acc1 is the identifier of an entity of type prov:AccountEntity.