RDF Context Associations Specification

Living Document,

This version:
https://knowledgeonwebscale.github.io/rdf-context-associations-spec/
Issue Tracking:
GitHub
Editors:
(IDLab - Ghent University)
Pieter Colpaert (IDLab - Ghent University)

Abstract

This specification introduces the RDF Context Associations approach to the management of local references in RDF data for Web ecosystems built on RDF data.

Glossary

| Term | Definition |
|------|------------|
| Context Association | The association of contextual information to a closed, local target set of RDF statements encoded as a named graph. |
| Context Statements | The contextual statements (e.g., metadata, provenance, policy) of a context association that target a named graph in the local RDF scope. |
| Named Graph | An RDF 1.1 named graph (a set of triples identified by a graph name) used here to encapsulate target statements and keep their scope local. |
| Blank Node Identifier | An RDF blank node identifier, in this context mostly used as the name value of an RDF named graph. |
| Content Hash (named graph) | A cryptographic hash of the canonicalized representation of a named graph, evaluated via RDF Dataset Canonicalization. |
| RDF Stream | A stream of RDF data elements; see the W3C RDF Stream Processing Community Group for definitions and models. |

1. Introduction

The use of named graphs in RDF has been a widely discussed topic in the RDF community. From the initial publication in 2005 [carroll2005named], to their introduction in the RDF standard in version 1.1, named graphs see varied adoption for different use-cases throughout RDF ecosystems.

With their introduction in the RDF standard, the interpretation of named graphs was intentionally left undefined because of the diverging interpretations that were in use at the time of specification. Discussing the topic, the W3C working group provided a document on the semantics of RDF datasets.

Due to the lack of overarching semantics, the use of named graphs finds itself constricted to specific use-cases and data models, limiting their overall usability for exchange and reuse across ecosystems.

With this minimal specification, we aim to provide a pragmatic approach for the management of expressions of context over graphs of RDF statements. For this, we use named graphs as an indication of a shared context or meaning of a set of statements, that can be referenced using the graph name of the named graph in which this set of statements is embedded.

A context association is the association of a target set of triples, encoded as a named graph in RDF, with a context definition in the local RDF scope that references the blank node graph name of that named graph. The graph name identifier MUST be interpreted with respect to its role as the name of a local named graph, and the target graph MUST syntactically be interpreted as the set of quads whose graph term is equal to that graph name. The association between the context definition and the target set of statements is local to the RDF scope, and the target set of statements MUST NOT be considered universally true or replicated in the default graph, but understood only within the scope of its named graph and the context associated with it.

2. Publishing RDF context associations

When publishing combined context and data, the approach must be unambiguous about which context information is tied to which target graph of statements, and exact in both the contextual information it defines and the extent of the target set of statements. Additionally, we must hint to a consuming client that a specific interpretation of the context definition with regard to the targeted graph of statements is required.

2.1. Closing the open world

To solve the problem of exactly defining the sets of context statements and their targeted data, we need to restrict the open world interpretation of RDF to concrete and closed sets of statements that define exact sets of data statements and associated context statements, for which we use named graphs in RDF.

To ensure these sets of statements remain closed in definition, exchange and processing of the data and context, graph merge operations at the RDF level must be prevented to ensure that no changes occur to either the data or the context statements. For this, we make use of blank node identifiers for the graph name of these named graphs. This ensures the scope of the context statements and its association to a target set of statements is local to the scope of the storage, exchange or operation in which they are used.

Note: if blank nodes are impractical, for example because specific graphs must be extracted from the local graph store by their name, skolem identifiers can be used to ensure the graph name is unique at the time of its construction. Although in theory the two can be used interchangeably, the resulting hash values will differ.
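As a minimal sketch of the skolem alternative mentioned in the note, the following Python snippet mints a unique graph name at construction time. The base IRI `https://example.org/` and both function names are assumptions made for illustration; the `/.well-known/genid/` path follows the skolemization advice in RDF 1.1 Concepts and is not mandated by this specification.

```python
import uuid

# Hypothetical authority, used only for illustration; use one you control.
SKOLEM_BASE = "https://example.org/.well-known/genid/"


def mint_skolem_graph_name(base: str = SKOLEM_BASE) -> str:
    """Return a fresh, globally unique IRI to use as a graph name."""
    return base + uuid.uuid4().hex


def is_skolem_iri(term: str) -> bool:
    """Recognise skolem IRIs by the well-known 'genid' path segment."""
    return "/.well-known/genid/" in term


graph_name = mint_skolem_graph_name()
print(graph_name)
```

Because such an identifier is an IRI rather than a blank node, a consumer can look the graph up by name, but it is up to the publisher to keep treating the name as local.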

The data and metadata are published as blank node graphs:

_:data {
    <http://people.org/Bob#me> foaf:name "Bob" ;
        foaf:age "27" .
}

_:meta {
    _:data ex:creator <http://people.org/Bob#me> .
}
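
The effect of blank node graph names on merging can be sketched as follows. This is an illustration under stated assumptions, not part of the specification: quads are modelled as plain 4-tuples of strings, and `load_document` is a hypothetical loader that relabels blank nodes per source document, as RDF parsers conventionally do.

```python
Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph name)


def load_document(quads: list[Quad], doc_id: str) -> list[Quad]:
    """Relabel every blank node so it is unique to the document it came from."""

    def relabel(term: str) -> str:
        return f"_:{doc_id}_{term[2:]}" if term.startswith("_:") else term

    return [(relabel(s), relabel(p), relabel(o), relabel(g)) for s, p, o, g in quads]


# Two documents that both happen to use the label _:data for a graph.
doc_a = load_document(
    [("<http://people.org/Bob#me>", "foaf:name", '"Bob"', "_:data")], "a"
)
doc_b = load_document(
    [("<http://people.org/Bob#me>", "foaf:age", '"27"', "_:data")], "b"
)

# After loading, the graph names differ, so the two target sets stay separate.
graph_names = {g for _, _, _, g in doc_a + doc_b}
print(sorted(graph_names))  # ['_:a_data', '_:b_data']
```

Had both documents used the same IRI as graph name, combining them would have silently merged the two sets of statements into one graph.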

2.2. Advertising the interpretation

Secondly, we need to ensure the consuming client is aware of how the incoming RDF data should be processed to draw valid conclusions. For this, the term ca:LocalGraphReference is introduced. This term marks the identifier it is applied to as being used only as the reference for a named graph local to the current document or operation scope.

This identifier can be directly embedded in the RDF data

_:data a ca:LocalGraphReference .
_:data { ... }

_:meta a ca:LocalGraphReference .
_:meta { ... }

Alternatively, it can be declared as the range or domain of a predicate in that predicate's ontology definition.

myOntology:hasGraphTarget a rdf:Property ;
    rdfs:range ca:LocalGraphReference ;
    rdfs:comment "This property points to a target named graph." .
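
A consumer could derive the set of local graph references from both mechanisms roughly as follows. This is a minimal sketch, assuming triples are modelled as 3-tuples of prefixed-name strings; the function name and the sample subject `_:sig` are hypothetical, and only explicit typing plus direct rdfs:range and rdfs:domain declarations are considered (no further RDFS reasoning).

```python
Triple = tuple[str, str, str]

LOCAL_REF = "ca:LocalGraphReference"


def local_graph_references(ontology: list[Triple], data: list[Triple]) -> set[str]:
    """Collect every term that is typed, explicitly or by entailment, as a local graph reference."""
    range_props = {s for s, p, o in ontology if p == "rdfs:range" and o == LOCAL_REF}
    domain_props = {s for s, p, o in ontology if p == "rdfs:domain" and o == LOCAL_REF}

    refs: set[str] = set()
    for s, p, o in data:
        if p == "rdf:type" and o == LOCAL_REF:
            refs.add(s)  # embedded directly in the data
        if p in range_props:
            refs.add(o)  # entailed through rdfs:range
        if p in domain_props:
            refs.add(s)  # entailed through rdfs:domain
    return refs


ontology = [("myOntology:hasGraphTarget", "rdfs:range", LOCAL_REF)]
data = [
    ("_:sig", "myOntology:hasGraphTarget", "_:data"),
    ("_:meta", "rdf:type", LOCAL_REF),
]
print(sorted(local_graph_references(ontology, data)))  # ['_:data', '_:meta']
```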

2.3. Ensuring completeness

Finally, in situations where completeness is not guaranteed (e.g., continuous streams of RDF data), a hash value may be provided over the canonicalized representation of a named graph, computed using RDF Dataset Canonicalization. This allows incoming data to be verified for completeness.

The data and metadata graphs are published together with their hash values:

_:data a ca:LocalGraphReference .
_:data ca:hash "..." .
_:data { ... } 

_:meta a ca:LocalGraphReference .
_:meta ca:hash "..." .
_:meta { ... } 
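
The completeness check can be sketched as below. Note the assumptions: this is a simplified stand-in and not RDF Dataset Canonicalization. It only sorts N-Quads-like lines and replaces the graph name with a fixed label, which is adequate solely when the graph contents contain no other blank nodes. The use of SHA-256, the hex encoding, and the function names are likewise illustrative choices that the specification does not prescribe.

```python
import hashlib

Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph name)


def simplified_graph_hash(quads: list[Quad], graph_name: str) -> str:
    """Hash the quads of one named graph in an order-independent way.

    NOT RDF Dataset Canonicalization: assumes no blank nodes occur inside the graph.
    """
    lines = sorted(
        f"{s} {p} {o} _:g ."  # fixed label stands in for the local graph name
        for s, p, o, g in quads
        if g == graph_name
    )
    payload = "".join(line + "\n" for line in lines)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_complete(quads: list[Quad], graph_name: str, expected_hash: str) -> bool:
    """Check received quads against the hash that was published for the graph."""
    return simplified_graph_hash(quads, graph_name) == expected_hash


full = [
    ("<http://people.org/Bob#me>", "<http://xmlns.com/foaf/0.1/name>", '"Bob"', "_:data"),
    ("<http://people.org/Bob#me>", "<http://xmlns.com/foaf/0.1/age>", '"27"', "_:data"),
]
published = simplified_graph_hash(full, "_:data")

print(is_complete(list(reversed(full)), "_:data", published))  # arrival order is irrelevant
print(is_complete(full[:1], "_:data", published))  # a missing quad is detected
```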

3. Interpreting the named graphs

To a processing client, the interpretation of RDF data that include context associations relies on the ability to understand references to named graphs in the data. To prevent ambiguity, the above publishing approach provides certain affordances to help the client.

Any client encountering a reference to a term that is defined as a ca:LocalGraphReference must interpret it as a reference to the named graph with that name. Syntactically, this covers the set of quads whose graph term is equal to the graph name. Semantically, the target graph is interpreted as subject to the contextual references that are associated with it.

The use of blank node identifiers minimizes the danger of misunderstanding the graph reference, as the graph name identifier cannot be misconstrued as a reference that leaves the local RDF scope, and should not be used for any purpose other than as a reference for its named graph.

The availability of content hashes helps ensure consistency even in cases such as streaming RDF where the scope of blank nodes is not well-defined.
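
The interpretation described in this section can be sketched as a lookup that stays within the local dataset. The sketch assumes quads modelled as 4-tuples of strings, with an empty string marking the default graph; `interpret_reference` is a hypothetical helper, not an API defined by this specification.

```python
Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph name)

DEFAULT_GRAPH = ""  # marker for the default graph in this sketch


def interpret_reference(quads: list[Quad], reference: str) -> tuple[list[Quad], list[Quad]]:
    """Split the local dataset into the target graph and the statements about it."""
    # Syntactic part: the quads whose graph term equals the referenced name.
    target = [q for q in quads if q[3] == reference]
    # Semantic part: statements elsewhere that mention the graph name.
    context = [q for q in quads if q[3] != reference and reference in (q[0], q[2])]
    return target, context


dataset = [
    ("<http://people.org/ruben>", "foaf:name", '"Ruben"', "_:data"),
    ("<http://people.org/ruben>", "foaf:age", "28", "_:data"),
    ("_:s", "sign:targetGraph", "_:data", DEFAULT_GRAPH),
    ("_:s", "sign:issuer", "<http://gov.org/issuers/14>", DEFAULT_GRAPH),
]

target, context = interpret_reference(dataset, "_:data")
print(len(target), len(context))  # 2 1
```

The lookup never dereferences the name; an empty target simply means the reference has no target in the local scope.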

4. Examples

To demonstrate this, we take a small example of a set of data that is annotated with metadata within the local RDF dataset.

Ruben has a set of data statements about himself issued by a government instance.

@prefix foaf: <http://xmlns.com/foaf/0.1/>.

<http://people.org/ruben> foaf:name "Ruben";
    foaf:age 28.

However, as Ruben wants to prove his age to an external actor, he requests a signed representation from the issuing instance, whose signature vocabulary defines its targetGraph predicate to have an rdfs:range of ca:LocalGraphReference.

Providing the following OWL definition

@prefix ca: <https://w3id.org/context-associations#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sign: <http://example.org/ns/sign#>. 
@prefix prov: <http://example.org/ns/prov#>. 

sign:targetGraph a rdf:Property;
    rdfs:range ca:LocalGraphReference;
    rdfs:comment "The graph target of the signature." .

prov:contentGraph a rdf:Property;
    rdfs:range ca:LocalGraphReference;
    rdfs:comment "The graph defining the contents retrieved from a source." .

and resulting data

@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix sign: <http://example.org/ns/sign#>. 
@prefix prov: <http://example.org/ns/prov#>. 
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. 
@prefix ca: <https://w3id.org/context-associations#>.  

_:data {
    <http://people.org/ruben> foaf:name "Ruben" ;
        foaf:age 28 .
}

_:retrieval prov:contentGraph _:data ;
    prov:source <http://gov.org/registry/14/> ;
    prov:actor <http://people.org/ruben> .

_:s a sign:Signature ;
    sign:targetGraph _:data ;
    sign:issuer <http://gov.org/issuers/14> ;
    sign:proofValue "..." .

# entailed
_:data a ca:LocalGraphReference.

Recursively, the user can now sign their own metadata graph, so that both the source data and the metadata can be validated by verifying the signatures of the different issuers.

@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix sign: <http://example.org/ns/sign#>. 
@prefix prov: <http://example.org/ns/prov#>. 
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. 
@prefix ca: <https://w3id.org/context-associations#>.  

_:data { ... }

_:meta {
    _:retrieval prov:contentGraph _:data ;
        prov:source <http://gov.org/registry/14/> ;
        prov:actor <http://people.org/ruben> .
}

_:meta_s a sign:Signature ;
    sign:targetGraph _:meta ;
    sign:issuer <http://people.org/ruben> ;
    sign:proofValue "..." .
 

_:s a sign:Signature ;
    sign:targetGraph _:data ;
    sign:issuer <http://gov.org/issuers/14> ;
    sign:proofValue "..." .

# entailed
_:data a ca:LocalGraphReference.
_:meta a ca:LocalGraphReference.

Appendix A: Considerations

This section discusses some considerations as to why certain decisions were made.

Why named graphs instead of reification / rdf-star / triple terms

The choice for named graphs is a pragmatic one: they are an existing RDF standard that should be supported in all RDF 1.1 compatible tooling, and they inherently support annotating sets of triples instead of individual triples. While reification and triple terms can be modeled as part of a collection or other entity that defines a selection of triples, they do not provide an inherent boundary within the RDF dataset, but rely fully on the documentation of the approach to indicate the intended boundary of that collection entity. Named graphs do not suffer from this problem, as they provide an inherent boundary between their contained triples and both the default graph and the other named graphs of an RDF dataset.

Additionally, both the paper "Evaluation of Metadata Representations in RDF stores" and unpublished work of our own found that named graphs provide performance competitive with other annotation methods.

Syntactic scope of the named graph

A referenced named graph can be interpreted in two ways in RDF: as the set of quads whose graph term equals the given graph name, or as the set of triples obtained by stripping the graph term from this set of quads.

To process a referenced graph as a set of triples, in a pass-by-value way, the named graph must be unpacked, for example using the GRAPH keyword in SPARQL to unpack a named graph into a graph of triples, or using graph terms in Notation3.

However, the core RDF specification does not provide such an unpacking mechanism. Therefore, in order to enforce working with the graph in a by-value approach, an approach such as SPARQL is required that allows both the use of the graph identifier and working with the unpacked triples at the same time.

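# The sign: namespace from the examples is assumed here for the issuer predicate.
PREFIX : <http://example.org/ns/sign#>

CONSTRUCT {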
    ?s ?p ?o.
    ?g :issuer ?issuer.
} WHERE {
    GRAPH ?g {
        ?s ?p ?o.
    }
    ?g :issuer ?issuer.
}

So unless a processing approach such as SPARQL can be enforced, the syntactic interpretation of named graphs must be constrained to its set of quads to retain consistency and functionality throughout the processing pipeline.
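
The difference between the two interpretations discussed in this section can be sketched as follows, assuming quads modelled as 4-tuples of strings; `by_reference` and `by_value` are hypothetical helper names chosen for illustration.

```python
Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph name)
Triple = tuple[str, str, str]


def by_reference(dataset: set[Quad], graph_name: str) -> set[Quad]:
    """The named graph as quads: the graph term is kept."""
    return {q for q in dataset if q[3] == graph_name}


def by_value(dataset: set[Quad], graph_name: str) -> set[Triple]:
    """The named graph as triples: the graph term is stripped."""
    return {(s, p, o) for s, p, o, g in dataset if g == graph_name}


# Two graphs with identical content but different names.
dataset = {
    ("<http://people.org/ruben>", "foaf:age", "28", "_:a"),
    ("<http://people.org/ruben>", "foaf:age", "28", "_:b"),
}

# As quads the graphs stay distinguishable, so context can still target one of them.
print(by_reference(dataset, "_:a") == by_reference(dataset, "_:b"))  # False
# As triples they collapse into the same value and the link to their context is lost.
print(by_value(dataset, "_:a") == by_value(dataset, "_:b"))  # True
```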

Semantic and syntactic interpretation of RDF named graphs

The semantics of named graphs have been discussed extensively in the semantic web community, and much of that discussion has been collected in a document published by the RDF working group, On Semantics of RDF Datasets.

The evaluation of entailment regimes over RDF graphs is closely tied to the unpacking of said graphs in the RDF dataset. Therefore, it is left out of scope for this specification.

Working with remote references

For practical purposes, we restrict the interpretation of graph references to local graphs. This goes for blank nodes, dereferenceable URIs and non-dereferenceable URIs.

The integration of remote graphs through dereferencing requires a more holistic approach that can resolve inconsistencies in the resulting dataset following a merge operation. A similar approach can be seen with the use of owl:imports in Notation3 reasoning.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[CARROLL2005NAMED]
Jeremy J. Carroll; et al. Named graphs. December 2005. URL: https://www.sciencedirect.com/science/article/pii/S1570826805000235