- ability to execute SPARQL queries against the in-memory data structures via the [SPARQL.ex] package or against any SPARQL endpoint via the [SPARQL.Client] package
The [RDF standard](http://www.w3.org/TR/rdf11-concepts/) defines a graph data model for distributed information on the web. An RDF graph is a set of statements, aka RDF triples, consisting of three nodes:
RDF.ex follows the RDF specs and supports [IRIs](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier), an internationalized generalization of URIs, permitting a wider range of Unicode characters. They are represented with the `RDF.IRI` structure and can be constructed either with `RDF.IRI.new/1` or `RDF.IRI.new!/1`. The latter additionally validates that the given IRI is a valid absolute IRI and raises an exception otherwise.
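A minimal sketch of both constructors, assuming the `rdf` package is available in your project. The example IRI and the invalid string are made up for illustration:

```elixir
# Builds an RDF.IRI struct without validating the given string
iri = RDF.IRI.new("http://www.example.com/foo")
IO.puts(to_string(iri))

# new!/1 additionally validates and raises when the string
# is not a valid absolute IRI
try do
  RDF.IRI.new!("not an absolute IRI")
rescue
  error -> IO.puts("rejected: " <> Exception.message(error))
end
```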
Besides being a little shorter than `RDF.IRI.new` and better `import`able, their usage will automatically benefit from any future IRI creation optimizations and is therefore recommended over the original functions.
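As an illustration, the shorter forms might look like this. The sketch assumes that the alias function `RDF.iri/1` and the `~I` sigil from `RDF.Sigils` are the shortcuts meant here:

```elixir
import RDF, only: [iri: 1]
import RDF.Sigils

# Alias function for RDF.IRI.new/1
iri("http://www.example.com/foo")

# The same IRI written with the ~I sigil
~I<http://www.example.com/foo>
```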
RDF.ex supports modules which represent RDF vocabularies as `RDF.Vocabulary.Namespace`s. It comes with predefined modules for some fundamental vocabularies defined in the `RDF.NS` module.
These `RDF.Vocabulary.Namespace`s (a special case of a `RDF.Namespace`) allow for something similar to QNames in XML: an atom or function qualified with a `RDF.Vocabulary.Namespace` can be resolved to an IRI.
`RDF.Vocabulary.Namespace` module and return the IRI directly, but since `RDF.iri` can also handle IRIs directly, you can safely and consistently use it with lowercased terms too.
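A short sketch of how terms resolve, using the predefined `RDF.NS.RDFS` namespace. Return values are left out on purpose, since their printed form depends on the installed version:

```elixir
import RDF, only: [iri: 1]
alias RDF.NS.RDFS

# A capitalized term is just an Elixir module atom; iri/1 resolves it to an IRI
iri(RDFS.Class)

# A lowercased term is a function call that returns the IRI directly
RDFS.subClassOf()

# iri/1 passes IRIs through, so it can be used with lowercased terms as well
iri(RDFS.subClassOf())
```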
As this example shows, the namespace modules can be easily `alias`ed. When required, they can also be aliased to a completely different name. Since the `RDF` vocabulary namespace in `RDF.NS.RDF` can't be aliased (it would clash with the top-level `RDF` module), all of its elements can be accessed directly from the `RDF` module (without an alias).
This way of expressing IRIs has the additional benefit that the existence of the referenced IRI is checked at compile time, i.e. whenever a term is used that is not part of the resp. vocabulary, an error is raised by the Elixir compiler (unless the vocabulary namespace is non-strict; see below).
For terms not adhering to the capitalization rules (lowercase properties, capitalized non-properties) or containing characters not allowed within atoms, the predefined namespaces in `RDF.NS` define aliases accordingly. If unsure, have a look at the documentation or their definitions.
The functions for the properties on a vocabulary namespace module are also available in a description builder variant, which accepts subject and objects as arguments.
If you want to state multiple statements with the same subject and predicate, you can either pass the objects as a list or as additional arguments, if there are not more than five of them:
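A sketch of the description builder variants, using `rdfs:label` from the predefined `RDF.NS.RDFS` namespace. The subject IRI and the label strings are illustrative only:

```elixir
import RDF.Sigils
alias RDF.NS.RDFS

subject = ~I<http://example.com/Foo>

# One statement: the property function called with subject and object
RDFS.label(subject, "Foo")

# Several objects as additional arguments (at most five) ...
RDFS.label(subject, "Foo", "Bar")

# ... or as a list
RDFS.label(subject, ["Foo", "Bar"])

# The subject comes first, so the builder can also be used in a pipe
subject |> RDFS.label("Foo")
```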
It's recommended to introduce a dedicated module for the defined namespaces. In this module you'll `use RDF.Vocabulary.Namespace` and define your vocabulary namespaces with the `defvocab` macro.
a `"#"`. Terms are checked for invalid characters at compile time, and any invalid term raises a compiler error. This handling of invalid characters can be modified with the `invalid_characters` option, which is set to `:fail` by default. Setting it to `:warn` produces only warnings, and `:ignore` turns the check off completely.
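A minimal sketch of such a namespace module. The module name `YourApp.NS`, the vocabulary name `EX`, the base IRI and the terms are placeholders:

```elixir
defmodule YourApp.NS do
  use RDF.Vocabulary.Namespace

  # Terms are listed explicitly; with :warn, invalid characters
  # produce compiler warnings instead of errors
  defvocab EX,
    base_iri: "http://www.example.com/ns/",
    terms: ~w[Foo bar],
    invalid_characters: :warn
end
```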
A vocabulary namespace with extracted terms can be defined either by providing RDF data directly with the `data` option or files with serialized RDF data in the `priv/vocabs` directory using the `file` option:
During compilation the terms will be validated and checked for proper capitalisation by analysing the schema description of the resp. resource in the given data.
This validation behaviour can be modified with the `case_violations` option, which is set to `:warn` by default. Setting it explicitly to `:fail` raises errors during compilation, and `:ignore` turns the check off.
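The following sketch combines the `file` and `case_violations` options. It assumes a hypothetical file `priv/vocabs/your_vocabulary.nt` exists in your project; without it the module will not compile:

```elixir
defmodule YourApp.Vocabs do
  use RDF.Vocabulary.Namespace

  # Terms are extracted from priv/vocabs/your_vocabulary.nt;
  # capitalization violations abort compilation instead of warning
  defvocab YourVocab,
    base_iri: "http://www.example.com/ns/",
    file: "your_vocabulary.nt",
    case_violations: :fail
end
```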
Though strictly discouraged, a vocabulary namespace can be defined as non-strict with the `strict` option set to `false`. A non-strict vocabulary doesn't require any terms to be defined (although they can). A term is resolved dynamically at runtime by concatenation of the term and the base IRI of the resp. namespace module:
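A sketch of a non-strict namespace. The module and vocabulary names are hypothetical, and the empty `terms` list is an assumption about how to declare a vocabulary without predefined terms:

```elixir
defmodule YourApp.NonStrict do
  use RDF.Vocabulary.Namespace

  defvocab Loose,
    base_iri: "http://www.example.com/ns/",
    terms: [],
    strict: false
end

# Any term is resolved at runtime by appending it to the base IRI
RDF.iri(YourApp.NonStrict.Loose.Anything)
YourApp.NonStrict.Loose.anything()
```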
They can be created with `RDF.BlankNode.new` or its alias function `RDF.bnode`. You can either pass an atom, string, integer or Erlang reference with a custom local identifier or call it without any arguments, which will create a local identifier automatically.
```elixir
RDF.bnode(:foo)
RDF.bnode(42)
RDF.bnode
```
You can also use the `~B` sigil to create a blank node with a custom name:
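For instance, assuming the sigils are imported from `RDF.Sigils`:

```elixir
import RDF.Sigils

# Blank node with the custom local identifier "foo"
~B<foo>
```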
Literals are used for values such as strings, numbers, and dates. They can be untyped, language-tagged or typed. In general they are created with the `RDF.Literal.new` constructor function or its alias function `RDF.literal`:
```elixir
RDF.Literal.new("foo")
RDF.literal("foo")
```
The actual value can be accessed via the `value` struct field:
```elixir
RDF.literal("foo").value
```
An untyped literal can also be created with the `~L` sigil:
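For instance, assuming the sigils are imported from `RDF.Sigils`:

```elixir
import RDF.Sigils

# Untyped literal with the value "foo"
~L"foo"
```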
A language-tagged literal can be created by providing the `language` option with a language tag conforming to [BCP47] or by adding the language as a modifier to the `~L` sigil:
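Both variants might look like this, with `"en"` as an example language tag:

```elixir
import RDF.Sigils

# Language given with the language option
RDF.literal("foo", language: "en")

# Language given as a modifier of the ~L sigil
~L"foo"en
```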
Note: Only languages without subtags are supported as modifiers of the `~L` sigil, i.e. if you want to use `en-US` as a language tag, you would have to use `RDF.literal` or `RDF.Literal.new`.
For all of these known datatypes the `value` struct field contains the native Elixir value representation according to this mapping. When a known XSD datatype is specified, the given value will be converted automatically if needed and possible.
For all of these supported XSD datatypes there are `RDF.Datatype`s available that allow the creation of `RDF.Literal`s with the respective datatype. Their `new` constructor function can also be called via the alias functions on the top-level `RDF` namespace.
If you want to prohibit the creation of invalid literals, you can use the `new!` constructor function of `RDF.Datatype` or `RDF.Literal`, which will fail in case of invalid values.
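A sketch for `xsd:integer`. It assumes the library version described here, in which the datatype module is named `RDF.Integer` and `RDF.integer/1` is its alias function; newer releases may organize the datatypes differently:

```elixir
alias RDF.NS.XSD

# Explicit datatype; the given string is converted to the integer 42
RDF.literal("42", datatype: XSD.integer())

# Alias function for the constructor of the integer datatype
RDF.integer(42)

# new!/1 raises instead of returning an invalid literal
try do
  RDF.Integer.new!("not a number")
rescue
  error -> IO.puts("rejected: " <> Exception.message(error))
end
```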
An RDF literal is bound to the lexical form of the initially given value. This lexical representation can be retrieved with the `RDF.Literal.lexical/1` function:
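A small sketch, again assuming the `RDF.integer/1` alias function of the version described here:

```elixir
literal = RDF.integer(42)

# Returns the lexical representation of the literal as a string
RDF.Literal.lexical(literal)
```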
Note: Although you can create any XSD datatype by using the resp. IRI with the `datatype` option of `RDF.Literal.new`, not all of them support the validation and conversion behaviour of `RDF.Literal`s and the `value` field simply contains the initially given value unvalidated and unconverted.
The `RDF.Triple` and `RDF.Quad` modules both provide a function `new` for such tuples, which coerces the elements to proper nodes when possible or raises an error when such a coercion is not possible. In particular these functions also resolve qualified terms from a vocabulary namespace. They can also be called with the alias functions `RDF.triple` and `RDF.quad`.
If you want to explicitly create a quad in the default graph context, you can use `nil` as the graph name. The `nil` value is used consistently as the name of the default graph within RDF.ex.
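A sketch of these constructors, using made-up example IRIs and terms from the predefined `RDF.NS.RDFS` namespace:

```elixir
import RDF.Sigils
alias RDF.NS.RDFS

subject = ~I<http://example.com/S>

# Plain Elixir values in object position are coerced to literals
RDF.triple(subject, RDFS.label(), "foo")

# Qualified terms such as RDFS.Class are resolved to IRIs
RDF.triple({subject, RDF.type(), RDFS.Class})

# A quad in a named graph
RDF.quad(subject, RDFS.label(), "foo", ~I<http://example.com/Graph>)

# A quad explicitly placed in the default graph
RDF.quad(subject, RDFS.label(), "foo", nil)
```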
RDF.ex provides various data structures for collections of statements:
- `RDF.Description`: a collection of triples about the same subject
- `RDF.Graph`: a named collection of statements
- `RDF.Dataset`: a named collection of graphs, i.e. a collection of statements from different graphs; it may have multiple named graphs and at most one unnamed ("default") graph
All of these structures have similar sets of functions and implement Elixir's `Enumerable` and `Collectable` protocols, Elixir's `Access` behaviour and the `RDF.Data` protocol of RDF.ex.
The `new` function of each of these data structures creates a new instance of the struct and optionally initializes it with initial statements. `RDF.Description.new` requires at least an IRI or blank node for the subject, while `RDF.Graph.new` and `RDF.Dataset.new` take an optional IRI for the name of the graph or dataset.
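A sketch of creating empty structures. `EX` stands for a hypothetical vocabulary namespace of your application, as in the other examples:

```elixir
empty_description = RDF.Description.new(EX.Subject)

empty_unnamed_graph = RDF.Graph.new()
empty_named_graph = RDF.Graph.new(EX.Graph)

empty_unnamed_dataset = RDF.Dataset.new()
empty_named_dataset = RDF.Dataset.new(EX.Dataset)
```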
As you can see, qualified terms from a vocabulary namespace can be given instead of an IRI and will be resolved automatically. This applies to all of the functions discussed below.
The `new` functions also take optional initial data, which can be provided in various forms. Basically it takes the given data and hands it to the `add` function with the newly created struct.
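Some of the forms in which a single initial statement might be given, again with the hypothetical `EX` namespace:

```elixir
# Subject, predicate and object as separate arguments
RDF.Description.new(EX.S, EX.p, EX.O)

# Subject plus a predicate-object tuple
RDF.Description.new(EX.S, {EX.p, EX.O})

# A triple for a graph, optionally after the graph name
RDF.Graph.new({EX.S, EX.p, EX.O})
RDF.Graph.new(EX.Graph, {EX.S, EX.p, EX.O})

# A quad for a dataset
RDF.Dataset.new({EX.S, EX.p, EX.O, EX.Graph})
```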
In general, the object position of a statement can be a list of values, which will be interpreted as multiple statements with the same subject and predicate. So two statements that differ only in their object can be written more concisely:
```elixir
RDF.Description.new(EX.S, {EX.p, [EX.O1, EX.O2]})
```
Multiple statements with different subject and/or predicate can be given as a list of statements, where everything said before on single statements applies to the individual statements of these lists:
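A sketch with lists of statements, using the hypothetical `EX` namespace:

```elixir
# Predicate-object pairs for a description
RDF.Description.new(EX.S, [{EX.p1, EX.O1}, {EX.p2, EX.O2}])

# Triples for a graph; object lists work inside the list as well
RDF.Graph.new([
  {EX.S1, EX.p1, EX.O1},
  {EX.S2, EX.p2, [EX.O2, EX.O3]}
])

# Quads for a dataset
RDF.Dataset.new([
  {EX.S1, EX.p1, EX.O1, EX.Graph1},
  {EX.S2, EX.p2, EX.O2, EX.Graph2}
])
```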
A `RDF.Description` can be added to any of the three data structures:
```elixir
input = RDF.Description.new(EX.S, {EX.p, EX.O1})
description |> RDF.Description.add(input)
graph |> RDF.Graph.add(input)
dataset |> RDF.Dataset.add(input)
```
Note that, unlike mismatches in the subjects of directly given statements, `RDF.Description.add` ignores the subject of a given `RDF.Description` and just adds the property-value pairs of the given description, because this is a common use case when merging the descriptions of differently named resources (eg. because they are linked via `owl:sameAs`).
`RDF.Graph.add` and `RDF.Dataset.add` can also add other graphs and `RDF.Dataset.add` can add the contents of another dataset.
`RDF.Dataset.add` is also special in that it allows you to overwrite the explicit or implicit graph context of the input data and redirect the input into another graph. For example, the following calls all add the given statements to the `EX.Other` graph:
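A sketch of such calls. It assumes that the target graph is passed as a third positional argument, which is how the version described here is understood to work; check the `RDF.Dataset.add` documentation of your installed version:

```elixir
dataset = RDF.Dataset.new()

# A triple without graph context
RDF.Dataset.add(dataset, {EX.S, EX.p, EX.O}, EX.Other)

# A quad explicitly placed in the default graph
RDF.Dataset.add(dataset, {EX.S, EX.p, EX.O, nil}, EX.Other)

# A quad with a different graph context
RDF.Dataset.add(dataset, {EX.S, EX.p, EX.O, EX.Graph}, EX.Other)

# A named graph
RDF.Dataset.add(dataset, RDF.Graph.new(EX.Graph, {EX.S, EX.p, EX.O}), EX.Other)
```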
The `add` function always returns the same type of data structure as the one to which the data is added. This possibly means ignoring some input statements (e.g. when the subject of a statement doesn't match the description subject) or reinterpreting parts of the input (e.g. ignoring the subject of another description). In contrast, the `merge` function of the `RDF.Data` protocol, implemented by all three data structures, always adds all of the input and possibly returns another type of data structure. For example, merging two `RDF.Description`s with different subjects results in an `RDF.Graph`, and merging a quad into an `RDF.Graph` whose name differs from the quad's graph context results in an `RDF.Dataset`.
```elixir
RDF.Description.new(EX.S1, {EX.p, EX.O})
|> RDF.Data.merge(RDF.Description.new(EX.S2, {EX.p, EX.O})) # returns an unnamed RDF.Graph
|> RDF.Data.merge(RDF.Graph.new(EX.Graph, {EX.S2, EX.p, EX.O2})) # returns a RDF.Dataset
```
Statements added with `put` overwrite all existing statements with the same subject and predicate.
It is available on all three data structures and can handle the same input data types as its `add` counterpart.
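A small sketch of the difference to `add`, with the hypothetical `EX` namespace:

```elixir
graph = RDF.Graph.new({EX.S, EX.p, EX.O1})

# add keeps the existing statement and adds a second one
RDF.Graph.add(graph, {EX.S, EX.p, EX.O2})

# put replaces all existing statements with the same subject and predicate
RDF.Graph.put(graph, {EX.S, EX.p, EX.O2})
```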
#### Accessing the content of RDF data structures
All three RDF data structures implement the `Enumerable` protocol over the set of contained statements: as a set of triples in the case of `RDF.Description` and `RDF.Graph`, and as a set of quads in the case of `RDF.Dataset`. This means you can use all `Enum` functions over the contained statements as tuples.
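For example, a graph can be traversed with the usual `Enum` functions. The statements use the hypothetical `EX` namespace:

```elixir
graph =
  RDF.Graph.new([
    {EX.S1, EX.p, EX.O1},
    {EX.S2, EX.p, EX.O2}
  ])

# Number of triples in the graph
Enum.count(graph)

# Each element is a triple tuple of RDF terms
Enum.map(graph, fn {_subject, _predicate, object} -> object end)
```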
The `RDF.Data` protocol offers various functions to access the contents of RDF data structures:
- `RDF.Data.subjects/1` returns the set of all subject resources.
- `RDF.Data.predicates/1` returns the set of all used properties.
- `RDF.Data.objects/1` returns the set of all resources on the object position of statements. Note: Literals are not included.
- `RDF.Data.resources/1` returns the set of all used resources at any position in the contained RDF statements.
- `RDF.Data.description/2` returns all statements from a data structure about the given resource as a `RDF.Description`. It will be empty if no such statements exist. On a `RDF.Dataset` it will aggregate the statements about the resource from all graphs.
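The functions listed above can be sketched on a small graph, using the hypothetical `EX` namespace:

```elixir
graph =
  RDF.Graph.new([
    {EX.S1, EX.p1, EX.O1},
    {EX.S2, EX.p2, "a literal"}
  ])

RDF.Data.subjects(graph)
RDF.Data.predicates(graph)

# Literals such as "a literal" are not part of the result
RDF.Data.objects(graph)

RDF.Data.resources(graph)

# All statements about EX.S1 as an RDF.Description
RDF.Data.description(graph, EX.S1)
```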
Since all three RDF data structures implement the `Access` behaviour, you can also use `data[key]` syntax, which basically just calls the resp. `get` function.
Also, the familiar `fetch` function of the `Access` behaviour, as a variant of `get` which returns `ok` tuples, is available on all RDF data structures.
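A sketch of the three access styles on a description and a graph, with the hypothetical `EX` namespace:

```elixir
description = RDF.Description.new(EX.S, {EX.p, EX.O})
graph = RDF.Graph.new({EX.S, EX.p, EX.O})

# get and the equivalent Access syntax
RDF.Description.get(description, EX.p)
description[EX.p]

RDF.Graph.get(graph, EX.S)
graph[EX.S]

# fetch wraps a found value in an ok tuple
RDF.Graph.fetch(graph, EX.S)
```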
#### Querying graphs with the SPARQL query language
The [SPARQL.ex] package allows you to execute SPARQL queries against RDF.ex data structures. It's still very limited at the moment: it only supports `SELECT` queries with basic graph pattern matching, filtering and projection, and works on `RDF.Graph`s only. But even in this early, limited form it allows you to express more powerful queries in a simpler way than with the plain `RDF.Graph` API.
See the [SPARQL.ex README](https://github.com/marcelotto/sparql-ex#sparqlex) for more information and some examples.
Statements can be deleted in two slightly different ways. One way is to use the `delete` function of the resp. data structure. It accepts all the ways of specifying collections of statements supported by the resp. `add` counterpart and removes the matching statements.
Another way to delete statements is the `delete` function of the `RDF.Data` protocol. The only difference to `delete` functions on the data structures directly is how it handles the deletion of a `RDF.Description` from another `RDF.Description` or `RDF.Graph` from another `RDF.Graph`. While the dedicated RDF data structure function ignores the description subject or graph name and removes the statements even when they don't match, `RDF.Data.delete` only deletes when the description’s subject resp. graph name matches.
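The difference between the two ways can be sketched as follows, with the hypothetical `EX` namespace:

```elixir
graph = RDF.Graph.new([{EX.S, EX.p, EX.O1}, {EX.S, EX.p, EX.O2}])

# Removes the given triple from the graph
RDF.Graph.delete(graph, {EX.S, EX.p, EX.O1})

description = RDF.Description.new(EX.S, {EX.p, EX.O})
other = RDF.Description.new(EX.Other, {EX.p, EX.O})

# Ignores the subject of `other` and removes the matching statement
RDF.Description.delete(description, other)

# Removes nothing here, because the subjects don't match
RDF.Data.delete(description, other)
```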
An `RDF.List` for an existing list in a given graph can be created with `RDF.List.new` or its alias `RDF.list`, passing it the head node of the list and the graph containing the statements constituting the list.
An entirely new `RDF.List` can be created with `RDF.List.from` or `RDF.list` and a native Elixir list or an Elixir `Enumerable` with values of all types that are allowed for objects of statements (including nested lists).
```elixir
list = RDF.list(["foo", EX.bar, ~B<bar>, [1, 2, 3]])
```
All structures of RDF terms also support a `values` function. The `values` functions on `RDF.Triple`, `RDF.Quad` and `RDF.Statement` convert a tuple of RDF terms to a tuple of the resp. Elixir values. On all of the other RDF data structures (`RDF.Description`, `RDF.Graph` and `RDF.Dataset`) and the general `RDF.Data` protocol, the `values` functions produce a map of the converted Elixir values.
All of these `values` functions also support an optional second argument for a function with a custom mapping of the terms depending on their statement position. The function will be called with a tuple `{statement_position, rdf_term}` where `statement_position` is one of the atoms `:subject`, `:predicate`, `:object` or `:graph_name`, while `rdf_term` is the RDF term to be mapped.
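A sketch of both variants on a triple. The IRIs are made up, and the custom mapping, which turns the last path segment of the predicate into a string key, is only one illustrative choice:

```elixir
import RDF.Sigils

triple = {~I<http://example.com/S>, ~I<http://example.com/p>, RDF.literal(42)}

# Default mapping of the terms to Elixir values
RDF.Triple.values(triple)

# Custom mapping depending on the statement position
RDF.Triple.values(triple, fn
  {:predicate, predicate} ->
    predicate |> to_string() |> String.split("/") |> List.last()

  {_position, term} ->
    RDF.Term.value(term)
end)
```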
RDF graphs and datasets can be read from and written to files or strings in an RDF serialization format using the `read_file`, `read_string`, `write_file` and `write_string` functions of the resp. `RDF.Serialization.Format` module.
All of these `read_*` and `write_*` functions are also available in the top-level `RDF` module, where the serialization format can be specified in various ways, either by providing the format name via the `format` option, or via the `media_type` option.
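A sketch of reading and writing strings. It assumes that the Turtle and N-Triples formats are available as `RDF.Turtle` and `RDF.NTriples` and that the functions return ok tuples:

```elixir
turtle = """
<http://example.com/S> <http://example.com/p> "foo" .
"""

# Functions of the format modules
{:ok, graph} = RDF.Turtle.read_string(turtle)
{:ok, ntriples} = RDF.NTriples.write_string(graph)
IO.puts(ntriples)

# Top-level functions with the format given by name or media type
{:ok, _graph} = RDF.read_string(turtle, format: :turtle)
{:ok, _string} = RDF.write_string(graph, media_type: "application/n-triples")
```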
For serialization formats which support it, you can provide a base IRI on the read functions with the `base` option. You can also provide a default base IRI in your application configuration, which will be used when no `base` option is given.
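A sketch of the `base` option with relative IRIs in a Turtle string. The configuration key shown in the comment, `default_base_iri`, is an assumption; check the documentation of your installed version:

```elixir
# Relative IRIs are resolved against the given base IRI
{:ok, graph} =
  RDF.Turtle.read_string(~s(<S> <p> "foo" .), base: "http://example.com/")

IO.inspect(RDF.Data.subjects(graph))

# In config/config.exs (assumed key name):
#   config :rdf, default_base_iri: "http://my-app.example/"
```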
The `Date` and `DateTime` modules of Elixir versions < 1.7.2 don't handle negative years properly. In case your data contains negative years in `xsd:date` or `xsd:dateTime` literals, you'll have to upgrade to a newer Elixir version.
There's still much to do for a complete RDF ecosystem for Elixir, which means there are plenty of opportunities for you to contribute. Here are some suggestions: