class SPARQL::Algebra::Operator::Dataset

The SPARQL GraphPattern `dataset` operator.

Instantiated with two operands. The first is an array of data source URIs, each either bare, indicating a default data source, or expressed as an array `[:named, <uri>]`, indicating a named data source. The second is the query to run against the resulting dataset.
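As a minimal sketch of how the first operand can be read, the following splits a list of sources into default and named entries. The helper `partition_sources` is hypothetical (it is not part of the sparql gem), and plain strings stand in for URI objects.

```ruby
# Hypothetical helper, not part of the sparql gem: split a dataset operand
# into default and named data sources. Plain strings stand in for URIs.
def partition_sources(sources)
  # Named entries are arrays of the form [:named, <uri>]; bare entries are defaults.
  named, default = sources.partition { |src| src.is_a?(Array) }
  # Only the URI part of a named entry is kept.
  { default: default, named: named.map(&:last) }
end

sources = ["data-g1.ttl", [:named, "data-g2.ttl"]]
result = partition_sources(sources)
```

Here `result[:default]` holds the bare entries and `result[:named]` holds the URIs taken from the `[:named, <uri>]` pairs.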

This operator loads each data source, unless a graph named by the data source URI already exists in the repository.

The contained BGP queries are then performed against the specified default and named graphs. Rather than using the actual default graph of the dataset, queries against the default dataset are run against named graphs matching a non-distinguished variable, and the results are filtered to the URIs included in the default dataset.

Specifically, each BGP which is not part of a graph pattern is replaced with a union of graph patterns with that BGP repeated for each graph URI in the default dataset. This requires recursively updating the operator.

Each graph pattern with a variable graph name is wrapped in a filter on that variable, so that the variable can match only the named data sources specified.
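The two rewriting rules above can be sketched on a toy representation. This is an illustration only, not the gem's implementation: nested arrays stand in for SSE expressions, symbols starting with `?` stand in for variables, strings stand in for graph URIs, and the `rewrite` function is hypothetical. It covers only BGPs outside graph patterns and graph patterns with a variable name.

```ruby
# Illustrative only: nested arrays stand in for SSE expressions.
def rewrite(expr, defaults, named, in_graph = false)
  return expr unless expr.is_a?(Array)
  op, *args = expr
  case op
  when :bgp
    return expr if in_graph
    # BGP outside any graph pattern: repeat it once per default graph URI.
    patterns = defaults.map { |uri| [:graph, uri, expr] }
    return [:bgp] if patterns.empty? # no default graphs: empty BGP
    patterns.reduce { |left, right| [:union, left, right] }
  when :graph
    name, body = args
    inner = [:graph, name, rewrite(body, defaults, named, true)]
    # Graph patterns with a constant name are left untouched in this sketch.
    return inner unless name.is_a?(Symbol) && name.to_s.start_with?("?")
    return [:bgp] if named.empty? # no named graphs: empty BGP
    # Restrict the graph variable to the named data sources.
    tests = named.map { |uri| [:"=", name, uri] }
    [:filter, tests.reduce { |left, right| [:"||", left, right] }, inner]
  else
    # Any other operator: descend into its operands.
    [op, *args.map { |arg| rewrite(arg, defaults, named, in_graph) }]
  end
end

query = [:union,
         [:bgp, [:triple, :"?s", :"?p", :"?o"]],
         [:graph, :"?g", [:bgp, [:triple, :"?s", :"?p", :"?o"]]]]
rewritten = rewrite(query, ["data-g1.ttl"], ["data-g2.ttl"])
```

With one default and one named source, `rewritten` pairs a `:graph` pattern over `data-g1.ttl` with a `:filter` that restricts `?g` to `data-g2.ttl`.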

@example Dataset with one default and one named data source

(prefix ((: <http://example/>))
  (dataset (<data-g1.ttl> (named <data-g2.ttl>))
    (union
      (bgp (triple ?s ?p ?o))
      (graph ?g (bgp (triple ?s ?p ?o))))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (union
    (graph <data-g1.ttl> (bgp (triple ?s ?p ?o)))
    (filter (= ?g <data-g2.ttl>)
      (graph ?g (bgp (triple ?s ?p ?o))))))

If no default graphs or no named graphs are specified, the corresponding queries are eliminated.

@example Dataset with one default and no named data sources

(prefix ((: <http://example/>))
  (dataset (<data-g1.ttl>)
    (union
      (bgp (triple ?s ?p ?o))
      (graph ?g (bgp (triple ?s ?p ?o))))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (union
    (graph <data-g1.ttl> (bgp (triple ?s ?p ?o)))
    (bgp)))

With multiple default graphs, the result is the union of a graph query on each default data source.

@example Dataset with two default data sources

(prefix ((: <http://example/>))
  (dataset (<data-g1.ttl> <data-g2.ttl>)
    (bgp (triple ?s ?p ?o))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (union
    (graph <data-g1.ttl> (bgp (triple ?s ?p ?o)))
    (graph <data-g2.ttl> (bgp (triple ?s ?p ?o)))))

Multiple named graphs place a filter on all variables used to identify those named graphs so that they are restricted to come only from the specified set. Note that this requires descending through expressions to find graph patterns using variables and placing a filter on each identified variable.

@example Dataset with two named data sources

(prefix ((: <http://example/>))
  (dataset ((named <data-g1.ttl>) (named <data-g2.ttl>))
    (graph ?g (bgp (triple ?s ?p ?o)))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (filter (|| (= ?g <data-g1.ttl>) (= ?g <data-g2.ttl>))
    (graph ?g (bgp (triple ?s ?p ?o)))))
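The descent through expressions mentioned above can be sketched as a small recursive search for graph variables. This is an assumption-laden illustration, not the gem's code: nested arrays stand in for SSE expressions, symbols starting with `?` stand in for variables, and `graph_variables` is a hypothetical helper.

```ruby
# Illustrative only: collect every variable used as a graph name.
def graph_variables(expr)
  return [] unless expr.is_a?(Array)
  found = []
  op, name = expr
  # A graph pattern with a variable name contributes that variable.
  found << name if op == :graph && name.is_a?(Symbol) && name.to_s.start_with?("?")
  # Descend into every operand, since graph patterns can be nested anywhere.
  expr.each { |operand| found.concat(graph_variables(operand)) }
  found.uniq
end

query = [:join,
         [:graph, :"?g", [:bgp, [:triple, :"?s", :"?p", :"?o"]]],
         [:union,
          [:graph, "data-g1.ttl", [:bgp, [:triple, :"?s", :"?p", :"?o"]]],
          [:graph, :"?h", [:bgp, [:triple, :"?o", :"?q", :"?r"]]]]]
variables = graph_variables(query)
```

Each variable in `variables` would then receive its own filter restricting it to the named data sources; graph patterns with a constant URI contribute nothing.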

@see www.w3.org/TR/sparql11-query/#specifyingDataset

Constants

NAME

Public Instance Methods

execute(queryable, **options, &base)

Executes this query on the given `queryable` graph or repository. Reads the specified data sources into `queryable`. Named data sources are added with the data source URI as the graph name.

Datasets are specified in operand(0), which is an array of default or named graph URIs.
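The load-unless-present behaviour can be sketched without any RDF library. In this sketch a plain Hash (graph name to statements) stands in for the repository and a lambda stands in for fetching and parsing a data source; `load_sources`, `store`, and `loader` are all hypothetical names.

```ruby
# Hypothetical stand-in for a repository: graph name => array of statements.
def load_sources(store, sources, loader)
  loaded = []
  sources.each do |src|
    # Named entries have the form [:named, <uri>]; only the URI is needed.
    uri = src.is_a?(Array) ? src.last : src
    next if store.key?(uri)       # graph already present: skip the load
    store[uri] = loader.call(uri) # the graph name is the data source URI
    loaded << uri
  end
  loaded
end

store = { "data-g1.ttl" => [%w[s1 p1 o1]] }
loader = ->(uri) { [["s", "p", "loaded from #{uri}"]] }
loaded = load_sources(store, ["data-g1.ttl", [:named, "data-g2.ttl"]], loader)
```

Only `data-g2.ttl` is loaded here, because a graph named `data-g1.ttl` already exists in `store`.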

@param [RDF::Queryable] queryable

the graph or repository to query

@param [Hash{Symbol => Object}] options

any additional keyword options

@yield [solution]

each matching solution

@yieldparam [RDF::Query::Solution] solution

@yieldreturn [void] ignored

@return [RDF::Query::Solutions]

the resulting solution sequence

@see www.w3.org/TR/sparql11-query/#sparqlAlgebra

# File lib/sparql/algebra/operator/dataset.rb, line 129
def execute(queryable, **options, &base)
  debug(options) {"Dataset"}
  default_datasets = []
  named_datasets = []
  operand(0).each do |uri|
    case uri
    when Array
      # Format is (named <uri>), only need the URI part
      uri = uri.last
      debug(options) {"=> named data source #{uri}"}
      named_datasets << uri
    else
      debug(options) {"=> default data source #{uri}"}
      default_datasets << uri
    end
    load_opts = {logger: options.fetch(:logger, false), graph_name: uri, base_uri: uri}
    unless queryable.has_graph?(uri)
      debug(options) {"=> load #{uri}"}
      queryable.load(uri.to_s, **load_opts)
    end
  end
  debug(options) {
    require 'rdf/nquads'
    queryable.dump(:nquads)
  }

  # Create an aggregate based on queryable having just the bits we want
  aggregate = RDF::AggregateRepo.new(queryable)
  named_datasets.each {|name| aggregate.named(name) if queryable.has_graph?(name)}
  aggregate.default(*default_datasets.select {|name| queryable.has_graph?(name)})
  aggregate.query(operands.last, depth: options[:depth].to_i + 1, **options, &base)
end
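The final step, which builds an aggregate exposing only the requested graphs, can be modelled roughly as follows. This is a simplified sketch of the idea and not how `RDF::AggregateRepo` is implemented: a Hash stands in for the repository, and `aggregate_view` is a hypothetical helper.

```ruby
# Hypothetical model of the aggregate view: `store` maps graph names to
# statements. Only graphs that actually exist in the store are exposed.
def aggregate_view(store, default_datasets, named_datasets)
  present_defaults = default_datasets.select { |name| store.key?(name) }
  present_named = named_datasets.select { |name| store.key?(name) }
  {
    # All default sources are merged into a single default graph.
    default: present_defaults.flat_map { |name| store[name] }.uniq,
    # Named sources stay separate, keyed by graph name.
    named: present_named.to_h { |name| [name, store[name]] }
  }
end

store = {
  "data-g1.ttl" => [%w[a p 1]],
  "data-g2.ttl" => [%w[b p 2]],
  "data-g3.ttl" => [%w[c p 3]]
}
view = aggregate_view(store, ["data-g1.ttl", "data-g2.ttl"], ["data-g3.ttl", "missing.ttl"])
```

In this model, `view[:default]` contains the statements of both default sources, and `view[:named]` contains only `data-g3.ttl`, since `missing.ttl` is not in the store.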