class Aspire::Caching::Builder

Caches Aspire API objects and their references

Attributes

cache[RW]

@return [Aspire::Caching::Cache] the Aspire cache

Public Class Methods

new(cache = nil)

Initialises a new Builder instance
@param cache [Aspire::Caching::Cache] the Aspire cache
@return [void]

# File lib/aspire/caching/builder.rb, line 26
def initialize(cache = nil)
  self.cache = cache
end

Public Instance Methods

build(enumerator, clear: false)

Builds a cache of Aspire lists from the Aspire All Lists report
@param enumerator [Aspire::Enumerator::ReportEnumerator] the Aspire
  All Lists report enumerator
@param clear [Boolean] if true, clear the cache before building
@return [Integer] the number of lists cached

# File lib/aspire/caching/builder.rb, line 35
def build(enumerator, clear: false)
  # Empty the cache if required
  cache.clear if clear
  # Cache the enumerated lists
  # - call with reload: false so that existing cache entries are ignored
  #   to speed up processing
  lists = 0
  time = Benchmark.measure do
    enumerator.each do |row|
      write_list(row['List Link'], reload: false)
      lists += 1
    end
  end
  # Log completion
  cache.logger.info("#{lists} lists cached in #{duration(time)}")
  # Return the number of lists cached, as documented by the @return tag
  lists
end
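
The counting-and-timing loop in build can be tried in isolation. The following is a minimal sketch rather than the library's code: FakeBuilder, its written array and the in-memory rows array are hypothetical stand-ins for the real builder, the cache and the All Lists report enumerator, and the log line is printed instead of being sent to the cache logger.

```ruby
require 'benchmark'

# Hypothetical stand-in for the builder: records the list URLs it is asked
# to cache instead of calling the Aspire API
class FakeBuilder
  attr_reader :written

  def initialize
    @written = []
  end

  # Same shape as the build loop: count the rows while timing the whole loop
  def build(enumerator)
    lists = 0
    time = Benchmark.measure do
      enumerator.each do |row|
        @written << row['List Link']
        lists += 1
      end
    end
    puts format('%d lists cached in %.3fs', lists, time.real)
    lists
  end
end

# Any object responding to #each can play the enumerator in this sketch
rows = [
  { 'List Link' => 'https://example.org/lists/A' },
  { 'List Link' => 'https://example.org/lists/B' }
]
builder = FakeBuilder.new
count = builder.build(rows)
```

Only the 'List Link' column of each row is used, so any enumerable of hashes with that key is enough to exercise the loop.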

resume(enumerator)

Resumes an interrupted build
@param enumerator [Aspire::Enumerator::ReportEnumerator] the Aspire
  All Lists report enumerator

# File lib/aspire/caching/builder.rb, line 55
def resume(enumerator)
  # Log activity
  cache.logger.info('Resuming previous build')
  # Reload any list marked as in-progress
  reload_marked_lists
  # Resume the build
  build(enumerator, clear: false)
end

write(url = nil, data = nil, list: nil, reload: true, urls: {})

Caches an Aspire linked data API object.

Use write(url) to build a cache for the first time.
Use write(url, reload: true) to reload parts of the cache.

@param url [String, Aspire::Caching::CacheEntry] the URL or cache entry
  of the API object
@param data [Hash, nil] the parsed JSON data to be written to the cache;
  if omitted, this is read from the API
@param list [Aspire::Caching::CacheEntry] the parent list cache entry;
  if present, this implies that references to other lists are ignored
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@param urls [Hash] the set of URLs handled in the current operation
@return [void]

# File lib/aspire/caching/builder.rb, line 77
def write(url = nil, data = nil, list: nil, reload: true, urls: {})
  #
  # Parsed data from the Linked Data API has the following structure:
  # { url => {primary-object},
  #   related-url1 => {related-object1}, ... }
  # where url => {primary-object} is the object referenced by the url
  # parameter, and the related URLs/objects are objects referenced by
  # the primary object and included in the API response.
  #
  # The primary and related objects are written to the cache before any
  # object references within the primary and related objects are followed.
  # This should reduce unnecessary duplication of API calls.
  #
  # Some objects with a linked data URL are not accessible through that
  # API (e.g. users /users/<user-id> are not accessible, but user notes
  # /users/<user-id>/notes<note-id> are accessible).
  #
  # Some objects with a linked data URL are accessible through the API but
  # do not return JSON-LD (e.g. events /events/<event-id> return regular
  # JSON rather than JSON-LD). These objects are cached but no attempt is
  # made to follow LD references within them.
  #
  # byebug if url.is_a?(String) && url.include?('34C1190E-F50E-35CB-94C9-F476963D69C0')
  # byebug if url.is_a?(Aspire::Caching::CacheEntry) && url.url.include?('34C1190E-F50E-35CB-94C9-F476963D69C0')
  entry = cache_entry(url, list)
  return unless entry && write?(entry, urls, list, reload)
  write_data(entry, urls, data, list, reload)
rescue NotCacheable
  # cache.logger.debug("#{url} not cacheable")
rescue StandardError => e
  # Log the error and continue processing
  Raven.capture_exception(e)
  # cache.logger.error("#{e}\n#{e.backtrace.join('\n')}")
  cache.logger.error(e.to_s)
rescue Exception => e
  # Log the error and fail
  Raven.capture_exception(e)
  # cache.logger.fatal("#{e}\n#{e.backtrace.join('\n')}")
  cache.logger.fatal(e.to_s)
  raise e
end
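
The rescue clauses in write separate three outcomes: non-cacheable objects are skipped silently, ordinary errors are logged and swallowed so that a long build carries on, and any other exception is logged and re-raised. The sketch below reproduces only that ordering. The attempt helper, the FatalProblem class and the log array are hypothetical, and NotCacheable is redefined locally on the assumption that it is a StandardError subclass.

```ruby
# Local stand-in for the library's NotCacheable error (assumed to be a
# StandardError subclass, which is why its rescue clause comes first)
class NotCacheable < StandardError; end

# Hypothetical error outside the StandardError hierarchy
class FatalProblem < Exception; end

# Hypothetical helper with the same rescue order as #write
def attempt(log)
  yield
rescue NotCacheable
  # Skipped silently
  nil
rescue StandardError => e
  # Logged, then processing continues
  log << "error: #{e.message}"
  nil
rescue Exception => e
  # Logged, then re-raised so that the caller stops
  log << "fatal: #{e.message}"
  raise
end

log = []
attempt(log) { raise NotCacheable }
attempt(log) { raise ArgumentError, 'bad URL' }
fatal_raised =
  begin
    attempt(log) { raise FatalProblem, 'stop' }
    false
  rescue FatalProblem
    true
  end
```

Because Ruby tries rescue clauses from top to bottom, the most specific class has to be listed first; swapping the first two clauses would log every non-cacheable object as an error.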

write_list(url = nil, data = nil, reload: true)

Caches an Aspire linked data API list object and ignores any references to other lists
@param url [String, Aspire::Caching::CacheEntry] the URL or cache entry
  of the API list object
@param data [Hash, nil] the parsed JSON data to be written to the cache;
  if omitted, this is read from the API
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@return [void]

# File lib/aspire/caching/builder.rb, line 128
def write_list(url = nil, data = nil, reload: true)
  entry = cache_entry(url)
  raise ArgumentError, 'List expected' unless entry.list?
  write(entry, data, list: entry, reload: reload)
rescue NotCacheable
  # cache.logger.debug("#{url} not cacheable")
end

Private Instance Methods

already_cached?(entry, reload)

Returns true if the cache entry should be skipped: either it is marked as in-progress while reloading, or it is already cached and no reload was requested
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@return [Boolean] true if the entry should be skipped, false if not

# File lib/aspire/caching/builder.rb, line 142
def already_cached?(entry, reload)
  # If reloading, skip cached entries only if marked as in-progress
  # If not reloading, skip all cached entries
  if entry.marked? && reload
    cache.logger.debug("#{entry.url} ignored, in progress (reload)")
    return true
  end
  if entry.cached? && !reload
    cache.logger.debug("#{entry.url} ignored, in cache")
    return true
  end
  # Otherwise the entry is not cached
  false
end
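
The two conditions in already_cached? are easiest to check as a small truth table. This is a minimal sketch under stated assumptions: FlagEntry is a hypothetical stand-in that holds two boolean flags where the real CacheEntry inspects the cache, and skip? repeats the decision without the logging.

```ruby
# Hypothetical stand-in for a cache entry: two flags instead of cache lookups
FlagEntry = Struct.new(:cached, :marked) do
  def cached?
    cached
  end

  def marked?
    marked
  end
end

# Same decision as already_cached?, without the log messages
def skip?(entry, reload)
  return true if entry.marked? && reload
  return true if entry.cached? && !reload
  false
end

cached      = FlagEntry.new(true, false)
in_progress = FlagEntry.new(true, true)
missing     = FlagEntry.new(false, false)

results = {
  cached_reload: skip?(cached, true),           # reloaded, so not skipped
  cached_no_reload: skip?(cached, false),       # skipped, already in cache
  in_progress_reload: skip?(in_progress, true), # skipped, being processed
  missing_no_reload: skip?(missing, false)      # fetched, not in cache
}
```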

already_handled?(entry, urls)

Returns true if a URL has already been handled in this transaction
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param urls [Hash] the set of URLs handled in the current operation
@return [Boolean] true if the URL has already been handled, false if not

# File lib/aspire/caching/builder.rb, line 161
def already_handled?(entry, urls)
  return false unless urls.include?(entry.url)
  # cache.logger.debug("#{entry.url} already handled")
  true
end

cache_entry(url, default = nil)

Returns the CacheEntry instance for a URL
@param url [String, Aspire::Caching::CacheEntry] the URL or cache entry
@param default [Aspire::Caching::CacheEntry, nil] the default if URL is
  not given
@return [Aspire::Caching::CacheEntry] the cache entry for the URL

# File lib/aspire/caching/builder.rb, line 172
def cache_entry(url, default = nil)
  return default if url.nil?
  return url if url.is_a?(CacheEntry)
  CacheEntry.new(url, cache)
end
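
cache_entry lets callers pass either a URL string or an existing entry. A minimal sketch of the same normalisation follows, using a hypothetical SketchEntry struct and to_entry helper; the real CacheEntry constructor also receives the cache, which is left out here.

```ruby
# Hypothetical minimal entry type; the real CacheEntry also takes the cache
SketchEntry = Struct.new(:url)

# Same three cases as cache_entry: nil, existing entry, URL string
def to_entry(url, default = nil)
  return default if url.nil?
  return url if url.is_a?(SketchEntry)
  SketchEntry.new(url)
end

parent      = SketchEntry.new('https://example.org/lists/A')
from_string = to_entry('https://example.org/items/1') # wrapped in a new entry
passthrough = to_entry(parent)                        # returned unchanged
fallback    = to_entry(nil, parent)                   # default used
```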

reload(entry)

Reloads a cache entry
@param entry [Aspire::Caching::CacheEntry] the cache entry
@return [void]

# File lib/aspire/caching/builder.rb, line 181
def reload(entry)
  cache.logger.log(Logger::INFO, "Reloading #{entry.url}")
  entry.delete(force: true)
  if entry.list?(strict: true)
    write_list(entry, reload: true)
  else
    write(entry, reload: true)
  end
end

reload_marked_entries(*types)

Reloads any entry marked as in-progress
Positional parameters are the object types to include, e.g. ‘lists’, ‘resources’ etc.
  - default: all object types
@return [void]

# File lib/aspire/caching/builder.rb, line 195
def reload_marked_entries(*types)
  cache.marked_entries(*types) { |entry| reload(entry) }
end

reload_marked_lists()

Reloads any list marked as in-progress
@return [void]

# File lib/aspire/caching/builder.rb, line 201
def reload_marked_lists
  cache.marked_entries('lists') { |entry| reload(entry) }
end

unrelated_list?(entry, parent_list)

Returns true if the cache entry is a list which is unrelated to the parent list. This prevents unrelated lists being downloaded through paths such as list.usedBy -> module.usesList -> [unrelated lists]. Returns false if:

no parent list is provided,
or the cache entry is not a list,
or it is the same as the parent list,
or it is a child of the parent list.

@param entry [Aspire::Caching::CacheEntry] the cache entry
@param parent_list [Aspire::Caching::CacheEntry] the parent list entry
@return [Boolean] true if the cache entry is a list unrelated to the
  parent list, otherwise false

# File lib/aspire/caching/builder.rb, line 217
def unrelated_list?(entry, parent_list)
  # Ignore if no parent list is given or the entry is not a list/child
  return false unless parent_list
  # Ignore if the entry is not a list
  return false unless entry.list?(strict: false)
  # Ignore if the entry is a child of (or the same as) the parent list
  return false if entry.child_of?(parent_list, strict: false)
  # Otherwise the entry is a list unrelated to the parent list
  msg = "#{entry.url} ignored, not related to #{parent_list.url}"
  cache.logger.debug(msg)
  true
end
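
The four "returns false" cases can be exercised with a toy model. This is a sketch under loud assumptions: Node, its kind and parent_url fields, and the rule used by child_of? are all hypothetical, since the real CacheEntry#list? and #child_of? are not shown in this section and may decide differently.

```ruby
# Hypothetical entry: kind is :list or :item, parent_url names the owning list
Node = Struct.new(:url, :kind, :parent_url) do
  def list?
    kind == :list
  end

  # Assumption for this sketch only: an entry is a child of a list if it is
  # that list or records that list's URL as its parent
  def child_of?(list)
    url == list.url || parent_url == list.url
  end
end

# Same guard order as unrelated_list?, without the logging
def unrelated?(entry, parent_list)
  return false unless parent_list
  return false unless entry.list?
  return false if entry.child_of?(parent_list)
  true
end

parent = Node.new('lists/A', :list, nil)
child  = Node.new('lists/A/child', :list, 'lists/A')
other  = Node.new('lists/B', :list, nil)
item   = Node.new('items/1', :item, 'lists/A')

checks = [
  unrelated?(other, nil),     # no parent list given
  unrelated?(item, parent),   # not a list
  unrelated?(parent, parent), # same as the parent list
  unrelated?(child, parent),  # child of the parent list
  unrelated?(other, parent)   # a different list: the only unrelated case
]
```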

write?(entry, urls, parent_list = nil, reload = true)

Returns true if the URL should be written to the cache, false if not
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param urls [Hash] the set of URLs handled in the current operation
@param parent_list [Aspire::Caching::CacheEntry] the parent list entry
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@return [Boolean] true if the URL should be written to the cache, false
  if not

# File lib/aspire/caching/builder.rb, line 345
def write?(entry, urls, parent_list = nil, reload = true)
  # Ignore URLs previously handled in the current operation
  return false if already_handled?(entry, urls)
  # Ignore cached URLs
  return false if already_cached?(entry, reload)
  # Only follow list links for the same parent list
  return false if unrelated_list?(entry, parent_list)
  true
end

write_data(entry, urls, data = nil, parent_list = nil, reload = true)

Writes a linked data API object and its references to the cache
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param urls [Hash] the set of URLs handled in the current operation
@param data [Hash, nil] the parsed JSON data to be written to the cache;
  if omitted, this is read from the API
@param parent_list [Aspire::Caching::CacheEntry] the parent list entry
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@return [Array] the linked data and JSON API data of the object

# File lib/aspire/caching/builder.rb, line 239
def write_data(entry, urls, data = nil, parent_list = nil, reload = true)
  # Read the linked data and associated JSON API data into the cache
  linked_data, json_data = write_object(entry, urls, data, reload)
  if linked_data && entry.references?
    # Start processing this URL
    entry.mark
    # Write the related linked data objects to the cache
    write_related(entry, urls, linked_data, parent_list, reload)
    # Write the referenced API objects to the cache
    write_references(urls, linked_data, parent_list, reload)
    # Finish processing this URL
    entry.unmark
  end
  # Return the linked data and JSON API objects
  [linked_data, json_data]
end
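
In write_data, entry.unmark is not inside an ensure block, so an entry whose processing is interrupted stays marked; resume relies on this when it reloads marked lists. The sketch below illustrates the idea with a hypothetical in-memory Progress class and track method. The real cache is assumed to persist its marks so that they survive a restart.

```ruby
require 'set'

# Hypothetical in-memory tracker of in-progress URLs
class Progress
  attr_reader :marked, :done

  def initialize
    @marked = Set.new
    @done = []
  end

  # Mark the URL, run the block, and unmark only if the block completes
  def track(url)
    @marked << url
    yield
    @marked.delete(url)
    @done << url
  end
end

progress = Progress.new
progress.track('https://example.org/lists/A') { :ok }
begin
  progress.track('https://example.org/lists/B') { raise IOError, 'connection lost' }
rescue IOError
  # The interrupted list stays marked
end

# A later resume would reload exactly the entries that are still marked
to_reload = progress.marked.to_a
```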

write_object(entry, urls, data = nil, reload = true)

Caches a linked data API object and any associated JSON API object
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param urls [Hash] the set of URLs handled in the current operation
@param data [Hash, nil] the parsed JSON linked data of the object; if
  omitted, the data is read from the API URL
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@return [Array] the linked data of the object and its JSON API data
  (nil if the object has no JSON API data)

# File lib/aspire/caching/builder.rb, line 264
def write_object(entry, urls, data = nil, reload = true)
  # Ignore the cache if reloading
  use_cache = !reload
  # Get the linked data object
  data = write_object_data(entry, data, use_cache)
  # Get the JSON API object if available
  json = write_object_json(entry, use_cache)
  # Flag the URL as handled
  urls[entry.url] = true
  # Return the object data
  [data, json]
end

write_object_data(entry, data, use_cache)

Writes a linked data API object to the cache
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param data [Hash] the data to write to the cache
@param use_cache [Boolean] if true, return data from the cache,
  otherwise update the cache with data from the API

# File lib/aspire/caching/builder.rb, line 282
def write_object_data(entry, data, use_cache)
  if data
    cache.write(data: data, entry: entry)
  else
    cache.read(entry: entry, use_cache: use_cache)
  end
end

write_object_json(entry, use_cache)

Writes a JSON API object to the cache
@param entry [Aspire::Caching::CacheEntry] the cache entry
@param use_cache [Boolean] if true, return data from the cache,
  otherwise update the cache with data from the API

# File lib/aspire/caching/builder.rb, line 294
def write_object_json(entry, use_cache)
  return nil unless entry.json?
  cache.read(entry: entry, json: true, use_cache: use_cache)
end

write_references(urls, data, parent_list = nil, reload = true)

Caches all the objects referenced by the argument object
@param urls [Hash] the set of URLs handled in the current operation
@param data [Hash] the parsed linked data object
@param parent_list [Aspire::Caching::CacheEntry] the parent list entry
@param reload [Boolean] if true, reload the cache entry from the API,
  otherwise do nothing if the entry is already in the cache
@return [void]

# File lib/aspire/caching/builder.rb, line 306
def write_references(urls, data, parent_list = nil, reload = true)
  data.each do |url, object|
    # Write each URI to the cache
    references(url, object).each do |uri|
      # byebug if uri.is_a?(String) && uri.include?('34C1190E-F50E-35CB-94C9-F476963D69C0')
      # byebug if uri.is_a?(Aspire::Caching::CacheEntry) && uri.url.include?('34C1190E-F50E-35CB-94C9-F476963D69C0')
      write(uri, list: parent_list, reload: reload, urls: urls)
    end
  end
end
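
write_references calls write for every referenced URI, and write in turn reaches write_references again, so the traversal is recursive. The shared urls hash is what stops it from looping when objects reference each other. A minimal sketch of that pattern follows; the crawl helper and the graph hash standing in for API responses are hypothetical.

```ruby
# Hypothetical recursive traversal sharing one "handled" hash between calls
def crawl(url, graph, urls = {})
  # Skip anything already handled in this operation
  return urls if urls.include?(url)
  # Flag the URL as handled before following its references
  urls[url] = true
  graph.fetch(url, []).each { |ref| crawl(ref, graph, urls) }
  urls
end

# Stand-in for the API: each URL maps to the URLs it references.
# lists/A and lists/B reference each other, as do lists/A and items/1.
graph = {
  'lists/A' => ['items/1', 'lists/B'],
  'items/1' => ['lists/A'],
  'lists/B' => ['lists/A']
}
visited = crawl('lists/A', graph).keys
```

Each URL is visited once despite the cycles, because the same hash is passed down every call, just as urls: urls is passed through in write_references.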