module RDF::Util::File

Wrapper for retrieving RDF resources from HTTP(S) and file: scheme locations.

By default, HTTP(S) resources are retrieved using Net::HTTP. However, if the [Rest Client](rubygems.org/gems/rest-client) gem is loaded, it is used for retrieving resources instead. This allows for sophisticated HTTP caching: [REST Client Components](rubygems.org/gems/rest-client-components) enables the use of `Rack::Cache` to avoid network access.

To use other HTTP clients, consumers can subclass {RDF::Util::File::HttpAdapter} and set the {RDF::Util::File.}.

Also supports the file: scheme for access to local files.

@since 0.2.4

Public Class Methods

http_adapter(use_net_http = false)

Get the current HTTP adapter. If no adapter has been explicitly set, use RestClientAdapter (if RestClient is loaded), or the NetHttpAdapter otherwise.

@param [Boolean] use_net_http use the NetHttpAdapter, even if other adapters have been configured

@return [HttpAdapter]

@since 1.2

# File lib/rdf/util/file.rb, line 244
def http_adapter(use_net_http = false)
  if use_net_http
    NetHttpAdapter
  else
    @http_adapter ||= begin
      # Prefer RestClient when it is loaded; otherwise fall back to Net::HTTP
      if defined?(RestClient)
        RestClientAdapter
      else
        NetHttpAdapter
      end
    end
  end
end
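
The selection logic above can be tried in isolation. The following is a minimal sketch, not the library's code: `AdapterRegistry`, `DefaultAdapter`, `PreferredAdapter` and `OptionalClient` are hypothetical stand-ins for `RDF::Util::File`, `NetHttpAdapter`, `RestClientAdapter` and `RestClient`. It shows that the flag bypasses the memoized choice without changing it.

```ruby
# Hypothetical stand-ins; none of these names exist in the rdf gem.
module AdapterRegistry
  class DefaultAdapter; end    # plays the role of NetHttpAdapter
  class PreferredAdapter; end  # plays the role of RestClientAdapter

  class << self
    # Explicitly set the adapter, replacing any memoized choice.
    attr_writer :adapter

    def adapter(force_default = false)
      # The flag bypasses the memoized value without overwriting it.
      return DefaultAdapter if force_default

      # Memoize the first automatic choice.
      @adapter ||= if defined?(::OptionalClient)
                     PreferredAdapter
                   else
                     DefaultAdapter
                   end
    end
  end
end

# ::OptionalClient is not defined here, so the default is chosen.
chosen = AdapterRegistry.adapter
forced = AdapterRegistry.adapter(true)

# An explicit assignment wins for normal calls, but not for forced ones.
AdapterRegistry.adapter = AdapterRegistry::PreferredAdapter
overridden   = AdapterRegistry.adapter
still_forced = AdapterRegistry.adapter(true)
```

Because the automatic choice is memoized with `||=`, loading the optional client after the first call would not change the result unless the adapter is set explicitly.
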

http_adapter=(http_adapter)

Set the HTTP adapter.

@see .http_adapter

@param [HttpAdapter] http_adapter

@return [HttpAdapter]

@since 1.2

# File lib/rdf/util/file.rb, line 232
def http_adapter= http_adapter
  @http_adapter = http_adapter
end

open_file(filename_or_url, proxy: nil, headers: {}, verify_none: false, **options) { |remote_document| ... }

Open the file, returning or yielding {RemoteDocument}.

Input received as non-Unicode is transcoded to UTF-8. With Ruby >= 2.2, all Unicode input is normalized to [Unicode Normalization Form C (NFC)](unicode.org/reports/tr15/#Norm_Forms).
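
As an illustration of what that transformation involves, here is a minimal sketch using only core Ruby string methods. The helper `to_utf8_nfc` and its ISO-8859-1 default are assumptions made for the example; they are not part of `RDF::Util::File`, and the library's own conversion code may differ.

```ruby
# Hypothetical helper, not part of RDF::Util::File.
def to_utf8_nfc(bytes, source_encoding = Encoding::ISO_8859_1)
  # Label a copy of the raw bytes with their actual encoding, then transcode.
  text = bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
  # Compose base characters and combining marks into precomposed forms (NFC).
  text.unicode_normalize(:nfc)
end

latin1     = "caf\xE9".b     # "café" as ISO-8859-1 bytes
decomposed = "cafe\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT

from_latin1  = to_utf8_nfc(latin1)
from_decomp  = decomposed.unicode_normalize(:nfc)
same_string  = from_latin1 == from_decomp
```

Both inputs end up as the same four-character UTF-8 string, which is why normalizing on input makes later string comparisons reliable.
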

HTTP resources may be retrieved via a proxy using the `proxy` option. If `RestClient` is loaded, a proxy can also be configured globally by setting something like the following:

`RestClient.proxy = "http://proxy.example.com/"`.

When retrieving documents over HTTP(S), use the mechanism described in [Providing and Discovering URI Documentation](www.w3.org/2001/tag/awwsw/issue57/latest/) to pass the appropriate `base_uri` to the block or as the return.

Applications needing HTTP caching may consider [Rest Client](rubygems.org/gems/rest-client) together with [REST Client Components](rubygems.org/gems/rest-client-components), which allows the use of `Rack::Cache` as a local file cache.

@example using a local HTTP cache

require 'restclient/components'
require 'rack/cache'
RestClient.enable Rack::Cache
RDF::Util::File.open_file("http://example.org/some/resource")
  # => Cached resource if current, otherwise returned resource

@param [String] filename_or_url to open

@param [String] proxy

HTTP Proxy to use for requests.

@param [Array, String] headers ({})

HTTP Request headers

Defaults the `Accept` header to the content types of the available readers, allowing for content negotiation.

Defaults the `User-Agent` header, unless one is specified.

@param [Boolean] verify_none (false)

Don't verify SSL certificates

@param [Hash{Symbol => Object}] options

options are ignored in this implementation. Applications are encouraged
to override this implementation to provide more control over HTTP
headers and redirect following. If opening as a file,
options are passed to `Kernel.open`.

@return [RemoteDocument, Object] A {RemoteDocument}. If a block is given, the result of evaluating the block is returned.

@yield [RemoteDocument] A {RemoteDocument} for local files

@yieldreturn [Object] returned from open_file

@raise [IOError] if not found

# File lib/rdf/util/file.rb, line 299
def self.open_file(filename_or_url, proxy: nil, headers: {}, verify_none: false, **options, &block)
  filename_or_url = $1 if filename_or_url.to_s.match(/^file:(.*)$/)
  remote_document = nil

  if filename_or_url.to_s.match?(/^https?/)
    base_uri = filename_or_url.to_s

    remote_document = self.http_adapter(!!options[:use_net_http]).
      open_url(base_uri,
               proxy:       proxy,
               headers:     headers,
               verify_none: verify_none,
               **options)
  else
    # Fake content type based on found format
    format = RDF::Format.for(filename_or_url.to_s)
    content_type = format ? format.content_type.first : 'text/plain'
    # Open as a file, passing any options
    begin
      url_no_frag_or_query = RDF::URI(filename_or_url).dup
      url_no_frag_or_query.query = nil
      url_no_frag_or_query.fragment = nil
      options[:encoding] ||= Encoding::UTF_8
      Kernel.open(url_no_frag_or_query, "r", **options) do |file|
        document_options = {
          base_uri:     filename_or_url.to_s,
          charset:      file.external_encoding.to_s,
          code:         200,
          content_type: content_type,
          last_modified: file.mtime,
          headers:      {content_type: content_type, last_modified: file.mtime.xmlschema}
        }

        remote_document = RemoteDocument.new(file.read, document_options)
      end
    rescue Errno::ENOENT => e
      raise IOError, e.message
    end
  end

  if block_given?
    yield remote_document
  else
    remote_document
  end
end
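
The local-file branch can be approximated without the library. The sketch below is a simplified stand-in under stated assumptions: `open_local` and `LocalDocument` are hypothetical names, plain string substitution replaces `RDF::URI`, and `File.open` replaces `Kernel.open`. It mirrors the main steps: strip the `file:` prefix, drop any query and fragment before opening, read as UTF-8, convert a missing file into `IOError`, and either yield or return the document.

```ruby
require "tempfile"

# Hypothetical stand-in for RemoteDocument: the body plus a few metadata fields.
LocalDocument = Struct.new(:body, :base_uri, :charset, :last_modified)

# Hypothetical helper mirroring the local-file branch; not the library's API.
def open_local(filename_or_url)
  # Drop a leading "file:" scheme; what remains doubles as the base URI.
  name = filename_or_url.to_s.sub(/\Afile:/, "")
  # Remove any query string or fragment before touching the filesystem.
  path = name.sub(/[?#].*\z/m, "")

  document = begin
    File.open(path, "r", encoding: Encoding::UTF_8) do |file|
      LocalDocument.new(file.read, name, file.external_encoding.to_s, file.mtime)
    end
  rescue Errno::ENOENT => e
    # Callers only need to handle IOError for a missing file.
    raise IOError, e.message
  end

  # With a block, the block's result is returned; otherwise the document is.
  block_given? ? yield(document) : document
end

tmp = Tempfile.create(["sample", ".txt"])
tmp.write("hello")
tmp.close

returned = open_local("file:#{tmp.path}#section")
yielded  = open_local(tmp.path) { |doc| doc.body.length }
File.delete(tmp.path)

missing_error = begin
  open_local(tmp.path)
rescue IOError => e
  e
end
```

Note that the fragment is kept in `base_uri` even though it is removed from the path that is opened, which matches how the method above builds `document_options`.
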