module Spidr::Sanitizers

The {Sanitizers} module adds methods to {Agent} that control how incoming links are sanitized.

Attributes

strip_fragments[RW]

Specifies whether the Agent will strip URI fragments.

strip_query[RW]

Specifies whether the Agent will strip URI queries.

Public Instance Methods

sanitize_url(url)

Sanitizes a URL based on filtering options.

@param [URI::HTTP, URI::HTTPS, String] url

The URL to be sanitized

@return [URI::HTTP, URI::HTTPS]

The new sanitized URL.

@since 0.2.2

# File lib/spidr/sanitizers.rb, line 26
def sanitize_url(url)
  url = URI(url.to_s) unless url.kind_of?(URI)

  url.fragment = nil if @strip_fragments
  url.query    = nil if @strip_query

  return url
end
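
The behavior of the method above can be tried without installing Spidr. The following is a minimal sketch using only Ruby's standard `uri` library. `DemoSanitizer` is a hypothetical stand-in written for this illustration, not Spidr's `Agent` class; it copies the two flags and the `sanitize_url` body shown above.

```ruby
require 'uri'

# Hypothetical stand-in that mirrors the logic documented on this page.
# It is NOT Spidr::Agent; it exists only to make the example self-contained.
class DemoSanitizer
  attr_accessor :strip_fragments, :strip_query

  def initialize(options={})
    @strip_fragments = options.fetch(:strip_fragments, true)
    @strip_query     = options.fetch(:strip_query, false)
  end

  # Same body as the sanitize_url method listed above.
  def sanitize_url(url)
    url = URI(url.to_s) unless url.kind_of?(URI)

    url.fragment = nil if @strip_fragments
    url.query    = nil if @strip_query

    url
  end
end

# With the defaults, only the fragment is removed.
defaults = DemoSanitizer.new
defaults.sanitize_url('http://example.com/page?id=1#top').to_s
# => "http://example.com/page?id=1"

# With :strip_query enabled, the query is removed as well.
strict = DemoSanitizer.new(strip_query: true)
strict.sanitize_url('http://example.com/page?id=1#top').to_s
# => "http://example.com/page"

# A URI object passed in is modified in place and returned as-is.
uri    = URI('http://example.com/a#b')
result = defaults.sanitize_url(uri)
result.equal?(uri) # => true
uri.fragment       # => nil
```

Because a URI argument is modified in place, a caller that still needs the original value could pass `uri.dup` instead.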

Protected Instance Methods

initialize_sanitizers(options={})

Initializes the Sanitizer rules.

@param [Hash] options

Additional options.

@option options [Boolean] :strip_fragments (true)

Specifies whether or not to strip the fragment component from URLs.

@option options [Boolean] :strip_query (false)

Specifies whether or not to strip the query component from URLs.

@since 0.2.2

# File lib/spidr/sanitizers.rb, line 51
def initialize_sanitizers(options={})
  @strip_fragments = options.fetch(:strip_fragments,true)
  @strip_query     = options.fetch(:strip_query,false)
end
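
The defaults come from `Hash#fetch`, which falls back to its second argument only when the key is absent. The sketch below isolates that option handling; `sanitizer_flags` is a hypothetical helper written for this illustration and is not part of Spidr.

```ruby
# Hypothetical helper that reproduces only the option handling shown above.
def sanitizer_flags(options={})
  {
    strip_fragments: options.fetch(:strip_fragments, true),
    strip_query:     options.fetch(:strip_query, false)
  }
end

# No options: fragments are stripped, queries are kept.
flags = sanitizer_flags
flags[:strip_fragments] # => true
flags[:strip_query]     # => false

# Explicit values override the defaults.
flags = sanitizer_flags(strip_fragments: false, strip_query: true)
flags[:strip_fragments] # => false
flags[:strip_query]     # => true

# fetch returns a stored nil rather than the default, so an explicit
# nil is falsy and disables fragment stripping.
flags = sanitizer_flags(strip_fragments: nil)
flags[:strip_fragments] # => nil
```

This differs from the `options[:key] || default` idiom, which would treat an explicit `false` or `nil` as missing and substitute the default.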