class WebRobots
Public Class Methods
Source
# File lib/webrobots.rb, line 28
def initialize(user_agent, options = nil)
  @user_agent = user_agent
  options ||= {}
  @http_get = options[:http_get] || method(:http_get)
  crawl_delay_handler =
    case value = options[:crawl_delay] || :sleep
    when :ignore
      nil
    when :sleep
      method(:crawl_delay_handler)
    else
      if value.respond_to?(:call)
        value
      else
        raise ArgumentError, "invalid Crawl-delay handler: #{value.inspect}"
      end
    end
  @parser = RobotsTxt::Parser.new(user_agent, crawl_delay_handler)
  @parser_mutex = Mutex.new
  @robotstxt = create_cache()
end
Creates a WebRobots object for a robot named user_agent, with optional options.
- :http_get => a custom method, proc, or anything that responds to .call(uri), to be used for fetching robots.txt. It must return the response body if successful, return an empty string if the resource is not found, and return nil or raise any error on failure. Redirects should be handled within this proc.
- :crawl_delay => determines how to react to Crawl-delay directives. If :sleep is given, WebRobots sleeps as demanded when allowed?(url)/disallowed?(url) is called. This is the default behavior. If :ignore is given, WebRobots does nothing. If a custom method, proc, or anything that responds to .call(delay, last_checked_at) is given, it is called.
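The two options above can be supplied like this. This is a minimal sketch: the fetcher and handler names, the 'MyBot/1.0' user agent, and the return-value conventions beyond those documented above are illustrative, and the final WebRobots.new call assumes the webrobots gem is loaded.

```ruby
require 'net/http'
require 'uri'

# Custom :http_get fetcher. Per the contract above, it must return the
# response body on success, "" when robots.txt is not found, and nil (or
# raise) on other failures; redirects are followed here, not by WebRobots.
http_get = lambda do |uri|
  response = Net::HTTP.get_response(URI(uri.to_s))
  case response
  when Net::HTTPSuccess     then response.body
  when Net::HTTPRedirection then http_get.call(response['location'])
  when Net::HTTPNotFound    then ''
  else nil
  end
end

# Custom :crawl_delay handler, called with the Crawl-delay value in seconds
# and the time of the previous check (nil on the first one). This sketch
# sleeps off the remaining delay and returns the seconds actually slept.
crawl_delay = lambda do |delay, last_checked_at|
  wait = last_checked_at ? delay - (Time.now - last_checked_at) : 0
  if wait > 0
    sleep(wait)
    wait
  else
    0
  end
end

# With the webrobots gem loaded, both handlers plug into the constructor:
# robots = WebRobots.new('MyBot/1.0', http_get: http_get, crawl_delay: crawl_delay)
```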