class DaimonSkycrawlers::Filter::UpdateChecker

This filter checks whether the page at a given URL has been updated.

It skips processing of URLs whose stored copy is already the latest (not updated since the previous access).

Public Class Methods

new(storage: nil, base_url: nil)
Calls superclass method DaimonSkycrawlers::Filter::Base::new
# File lib/daimon_skycrawlers/filter/update_checker.rb, line 13
def initialize(storage: nil, base_url: nil)
  super(storage: storage)
  @base_url = nil
  @base_url = URI(base_url) if base_url
end

Public Instance Methods

call(message, connection: nil)

@param message [Hash] message that includes `:url`
@param connection [Faraday] connection used for the HEAD request; when nil, `Faraday.head` is called directly
@return [true|false] true when the page needs to be updated, otherwise false

# File lib/daimon_skycrawlers/filter/update_checker.rb, line 24
def call(message, connection: nil)
  url = normalize_url(message[:url])
  message[:url] = url
  page = storage.read(message)
  return true unless page
  if connection
    response = connection.head(url)
  else
    response = Faraday.head(url)
  end
  headers = response.headers
  case
  when headers.key?("etag") && page.etag
    headers["etag"] != page.etag
  when headers.key?("last-modified") && page.last_modified_at
    if headers["last-modified"] < page.last_modified_at
      log.warn("#{url} returns old contents. #{headers["last-modified"]} < #{page.last_modified_at}")
    end
    headers["last-modified"] > page.last_modified_at
  else
    true
  end
end
Also aliased as: updated?
updated?(message, connection: nil)
Alias for: call
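
The decision made by `call` once the HEAD response is available can be illustrated with a small standalone sketch. This is not the library's code: `StoredPage` and `needs_update?` are hypothetical names, the headers are a plain Hash with lowercase keys (Faraday's header object is case-insensitive, a plain Hash is not), and the sketch assumes the stored `last_modified_at` is a `Time`, so it parses the header with `Time.httpdate` before comparing.

```ruby
require "time"

# Hypothetical stand-in for the page record returned by storage.read.
StoredPage = Struct.new(:etag, :last_modified_at)

# Returns true when the remote resource looks newer than the stored page.
# Precedence mirrors the filter: ETag first, then Last-Modified,
# and "needs update" when neither can be compared.
def needs_update?(headers, page)
  return true unless page # never fetched before

  if headers.key?("etag") && page.etag
    headers["etag"] != page.etag
  elsif headers.key?("last-modified") && page.last_modified_at
    # Assumption: last_modified_at is a Time, so parse the HTTP date first.
    Time.httpdate(headers["last-modified"]) > page.last_modified_at
  else
    true
  end
end

page = StoredPage.new('"abc"', Time.utc(2017, 1, 1))
puts needs_update?({ "etag" => '"abc"' }, page) # unchanged ETag
puts needs_update?({ "etag" => '"def"' }, page) # changed ETag
```

When both validators are present, the ETag comparison takes priority and the Last-Modified date is not consulted, which matches the order of the `when` branches in `call`.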