class DaimonSkycrawlers::Filter::UpdateChecker
This filter checks whether the page at a given URL has been updated.
It skips processing URLs whose content is already up to date (not updated since the previous access).
Public Class Methods
new(storage: nil, base_url: nil)
Calls superclass method DaimonSkycrawlers::Filter::Base::new
```ruby
# File lib/daimon_skycrawlers/filter/update_checker.rb, line 13
def initialize(storage: nil, base_url: nil)
  super(storage: storage)
  @base_url = nil
  @base_url = URI(base_url) if base_url
end
```
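The constructor only stores `base_url` as a `URI`; how it is used is not shown on this page. Since `call` passes `message[:url]` through `normalize_url`, one plausible reading is that relative URLs are resolved against `base_url`. The sketch below assumes exactly that. `resolve_url` is a hypothetical helper for illustration, not the library's `normalize_url`.

```ruby
require "uri"

# Hypothetical helper (not part of daimon_skycrawlers): resolve a possibly
# relative URL against an optional base URL, as a filter constructed with
# `base_url:` might do before looking the page up in storage.
def resolve_url(url, base_url = nil)
  # Without a base URL, the input is returned as-is.
  return url.to_s unless base_url

  # URI#+ is an alias of URI#merge: relative references are resolved
  # per RFC 3986, and absolute URLs are returned unchanged.
  (URI(base_url) + url).to_s
end

resolve_url("/blog/", "https://example.com/")                  # => "https://example.com/blog/"
resolve_url("page2.html", "https://example.com/docs/index.html") # => "https://example.com/docs/page2.html"
resolve_url("https://other.example/a", "https://example.com/")  # => "https://other.example/a"
```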
Public Instance Methods
call(message, connection: nil)
@param message [Hash] message that includes `:url`
@param connection [Faraday] connection used for the HEAD request; when nil, `Faraday.head` is used instead
@return [true|false] true when the page needs to be updated, otherwise false
```ruby
# File lib/daimon_skycrawlers/filter/update_checker.rb, line 24
def call(message, connection: nil)
  url = normalize_url(message[:url])
  message[:url] = url
  page = storage.read(message)
  return true unless page
  if connection
    response = connection.head(url)
  else
    response = Faraday.head(url)
  end
  headers = response.headers
  case
  when headers.key?("etag") && page.etag
    headers["etag"] != page.etag
  when headers.key?("last-modified") && page.last_modified_at
    if headers["last-modified"] < page.last_modified_at
      log.warn("#{url} returns old contents. #{headers["last-modified"]} < #{page.last_modified_at}")
    end
    headers["last-modified"] > page.last_modified_at
  else
    true
  end
end
```
Also aliased as: updated?
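The decision made by `call` can be illustrated without a network connection or a storage backend. This is a simplified sketch under stated assumptions: `StoredPage` and `needs_update?` are hypothetical names, the stored page is assumed to expose `etag` and `last_modified_at` (a `Time`), and the response headers are a plain Hash with lowercase keys. Unlike the source above, which compares the `Last-Modified` header value directly, the sketch parses it with `Time.httpdate` so both sides are `Time` objects. The warning logged for older content is omitted.

```ruby
require "time" # provides Time.httpdate

# Hypothetical stand-in for a page record read from storage.
StoredPage = Struct.new(:etag, :last_modified_at)

# Returns true when the resource should be fetched again.
# `page` is nil when the URL has never been stored.
def needs_update?(page, headers)
  return true unless page

  if headers.key?("etag") && page.etag
    # A different ETag means the resource changed.
    headers["etag"] != page.etag
  elsif headers.key?("last-modified") && page.last_modified_at
    # Parse the HTTP-date so the comparison is chronological.
    Time.httpdate(headers["last-modified"]) > page.last_modified_at
  else
    # No usable validator: assume the page changed.
    true
  end
end

stored = StoredPage.new('"abc"', Time.httpdate("Wed, 21 Oct 2015 07:28:00 GMT"))

needs_update?(nil, {})                                                  # => true (never stored)
needs_update?(stored, { "etag" => '"abc"' })                            # => false (same ETag)
needs_update?(stored, { "last-modified" => "Thu, 22 Oct 2015 07:28:00 GMT" }) # => true (newer)
```

Parsing the date matters because raw HTTP-date strings begin with the weekday name, so comparing them as strings does not order them chronologically ("Thu, 22 Oct 2015 ..." sorts before "Wed, 21 Oct 2015 ...").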