class ProxyFetcher::Providers::Base
Base class for all the ProxyFetcher providers.
Public Class Methods
Syntactic sugar for calling fetch_proxies! on a new instance of the provider.
# File lib/proxy_fetcher/providers/base.rb, line 42
def self.fetch_proxies!(*args)
  new.fetch_proxies!(*args)
end
Public Instance Methods
Loads the proxy provider page content, extracts the proxy list from it, and converts every entry to a proxy object. Entries that fail to build or have no address are dropped.
# File lib/proxy_fetcher/providers/base.rb, line 9
def fetch_proxies(filters = {})
  raw_proxies = load_proxy_list(filters)
  proxies = raw_proxies.map { |html_node| build_proxy(html_node) }.compact
  proxies.reject { |proxy| proxy.addr.nil? }
end
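The map, compact, and reject steps above can be illustrated in isolation. This is a minimal sketch that does not use the gem: `SketchProxy`, `build_sketch_entry`, and the sample rows are hypothetical stand-ins for ProxyFetcher::Proxy, build_proxy, and the parsed HTML nodes.

```ruby
# Hypothetical stand-in for ProxyFetcher::Proxy; only addr and port are modelled.
SketchProxy = Struct.new(:addr, :port)

# Hypothetical builder: returns nil for rows it cannot handle,
# mirroring how build_proxy returns nil when conversion fails.
def build_sketch_entry(row)
  return nil unless row.is_a?(Hash)

  SketchProxy.new(row[:addr], row[:port])
end

raw_rows = [
  { addr: "192.0.2.10", port: 8080 }, # valid entry, kept
  "garbage row",                      # builder returns nil, removed by compact
  { addr: nil, port: 3128 }           # built, but rejected for its nil address
]

proxies = raw_rows.map { |row| build_sketch_entry(row) }.compact
proxies = proxies.reject { |proxy| proxy.addr.nil? }

puts proxies.map(&:addr).inspect # prints ["192.0.2.10"]
```

Only the first row survives: one entry is dropped because it could not be built, the other because it has no address.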
@return [Hash]
Provider headers required to fetch the proxy list
# File lib/proxy_fetcher/providers/base.rb, line 33
def provider_headers
  {}
end
# File lib/proxy_fetcher/providers/base.rb, line 22
def provider_method
  :get
end
# File lib/proxy_fetcher/providers/base.rb, line 26
def provider_params
  {}
end
# File lib/proxy_fetcher/providers/base.rb, line 18
def provider_url
  raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
end
# File lib/proxy_fetcher/providers/base.rb, line 37
def xpath
  raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
end
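The hook methods above follow a template-method layout: provider_method, provider_params, and provider_headers have defaults, while provider_url and xpath must be overridden. The sketch below reproduces only those hooks in a stand-alone `SketchBase` class so that it runs without the gem. `ExampleProvider`, its URL, and its XPath expression are hypothetical placeholders, not a real provider.

```ruby
# Stand-alone copy of the hook methods shown above (not the real base class).
class SketchBase
  def provider_url
    raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
  end

  def provider_method
    :get
  end

  def provider_params
    {}
  end

  def provider_headers
    {}
  end

  def xpath
    raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
  end
end

# Hypothetical descendant: overrides the two mandatory hooks and one optional one.
class ExampleProvider < SketchBase
  def provider_url
    "https://proxies.example.com/list" # placeholder URL
  end

  def xpath
    "//table/tbody/tr" # placeholder XPath for the rows holding proxy data
  end

  def provider_headers
    { "Accept" => "text/html" }
  end
end

provider = ExampleProvider.new
puts provider.provider_url
puts provider.provider_method.inspect
```

Calling `SketchBase.new.provider_url` directly raises NotImplementedError with the message "provider_url must be implemented in a descendant class!", because `__method__` expands to the name of the method being executed.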
Protected Instance Methods
# File lib/proxy_fetcher/providers/base.rb, line 107
def build_proxy(*args)
  to_proxy(*args)
rescue StandardError => e
  ProxyFetcher.logger.warn(
    "Failed to build Proxy for #{self.class.name.split("::").last} " \
    "due to error: #{e.message}"
  )

  nil
end
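The rescue-and-log pattern used by build_proxy can be tried without the gem. In this sketch, `SKETCH_LOGGER` (a standard library Logger writing to an in-memory buffer) stands in for ProxyFetcher.logger, and `parse_sketch_row` is a hypothetical replacement for to_proxy that parses "host:port" strings.

```ruby
require "logger"
require "stringio"

# In-memory log target so the warning can be inspected afterwards.
LOG_OUTPUT = StringIO.new
SKETCH_LOGGER = Logger.new(LOG_OUTPUT)

# Hypothetical stand-in for to_proxy: parses "host:port" into a Hash.
# Integer() raises ArgumentError when the port is not numeric.
def parse_sketch_row(row)
  addr, port = row.split(":")
  { addr: addr, port: Integer(port) }
end

# Same shape as build_proxy: delegate, and turn any StandardError into
# a logged warning plus a nil result.
def build_sketch_row(row)
  parse_sketch_row(row)
rescue StandardError => e
  SKETCH_LOGGER.warn("Failed to build proxy due to error: #{e.message}")
  nil
end

good = build_sketch_row("192.0.2.10:8080")
bad  = build_sketch_row("192.0.2.11:http")

puts good.inspect
puts bad.inspect # prints nil
```

One Ruby detail is worth keeping in mind: NotImplementedError descends from ScriptError, not StandardError, so a `rescue StandardError` clause like this one does not swallow the error raised by an unimplemented to_proxy.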
Loads the provider HTML and parses it into the internal document object.
@param url [String]
URL to fetch
@param filters [Hash]
filters for proxy provider
@return [ProxyFetcher::Document]
ProxyFetcher document object
# File lib/proxy_fetcher/providers/base.rb, line 90
def load_document(url, filters = {})
  html = load_html(url, filters)
  ProxyFetcher::Document.parse(html)
end
Loads raw provider HTML with proxies.
@param url [String]
Provider URL
@param filters [#to_h]
Provider filters (Hash-like object)
@return [String]
HTML body from the response
# File lib/proxy_fetcher/providers/base.rb, line 59
def load_html(url, filters = {})
  unless filters.respond_to?(:to_h)
    raise ArgumentError, "filters must be a Hash or respond to #to_h"
  end

  if filters&.any?
    # TODO: query for post request?
    uri = URI.parse(url)
    uri.query = URI.encode_www_form(provider_params.merge(filters.to_h))
    url = uri.to_s
  end

  ProxyFetcher.config.http_client.fetch(
    url,
    method: provider_method,
    headers: provider_headers,
    params: provider_params
  )
end
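The query-string step in load_html merges the provider's default params with the caller's filters, with filters taking precedence on duplicate keys. The sketch below isolates that step using only the standard library. The helper name `url_with_filters`, the URL, and the parameter names are hypothetical; the HTTP request itself is left out.

```ruby
require "uri"

# Hypothetical helper mirroring the URL-building part of load_html.
# defaults plays the role of provider_params.
def url_with_filters(url, defaults, filters)
  unless filters.respond_to?(:to_h)
    raise ArgumentError, "filters must be a Hash or respond to #to_h"
  end

  filter_hash = filters.to_h
  return url if filter_hash.empty? # no filters: the URL is left untouched

  uri = URI.parse(url)
  # Hash#merge lets the filters override defaults that share a key.
  uri.query = URI.encode_www_form(defaults.merge(filter_hash))
  uri.to_s
end

result = url_with_filters(
  "https://proxies.example.com/list",
  { type: "http" },
  { country: "US", type: "https" }
)

puts result # prints https://proxies.example.com/list?type=https&country=US
```

The `type` key keeps its original position from the defaults but takes the value supplied by the filters, which is how Hash#merge resolves duplicate keys.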
Fetches HTML content by sending an HTTP request to the provider URL and parses the document (built as an abstract ProxyFetcher::Document) to return all the proxy entries (HTML nodes).
@return [Array<ProxyFetcher::Document::Node>]
Collection of extracted HTML nodes with full proxy info
# File lib/proxy_fetcher/providers/base.rb, line 102
def load_proxy_list(filters = {})
  doc = load_document(provider_url, filters)
  doc.xpath(xpath)
end
Converts an HTML element with proxy info to a ProxyFetcher::Proxy instance.
Abstract method. Must be implemented in a descendant class.
@return [Proxy]
new proxy object from the HTML node
# File lib/proxy_fetcher/providers/base.rb, line 125
def to_proxy(*)
  raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
end