class NewsCrawler::Storage::URLQueue::URLQueueEngine
Base class for URLQueue engines. To create a new URLQueue engine, subclass it
and implement all of its methods, keeping each method's signature unchanged.
Public Class Methods
Get the list of registered engines.
@return [Hash] URL queue engine classes, keyed by each engine's NAME as a symbol
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 35
    def self.get_engines
      @engine_list = @engine_list || []
      # Build a lookup table from engine name (as a symbol) to engine class
      @engine_list.inject({}) do |memo, klass|
        memo[klass::NAME.intern] = klass
        memo
      end
    end
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 29
    # Ruby calls this hook for each new subclass; the subclass is appended to the engine list
    def self.inherited(klass)
      @engine_list = (@engine_list || []) + [klass]
    end
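The two class methods above work together as a small registry. The following is a minimal sketch of that behaviour, not the library's own code: `URLQueueEngine` here is a top-level stand-in reduced to the two registry methods, and `MemoryEngine` and `FileEngine` are hypothetical engines invented for illustration.

```ruby
# Stand-in for the base class, reduced to the two registry methods.
class URLQueueEngine
  # Ruby invokes this hook whenever a class inherits from URLQueueEngine.
  def self.inherited(klass)
    @engine_list = (@engine_list || []) + [klass]
  end

  # NAME is read lazily here, so it may be defined after the hook has run.
  def self.get_engines
    @engine_list = @engine_list || []
    @engine_list.inject({}) do |memo, klass|
      memo[klass::NAME.intern] = klass
      memo
    end
  end
end

# Hypothetical engines; the class names and NAME values are illustrative only.
class MemoryEngine < URLQueueEngine
  NAME = 'memory'
end

class FileEngine < URLQueueEngine
  NAME = 'file'
end

engines = URLQueueEngine.get_engines
p engines.keys # [:memory, :file]
```

Because the lookup uses `klass::NAME`, every engine in this sketch has to define a `NAME` constant; a subclass without one would raise a `NameError` when `get_engines` is called.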
Public Instance Methods
Add a URL together with its reference URL.
@param [String] url URL
@param [String] ref_url reference URL
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 94
    def add(url, ref_url = '')
      raise NotImplementedError
    end
Get all URLs with their status.
@return [Array] URL list
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 117
    def all
      raise NotImplementedError
    end
Clear the URLQueue.
@return [Fixnum] number of URLs removed
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 100
    def clear
      raise NotImplementedError
    end
Find all visited URLs that have the given processing state in a module.
@param [String] module_name
@param [String] state
@param [Fixnum] max_depth maximum depth of the URLs returned (inclusive)
@return [Array] URL list
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 71
    def find_all(module_name, state, max_depth = -1)
      raise NotImplementedError
    end
Find one visited URL that has the given processing state in a module.
@param [String] module_name
@param [String] state one of unprocessed, processing, processed
@param [Fixnum] max_depth maximum depth of the URL returned (inclusive)
@return [String, nil] URL
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 80
    def find_one(module_name, state, max_depth = -1)
      raise NotImplementedError
    end
Get the list of unvisited URLs.
@param [Fixnum] max_depth maximum depth of the URLs returned
@return [Array] unvisited URLs, optionally limited to the maximum depth
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 87
    def find_unvisited(max_depth = -1)
      raise NotImplementedError
    end
Set the processing state of a URL in the given module.
@param [String] module_name
@param [String] url
@param [String] state one of unprocessed, processing, processed
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 47
    def mark(module_name, url, state)
      raise NotImplementedError
    end
Change all URLs in one state to another state.
@param [String] module_name
@param [String] new_state new state
@param [String] orig_state original state
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 55
    def mark_all(module_name, new_state, orig_state = nil)
      raise NotImplementedError
    end
Mark all URLs as unvisited
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 111
    def mark_all_unvisited
      raise NotImplementedError
    end
Mark a URL as visited.
@param [String] url
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 106
    def mark_visited(url)
      raise NotImplementedError
    end
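The visit-tracking methods (add, find_unvisited, mark_visited, mark_all_unvisited, clear) can be sketched with a plain in-memory hash. This is only an illustration under stated assumptions, not a shipped engine: the `MemoryVisitQueue` class and its `Entry` struct are hypothetical, the depth rule (0 for a URL whose reference is unknown, otherwise one more than the reference's depth) is assumed, and a negative `max_depth` is assumed to mean "no limit". The class stands alone instead of subclassing the base class so that it can be run by itself.

```ruby
# Hypothetical in-memory engine covering only the visit-tracking methods.
class MemoryVisitQueue
  Entry = Struct.new(:ref_url, :depth, :visited)

  def initialize
    @entries = {} # url => Entry
  end

  # Assumption: a URL whose reference is unknown has depth 0,
  # otherwise its depth is one more than its reference's depth.
  def add(url, ref_url = '')
    return if @entries.key?(url)
    parent = @entries[ref_url]
    depth = parent ? parent.depth + 1 : 0
    @entries[url] = Entry.new(ref_url, depth, false)
  end

  def mark_visited(url)
    entry = @entries[url]
    entry.visited = true if entry
  end

  def mark_all_unvisited
    @entries.each_value { |entry| entry.visited = false }
  end

  # Assumption: a negative max_depth means "no depth limit".
  def find_unvisited(max_depth = -1)
    matching = @entries.select do |_url, entry|
      !entry.visited && (max_depth < 0 || entry.depth <= max_depth)
    end
    matching.keys
  end

  # Returns the number of URLs removed.
  def clear
    removed = @entries.size
    @entries.clear
    removed
  end
end

queue = MemoryVisitQueue.new
queue.add('http://example.com/')
queue.add('http://example.com/a', 'http://example.com/')
queue.mark_visited('http://example.com/')
p queue.find_unvisited    # ["http://example.com/a"]
p queue.find_unvisited(0) # []
p queue.clear             # 2
```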
Produce the next unprocessed URL and mark it as processing.
@param [String] module_name
@return [String, nil]
    # File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 62
    def next_unprocessed(module_name)
      raise NotImplementedError
    end
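The per-module processing states handled by mark, mark_all and next_unprocessed can be sketched the same way. The class below is a hypothetical in-memory illustration, not an engine from the library. It assumes that a URL with no recorded state counts as unprocessed and that a nil `orig_state` in `mark_all` means "every URL, whatever its state". The `state_of` helper and the `'parser'` and `'indexer'` module names are invented for the example, and visit tracking is left out for brevity.

```ruby
# Hypothetical in-memory store for per-module processing states.
class MemoryStateQueue
  UNPROCESSED = 'unprocessed'
  PROCESSING  = 'processing'
  PROCESSED   = 'processed'

  def initialize(urls)
    @urls = urls
    # module_name => { url => state }
    @states = Hash.new { |hash, name| hash[name] = {} }
  end

  # Helper (not part of the documented interface).
  # Assumption: a URL with no recorded state counts as unprocessed.
  def state_of(module_name, url)
    @states[module_name].fetch(url, UNPROCESSED)
  end

  def mark(module_name, url, state)
    @states[module_name][url] = state
  end

  # Assumption: a nil orig_state means "every URL, whatever its state".
  def mark_all(module_name, new_state, orig_state = nil)
    @urls.each do |url|
      next unless orig_state.nil? || state_of(module_name, url) == orig_state
      mark(module_name, url, new_state)
    end
  end

  # Returns the next unprocessed URL and marks it as processing,
  # or returns nil when nothing is left for this module.
  def next_unprocessed(module_name)
    url = @urls.find { |u| state_of(module_name, u) == UNPROCESSED }
    mark(module_name, url, PROCESSING) if url
    url
  end
end

queue = MemoryStateQueue.new(['http://example.com/a', 'http://example.com/b'])
p queue.next_unprocessed('parser') # "http://example.com/a"
p queue.next_unprocessed('parser') # "http://example.com/b"
p queue.next_unprocessed('parser') # nil
queue.mark_all('parser', 'unprocessed', 'processing')
p queue.next_unprocessed('parser') # "http://example.com/a"
```

Keeping states in a hash keyed by module name means each module walks the queue independently, so one module marking a URL as processing does not hide it from another.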