class NewsCrawler::Storage::URLQueue::URLQueueEngine

Base class for URLQueue engines. To create a new URLQueue engine, subclass it and implement all of its methods, keeping each method's signature unchanged.

Public Class Methods

get_engines()

Get the engine list.
@return [ Hash ] URL queue engine classes, keyed by each class's NAME constant as a symbol

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 35
def self.get_engines
  @engine_list = @engine_list || []
  @engine_list.inject({}) do | memo, klass |
    memo[klass::NAME.intern] = klass
    memo
  end
end
inherited(klass)
# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 29
def self.inherited(klass)
  @engine_list = (@engine_list || []) + [klass]
end
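
The two class methods above form a small registry: `inherited` records every direct subclass, and `get_engines` builds a lookup keyed by each subclass's `NAME`. The sketch below reproduces that mechanism on a stand-in class so it runs without the gem. `URLQueueEngineSketch`, `MemoryEngine` and the name `'memory'` are hypothetical and only serve the illustration.

```ruby
# Stand-in for the base class, reduced to the two class methods listed above.
class URLQueueEngineSketch
  # Ruby calls this hook whenever a class inherits from the receiver.
  def self.inherited(klass)
    @engine_list = (@engine_list || []) + [klass]
  end

  # Builds { name_symbol => engine_class } from the recorded subclasses.
  def self.get_engines
    @engine_list = @engine_list || []
    @engine_list.inject({}) do |memo, klass|
      memo[klass::NAME.intern] = klass
      memo
    end
  end
end

# Hypothetical engine. NAME is only read when get_engines runs, so it can be
# defined in the class body even though the hook fires before the body is evaluated.
class MemoryEngine < URLQueueEngineSketch
  NAME = 'memory'
end

engine_class = URLQueueEngineSketch.get_engines[:memory]
```

Because `@engine_list` is an instance variable of whichever class receives the hook, a subclass of `MemoryEngine` would be recorded on `MemoryEngine` rather than on the base class. Under this sketch, engines therefore need to inherit from the base class directly to appear in the list.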

Public Instance Methods

add(url, ref_url = '')

Add a URL together with its reference URL.
@param [ String ] url URL
@param [ String ] ref_url reference URL

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 94
def add(url, ref_url = '')
  raise NotImplementedError
end
all()

Get all URLs with their status.
@return [ Array ] URL list

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 117
def all
  raise NotImplementedError
end
clear()

Clear the URLQueue.
@return [ Fixnum ] number of URLs removed

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 100
def clear
  raise NotImplementedError
end
find_all(module_name, state, max_depth = -1)

Find all visited URLs that are in the given state for a module.
@param [ String ] module_name
@param [ String ] state
@param [ Fixnum ] max_depth maximum depth of the returned URLs (inclusive)
@return [ Array ] URL list

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 71
def find_all(module_name, state, max_depth = -1)
  raise NotImplementedError
end
find_one(module_name, state, max_depth = -1)

Find one visited URL that is in the given processing state for a module.
@param [ String ] module_name
@param [ String ] state one of unprocessed, processing, processed
@param [ Fixnum ] max_depth maximum depth of the returned URL (inclusive)
@return [ String, nil ] URL

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 80
def find_one(module_name, state, max_depth = -1)
  raise NotImplementedError
end
find_unvisited(max_depth = -1)

Get the list of unvisited URLs.
@param [ Fixnum ] max_depth maximum depth of the returned URLs
@return [ Array ] unvisited URLs, optionally limited to the maximum depth

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 87
def find_unvisited(max_depth = -1)
  raise NotImplementedError
end
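
The base class does not define how `max_depth` is applied, so the following is only a sketch of one plausible reading: a negative value (the default `-1`) means no limit, and the limit is inclusive, as documented for `find_all` and `find_one`. The `QueueEntry` struct and the `unvisited_urls` helper are hypothetical names, not part of the library.

```ruby
# Hypothetical record for a queued URL: its address, crawl depth, and visited flag.
QueueEntry = Struct.new(:url, :depth, :visited)

# Returns the URLs of unvisited entries. A negative max_depth is treated as
# "no limit" (an assumption); otherwise the depth limit is inclusive.
def unvisited_urls(entries, max_depth = -1)
  entries
    .reject(&:visited)
    .select { |entry| max_depth < 0 || entry.depth <= max_depth }
    .map(&:url)
end

entries = [
  QueueEntry.new('http://example.com/', 0, true),
  QueueEntry.new('http://example.com/a', 1, false),
  QueueEntry.new('http://example.com/a/b', 2, false)
]

shallow = unvisited_urls(entries, 1)
```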
mark(module_name, url, state)

Set the processing state of a URL in the given module.
@param [ String ] module_name
@param [ String ] url
@param [ String ] state one of unprocessed, processing, processed

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 47
def mark(module_name, url, state)
  raise NotImplementedError
end
mark_all(module_name, new_state, orig_state = nil)

Change all URLs in one state to another state.
@param [ String ] module_name
@param [ String ] new_state new state
@param [ String ] orig_state original state

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 55
def mark_all(module_name, new_state, orig_state = nil)
  raise NotImplementedError
end
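
As a rough illustration of the `orig_state` parameter, the sketch below applies a bulk state change to a plain hash of `url => state` pairs for a single module. It assumes that `orig_state = nil` means "every URL regardless of its current state". The documentation does not confirm this, and `mark_all_sketch` is a hypothetical helper.

```ruby
# states: { url => state } for one module (hypothetical in-memory layout).
# When orig_state is nil, every URL is moved to new_state (an assumption);
# otherwise only URLs currently in orig_state are changed.
def mark_all_sketch(states, new_state, orig_state = nil)
  states.each_key do |url|
    states[url] = new_state if orig_state.nil? || states[url] == orig_state
  end
  states
end

states = {
  'http://example.com/a' => 'processing',
  'http://example.com/b' => 'processed'
}

# Reset only the URLs that were left in the processing state.
mark_all_sketch(states, 'unprocessed', 'processing')
```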
mark_all_unvisited()

Mark all URLs as unvisited

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 111
def mark_all_unvisited
  raise NotImplementedError
end
mark_visited(url)

Mark a URL as visited.
@param [ String ] url

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 106
def mark_visited(url)
  raise NotImplementedError
end
next_unprocessed(module_name)

Return the next unprocessed URL and mark it as processing.
@param [ String ] module_name
@return [ String, nil ]

# File lib/news_crawler/storage/url_queue/url_queue_engine.rb, line 62
def next_unprocessed(module_name)
  raise NotImplementedError
end
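
The documented behaviour of `next_unprocessed` can be read as `find_one` followed by `mark`. The sketch below shows that composition on a hypothetical in-memory class, `MemoryQueueSketch`. It makes several assumptions: a URL with no recorded state counts as unprocessed, the visited flag and `max_depth` are ignored, and no locking is done, so it is not safe for concurrent workers.

```ruby
# Hypothetical in-memory queue illustrating the per-module state flow.
class MemoryQueueSketch
  def initialize
    @urls = []                                  # URLs in insertion order
    @states = Hash.new { |h, k| h[k] = {} }     # module_name => { url => state }
  end

  def add(url, ref_url = '')
    @urls << url unless @urls.include?(url)
  end

  def mark(module_name, url, state)
    @states[module_name][url] = state
  end

  # URLs without a recorded state are treated as 'unprocessed' (assumption).
  def find_one(module_name, state, max_depth = -1)
    @urls.find { |url| (@states[module_name][url] || 'unprocessed') == state }
  end

  # Picks one unprocessed URL, marks it as processing, and returns it (or nil).
  def next_unprocessed(module_name)
    url = find_one(module_name, 'unprocessed')
    mark(module_name, url, 'processing') if url
    url
  end
end

queue = MemoryQueueSketch.new
queue.add('http://example.com/a')
first = queue.next_unprocessed('parser')
```

Since state is tracked per module, a second module name would still see the same URL as unprocessed in this sketch.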