class IncludedInFile
A specialized class that uses a plain text file to track stored items. It supports three operations: new, <<, and include?. Together these can be used to add items to the text file and then determine whether an item has already been added.
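The round trip described above can be sketched without the spider gem installed. SeenLog below is a hypothetical stand-in that closely mirrors the constructor, << and include? source listed further down (it appends with mode 'a' alone, without the explicit offset); the temporary directory and file name are assumptions for the demo.

```ruby
require 'tmpdir'

# Hypothetical stand-in mirroring IncludedInFile's documented source,
# so this sketch runs without the spider gem.
class SeenLog
  def initialize(filepath)
    @filepath = filepath
    # Create an empty file on first use, then load any items already stored.
    File.write(@filepath, '') unless File.file?(@filepath)
    @urls = File.readlines(@filepath).map(&:chomp)
  end

  # Record the item in memory and append it to the file, one item per line.
  def <<(v)
    @urls << v.to_s
    File.write(@filepath, "#{v}\r\n", mode: 'a')
  end

  # Membership is checked against the in-memory list, by string form.
  def include?(v)
    @urls.include?(v.to_s)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')

  first = SeenLog.new(path)
  first << 'http://example.com/'
  puts first.include?('http://example.com/')   # prints true
  puts first.include?('http://example.com/a')  # prints false

  # A fresh instance reloads the file, so earlier items are still "seen".
  second = SeenLog.new(path)
  puts second.include?('http://example.com/')  # prints true
end
```

Because the file is reread on construction, the "already seen" state survives across separate crawler runs that point at the same path.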
To use it with Spider, use the check_already_seen_with method:

    Spider.start_at('http://example.com/') do |s|
      s.check_already_seen_with IncludedInFile.new('/tmp/crawled.log')
    end
Public Class Methods
new(filepath)
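Construct a new IncludedInFile instance. If the file does not exist, it is created empty; otherwise the URLs already stored in it, one per line, are loaded into memory.
@param filepath [String] path of the file in which crawled URLs are stored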
    # File lib/spider/included_in_file.rb, line 15
    def initialize(filepath)
      @filepath = filepath
      # create the file if it does not exist
      File.write(@filepath, '') unless File.file?(@filepath)
      @urls = File.readlines(@filepath).map(&:chomp)
    end
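The two steps of the constructor can be exercised in isolation. load_seen is a hypothetical helper (not part of the gem) that repeats them, to show that a missing file is created empty and that CRLF-terminated lines come back without their terminators, since String#chomp with no argument strips a trailing "\r\n".

```ruby
require 'tmpdir'

# Hypothetical helper repeating the two steps of IncludedInFile#initialize:
# create the file if it is missing, then read one item per line.
def load_seen(path)
  File.write(path, '') unless File.file?(path)
  File.readlines(path).map(&:chomp)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')

  # First run: the file does not exist yet, so it is created empty.
  p load_seen(path)       # prints []
  puts File.file?(path)   # prints true

  # Lines terminated with "\r\n" (as << writes them) are loaded without it.
  File.write(path, "http://example.com/\r\nhttp://example.com/about\r\n")
  p load_seen(path)       # prints ["http://example.com/", "http://example.com/about"]
end
```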
Public Instance Methods
<<(v)
Add an item to the in-memory list of URLs and append it to the file as a new line.
    # File lib/spider/included_in_file.rb, line 23
    def <<(v)
      @urls << v.to_s
      File.write(@filepath, "#{v}\r\n", File.size(@filepath), mode: 'a')
    end
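The effect of << on memory and on disk can be shown with a hypothetical helper, append_seen, that repeats its body. One assumption: the sketch appends with mode 'a' alone, because in append mode writes land at the end of the file on POSIX systems, so the File.size offset in the source should not change where the bytes go. Passing a URI object also shows that items are stored by their string form.

```ruby
require 'tmpdir'
require 'uri'

# Hypothetical helper repeating the body of IncludedInFile#<<:
# keep the item's string form in memory and append it as one CRLF-terminated line.
def append_seen(urls, path, v)
  urls << v.to_s
  File.write(path, "#{v}\r\n", mode: 'a')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')
  File.write(path, '')
  urls = []

  append_seen(urls, path, URI('http://example.com/'))
  append_seen(urls, path, 'http://example.com/about')

  p urls             # prints ["http://example.com/", "http://example.com/about"]
  p File.read(path)  # prints "http://example.com/\r\nhttp://example.com/about\r\n"
end
```

Since include? also converts its argument with to_s, a URI and the equivalent string are treated as the same item.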
include?(v)
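Returns true if the item (compared by its string form) has been recorded, either loaded from the file when the instance was created or added since with <<.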
    # File lib/spider/included_in_file.rb, line 29
    def include?(v)
      @urls.include? v.to_s
    end
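Because @urls is an Array, include? scans every stored URL on each call, so lookups slow down as the crawl log grows. A possible alternative, swapped in here and not part of the spider gem, keeps the same file format and methods but holds the URLs in a Set, making membership a hash lookup. SetBackedSeenLog is a hypothetical name.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical variant (not part of the spider gem): same file format and
# methods as IncludedInFile, but membership is kept in a Set so include?
# is a hash lookup instead of a scan over every stored URL.
class SetBackedSeenLog
  def initialize(filepath)
    @filepath = filepath
    File.write(@filepath, '') unless File.file?(@filepath)
    @urls = Set.new(File.readlines(@filepath).map(&:chomp))
  end

  def <<(v)
    @urls << v.to_s
    File.write(@filepath, "#{v}\r\n", mode: 'a')
  end

  def include?(v)
    @urls.include?(v.to_s)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crawled.log')
  log = SetBackedSeenLog.new(path)
  log << 'http://example.com/'
  puts log.include?('http://example.com/')                        # prints true
  puts SetBackedSeenLog.new(path).include?('http://example.com/') # prints true
end
```

Memory use is the same as with the Array, since every URL is still held in memory; only the lookup cost changes. As in the original, calling << twice with the same item still appends a duplicate line to the file.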