class IncludedInFile

A specialized class that uses a plain text file to track stored items. It supports three operations: new, <<, and include?. Together these can be used to add items to the text file and later determine whether a given item has already been added.

To use it with Spider, pass an instance to the check_already_seen_with method:

Spider.start_at('http://example.com/') do |s|
  s.check_already_seen_with IncludedInFile.new('/tmp/crawled.log')
end

Public Class Methods

new(filepath)

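Construct a new IncludedInFile instance. The file is created if it does not exist, and any lines already in it are loaded as previously stored items.
@param filepath [String] path of the file in which crawled URLs are stored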

# File lib/spider/included_in_file.rb, line 15
def initialize(filepath)
  @filepath = filepath
  # create file if not exists
  File.write(@filepath, '') unless File.file?(@filepath)
  @urls = File.readlines(@filepath).map(&:chomp)
end

Public Instance Methods

<<(v)

Add an item to both the file and the in-memory array of URLs.

# File lib/spider/included_in_file.rb, line 23
def <<(v)
  @urls << v.to_s
  File.write(@filepath, "#{v}\r\n", File.size(@filepath), mode: 'a')
end

include?(v)

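True if the item has been added. The check is made against the in-memory array, which holds the lines loaded from the file at construction plus anything added since.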

# File lib/spider/included_in_file.rb, line 29
def include?(v)
  @urls.include? v.to_s
end
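
The three operations can also be tried outside of Spider. The following is a minimal sketch, not the library's own code: it defines a simplified stand-in for IncludedInFile that mirrors the listings above (appending with File.open instead of the File.write call shown), so that it runs without the gem installed. The helper name included_in_file_demo and the temporary directory are assumptions made for illustration only.

```ruby
require 'tmpdir'

# Simplified stand-in mirroring the listings above (an assumption for this
# sketch); with the gem loaded, the real class would be used instead.
class IncludedInFile
  def initialize(filepath)
    @filepath = filepath
    # Create the file if it does not exist, then load any stored lines.
    File.write(@filepath, '') unless File.file?(@filepath)
    @urls = File.readlines(@filepath).map(&:chomp)
  end

  def <<(v)
    @urls << v.to_s
    # Append one item per line, using the same "\r\n" terminator as above.
    File.open(@filepath, 'a') { |f| f.write("#{v}\r\n") }
  end

  def include?(v)
    @urls.include? v.to_s
  end
end

# Hypothetical helper: exercises new, << and include? against a temp file.
def included_in_file_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'crawled.log')

    seen = IncludedInFile.new(path)
    seen << 'http://example.com/'

    # A second instance reads the same file, so earlier items are still known.
    reloaded = IncludedInFile.new(path)

    {
      added:    seen.include?('http://example.com/'),
      missing:  seen.include?('http://example.com/other'),
      reloaded: reloaded.include?('http://example.com/')
    }
  end
end

p included_in_file_demo
```

Because new reads whatever is already in the file, pointing a later crawl at the same path should let it skip URLs recorded by an earlier run.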