class Wmap::GoogleSearchScraper

We build our own Google search class by querying the Google search engine through its web interface, simulating an anonymous web surfer. Note: we don't use the native Google API because of its pricing structure. This project has no budget, and the free tier is limited to 100 queries per day, which is too few for our needs. See https://github.com/google/google-api-ruby-client for details.

Constants

File_keywords

File containing the Google search keywords

File_locator

File containing the Google search engine web interface locators

Attributes

discovered_sites_from_scraper[R]
discovered_urls_from_scraper[R]
http_timeout[RW]
keyword_list[RW]
verbose[RW]

Public Class Methods

new(params = {})

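Sets the scraper's default variables. Accepts the optional keys :verbose (default false) and :http_timeout (default 5000), and creates empty stores for the discovered URLs and sites.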

# File lib/wmap/google_search_scraper.rb, line 29
def initialize(params = {})
        @verbose=params.fetch(:verbose, false)
        @http_timeout=params.fetch(:http_timeout, 5000)
        # Discovered data store
        @discovered_urls_from_scraper=Hash.new
        @discovered_sites_from_scraper=Hash.new
end
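
The constructor reads its options with Hash#fetch, so any key that is omitted falls back to its default. A minimal sketch of that pattern follows. It uses a hypothetical stand-in class, ScraperOptionsSketch, rather than the real Wmap::GoogleSearchScraper, so that it runs without the wmap gem installed:

```ruby
# Hypothetical stand-in that mirrors only the option handling of the
# constructor documented above; it is not the real Wmap class.
class ScraperOptionsSketch
  attr_accessor :verbose, :http_timeout

  def initialize(params = {})
    @verbose = params.fetch(:verbose, false)
    @http_timeout = params.fetch(:http_timeout, 5000)
  end
end

# No options given: both defaults apply.
defaults = ScraperOptionsSketch.new

# Both options overridden by the caller.
custom = ScraperOptionsSketch.new(verbose: true, http_timeout: 10_000)

puts defaults.http_timeout
puts custom.http_timeout
```

With the real class the call would presumably look the same, for example Wmap::GoogleSearchScraper.new(verbose: true).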

Public Instance Methods

get_discovered_sites_from_scraper()

Getter for the sites discovered by the Google search. Returns the sites as a sorted array.

# File lib/wmap/google_search_scraper.rb, line 138
def get_discovered_sites_from_scraper
        puts "Getter for the discovered sites by the scraper. " if @verbose
        begin
                return @discovered_sites_from_scraper.keys.sort
        rescue => ee
                puts "Error on method get_discovered_sites_from_scraper: #{ee}" if @verbose
        end
end
Also aliased as: print
get_discovered_urls_from_scraper()

Getter for the URLs discovered by the Google search. Returns the URLs as a sorted array.

# File lib/wmap/google_search_scraper.rb, line 149
def get_discovered_urls_from_scraper
        puts "Getter for the discovered urls by the scraper. " if @verbose
        begin
                return @discovered_urls_from_scraper.keys.sort
        rescue => ee
                puts "Error on method get_discovered_urls_from_scraper: #{ee}" if @verbose
        end
end
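
Both getters return the sorted keys of a Hash, which works as a simple de-duplicating store. The sketch below illustrates that idea in isolation. The stored values are not shown in the documented source, so true is used here as a placeholder, and the example.com URLs are made up:

```ruby
# Assumption: discovered items are kept as Hash keys; the value stored
# against each key is a placeholder, since the real one is not documented.
discovered = {}
discovered["https://www.example.com/"] = true
discovered["https://app.example.com/"] = true
discovered["https://www.example.com/"] = true # same key again, stored once

# Same expression the getters use: unique keys in sorted order.
sorted = discovered.keys.sort
puts sorted
```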
google_worker(keyword)

Main worker method. It simulates Google keyword searches against the Google search front ends of 100+ countries and regions, and extracts the links to web services that Google has indexed for the keyword. Returns a sorted array of unique links, or nil if an exception occurs.

# File lib/wmap/google_search_scraper.rb, line 38
def google_worker(keyword)
        begin
                puts "Start the Google worker for: #{keyword}" if @verbose
                links=Array.new
                # Check for nil before calling strip; a nil keyword yields no links
                return links if keyword.nil?
                keyword=keyword.strip
                google_locators = file_2_list(File_locator)
                google_locators.each do |locator|
                        doc=google_search(locator,keyword)
                        links+=extract_links(doc) unless doc.nil?
                end
                # Remove blank and nil entries before sorting, since sort fails on nil
                return (links-["",nil]).uniq.sort
        rescue StandardError => ee
                puts "Exception on the method google_worker for #{keyword}: #{ee}" if @verbose
                return nil
        end
end
Also aliased as: worker, search
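
The per-locator loop in this method can be sketched on its own as follows. This is only an illustration under stated assumptions: fake_search, collect_links, and the FAKE_RESULTS table are hypothetical stand-ins for google_search, extract_links, and the locator file, none of whose implementations appear in this documentation, and no network request is made:

```ruby
# Hypothetical canned results keyed by locator. A nil value simulates a
# search that returned no document.
FAKE_RESULTS = {
  "locator-a" => ["https://www.example.com/", "https://shop.example.com/"],
  "locator-b" => ["https://www.example.com/", ""],
  "locator-c" => nil
}.freeze

# Stand-in for a search plus link extraction against one locator.
def fake_search(locator, _keyword)
  FAKE_RESULTS[locator]
end

# Run the keyword against every locator and merge the results.
def collect_links(keyword, locators)
  return [] if keyword.nil?

  keyword = keyword.strip
  links = []
  locators.each do |locator|
    found = fake_search(locator, keyword)
    links += found unless found.nil?
  end
  # Drop blanks and nils before sorting; sorting an array that mixes
  # nil and String raises ArgumentError.
  (links - ["", nil]).uniq.sort
end

links = collect_links(" example ", FAKE_RESULTS.keys)
puts links
```

Filtering before sorting matters because a single nil entry is enough to make Array#sort fail.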
google_workers(keyword_list=file_2_list(File_keywords))

Main method to collect intelligence from Google's vast data warehouse. It works by querying the Google engines with each keyword in the list. This exhaustive method sweeps through the Google engines of 100+ countries and regions one by one, in order to collect all related web service links known to Google across the global Internet. Returns a sorted array of unique links, or nil if an exception occurs.

# File lib/wmap/google_search_scraper.rb, line 58
def google_workers(keyword_list=file_2_list(File_keywords))
        begin
                puts "Start the Google worker for: #{keyword_list}" if @verbose
                links=Array.new
                keyword_list.each do |keyword|
                        # google_worker returns nil on failure; skip that keyword
                        found=google_worker(keyword)
                        links+=found unless found.nil?
                end
                return links.uniq.sort
        rescue StandardError => ee
                puts "Exception on the method google_workers for #{keyword_list}: #{ee}" if @verbose
                return nil
        end
end
Also aliased as: workers
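
Because google_worker returns nil when a search fails, merging results across keywords needs to tolerate nil. One way to do that is sketched below. The worker lambda and its canned results are hypothetical and exist only to make the example runnable; this is not the library's own implementation:

```ruby
# Hypothetical per-keyword worker: returns an Array of links, or nil
# when the search for that keyword failed.
worker = lambda do |keyword|
  case keyword
  when "alpha" then ["https://b.example.com/", "https://a.example.com/"]
  when "beta"  then ["https://a.example.com/"]
  end
end

keywords = ["alpha", "broken", "beta"]

# Array(nil) is [], so a failed keyword contributes nothing instead of
# raising TypeError when its result is merged.
all_links = keywords.flat_map { |keyword| Array(worker.call(keyword)) }.uniq.sort
puts all_links
```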
print_discovered_sites_from_scraper()

Prints the discovered sites.

print_discovered_urls_from_scraper()

Prints the discovered URLs.

save(file)
Alias for: save_discovered_sites_from_scraper
save_discovered_sites_from_scraper(file)

Save the discovered sites into a local file

# File lib/wmap/google_search_scraper.rb, line 159
def save_discovered_sites_from_scraper(file)
        puts "Save the discovery result(sites) into a local file: #{file}" if @verbose
        begin
                timestamp=Time.now
                # The block form closes the file even if a write fails
                File.open(file, 'w') do |f|
                        f.puts "# Discovery result written by Wmap::GoogleSearchScraper.save_discovered_sites_from_scraper method at #{timestamp}"
                        @discovered_sites_from_scraper.keys.sort.each { |x| f.puts x }
                end
                raise "Unknown problem saving the result to file: #{file}" unless File.exist?(file)
                puts "Done saving the discovery result into the local file: #{file}"
        rescue => ee
                puts "Error on method save_discovered_sites_from_scraper: #{ee}" if @verbose
        end
end
Also aliased as: save
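
The saved file consists of one comment line carrying a timestamp, followed by one site per line in sorted order. The sketch below reproduces that layout with placeholder data and a temporary file, then reads it back. The example.com entries and the shortened header text are assumptions made for illustration:

```ruby
require "tempfile"

# Placeholder data; real keys would be the sites found by the scraper.
discovered_sites = {
  "https://www.example.com/" => true,
  "https://app.example.com/" => true
}

saved_lines = nil
Tempfile.create("discovered_sites") do |tmp|
  # File.open with a block closes the handle even if a write raises.
  File.open(tmp.path, "w") do |f|
    f.puts "# Discovery result written at #{Time.now}"
    discovered_sites.keys.sort.each { |site| f.puts site }
  end
  # Read the file back before the temporary file is removed.
  saved_lines = File.readlines(tmp.path, chomp: true)
end

puts saved_lines
```

A reader of such a file can skip the header by ignoring lines that start with "#".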
worker(keyword)
Alias for: google_worker
workers(keyword_list=file_2_list(File_keywords))
Alias for: google_workers