class Greenmonster::Spider
The Gameday XML Spider utility
Public Instance Methods
pull_day(date, sport_code)
Pull Gameday XML files for a given date. By default, the spider pulls games with a sport_code of ‘mlb’ (games played by MLB teams rather than MiLB teams or foreign teams) and pulls games on the current date.
Example:
# Pull games from July 1, 2011
>> Gameday::Spider.pull_day({:date => Date.new(2011,7,1), :games_folder => '/Users/geoff/games'})
Arguments:
args: (Hash)
# File lib/greenmonster/spider.rb, line 37
def pull_day(date, sport_code)
  game_links_on_gameday_date_page(date, sport_code).each do |game_id|
    pull_game(game_id, date)
  end
end
pull_days(range, sport_code)
Pull Gameday XML files for a range of dates. The args hash passes options, such as the games_folder location, on to Spider.pull.
Example:
# Pull all games in MLB in July 2011
>> Gameday::Spider.pull_days(Date.new(2011,7,1)..Date.new(2011,7,31), {:games_folder => '/Users/geoff/games'})
Arguments:
range: (Range)
args: (Hash)
# File lib/greenmonster/spider.rb, line 55
def pull_days(range, sport_code)
  range.each { |date| self.pull_day(date, sport_code) }
end
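To illustrate how a date range fans out into one pull per day, here is a minimal sketch. `RecordingSpider` is a hypothetical stand-in, not part of Greenmonster: it mirrors the shape of `pull_days` in the listing above but only records its calls instead of downloading anything.

```ruby
require 'date'

# Hypothetical stand-in for the spider: records calls, performs no network access.
class RecordingSpider
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Records the arguments it was called with.
  def pull_day(date, sport_code)
    @calls << [date, sport_code]
  end

  # Same shape as the listing: one pull_day call per date in the range.
  def pull_days(range, sport_code)
    range.each { |date| pull_day(date, sport_code) }
  end
end

spider = RecordingSpider.new
spider.pull_days(Date.new(2011, 7, 1)..Date.new(2011, 7, 31), 'mlb')
puts spider.calls.length
```

A Range of Date objects steps one day at a time, so the range endpoints are both included.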
pull_game(game_id, date)
Pull Gameday XML files for a given game, specified by the game ID. If the date and sport code are not specified as options, they are guessed from the game ID string: the sport code from the home team’s entry and the date from the scheduled date values in the ID.
Example:
>> Gameday::Spider.pull_game('',{:games_folder => })
# File lib/greenmonster/spider.rb, line 15
def pull_game(game_id, date)
  make_folders_for_game(game_id, date)

  %w(boxscore.xml game_events.xml inning_all.xml linescore.xml players.xml).each do |file_name|
    copy_gameday_xml(game_id, date, file_name)
  end
end
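The description above mentions guessing the date and sport code from the game ID. The sketch below shows one way that could work. `guess_date_and_sport_code` is a hypothetical helper, not a Greenmonster method, and it assumes IDs shaped like `gid_YYYY_MM_DD_<away team><sport code>_<home team><sport code>_<game number>` with no trailing slash.

```ruby
require 'date'

# Hypothetical helper (not part of Greenmonster). Assumes the ID has exactly
# seven underscore-separated fields and that the home team field ends in a
# three-character sport code.
def guess_date_and_sport_code(game_id)
  _prefix, year, month, day, _away, home, _number = game_id.split('_')

  {
    date: Date.new(year.to_i, month.to_i, day.to_i),
    sport_code: home[-3, 3]
  }
end

guess = guess_date_and_sport_code('gid_2011_07_04_tormlb_bosmlb_1')
puts guess[:date]
puts guess[:sport_code]
```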
Private Instance Methods
copy_gameday_xml(game_id, date, file_name)
# File lib/greenmonster/spider.rb, line 123
def copy_gameday_xml(game_id, date, file_name)
  download = download_gameday_xml(game_id, date, file_name)

  unless download.include?('404 Not Found')
    open(local_game_path(game_id, date) + inning_prefix(file_name) + file_name, 'w') do |file|
      file.write(download)
    end
  end
end
download_gameday_xml(game_id, date, file_name)
# File lib/greenmonster/spider.rb, line 110
def download_gameday_xml(game_id, date, file_name)
  self.class.get(remote_file_url(game_id, date, file_name)).body.force_encoding("ISO-8859-1").encode("UTF-8")
end
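The encoding step in `download_gameday_xml` can be tried in isolation. This sketch uses a made-up byte string rather than a real Gameday response: it relabels raw bytes as ISO-8859-1 and then transcodes them to UTF-8, which is what the chained `force_encoding` and `encode` calls do.

```ruby
# Made-up sample: the bytes of "José" as an ISO-8859-1 server would send them.
# String#b returns a binary (ASCII-8BIT) copy, similar to an untagged HTTP body.
raw = "Jos\xE9".b

# force_encoding only relabels the bytes; encode actually converts them.
utf8 = raw.force_encoding('ISO-8859-1').encode('UTF-8')

puts utf8
puts utf8.encoding
```

Without the `force_encoding` step, the 0xE9 byte would not be valid UTF-8 and later string operations could raise an encoding error.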
format_date_as_folder(date)
# File lib/greenmonster/spider.rb, line 133
def format_date_as_folder(date)
  Greenmonster.format_date_as_folder(date)
end
game_links_on_gameday_date_page(date, sport_code)
# File lib/greenmonster/spider.rb, line 72
def game_links_on_gameday_date_page(date, sport_code)
  links_on_gameday_date_page(date, sport_code).select do |link|
    link[0,4] == "gid_" && link[-5,4] != "_bak"
  end
end
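The filter in `game_links_on_gameday_date_page` can be exercised on its own. The hrefs below are invented examples of what a date listing might contain, not output captured from the Gameday server.

```ruby
# Invented sample hrefs; a real listing page may look different.
links = [
  '../',
  'gid_2011_07_04_tormlb_bosmlb_1/',
  'gid_2011_07_04_tormlb_bosmlb_1_bak/',
  'master_scoreboard.xml'
]

# Same predicate as the listing: keep links that start with "gid_" and
# whose four characters before the final character are not "_bak".
game_links = links.select do |link|
  link[0, 4] == 'gid_' && link[-5, 4] != '_bak'
end

puts game_links
```

Note that `link[-5, 4]` skips the last character, so the `_bak` check only matches when the href ends with one extra character such as a trailing slash.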
gameday_date_and_sport_code_url(date, sport_code)
# File lib/greenmonster/spider.rb, line 82
def gameday_date_and_sport_code_url(date, sport_code)
  "#{gameday_url_root}#{sport_code}/#{format_date_as_folder(date)}"
end
gameday_game_url(game_id, date)
# File lib/greenmonster/spider.rb, line 86
def gameday_game_url(game_id, date)
  gameday_url_root + remote_game_path(game_id, date)
end
gameday_url_root()
# File lib/greenmonster/spider.rb, line 78
def gameday_url_root
  "http://gd2.mlb.com/components/game/"
end
get_gameday_date_page(date, sport_code)
# File lib/greenmonster/spider.rb, line 62
def get_gameday_date_page(date, sport_code)
  self.class.get(gameday_date_and_sport_code_url(date, sport_code))
end
home_sport_code_from_game_id(game_id)
# File lib/greenmonster/spider.rb, line 94
def home_sport_code_from_game_id(game_id)
  game_id[-5,3]
end
inning_prefix(file_name)
# File lib/greenmonster/spider.rb, line 98
def inning_prefix(file_name)
  if file_name =~ /inning/
    'inning/'
  else
    ''
  end
end
links_on_gameday_date_page(date, sport_code)
# File lib/greenmonster/spider.rb, line 66
def links_on_gameday_date_page(date, sport_code)
  Nokogiri::XML(get_gameday_date_page(date, sport_code)).search('a').map do |a|
    a.attribute('href').value
  end
end
local_game_path(game_id, date)
# File lib/greenmonster/spider.rb, line 114
def local_game_path(game_id, date)
  Pathname.new(
    Greenmonster.games_folder +
    home_sport_code_from_game_id(game_id) +
    format_date_as_folder(date) +
    game_id
  )
end
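A rough sketch of how the local path might be assembled, under two assumptions not confirmed by this page: that `Greenmonster.games_folder` is a `Pathname`, so `+` joins segments with a separator, and that `format_date_as_folder` returns a `year_YYYY/month_MM/day_DD` string. The `/tmp/games` folder is a placeholder.

```ruby
require 'pathname'

# Assumption: the games folder is a Pathname (placeholder location).
games_folder = Pathname.new('/tmp/games')

game_id = 'gid_2011_07_04_tormlb_bosmlb_1'

# Same slice as home_sport_code_from_game_id in the listing.
sport_code = game_id[-5, 3]

# Assumed output of format_date_as_folder for July 4, 2011.
date_folder = 'year_2011/month_07/day_04'

# Pathname#+ joins each segment with a separator.
path = games_folder + sport_code + date_folder + game_id
puts path
```

If the games folder were a plain String instead, `+` would concatenate the segments with no separators, so the segments themselves would need to carry their slashes.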
make_folders_for_game(game_id, date)
# File lib/greenmonster/spider.rb, line 137
def make_folders_for_game(game_id, date)
  FileUtils.mkdir_p(local_game_path(game_id, date) + 'inning')
end
remote_file_url(game_id, date, file_name)
# File lib/greenmonster/spider.rb, line 106
def remote_file_url(game_id, date, file_name)
  gameday_game_url(game_id, date) + '/' + inning_prefix(file_name) + '/' + file_name
end
remote_game_path(game_id, date)
# File lib/greenmonster/spider.rb, line 90
def remote_game_path(game_id, date)
  "#{home_sport_code_from_game_id(game_id)}/#{format_date_as_folder(date)}/#{game_id}"
end
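For a concrete picture of the remote game URL, the sketch below composes the pieces by hand. It assumes that `format_date_as_folder` yields a `year_YYYY/month_MM/day_DD` string, which this page does not show, and it uses a sample game ID with no trailing slash.

```ruby
# Root URL as returned by gameday_url_root in the listing.
url_root = 'http://gd2.mlb.com/components/game/'

game_id = 'gid_2011_07_04_tormlb_bosmlb_1'

# Assumed output of format_date_as_folder for July 4, 2011.
date_folder = 'year_2011/month_07/day_04'

# Same interpolation as remote_game_path: sport code, date folder, game ID.
remote_game_path = "#{game_id[-5, 3]}/#{date_folder}/#{game_id}"

# Same concatenation as gameday_game_url.
game_url = url_root + remote_game_path
puts game_url
```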