class WaybackArchiver::Sitemap
Parses Sitemaps in the format described at www.sitemaps.org.
Attributes
document[R]
Public Class Methods
new(xml, strict: false)
# File lib/wayback_archiver/sitemap.rb, line 8
def initialize(xml, strict: false)
  @document = REXML::Document.new(xml)
rescue REXML::ParseException => _e
  raise if strict

  @document = REXML::Document.new('')
end
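The constructor's error handling can be sketched in isolation. The helper name `parse_sitemap_xml` is hypothetical (it is not part of the gem), and the sketch assumes Ruby's bundled `rexml` gem. It shows that a mismatched end tag re-raises `REXML::ParseException` in strict mode, while the default mode silently substitutes an empty document with no root element.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of the gem) that mirrors the constructor's
# rescue logic: malformed XML re-raises in strict mode, otherwise it is
# replaced by an empty document.
def parse_sitemap_xml(xml, strict: false)
  REXML::Document.new(xml)
rescue REXML::ParseException
  raise if strict

  REXML::Document.new('')
end

broken = '<urlset><url></urlset>' # the <url> end tag is missing

lenient = parse_sitemap_xml(broken)
p lenient.root # => nil, the fallback document has no root element

begin
  parse_sitemap_xml(broken, strict: true)
rescue REXML::ParseException => e
  puts "strict mode raised #{e.class}"
end
```

Because the lenient fallback yields an empty document, callers that need to distinguish "broken XML" from "no URLs" would presumably pass `strict: true`.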
Public Instance Methods
plain_document?()
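Check if the sitemap is a plain document, i.e. the parsed document contains no XML elements (for example, a text file with one URL per line).
@return [Boolean] whether the document is plain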
# File lib/wayback_archiver/sitemap.rb, line 36
def plain_document?
  document.elements.empty?
end
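`plain_document?` only asks whether REXML found any elements. A minimal sketch, assuming Ruby's bundled `rexml` gem, contrasts a parsed XML sitemap with the empty fallback document that the non-strict constructor creates; the variable names are illustrative.

```ruby
require 'rexml/document'

# An XML sitemap parses into a document with a root element.
xml_doc = REXML::Document.new('<urlset><url><loc>https://example.com/</loc></url></urlset>')

# The constructor's non-strict fallback is an empty document.
empty_doc = REXML::Document.new('')

# plain_document? performs this same check on the parsed document.
puts xml_doc.elements.empty?   # => false
puts empty_doc.elements.empty? # => true
```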
root_name()
Return the name of the document's root element, if there is one.
@return [String, nil] the document root name, or nil if the document has no root
# File lib/wayback_archiver/sitemap.rb, line 42
def root_name
  return unless document.root

  document.root.name
end
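The guard in `root_name` matters because the empty fallback document has no root, so `document.root` is `nil`. This sketch, assuming Ruby's bundled `rexml` gem, shows the unguarded call failing and a hypothetical lambda `guarded_root_name` (equivalent in behavior to the method, written with safe navigation) returning `nil` instead.

```ruby
require 'rexml/document'

empty = REXML::Document.new('') # what the non-strict constructor falls back to

# Without the guard, calling .name on a nil root raises NoMethodError.
begin
  empty.root.name
rescue NoMethodError
  puts 'unguarded access fails on an empty document'
end

# Hypothetical guarded equivalent of root_name: nil instead of an exception.
guarded_root_name = ->(doc) { doc.root&.name }
p guarded_root_name.call(empty)                            # => nil
p guarded_root_name.call(REXML::Document.new('<urlset/>')) # => "urlset"
```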
sitemap_index?()
Returns true if the Sitemap is a Sitemap index.
@return [Boolean] whether the Sitemap is a Sitemap index
@example Check if a Sitemap is a Sitemap index
  sitemap = Sitemap.new(xml)
  sitemap.sitemap_index?
# File lib/wayback_archiver/sitemap.rb, line 53
def sitemap_index?
  root_name == 'sitemapindex'
end
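Per the sitemaps.org protocol, a Sitemap index uses the root element `sitemapindex` (listing `<sitemap>` entries), while a regular Sitemap uses `urlset` (listing `<url>` entries). The hypothetical `sitemap_kind` below is not part of the gem; it is a sketch, assuming Ruby's bundled `rexml` gem, of how the same root-name comparison can classify a document. Malformed XML is deliberately not rescued here.

```ruby
require 'rexml/document'

# Hypothetical classifier (not part of the gem) built on the same root-name
# comparison as sitemap_index?. Malformed XML is not rescued here.
def sitemap_kind(xml)
  root = REXML::Document.new(xml).root
  return :no_root unless root

  case root.name
  when 'sitemapindex' then :index  # lists other sitemaps in <sitemap> entries
  when 'urlset'       then :urlset # lists pages in <url> entries
  else :unknown
  end
end

p sitemap_kind('<sitemapindex><sitemap><loc>https://example.com/s1.xml</loc></sitemap></sitemapindex>') # => :index
p sitemap_kind('<urlset><url><loc>https://example.com/</loc></url></urlset>')                          # => :urlset
p sitemap_kind('')                                                                                    # => :no_root
```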
sitemaps()
urls()
urlset?()
Private Instance Methods
extract_urls(node_name)
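Extract URLs from the Sitemap. For a plain document, each line of the document (stripped of surrounding whitespace) is returned; otherwise, the text of every loc element under a node_name child of the root element is returned.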
# File lib/wayback_archiver/sitemap.rb, line 69
def extract_urls(node_name)
  return document.to_s.each_line.map(&:strip) if plain_document?

  urls = []
  document.root.elements.each("#{node_name}/loc") do |element|
    urls << element.text
  end
  urls
end
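The XML branch of `extract_urls` relies on the XPath `"#{node_name}/loc"`, evaluated relative to the root element, so the node name must match the document type (`url` for a `urlset`, `sitemap` for a `sitemapindex`). The hypothetical `extract_locs` below mirrors only that branch and assumes Ruby's bundled `rexml` gem. For brevity, the toy inputs omit the default sitemaps.org namespace declaration that real Sitemaps normally carry.

```ruby
require 'rexml/document'

# Hypothetical stand-alone mirror of the XML branch of extract_urls: collect
# the text of every <loc> under a <node_name> child of the root element.
def extract_locs(document, node_name)
  urls = []
  document.root.elements.each("#{node_name}/loc") { |element| urls << element.text }
  urls
end

urlset = REXML::Document.new(
  '<urlset><url><loc>https://example.com/</loc></url>' \
  '<url><loc>https://example.com/about</loc></url></urlset>'
)
index = REXML::Document.new(
  '<sitemapindex><sitemap><loc>https://example.com/sitemap1.xml</loc></sitemap></sitemapindex>'
)

p extract_locs(urlset, 'url')    # => ["https://example.com/", "https://example.com/about"]
p extract_locs(index, 'sitemap') # => ["https://example.com/sitemap1.xml"]
p extract_locs(index, 'url')     # => [] because the node name does not match
```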