class Anaximander::Discovery::Links
Collection of internal links on the given page.
Relative Paths
`Anaximander::Discovery::Links` converts all relative paths into absolute paths using the base URL of the page being crawled.
    require "nokogiri"
    require "open-uri"

    # http://example.com contains: <a href="/contact">Contact</a>
    page = Nokogiri::HTML(URI.open("http://example.com"))
    Anaximander::Discovery::Links.new(page, "http://example.com").to_a
    # => ["http://example.com/contact"]
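The resolution step can be sketched with Ruby's stdlib `URI.join`, used here as a stand-in for the gem's own `Url` class, whose API is not documented on this page. `to_absolute` is a hypothetical helper name, not part of Anaximander.

```ruby
require "uri"

# Hypothetical helper (not part of Anaximander): resolve an href against the
# URL of the crawled page using RFC 3986 reference resolution from stdlib URI.
def to_absolute(base_url, href)
  URI.join(base_url, href).to_s
end

to_absolute("http://example.com", "/contact")                    # => "http://example.com/contact"
to_absolute("http://example.com", "http://example.org/x")        # => "http://example.org/x" (already absolute)
to_absolute("http://example.com/blog/post.html", "archive.html") # => "http://example.com/blog/archive.html"
```

Note that the gem resolves against `@url.base` rather than the full page URL; if `base` holds only the scheme and host, document-relative hrefs such as "archive.html" would resolve differently from the last line above.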
Exclusions
- External links (links outside the domain of the page)
- Hash links (JavaScript-style links with an href of "#")
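A hedged sketch of the two exclusion rules as a single predicate, written with stdlib URI. The name `keep_link?` is hypothetical, and treating "outside the domain" as "different host" is an assumption; the gem itself compares `Url#base` values, whose exact contents are not shown here.

```ruby
require "uri"

# Hypothetical predicate (not part of Anaximander) applying the two exclusion
# rules with Ruby's stdlib URI. Returns true when an href should be kept.
def keep_link?(page_url, href)
  # Hash links: a bare "#" (or a missing href) points nowhere new.
  return false if href.nil? || href.strip == "#"

  target = URI.join(page_url, href)
  # External links: assumed here to mean "different host". Hosts are compared
  # case-insensitively because DNS names are case-insensitive.
  target.host.to_s.downcase == URI(page_url).host.to_s.downcase
rescue URI::InvalidURIError
  false # unparseable hrefs are treated as excluded
end

keep_link?("http://example.com/", "/contact")                            # => true
keep_link?("http://example.com/", "http://www.iana.org/domains/example") # => false
keep_link?("http://example.com/", "#")                                   # => false
```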
Example
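    require "nokogiri"
    require "open-uri"

    page = Nokogiri::HTML(URI.open("http://example.com"))
    Anaximander::Discovery::Links.new(page, "http://example.com").to_a
    # => [] (the only link on example.com points to www.iana.org, an
    #       external domain, so the internal-link filter drops it)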
Attributes
page [R] The parsed page passed to new (read-only).
Public Class Methods
new(page, url)
Parameters
- page [Nokogiri::HTML::Document] Parsed HTML of the page.
- url [String|URI] URL of the page being crawled; relative links on the page are resolved against its base URL.
    # File lib/anaximander/discovery/links.rb, line 40
    def initialize(page, url)
      @page = page
      @url  = Url.new(url)
    end
Public Instance Methods
<=>(other)
    # File lib/anaximander/discovery/links.rb, line 49
    def <=>(other)
      to_a <=> other.to_a
    end
each(&block)
    # File lib/anaximander/discovery/links.rb, line 45
    def each(&block)
      links.each(&block)
    end
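Because `<=>` calls `to_a` on itself, the class presumably mixes in Enumerable (and likely Comparable), though the include lines are not part of the source shown. The sketch below illustrates that delegation pattern with a hypothetical `LinkList` class: `each` alone unlocks Enumerable's methods, and `<=>` alone unlocks Comparable's operators.

```ruby
# Hypothetical stand-in for the delegation pattern used by #each and #<=>.
# Assumes Enumerable and Comparable are mixed in, which the source shown
# here does not confirm.
class LinkList
  include Enumerable
  include Comparable

  def initialize(links)
    @links = links
  end

  # Enumerable builds map, select, to_a, include?, ... on top of #each.
  def each(&block)
    @links.each(&block)
  end

  # Comparable builds ==, <, >, between? ... on top of #<=>.
  # Array#<=> compares element by element.
  def <=>(other)
    to_a <=> other.to_a
  end
end

a = LinkList.new(["http://example.com/a", "http://example.com/b"])
b = LinkList.new(["http://example.com/a", "http://example.com/b"])
a == b                                   # => true  (Comparable#== delegates to <=>)
a.include?("http://example.com/a")       # => true  (Enumerable, via #each)
a < LinkList.new(["http://example.com/b"]) # => true  (first elements differ: "a" < "b")
```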
Private Instance Methods
absolute(link)
    # File lib/anaximander/discovery/links.rb, line 69
    def absolute(link)
      Url.new(link).absolute(@url.base).without_fragment
    rescue URI::InvalidURIError
      nil
    end
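A rough equivalent of `absolute` using stdlib URI, since the gem's `Url#absolute` and `Url#without_fragment` are not documented here. `absolute_link` is a hypothetical name; the nil guard is an addition, because the stdlib raises ArgumentError (not InvalidURIError) for a nil href.

```ruby
require "uri"

# Sketch of #absolute using stdlib URI in place of the gem's Url class.
def absolute_link(base_url, href)
  return nil if href.nil?    # <a> without href; URI.join would raise ArgumentError
  uri = URI.join(base_url, href)
  uri.fragment = nil         # like without_fragment: /docs and /docs#intro collapse
  uri.to_s
rescue URI::InvalidURIError
  nil                        # as in the original: skip hrefs that do not parse
end

absolute_link("http://example.com", "/docs#intro") # => "http://example.com/docs"
absolute_link("http://example.com", "/bad path")   # => nil
```

Returning nil instead of raising lets the caller discard bad hrefs with `compact`, which is what `all_links` does.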
all_links()
    # File lib/anaximander/discovery/links.rb, line 65
    def all_links
      page.css("a").map { |a| absolute(a[:href]) }.compact.uniq
    end
internal_links()
    # File lib/anaximander/discovery/links.rb, line 61
    def internal_links
      all_links.select { |link| @url.base == link.base }
    end
links()
    # File lib/anaximander/discovery/links.rb, line 57
    def links
      internal_links.map(&:to_s)
    end
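Taken together, the private methods form a pipeline: all_links resolves and deduplicates, internal_links filters by base, and links converts to strings. The sketch below strings those steps together with stdlib URI so it runs without Nokogiri or the gem. `discover_links` is hypothetical, the `hrefs` array stands in for `page.css("a").map { |a| a[:href] }`, and "base" is assumed to mean scheme + host + port, so an https link from an http page counts as external here.

```ruby
require "uri"

# Hypothetical sketch of all_links -> internal_links -> links using stdlib URI.
def discover_links(page_url, hrefs)
  page = URI(page_url)
  base = ->(uri) { [uri.scheme, uri.host, uri.port] }

  # all_links: resolve each href, drop its fragment, skip unparseable ones ...
  resolved = hrefs.filter_map do |href|
    next if href.nil?
    uri = URI.join(page_url, href)
    uri.fragment = nil
    uri
  rescue URI::InvalidURIError
    nil
  end
  # ... then remove duplicates, compared by their string form.
  all = resolved.uniq(&:to_s)

  # internal_links: keep only links on the same base as the page.
  internal = all.select { |uri| base.(uri) == base.(page) }

  # links: expose plain strings.
  internal.map(&:to_s)
end

hrefs = ["/contact", "/contact#team", "http://www.iana.org/domains/example", nil, "/about"]
discover_links("http://example.com", hrefs)
# => ["http://example.com/contact", "http://example.com/about"]
```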