class Anaximander::Page
Represents a single page of a website being crawled. Exposes the assets and links on the page.
Errors
`Anaximander::Page` raises a `PageNotAccessibleError` when the page cannot be fetched. Common causes are a missing page (404), SSL errors, and infinite redirect loops.
Example
  page = Page.new("http://example.com")
  page.links  # => ["http://www.iana.org/domains/example"]
  page.assets # => ["/main.css", "/default.js"]
Attributes
children[RW]
Collection of `Page` objects that are linked to from the current page.
html[R]
Parsed Nokogiri HTML document.
url[R]
Absolute URL of the page.
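Because `children` holds further page objects, a crawl result can be walked recursively. The sketch below is illustrative only: `Node` is a hypothetical stand-in that exposes just `url` and `children`, and `each_page` is not part of Anaximander. It also assumes the structure is acyclic, since a page that links back to an ancestor would recurse forever.

```ruby
# Hypothetical stand-in exposing only `url` and `children`;
# it is not the real Anaximander::Page.
Node = Struct.new(:url, :children)

# Depth-first walk that passes each page's URL and depth to the block.
# Assumes no cycles among children.
def each_page(page, depth = 0, &block)
  block.call(page.url, depth)
  page.children.each { |child| each_page(child, depth + 1, &block) }
end

leaf = Node.new("http://example.com/about", [])
root = Node.new("http://example.com", [leaf])

visited = []
each_page(root) { |url, depth| visited << [url, depth] }

# Print an indented outline of the visited pages.
visited.each { |url, depth| puts "#{'  ' * depth}#{url}" }
```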
Public Class Methods
new(url)
Parameters
[String] url URL to discover.
OpenURI raises a generic RuntimeError or an OpenURI::HTTPError when it cannot fetch a page. Causes include 404s, SSL errors, and redirect loops.
Raises `PageNotAccessibleError` when OpenURI fails to fetch the page with either of these errors.
  # File lib/anaximander/page.rb, line 50
  def initialize(url)
    @url = url
    @html = Nokogiri::HTML(open(url))
  rescue RuntimeError, OpenURI::HTTPError
    raise PageNotAccessibleError
  end
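The constructor's wrap-and-re-raise pattern can be tried without network access. This is a minimal sketch under stated assumptions: the error class is redefined locally as a stand-in, and `fetch_or_wrap` and the `fetcher` lambdas are hypothetical names that simulate the OpenURI call rather than performing it.

```ruby
require "open-uri" # stdlib; loaded only so OpenURI::HTTPError is defined

# Stand-in for Anaximander's error class so the sketch runs on its own.
class PageNotAccessibleError < StandardError; end

# Hypothetical helper: `fetcher` is any callable that returns the page body
# or raises, standing in for the OpenURI call made by the constructor.
def fetch_or_wrap(url, fetcher)
  fetcher.call(url)
rescue RuntimeError, OpenURI::HTTPError
  # Collapse both low-level failures into the one domain-specific error.
  raise PageNotAccessibleError, "could not fetch #{url}"
end

failing = ->(_url) { raise "redirection loop" } # a bare raise is a RuntimeError
working = ->(_url) { "<html></html>" }

begin
  fetch_or_wrap("http://example.com/missing", failing)
rescue PageNotAccessibleError => e
  warn "skipping page: #{e.message}"
end
```

Callers then need to rescue only `PageNotAccessibleError`. On Ruby 3.0 and later, `Kernel#open` no longer opens URLs, so a constructor written like the listing above would need `URI.open(url)` instead.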
Public Instance Methods
<=>(other)
  # File lib/anaximander/page.rb, line 65
  def <=>(other)
    self.url <=> other.url
  end
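Since `<=>` delegates to the URL string, a collection of pages can be ordered with `Array#sort`. The sketch below uses `FakePage`, a hypothetical stand-in that carries only a `url` and the same comparison, so it runs without fetching anything.

```ruby
# Hypothetical stand-in for Anaximander::Page with only the pieces needed here.
FakePage = Struct.new(:url) do
  # Same comparison as the listing above: order pages by URL string.
  def <=>(other)
    url <=> other.url
  end
end

pages = [
  FakePage.new("http://example.com/b"),
  FakePage.new("http://example.com/a"),
  FakePage.new("http://example.com/c"),
]

sorted = pages.sort # Array#sort calls <=> on the elements
puts sorted.map(&:url)
```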
assets()
  # File lib/anaximander/page.rb, line 61
  def assets
    Discovery::Assets.new(html)
  end
inspect()
  # File lib/anaximander/page.rb, line 69
  def inspect
    %(#<Anaximander::Page:#{object_id} url="#{url}">)
  end
links()
  # File lib/anaximander/page.rb, line 57
  def links
    Discovery::Links.new(html, url)
  end