class Anaximander::Page

Represents a single page of a website being crawled. Exposes the assets and links on the page.

Errors

`Anaximander::Page` raises a `PageNotAccessibleError` when the page cannot be fetched. Common causes are a missing page (404), SSL errors, and infinite redirect loops.
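
A minimal sketch of how a caller might handle this error while collecting pages. `FakePage`, `try_page`, and the local `PageNotAccessibleError` definition are hypothetical stand-ins so the snippet runs without the gem or network access; real code would call `Anaximander::Page.new` and rescue the library's own error class.

```ruby
# Stand-in for the library's error class (assumed here to be a StandardError subclass).
class PageNotAccessibleError < StandardError; end

# Hypothetical stand-in for Anaximander::Page that fails for one known-bad URL.
class FakePage
  attr_reader :url

  def initialize(url)
    raise PageNotAccessibleError if url.end_with?("/missing")
    @url = url
  end
end

# Build a page, returning nil instead of propagating the error.
def try_page(url)
  FakePage.new(url)
rescue PageNotAccessibleError
  nil
end

urls  = ["http://example.com", "http://example.com/missing"]
pages = urls.map { |u| try_page(u) }.compact
```

Returning `nil` and compacting keeps a crawl going past unreachable pages; logging the failed URL inside the `rescue` would be a natural addition.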

Example

page = Page.new("http://example.com")
page.links  # => ["http://www.iana.org/domains/example"]
page.assets # => ["/main.css", "/default.js"]

Attributes

children[RW]

Collection of `Page` objects that are linked to from the current page.

html[R]

Parsed Nokogiri HTML document.

url[R]

Absolute URL of the page.

Public Class Methods

new(url)

Parameters

[String] url URL to discover.

OpenURI raises a generic `RuntimeError` or an `OpenURI::HTTPError` when it cannot fetch a page. Causes include 404s, SSL errors, and redirect loops.

Raises `PageNotAccessibleError` when OpenURI fails to fetch the page, for any reason.

# File lib/anaximander/page.rb, line 50
def initialize(url)
  @url  = url
  @html = Nokogiri::HTML(open(url))
rescue RuntimeError, OpenURI::HTTPError
  raise PageNotAccessibleError
end
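
The constructor above relies on `Kernel#open` fetching URLs through open-uri, which Ruby 3.0 and later no longer do; `URI.open` is the replacement there. The following is a sketch of the same rescue-and-reraise pattern under that assumption, not the library's implementation. `SketchPage`, its `fetcher:` keyword, and the `body` reader are hypothetical names added so the behaviour can be exercised without network access, and Nokogiri parsing is left out to keep the snippet self-contained.

```ruby
require "open-uri"

# Stand-in for the library's error class (assumed to be a StandardError subclass).
class PageNotAccessibleError < StandardError; end

# Hypothetical variant of Page#initialize using URI.open.
class SketchPage
  attr_reader :url, :body

  # fetcher: is an illustrative injection point, not part of the original API.
  def initialize(url, fetcher: ->(u) { URI.open(u).read })
    @url  = url
    @body = fetcher.call(url)
  rescue RuntimeError, OpenURI::HTTPError
    # Collapse every fetch failure into the one error callers expect.
    raise PageNotAccessibleError
  end
end

# A fetcher that fails with a RuntimeError, standing in for a redirect loop.
failing = ->(_url) { raise "redirection loop" }
# A fetcher that succeeds with a fixed document.
working = ->(_url) { "<html></html>" }

ok_page = SketchPage.new("http://example.com", fetcher: working)
```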

Public Instance Methods

<=>(other)

# File lib/anaximander/page.rb, line 65
def <=>(other)
  self.url <=> other.url
end
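
Because `<=>` compares URLs, a collection of pages can be sorted with `Array#sort`. The sketch below uses a hypothetical `PageStub` struct with the same `<=>` definition rather than the real class, so it runs without fetching anything.

```ruby
# Hypothetical stand-in for Anaximander::Page with the same <=> definition.
PageStub = Struct.new(:url) do
  def <=>(other)
    url <=> other.url
  end
end

pages = [
  PageStub.new("http://example.com/b"),
  PageStub.new("http://example.com/a"),
]

# Array#sort calls <=> on the elements, so pages come back ordered by URL.
sorted_urls = pages.sort.map(&:url)
```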

assets()

# File lib/anaximander/page.rb, line 61
def assets
  Discovery::Assets.new(html)
end

inspect()

# File lib/anaximander/page.rb, line 69
def inspect
  %(#<Anaximander::Page:#{object_id} url="#{url}">)
end