class SiteInspector::Domain
Attributes
Public Class Methods
# File lib/site-inspector/domain.rb, line 7 def initialize(host) host = host.downcase host = host.sub(/^https?:/, '') host = host.sub(%r{^/+}, '') host = host.sub(/^www\./, '') uri = Addressable::URI.parse "//#{host}" @host = uri.host end
Public Instance Methods
# File lib/site-inspector/domain.rb, line 25 def canonical_endpoint @canonical_endpoint ||= begin prefetch endpoints.find do |e| e.https? == canonically_https? && e.www? == canonically_www? end end end
A domain is “canonically” at https if:
* at least one of its https endpoints is live and doesn't have an invalid hostname * both http endpoints are either down or redirect *somewhere* * at least one http endpoint redirects immediately to an *internal* https endpoint
This is meant to affirm situations like:
http:// -> http://www -> https:// https:// -> http:// -> https://www
and meant to avoid affirming situations like:
http:// -> http://non-www http://www -> http://non-www
or:
http:// -> 200, http://www -> https://www
It allows a site to be canonically HTTPS if the cert has a valid hostname but invalid chain issues.
# File lib/site-inspector/domain.rb, line 156 def canonically_https? # Does any endpoint respond? return false unless up? # At least one of its https endpoints is live and doesn't have an invalid hostname return false unless https? # Both http endpoints are down return true if endpoints.select(&:http?).all? { |e| !e.up? } # at least one http endpoint redirects immediately to https endpoints.select(&:http?).any? { |e| e.redirect&.https? } end
A domain is “canonically” at www if:
* at least one of its www endpoints responds * both root endpoints are either down ~~or redirect *somewhere*~~, or * at least one root endpoint redirect should immediately go to an *internal* www endpoint
This is meant to affirm situations like:
http:// -> https:// -> https://www https:// -> http:// -> https://www
and meant to avoid affirming situations like:
http:// -> http://non-www, http://www -> http://non-www
or like:
https:// -> 200, http:// -> http://www
# File lib/site-inspector/domain.rb, line 125 def canonically_www? # Does any endpoint respond? return false unless up? # Does at least one www endpoint respond? return false unless www? # Are both root endpoints down? return true if endpoints.select(&:root?).all? { |e| !e.up? } # Does either root endpoint redirect to a www endpoint? endpoints.select(&:root?).any? { |e| e.redirect&.www? } end
we can say that a canonical HTTPS site “defaults” to HTTPS, even if it doesn't strictly enforce it (e.g. having a www subdomain first to go HTTP root before HTTPS root).
TODO: not implemented.
# File lib/site-inspector/domain.rb, line 96 def defaults_https? raise 'Not implemented. Halp?' end
HTTPS is “downgraded” if both:
-
HTTPS is supported, and
-
The 'canonical' endpoint gets an immediate internal redirect to HTTP.
TODO: the redirect must be internal.
# File lib/site-inspector/domain.rb, line 106 def downgrades_https? return false unless https? canonical_endpoint.redirect? && canonical_endpoint.redirect.http? end
# File lib/site-inspector/domain.rb, line 16 def endpoints @endpoints ||= [ Endpoint.new("https://#{host}", domain: self), Endpoint.new("https://www.#{host}", domain: self), Endpoint.new("http://#{host}", domain: self), Endpoint.new("http://www.#{host}", domain: self) ] end
HTTPS is enforced if one of the HTTPS endpoints is “up”, and if both HTTP endpoints are either:
* down, or * redirect immediately to HTTPS.
This is different than whether a domain is “canonically” HTTPS.
-
an HTTP redirect can go to HTTPS on another domain, as long as it's immediate.
-
a domain with an invalid cert can still be enforcing HTTPS.
TODO: need to ensure the redirect immediately goes to HTTPS. TODO: don't need to require that the HTTPS cert is valid for this purpose.
# File lib/site-inspector/domain.rb, line 85 def enforces_https? return false unless https? endpoints.select(&:http?).all? { |e| !e.up? || e.redirect&.https? } end
# File lib/site-inspector/domain.rb, line 34 def government? require 'gman' Gman.valid? host end
HSTS on the canonical domain?
# File lib/site-inspector/domain.rb, line 185 def hsts? canonical_endpoint.hsts&.enabled? end
# File lib/site-inspector/domain.rb, line 193 def hsts_preload_ready? return false unless hsts_subdomains? endpoints.find { |e| e.root? && e.https? }.hsts.preload_ready? end
# File lib/site-inspector/domain.rb, line 189 def hsts_subdomains? endpoints.find { |e| e.root? && e.https? }.hsts.include_subdomains? end
HTTPS is “supported” (different than “canonical” or “enforced”) if:
-
Either of the HTTPS endpoints is listening, and doesn't have an invalid hostname.
TODO: needs to allow an invalid chain.
# File lib/site-inspector/domain.rb, line 67 def https? endpoints.any? { |e| e.https? && e.up? && e.https.valid? } end
# File lib/site-inspector/domain.rb, line 203 def inspect "#<SiteInspector::Domain host=\"#{host}\">" end
We know most API calls to the domain model are going to require That the root of all four endpoints are called. Rather than process them In serial, lets grab them in parallel and cache the results to speed up later calls.
# File lib/site-inspector/domain.rb, line 211 def prefetch endpoints.each do |endpoint| request = Typhoeus::Request.new(endpoint.uri, SiteInspector.typhoeus_defaults) SiteInspector.hydra.queue(request) end SiteInspector.hydra.run end
The first endpoint to respond with a redirect
# File lib/site-inspector/domain.rb, line 180 def redirect endpoints.find(&:external_redirect?) end
A domain redirects if
-
At least one endpoint is an external redirect, and
-
All endpoints are either down or an external redirect
# File lib/site-inspector/domain.rb, line 173 def redirect? return false unless redirect endpoints.all? { |e| !e.up? || e.external_redirect? } end
Does any endpoint respond to HTTP? TODO: needs to allow an invalid chain.
# File lib/site-inspector/domain.rb, line 46 def responds? endpoints.any?(&:responds?) end
Can you connect without www?
# File lib/site-inspector/domain.rb, line 57 def root? endpoints.any? { |e| e.root? && e.up? } end
Converts the domain to a hash
By default, it only returns domain-wide information and information about the canonical endpoint
It will also pass options allong to each endpoint's to_h
method
options:
:all - return information about all endpoints
Returns a complete hash of the domain's information
# File lib/site-inspector/domain.rb, line 230 def to_h(options = {}) prefetch hash = { host: host, up: up?, responds: responds?, www: www?, root: root?, https: https?, enforces_https: enforces_https?, downgrades_https: downgrades_https?, canonically_www: canonically_www?, canonically_https: canonically_https?, redirect: redirect?, hsts: hsts?, hsts_subdomains: hsts_subdomains?, hsts_preload_ready: hsts_preload_ready?, canonical_endpoint: canonical_endpoint.to_h(options) } if options['all'] hash[:endpoints] = { https: { root: endpoints[0].to_h(options), www: endpoints[1].to_h(options) }, http: { root: endpoints[2].to_h(options), www: endpoints[3].to_h(options) } } end hash end
# File lib/site-inspector/domain.rb, line 267 def to_json(*_args) to_h.to_json end
# File lib/site-inspector/domain.rb, line 199 def to_s host end
Does any endpoint return a 200 or 300 response code?
# File lib/site-inspector/domain.rb, line 40 def up? endpoints.any?(&:up?) end
TODO: These weren't present before, and may not be useful. Can you connect to www?
# File lib/site-inspector/domain.rb, line 52 def www? endpoints.any? { |e| e.www? && e.up? } end