module Failbot
Failbot asynchronously takes exceptions and reports them to the exception logger du jour. This keeps the main app from failing or lagging if the exception logger service is down or slow.
This file exists so that the unhandled exception hook may easily be injected into programs that don’t register it themselves. It also provides a lightweight failbot interface that doesn’t bring in any other libraries until a report is made, which is useful for environments where boot time is important.
To use, set RUBYOPT or pass an -r argument to ruby:
RUBYOPT=rfailbot/exit_hook some-program.rb
Or:
ruby -rfailbot/exit_hook some-program.rb
Your program can also require this library instead of ‘failbot’ to minimize the amount of up-front processing required and automatically install the exit hook.
require 'failbot/exit_hook'
The ‘failbot’ lib is loaded in full the first time an actual report is made.
Constants
- DEFAULT_ROLLUP
Default rollup for an exception. Exceptions with the same rollup are grouped together in Haystack. The rollup is an MD5 hash of the exception class and the raising file, line, and method.
- EXCEPTION_DETAIL
- EXCEPTION_FORMATS
Enumerates the exception formats this gem supports. The original format, and the default, is :haystack; the newer format is :structured.
- MAXIMUM_CAUSE_DEPTH
We’ll include this many nested Exception#cause objects in the needle context. We limit the number of objects to prevent excessive recursion and large needle contexts.
- VERSION
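The depth limit described for MAXIMUM_CAUSE_DEPTH can be pictured with a short sketch. The helper names `collect_causes` and `nested_error` and the limit of 2 are hypothetical and chosen only for illustration; they are not taken from Failbot.

```ruby
# Hypothetical helper: walk Exception#cause, stopping after max_depth causes.
def collect_causes(exception, max_depth)
  causes = []
  current = exception.cause
  while current && causes.size < max_depth
    causes << current
    current = current.cause
  end
  causes
end

# Build an exception that carries three nested causes.
def nested_error
  begin
    begin
      begin
        raise IOError, "disk"
      rescue IOError
        raise KeyError, "cache"
      end
    rescue KeyError
      raise ArgumentError, "request"
    end
  rescue ArgumentError
    raise RuntimeError, "top"
  end
rescue => e
  e
end

causes = collect_causes(nested_error, 2)
puts causes.map { |c| c.class.name }.inspect
```

Only the two causes nearest to the reported exception are kept; the innermost IOError is dropped once the limit is reached.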
Attributes
Public: Set an instrumenter to be called when exceptions are reported.
    class CustomInstrumenter
      def instrument(name, payload = {})
        warn "Exception: #{payload["class"]}\n#{payload.inspect}"
      end
    end

    # instrument is an instance method, so an instance is assigned
    Failbot.instrumenter = CustomInstrumenter.new
The instrumenter must conform to the `ActiveSupport::Notifications` interface, which defines `#instrument` and accepts:
name    - the String name of the event (e.g. "report.failbot")
payload - a Hash of the exception context.
Public Class Methods
Set a callable that is responsible for parsing and formatting Ruby backtraces. This is only necessary if your app deals with exceptions that are manipulated to contain something other than actual stackframe strings in the format produced by `caller`. The argument passed must respond to `call` with an arity of 1. The callable expects to be passed Exception instances as its argument.
    # File lib/failbot.rb, line 100
    def self.backtrace_parser=(callable)
      unless callable.respond_to?(:call)
        raise ArgumentError, "backtrace_parser= passed #{callable.inspect}, which is not callable"
      end
      if callable.method(:call).arity != 1
        raise ArgumentError, "backtrace_parser= passed #{callable.inspect}, whose `#call` has arity =! 1"
      end
      @backtrace_parser = callable
    end
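The two checks in the setter above can be tried out in isolation. This is a sketch under assumptions: `acceptable_backtrace_parser?` and `PlainBacktraceParser` are hypothetical names, and the parser's return value (plain frame strings) is a guess rather than Failbot's documented contract. It also illustrates a consequence of the arity check: Ruby reports a variable argument count for `Proc#call`, so a bare lambda is expected to fail it, while an object that defines `call(exception)` passes.

```ruby
# Mirrors the two checks shown in backtrace_parser= above.
def acceptable_backtrace_parser?(callable)
  callable.respond_to?(:call) && callable.method(:call).arity == 1
end

# Hypothetical parser object; returning frame strings is an assumption.
class PlainBacktraceParser
  def call(exception)
    Array(exception.backtrace).map(&:to_s)
  end
end

parser_ok = acceptable_backtrace_parser?(PlainBacktraceParser.new)
lambda_ok = acceptable_backtrace_parser?(->(exception) { [] })
string_ok = acceptable_backtrace_parser?("not callable")

puts "object with #call(exception): #{parser_ok}"
puts "lambda: #{lambda_ok}"
puts "string: #{string_ok}"
```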
    # File lib/failbot.rb, line 120
    def self.exception_classname_from_hash(hash)
      @exception_formatter.exception_classname_from_hash(hash)
    end
Set the current exception format.
    # File lib/failbot.rb, line 84
    def self.exception_format=(identifier)
      @exception_formatter = EXCEPTION_FORMATS.fetch(identifier) do
        fail ArgumentError, "#{identifier} is not an available exception_format (want one of #{EXCEPTION_FORMATS.keys})"
      end
    end
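The setter relies on `Hash#fetch` with a block to turn an unknown key into an ArgumentError. A minimal sketch of that pattern follows; `FORMATS` and `lookup_format` are hypothetical stand-ins, and the placeholder values are not Failbot's real formatter objects.

```ruby
# Hypothetical stand-in for EXCEPTION_FORMATS; values are placeholders.
FORMATS = { haystack: :haystack_formatter, structured: :structured_formatter }.freeze

def lookup_format(identifier)
  FORMATS.fetch(identifier) do
    raise ArgumentError, "#{identifier} is not an available exception_format (want one of #{FORMATS.keys})"
  end
end

puts lookup_format(:structured).inspect

begin
  lookup_format("structured") # a String key does not match the Symbol keys
rescue ArgumentError => e
  puts e.message
end
```

Because the lookup is by Symbol, a String identifier is rejected, which is consistent with `setup` calling `to_sym` on the FAILBOT_EXCEPTION_FORMAT setting before assigning it.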
Helpers needed to parse hashes included in e.g. Failbot.reports.
    # File lib/failbot.rb, line 116
    def self.exception_message_from_hash(hash)
      @exception_formatter.exception_message_from_hash(hash)
    end
Public Instance Methods
    # File lib/failbot.rb, line 502
    def already_reporting
      @thread_local_already_reporting.value
    end

    # File lib/failbot.rb, line 498
    def already_reporting=(bool)
      @thread_local_already_reporting.value = bool
    end
Public: your last chance to modify the context that is to be reported with an exception.
The key value pairs that are returned from your block will get squashed into the context, replacing the values of any keys that were already present.
Example:
    Failbot.before_report do |exception, context|
      # context is { "a" => 1, "b" => 2 }
      { :a => 0, :c => 3 }
    end

The context then gets reported as { "a" => 0, "b" => 2, "c" => 3 }.
    # File lib/failbot.rb, line 274
    def before_report(&block)
      @before_report = block
    end
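How the block's return value replaces existing keys can be sketched without loading Failbot. The `squash` helper below restates the key-stringifying merge that `squash_contexts` performs in this file; the helper name and the sample hook are hypothetical.

```ruby
# Same key-stringifying merge as squash_contexts, restated as a helper.
def squash(*hashes)
  hashes.flatten.each_with_object({}) do |hash, squashed|
    hash.each { |key, value| squashed[key.to_s] = value }
  end
end

context = { "a" => 1, "b" => 2 }
hook = proc { |exception, ctx| { :a => 0, :c => 3 } }

# The hook's Symbol keys are stringified, so :a overwrites "a".
reported = squash(context, hook.call(RuntimeError.new("boom"), context))
puts reported.inspect
```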
For tests
    # File lib/failbot.rb, line 279
    def clear_before_report
      @before_report = nil
    end
Public: Disable exception reporting. This is equivalent to calling `Failbot.setup("FAILBOT_REPORT" => "0")`, but can be called after setup.

    Failbot.disable do
      something_that_might_go_kaboom
    end
block - an optional block to perform while reporting is disabled. If a block is passed, the previous reporting state is restored after the block returns or raises.
    # File lib/failbot.rb, line 373
    def disable(&block)
      original_report_errors = @thread_local_report_errors.value
      @thread_local_report_errors.value = false

      if block
        begin
          block.call
        ensure
          @thread_local_report_errors.value = original_report_errors
        end
      end
    end
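The save, switch off, and restore-in-ensure pattern used by `disable` can be reproduced in a few lines. `ReportingSwitch` is a hypothetical stand-in that uses Ruby's built-in thread variables; Failbot itself uses its own ThreadLocalVariable class, which is not reproduced here.

```ruby
# Hypothetical stand-in for Failbot's per-thread reporting flag.
module ReportingSwitch
  KEY = :reporting_switch_enabled

  # Reporting counts as enabled until it is explicitly switched off.
  def self.enabled?
    thread = Thread.current
    thread.thread_variable?(KEY) ? thread.thread_variable_get(KEY) : true
  end

  def self.disable
    original = enabled?
    Thread.current.thread_variable_set(KEY, false)
    return unless block_given?

    begin
      yield
    ensure
      # Restore the earlier state even if the block raised.
      Thread.current.thread_variable_set(KEY, original)
    end
  end
end

inside = nil
ReportingSwitch.disable { inside = ReportingSwitch.enabled? }
puts "inside block: #{inside}, afterwards: #{ReportingSwitch.enabled?}"
```

Restoring the saved value, rather than unconditionally re-enabling, means a nested `disable` block does not switch reporting back on for an outer caller that had already disabled it.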
Public: Enable exception reporting. Reporting is enabled by default, but this can be called if it was explicitly disabled by calling `Failbot.disable` or by setting `"FAILBOT_REPORT" => "0"` in `Failbot.setup`.
    # File lib/failbot.rb, line 389
    def enable
      @thread_local_report_errors.value = true
    end
Extract exception info into a simple Hash.
e - The exception object to turn into a Hash.
Returns a Hash.
    # File lib/failbot.rb, line 458
    def exception_info(e)
      res = @exception_formatter.call(e)

      if exception_context = (e.respond_to?(:failbot_context) && e.failbot_context)
        res.merge!(exception_context)
      end

      if original = (e.respond_to?(:original_exception) && e.original_exception)
        remote_backtrace = []
        remote_backtrace << original.message
        if original.backtrace
          remote_backtrace.concat(Array(original.backtrace)[0,500])
        end
        res['remote_backtrace'] = remote_backtrace.join("\n")
      end

      res
    end

    # File lib/failbot.rb, line 494
    def hostname
      @hostname ||= Socket.gethostname
    end
Installs an at_exit hook to report exceptions that raise all the way out of the stack and halt the interpreter. This is useful for catching boot-time errors and even signal kills.
To use, call this method very early during the program’s boot to cover as much code as possible:
    require 'failbot'
    Failbot.install_unhandled_exception_hook!
Returns true when the hook was installed, nil when the hook had previously been installed by another component.
    # File lib/failbot/exit_hook.rb, line 51
    def install_unhandled_exception_hook!
      # only install the hook once, even when called from multiple locations
      return if @unhandled_exception_hook_installed

      # the $! is set when the interpreter is exiting due to an exception
      at_exit do
        boom = $!
        if boom && !@raise_errors && !boom.is_a?(SystemExit)
          report(boom, 'argv' => ([$0]+ARGV).join(" "), 'halting' => true)
        end
      end

      @unhandled_exception_hook_installed = true
    end

    # File lib/failbot.rb, line 477
    def logger
      @logger ||= Logger.new($stderr, formatter: proc { |severity, datetime, progname, msg|
        log = case msg
              when Hash
                msg.map { |k,v| "#{k}=#{v.inspect}" }.join(" ")
              else
                %Q|msg="#{msg.inspect}"|
              end
        log_line = %Q|ts="#{datetime.utc.iso8601}" level=#{severity} logger=Failbot #{log}\n|
        log_line.lstrip
      })
    end

    # File lib/failbot.rb, line 490
    def logger=(logger)
      @logger = logger
    end
Tap into any other method invocation on the Failbot module (especially report) and lazy load and configure everything the first time.
    # File lib/failbot/exit_hook.rb, line 75
    def method_missing(method, *args, &block)
      return super if @failbot_loaded
      require 'failbot'
      send(method, *args, &block)
    end
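The lazy-loading trick above can be illustrated with a self-contained module. Everything here is hypothetical: `LazyReporter` stands in for the lightweight interface, and `load_full_implementation` stands in for `require 'failbot'` by defining the missing method on first use.

```ruby
# Hypothetical module: the expensive half is defined only on first use.
module LazyReporter
  @loaded = false

  class << self
    attr_reader :loaded

    def method_missing(method, *args, &block)
      return super if @loaded # already loaded, so the method truly is missing
      load_full_implementation
      send(method, *args, &block)
    end

    def respond_to_missing?(method, include_private = false)
      !@loaded || super
    end

    private

    # Stands in for `require 'failbot'` in the real exit hook.
    def load_full_implementation
      @loaded = true
      define_singleton_method(:report) { |message| "reported: #{message}" }
    end
  end
end

loaded_before = LazyReporter.loaded
result = LazyReporter.report("boom")
puts "loaded before: #{loaded_before}, after: #{LazyReporter.loaded}"
puts result
```

The guard on the loaded flag matters: without it, a call to a method that does not exist even after loading would recurse through `method_missing` indefinitely instead of raising NoMethodError.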
Remove the last info hash from the context stack.
    # File lib/failbot.rb, line 233
    def pop
      context.pop if context.size > 1
    end
Add info to be sent in the next failbot report, should one occur.
info  - Hash of name => value pairs to include in the exception report.
block - When given, the info is removed from the current context after the block is executed.
Returns the value returned by the block when given; otherwise, returns nil.
    # File lib/failbot.rb, line 220
    def push(info={})
      info.each do |key, value|
        if value.kind_of?(Proc)
          raise ArgumentError, "Proc usage has been removed from Failbot"
        end
      end
      context.push(info)
      yield if block_given?
    ensure
      pop if block_given?
    end
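The push and pop behavior, including the block form that cleans up after itself, can be modeled with a small class. `ContextStack` and its `merged` method are hypothetical and exist only to make the stack visible; Failbot keeps its stack in a thread-local variable instead.

```ruby
# Hypothetical miniature of the context stack behind push and pop.
class ContextStack
  def initialize(base = {})
    @stack = [base]
  end

  # With a block, the info is visible only while the block runs.
  def push(info = {})
    @stack.push(info)
    yield if block_given?
  ensure
    pop if block_given?
  end

  # Never removes the base context at the bottom of the stack.
  def pop
    @stack.pop if @stack.size > 1
  end

  def merged
    @stack.reduce({}) { |acc, hash| acc.merge(hash) }
  end
end

stack = ContextStack.new({ "server" => "web-1" })
during = stack.push({ "user" => "alice" }) { stack.merged }
after = stack.merged
puts during.inspect
puts after.inspect
```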
Loops through the stack of contexts and deletes the given key if it exists.
key - Name of key to remove.
Examples
    remove_from_report(:some_key)
    remove_from_report("another_key")
Returns nothing.
    # File lib/failbot.rb, line 253
    def remove_from_report(key)
      context.each do |hash|
        hash.delete(key.to_s)
        hash.delete(key.to_sym)
      end
    end
Public: Sends an exception to the exception tracking service along with a hash of custom attributes to be included with the report. When the raise_errors option is set, this method raises the exception instead of reporting to the exception tracking service.
e     - The Exception object. Must respond to message and backtrace.
other - Hash of additional attributes to include with the report.
Examples
    begin
      my_code
    rescue => e
      Failbot.report(e, :user => current_user)
    end
Returns nothing.
    # File lib/failbot.rb, line 334
    def report(e, other = {})
      return if ignore_error?(e)

      if @raise_errors
        squash_contexts(context, exception_info(e), other) # surface problems squashing
        raise e
      else
        report!(e, other)
      end
    end

    # File lib/failbot.rb, line 345
    def report!(e, other = {})
      report_with_context!(Thread.current, context, e, other)
    end

    # File lib/failbot.rb, line 349
    def report_from_thread(thread, e, other = {})
      if @raise_errors
        squash_contexts(@thread_local_context.value_from_thread(thread), exception_info(e), other) # surface problems squashing
        raise e
      else
        report_from_thread!(thread, e, other)
      end
    end

    # File lib/failbot.rb, line 358
    def report_from_thread!(thread, e, other = {})
      return if ignore_error?(e)

      report_with_context!(thread, @thread_local_context.value_from_thread(thread), e, other)
    end
Public: exceptions that were reported. Only available when using the memory and file backends.
Returns an Array of exception data Hashes.

    # File lib/failbot.rb, line 397
    def reports
      backend.reports
    end
Reset the context stack to a pristine state.
    # File lib/failbot.rb, line 238
    def reset!
      @thread_local_context.value = [context[0]].dup
    end
Specify a custom block for calculating rollups. It should accept:
exception - The exception object
context   - The context hash
The block must return a String.
If a `rollup` attribute is supplied at the time of reporting, either via the `failbot_context` method on an exception, or passed to `Failbot.report`, it will be used as the rollup and this block will not be called.
    # File lib/failbot.rb, line 293
    def rollup(&block)
      @rollup = block
    end
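A custom rollup block might look like the following sketch. The grouping recipe (exception class plus first backtrace frame, hashed with MD5) is an assumption chosen to resemble the description of DEFAULT_ROLLUP; it is not a copy of Failbot's default, and `custom_rollup` and `sample_error` are hypothetical names. With Failbot loaded, such a block would be registered with `Failbot.rollup(&custom_rollup)`.

```ruby
require "digest/md5"

# Hypothetical rollup: group by exception class and first backtrace frame.
custom_rollup = proc do |exception, context|
  first_frame = Array(exception.backtrace).first.to_s
  Digest::MD5.hexdigest("#{exception.class.name}#{first_frame}")
end

# Raises from the same line every time, so the first frame is stable.
def sample_error
  raise ArgumentError, "bad input"
rescue => e
  e
end

first = custom_rollup.call(sample_error, {})
second = custom_rollup.call(sample_error, {})
puts first
puts "same group: #{first == second}"
```

Two exceptions of the same class raised from the same place produce the same String, so they would be grouped together.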
    # File lib/failbot.rb, line 421
    def sanitize(attrs)
      result = {}

      attrs.each do |key, value|
        result[key] =
          case value
          when Time
            value.iso8601
          when Date
            value.strftime("%F") # equivalent to %Y-%m-%d
          when Numeric
            value
          when String, true, false
            value.to_s
          when Proc
            "proc usage is deprecated"
          when Array
            if key == EXCEPTION_DETAIL
              # special-casing for the exception_detail key, which is allowed to
              # be an array with a specific structure.
              value
            else
              value.inspect
            end
          else
            value.inspect
          end
      end

      result
    end
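To see what the case statement above does to typical values, here is a simplified restatement for a single value. `sanitize_value` is a hypothetical helper; it leaves out the Proc branch and the EXCEPTION_DETAIL special case, so arrays always fall through to `inspect`.

```ruby
require "time"
require "date"

# Simplified restatement of the case statement above, for one value.
def sanitize_value(value)
  case value
  when Time then value.iso8601
  when Date then value.strftime("%F")
  when Numeric then value
  when String, true, false then value.to_s
  else value.inspect
  end
end

samples = [Time.utc(2024, 1, 2, 3, 4, 5), Date.new(2024, 1, 2), 42, true, :retry, [1, 2], nil]
samples.each { |value| puts sanitize_value(value).inspect }
```

Numbers keep their type, while symbols, arrays, nil, and other objects are reported as their `inspect` strings.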
Public: Setup the backend for reporting exceptions.
    # File lib/failbot.rb, line 125
    def setup(settings={}, default_context={})
      deprecated_settings = %w[
        backend host port haystack
        raise_errors
      ]

      if settings.empty? ||
        settings.keys.any? { |key| deprecated_settings.include?(key) }
        warn "%s Deprecated Failbot.setup usage. See %s for details." % [
          caller[0], "https://github.com/github/failbot"
        ]
        return setup_deprecated(settings)
      end

      initial_context = if default_context.respond_to?(:to_hash) && !default_context.to_hash.empty?
                          default_context.to_hash
                        else
                          { 'server' => hostname }
                        end

      @thread_local_context = ::Failbot::ThreadLocalVariable.new do
        [initial_context]
      end
      @thread_local_already_reporting = ::Failbot::ThreadLocalVariable.new { false }

      populate_context_from_settings(settings)

      @enable_timeout = false
      if settings.key?("FAILBOT_TIMEOUT_MS")
        @timeout_seconds = settings["FAILBOT_TIMEOUT_MS"].to_f / 1000
        @enable_timeout = (@timeout_seconds > 0.0)
      end

      @connect_timeout_seconds = nil
      if settings.key?("FAILBOT_CONNECT_TIMEOUT_MS")
        @connect_timeout_seconds = settings["FAILBOT_CONNECT_TIMEOUT_MS"].to_f / 1000
        # unset the value if it's not parsing to something valid
        @connect_timeout_seconds = nil unless @connect_timeout_seconds > 0
      end

      self.backend =
        case (name = settings["FAILBOT_BACKEND"])
        when "memory"
          Failbot::MemoryBackend.new
        when "waiter"
          Failbot::WaiterBackend.new
        when "file"
          Failbot::FileBackend.new(settings["FAILBOT_BACKEND_FILE_PATH"])
        when "http"
          Failbot::HTTPBackend.new(URI(settings["FAILBOT_HAYSTACK_URL"]), @connect_timeout_seconds, @timeout_seconds)
        when 'json'
          Failbot::JSONBackend.new(settings["FAILBOT_BACKEND_JSON_HOST"], settings["FAILBOT_BACKEND_JSON_PORT"])
        when 'console'
          Failbot::ConsoleBackend.new
        else
          raise ArgumentError, "Unknown backend: #{name.inspect}"
        end

      @raise_errors = !settings["FAILBOT_RAISE"].to_s.empty?
      @thread_local_report_errors = ::Failbot::ThreadLocalVariable.new do
        settings["FAILBOT_REPORT"] != "0"
      end

      # allows overriding the 'app' value to send to single haystack bucket.
      # used primarily on ghe.io.
      @app_override = settings["FAILBOT_APP_OVERRIDE"]

      # Support setting exception_format from ENV/settings
      if settings["FAILBOT_EXCEPTION_FORMAT"]
        self.exception_format = settings["FAILBOT_EXCEPTION_FORMAT"].to_sym
      end

      @ignored_error_classes = settings.fetch("FAILBOT_IGNORED_ERROR_CLASSES", "").split(",").map do |class_name|
        Module.const_get(class_name.strip)
      end
    end
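Two of the settings read by `setup` are parsed from strings in ways that are easy to get wrong, so a small sketch may help. `timeout_from` and `ignored_classes_from` are hypothetical helpers that restate the parsing shown above; as a simplification, `timeout_from` returns nil for the seconds when the key is absent.

```ruby
# Hypothetical helper mirroring how setup reads FAILBOT_TIMEOUT_MS.
# Returns [enabled, seconds].
def timeout_from(settings)
  return [false, nil] unless settings.key?("FAILBOT_TIMEOUT_MS")

  seconds = settings["FAILBOT_TIMEOUT_MS"].to_f / 1000
  [seconds > 0.0, seconds]
end

# Mirrors the FAILBOT_IGNORED_ERROR_CLASSES parsing: a comma-separated
# list of class names resolved to constants.
def ignored_classes_from(settings)
  settings.fetch("FAILBOT_IGNORED_ERROR_CLASSES", "").split(",").map do |class_name|
    Module.const_get(class_name.strip)
  end
end

puts timeout_from({ "FAILBOT_TIMEOUT_MS" => "250" }).inspect
puts timeout_from({ "FAILBOT_TIMEOUT_MS" => "soon" }).inspect
puts ignored_classes_from({ "FAILBOT_IGNORED_ERROR_CLASSES" => "ArgumentError, KeyError" }).inspect
```

A value that does not parse as a number becomes 0.0 through `to_f`, which leaves the timeout disabled rather than raising. An unknown class name in the ignore list, by contrast, raises NameError from `Module.const_get`.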
Root directory of the project’s source. Used to clean up stack traces if the exception format supports it.

    # File lib/failbot.rb, line 53
    def source_root=(str)
      @source_root = if str
        File.join(str, '')
      end
    end
Combines all context hashes into a single hash converting non-standard data types in values to strings, then combines the result with a custom info hash provided in the other argument.
other - Optional array of hashes to also squash in on top of the context stack hashes.
Returns a Hash with all keys and values.
    # File lib/failbot.rb, line 409
    def squash_contexts(*contexts_to_squash)
      squashed = {}

      contexts_to_squash.flatten.each do |hash|
        hash.each do |key, value|
          squashed[key.to_s] = value
        end
      end

      squashed
    end

    # File lib/failbot.rb, line 311
    def use_default_rollup
      rollup(&DEFAULT_ROLLUP)
    end
Private Instance Methods
    # File lib/failbot.rb, line 612
    def ignore_error?(error)
      @cache ||= Hash.new do |hash, error_class|
        hash[error_class] = @ignored_error_classes.any? do |ignored_error_class|
          error_class.ancestors.include?(ignored_error_class)
        end
      end

      @cache[error.class]
    end
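The memoized ancestor check above means that ignoring a class also ignores its subclasses, and that each exception class is examined only once. A self-contained sketch, with the hypothetical class `IgnoreList` and its `cached_classes` reader added only to make the cache observable:

```ruby
# Hypothetical miniature of the memoized ancestor check.
class IgnoreList
  def initialize(ignored_classes)
    # The default block runs once per exception class and stores the answer.
    @cache = Hash.new do |hash, error_class|
      hash[error_class] = ignored_classes.any? do |ignored|
        error_class.ancestors.include?(ignored)
      end
    end
  end

  def ignore?(error)
    @cache[error.class]
  end

  def cached_classes
    @cache.keys
  end
end

list = IgnoreList.new([IndexError])
puts list.ignore?(KeyError.new("missing"))     # KeyError inherits from IndexError
puts list.ignore?(ArgumentError.new("bad"))
puts list.cached_classes.inspect
```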
Internal: Publish an event to the instrumenter
    # File lib/failbot.rb, line 566
    def instrument(name, payload = {})
      Failbot.instrumenter.instrument(name, payload) if Failbot.instrumenter
    end

    # File lib/failbot.rb, line 581
    def log_failure(action, data, original_exception, exception)
      begin
        record = {
          "msg" => "exception",
          "action" => action,
          "data" => data,
        }

        record.merge!(to_semconv(exception))
        logger.debug record

        record = {
          "msg" => "report-failed",
          "action" => action,
          "data" => data,
        }
        record.merge!(to_semconv(original_exception))
        logger.debug record
      rescue => e
        raise e
      end
    end
Populate default context from settings. Since settings commonly comes from ENV, this allows setting defaults for the context via the environment.
    # File lib/failbot.rb, line 572
    def populate_context_from_settings(settings)
      settings.each do |key, value|
        if /\AFAILBOT_CONTEXT_(.+)\z/ =~ key
          key = $1.downcase
          context[0][key] = value unless context[0][key]
        end
      end
    end
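The key mapping performed above can be tried with a plain Hash in place of ENV. `context_from_settings` is a hypothetical helper that returns a new Hash instead of mutating Failbot's base context, and the sample values are made up.

```ruby
# Hypothetical helper: derive default context entries from settings keys
# of the form FAILBOT_CONTEXT_<NAME>, without overriding existing entries.
def context_from_settings(settings, base = {})
  settings.each_with_object(base.dup) do |(key, value), context|
    if /\AFAILBOT_CONTEXT_(.+)\z/ =~ key
      name = $1.downcase
      context[name] = value unless context[name]
    end
  end
end

env = {
  "FAILBOT_CONTEXT_APP" => "billing",
  "FAILBOT_CONTEXT_SERVER" => "from-env",
  "PATH" => "/usr/bin",
}
context = context_from_settings(env, { "server" => "web-1" })
puts context.inspect
```

Values already present in the base context win over the environment, and keys without the FAILBOT_CONTEXT_ prefix are skipped.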
    # File lib/failbot.rb, line 508
    def report_with_context!(thread, provided_context, e, other = {})
      return unless @thread_local_report_errors.value
      return if ignore_error?(e)

      if already_reporting
        logger.warn "FAILBOT: asked to report while reporting!" rescue nil
        logger.warn e.message rescue nil
        logger.warn e.backtrace.join("\n") rescue nil
        return
      end
      self.already_reporting = true

      begin
        data = squash_contexts(provided_context, exception_info(e), other)

        if !data.has_key?("rollup")
          data = data.merge("rollup" => @rollup.call(e, data, thread))
        end

        if defined?(@before_report) && @before_report
          data = squash_contexts(data, @before_report.call(e, data, thread))
        end

        if @app_override
          data = data.merge("app" => @app_override)
        end

        data = scrub(sanitize(data))
      rescue Object => i
        log_failure("processing", data, e, i)
        self.already_reporting = false
        return
      end

      start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      instrumentation_data = {
        "report_status" => "error",
      }
      begin
        if @enable_timeout
          Timeout.timeout(@timeout_seconds) do
            backend.report(data)
          end
        else
          backend.report(data)
        end
        instrumentation_data["report_status"] = "success"
      rescue Object => i
        log_failure("reporting", data, e, i)
        instrumentation_data["exception_type"] = i.class.name
      ensure
        instrumentation_data["elapsed_ms"] = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) * 1000).to_i
        instrument("report.failbot", data.merge(instrumentation_data)) rescue nil
        self.already_reporting = false
      end
    end

    # File lib/failbot.rb, line 604
    def to_semconv(exception)
      {
        "exception.type" => exception.class.to_s,
        "exception.message" => exception.message.encode("UTF-8", invalid: :replace, undef: :replace, replace: '�'),
        "exception.backtrace" => exception.full_message(highlight: false, order: :top).encode('UTF-8', invalid: :replace, undef: :replace, replace: '�'),
      }
    end
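The second half of the method, which sends the report under a deadline and records how it went, follows a pattern that can be sketched on its own. `timed_report` is a hypothetical wrapper; unlike the source, which rescues Object, this sketch rescues only StandardError, and the 50 ms deadline is an arbitrary illustration value.

```ruby
require "timeout"

# Hypothetical wrapper: run a reporting block under a deadline and
# describe the outcome instead of letting the failure escape.
def timed_report(timeout_seconds)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  outcome = { "report_status" => "error" }
  begin
    Timeout.timeout(timeout_seconds) { yield }
    outcome["report_status"] = "success"
  rescue StandardError => failure
    outcome["exception_type"] = failure.class.name
  ensure
    # The monotonic clock is unaffected by wall-clock adjustments.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    outcome["elapsed_ms"] = (elapsed * 1000).to_i
  end
  outcome
end

fast = timed_report(1) { :sent }
slow = timed_report(0.05) { sleep 1 }
puts fast.inspect
puts slow.inspect
```

Starting from an "error" status and flipping it to "success" only after the backend call returns means every failure path, including a timeout, is recorded correctly without extra bookkeeping.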