class Bosh::Monitor::Plugins::ResurrectorHelper::AlertTracker
Service which tracks alerts and decides whether or not the cluster is melting down. When the cluster is melting down, the resurrector backs off on fixing instances.
Attributes
minimum_down_jobs[RW]
Minimum number of down agents required before a meltdown is considered to be occurring. Below this number, the deployment is never treated as melting down.
percent_threshold[RW]
Fraction of the cluster which must be down for scanning to stop. Float between 0 and 1 (for example, 0.2 means 20%).
time_threshold[RW]
Number of seconds during which an alert is considered “current”; alerts older than this are ignored. Integer number of seconds.
Public Class Methods
new(args={})
# File lib/bosh/monitor/plugins/resurrector_helper.rb, line 44
def initialize(args={})
  @agent_manager = Bhm.agent_manager
  @alert_times = {} # maps JobInstanceKey to time of last Alert
  @minimum_down_jobs = args.fetch('minimum_down_jobs', 5)
  @percent_threshold = args.fetch('percent_threshold', 0.2)
  @time_threshold = args.fetch('time_threshold', 600)
end
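The constructor reads its options with string keys, so a hash keyed by symbols would fall through to the defaults. The sketch below illustrates that lookup behavior in isolation. `resolve_tracker_options` is a hypothetical helper written for this example and is not part of Bosh::Monitor; it only mirrors the three `args.fetch` calls shown in the source.

```ruby
# Hypothetical stand-in for the option handling in AlertTracker#initialize.
# It mirrors the fetch calls and defaults from the source, nothing more.
def resolve_tracker_options(args = {})
  {
    minimum_down_jobs: args.fetch('minimum_down_jobs', 5),
    percent_threshold: args.fetch('percent_threshold', 0.2),
    time_threshold:    args.fetch('time_threshold', 600)
  }
end

# String keys override the defaults.
with_strings = resolve_tracker_options({ 'percent_threshold' => 0.5 })
puts with_strings[:percent_threshold]

# Symbol keys are not looked up, so the default is used instead.
with_symbols = resolve_tracker_options({ percent_threshold: 0.5 })
puts with_symbols[:percent_threshold]
```

If options come from a parsed YAML or JSON configuration, keys are usually already strings, which is presumably why the constructor looks them up that way.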
Public Instance Methods
melting_down?(deployment)
“Melting down” means a large part of the cluster is offline and manual intervention may be required to fix it.
# File lib/bosh/monitor/plugins/resurrector_helper.rb, line 54
def melting_down?(deployment)
  agent_alerts = alerts_for_deployment(deployment)
  total_number_of_agents = agent_alerts.size
  number_of_down_agents = agent_alerts.select { |_, alert_time|
    alert_time > (Time.now - time_threshold)
  }.size

  return false if number_of_down_agents < minimum_down_jobs

  (number_of_down_agents.to_f / total_number_of_agents) >= percent_threshold
end
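To make the two-stage check concrete, here is a minimal standalone sketch of the same decision logic. `meltdown_sketch?` is a hypothetical function written for illustration, not part of Bosh::Monitor. It takes the alert map and the current time as arguments so the result does not depend on `Time.now` or on an agent manager, and it uses the default thresholds from the constructor.

```ruby
# Hypothetical restatement of the melting_down? logic.
# alert_times maps any agent identifier to the Time of its last alert.
def meltdown_sketch?(alert_times, now:, minimum_down_jobs: 5,
                     percent_threshold: 0.2, time_threshold: 600)
  total = alert_times.size
  # An agent counts as down only if its last alert is recent enough.
  down = alert_times.count { |_, alert_time| alert_time > (now - time_threshold) }

  # Stage 1: too few down agents never counts as a meltdown.
  return false if down < minimum_down_jobs

  # Stage 2: the down agents must make up a large enough fraction.
  (down.to_f / total) >= percent_threshold
end

# Builds a map of `total` agents, `recent` of which alerted one minute ago.
# Agents without an alert get Time.at(0), as in alerts_for_deployment.
def sample_alerts(total, recent, now)
  (0...total).each_with_object({}) do |i, map|
    map["agent-#{i}"] = i < recent ? now - 60 : Time.at(0)
  end
end

now = Time.at(10_000)
puts meltdown_sketch?(sample_alerts(20, 6, now), now: now)   # 6 down of 20
puts meltdown_sketch?(sample_alerts(20, 4, now), now: now)   # 4 down of 20
puts meltdown_sketch?(sample_alerts(100, 5, now), now: now)  # 5 down of 100
```

With the defaults, 6 of 20 agents down passes both stages (6 ≥ 5 and 0.3 ≥ 0.2). 4 of 20 fails the minimum count even though it meets the 20% fraction, and 5 of 100 meets the minimum count but only reaches 5%.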
record(agent_key, alert_time)
# File lib/bosh/monitor/plugins/resurrector_helper.rb, line 66
def record(agent_key, alert_time)
  @alert_times[agent_key] = alert_time
end
Private Instance Methods
alerts_for_deployment(deployment)
# File lib/bosh/monitor/plugins/resurrector_helper.rb, line 72
def alerts_for_deployment(deployment)
  agents = @agent_manager.get_agents_for_deployment(deployment)
  keys = agents.values.map { |agent|
    JobInstanceKey.new(agent.deployment, agent.job, agent.instance_id)
  }

  result = {}
  keys.each { |key| result[key] = @alert_times.fetch(key, Time.at(0)) }
  result
end
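The lookup above relies on two details: keys must compare by value so that a freshly built key finds the entry stored by `record`, and agents with no recorded alert default to the Unix epoch, which is always older than any realistic `time_threshold`. The sketch below shows both under stated assumptions. `KeySketch` is a hypothetical `Struct` standing in for `JobInstanceKey` (whose real definition is not shown here), and `alerts_for_keys` is an illustrative helper rather than the library's method.

```ruby
# Hypothetical stand-in for JobInstanceKey. A Struct compares and hashes
# by its field values, which is what the Hash lookup needs.
KeySketch = Struct.new(:deployment, :job, :id)

# Illustrative version of the lookup: every key gets an entry, and keys
# with no recorded alert fall back to Time.at(0).
def alerts_for_keys(keys, alert_times)
  keys.each_with_object({}) do |key, result|
    result[key] = alert_times.fetch(key, Time.at(0))
  end
end

# One alert recorded for web/1; nothing recorded for web/2.
recorded = { KeySketch.new('dep', 'web', '1') => Time.at(5_000) }
keys = [KeySketch.new('dep', 'web', '1'), KeySketch.new('dep', 'web', '2')]

alerts_for_keys(keys, recorded).each do |key, time|
  puts "#{key.job}/#{key.id}: #{time.to_i}"
end
```

Because every agent in the deployment gets an entry, the size of the resulting hash is the total agent count, which is what `melting_down?` uses as the denominator.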