class GithubStats::Data
Data class for calculations
Attributes
Public Class Methods
Create a data object and turn on caching
# File lib/githubstats/data.rb, line 35
def initialize(data)
  @raw = data.map { |d, s| Datapoint.new(Date.parse(d), s.to_i) }.sort_by(&:date)
  enable_caching %i[to_h today streaks longest_streak streak max mean std_var
                    quartile_boundaries quartiles start_date end_date]
end
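As a rough illustration of the parsing step in the constructor, the sketch below turns date/score pairs into sorted data points outside the gem. The point_class struct is a stand-in defined for this example (GithubStats::Datapoint itself is not shown in this section, so its date and score accessors are inferred from how the methods here use it), and the input pairs are made up. Caching is left out.

```ruby
require 'date'

# Stand-in for GithubStats::Datapoint (assumption: it exposes date and score).
point_class = Struct.new(:date, :score)

# Hypothetical input: [date string, score string] pairs in arbitrary order.
input = [['2024-01-03', '2'], ['2024-01-01', '0'], ['2024-01-02', '5']]

# Parse each pair, then sort chronologically, as the constructor does.
raw = input.map { |d, s| point_class.new(Date.parse(d), s.to_i) }.sort_by(&:date)

raw.each { |point| puts "#{point.date} #{point.score}" }
```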
Public Instance Methods
The score for a given day
# File lib/githubstats/data.rb, line 74
def [](date)
  to_h[Date.parse(date)]
end
The end of the dataset
# File lib/githubstats/data.rb, line 60
def end_date
  @raw.last.date
end
Outliers as calculated by GitHub. Only the first 1 or 3 outliers are considered, depending on the mean and max of the set.
# File lib/githubstats/data.rb, line 148
def gh_outliers
  outliers.take(max.score - mean < 6 || max.score < 15 ? 1 : 3)
end
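The condition that picks between 1 and 3 can be read in isolation. The following is a minimal sketch with a hypothetical helper, outlier_limit, that takes the maximum score and the mean as plain numbers; the sample values are invented.

```ruby
# Hypothetical helper mirroring the condition in gh_outliers:
# keep 1 outlier when the max is close to the mean or below 15, otherwise 3.
def outlier_limit(max_score, mean)
  max_score - mean < 6 || max_score < 15 ? 1 : 3
end

small_max = outlier_limit(10, 2.0)   # max below 15
near_mean = outlier_limit(40, 38.0)  # max within 6 of the mean
far_max   = outlier_limit(40, 5.0)   # large max, far from the mean

puts [small_max, near_mean, far_max].inspect
```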
The longest streak
# File lib/githubstats/data.rb, line 99
def longest_streak
  return [] if streaks.empty?
  streaks.max_by(&:length)
end
The highest scoring day
# File lib/githubstats/data.rb, line 115
def max
  @raw.max_by(&:score)
end
The mean score
# File lib/githubstats/data.rb, line 122
def mean
  scores.sum / @raw.size.to_f
end
Outliers of the set
# File lib/githubstats/data.rb, line 139
def outliers
  return [] if scores.uniq.size < 5
  scores.select { |x| ((mean - x) / std_var).abs > GITHUB_MAGIC }.uniq
end
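The selection above is a z-score test: a score is an outlier when its distance from the mean, measured in standard deviations, exceeds a fixed threshold. A minimal standalone sketch follows. The value of GITHUB_MAGIC is not shown in this section, so threshold below is a placeholder chosen for the example, and the scores are made up.

```ruby
# Placeholder threshold; NOT the gem's GITHUB_MAGIC, whose value is not shown here.
threshold = 2.0

# Hypothetical daily scores with one obvious spike.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

mean = scores.sum / scores.size.to_f
sum_sq = scores.sum { |x| (x - mean)**2 }
std_dev = Math.sqrt(sum_sq / (scores.size - 1)) # sample standard deviation

# Sets with fewer than 5 distinct scores are treated as having no outliers.
outliers =
  if scores.uniq.size < 5
    []
  else
    scores.select { |x| ((mean - x) / std_dev).abs > threshold }.uniq
  end

puts outliers.inspect
```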
Pad the dataset to full week increments
# File lib/githubstats/data.rb, line 190
def pad(fill_value = -1, data = @raw.clone)
  data = _pad data, 0, fill_value, 0
  _pad data, -1, fill_value, 6
end
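Padding extends the dataset backwards to a Sunday and forwards to a Saturday, inserting filler points along the way. The sketch below shows the same idea on a made-up three-day dataset; point_class is a stand-in for GithubStats::Datapoint, and unshift/push replace the index-based inserts of the private _pad helper.

```ruby
require 'date'

# Stand-in for GithubStats::Datapoint (assumption: it exposes date and score).
point_class = Struct.new(:date, :score)

# Hypothetical dataset running Tuesday 2024-01-02 through Thursday 2024-01-04.
data = (2..4).map { |day| point_class.new(Date.new(2024, 1, day), day) }
fill_value = -1

# Prepend filler days until the first entry falls on a Sunday (wday 0).
data.unshift(point_class.new(data.first.date - 1, fill_value)) until data.first.date.sunday?
# Append filler days until the last entry falls on a Saturday (wday 6).
data.push(point_class.new(data.last.date + 1, fill_value)) until data.last.date.saturday?

data.each { |point| puts "#{point.date} #{point.score}" }
```

The result covers exactly one Sunday-to-Saturday week, with -1 marking the days that were not in the original data.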
Return the quartile of a given score
# File lib/githubstats/data.rb, line 182
def quartile(score)
  return nil if score.negative? || score > max.score
  quartile_boundaries.count { |bound| score > bound }
end
The boundaries of the quartiles. The index represents the quartile number; the value is the upper bound of that quartile (inclusive).
# File lib/githubstats/data.rb, line 157
def quartile_boundaries # rubocop:disable Metrics/AbcSize
  top = scores.reject { |x| gh_outliers.include? x }.max
  range = (1..top).to_a
  range = [0] * 3 if range.empty?
  mids = (1..3).map do |q|
    index = (q * range.size / 4) - 1
    range[index]
  end
  bounds = (mids + [max.score]).uniq.sort
  ([0] * (5 - bounds.size)) + bounds
end
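To make the boundary arithmetic concrete, here is a small worked sketch on invented scores. It assumes the set has no outliers, so the top score is simply the maximum, and it skips the nil guard that quartile applies to out-of-range scores. The lambda quartile_of is a hypothetical name for this example.

```ruby
# Hypothetical scores; assumed to contain no outliers.
scores = [0, 1, 3, 8, 2, 5]
top = scores.max

# Pick the values one quarter, one half and three quarters of the way through 1..top.
range = (1..top).to_a
range = [0] * 3 if range.empty?
mids = (1..3).map { |q| range[(q * range.size / 4) - 1] }

# Add the maximum as the last bound and left-pad with zeros to five entries.
bounds = (mids + [top]).uniq.sort
boundaries = ([0] * (5 - bounds.size)) + bounds

# A score's quartile is the number of boundaries strictly below it.
quartile_of = ->(score) { boundaries.count { |bound| score > bound } }

p boundaries                               # [0, 2, 4, 6, 8]
p scores.map { |s| quartile_of.call(s) }   # [0, 1, 2, 4, 1, 3]
```

Because each boundary is an inclusive upper bound, a score of 2 lands in quartile 1 while a score of 3 lands in quartile 2.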
Return the list split into quartiles
# File lib/githubstats/data.rb, line 172
def quartiles
  quartiles = Array.new(5) { [] }
  @raw.each_with_object(quartiles) do |elem, acc|
    acc[quartile(elem.score)] << elem
  end
end
Scores in chronological order
# File lib/githubstats/data.rb, line 81
def scores
  @raw.map(&:score)
end
The start of the dataset
# File lib/githubstats/data.rb, line 54
def start_date
  @raw.first.date
end
The sample standard deviation (two pass). Despite the method name, std_var returns the square root of the variance.
# File lib/githubstats/data.rb, line 129
def std_var
  first_pass = @raw.reduce(0) do |acc, elem|
    ((elem.score.to_f - mean)**2) + acc
  end
  Math.sqrt(first_pass / (@raw.size - 1))
end
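The two passes are: first compute the mean, then sum the squared deviations from it and divide by n - 1 before taking the square root. A minimal worked sketch on made-up scores:

```ruby
# Hypothetical scores chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Pass 1: the mean (40 / 8 = 5.0).
mean = scores.sum / scores.size.to_f

# Pass 2: the sum of squared deviations from the mean (32.0).
squared = scores.sum { |x| (x - mean)**2 }

# Dividing by n - 1 rather than n gives the sample standard deviation.
std_dev = Math.sqrt(squared / (scores.size - 1))

puts std_dev.round(3) # prints 2.138
```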
The current streak, or an empty array if there is none
# File lib/githubstats/data.rb, line 107
def streak
  return [] if streaks.empty?
  streaks.last.last.date >= Date.today - 1 ? streaks.last : []
end
All streaks for a user
# File lib/githubstats/data.rb, line 88
def streaks
  streaks = @raw.each_with_object(Array.new(1, [])) do |point, acc|
    point.score.zero? ? acc << [] : acc.last << point
  end
  streaks.reject!(&:empty?)
  streaks
end
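A streak is a run of consecutive days with a non-zero score, so every zero-score day closes the current run and opens a new one. The sketch below applies the same splitting to a made-up week of data and then picks the longest run, as longest_streak does. point_class is again a stand-in for GithubStats::Datapoint.

```ruby
require 'date'

# Stand-in for GithubStats::Datapoint (assumption: it exposes date and score).
point_class = Struct.new(:date, :score)

# Hypothetical scores for seven consecutive days starting 2024-01-01.
scores = [1, 2, 0, 3, 4, 5, 0]
raw = scores.each_with_index.map do |score, offset|
  point_class.new(Date.new(2024, 1, 1) + offset, score)
end

# Start a new run at every zero-score day, then drop the empty runs.
streaks = raw.each_with_object([[]]) do |point, acc|
  point.score.zero? ? acc << [] : acc.last << point
end
streaks.reject!(&:empty?)

longest = streaks.max_by(&:length)

puts streaks.map { |run| run.map(&:score) }.inspect
puts "longest starts on #{longest.first.date}"
```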
The data as a hash where the keys are dates and values are scores
# File lib/githubstats/data.rb, line 45
def to_h
  @raw.reduce(Hash.new(0)) do |acc, elem|
    acc.merge(elem.date => elem.score)
  end
end
The score for today
# File lib/githubstats/data.rb, line 67
def today
  to_h[Date.today]
end
Private Instance Methods
# File lib/githubstats/data.rb, line 197
def _pad(data, index, fill_value, goal)
  mod = (index * -2) - 1 # 0 index moves -1 in time, -1 move +1 in time
  point = GithubStats::Datapoint
  data.insert index, point.new(data[index].date + mod, fill_value) until data[index].date.wday == goal
  data
end